Recent duplications and rare structural variations revealed by comparative sequence analysis of low molecular weight glutenin subunits (LMW-GS) genes re-identified using LMWgsFinder in 26 genomes of the grass family

Research Square, Research Article
Shengli Zhang, Xiaojing Shan, Yun Wang, Tairui Lu, Daxing Xu, and 9 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5789598/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 27 May, 2025 in Theoretical and Applied Genetics.
Version 1 posted 5

Abstract
LMW-GS are one of the primary components of wheat (Triticum aestivum L.) seed storage proteins, which have an important impact on wheat end-use quality traits.
Identifying LMW-GS genes accurately within wheat genomes has long been a significant challenge. LMWgsFinder, a tool developed in this study, was used to re-identify LMW-GS genes in a total of 26 genomes of the grass family. A total of 291 LMW-GS genes were identified; none were found in six of the species. Excluding the two versions of the TaCS (Triticum aestivum Chinese Spring) genome, only 38.13% (98/257) of the LMW-GS genes identified by LMWgsFinder in the remaining 18 genomes were annotated in the coding sequence (CDS) annotation files provided by the sequencing groups. Recent duplication or triplication of the same LMW-GS gene, mediated by EnSpm-like transposon activity, was observed in 8 wheat species for the first time, indicating that LMW-GS gene duplication has been ongoing alongside the evolution of wheat. Several rare structural variations associated with the loss or gain of LMW-GS gene function were discovered and experimentally verified. Twenty-one new LMW-GS genes were discovered in 15 species of Triticeae. These results provide the first empirical support at the DNA level, with confirmed chromosomal localization, for the widely accepted notion that LMW-GS genes have undergone duplication during wheat evolution. This study also provides gene sequence resources and information for further research on LMW-GS gene function, molecular marker-assisted selection, gene pyramiding, and molecular design breeding.

Keywords: Wheat glutenins; LMW-GS genes; Genome; Gene duplication; Alleles; Triticum aestivum

Key message: LMWgsFinder, developed in this study, was used to re-identify LMW-GS genes in 26 genomes across the grass family, yielding several important and novel findings.
Introduction
The evolution and speciation of common wheat (Triticum aestivum L., hereinafter referred to as wheat) is a complex process involving two rounds of allopolyploidization, followed by natural selection and domestication, which ultimately produced the high-yield, high-quality wheat varieties grown today (Matsuoka, 2011; Feldman and Levy, 2012). Wheat is an allohexaploid species (2n = 6X = 42; AABBDD) whose A, B, and D subgenomes derive from three wild diploid species: Triticum urartu (2n = 2X = 14; AA), an as-yet unidentified species related to Aegilops speltoides (2n = 2X = 14; SS), and Aegilops tauschii (2n = 2X = 14; DD) (Huo et al., 2018; Jia et al., 2023). The second spontaneous hybridization, between Triticum turgidum (2n = 4X = 28; AABB) and an ancestral diploid Aegilops tauschii (DD), occurred approximately 8,000 to 10,000 years ago and gave rise to wheat (Jia et al., 2023; Cavalet-Giorsa et al., 2024). The addition of the Aegilops tauschii D genome significantly improved the gluten quality of wheat, making it the only major human food crop that can be processed into such diverse foods as noodles, bread, dumplings, and cookies (Jia et al., 2013; Ren et al., 2022). This versatility depends mainly on the unique storage proteins in wheat seeds, namely glutenins and gliadins (Dong et al., 2010; Huo et al., 2018). Based on molecular weight, glutenins are further divided into high molecular weight glutenin subunits (HMW-GS) and LMW-GS (Dong et al., 2010; Ren et al., 2022). On average, 60% of the variation in gluten quality is attributable to genotype. Specifically, variation in HMW-GS has been shown to explain 20–30% of the variation in gluten strength in common wheat (Carlos Guzmán et al., 2022). The effects of LMW-GS on gluten quality differ between common wheat and durum wheat.
In common wheat, LMW-GS usually have a smaller effect on gluten characteristics than HMW-GS, accounting for 10–20% of the observed variation (Carlos Guzmán et al., 2022). In durum wheat, by contrast, LMW-GS have a greater impact on gluten quality than HMW-GS. HMW-GS and LMW-GS are linked through intermolecular disulfide bonds into large polymers that confer elasticity on wheat dough, whereas gliadins exist mainly as monomers and confer extensibility (Huo et al., 2018). In the gluten network, HMW-GS form the backbone, while an increase in LMW-GS content can significantly improve dough strength (Li et al., 2020). The composition and ratio of high and low molecular weight glutenins directly affect the functional properties of wheat flour (Dizlek and Awika, 2023). The Glu-3 loci encoding LMW-GS are located on the short arms of group 1 chromosomes, with copy numbers ranging from approximately 10–20 to 30–40 among wheat varieties (Huang and Cloutier, 2008; Zhang et al., 2011; Zhang et al., 2013; Al-Khayri et al., 2023), whereas the Glu-1 loci encoding HMW-GS are located on the long arms of group 1 chromosomes, with six copies in total, two on each of the A, B, and D subgenomes (Al-Khayri et al., 2023). Gliadins are subdivided into α-, δ-, γ-, and ω-gliadins based on their molecular characteristics (Ren et al., 2022). The α-gliadin genes, encoded by the Gli-2 loci, map to the short arms of group 6 chromosomes, while the other three types are encoded by the Gli-1 loci on the short arms of group 1 chromosomes (Huo et al., 2017; Huo et al., 2018). The total copy number of these three types of gliadin genes is around 40 per wheat genome (Dong et al., 2016).
Notably, Glu-3 and Gli-1 are two tightly linked loci on the short arms of group 1 chromosomes and are closely linked to multiple disease resistance genes, such as Lr21 and Pm3, which also occur in multiple copies in this region (Dong et al., 2010; Dong et al., 2016; Huo et al., 2018). Wheat end-use quality is therefore a highly complex trait involving many genes. In terms of protein content, LMW-GS encoded by Glu-3 account for about 60–80% of total glutenins, while the δ-, γ-, and ω-gliadins encoded by Gli-1 account for about 75% of total gliadins (Zhang et al., 2013). LMW-GS and these three gliadins are thus the main storage proteins in wheat grains and have an important impact on the processing and nutritional quality of wheat flour (Huo et al., 2018). Among varieties carrying the same HMW-GS genes, the types, numbers, and expression levels of LMW-GS and gliadin genes may substantially affect the quality stability of strong-gluten wheat. LMW-GS and gliadin genes are both single-exon genes, with coding regions generally 800 to 1200 bp long, encoding mature proteins of 30 to 45 kDa (Zhang et al., 2013). Because of the high copy numbers of LMW-GS and gliadin genes in the wheat genome, the high sequence similarity between copies, the lower molecular weight of the encoded proteins relative to HMW-GS, and the overlapping patterns of LMW-GS and gliadins in SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis) (Singh et al., 1991; Al-Khayri et al., 2023), these proteins are difficult to distinguish, even by MALDI-TOF-MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) (Liu et al., 2010; Zhang et al., 2013; Peng et al., 2016).
As a result, studying the composition of LMW-GS and gliadin genes in different wheat varieties and their relationship with end-use quality parameters has been difficult (Zhang et al., 2011). To date, research on LMW-GS genes has rarely used molecular approaches such as gene editing and RNAi (Tyler et al., 2015; Qu et al., 2023). Instead, using conserved or gene-specific primers, it has mainly focused on copy number identification, chromosomal localization, homologous cloning, real-time quantitative expression analysis, and the relationship between LMW-GS genes and wheat end-use quality traits (Ikeda et al., 2002; Yue et al., 2005; Li et al., 2010; Zhang et al., 2011; Zhang et al., 2013; Lee et al., 2016). LMW-GS genes have also been identified by capillary electrophoresis or Sanger sequencing of polymerase chain reaction (PCR) products (Zhao et al., 2004; Zhang et al., 2011; Zhang et al., 2013). However, owing to the scarcity of complete sequence information, the complexity of this large gene family, and the repetitive domains in the coding region, both PCR-based studies and expression analyses of LMW-GS genes based on de novo transcriptome assembly have remained challenging (Huo et al., 2018). For example, how can LMW-GS genes whose CDSs share more than 99% similarity, or that have recently undergone duplication or triplication, be distinguished? To date, no report has addressed this issue; all reported results have treated such copies as the same locus or gene.
Recent advances in DNA sequencing and large-genome assembly have enabled the sequencing and assembly of more than 20 wheat genomes (Akpinar et al., 2018; Walkowiak et al., 2020; Sato et al., 2021; Shi et al., 2022; Jia et al., 2023). However, BLASTN analysis using Chinese Spring (CS) LMW-GS gene sequences as queries revealed that LMW-GS gene annotation remains incomplete in most of these genomes, and some genome annotations do not include any LMW-GS genes. This reflects the differing priorities of the groups that sequenced and annotated these genomes. Although the assemblies themselves are of high quality, the complexity of the LMW-GS gene family and the lack of specialized annotation tools have left the annotation of this family incomplete in most sequenced species. Previous studies have shown that different LMW-GS share a similar structure (Dong et al., 2010): an N-terminal signal peptide (Sig) of 20 amino acids, removed upon maturation; an N-terminal domain (N-ter) of 13 amino acids; a central repetitive domain (Rep) of approximately 70–186 amino acids; and a C-terminal domain (C-ter) (Renato and Stefania, 2004; Ren et al., 2022). Based on the first amino acid residue of the mature protein (serine, methionine, or isoleucine, respectively), LMW-GS are classified into three types: LMW-s, LMW-m, and LMW-i (Renato and Stefania, 2004; An et al., 2006; Dong et al., 2010). Differences in the molecular weight of LMW-GS are mainly due to variation in the number of repeat units in the Rep domain, which is generally attributed to insertions or deletions of these units.
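The first-residue rule can be illustrated with a minimal Python sketch. It assumes a full-length precursor whose first 20 residues are the signal peptide described above, removes them, and maps the first mature residue to a type (serine to LMW-s, methionine to LMW-m, isoleucine to LMW-i). The names `classify_lmw_type` and `SIGNAL_PEPTIDE_LEN` and the peptide fragments are hypothetical, and this is not the LMWgsFinder implementation; real precursors may not follow a fixed 20-residue cut, so a signal-peptide predictor would be more robust.

```python
# Hypothetical helper for illustration only; not part of LMWgsFinder.
SIGNAL_PEPTIDE_LEN = 20  # assumption: fixed 20-residue N-terminal signal peptide

# First residue of the mature protein -> LMW-GS type
TYPE_BY_FIRST_RESIDUE = {"S": "LMW-s", "M": "LMW-m", "I": "LMW-i"}


def classify_lmw_type(precursor: str, sig_len: int = SIGNAL_PEPTIDE_LEN) -> str:
    """Return 'LMW-s', 'LMW-m', 'LMW-i', or 'unknown' for a precursor sequence."""
    seq = precursor.strip().upper()
    if len(seq) <= sig_len:
        return "unknown"  # too short to contain a mature protein
    mature = seq[sig_len:]  # drop the signal peptide cleaved upon maturation
    return TYPE_BY_FIRST_RESIDUE.get(mature[0], "unknown")


if __name__ == "__main__":
    placeholder_sig = "X" * 20  # stands in for a real 20-residue signal peptide
    print(classify_lmw_type(placeholder_sig + "METSHIPGLERPSQQQ"))  # LMW-m
    print(classify_lmw_type(placeholder_sig + "SHIPGLERPSQQQ"))     # LMW-s
```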
Based on its amino acid composition, the C-ter is further divided into a cysteine-rich region I (C-terI), a glutamine-rich region II (C-terII), and a C-terminal conserved region III (C-terIII). LMW-GS generally contain eight cysteine residues, seven of which are located in the C-ter, and at least one of which is involved in forming interchain disulfide (S-S) bonds (Lombardi et al., 2009). Based on these amino acid composition characteristics, we developed a Perl package named LMWgsFinder and used it to re-identify LMW-GS genes in 26 whole-genome sequences of wheat and related species across the grass family. We also experimentally verified selected genes, analyzed transcriptome data supporting the annotated LMW-GS genes, and performed comparative sequence analysis of these genes. The study yielded several rarely reported findings, providing resources for in-depth functional research on LMW-GS genes using approaches such as gene editing and RNAi, and for improving wheat end-use quality through molecular marker-assisted selection and molecular design breeding.

Materials and methods
Genome sequence resources utilized in this study
Chromosome-level genome assemblies of different wheat varieties and their close relatives were used as the data source (Akpinar et al., 2018; Walkowiak et al., 2020; Sato et al., 2021; Shi et al., 2022; Jia et al., 2023), and LMW-GS genes in each whole genome were re-identified using the Perl script package LMWgsFinder.
The genome assemblies and versions used in this study were: Triticum_aestivum_CSv1.0 (TaCSv1.0), Triticum_aestivum_CSv2.1 (TaCSv2.1), Triticum_aestivum_aikang58 (TaAK58), Triticum_aestivum_kenong9204 (TaKN9204), Triticum_aestivum_Fielder.V1 (TaFiel), Triticum_aestivum_arinalrfor.PGSBv2.1 (TaArin), Triticum_aestivum_jagger.PGSBv2.1 (TaJagg), Triticum_aestivum_julius.PGSBv2.1 (TaJuli), Triticum_aestivum_kariega.Tae_Kariega_v1 (TaKari), Triticum_aestivum_lancer.PGSBv2.1 (TaLanc), Triticum_aestivum_landmark.PGSBv2.1 (TaLand), Triticum_aestivum_mace.PGSBv2.1 (TaMace), Triticum_aestivum_mattis.PGSBv2.1 (TaMatt), Triticum_aestivum_norin61.PGSBv2.1 (TaNori), Triticum_aestivum_stanley.PGSBv2.2 (TaStan), Triticum_turgidum.Svevo.v1 (TrTu), Triticum_dicoccoides.WEWSeq_v.1.0 (Td), Triticum_spelta.PGSBv2.0 (Ts), Triticum_urartu.IGDB (Tu), Aegilops_tauschii.Aet_v4.0 (Aet), Setaria_italica_v2.0, Brachypodium_distachyon_v3.0, Hordeum_vulgare.MorexV3 (barley), Oryza_sativa.IRGSP-1.0 (rice), Sorghum_bicolor_NCBIv3 (sorghum), and Zea_mays.Zm-B73-REFERENCE-NAM-5.0 (corn). The genome sequences and corresponding annotation files (CDS, protein sequences, and GFF3) were downloaded from: TaCSv1.0 and TaCSv2.1, URGI (https://urgi.versailles.inra.fr/); TaAK58, Wheatdb (https://triticeae.henau.edu.cn/aikang58/) (Jia et al., 2023); TaKN9204, CNCB (https://download.cncb.ac.cn/gwh/Plants/) (Shi et al., 2022); Fielder, NBRP (https://shigen.nig.ac.jp/wheat/komugi/genome/) (Sato et al., 2021); all others, EnsemblPlants release 57 (http://plants.ensembl.org/info/data/ftp/index.html).

Development of LMWgsFinder package
A Perl (v5.26.2, built for x86_64-linux-thread-multi) script package named LMWgsFinder (https://github.com/slzhang20088/LMWgsFinder) was developed in this study. It identifies LMW-GS genes in the genomes of different wheat cultivars and their related species and extracts the related sequences and information.
For more details, please refer to the supplementary information.

Naming rules for LMW-GS genes
To simplify sequence analysis, this study adopted the following rules for naming LMW-GS genes. The prefixes "Glu-3_" and "LMW-GS_" are omitted for all LMW-GS genes across the A, B, and D subgenomes, so designations such as Glu-3_A5 or LMW-GS_A5 are simplified to A5, and so forth. Consider the name Tastan_1DgD10p4292433a1065bp: "Ta" represents Triticum aestivum; the third to sixth characters ("stan" here) represent the wheat variety (arin, arinalrfor; jagg, jagger; juli, julius; kari, kariega; lanc, lancer; land, landmark; mace, mace; matt, mattis; nori, norin61; stan, stanley); the characters between the underscore "_" and "g" give the chromosome ("1D" here; "cont" or "scaf" denote a contig or scaffold, respectively); "g" indicates gene, and the characters between "g" and "p" give the LMW-GS gene ("D10" here); "p" indicates the starting position, and the number between "p" and "a" (or "s") is the starting position of the gene on the corresponding chromosome, contig, or scaffold (4,292,433 here); "s" and "a" denote the positive (sense) and negative (antisense) DNA strand, respectively; and the number between "a" (or "s") and "bp" is the length of the CDS (1065 bp here). Thus, Tastan_1DgD10p4292433a1065bp represents the D10 gene on the antisense strand of chromosome 1D in the wheat variety Stanley, with a coding region that starts at position 4,292,433 and is 1065 bp long. A dot in a gene name, as in "A2.A1", indicates that the Begin_90bp aligns to A2 but the End_90bp aligns to A1.
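The naming scheme is regular enough to parse mechanically. The sketch below is a hypothetical Python parser; the regular expression, `LmwName`, and `parse_lmw_name` are illustrative and not part of LMWgsFinder. It assumes a four-letter lowercase variety code, a chromosome, contig, or scaffold field followed directly by "g", and gene symbols of the form A5, D10, or a dotted pair such as A2.A1.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Variety codes as listed in the naming rules.
VARIETIES = {
    "arin": "arinalrfor", "jagg": "jagger", "juli": "julius", "kari": "kariega",
    "lanc": "lancer", "land": "landmark", "mace": "mace", "matt": "mattis",
    "nori": "norin61", "stan": "stanley",
}

# Hypothetical pattern for names such as Tastan_1DgD10p4292433a1065bp.
NAME_RE = re.compile(
    r"^Ta(?P<variety>[a-z]{4})_"
    r"(?P<location>\w+?)g"                  # chromosome, contig or scaffold
    r"(?P<gene>[ABD]\d+(?:\.[ABD]\d+)?)"    # gene symbol, optionally dotted
    r"p(?P<start>\d+)"                      # start position
    r"(?P<strand>[sa])"                     # s = sense, a = antisense
    r"(?P<cds_len>\d+)bp$"                  # CDS length in bp
)


class LmwName(NamedTuple):
    variety: str
    location: str
    gene: str
    start: int
    strand: str   # "+" for sense (s), "-" for antisense (a)
    cds_len: int

    def begin_end_genes(self) -> Tuple[str, str]:
        """Genes matched by Begin_90bp and End_90bp (identical unless dotted)."""
        parts = self.gene.split(".")
        return parts[0], parts[-1]


def parse_lmw_name(name: str) -> Optional[LmwName]:
    """Parse a name in the scheme above; return None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return LmwName(
        variety=VARIETIES.get(m["variety"], m["variety"]),
        location=m["location"],
        gene=m["gene"],
        start=int(m["start"]),
        strand="+" if m["strand"] == "s" else "-",
        cds_len=int(m["cds_len"]),
    )


if __name__ == "__main__":
    print(parse_lmw_name("Tastan_1DgD10p4292433a1065bp"))
```

For a dotted name, `begin_end_genes()` returns the two genes matched by Begin_90bp and End_90bp separately; names in other formats (for example TaFiel_B2-I) simply return `None`.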
Phylogenetic analysis of LMW-GS family members
LMW-GS protein sequences were aligned with ClustalW (Thompson et al., 2002), and a neighbor-joining (NJ) tree was constructed in MEGA11 with 1000 bootstrap replicates; bootstrap values were used to estimate the relative support for each branch (Tamura et al., 2021). The tree was visualized with Evolview-v2 (He et al., 2016) (https://evolgenius.info//evolview-v2/). Non-synonymous (Ka) and synonymous (Ks) substitution rates were calculated with TBtools (C. Chen et al., 2020). Divergence time was estimated as $T = \frac{K_s}{2\lambda} \times 10^{-6}$ million years ago (MYA), where $\lambda = 6.5 \times 10^{-9}$ synonymous substitutions per site per year (Tamura et al., 2021).

Transcriptome data analysis
RNA-seq data were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/sra). Read quality was checked with FastQC, and the quality-controlled reads were trimmed to a uniform length with fastx_trimmer from the FASTX-Toolkit (parameters -Q33 and -l 130). STAR (Spliced Transcripts Alignment to a Reference) was then used for index construction and read mapping, and RSEM (RNA-Seq by Expectation-Maximization) was used to calculate gene expression levels. The 3D column chart of gene expression profiles was drawn in Excel 2010.

Validation of LMW-GS genes with structural variations in the CDS region
PCR amplification and Sanger sequencing were used to validate structural variations in the CDS regions of the A3 and B5 genes in the genomes of different wheat cultivars and their close relatives. Each 50 µl PCR reaction contained 25 µl of 2× Magic Green Taq SuperMix (21502-01, Shanghai Tolo Port Biotech), 2.5 µl each of the forward and reverse primers, 3 µl of DNA template, and 17 µl of sterile ultrapure water.
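The component volumes above sum to 50 µl (25 + 2.5 + 2.5 + 3 + 17). When several reactions are set up at once, shared components are usually combined into a master mix; the Python sketch below scales them for n reactions. The 10% pipetting overage and the choice to add template separately to each tube are assumptions, not part of the described protocol, and the function name is hypothetical.

```python
# Per-reaction volumes (µl) of the 50 µl PCR system described above.
PER_REACTION_UL = {
    "2x Magic Green Taq SuperMix": 25.0,
    "forward primer": 2.5,
    "reverse primer": 2.5,
    "DNA template": 3.0,
    "sterile ultrapure water": 17.0,
}


def master_mix(n_reactions: int, overage: float = 0.10,
               exclude: tuple = ("DNA template",)) -> dict:
    """Volumes (µl) of shared components for n reactions plus a fractional overage.

    Template is excluded by default, assuming it is pipetted into each tube.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be >= 1")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in PER_REACTION_UL.items() if name not in exclude}


if __name__ == "__main__":
    assert sum(PER_REACTION_UL.values()) == 50.0  # components add up to 50 µl
    for name, vol in master_mix(10).items():
        print(f"{name}: {vol} µl")
```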
The primers specific to A3 and B5 are listed in Table S13. Genomic DNA from wheat flag leaves at the heading stage was used as the template. The PCR program was: pre-denaturation at 95°C for 3 min; 35 cycles of 95°C for 10 s, 65°C for 10 s, and 72°C for 1 min; and a final extension at 72°C for 5 min. After agarose gel electrophoresis, the target PCR products were recovered from the gel, purified, and Sanger-sequenced by Sangon Biotech (Shanghai).

RNA extraction and quantitative real-time PCR analysis
Developing seed samples were collected from the Yuandi experimental field in Xinxiang, Henan, under consistent experimental conditions and immediately frozen in liquid nitrogen. Total RNA was extracted from developing seeds at different days after anthesis (DAA) using a Trizol-type reagent (ET121, TransZol Plant, TransGen Biotech), and first-strand cDNA was synthesized with a reverse transcription kit (R122-01, HiScript Q RT SuperMix, Vazyme Biotech). Quantitative reverse transcription PCR (qRT-PCR) was performed with a SYBR Green detection kit (22208, 2xQ5, Shanghai Tolo Port Biotech) on a QuantStudio 6 Flex real-time PCR system (Applied Biosystems, USA). Each 20 µl reaction was run with a standard two-step program (95°C for 10 s, 60°C for 30 s) for 40 cycles, and the melting curve was obtained with the instrument's default program. qRT-PCR primers were designed with AlleleID 6.01, and their sequences are listed in Table S13. Primer specificity was checked on the WheatOmics website (Ma et al., 2021). The previously reported TaTubulin gene was used as the reference (housekeeping) gene, and expression data were normalized to TaTubulin (Xiang et al., 2011).
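Normalization to TaTubulin can be illustrated with a minimal Python sketch of the $2^{-\Delta C_t}$ calculation, where $\Delta C_t = C_t(\text{target}) - C_t(\text{TaTubulin})$ for the same cDNA sample. The sketch assumes that technical-replicate Ct values are averaged before ΔCt is taken, which the methods do not specify; the function name and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression by 2^-ΔCt.

    ΔCt = mean Ct of the target gene - mean Ct of the reference gene
    (TaTubulin here), both measured on the same cDNA sample.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical technical-replicate Ct values for one DAA sample.
    target = [24.0, 24.0, 24.0]
    tubulin = [20.0, 20.0, 20.0]
    print(relative_expression(target, tubulin))  # 0.0625 (ΔCt = 4)
```

Each cycle of difference in ΔCt halves or doubles the relative value, which is why a target detected four cycles after the reference yields 1/16.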
Relative gene expression was calculated with the $2^{-\Delta C_t}$ method (Livak and Schmittgen, 2001).

Statistical analysis and sequence alignment
Two statistical methods were used to analyze the experimental data. Comparisons between two groups used a two-sided Student's t-test (*, p < 0.05; **, p < 0.01; ***, p < 0.001); multiple comparisons used ANOVA followed by Duncan's test. The agricolae package in R, Microsoft Excel 2010, GraphPad Prism 8.0, and IBM SPSS Statistics 25 were used for data analysis (Clarke et al., 2022). Repeat sequences were annotated with RepeatMasker v4.1.4. Pairwise sequence alignments were performed mainly with Lasergene (version 7.1.0). Multiple protein sequence alignments were performed with Clustal Omega (Madeira et al., 2024) (https://www.ebi.ac.uk/jdispatcher/msa/clustalo), and i-, m-, and s-type LMW-GS were then analyzed in Jalview (version 2.11.3.3). NCBI-BLASTN refers to online alignment and sequence retrieval with BLASTN on the NCBI website (https://blast.ncbi.nlm.nih.gov/Blast.cgi); the local BLASTN version was NCBI-blast-2.14.0+ (Zhang et al., 2000; Morgulis et al., 2008).

Results
Necessity and feasibility of re-identifying LMW-GS genes in grass genomes
Based on the characteristics of the gene coding region (Fig. 1A), LMWgsFinder, developed in this study, offers a more effective and precise method than previous annotations for identifying LMW-GS genes in the genomes of wheat and its relatives across the grass family (see the supplementary information for details). Such a specialized annotation tool is crucial for comparative sequence analysis and further functional research on these genes. According to the statistics (Tables 1, 2, and S10), LMWgsFinder identified a total of 291 LMW-GS genes across 20 genomes of different wheat varieties and related species.
However, no LMW-GS genes were found in the genomes of the remaining six species (Brachypodium distachyon, barley, rice, Setaria italica, sorghum, and corn). Transcriptome analysis showed that the genes identified by LMWgsFinder, and their expression profiles, were largely consistent with the TPM- and FPKM-based results (Fig. 2). In contrast, BLASTN searches using reference genes (the LMW-GS genes of CS and Xiaoyan54 [xy54]) as queries detected only 132 LMW-GS genes in these 20 genomes, approximately 45.36% (132/291) of the number identified by LMWgsFinder (Table 3). Excluding TaCS, only 98 genes (38.13%, 98/257) were detected. It is therefore both necessary and feasible to re-identify LMW-GS genes in previously published genomes of wheat and related species using LMWgsFinder.

Comparative sequence analysis of LMW-GS genes in 14 hexaploid wheat cultivars
With rapid advances in DNA sequencing and large-genome assembly, together with falling sequencing costs, international efforts have produced genome assemblies for nearly 30 wheat varieties (Walkowiak et al., 2020; Sato et al., 2021; Shi et al., 2022; Jia et al., 2023). These include the 10+ Wheat Genomes Project (https://10wheatgenomes.com/) (Walkowiak et al., 2020), the sequencing of Triticum spelta, and the genomes of notable cultivars such as Aikang58 (Jia et al., 2023), Kenong9204 (Shi et al., 2022), and Fielder (Sato et al., 2021). To evaluate the performance of LMWgsFinder across wheat cultivars, the tool was used to analyze 14 chromosome-level hexaploid wheat genome assemblies.
Systematic identification and comparative sequence analysis of LMW-GS genes yielded the following results. A total of 234 LMW-GS genes were identified across the 14 wheat cultivars: 38 gap-containing genes (Gap; CDSs containing unresolved nucleotides [N]), 75 pseudogenes (Pse; containing premature stop codons), 4 incomplete genes (Inc; truncated coding regions without stop codons), and 117 normal genes (Nor; intact coding regions with functional protein-coding capacity) (Table 1). Structurally normal genes formed the largest class, accounting for 50.00% (117/234), while pseudogenes with one or more premature stop codons accounted for 32.05% (75/234). Notably, 16.24% (38/234) of the LMW-GS genes contained gaps; in the TaJagg genome, for instance, 8 LMW-GS genes contained gaps, more than half of the genes identified in that genome. In addition, the cultivars TaAK58 and TaKN9204 carry the 1BL/1RS translocation and therefore lack the 1BS chromosome arm; consequently, no LMW-GS genes were detected in their B subgenomes (Table 1). Overall, the analysis of the 14 cultivars revealed the following patterns:
(1) Distribution by subgenome: the A and B subgenomes contain more gap-containing LMW-GS genes, while the D subgenome contains the most structurally normal LMW-GS genes (Table 2).
(2) Functionally expressed genes: six LMW-GS genes, A5, B3, D2, D6, D7, and D9, were structurally normal and capable of expression in all wheat genomes analyzed (Table 1, Figs. 3 and 5).
(3) Pseudogenes: five LMW-GS genes were pseudogenes (either not expressed or expressed at low levels): A4 (except in TaKari, where it is an incomplete gene), B1 (except TaJuli_B1, which contains a gap), B5 (except TaJagg_B5, which contains a gap), D4, and D5 (Table 1 and Fig. 3).
(4) Special case of the A3 gene: see the section "Rare structural variations in the CDS of LMW-GS genes".
(5) New genes: four new types of LMW-GS genes (A2.A1, A2.A6, A5.B4, and D5.B4) were identified across the 14 cultivars, comprising 12 new LMW-GS genes in total. Two of these (TaJagg_A5.B4 and TaKari_D5.B4) are structurally normal. qRT-PCR confirmed that D5.B4 is expressed normally in wheat varieties such as Shanyou 225 (SY225), Zhengmai 9023 (ZM9023), Fielder, and Shijiazhuang 8 (SJZ8) (Figs. 5C, F, I, and M), and that A5.B4 is expressed normally in SY225 and SJZ8 (Figs. 5C and M). These findings suggest that homologous LMW-GS genes within or between wheat subgenomes may form new, functional LMW-GS genes through mechanisms such as homologous recombination, gene duplication, and gene conversion (Wang et al., 2011; Qin et al., 2015; Hu et al., 2020). Further research is needed to elucidate the underlying molecular mechanisms.

Notably, among the 17 D subgenomes examined, only the TaMace genome contains both D1 and D10. These two genes share very high DNA and protein sequence similarity, differing by only 8 single nucleotide polymorphisms (SNPs) (Fig. S18A) and 7 amino acids (one of the 8 SNPs is synonymous). This indicates that they are likely different alleles of the same locus in different wheat varieties (Table S1). Detecting two alleles of the same locus within one genome suggests that the TaMace plant used for genome sequencing may have been heterozygous at the D1/D10 locus. A similar phenomenon was observed in the expression analysis of LMW-GS genes (Figs. 5A-I, L-M).
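The four categories used in Table 1 can be expressed as simple rules on a CDS sequence. The Python sketch below is an illustrative classifier, not the LMWgsFinder code: a CDS containing N is Gap, an in-frame stop codon before the last codon makes it Pse, a CDS that does not end in a stop codon is Inc, and anything else is Nor. Checking Gap first is an assumption, chosen to match how exceptions such as TaJuli_B1 are reported under Gap. The hypothetical `summarize` helper recomputes the percentages from the Table 1 counts.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
CATEGORIES = ("Gap", "Pse", "Inc", "Nor")


def classify_cds(cds: str) -> str:
    """Assign a CDS to Gap, Pse, Inc or Nor (rule precedence is an assumption)."""
    cds = cds.upper()
    if "N" in cds:
        return "Gap"  # unresolved nucleotides in the assembly
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "Pse"  # premature in-frame stop codon
    if len(cds) % 3 or not codons or codons[-1] not in STOP_CODONS:
        return "Inc"  # coding region ends without a terminal stop codon
    return "Nor"


def summarize(categories) -> dict:
    """Count each category and its percentage of the total (2 decimals)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: (counts[c], round(100 * counts[c] / total, 2)) for c in CATEGORIES}


if __name__ == "__main__":
    print(classify_cds("ATGGCTTAA"))  # Nor
    table1 = ["Gap"] * 38 + ["Pse"] * 75 + ["Inc"] * 4 + ["Nor"] * 117
    print(summarize(table1))
    # {'Gap': (38, 16.24), 'Pse': (75, 32.05), 'Inc': (4, 1.71), 'Nor': (117, 50.0)}
```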
In addition, our team previously completed PacBio sequencing of LMW-GS genes in 57 wheat varieties differing in gluten strength. This analysis indicated that the D1/D10 and D2/D9 loci may be heterozygous in several cultivars, including Xinmai 26 (XM26), Zhengmai 366 (ZM366), ZM9023, and SJZ8. Notably, higher-quality cultivars show a clear trend toward a higher probability of heterozygosity at these loci; experimental validation of this trend is ongoing.

The duplication of LMW-GS genes in hexaploid wheat genomes
Gene duplication is common in biological evolution and is a significant source of diversity in biological traits (Sanchez-Munoz, 2024; Thomas et al., 2024). In this study, we identified six LMW-GS genes (A3, B2, B5, B6, D6, and D8) that may have undergone duplication, possibly mediated by EnSpm-like transposons, in seven hexaploid wheat genomes (TaAK58, TaFiel, TaJagg, TaKari, TaLanc, TaNori, and Ts) (Figs. 1B, C, and D; Tables 1 and S3). For instance, TaFiel_B2-I and TaFiel_B2-II lie approximately 36.2 kb apart on chromosome 1B (Table S3), and their coding sequences share 99.9% similarity, differing by a single SNP. Similar cases include TaKari_B6-I vs. TaKari_B6-II (~20.7 kb apart, 99.3%), TaKari_A3-II vs. TaKari_A3-III (~38.5 kb, 100%), and TaLanc_B2-I vs. TaLanc_B2-II (~13.0 kb, 99.8%) (Table S3). Using the A3, B2, and B5 genes of the CS genome as examples, alignment with the BioNano genome map confirmed that these genes were correctly assembled (Fig. S19). Alignment of the B2-containing region of the Fielder genome with the CS BioNano map also roughly indicates that B2 has been duplicated in this genome (Fig. S19B). Dot plot analyses further revealed that the region harboring B2 in the TaFiel genome has undergone segmental duplication (Fig. S20).
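Statements such as "99.9% similarity, differing by a single SNP" or "8 SNPs but 7 amino acid differences" come from comparing aligned CDSs codon by codon. The Python sketch below assumes gap-free, equal-length, in-frame CDSs and uses the standard genetic code; its percent identity is simply matching positions over length, which may differ from the similarity scores reported by Lasergene. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def compare_cds(a: str, b: str):
    """Compare two aligned, gap-free, equal-length CDSs.

    Returns (snps, amino_acid_differences, percent_identity).
    A codon with one or more SNPs counts as one amino acid difference
    only if its translation changes (non-synonymous).
    """
    a, b = a.upper(), b.upper()
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("expects aligned CDSs of equal length (multiple of 3)")
    snps = sum(x != y for x, y in zip(a, b))
    aa_diffs = sum(
        CODON_TABLE[a[i:i + 3]] != CODON_TABLE[b[i:i + 3]]
        for i in range(0, len(a), 3)
    )
    identity = 100 * (len(a) - snps) / len(a)
    return snps, aa_diffs, round(identity, 1)


if __name__ == "__main__":
    print(compare_cds("ATGGCTTAA", "ATGGCCTAA"))  # (1, 0, 88.9): synonymous SNP
    print(compare_cds("ATGGCTTAA", "ATGGATTAA"))  # (1, 1, 88.9): Ala -> Asp
```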
In addition, in the genome assemblies of TaAK58, TaLanc, and TaNori, we identified four LMW-GS genes on contigs or scaffolds that were not anchored to chromosomes but whose coding sequences are identical to those of the corresponding chromosome-anchored genes: TaAK58_D8-I, TaLanc_D6-I, TaNori_B6-I, and TaNori_B5-I (Table S3). Their failure to be placed on chromosomes may be related to the complex sequence composition of these genomic regions or to heterozygosity of the corresponding DNA segments in the sequenced materials (Huo et al., 2018). Nevertheless, these genes are genuinely present in the sequenced genomes. In five materials (TaAK58, TaArin, TaKN9204, TaMatt, and TaNori), the A6 gene shows a distinctive pattern. Whether or not their coding regions contain gaps, A6-I and A6-II are separated by a nearly constant distance of approximately 71.1 kb on the assembled chromosomes. In the two materials whose A6 copies are gap-free (TaAK58 and TaKN9204), the coding-sequence similarity between the two copies is also comparable: 84.9% between TaAK58_A6-I and TaAK58_A6-II, and 84.4% between TaKN9204_A6-I and TaKN9204_A6-II (Table S3 and Fig. S6). Notably, TaKN9204_A6-II is a structurally normal gene with expression potential, whereas TaKN9204_A6-I is a pseudogene. In xy54, A3-2xy54 (structurally normal) and A3-3xy54 (pseudogene) share 98.5% sequence similarity, and their Begin_90bp and End_90bp sequences are identical; both are numbered A6 in this study. Therefore, the A6-I and A6-II genes found at corresponding positions on the same chromosome in these materials should be regarded as two distinct LMW-GS genes. The structurally normal TaKN9204_A6-II is a direct ortholog of A3-2xy54 (TaKN9204_A6-II vs. A3-2xy54, 100%) (Fig. S8B), while the structurally abnormal TaKN9204_A6-I is a direct ortholog of A3-3xy54 (TaKN9204_A6-I vs. A3-3xy54, 85.6%) (Fig. S8A). Similarly, in TaJagg, TaKari, TaNori, and Ts, although about half of the B6 copies contain gaps in their coding regions, the two B6 copies are separated by a similar distance of approximately 20.7 kb in all four genomes (Table S3). In TaKari, both B6 copies (TaKari_B6-I vs. TaKari_B6-II, 99.3%) are normal, gap-free LMW-GS genes. Based on the Ks value of this gene pair (0.01169), the B6 duplication is estimated to have occurred around 0.899 MYA. To roughly date the B6 duplication in the Ts genome, the gap positions in the B6-I coding region, together with the corresponding positions in B6-II, were removed (Fig. S9). The resulting Ks value for Ts_B6-I vs. Ts_B6-II is approximately 0.00466, corresponding to a divergence time of ~0.358 MYA. The B2 duplications detected in the TaFiel and TaLanc genomes date to around 0.325 MYA, with Ks values of ~0.00422 for both TaFiel_B2-I vs. TaFiel_B2-II and TaLanc_B2-I vs. TaLanc_B2-II. The Ts_B6 and B2 duplications postdate the divergence of common wheat and Td (approximately 0.432 MYA) (Chen et al., 2013), indicating that they arose independently in hexaploid wheat after its divergence from tetraploid wheat. This also supports the view that Triticum spelta is derived from common wheat rather than being its ancestor, in agreement with previous studies (Blatter et al., 2004; Wang et al., 2024). Furthermore, A3-II and A3-III in TaKari, as well as D6-I and D6-II in TaLanc, have identical coding sequences (Table S3), suggesting that they are very recent duplications.
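The divergence times quoted above match the widely used conversion T = Ks / (2λ) with λ = 6.5 × 10⁻⁹ synonymous substitutions per site per year (for example, 0.01169 / (2 × 6.5 × 10⁻⁹) ≈ 0.899 million years). The rate is not stated in this section, so treat it as an assumption inferred from the reported numbers. The sketch below reproduces that arithmetic and also shows one way to mask gap positions before computing Ks: it drops whole codons containing N or '-' in either copy so the reading frame is preserved, which may differ from the exact trimming shown in Fig. S9. Ks itself would come from a dedicated tool (e.g., KaKs_Calculator or PAML codeml); the helper names here are hypothetical.

```python
# Assumed synonymous substitution rate (substitutions/site/year); inferred, not stated in the text.
SUBSTITUTION_RATE = 6.5e-9


def divergence_time_mya(ks: float, rate: float = SUBSTITUTION_RATE) -> float:
    """Convert a Ks value to a divergence time in million years: T = Ks / (2 * rate)."""
    return ks / (2 * rate) / 1e6


def mask_gap_codons(seq_a: str, seq_b: str, gap_chars: str = "N-") -> tuple:
    """Drop every codon that contains a gap character in either aligned copy,
    keeping the remaining codons in frame for Ks estimation (hypothetical helper)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("expected aligned, in-frame sequences of equal length")
    kept_a, kept_b = [], []
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if any(ch in gap_chars for ch in codon_a + codon_b):
            continue  # skip the codon in both copies
        kept_a.append(codon_a)
        kept_b.append(codon_b)
    return "".join(kept_a), "".join(kept_b)


for pair, ks in [("TaKari_B6-I vs. TaKari_B6-II", 0.01169),
                 ("Ts_B6-I vs. Ts_B6-II", 0.00466),
                 ("TaFiel_B2-I vs. TaFiel_B2-II", 0.00422)]:
    print(f"{pair}: {divergence_time_mya(ks):.3f} MYA")
# TaKari_B6-I vs. TaKari_B6-II: 0.899 MYA
# Ts_B6-I vs. Ts_B6-II: 0.358 MYA
# TaFiel_B2-I vs. TaFiel_B2-II: 0.325 MYA

# Toy masking example: the middle codon contains Ns in copy I and is dropped from both.
print(mask_gap_codons("ATGNNNCAA", "ATGCAGCAA"))  # ('ATGCAA', 'ATGCAA')
```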
These identical copies further suggest that LMW-GS gene duplication in the hexaploid wheat genome has continued throughout the evolution and propagation of common wheat. In this study, three copies of the A3 gene were found in TaKari. Besides TaKari_A3-I, which is a pseudogene, there are two structurally normal A3 copies (TaKari_A3-II and TaKari_A3-III) located approximately 38.5 kb apart with 100% similarity (Table S3, Figs. S7 and S13). In contrast, the homologous A3 gene in CS is a pseudogene, whereas both tetraploid wheat materials used in this study have normal A3 coding regions (Table 1). This suggests that A3 underwent multiple duplications in the TaKari genome, after which one copy (A3-I) may have been inactivated into a pseudogene. Alternatively, the pattern could result from gene conversion from a donor material carrying a pseudogene, or from an assembly error at the A3 locus. Similarly, multiple duplications of the B6 gene were identified in the TaNori genome (Table 1). Using LMWgsFinder, we detected multiple copies of the same LMW-GS gene in different wheat varieties sequenced and assembled by different research groups, suggesting that repeated duplication of the same LMW-GS gene may be common in wheat genomes. To our knowledge, this has not been reported before. This is likely due partly to the small number of wheat genomes sequenced in the past, and partly to the complexity of this gene family and the lack of an effective identification method.
Rare structural variations in the CDS of LMW-GS genes
Several rare structural variations associated with the loss or gain of LMW-GS gene function were discovered in this study. Among the 16 hexaploid wheat genomes, the A3 gene is variable.
It is structurally normal in the TaAK58 genome (Table 1 and Figs. 4A-C), and the latter two copies of the A3 triplication in the TaKari genome (Fig. S13A, Tables 1 and S3) are also structurally normal. In most other hexaploid wheat genomes, however, A3 is identified as a pseudogene, indicating that it has lost function in most hexaploid wheat varieties while retaining normal structure and expression in a few (Table 1 and Fig. 5J). In a previous study, the author found, by mining the Wheat Union Database (Wang et al., 2020) (http://wheat.cau.edu.cn/WheatUnion/), that only two of 145 materials labeled with “MP” numbers, MP39 (Zijiehong) and MP116 (Chuanmai42), had normal B5 coding regions; in the other 308 materials, B5 was always a pseudogene, giving a functional gene frequency of only 0.44% (2/453). In this study, we not only confirmed experimentally that TaAK58_A3 is a structurally and functionally normal LMW-GS gene (Figs. 4A-C and 5J) but also validated the structure and expression of the previously identified B5 gene. As expected, B5 is a functional LMW-GS gene in only a few materials (Figs. 4D-G and 5K) and is a pseudogene in most other wheat genomes (Huo et al., 2018). Moreover, B5 is highly expressed in both MP39 and MP116, suggesting that it may have good application potential.
Comparative sequence analysis of LMW-GS genes in four wheat-related species
LMWgsFinder is also effective in predicting LMW-GS genes in the genomes of diploid and tetraploid wheat relatives. Dong et al. (2016) conducted a detailed study of the LMW-GS region in the Aet genome using BAC library screening, the Roche/454 Titanium platform (GS FLX+), and BioNano optical genome maps.
They identified five LMW-GS genes in this genome, all with normal coding regions. In this study (Tables 1 and S5), LMWgsFinder also annotated five LMW-GS genes in the Aet genome, four of which (D9, D3, D6, and D7.D11) agree with Dong et al.’s results. The fifth, D10, is highly similar (99.81%) to AetG53-LMW1 reported by Dong et al. (2016); the difference is a 2-bp insertion (AG) after position 295 of Aet_1DgD10p5071820a1067bp, which shifts the reading frame and introduces a premature stop codon, making this copy a pseudogene (Figs. S10A-B). The CDS comparison shows that the LMW-GS annotation of Aet_v4.0 is incomplete, because D10 was not annotated (Table S5). However, RNA-seq data show that the D10 gene (AET1Gv20018400) is expressed in the Aet genome (Fig. S10C). These results indicate that LMWgsFinder can effectively annotate LMW-GS genes in the Aet genome, whether or not their gene structure is normal. In the Tu genome, LMWgsFinder identified two structurally normal, duplicated LMW-GS genes (A2.A1 and A2.A6) (Table S3), and both pairs (Tu_A2.A1-I vs. Tu_A2.A1-II and Tu_A2.A6-I vs. Tu_A2.A6-II) showed 100% sequence similarity. Notably, Tu_A2.A1-I was not placed on a chromosome, whereas the two copies of A2.A6 are located on chromosome 1A (Fig. S13A), approximately 1.4 Mb apart. In the tetraploid TrTu genome, LMWgsFinder identified only five LMW-GS genes. In contrast, the tetraploid Td genome contained eight, most of which were newly formed genes (62.5%, 5/8) (Table 1). These observations suggest that gene duplication and the formation of new LMW-GS genes are ongoing in both hexaploid and tetraploid wheat genomes, highlighting the dynamic evolution of LMW-GS genes across wheat species and ploidy levels (Zhang et al., 2023).
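The D10 case above rests on a simple structural check: a 2-bp insertion shifts the reading frame, so an in-frame stop codon appears before the end of the CDS. A minimal sketch of that check follows. It scans only the given strand from the first base, uses a short hypothetical sequence in place of the real 1,067-bp gene, and the function names are illustrative rather than LMWgsFinder's own.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds: str):
    """Return the 0-based start of the first in-frame stop codon, or None."""
    seq = cds.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon of the CDS."""
    pos = first_in_frame_stop(cds)
    return pos is not None and pos + 3 < len(cds)


def insert_bases(cds: str, after: int, bases: str) -> str:
    """Insert `bases` so that they follow the first `after` bases of `cds`."""
    return cds[:after] + bases + cds[after:]


intact = "ATGCAGCAAGTAGCCTAA"           # ATG CAG CAA GTA GCC TAA: stop only at the end
mutant = insert_bases(intact, 6, "AG")  # 2-bp insertion shifts the downstream frame
print(has_premature_stop(intact), has_premature_stop(mutant))  # False True
print(first_in_frame_stop(mutant))                             # 12 (a TAG codon)
```

In principle, the same check applied to a reference CDS after `insert_bases(reference_cds, 295, "AG")` would flag the frameshifted copy described above.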
In this study, LMWgsFinder identified a total of 23 LMW-GS genes in the diploid (Aet, Tu) and tetraploid (TrTu, Td) wheat relatives, of which 60.87% (14/23) were structurally normal (Table 1). This suggests that, during the evolution from diploid and tetraploid ancestors to hexaploid wheat, some redundant LMW-GS copies became pseudogenes (Akhunov et al., 2013; Desjardins et al., 2020). Accordingly, the proportion of structurally normal LMW-GS genes decreased from 60.87% in the diploid and tetraploid relatives to 50.00% (117/234) in common wheat, while the proportion of pseudogenes increased from 21.74% (5/23) to 32.05% (75/234). The proportion of gap-containing LMW-GS genes in the four relatives (17.39%, 4/23) is similar to that in the 14 wheat varieties (16.24%, 38/234) (Table 1), indicating that gap-containing LMW-GS genes occur at comparable rates in wheat relatives and common wheat. Of the 23 LMW-GS genes identified in the relatives, 11 (47.83%) were new genes whose Begin_90bp and End_90bp sequences correspond to different known genes (Table 1). In contrast, LMWgsFinder identified only 12 such new genes (5.13%, 12/234) in the 14 common wheat genomes. This marked difference suggests that LMW-GS genes diverged considerably during the evolution of hexaploid common wheat from its wild diploid and tetraploid ancestors through hybridization and chromosome doubling (Li et al., 2010). As a result, the 5’ and 3’ end sequences that are conserved among common wheat LMW-GS genes are less conserved in its diploid and tetraploid relatives, which is reflected in the higher proportion of new genes with mismatched Begin_90bp and End_90bp sequences in the relatives.
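The percentages in this paragraph can be recomputed directly from the counts quoted in the text. The short sketch below does only that arithmetic (no statistical test is implied); the category labels are paraphrases, not column names from Table 1.

```python
# Counts quoted in the text: (relatives: k, n), (common wheat: k, n)
COUNTS = {
    "structurally normal": ((14, 23), (117, 234)),
    "pseudogene": ((5, 23), (75, 234)),
    "gap-containing": ((4, 23), (38, 234)),
    "new (Begin/End from different genes)": ((11, 23), (12, 234)),
}


def percent(k: int, n: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * k / n, 2)


print(f"{'category':<38}{'relatives':>12}{'wheat':>10}")
for category, ((k_rel, n_rel), (k_wht, n_wht)) in COUNTS.items():
    print(f"{category:<38}{percent(k_rel, n_rel):>11.2f}%{percent(k_wht, n_wht):>9.2f}%")
```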
Because chromosome-level assemblies are available for only a few diploid and tetraploid wheat species, and some LMW-GS loci are incompletely assembled, these findings have some limitations. Nevertheless, we consider the overall trends reliable.
Types, cysteine, and disulfide bonds of LMW-GS
A total of 149 LMW-GS genes with normal open reading frames (ORFs) were identified across 20 genomes of wheat and its relatives. Their encoded proteins range from 285 to 393 amino acids (Table S7). Their amino acid composition and structural features agree with previous studies, showing the typical Sig-(N-ter)-Rep-C-ter structure (Renato and Stefania, 2004; Dong et al., 2010; Ren et al., 2022) (Fig. S11). Across the 20 genomes, m-type LMW-GS were the most frequent (107), followed by s-type (33) and i-type (9). All three types are highly conserved at both the N- and C-termini. The N-terminus is characterized by motif2, while the C-terminus comprises five motifs: motif4, motif3, motif5, motif8, and motif1 (Fig. S12). Each type contains 8 cysteines. In the i-type, all 8 cysteines are located in the C-ter region, whereas in the m- and s-types, 7 of the 8 are in the C-ter region. The position of the eighth cysteine is the most conserved across all three types, followed by that of the seventh (Table S7). Previous research indicates that the third and seventh cysteines of i-type LMW-GS can form intermolecular disulfide bonds, whereas in the s- and m-types the first and seventh cysteines are involved in intermolecular disulfide bonding (Du et al., 2020) (Fig. S11). The remaining cysteines typically form intramolecular disulfide bonds.
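The cysteine bookkeeping above (eight cysteines per protein; all eight in the C-ter region for the i-type, seven for the m- and s-types) can be expressed as a small helper. This sketch assumes the C-ter start position is already known from motif annotation (as in Fig. S12) and uses a short made-up peptide; it does not classify LMW-GS types or predict disulfide pairing.

```python
def cysteine_positions(protein: str) -> list:
    """1-based positions of all cysteine residues."""
    return [i for i, aa in enumerate(protein.upper(), start=1) if aa == "C"]


def cysteines_by_region(protein: str, cter_start: int) -> dict:
    """Split cysteine positions into those before and within the C-ter region,
    given the (assumed, externally annotated) 1-based start of C-ter."""
    positions = cysteine_positions(protein)
    return {
        "total": len(positions),
        "before_cter": [p for p in positions if p < cter_start],
        "in_cter": [p for p in positions if p >= cter_start],
    }


# Hypothetical toy peptide, with the C-ter region assumed to start at residue 9.
print(cysteines_by_region("MAQCPQQQGCSSCLLC", cter_start=9))
# {'total': 4, 'before_cter': [4], 'in_cter': [10, 13, 16]}
```

For a real m- or s-type protein, the pattern described in the text would appear as a total of 8 with 7 positions in `in_cter`, and the first cysteine position is simply the first element of `cysteine_positions`.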
The position of the first cysteine is relatively stable in s-type LMW-GS (residue 66), whereas in m-type LMW-GS it can occupy one of three positions: residue 25, 46, or 65 (Table S7). In amino acid composition, the m- and s-types are more similar to each other, especially in the C-ter region (Huang and Cloutier, 2008). However, both the N-ter and C-ter regions of the m-type are more diverse than those of the s-type, and the m-type can be further divided into subtypes. This greater diversity may be related to the ongoing expansion of m-type LMW-GS genes.
Phylogenetic analysis of LMW-GS genes
To explore the phylogenetic relationships of LMW-GS genes, we constructed a phylogenetic tree of 160 protein sequences (149 identified in this study and 11 from previous research on the xy54 genome) (Fig. 6). The analysis revealed a marked expansion of m-type LMW-GS genes. I-type genes occur only in the A subgenome (primarily A2), and s-type genes are restricted to the B and D subgenomes (mainly B2, B3, B6, D1/D10, etc.), whereas m-type genes are present in all three subgenomes (A, B, and D), including A3/A5, B4, D2/D9, D3, D6, D7/D11, and D8. This is consistent with the earlier results of Zhang et al. (2013). The D subgenome contributes the most m-type genes, 75.70% (81/107) (Fig. S13), suggesting that the expansion of m-type genes is closely associated with the rapid evolution of the Glu3/Gli1 loci in the D subgenome (Huo et al., 2018). Compared with the homologous Glu3/Gli1 region of Aegilops tauschii, the D-genome donor of wheat, the wheat D subgenome has gained three additional genes (D4, D5, and D8) in this region within only about 8,000 years since the emergence of common wheat; D8 is an m-type LMW-GS gene (Figs. S13C and S16).
The Glu3/Gli1 homologous region of wheat contains not only multiple copies of LMW-GS and gliadin (γ, δ, ω) genes but also multi-copy resistance genes such as Lr21 and Pm3 (Huo et al., 2018). These multi-copy genes facilitate DNA recombination and gene conversion in this region, further accelerating its evolution (Li et al., 2008). The rapid evolution of the Glu3/Gli1 homologous region is also likely related to its location near the chromosome end, where recombination is frequent, although the underlying causes require further research (Choulet et al., 2010; Akhunov et al., 2013; Dong et al., 2016).
Origin and evolutionary analysis of novel LMW-GS genes
To investigate the origin and evolutionary relationships of the newly identified A2.A1 genes, which occur frequently in this study, we constructed a phylogenetic tree of 23 LMW-GS gene sequences (9 selected from NCBI-BLASTN results and the rest identified in this study). The results (Fig. S14A) showed that five A2.A1 genes from hexaploid wheat (TaAK58_A2.A1, TaArin_A2.A1, TaMatt_A2.A1, TaNori_A2.A1, and TaKN9204_A2.A1) are closely related to the Tu-DQ857249.1 gene of Triticum urartu (U7), whereas four A2.A1 genes (TaStan_A2.A1, TrTu_A2.A1, Tu_A2.A1-II, and Tu_A2.A1-I) are closely related to the Tu-KM085254.1 or Tu-KM085273.1 genes of Triticum urartu accessions PI428255 or PI428200. Thus, the nine A2.A1 genes identified in this study fall into two groups with different evolutionary origins. The first five form a haplotype with the A7, A6-I, and A6-II genes, and the last four form a haplotype with the A2 gene (Fig. S13). The evolutionary relationships of the A2.A6 genes are complex.
Phylogenetic analysis of 29 related sequences (6 A2.A6 genes, 1 A6.A2 gene, 4 A2 genes, 5 A6-I and 5 A6-II genes, and 8 LMW-GS genes selected from NCBI-BLASTN results) (Fig. S14B) showed that four A2.A6 genes (Td_A2.A6, TaFiel_A2.A6, Tu_A2.A6-II, and Tu_A2.A6-I) are closely related to A2 of the Tu genome (Tu_A2), whereas the other two (Ts_A2.A6 and TaLand_A2.A6) are relatively closely related to A6-II genes (e.g., TaArin_A6-II, TaMatt_A6-II, and TaNori_A6-II). Td_A6.A2 is closely related to TrTu_A2 and TaStan_A2, as well as to the A6-I genes of several genomes (TaArin, TaNori, TaMatt, etc.), suggesting that Td_A6.A2 shares a common evolutionary origin with A6-I and A2 in these materials. However, because some of these genes contain gaps, the inferred relationships may be inaccurate and should be regarded as tentative. A new LMW-GS gene, named D5.B4, was identified in the B subgenomes of three materials (Td, TaKari, and TaJagg) (Table 1 and Fig. S13B). NCBI-BLASTN searches showed that a sequence annotated as LMW-GS in the Triticum dicoccoides genome (XM_037590130.1) is highly similar to the D5.B4 genes of these three genomes, with similarities of 100%, 99.73%, and 99.74% and query coverages of 100%, 100%, and 90%, respectively. Phylogenetic analysis indicates that the D5.B4 genes from these three materials are closely related to LMW-GS genes in the B and D subgenomes of different wheat cultivars and their relatives (Fig. S14D). The B1B3B6.D2 gene on chromosome 1B of Td is a pseudogene (Td_B1B3B6.D2). NCBI-BLASTN results showed that it is highly similar (99.40%) to an LMW-GS gene (KT156624.1) in the Aegilops sharonensis genome, differing by only six SNPs (Table S9).
This gene is also highly similar (97.99% to 98.8%) to related LMW-GS genes in the Triticum monococcum and Triticum urartu genomes (KR024657.1, KM010188.1, KR024658.1, KM085241.1, KM085237.1, and KM085243.1) (Table S9). Surprisingly, its highest similarity to any LMW-GS gene from the Triticum dicoccoides genome is only 88.05% (XM_037590130.1) (Table S9). This discrepancy could reflect the relatively small number of LMW-GS genes cloned and sequenced from Triticum dicoccoides, and hence the few entries in the relevant databases. Alternatively, it may indicate that B1B3B6.D2 in the Td genome has a complex origin, possibly involving horizontal gene transfer or introgression from related species (Kumar et al., 2019; Xiang et al., 2019). In this study, a novel LMW-GS gene with a normal ORF (Td_B5.D5D8) was identified on chromosome 1B of Td. NCBI-BLASTN results showed that a GenBank sequence from the Triticum dicoccoides genome (XM_037591446.1) has 100% sequence identity and query coverage with Td_B5.D5D8, but it is annotated as a gamma-gliadin B-I-like gene. A local BLASTN search of XM_037591446.1 against the gliadin genes of the CS and Aet genomes returned no matches, and NCBI-BLASTN searches with this sequence returned, apart from itself, only LMW-GS genes. These observations cast doubt on the GenBank annotation of XM_037591446.1, so it was re-annotated as an LMW-GS gene for the evolutionary analysis in this study. NCBI-BLASTN results for Td_B5.D5D8 also showed high DNA sequence similarity to LMW-GS genes from several Aegilops species, including Aegilops sharonensis, Aegilops speltoides, Aegilops tauschii, Aegilops longissima, and Aegilops bicornis, as well as to the B subgenome of wheat.
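Several comparisons in this section rank NCBI-BLASTN hits by identity, query coverage, and bit score (for example, the top 12 or top 20 hits used for the phylogenies). A minimal sketch for BLAST+ tabular output (`-outfmt 6`, default 12 columns) follows: it keeps the best HSP per subject, ranks subjects by bit score, and estimates single-HSP query coverage. BLAST's own `qcovs` column merges all HSPs, so this per-HSP value is only an approximation; the hit IDs and values in `SAMPLE` are hypothetical.

```python
import csv
import io

# Default BLAST+ -outfmt 6 columns.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOATS = {"pident", "evalue", "bitscore"}
STRINGS = {"qseqid", "sseqid"}


def parse_outfmt6(text: str) -> list:
    """Parse tab-separated BLAST+ outfmt 6 text into a list of dicts."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = {}
        for key, value in zip(FIELDS, row):
            if key in FLOATS:
                hit[key] = float(value)
            elif key in STRINGS:
                hit[key] = value
            else:
                hit[key] = int(value)
        hits.append(hit)
    return hits


def top_subjects(hits: list, n: int) -> list:
    """Best-scoring HSP per subject, ranked by bit score (highest first)."""
    best = {}
    for hit in hits:
        current = best.get(hit["sseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["sseqid"]] = hit
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)[:n]


def hsp_query_coverage(hit: dict, query_length: int) -> float:
    """Percentage of the query covered by a single HSP."""
    return 100 * (abs(hit["qend"] - hit["qstart"]) + 1) / query_length


SAMPLE = (
    "query\thit_A\t99.00\t900\t9\t0\t1\t900\t11\t910\t0.0\t1600\n"
    "query\thit_B\t97.50\t900\t22\t1\t1\t900\t1\t901\t0.0\t1500\n"
    "query\thit_A\t95.00\t100\t5\t0\t801\t900\t5\t104\t1e-40\t150\n"
    "query\thit_C\t90.00\t850\t85\t0\t30\t879\t1\t850\t0.0\t1000\n"
)
for h in top_subjects(parse_outfmt6(SAMPLE), n=2):
    print(h["sseqid"], h["pident"], h["bitscore"], round(hsp_query_coverage(h, 900), 1))
# hit_A 99.0 1600.0 100.0
# hit_B 97.5 1500.0 100.0
```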
Phylogenetic analysis of the 12 NCBI-BLASTN hits with the highest bit scores (Fig. S14E) indicated that Td_B5.D5D8 is evolutionarily closest to LMW-GS genes in the Aegilops sharonensis and Aegilops speltoides genomes. It also showed that many orthologous genes in the wheat genome have become pseudogenes (Chen et al., 2008). In addition, a new functional LMW-GS gene (A5.B4) was identified in the TaJagg genome. Although its Begin_90bp and End_90bp match those of A5 (93.33% similarity, 6 mismatches) and B4 (88.89% similarity, 10 mismatches), respectively, it was mapped to chromosome 1D. To verify this identification and localization, phylogenetic analysis was performed using the top 20 sequences from the NCBI-BLASTN results (Table S8), together with the A5 and B4 genes identified here in various materials and the A3-1xy54 (A5) gene from xy54. The A5.B4 gene from TaJagg (Tajagg_1DgA5.B4p3104157s897bp) is more closely related to LMW-GS genes of the wheat D subgenome (Fig. S14C). Its sequence similarity to the corresponding GenBank LMW-GS genes is 99.89% for common wheat (JF736507.1) and 87.73% for Aegilops tauschii (JX828349.1), both with 100% query coverage (Table S8). This supports the conclusion that A5.B4 in the TaJagg genome is an LMW-GS gene on chromosome 1D. NCBI-BLASTN results also suggest that, among the LMW-GS genes cloned worldwide to date, A5.B4 is rare in wheat (Table S8). However, this gene may be more frequent in wheat genomes than this suggests but not yet cloned or studied. The qRT-PCR results (Figs. 5C and M) show that A5.B4 is expressed normally in some wheat varieties.
Therefore, further investigation is needed to assess the potential of this gene for practical application in production.
Discussion
Comparison between LMWgsFinder and other methods for LMW-GS gene identification
When no reference genome sequence is available, traditional methods mostly use conserved or gene-specific primers for PCR amplification, followed by capillary electrophoresis or Sanger sequencing, to identify LMW-GS genes in different wheat varieties and their relatives (Zhao et al., 2004; Zhang et al., 2011; Zhang et al., 2013). With PCR amplification followed by Sanger sequencing, the complex DNA sequence composition of LMW-GS genes and the high similarity of certain regions among them inevitably introduce base errors during PCR amplification and bacterial cloning. Especially when LMW-GS sequences must be compared across many wheat varieties, this approach is costly and time-consuming (Zhang et al., 2014). When second-generation high-throughput sequencing such as Illumina is used to study LMW-GS copy number, the relatively short reads (generally 150-250 bp) can cause different gene copies to be assembled into chimeric sequences because of their high sequence similarity (Hu et al., 2020). In contrast, PacBio sequencing, a third-generation technology, offers high throughput and long reads (for example, PacBio Sequel has an average read length of 8–12 kb) (Ren et al., 2022). It also allows multiple samples to be pooled using barcodes, significantly reducing sequencing costs. Furthermore, this technology does not show GC bias and avoids the errors introduced by PCR amplification and bacterial cloning, making it an excellent choice for the molecular identification of LMW-GS genes in wheat (Zhang et al., 2014; Ren et al., 2022).
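The argument about read length can be made concrete with a toy calculation: a read can be assigned to one of two near-identical copies only if it overlaps at least one site where the copies differ. The sketch below counts, for error-free reads placed at every possible start within a single locus, the fraction that cover such a site. It ignores paired-end information, sequencing errors, and flanking sequence, and the 900-bp locus with one SNP is hypothetical.

```python
def differing_sites(copy_a: str, copy_b: str) -> list:
    """0-based positions where two aligned, equal-length copies differ."""
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(copy_a, copy_b)) if x != y]


def informative_read_fraction(copy_a: str, copy_b: str, read_len: int) -> float:
    """Fraction of read start positions whose read spans at least one differing site."""
    sites = differing_sites(copy_a, copy_b)
    locus_len = len(copy_a)
    if read_len >= locus_len:  # a single read covers the whole locus
        return 1.0 if sites else 0.0
    starts = range(locus_len - read_len + 1)
    informative = sum(1 for s in starts
                      if any(s <= d < s + read_len for d in sites))
    return informative / len(starts)


copy_1 = "ACGT" * 225                       # hypothetical 900-bp copy
copy_2 = copy_1[:450] + "T" + copy_1[451:]  # identical except one SNP at position 450
for read_len in (150, 250, 10_000):
    print(read_len, round(informative_read_fraction(copy_1, copy_2, read_len), 3))
# 150 0.2
# 250 0.384
# 10000 1.0
```

Under these toy assumptions, most short reads carry no copy-specific site and are compatible with either copy, which is the situation in which an assembler can merge the copies; reads longer than the locus always span the distinguishing site.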
Where reference genomes have not been assembled at the chromosome level but resequencing resources are available (Guo et al., 2020; Hao et al., 2020; Zhou et al., 2020; Schulthess et al., 2022; Niu et al., 2023), previous researchers have used a k-mer-based pangenome analysis to identify wheat seed storage protein (SSP) genes and predict the associated phenotypes from resequencing data, with good results (Zhang et al., 2024). However, this method groups together all SSP genes whose sequence similarity exceeds 99% and selects the longest sequence from each group as the non-redundant representative for SSP gene identification. It may therefore overlook newly arisen duplicated genes with sequence similarity above 99%, such as TaKari_B6, TaFiel_B2, and TaLanc_B2 (Table S3). Our research group is currently adapting LMWgsFinder to resequencing data. We expect the improved software to identify newly arisen duplicated LMW-GS genes in recently resequenced wheat varieties and to recover the complete coding sequences of some LMW-GS genes. However, the results may include some gap-containing LMW-GS genes, whose number will depend mainly on the quantity (genome coverage) and quality (ratio of high-quality reads to total reads) of each variety's resequencing data. Given the sequence characteristics of LMW-GS genes, we expect such gaps to lie mainly in the central repetitive region of the genes, which creates favorable conditions for gap-filling. We believe that the improved LMWgsFinder will help wheat researchers make full use of the extensive wheat resequencing data to advance the identification and application of LMW-GS genes.
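The limitation described above follows directly from how an identity-threshold redundancy filter behaves. The sketch below is a generic greedy clustering, not the cited k-mer pipeline: sequences are assumed to be pre-aligned to equal length ('-' for alignment gaps), processed longest first, and assigned to the first cluster whose representative they match at 99% identity or more. With hypothetical sequences, two duplicate copies differing by a single SNP (99.9% identity, similar to the B2 pairs above) collapse into one representative, whereas a paralog at 98% identity remains separate.

```python
def aligned_identity(a: str, b: str) -> float:
    """Identity over alignment columns where at least one sequence has a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


def greedy_cluster(seqs: dict, threshold: float = 0.99) -> list:
    """Longest-first greedy clustering; returns (representative, members) pairs."""
    order = sorted(seqs, key=lambda name: len(seqs[name].replace("-", "")), reverse=True)
    clusters = []
    for name in order:
        for representative, members in clusters:
            if aligned_identity(seqs[representative], seqs[name]) >= threshold:
                members.append(name)  # absorbed: no longer reported separately
                break
        else:
            clusters.append((name, [name]))  # becomes a new representative
    return clusters


def substitute(seq: str, positions) -> str:
    """Introduce a substitution at each given position (toy mutation helper)."""
    chars = list(seq)
    for p in positions:
        chars[p] = "A" if chars[p] != "A" else "C"
    return "".join(chars)


base = "ACGT" * 250                                   # hypothetical 1,000-bp CDS
toy = {
    "dup_I": base,
    "dup_II": substitute(base, [500]),                # 1 SNP -> 99.9% identity
    "paralog": substitute(base, range(0, 1000, 50)),  # 20 SNPs -> 98.0% identity
}
for representative, members in greedy_cluster(toy):
    print(representative, members)
# dup_I ['dup_I', 'dup_II']
# paralog ['paralog']
```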
The improved tool should also make efficient use of the vast amount of available wheat resequencing data. Where chromosome-level reference genome assemblies are available (Walkowiak et al., 2020; Sato et al., 2021; Shi et al., 2022; Jia et al., 2023), using LMWgsFinder to identify LMW-GS genes in different wheat genomes is more comprehensive, faster, and more reliable than a direct BLASTN search with known LMW-GS gene sequences (such as the published CS and xy54 sequences used in this study). Statistics (Tables 1 and S10) show that, excluding the genes identified in TaCS, only 98 LMW-GS genes (38.13%, 98/257) were detected by BLASTN search against the annotated CDSs. This indicates that, apart from TaCS, the LMW-GS coding sequences in the other genomes are incompletely annotated by the respective 18 genome sequencing groups (Tables S10 and S12). This likely reflects both the differing priorities of these whole-genome sequencing groups and the intricate sequence composition and multi-copy nature of the LMW-GS gene family (Huo et al., 2018). Besides using BLASTN against the annotated CDSs of the target genome, an alternative is to use reference LMW-GS genes as queries in a BLASTN search against the whole genomic DNA sequence of the target species (Dong et al., 2010; Huo et al., 2018). This approach can identify LMW-GS genes that are highly similar to the reference sequences and contain no gaps (Table S14), but it performs poorly on gap-containing genes: when the LMW-GS CDSs of CS and xy54 were used as queries against whole-genome DNA sequences, only one of 27 gap-containing LMW-GS genes was accurately identified (3.7%, 1/27) (Table S14).
Moreover, this method often fails to capture the complete coding sequences of LMW-GS genes and requires extensive manual checking, which greatly reduces research efficiency (Table S14). As DNA sequencing costs continue to fall, more and more wheat varieties have been sequenced and assembled (Walkowiak et al., 2020; Sato et al., 2021; Shi et al., 2022; Jia et al., 2023). LMWgsFinder, which can rapidly, effectively, and precisely identify LMW-GS genes across the whole wheat genome, is therefore valuable for understanding the molecular basis of the superior end-use quality of certain wheat cultivars. It also provides a useful new tool for the genetic improvement and molecular design breeding of wheat quality. However, LMWgsFinder has inherent limitations. Its re-identification strategy depends on the sequence conservation of the first 90 bp (Begin_90bp) and last 90 bp (End_90bp) of the LMW-GS CDS. Consequently, if an LMW-GS gene in the target genome lacks either of these regions, LMWgsFinder cannot annotate it. For example, the B2 gene is present in the genome of the wheat variety Stanley but cannot be annotated properly because a gap falls in its start region (Fig. 1C). The tool is therefore ineffective for identifying incomplete LMW-GS genes. To address this issue, future versions could integrate alternative algorithms (e.g., homology-based gap filling) to improve robustness.
Species-specificity and functional gain/loss variation of LMW-GS genes
LMWgsFinder was also used to search for LMW-GS genes in the genomes of Brachypodium distachyon, barley, rice, Setaria italica, sorghum, and maize.
No LMW-GS genes were detected in these genomes, even with relaxed search criteria (identity, 60%; e-value, 1e-5). This is consistent with the fact that, among the major cereal crops, only common wheat can be used to make noodles, steamed buns, dumplings, bread, biscuits, and other gluten-based foods, because other cereals lack gluten proteins (Singh et al., 2024). In contrast, Wang et al. (2012), using proteomics and molecular genetics to study the grain glutenin proteins of Brachypodium distachyon L., reported 4–5 copies of LMW-GS genes in this plant. This suggests that Brachypodium LMW-GS genes may not share the known coding-sequence features of wheat LMW-GS genes: at least their Begin_90bp and End_90bp sequences may differ substantially (identity below 60%), or their coding sequences may exceed 2 kb. The specific reasons require further investigation (Wang et al., 2012). Microcollinearity analysis of the regions homologous to D5 in 12 representative grass species (Y. Chen et al., 2020) showed that, of the five wheat LMW-GS genes D3, D4, D5, D6, and D7, orthologs are found only in the Thinopyrum elongatum genome, and only a single one (Tel1E01G022700) (Fig. S17). Based on the materials examined in this study, we speculate that LMW-GS genes meeting our identification criteria are present only in a few Triticeae genera, such as Triticum, Aegilops, Psathyrostachys, Dasypyrum, and Elytrigia (Table S9, Figs. S16 and S17). The genomes of other Triticeae species may not contain LMW-GS genes with more than 60% similarity to the Begin_90bp and End_90bp sequences of the corresponding wheat LMW-GS genes. This is consistent with the results obtained by the author using LMWgsFinder on the whole-genome sequences of these representative species.
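To illustrate why the 90-bp end anchors, the 60% identity floor, and the 2-kb length limit matter, here is a deliberately simplified, hypothetical anchor-based detector. It is not LMWgsFinder's actual algorithm: it scans only the forward strand with ungapped windows, takes the best Begin hit, then the best End hit downstream, and reports the interval only if both reach the identity threshold and the span is at most 2 kb. Replacing the Begin region with Ns, as in the Stanley B2 case, makes detection fail. The anchors and genome are random placeholder sequences.

```python
import random


def window_identity(seq: str, start: int, anchor: str) -> float:
    """Ungapped identity between `anchor` and the window of `seq` starting at `start`."""
    window = seq[start:start + len(anchor)]
    return sum(a == b for a, b in zip(window, anchor)) / len(anchor)


def best_hit(seq: str, anchor: str, lo: int = 0):
    """Best-scoring ungapped window at or after `lo`; returns (position, identity)."""
    best_pos, best_id = None, -1.0
    for pos in range(lo, len(seq) - len(anchor) + 1):
        ident = window_identity(seq, pos, anchor)
        if ident > best_id:
            best_pos, best_id = pos, ident
    return best_pos, best_id


def find_candidate(genome: str, begin_anchor: str, end_anchor: str,
                   min_identity: float = 0.60, max_len: int = 2000):
    """Return (start, stop) of a candidate CDS bounded by the two anchors, or None."""
    b_pos, b_id = best_hit(genome, begin_anchor)
    if b_pos is None or b_id < min_identity:
        return None  # start region missing or too divergent (e.g., an assembly gap)
    e_pos, e_id = best_hit(genome, end_anchor, lo=b_pos + len(begin_anchor))
    if e_pos is None or e_id < min_identity:
        return None  # end region missing or too divergent
    start, stop = b_pos, e_pos + len(end_anchor)
    return (start, stop) if stop - start <= max_len else None


rng = random.Random(7)


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


begin, end = random_dna(90), random_dna(90)  # placeholder Begin_90bp / End_90bp
genome = random_dna(200) + begin + random_dna(600) + end + random_dna(200)
print(find_candidate(genome, begin, end))    # (200, 980)

gapped = genome[:200] + "N" * 90 + genome[290:]  # start region replaced by an assembly gap
print(find_candidate(gapped, begin, end))        # None
```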
To meet the growing demand for high-quality wheat, fundamental research on end-use quality-associated genes (e.g., LMW-GS genes) in wheat and its relatives is essential. Such research will advance our understanding of wheat quality traits and facilitate molecular breeding strategies, including marker-assisted selection and genomic design, to improve agronomic and processing properties (Ge et al., 2021). Accurate identification of the coding sequences of these genes is a prerequisite for functional studies using techniques such as gene editing and genetic transformation (Tyler et al., 2015; Qu et al., 2023). For the rare structural variations associated with loss or gain of LMW-GS gene function, such as the A3 and B5 genes identified in this study, we cannot rule out that misassembly of the Glu3/Gli1 homologous regions in some genomes led to inaccurate identification of LMW-GS genes. With the aid of technologies such as PacBio, Hi-C, Omni-C, and BioNano, the quality of recent wheat genome assemblies has improved markedly, with scaffold N50 values generally above 20 Mb (Shi et al., 2022; Jia et al., 2023). The probability of misplaced DNA fragments in recently completed assemblies is therefore relatively low (Keeble-Gagnere et al., 2018). In addition, previous studies have documented the formation of chimeric LMW-GS genes through illegitimate recombination between i-type and m-type genes or between s-type and m-type genes (Hu et al., 2020). Similar findings were obtained in this study: 21 new LMW-GS genes, each with Begin_90bp and End_90bp sequences derived from different LMW-GS genes, were identified across 20 genomes.
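The criterion used above for calling a gene chimeric, namely that its Begin_90bp and End_90bp sequences derive from different LMW-GS genes, can be illustrated with a short nearest-reference check. This is a hedged sketch rather than the authors' pipeline: it scores anchors by simple ungapped percent identity instead of a BLAST alignment, and the function names, reference names, and short toy sequences are hypothetical placeholders, not real 90-bp anchors.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity over the overlapping length of two sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / n


def best_reference(anchor: str, refs: dict[str, str]) -> str:
    """Name of the reference anchor most similar to `anchor` (first wins on ties)."""
    return max(refs, key=lambda name: percent_identity(anchor, refs[name]))


def classify_anchors(begin90: str, end90: str,
                     ref_begins: dict[str, str],
                     ref_ends: dict[str, str]) -> tuple[bool, str, str]:
    """Return (is_chimeric, best Begin_90bp source, best End_90bp source).

    A gene is flagged as chimeric when its N- and C-terminal anchors are
    best matched by different known LMW-GS genes.
    """
    b_src = best_reference(begin90, ref_begins)
    e_src = best_reference(end90, ref_ends)
    return b_src != e_src, b_src, e_src
```

For example, with toy Begin_90bp references `{"A3": "ATGAAAACC", "D5": "ATGCCCGGG"}` and End_90bp references `{"A3": "TTTAAATGA", "D5": "GGGCCCTGA"}`, a gene with Begin `ATGAAAACC` and End `GGGCCCTGA` is returned as `(True, "A3", "D5")`. In practice the similarity score would come from an alignment tool, and near-ties between closely related references would need a margin before a gene is called chimeric.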
Copy number variation of LMW-GS genes and evolution of the Glu3/Gli1 homologous region

Copy number variation (CNV) is common in living organisms and is closely related to evolutionary processes and environmental adaptation (Walkowiak et al., 2020). It is also a major contributor to phenotypic diversity and is thought to arise from chromosomal rearrangement, transposable element (TE) activity, or polyploidy (Bariah et al., 2020). CNV may have significant biological implications for species-specific genome composition, evolution, and phylogenetics, as well as for the expression and regulation of genes in specific genomic regions (Bariah et al., 2020). This study found substantial CNV of LMW-GS genes among the genomes of different wheat varieties and their close relatives (Table S11). Among the materials examined, the LMW-GS copy number ranged from 2 to 6 in the A and B subgenomes (excluding varieties carrying the 1BL/1RS translocation in the B subgenome) and from 6 to 9 in the D subgenome; across the whole genome, it ranged from 13 to 20 (Table S11). This result agrees with earlier estimates of 10–20 copies but differs markedly from estimates of 30–40 copies (Huang and Cloutier, 2008; Zhang et al., 2011; Zhang et al., 2013; Al-Khayri et al., 2023). The higher estimates may reflect gene duplications or triplications that could not be resolved in the wheat genome at the time, as well as possible heterozygous sites within the LMW-GS gene family. Conversely, PCR-based surveys can underestimate copy number. For example, Ikeda et al. identified 12 LMW-GS genes (3, 2, and 7 in the A, B, and D subgenomes, respectively) in Norin61 by analyzing PCR products amplified with LMW-GS gene-specific primers (Ikeda et al., 2002).
In this study, however, 20 LMW-GS gene copies were identified in the Norin61 genome, in which B5 and B6 underwent duplication and triplication, respectively. CNV of LMW-GS genes may be related to homologous recombination and to chromosomal rearrangements, such as inversions and translocations, following polyploid formation (Bariah et al., 2020). These evolutionary events are closely associated with gene duplication, triplication, and loss. The abundance of TEs in the homologous regions may also contribute, as TE-mediated ectopic recombination can lead to dramatic chromosomal rearrangements and gene copy number variation (Bonchev and Willi, 2018). In three of the 20 genomes analyzed, large segmental inversions/translocations occurred in the Glu3/Gli1 homologous regions of the D subgenome, spanning D1/D10 to D8 in TaLanc and TaMatt, and D10 to D5 in TaMace (Figs. S13 and S15). At least four genomes (TaJagg, TaLanc, TaMatt, and Tu) also exhibited large segmental inversions in the Glu3/Gli1 homologous regions of the A subgenome (Fig. S13). In addition, structural variations such as DNA fragment insertions or deletions were observed in some wheat genomes (Fig. S15). Microcollinearity analysis indicated that, compared with plants in the Ehrhartoideae (rice) and Panicoideae (sorghum, corn, etc.) subfamilies, the Glu3/Gli1 homologous region of Triticeae plants in the Pooideae subfamily has undergone rapid evolution (Gao et al., 2007; Dong et al., 2016; Huo et al., 2018) (Fig. S17). LMW-GS genes showed duplication and triplication in multiple wheat genomes: in TaNori and TaKari, the B6 and A3 genes, respectively, were triplicated (Tables 1 and S3).
In addition, seven LMW-GS genes (B2, B5, B6, D6, D8, A2.A1, and A2.A6) were found to be duplicated in eight genomes (Ts, TaAK58, TaFiel, TaJagg, TaLanc, TaNori, TaKari, and Tu) (Tables 1 and S3). Notably, the TaNori and TaKari genomes exhibited both gene triplication and gene duplication (TaNori: B6 triplicated, B5 duplicated; TaKari: A3 triplicated, B6 duplicated) (Tables 1 and S3). In the TaLanc and Tu genomes, two genes each were duplicated (B2 and D6 in TaLanc; A2.A1 and A2.A6 in Tu). Although five of the genes involved in these triplication and duplication events could not be localized to chromosomes (TaAK58_D8, TaLanc_D6, TaNori_B6, TaNori_B5, and Tu_A2.A1), these genes are nevertheless present in the assemblies. Their failure to be anchored to chromosomes may reflect the complex DNA composition of the Glu3/Gli1 homologous regions in the corresponding genomes, such as an abundance of TEs or locally high GC content (Gao et al., 2007; Huo et al., 2018). The D8 and B5 genes in the CS genome were initially not localized to their chromosomes, but with the aid of several technologies, including BioNano, they were accurately positioned (Zhu et al., 2021). Both genes lie far from the main LMW-GS gene clusters on their chromosomes (16,132.4 kb from B4 to B5 and 1,783.0 kb from D7 to D8) (Fig. S13). The clustering of LMW-GS gene duplication events in specific wheat lineages may be attributed to the following mechanisms: (1) Chromosomal structural variations: structural variations (e.g., tandem duplications, inversions) in the genomes of different hexaploid wheat cultivars may increase the probability of homologous recombination, promoting the spread of gene duplications within subpopulations. (2) Artificial selection: LMW-GS genes encode glutenin subunits critical for dough elasticity and processing quality.
In specific lineages, the clustering of duplications may be driven by artificial selection pressure (e.g., preferential retention of alleles linked to superior gluten traits during breeding), leading to the fixation and enrichment of certain duplicated variants. (3) Lineage-specific transposon dynamics: transposon activity may vary across wheat lineages. In some cultivated varieties, long-term domestication or breeding selection may weaken transposon silencing mechanisms, increasing duplication frequency at specific genomic regions (e.g., Glu-3 loci). To infer the mechanism of gene duplication, we compared the sequences of the 1-Mb homologous genomic regions harboring the chromosomally localized A3, B2, and B6 genes; the results suggest that EnSpm-like transposon activity may mediate LMW-GS gene duplication in wheat genomes (Fig. 1B, C, and D). These data support the view that sequence variations, including gene duplication, deletion, inversion, and translocation, may occur frequently and independently during the evolution of the Glu-3 homologous region in different wheat genomes (Walkowiak et al., 2020) (Table 1; Figs. S13 and S15).

Declarations

Funding

This work was funded by the National Natural Science Foundation of China (Grants 31571667 and U1204315) and was partially supported by the Key Scientific and Technological Projects in Henan Province (222102110376 and 242102111156).

Author contributions

SZ conceived this study. XS and TL participated in the qRT-PCR data collection and gene structural validation. SZ, XS, YW, DX, HG, YG, JZ, HS, and DL analyzed the data and prepared figures and/or tables. XS, TL, and YF prepared the mRNA and cDNA samples. SZ wrote the manuscript. HH, ZR, and YG reviewed drafts of the paper. All authors have read and approved the final manuscript.
Acknowledgments

The authors thank Professor Daowen Wang and Xiuying Kong for their careful review of the manuscript and valuable comments. We thank Professor Mingcheng Luo's group (Department of Plant Sciences, University of California, Davis, CA 95616, U.S.A.) for their help with aligning the BioNano genome maps to the sequences of the genomic regions harboring the LMW-GS genes on wheat chromosomes 1A and 1B. The authors also thank Professor Jizeng Jia, Lifeng Gao, Chenyang Hao, Guangyao Zhao, Lingli Dong, and Dongcheng Liu for their assistance in collecting wheat cultivars and for their professional advice on this article. We also appreciate all the people whose data were used in this study. Open access publishing costs were funded by the National Natural Science Foundation of China (Grants 31571667 and U1204315) and were also partially supported by the Key Scientific and Technological Projects in Henan Province (222102110376 and 242102111156).

Supporting Information

Additional Supporting Information may be found in the online version of this article. Supporting figures: Figures S1–S20. Supporting tables: Tables S1–S14. Text-based Supporting Information: Appendix S1.

Data availability statement

The original contributions presented in this study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.

Competing interests

The authors have declared that no competing interests exist.

References

Akhunov ED, Sehgal S, Liang H, Wang S, Akhunova AR, Kaur G et al (2013) Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat. Plant Physiol 161:252–265. 10.1104/pp.112.205161 Akpinar BA, Biyiklioglu S, Alptekin B, Havrankova M, Vrana J, Dolezel J et al (2018) Chromosome-based survey sequencing reveals the genome organization of wild wheat progenitor Triticum dicoccoides.
Plant Biotechnol J 16:2077–2087. 10.1111/pbi.12940 Al-Khayri JM, Alshegaihi RM, Mahgoub EI, Mansour E, Atallah OO, Sattar MN et al (2023) Association of High and Low Molecular Weight Glutenin Subunits with Gluten Strength in Tetraploid Durum Wheat (Triticum turgidum spp. Durum L). Plants (Basel) 12. 10.3390/plants12061416 An X, Zhang Q, Yan Y, Li Q, Zhang Y, Wang A et al (2006) Cloning and molecular characterization of three novel LMW-i glutenin subunit genes from cultivated einkorn (Triticum monococcum L). Theor Appl Genet 113:383–395. 10.1007/s00122-006-0299-x Bariah I, Keidar-Friedman D, Kashkush K (2020) Identification and characterization of large-scale genomic rearrangements during wheat evolution. PLoS ONE 15:e0231323. 10.1371/journal.pone.0231323 Blatter RH, Jacomet S, Schlumbaum A (2004) About the origin of European spelt (Triticum spelta L.): allelic differentiation of the HMW Glutenin B1-1 and A1-2 subunit genes. Theor Appl Genet 108:360–367. 10.1007/s00122-003-1441-7 Bonchev G, Willi Y (2018) Accumulation of transposable elements in selfing populations of Arabidopsis lyrata supports the ectopic recombination model of transposon evolution. New Phytol 219:767–778. 10.1111/nph.15201 Guzmán C, Ibba MI, Álvarez JB, Sissons M, Morris C (2022) Wheat quality. In: Reynolds MP, Braun HJ (eds) Wheat Improvement. Springer International Publishing, pp 177–193 Cavalet-Giorsa E, Gonzalez-Munoz A, Athiyannan N, Holden S, Salhi A, Gardener C et al (2024) Origin and evolution of the bread wheat D genome. Nature. 10.1038/s41586-024-07808-z Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y et al (2020) TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol Plant 13:1194–1202. 10.1016/j.molp.2020.06.009 Chen F, Zhao F, Xu C, Xia G (2008) Molecular characterization of LMW-GS genes from a somatic hybrid introgression line II-12 between Triticum aestivum and Agropyron elongatum in relation to quick evolution.
J Genet Genomics 35:743–749. 10.1016/S1673-8527(08)60230-1 Chen Q, Kang HY, Fan X, Wang Y, Sha LN, Zhang HQ et al (2013) Evolutionary history of Triticum petropavlovskyi Udacz. et Migusch. inferred from the sequences of the 3-phosphoglycerate kinase gene. PLoS ONE 8:e71139. 10.1371/journal.pone.0071139 Chen Y, Song W, Xie X, Wang Z, Guan P, Peng H et al (2020) A Collinearity-Incorporating Homology Inference Strategy for Connecting Emerging Assemblies in the Triticeae Tribe as a Pilot Practice in the Plant Pangenomic Era. Mol Plant 13:1694–1708. 10.1016/j.molp.2020.09.019 Choulet F, Wicker T, Rustenholz C, Paux E, Salse J, Leroy P et al (2010) Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell 22:1686–1701. 10.1105/tpc.110.074187 Clarke JL, Qiu Y, Schnable JC (2022) Experimental Design for Controlled Environment High-Throughput Plant Phenotyping. Methods Mol Biol 2539:57–68. 10.1007/978-1-0716-2537-8_7 Desjardins SD, Ogle DE, Ayoub MA, Heckmann S, Henderson IR, Edwards KJ et al (2020) MutS homologue 4 and MutS homologue 5 Maintain the Obligate Crossover in Wheat Despite Stepwise Gene Loss following Polyploidization. Plant Physiol 183:1545–1558. 10.1104/pp.20.00534 Dizlek H, Awika JM (2023) Determination of basic criteria that influence the functionality of gluten protein fractions and gluten complex on roll bread characteristics. Food Chem 404:134648. 10.1016/j.foodchem.2022.134648 Dong L, Huo N, Wang Y, Deal K, Wang D, Hu T et al (2016) Rapid evolutionary dynamics in a 2.8-Mb chromosomal region containing multiple prolamin and resistance gene families in Aegilops tauschii. Plant J 87:495–506. Dong L, Zhang X, Liu D, Fan H, Sun J, Zhang Z et al (2010) New insights into the organization, recombination, expression and functional mechanism of low molecular weight glutenin subunit genes in bread wheat.
PLoS ONE 5:e13548. Du X, Wei J, Luo X, Liu Z, Qian Y, Zhu B et al (2020) Low-molecular-weight glutenin subunit LMW-N13 improves dough quality of transgenic wheat. Food Chem 327:127048. 10.1016/j.foodchem.2020.127048 Feldman M, Levy AA (2012) Genome evolution due to allopolyploidization in wheat. Genetics 192:763–774. 10.1534/genetics.112.146316 Gao S, Gu YQ, Wu J, Coleman-Derr D, Huo N, Crossman C et al (2007) Rapid evolution and complex structural organization in genomic regions harboring multiple prolamin genes in the polyploid wheat genome. Plant Mol Biol 65:189–203. Ge W, Gao Y, Xu S, Ma X, Wang H, Kong L et al (2021) Genome-wide identification, characteristics and expression of the prolamin genes in Thinopyrum elongatum. BMC Genomics 22:864. 10.1186/s12864-021-08088-x Guo W, Xin M, Wang Z, Yao Y, Hu Z, Song W et al (2020) Origin and adaptation to high altitude of Tibetan semi-wild wheat. Nat Commun 11:5085. 10.1038/s41467-020-18738-5 Hao C, Jiao C, Hou J, Li T, Liu H, Wang Y et al (2020) Resequencing of 145 Landmark Cultivars Reveals Asymmetric Sub-genome Selection and Strong Founder Genotype Effects on Wheat Breeding in China. Mol Plant 13:1733–1751. 10.1016/j.molp.2020.09.001 He Z, Zhang H, Gao S, Lercher MJ, Chen WH, Hu S (2016) Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees. Nucleic Acids Res 44:W236–W241. 10.1093/nar/gkw370 Hu X, Dai S, Yan Y, Liu Y, Zhang J, Lu Z et al (2020) The genetic diversity of group-1 homoeologs and characterization of novel LMW-GS genes from Chinese Xinjiang winter wheat landraces (Triticum aestivum L). J Appl Genet 61:379–389. 10.1007/s13353-020-00564-6 Huang XQ, Cloutier S (2008) Molecular characterization and genomic organization of low molecular weight glutenin subunit genes at the Glu-3 loci in hexaploid wheat (Triticum aestivum L). Theor Appl Genet 116:953–966.
10.1007/s00122-008-0727-1 Huo N, Dong L, Zhang S, Wang Y, Zhu T, Mohr T et al (2017) New insights into structural organization and gene duplication in a 1.75-Mb genomic region harboring the alpha-gliadin gene family in Aegilops tauschii, the source of wheat D genome. Plant J 92:571–583. 10.1111/tpj.13675 Huo N, Zhang S, Zhu T, Dong L, Wang Y, Mohr T et al (2018) Gene Duplication and Evolution Dynamics in the Homeologous Regions Harboring Multiple Prolamin and Resistance Gene Families in Hexaploid Wheat. Front Plant Sci 9:673. 10.3389/fpls.2018.00673 Ikeda TM, Nagamine T, Fukuoka H, Yano H (2002) Identification of new low-molecular-weight glutenin subunit genes in wheat. Theor Appl Genet 104:680–687. 10.1007/s001220100756 IWGSC (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345:1251788. 10.1126/science.1251788 Jia J, Zhao G, Li D, Wang K, Kong C, Deng P et al (2023) Genome resources for the elite bread wheat cultivar Aikang 58 and mining of elite homeologous haplotypes for accelerating wheat improvement. Mol Plant 16:1893–1910. 10.1016/j.molp.2023.10.015 Jia J, Zhao S, Kong X, Li Y, Zhao G, He W et al (2013) Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496:91–95. 10.1038/nature12028 Keeble-Gagnere G, Rigault P, Tibbits J, Pasam R, Hayden M, Forrest K et al (2018) Optical and physical mapping with local finishing enables megabase-scale resolution of agronomically important regions in the wheat genome. Genome Biol 19:112. 10.1186/s13059-018-1475-4 Kumar A, Kapoor P, Chunduri V, Sharma S, Garg M (2019) Potential of Aegilops sp. for Improvement of Grain Processing and Nutritional Quality in Wheat (Triticum aestivum). Front Plant Sci 10:308. 10.3389/fpls.2019.00308 Lee JY, Beom HR, Altenbach SB, Lim SH, Kim YT, Kang CS et al (2016) Comprehensive identification of LMW-GS genes and their protein products in a common wheat variety. Funct Integr Genomics 16:269–279.
10.1007/s10142-016-0482-3 Li X, Ma W, Gao L, Zhang Y, Wang A, Ji K et al (2008) A novel chimeric low-molecular-weight glutenin subunit gene from the wild relatives of wheat Aegilops kotschyi and Ae. juvenalis: evolution at the Glu-3 loci. Genetics 180:93–101. 10.1534/genetics.108.092403 Li XH, Wang K, Wang SL, Gao LY, Xie XX, Hsam SL et al (2010) Molecular characterization and comparative transcriptional analysis of LMW-m-type genes from wheat (Triticum aestivum L.) and Aegilops species. Theor Appl Genet 121:845–856. 10.1007/s00122-010-1354-1 Li Y, Fu J, Shen Q, Yang D (2020) High-Molecular-Weight Glutenin Subunits: Genetics, Structures, and Relation to End Use Qualities. Int J Mol Sci 22. 10.3390/ijms22010184 Liu L, Ikeda TM, Branlard G, Pena RJ, Rogers WJ, Lerner SE et al (2010) Comparison of low molecular weight glutenin subunits identified by SDS-PAGE, 2-DE, MALDI-TOF-MS and PCR in common wheat. BMC Plant Biol 10:124. 10.1186/1471-2229-10-124 Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25:402–408. 10.1006/meth.2001.1262 Lombardi A, Barbante A, Cristina PD, Rosiello D, Castellazzi CL, Sbano L et al (2009) A relaxed specificity in interchain disulfide bond formation characterizes the assembly of a low-molecular-weight glutenin subunit in the endoplasmic reticulum. Plant Physiol 149:412–423. 10.1104/pp.108.127761 Ma S, Wang M, Wu J, Guo W, Chen Y, Li G et al (2021) WheatOmics: A platform combining multiple omics data to accelerate functional genomics studies in wheat. Mol Plant 14:1965–1968. 10.1016/j.molp.2021.10.006 Madeira F, Madhusoodanan N, Lee J, Eusebi A, Niewielska A, Tivey ARN et al (2024) The EMBL-EBI Job Dispatcher sequence analysis tools framework in 2024. Nucleic Acids Res 52:W521–W525. 
10.1093/nar/gkae241 Matsuoka Y (2011) Evolution of polyploid triticum wheats under cultivation: the role of domestication, natural hybridization and allopolyploid speciation in their diversification. Plant Cell Physiol 52:750–764. 10.1093/pcp/pcr018 Morgulis A, Coulouris G, Raytselis Y, Madden TL, Agarwala R, Schaffer AA (2008) Database indexing for production MegaBLAST searches. Bioinformatics 24:1757–1764. 10.1093/bioinformatics/btn322 Niu J, Ma S, Zheng S, Zhang C, Lu Y, Si Y et al (2023) Whole-genome sequencing of diverse wheat accessions uncovers genetic changes during modern breeding in China and the United States. Plant Cell 35:4199–4216. 10.1093/plcell/koad229 Peng Y, Yu Z, Islam S, Zhang Y, Wang X, Lei Z et al (2016) Allelic variation of LMW-GS composition in Chinese wheat landraces of the Yangtze-River region detected by MALDI-TOF-MS. Breed Sci 66:646–652. Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR et al (2014) Genome interplay in the grain transcriptome of hexaploid bread wheat. Science 345:1250091. 10.1126/science.1250091 Qin L, Liang Y, Yang D, Sun L, Xia G, Liu S (2015) Novel LMW glutenin subunit genes from wild emmer wheat (Triticum turgidum ssp. dicoccoides) in relation to Glu-3 evolution. Dev Genes Evol 225:31–37. 10.1007/s00427-014-0484-x Qu G, Wang K, Mu J, Zhuo J, Wang X, Li S et al (2023) Identifying cis-Acting Elements Associated with the High Activity and Endosperm Specificity of the Promoters of Genes Encoding Low-Molecular-Weight Glutenin Subunits in Common Wheat (Triticum aestivum). J Agric Food Chem. 10.1021/acs.jafc.3c04209 Ren J, Jiang Z, Li W, Kang X, Bai S, Yang L et al (2022) Characterization of Glutenin Genes in Bread Wheat by Third-Generation RNA Sequencing and the Development of a Glu-1Dx5 Marker Specific for the Extra Cysteine Residue. J Agric Food Chem 70:7211–7219. 10.1021/acs.jafc.2c02050 D'Ovidio R, Masci S (2004) The low-molecular-weight glutenin subunits of wheat gluten.
J Cereal Sci 39:321–339. Sanchez-Munoz R (2024) From the archives: Tales from evolution-inflorescence diversity, gene duplication, and chromatin-mediated gene regulation. Plant Cell 36:2048–2050. 10.1093/plcell/koae092 Sato K, Abe F, Mascher M, Haberer G, Gundlach H, Spannagl M et al (2021) Chromosome-scale genome assembly of the transformation-amenable common wheat cultivar 'Fielder'. DNA Res 28. 10.1093/dnares/dsab008 Schulthess AW, Kale SM, Liu F, Zhao Y, Philipp N, Rembe M et al (2022) Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nat Genet 54:1544–1552. 10.1038/s41588-022-01189-7 Shi X, Cui F, Han X, He Y, Zhao L, Zhang N et al (2022) Comparative genomic and transcriptomic analyses uncover the molecular basis of high nitrogen-use efficiency in the wheat cultivar Kenong 9204. Mol Plant 15:1440–1456. 10.1016/j.molp.2022.07.008 Singh N, Shepherd K, Cornish G (1991) A simplified SDS-PAGE procedure for separating LMW subunits of glutenin. J Cereal Sci, 203–208. Singh S, Sharma H, Ramankutty R, Ramaswamy S (2024) Review on Nutritional Potential of Underutilized Millets as a Miracle Grain. Curr Pharm Biotechnol 25:1082–1098. 10.2174/0113892010248721230921093208 Tamura K, Stecher G, Kumar S (2021) MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol Biol Evol 38:3022–3027. 10.1093/molbev/msab120 Thomas SK, Hoek KV, Ogoti T, Duong H, Angelovici R, Pires JC et al (2024) Halophytes and heavy metals: A multi-omics approach to understand the role of gene and genome duplication in the abiotic stress tolerance of Cakile maritima. Am J Bot e16310. 10.1002/ajb2.16310 Thompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics Chapter 2:Unit 2.3. 10.1002/0471250953.bi0203s00 Tyler AM, Bhandari DG, Poole M, Napier JA, Jones HD, Lu C et al (2015) Gluten quality of bread wheat is associated with activity of RabD GTPases. Plant Biotechnol J 13:163–176.
10.1111/pbi.12231 Walkowiak S, Gao L, Monat C, Haberer G, Kassa MT, Brinton J et al (2020) Multiple wheat genomes reveal global variation in modern breeding. Nature 588:277–283. 10.1038/s41586-020-2961-x Wang K, Gao L, Wang S, Zhang Y, Li X, Zhang M et al (2011) Phylogenetic relationship of a new class of LMW-GS genes in the M genome of Aegilops comosa. Theor Appl Genet 122:1411–1425. 10.1007/s00122-011-1541-8 Wang S, Wang K, Chen G, Lv D, Han X, Yu Z et al (2012) Molecular characterization of LMW-GS genes in Brachypodium distachyon L. reveals highly conserved Glu-3 loci in Triticum and related species. BMC Plant Biol 12:221. 10.1186/1471-2229-12-221 Wang W, Wang Z, Li X, Ni Z, Hu Z, Xin M et al (2020) SnpHub: an easy-to-set-up web server framework for exploring large-scale genomic variation data in the post-genomic era with applications in wheat. Gigascience 9. 10.1093/gigascience/giaa060 Wang Y, Wang Z, Chen Y, Lan T, Wang X, Liu G et al (2024) Genomic insights into the origin and evolution of spelt (Triticum spelta L.) as a valuable gene pool for modern wheat breeding. Plant Commun 5:100883. 10.1016/j.xplc.2024.100883 Xiang L, Huang L, Gong F, Liu J, Wang Y, Jin Y et al (2019) Enriching LMW-GS alleles and strengthening gluten properties of common wheat through wide hybridization with wild emmer. 3 Biotech 9:355. 10.1007/s13205-019-1887-1 Xiang Y, Song M, Wei Z, Tong J, Zhang L, Xiao L et al (2011) A jacalin-related lectin-like gene in wheat is a component of the plant defence system. J Exp Bot 62:5471–5483. 10.1093/jxb/err226 Yue YW, Long H, Liu Q, Wei YM, Yan ZH, Zheng YL (2005) Isolation of low-molecular-weight glutenin subunit genes from wild emmer wheat (Triticum dicoccoides). J Appl Genet 46:349–355. Zhang W, Ciclitira P, Messing J (2014) PacBio sequencing of gene families - a case study with wheat gluten genes. Gene 533:541–546.
10.1016/j.gene.2013.10.009 Zhang X, Liu D, Jiang W, Guo X, Yang W, Sun J et al (2011) PCR-based isolation and identification of full-length low-molecular-weight glutenin subunit genes in bread wheat (Triticum aestivum L). Theor Appl Genet 123:1293–1305. 10.1007/s00122-011-1667-8 Zhang X, Liu D, Zhang J, Jiang W, Luo G, Yang W et al (2013) Novel insights into the composition, variation, organization, and expression of the low-molecular-weight glutenin subunit gene family in common wheat. J Exp Bot 64:2027–2040. 10.1093/jxb/ert070 Zhang X, Wang H, Sun H, Li Y, Feng Y, Jiao C et al (2023) A chromosome-scale genome assembly of Dasypyrum villosum provides insights into its application as a broad-spectrum disease resistance resource for wheat improvement. Mol Plant 16:432–451. 10.1016/j.molp.2022.12.021 Zhang Z, Liu D, Li B, Wang W, Zhang J, Xin M et al (2024) A k-mer-based pangenome approach for cataloging seed-storage-protein genes in wheat to facilitate genotype-to-phenotype prediction and improvement of end-use quality. Mol Plant 17:1038–1053. 10.1016/j.molp.2024.05.006 Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203–214. 10.1089/10665270050081478 Zhao H, Wang R, Guo A, Hu S, Sun G (2004) Development of primers specific for LMW-GS genes located on chromosome 1D and molecular characterization of a gene from Glu-D3 complex locus in bread wheat. Hereditas 141:193–198. 10.1111/j.1601-5223.2004.01852.x Zhou Y, Zhao X, Li Y, Xu J, Bi A, Kang L et al (2020) Triticum population sequencing provides insights into wheat adaptation. Nat Genet 52:1412–1422. 10.1038/s41588-020-00722-w Zhu T, Wang L, Rimbert H, Rodriguez JC, Deal KR, De Oliveira R et al (2021) Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly. Plant J 107:303–314. 
10.1111/tpj.15289

Supplementary Files

AppendixS1.docx Fig.S1.pptx Fig.S10.pptx Fig.S11.pptx Fig.S12.pptx Fig.S13A.pptx Fig.S13B.pptx Fig.S13C.pptx Fig.S14.pptx Fig.S15.pdf Fig.S16.pdf Fig.S17.pdf Fig.S18.pptx Fig.S19.pptx Fig.S2.pptx Fig.S20new.pptx Fig.S3.pptx Fig.S4.pptx Fig.S5.pptx Fig.S6.pptx Fig.S7.pptx Fig.S8.pptx Fig.S9.pptx SupportingInformationwithFulllegends20250414.docx TableS1.xlsx TableS10.xlsx TableS11.xlsx TableS12.xlsx TableS13.xlsx TableS14.xlsx TableS2.xlsx TableS3.xlsx TableS4.xlsx TableS5.xlsx TableS6.xlsx TableS7.xlsx TableS8.xlsx TableS9.xlsx

Journal history (Theoretical and Applied Genetics): first submitted 14 Apr, 2025; editor assigned, reviewers invited, and reviewers agreed 15 Apr, 2025; editorial decision: Accept, 01 May, 2025; published 27 May, 2025.
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-5789598","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":443075227,"identity":"2e43d54d-8f09-4221-ba22-577f52cb0326","order_by":0,"name":"Shengli Zhang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABCElEQVRIiWNgGAWjYDACdgY2ECXHxsx88DGDAVjMAL8WZogWY372tmRjoGIJorUkzuw5YyYNZBDWIt/M/uzBzx21jBtupKVVFxTU1TGwN2+TYKi5g1OLwWGGdMPeM8eZDW4kH7s9w+CwBAPPsTIJhmPPcGthZjgmwdt2jM0AaMttHoMDEgwSOWYSjA2H8TiMsU3yb9sxHoMbOWbFPAZ1Egzyb/BrYTjMzCbN21YjIQn0PjOPATPQFh78WgwOs7FJy7YdMAAFsjSPwWHJNp60YouEY3gc1t7+TPJtW119GzAqP/P8qePnZz+88caHGjwOgzoPwQRHUwIhDQwMdYSVjIJRMApGwcgFADn6TL2nWMXIAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0009-0003-1680-0045","institution":"Henan Institute of Science and Technology","correspondingAuthor":true,"prefix":"","firstName":"Shengli","middleName":"","lastName":"Zhang","suffix":""},{"id":443075228,"identity":"7ba102a6-7e38-45fc-8806-82c244564586","order_by":1,"name":"Xiaojing Shan","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Xiaojing","middleName":"","lastName":"Shan","suffix":""},{"id":443075229,"identity":"987dd931-e074-4693-833b-a68ec12aec1e","order_by":2,"name":"Yun Wang","email":"","orcid":"","institution":"Henan Institute of Science and
Technology","correspondingAuthor":false,"prefix":"","firstName":"Yun","middleName":"","lastName":"Wang","suffix":""},{"id":443075230,"identity":"2d7864c1-d54f-4d76-afb7-fc109939c147","order_by":3,"name":"Tairui Lu","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Tairui","middleName":"","lastName":"Lu","suffix":""},{"id":443075231,"identity":"cf2087fc-b8d1-43c2-97e2-741cb73ecf88","order_by":4,"name":"Daxing Xu","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Daxing","middleName":"","lastName":"Xu","suffix":""},{"id":443075232,"identity":"638dbf24-072a-4c38-8391-cb337d2272e7","order_by":5,"name":"Han Gong","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Han","middleName":"","lastName":"Gong","suffix":""},{"id":443075233,"identity":"2b43ea6c-53e8-4272-9fba-ff657055aeea","order_by":6,"name":"Yuchao Fan","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Yuchao","middleName":"","lastName":"Fan","suffix":""},{"id":443075234,"identity":"f5fefaed-4950-4cb6-9284-1b6f065e8a23","order_by":7,"name":"Yuanyuan Guan","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Yuanyuan","middleName":"","lastName":"Guan","suffix":""},{"id":443075235,"identity":"4028c6ca-ec19-40e5-bd4b-fe8452aa2645","order_by":8,"name":"Junjie Zhao","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Junjie","middleName":"","lastName":"Zhao","suffix":""},{"id":443075236,"identity":"f96eacfa-21d4-4752-bb19-fc7b87f21b39","order_by":9,"name":"Haili Sun","email":"","orcid":"","institution":"Henan Institute of Science and 
Technology","correspondingAuthor":false,"prefix":"","firstName":"Haili","middleName":"","lastName":"Sun","suffix":""},{"id":443075237,"identity":"423e3092-85be-42b9-91a2-f24fa71067e5","order_by":10,"name":"Dongfang Li","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Dongfang","middleName":"","lastName":"Li","suffix":""},{"id":443075238,"identity":"cb53692e-a478-4579-80ab-fa00d61fa4e6","order_by":11,"name":"Haiyan Hu","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Haiyan","middleName":"","lastName":"Hu","suffix":""},{"id":443075239,"identity":"d867b0ca-c51f-4b06-9cce-498ccbe2ecbe","order_by":12,"name":"Zhengang Ru","email":"","orcid":"","institution":"Henan Institute of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Zhengang","middleName":"","lastName":"Ru","suffix":""},{"id":443075240,"identity":"86604808-9762-4d38-bf2d-7a2e9f4d6ae4","order_by":13,"name":"Yong Q. 
Gu","email":"","orcid":"","institution":"USDA-ARS Western Regional Research Center","correspondingAuthor":false,"prefix":"","firstName":"Yong","middleName":"Q.","lastName":"Gu","suffix":""}],"badges":[],"createdAt":"2025-01-08 13:50:28","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5789598/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5789598/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s00122-025-04919-7","type":"published","date":"2025-05-27T15:57:59+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":80789281,"identity":"96e5f962-5dd2-479a-916f-aedfe27908d4","added_by":"auto","created_at":"2025-04-17 06:34:33","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":216787,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend\u003c/p\u003e","description":"","filename":"Fig.1ok202504071.png","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/78711761d72da6f2a388e031.png"},{"id":80789277,"identity":"1d84bebf-0e94-4bb3-bc44-b0d543f188e7","added_by":"auto","created_at":"2025-04-17 06:34:33","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":226746,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend\u003c/p\u003e","description":"","filename":"Fig.2OK1.png","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/c85aa728e58263f01665bdf8.png"},{"id":80789274,"identity":"5a93c895-171f-4a53-bd8b-33c450b82a20","added_by":"auto","created_at":"2025-04-17 06:34:33","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":412617,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure 
legend\u003c/p\u003e","description":"","filename":"Fig.31.png","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/16a401ff5dbe2099d131b8f7.png"},{"id":80791449,"identity":"e7945522-5210-44cb-a7a8-9162ce03b7af","added_by":"auto","created_at":"2025-04-17 06:50:33","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":457116,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend\u003c/p\u003e","description":"","filename":"Fig.4OK1.png","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/da4334965ae7086a46df2d76.png"},{"id":80789276,"identity":"d5203b60-2e2d-4491-a487-5c482c565c1e","added_by":"auto","created_at":"2025-04-17 06:34:33","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":431943,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend\u003c/p\u003e","description":"","filename":"Fig.5OK1.png","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/32056adebaf29ef4239776a1.png"},{"id":80790877,"identity":"16b6b16b-906b-4680-916d-4d959f4be4b6","added_by":"auto","created_at":"2025-04-17 06:42:33","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":781608,"visible":true,"origin":"","legend":"\u003cp\u003eSee image above for figure legend\u003c/p\u003e","description":"","filename":"Fig.6OK1.png","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/6d9d1c8c1b6ee1d4086401a0.png"},{"id":83782980,"identity":"59144ed4-7776-46cb-ad98-13855091ff12","added_by":"auto","created_at":"2025-06-02 
16:09:36","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3803569,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/aa6d1990-196a-4d77-a6b3-d7e2185dd6a7.pdf"},{"id":80789290,"identity":"351986e6-c173-476a-b78e-03be48dcdd68","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"docx","order_by":13,"title":"","display":"","copyAsset":false,"role":"supplement","size":40088,"visible":true,"origin":"","legend":"","description":"","filename":"AppendixS1.docx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/eba82f42dd81d61261d6fedc.docx"},{"id":80789288,"identity":"6eaafb30-43a7-439d-86ef-7842cd50b139","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":14,"title":"","display":"","copyAsset":false,"role":"supplement","size":276859,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S1.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/284253caa14049a84ac87441.pptx"},{"id":80789311,"identity":"a5fc40c4-08d9-4524-9227-c97eb5b67b9f","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":15,"title":"","display":"","copyAsset":false,"role":"supplement","size":205537,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S10.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/4935f63545627f0b54466ded.pptx"},{"id":80790885,"identity":"f0d29282-0b26-4925-83e6-814314d2d5c1","added_by":"auto","created_at":"2025-04-17 
06:42:34","extension":"pptx","order_by":16,"title":"","display":"","copyAsset":false,"role":"supplement","size":10515605,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S11.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/e89f54221b4e97ec8852046b.pptx"},{"id":80789293,"identity":"07e01c63-3746-4b3c-ad7f-df745bfaff1f","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":17,"title":"","display":"","copyAsset":false,"role":"supplement","size":4694405,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S12.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/f55854d60f3e9d191fa55c87.pptx"},{"id":80790886,"identity":"5a7b9924-8918-46fd-bb75-5bbe6fdd350b","added_by":"auto","created_at":"2025-04-17 06:42:34","extension":"pptx","order_by":18,"title":"","display":"","copyAsset":false,"role":"supplement","size":576492,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S13A.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/3b6bf92f76c314024e0d3f37.pptx"},{"id":80789300,"identity":"9ac1b71e-e1eb-4cc4-9c65-7d289cdf4a44","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":19,"title":"","display":"","copyAsset":false,"role":"supplement","size":415378,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S13B.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/1984dad442154f602a661afd.pptx"},{"id":80790884,"identity":"6a43e5e4-573c-48d7-aabf-e1a77ac909b2","added_by":"auto","created_at":"2025-04-17 
06:42:34","extension":"pptx","order_by":20,"title":"","display":"","copyAsset":false,"role":"supplement","size":433432,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S13C.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/09ab59cd6db86967c1e6e0d9.pptx"},{"id":80789298,"identity":"751b6a8b-3fe9-4ef5-9de0-deca732d8696","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":21,"title":"","display":"","copyAsset":false,"role":"supplement","size":678960,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S14.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/84ba3ad8df8a80ba0dd9650e.pptx"},{"id":80789304,"identity":"cb64e893-d6f1-4d44-a118-d3705c1fdfd5","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pdf","order_by":22,"title":"","display":"","copyAsset":false,"role":"supplement","size":3695045,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S15.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/770b6a35cff9d67d8c22a374.pdf"},{"id":80789303,"identity":"f3471cb5-e609-48c8-bedd-a60f1ba181d4","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pdf","order_by":23,"title":"","display":"","copyAsset":false,"role":"supplement","size":4283075,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S16.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/ad153a48dddbace238545855.pdf"},{"id":80791452,"identity":"087ca945-38eb-4b9c-99b3-7edbb55cd57f","added_by":"auto","created_at":"2025-04-17 
06:50:34","extension":"pdf","order_by":24,"title":"","display":"","copyAsset":false,"role":"supplement","size":2346418,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S17.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/20114bc0933873b8010d6c8e.pdf"},{"id":80790880,"identity":"e8b14c1e-4fd4-48fc-8670-4b0607cee092","added_by":"auto","created_at":"2025-04-17 06:42:33","extension":"pptx","order_by":25,"title":"","display":"","copyAsset":false,"role":"supplement","size":256229,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S18.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/38056712fc95209286c52c63.pptx"},{"id":80789287,"identity":"b9452946-4fc4-48a6-92bf-661d6b80ad1a","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":26,"title":"","display":"","copyAsset":false,"role":"supplement","size":227248,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S19.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/8a22e721647150b22d84dc14.pptx"},{"id":80789308,"identity":"61b573d9-3d22-47f7-a3f5-74bea25c2bf2","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"pptx","order_by":27,"title":"","display":"","copyAsset":false,"role":"supplement","size":248430,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S2.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/643d70bf028cfad3a95f0d89.pptx"},{"id":80789297,"identity":"da4f3e23-e621-4373-b3c5-cbfc5db22508","added_by":"auto","created_at":"2025-04-17 
06:34:34","extension":"pptx","order_by":28,"title":"","display":"","copyAsset":false,"role":"supplement","size":269208,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S20new.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/512e90ca050feba22faf8a6e.pptx"},{"id":80790902,"identity":"fece2e80-81c8-4408-b00d-5daafd66563d","added_by":"auto","created_at":"2025-04-17 06:42:35","extension":"pptx","order_by":29,"title":"","display":"","copyAsset":false,"role":"supplement","size":266752,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S3.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/882c38d1990ed4041b1f6389.pptx"},{"id":80791454,"identity":"60ad2402-1767-4d78-a495-fde0e0e12eeb","added_by":"auto","created_at":"2025-04-17 06:50:34","extension":"pptx","order_by":30,"title":"","display":"","copyAsset":false,"role":"supplement","size":276785,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S4.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/1e66860b9b51572f43b3ae2b.pptx"},{"id":80791451,"identity":"fa2cc669-3eed-48a5-9e61-fd2047205a7d","added_by":"auto","created_at":"2025-04-17 06:50:34","extension":"pptx","order_by":31,"title":"","display":"","copyAsset":false,"role":"supplement","size":269952,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S5.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/21d8931ba85cb8705c186992.pptx"},{"id":80790897,"identity":"6a96dedc-9bdd-4017-b0a8-e7dd37e39a18","added_by":"auto","created_at":"2025-04-17 
06:42:35","extension":"pptx","order_by":32,"title":"","display":"","copyAsset":false,"role":"supplement","size":300906,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S6.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/d3b740d4738e079dc6d5599e.pptx"},{"id":80790900,"identity":"4cb2fd4c-bc4d-46fe-b96f-fb8ccc056f6f","added_by":"auto","created_at":"2025-04-17 06:42:35","extension":"pptx","order_by":33,"title":"","display":"","copyAsset":false,"role":"supplement","size":283411,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S7.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/36ab9ec18d526ff3724c3914.pptx"},{"id":80790892,"identity":"9e8b0fda-1e35-418b-beb0-a64861f4ab90","added_by":"auto","created_at":"2025-04-17 06:42:34","extension":"pptx","order_by":34,"title":"","display":"","copyAsset":false,"role":"supplement","size":280184,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S8.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/d7e6698066ab852dfa985af1.pptx"},{"id":80790887,"identity":"ce2e4da9-1f2a-4deb-8549-3c629eacf056","added_by":"auto","created_at":"2025-04-17 06:42:34","extension":"pptx","order_by":35,"title":"","display":"","copyAsset":false,"role":"supplement","size":292255,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S9.pptx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/ce10aaa596028fdc6d8d736d.pptx"},{"id":80789302,"identity":"0bce3079-b510-43c4-8779-52e80dcd934b","added_by":"auto","created_at":"2025-04-17 
06:34:34","extension":"docx","order_by":36,"title":"","display":"","copyAsset":false,"role":"supplement","size":21751,"visible":true,"origin":"","legend":"","description":"","filename":"SupportingInformationwithFulllegends20250414.docx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/43f036d4d0a8b9aa7a30b190.docx"},{"id":80789320,"identity":"c688bf21-7d5b-42d9-816b-b354141c3497","added_by":"auto","created_at":"2025-04-17 06:34:35","extension":"xlsx","order_by":37,"title":"","display":"","copyAsset":false,"role":"supplement","size":14830,"visible":true,"origin":"","legend":"","description":"","filename":"TableS1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/b7d47afdaababbf37b0af11f.xlsx"},{"id":80789295,"identity":"25150d26-593e-4877-a049-77f5f651d12c","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"xlsx","order_by":38,"title":"","display":"","copyAsset":false,"role":"supplement","size":39873,"visible":true,"origin":"","legend":"","description":"","filename":"TableS10.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/49d720bbf78b61827a929d15.xlsx"},{"id":80790890,"identity":"d2adf7c1-7b40-4552-a923-e2c33003e37f","added_by":"auto","created_at":"2025-04-17 06:42:34","extension":"xlsx","order_by":39,"title":"","display":"","copyAsset":false,"role":"supplement","size":13217,"visible":true,"origin":"","legend":"","description":"","filename":"TableS11.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/77b4950c8682289a411bfc91.xlsx"},{"id":80789305,"identity":"ebaca978-1882-4bdb-9696-4135949a288c","added_by":"auto","created_at":"2025-04-17 
06:34:34","extension":"xlsx","order_by":40,"title":"","display":"","copyAsset":false,"role":"supplement","size":13094,"visible":true,"origin":"","legend":"","description":"","filename":"TableS12.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/ca6913cc29b23ca1e6f7961a.xlsx"},{"id":80789319,"identity":"e88a0325-9e62-4720-8722-e12bcfe0c9b8","added_by":"auto","created_at":"2025-04-17 06:34:35","extension":"xlsx","order_by":41,"title":"","display":"","copyAsset":false,"role":"supplement","size":13314,"visible":true,"origin":"","legend":"","description":"","filename":"TableS13.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/4198c9077bbea3c9807590dd.xlsx"},{"id":80789328,"identity":"164b0012-ad6b-4698-af1b-ffe934675e93","added_by":"auto","created_at":"2025-04-17 06:34:35","extension":"xlsx","order_by":42,"title":"","display":"","copyAsset":false,"role":"supplement","size":60324,"visible":true,"origin":"","legend":"","description":"","filename":"TableS14.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/54469ec7814ee468c1de426d.xlsx"},{"id":80792581,"identity":"865269f7-9100-420c-a413-4e8072fa254d","added_by":"auto","created_at":"2025-04-17 06:58:34","extension":"xlsx","order_by":43,"title":"","display":"","copyAsset":false,"role":"supplement","size":17282,"visible":true,"origin":"","legend":"","description":"","filename":"TableS2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/0c2ca32b1de1851d2b105882.xlsx"},{"id":80790898,"identity":"ce6132b4-ec14-4334-913a-69cd87322d4e","added_by":"auto","created_at":"2025-04-17 
06:42:35","extension":"xlsx","order_by":44,"title":"","display":"","copyAsset":false,"role":"supplement","size":16337,"visible":true,"origin":"","legend":"","description":"","filename":"TableS3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/336b7cec2f61c32a49b08d8d.xlsx"},{"id":80789327,"identity":"ea012904-4488-4209-b420-86d840a3af57","added_by":"auto","created_at":"2025-04-17 06:34:35","extension":"xlsx","order_by":45,"title":"","display":"","copyAsset":false,"role":"supplement","size":11854,"visible":true,"origin":"","legend":"","description":"","filename":"TableS4.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/6f78fe97abf2fce572023fcb.xlsx"},{"id":80789301,"identity":"37e26790-5d3c-4ff6-910b-17db9ff6ab76","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"xlsx","order_by":46,"title":"","display":"","copyAsset":false,"role":"supplement","size":10259,"visible":true,"origin":"","legend":"","description":"","filename":"TableS5.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/5759b9ddfb0d9ed606d20e81.xlsx"},{"id":80789299,"identity":"9d81db50-e776-4a84-a348-1cd72a86f4bf","added_by":"auto","created_at":"2025-04-17 06:34:34","extension":"xlsx","order_by":47,"title":"","display":"","copyAsset":false,"role":"supplement","size":46131,"visible":true,"origin":"","legend":"","description":"","filename":"TableS6.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5789598/v1/9482845d7f3205585a8f9f45.xlsx"},{"id":80790906,"identity":"636c4b9d-fb93-45d2-8592-a6e5b4314808","added_by":"auto","created_at":"2025-04-17 
Key Message

LMWgsFinder, a tool developed in this study, was used to re-identify the LMW-GS genes in 26 genomes across the grass family. The analysis yielded several important and novel findings, including recent duplications and rare structural variations.

Introduction

The evolution and speciation of common wheat (Triticum aestivum L., hereafter "wheat") was a complex process. It involved two rounds of allopolyploidization followed by natural selection and domestication, and it ultimately produced the high-yield, high-quality wheat varieties grown today (Matsuoka, 2011; Feldman and Levy, 2012).

Wheat is an allohexaploid (2n = 6x = 42; AABBDD) whose A, B, and D genomes were donated by three wild diploid species:
- Triticum urartu (2n = 2x = 14; AA)
- an as yet unconfirmed species related to Aegilops speltoides (2n = 2x = 14; SS)
- Aegilops tauschii (2n = 2x = 14; DD) (Huo et al., 2018; Jia et al., 2023)

The first hybridization, between the A- and B-genome donors, produced tetraploid wheat (AABB). The second spontaneous hybridization, between Triticum turgidum (2n = 4x = 28; AABB) and an ancestral diploid Aegilops tauschii (DD), occurred approximately 8,000 to 10,000 years ago and gave rise to common wheat (Jia et al., 2023; Cavalet-Giorsa et al., 2024). The addition of the Aegilops tauschii D genome substantially improved the gluten quality of wheat. As a result, wheat is the only major human food crop that can be processed into foods as varied as noodles, bread, dumplings, and cookies (Jia et al., 2013; Ren et al., 2022).

This versatility depends mainly on the unique storage proteins in wheat seeds, the glutenins and gliadins (Dong et al., 2010; Huo et al., 2018). Based on molecular weight, glutenins are further divided into high molecular weight glutenin subunits (HMW-GS) and LMW-GS (Dong et al., 2010; Ren et al., 2022). On average, 60% of the variation in gluten quality is attributable to genotype. Variation in HMW-GS alone has been shown to explain 20–30% of the variation in gluten strength in common wheat (Carlos Guzmán et al., 2022).

The effects of LMW-GS on gluten quality differ between common wheat and durum wheat. In common wheat, LMW-GS usually affect gluten characteristics less than HMW-GS do, accounting for 10–20% of the observed variation (Carlos Guzmán et al., 2022). In durum wheat, by contrast, LMW-GS have a greater impact on gluten quality than HMW-GS.

HMW-GS and LMW-GS are joined by intermolecular disulfide bonds into large polymers that give wheat dough its elasticity, whereas gliadins exist mainly as monomers and give the dough its extensibility (Huo et al., 2018).
Within the wheat gluten network, HMW-GS form the backbone, while an increase in LMW-GS content can significantly improve dough gluten strength (Li et al., 2020). The composition and ratio of high and low molecular weight glutenins directly affect the final functional properties of wheat flour (Dizlek and Awika, 2023).

The Glu-3 loci encoding LMW-GS are located on the short arms of the group 1 chromosomes. Their copy number in the genomes of different wheat varieties ranges from approximately 10–20 to 30–40 (Huang and Cloutier, 2008; Zhang et al., 2011; Zhang et al., 2013; Al-Khayri et al., 2023). The Glu-1 loci encoding HMW-GS, in contrast, are located on the long arms of the group 1 chromosomes. They comprise six genes in total, two on each of the A, B, and D subgenomes (Al-Khayri et al., 2023). Gliadins can be subdivided into α-, δ-, γ-, and ω-gliadins according to their molecular characteristics (Ren et al., 2022).
Of these, the α-gliadin genes are encoded by the Gli-2 loci, which map to the short arms of the group 6 chromosomes. The other three gliadin types are encoded by the Gli-1 loci on the short arms of the group 1 chromosomes (Huo et al., 2017; Huo et al., 2018). Together, these three gliadin types total around 40 gene copies in different wheat genomes (Dong et al., 2016).

Notably, Glu-3 and Gli-1 are tightly linked on the short arms of the group 1 chromosomes. They are also closely linked to multiple disease resistance genes, such as Lr21 and Pm3, which likewise occur in multiple copies in this region (Dong et al., 2010; Dong et al., 2016; Huo et al., 2018). Wheat end-use quality is therefore a highly complex trait involving many genes.

In terms of protein content, the LMW-GS encoded by Glu-3 account for about 60–80% of total glutenins, while the δ-, γ-, and ω-gliadins encoded by Gli-1 account for about 75% of total gliadins (Zhang et al., 2013).
LMW-GS and these three gliadin types are therefore the main storage-protein components of wheat grains, and they strongly influence both the processing quality and the nutritional quality of wheat flour (Huo et al., 2018). Among wheats carrying the same HMW-GS genes, the types, numbers, and expression levels of LMW-GS and gliadin genes may substantially affect the quality stability of strong-gluten wheat.

LMW-GS and gliadin genes are both single-exon genes. Their coding regions are generally 800 to 1200 bp long and encode mature proteins of 30 to 45 kDa (Zhang et al., 2013). Several factors make the two groups hard to tell apart:
- both gene families are present in many copies in the wheat genome;
- different copies share high sequence similarity;
- the encoded proteins are smaller than HMW-GS;
- LMW-GS and gliadins show overlapping band patterns in SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis) (Singh et al., 1991; Al-Khayri et al., 2023).

As a result, the two are difficult to distinguish, even with MALDI-TOF-MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) (Liu et al., 2010; Zhang et al., 2013; Peng et al., 2016).
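The CDS lengths and protein masses quoted above can be cross-checked with simple arithmetic. The sketch below is an illustrative back-of-the-envelope calculation, not a method from the cited studies, and it rests on three assumptions not stated in the source:
- an average residue mass of about 110 Da;
- a 20-residue N-terminal signal peptide that is cleaved from the mature protein;
- a CDS length that includes the stop codon.

The names AVG_RESIDUE_DA, SIGNAL_PEPTIDE_AA, and mature_mass_kda are hypothetical.

```python
# Back-of-the-envelope estimate of mature-protein mass from CDS length.
# All constants below are ASSUMPTIONS for illustration, not values from the source.
AVG_RESIDUE_DA = 110.0   # assumed rule-of-thumb mean residue mass (Da)
SIGNAL_PEPTIDE_AA = 20   # assumed length of the cleaved N-terminal signal peptide


def mature_mass_kda(cds_bp: int,
                    signal_aa: int = SIGNAL_PEPTIDE_AA,
                    residue_da: float = AVG_RESIDUE_DA) -> float:
    """Rough mass (kDa) of the mature protein encoded by a single-exon CDS.

    Assumes the CDS length includes the stop codon; partial codons are ignored.
    """
    codons = cds_bp // 3
    residues = codons - 1            # the stop codon encodes no residue
    mature = residues - signal_aa    # the signal peptide is removed on maturation
    if mature <= 0:
        raise ValueError("CDS too short for the assumed signal peptide")
    return mature * residue_da / 1000.0


if __name__ == "__main__":
    for bp in (800, 1000, 1200):
        print(f"{bp} bp CDS -> about {mature_mass_kda(bp):.1f} kDa")
```

Under these assumptions, 800 bp and 1,200 bp coding regions correspond to roughly 27 and 42 kDa. This is the same order as the 30–45 kDa range reported by Zhang et al. (2013). The exact figures shift with the assumed residue mass, the signal-peptide length, and the actual amino acid composition, so only the order of magnitude should be compared.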
This has made it difficult to study the composition of LMW-GS and gliadin genes in different wheat varieties and its relationship with end-use quality parameters (Zhang et al., \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2011\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo date, few studies of LMW-GS genes have used molecular approaches such as gene editing and RNAi (Tyler et al., \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2015\u003c/span\u003e;Qu et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Instead, using amplification with conserved or gene-specific primers, research has mainly focused on LMW-GS gene copy number identification, chromosome localization, homologous cloning, real-time quantitative expression analysis, and the relationship of these genes with wheat end-use quality traits (Ikeda et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2002\u003c/span\u003e;Yue et al., \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2005\u003c/span\u003e;Li et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2010\u003c/span\u003e;Zhang et al., \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2011\u003c/span\u003e;Zhang et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2013\u003c/span\u003e;Lee et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). LMW-GS genes have also been identified by capillary electrophoresis or Sanger sequencing of polymerase chain reaction (PCR) products (Zhao et al., \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e2004\u003c/span\u003e;Zhang et al., \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2011\u003c/span\u003e;Zhang et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). 
However, owing to the scarcity of complete sequence information, the complexity of this large gene family, and the repetitive domain within the coding region, both the PCR-based approaches above and expression analyses of LMW-GS genes based on \u003cem\u003ede novo\u003c/em\u003e transcriptome assembly have remained challenging (Huo et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). For example, how can LMW-GS genes whose CDSs share more than 99% similarity, or genes that have recently undergone duplication or triplication, be distinguished? To date, no reports have addressed this issue; all reported results have treated such genes as the same locus or gene.\u003c/p\u003e \u003cp\u003eAlthough recent advances in DNA sequencing technology and large-genome assembly have enabled the sequencing and assembly of over 20 wheat genomes (Akpinar et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2018\u003c/span\u003e;Walkowiak et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2020\u003c/span\u003e;Sato et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2021\u003c/span\u003e;Shi et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e;Jia et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), BLASTN analysis using the Chinese Spring (CS) LMW-GS gene sequences as queries revealed that the identification of LMW-GS genes in most genome annotations remains incomplete. Some genome annotations do not include any LMW-GS genes at all, reflecting the differing priorities of the groups that sequenced and annotated these genomes. 
Our analysis showed that, although these genome assemblies are of very high quality, the complexity of the LMW-GS gene family and the lack of specialized annotation tools have left the annotation of this family incomplete in most species with sequenced genomes.\u003c/p\u003e \u003cp\u003ePrevious studies have shown that different LMW-GS share a similar domain structure (Dong et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2010\u003c/span\u003e): a 20-amino-acid signal peptide (Sig, removed upon maturation) at the N-terminus, followed by a 13-amino-acid N-terminal region (N-ter), a central repetitive domain (Rep) of approximately 70\u0026ndash;186 amino acids, and finally a C-terminal domain (C-ter) (Renato and Stefania, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2004\u003c/span\u003e;Ren et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Based on the first amino acid residue of the mature protein (serine, methionine, or isoleucine), LMW-GS are divided into three types: LMW-s, LMW-m, and LMW-i (Renato and Stefania, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2004\u003c/span\u003e;An et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2006\u003c/span\u003e;Dong et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). Differences in molecular weight among LMW-GS arise mainly from the varying number of repeat units in Rep, which is generally attributed to the insertion or deletion of these units. Based on its amino acid composition, C-ter can be further divided into a cysteine-rich region I (C-terI), a glutamine-rich region II (C-terII), and a conserved C-terminal region III (C-terIII). 
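The type assignment described above depends only on the first residue of the mature protein: S (serine) for LMW-s, M (methionine) for LMW-m, and I (isoleucine) for LMW-i. As a minimal sketch, assuming a fixed 20-residue signal peptide as stated in the text, the rule can be written in Python as below. The function name `lmw_type` and the "unclassified" fallback are our own hypothetical choices, not part of LMWgsFinder.

```python
# Sketch only: assumes every precursor carries a fixed 20-residue signal peptide.
SIGNAL_PEPTIDE_LEN = 20

# First residue of the mature protein -> LMW-GS type (S = Ser, M = Met, I = Ile)
TYPE_BY_FIRST_RESIDUE = {"S": "LMW-s", "M": "LMW-m", "I": "LMW-i"}


def lmw_type(protein: str) -> str:
    """Classify a precursor protein by the first residue after the signal peptide.

    Hypothetical helper: returns "unclassified" if the mature protein is empty
    or starts with a residue other than S, M, or I.
    """
    mature = protein.strip().upper()[SIGNAL_PEPTIDE_LEN:]
    if not mature:
        return "unclassified"
    return TYPE_BY_FIRST_RESIDUE.get(mature[0], "unclassified")


# Toy precursor: 20 placeholder residues, then a mature protein starting with M
print(lmw_type("M" + "A" * 19 + "METSCIPGLERPWQ"))  # LMW-m
```

Real signal peptide lengths may vary between genes, so a production classifier would locate the cleavage site rather than rely on a fixed offset.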
LMW-GS generally contain 8 cysteine residues, 7 of which are located in C-ter, and at least one is involved in forming interchain disulfide (S-S) bonds (Lombardi et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2009\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBased on these amino acid composition characteristics of LMW-GS, we developed a Perl package named LMWgsFinder. Using LMWgsFinder, this study re-identified LMW-GS genes in 26 whole-genome sequences of wheat and related species across the grass family. We also experimentally verified selected genes, analyzed transcriptome data supporting the annotated LMW-GS genes, and performed comparative sequence analysis of these genes. We obtained several rarely reported results, which provide valuable resources for in-depth functional research on LMW-GS genes using molecular approaches such as gene editing and RNAi, and for improving wheat end-use quality through marker-assisted selection and molecular design breeding.\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eGenome sequence resources utilized in this study\u003c/h2\u003e \u003cp\u003eChromosome-level genome assemblies of different wheat varieties and their closely related species were used as the data source (Akpinar et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2018\u003c/span\u003e;Walkowiak et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2020\u003c/span\u003e;Sato et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2021\u003c/span\u003e;Shi et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e;Jia et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), and LMW-GS genes were re-identified genome-wide using a Perl script package 
(named LMWgsFinder). The genome sequences and their versions used in this study were: \u003cem\u003eTriticum_aestivum\u003c/em\u003e_CSv1.0 (TaCSv1.0), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_CSv2.1 (TaCSv2.1), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_aikang58 (TaAK58), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_kenong9204 (TaKN9204), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_Fielder.V1 (TaFiel), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_arinalrfor.PGSBv2.1 (TaArin), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_jagger.PGSBv2.1 (TaJagg), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_julius.PGSBv2.1 (TaJuli), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_kariega.Tae_Kariega_v1 (TaKari), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_lancer.PGSBv2.1 (TaLanc), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_landmark.PGSBv2.1 (TaLand), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_mace.PGSBv2.1 (TaMace), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_mattis.PGSBv2.1 (TaMatt), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_norin61.PGSBv2.1 (TaNori), \u003cem\u003eTriticum_aestivum\u003c/em\u003e_stanley.PGSBv2.2 (TaStan), \u003cem\u003eTriticum_turgidum\u003c/em\u003e.Svevo.v1 (TrTu), \u003cem\u003eTriticum_dicoccoides\u003c/em\u003e.WEWSeq_v.1.0 (Td), \u003cem\u003eTriticum_spelta\u003c/em\u003e.PGSBv2.0 (Ts), \u003cem\u003eTriticum_urartu\u003c/em\u003e.IGDB (Tu), \u003cem\u003eAegilops_tauschii\u003c/em\u003e.Aet_v4.0 (Aet), \u003cem\u003eSetaria_italica\u003c/em\u003e_v2.0, \u003cem\u003eBrachypodium_distachyon\u003c/em\u003e_v3.0, \u003cem\u003eHordeum_vulgare\u003c/em\u003e.MorexV3 (barley), \u003cem\u003eOryza_sativa\u003c/em\u003e.IRGSP-1.0 (rice), \u003cem\u003eSorghum_bicolor\u003c/em\u003e_NCBIv3 (sorghum), Zea_mays.Zm-B73-REFERENCE-NAM-5.0 (corn). 
The genome sequences and corresponding annotation files (CDS, protein sequences, and GFF3) were downloaded from the following sources: TaCSv1.0 and TaCSv2.1, URGI (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://urgi.versailles.inra.fr/\u003c/span\u003e\u003cspan address=\"https://urgi.versailles.inra.fr/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e); TaAK58, Wheatdb (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://triticeae.henau.edu.cn/aikang58/\u003c/span\u003e\u003cspan address=\"https://triticeae.henau.edu.cn/aikang58/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (Jia et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e); TaKN9204, CNCB (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://download.cncb.ac.cn/gwh/Plants/\u003c/span\u003e\u003cspan address=\"https://download.cncb.ac.cn/gwh/Plants/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (Shi et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e); Fielder, NBRP (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://shigen.nig.ac.jp/wheat/komugi/genome/\u003c/span\u003e\u003cspan address=\"https://shigen.nig.ac.jp/wheat/komugi/genome/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (Sato et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2021\u003c/span\u003e); all others, EnsemblPlants (release 57) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://plants.ensembl.org/info/data/ftp/index.html\u003c/span\u003e\u003cspan address=\"http://plants.ensembl.org/info/data/ftp/index.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eDevelopment of LMWgsFinder 
package\u003c/h3\u003e\n\u003cp\u003eA Perl (v5.26.2, built for x86_64-linux-thread-multi) script package named LMWgsFinder (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/slzhang20088/LMWgsFinder\u003c/span\u003e\u003cspan address=\"https://github.com/slzhang20088/LMWgsFinder\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was developed in this study to identify LMW-GS genes in the genomes of different wheat cultivars and related species and to extract the related sequences and information. For more details, please refer to the supplementary information.\u003c/p\u003e\n\u003ch3\u003eNaming rules for LMW-GS genes\u003c/h3\u003e\n\u003cp\u003eTo facilitate sequence analysis, this study adopted the following naming rules for LMW-GS genes. The prefix \"\u003cem\u003eGlu-3_\u003c/em\u003e\" or \"LMW-GS_\" is omitted from the names of all LMW-GS genes in the A, B, and D subgenomes; designations such as \u003cem\u003eGlu-3_A5\u003c/em\u003e or LMW-GS\u003cem\u003e_A5\u003c/em\u003e are thus simplified to \u003cem\u003eA5\u003c/em\u003e, and so forth. As a further example, in the name \u003cem\u003eTastan_1DgD10p4292433a1065bp\u003c/em\u003e, \u0026ldquo;Ta\u0026rdquo; represents \u003cem\u003eTriticum aestivum\u003c/em\u003e; the third to sixth characters (here, stan) represent the wheat variety (arin, arinalrfor; jagg, jagger; juli, julius; kari, kariega; lanc, lancer; land, landmark; mace, mace; matt, mattis; nori, norin61; stan, stanley); the characters between the underscore \u0026ldquo;_\u0026rdquo; and \u0026ldquo;g\u0026rdquo; represent the chromosome (here, 1D; \u0026ldquo;cont\u0026rdquo; or \u0026ldquo;scaf\u0026rdquo; denote a contig or scaffold, respectively); \u0026ldquo;g\u0026rdquo; indicates gene, and the characters between \u0026ldquo;g\u0026rdquo; and \u0026ldquo;p\u0026rdquo; represent the LMW-GS gene (here, \u003cem\u003eD10\u003c/em\u003e); \u0026ldquo;p\u0026rdquo; indicates position, and the number between \u0026ldquo;p\u0026rdquo; and \u0026ldquo;a\u0026rdquo; (or \u0026ldquo;s\u0026rdquo;) is the starting position of the LMW-GS gene on the corresponding chromosome, contig, or scaffold (here, 4,292,433); \u0026ldquo;s\u0026rdquo; or \u0026ldquo;a\u0026rdquo; denotes the positive (sense) or negative (antisense) DNA strand, respectively; and the number between \u0026ldquo;a\u0026rdquo; (or \u0026ldquo;s\u0026rdquo;) and \u0026ldquo;bp\u0026rdquo; is the length of the CDS of the LMW-GS gene (here, 1065 bp). Therefore, \u003cem\u003eTastan_1DgD10p4292433a1065bp\u003c/em\u003e represents the \u003cem\u003eD10\u003c/em\u003e gene on the antisense strand of chromosome 1D in the wheat variety Stanley, with a coding region that starts at position 4,292,433 and is 1065 bp long. 
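To make the naming rule concrete, the following Python sketch decodes a full gene name such as Tastan_1DgD10p4292433a1065bp into its fields. The regular expression, the function name `parse_lmw_name`, and the output keys are our own hypothetical illustration of the rule as written, not code from LMWgsFinder. The sketch assumes that the chromosome field contains no lowercase "g", that the gene field contains no lowercase "p", and that the species code is always two characters.

```python
import re

# Hypothetical pattern for the naming rule:
# <species><variety>_<chromosome>g<gene>p<start><strand><CDS length>bp
NAME_RE = re.compile(
    r"^(?P<prefix>[^_]+)_(?P<chrom>[^g]+)g(?P<gene>[^p]+)"
    r"p(?P<start>\d+)(?P<strand>[as])(?P<cds_len>\d+)bp$"
)


def parse_lmw_name(name: str) -> dict:
    """Decode a full LMW-GS gene name into its components (illustrative only)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised LMW-GS gene name: {name}")
    f = m.groupdict()
    return {
        "species": f["prefix"][:2],      # "Ta" = Triticum aestivum
        "variety": f["prefix"][2:],      # e.g. "stan" = Stanley
        "chromosome": f["chrom"],        # e.g. "1D"; "cont"/"scaf" for contig/scaffold
        "gene": f["gene"],               # e.g. "D10"
        "start": int(f["start"]),        # starting position of the CDS
        "strand": "+" if f["strand"] == "s" else "-",  # s = sense, a = antisense
        "cds_length_bp": int(f["cds_len"]),
    }


print(parse_lmw_name("Tastan_1DgD10p4292433a1065bp"))
# {'species': 'Ta', 'variety': 'stan', 'chromosome': '1D', 'gene': 'D10', 'start': 4292433, 'strand': '-', 'cds_length_bp': 1065}
```

Using named groups keeps each field of the rule visible in the pattern, and an invalid name raises `ValueError` instead of returning partial fields.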
A dot within a gene name, for example \u0026ldquo;\u003cem\u003eA2.A1\u003c/em\u003e\u0026rdquo;, indicates that its Begin_90bp aligns to \u003cem\u003eA2\u003c/em\u003e, whereas its End_90bp aligns to \u003cem\u003eA1\u003c/em\u003e.\u003c/p\u003e\n\u003ch3\u003ePhylogenetic analysis of LMW-GS family members\u003c/h3\u003e\n\u003cp\u003eLMW-GS protein sequences were aligned using ClustalW (Thompson et al., \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e2002\u003c/span\u003e), and a neighbor-joining (NJ) tree was constructed in MEGA11 with 1000 bootstrap replicates, whose values were used to estimate the relative support for each branch (Tamura et al., \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). The phylogenetic tree was visualized with Evolview-v2 (He et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2016\u003c/span\u003e) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://evolgenius.info//evolview-v2/\u003c/span\u003e\u003cspan address=\"https://evolgenius.info//evolview-v2/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Nonsynonymous (Ka) and synonymous (Ks) substitution rates were calculated using TBtools (C. Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
Divergence time was estimated as T\u0026thinsp;=\u0026thinsp;Ks/(2λ)\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;\u0026thinsp;6\u003c/sup\u003e million years ago (MYA), where λ\u0026thinsp;=\u0026thinsp;6.5 \u0026times; 10\u003csup\u003e\u0026minus;\u0026thinsp;9\u003c/sup\u003e is the rate of synonymous substitutions per site per year (Tamura et al., \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\n\u003ch3\u003eTranscriptome data analysis\u003c/h3\u003e\n\u003cp\u003eRNA-seq data were downloaded from NCBI (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/sra\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/sra\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). FastQC was used for quality control of the FASTQ reads, and the fastx_trimmer tool of the FASTX-Toolkit was then used to trim the quality-controlled reads to a consistent length (parameters -Q33 and -l 130). Next, STAR (Spliced Transcripts Alignment to a Reference) was used for index construction and read mapping, and RSEM (RNA-Seq by Expectation-Maximization) was used to calculate gene expression levels. The 3D column charts of gene expression profiles were drawn in Excel 2010.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eValidation of LMW-GS genes with structural variations in the CDS region\u003c/h2\u003e \u003cp\u003ePCR amplification and Sanger sequencing were used to validate the structural variations identified in the CDSs of the \u003cem\u003eA3\u003c/em\u003e and \u003cem\u003eB5\u003c/em\u003e genes in the genomes of different wheat cultivars and their closely related species. 
Each 50 \u0026micro;l PCR reaction contained 25 \u0026micro;l of Taq DNA polymerase master mix (21502-01, 2\u0026times; Magic Green Taq SuperMix, Shanghai Tolo Port Biotech), 2.5 \u0026micro;l each of the forward and reverse primers, 3 \u0026micro;l of DNA template, and 17 \u0026micro;l of sterile ultrapure water. The specific primers for the \u003cem\u003eA3\u003c/em\u003e and \u003cem\u003eB5\u003c/em\u003e genes are listed in Table \u003cspan refid=\"MOESM13\" class=\"InternalRef\"\u003eS13\u003c/span\u003e. Genomic DNA extracted from wheat flag leaves at the heading stage was used as the template. The PCR program was as follows: initial denaturation at 95\u0026deg;C for 3 minutes; 35 cycles of 95\u0026deg;C for 10 seconds, 65\u0026deg;C for 10 seconds, and 72\u0026deg;C for 1 minute; and a final extension at 72\u0026deg;C for 5 minutes. After agarose gel electrophoresis, the target PCR products were excised from the gel, purified, and Sanger-sequenced by Sangon Biotech (Shanghai).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eRNA extraction and quantitative real-time PCR analysis\u003c/h3\u003e\n\u003cp\u003eDeveloping seed samples were collected from the Yuandi experimental field in Xinxiang, Henan, under consistent experimental conditions and immediately frozen in liquid nitrogen until use. Total RNA was extracted from developing wheat seeds at different days after anthesis (DAA) using Trizol (ET121, TransZol Plant, TransGen Biotech), and first-strand cDNA was synthesized with a reverse transcription kit (R122-01, HiScript Q RT SuperMix, Vazyme Biotech). A SYBR Green detection kit (22208, 2xQ5, Shanghai Tolo Port Biotech) was used to perform quantitative reverse transcription-PCR (qRT-PCR) assays on a real-time PCR system (QuantStudio 6 Flex, Applied Biosystems, USA). 
qPCR was performed in a 20 \u0026micro;L reaction volume using a standard two-step program of 40 cycles (95\u0026deg;C for 10 seconds, 60\u0026deg;C for 30 seconds). The melting curve was obtained using the instrument\u0026rsquo;s default program. The qRT-PCR primers were designed using AlleleID 6.01, and their sequences for the selected genes are listed in Table \u003cspan refid=\"MOESM13\" class=\"InternalRef\"\u003eS13\u003c/span\u003e. Primer specificity was checked using the WheatOmics website (Ma et al., \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). The previously reported \u003cem\u003eTaTubulin\u003c/em\u003e gene was used as the housekeeping gene for normalizing the expression data (Xiang et al., \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). Relative gene expression was calculated with the 2\u003csup\u003e\u0026minus;ΔCt\u003c/sup\u003e method (Livak and Schmittgen, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2001\u003c/span\u003e).\u003c/p\u003e\n\u003ch3\u003eStatistical analysis and sequence alignment\u003c/h3\u003e\n\u003cp\u003eTwo statistical methods were used for the experimental data analysis. For comparisons between two groups, a two-sided Student\u0026rsquo;s t-test was used (*, p\u0026thinsp;\u0026lt;\u0026thinsp;0.05; **, p\u0026thinsp;\u0026lt;\u0026thinsp;0.01; ***, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). For multiple comparisons, ANOVA followed by Duncan\u0026rsquo;s multiple range test was used. The agricolae package in R, Microsoft Excel 2010, GraphPad Prism 8.0, and IBM SPSS Statistics 25 were used for data analysis (Clarke et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Repeat sequences were annotated with RepeatMasker v4.1.4. Pairwise sequence alignments were performed mainly with Lasergene (version 7.1.0). 
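The 2^−ΔCt normalization against the TaTubulin reference gene and the asterisk thresholds for the two-sided t-test can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, the Ct values in the example are invented, and the "ns" label for non-significant results is our own assumption. The p-values themselves would come from the statistical software listed above.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-ΔCt, where ΔCt = Ct(target gene) - Ct(reference gene, e.g. TaTubulin)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** -delta_ct


def significance_stars(p: float) -> str:
    """Asterisk notation for a two-sided t-test p-value ('ns' is an assumed label)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Invented Ct values: the target crosses threshold 2 cycles after TaTubulin
print(relative_expression(22.0, 20.0))  # 0.25
print(significance_stars(0.004))        # **
```

Each cycle of difference corresponds to a twofold change, which is why a ΔCt of 2 gives one quarter of the reference level; the method assumes near-100% amplification efficiency for both genes.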
Multiple protein sequence alignments were performed using Clustal Omega (Madeira et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ebi.ac.uk/jdispatcher/msa/clustalo\u003c/span\u003e\u003cspan address=\"https://www.ebi.ac.uk/jdispatcher/msa/clustalo\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), and the i-, m-, and s-type LMW-GS were then analyzed in Jalview (version 2.11.3.3). NCBI-BLASTN refers to the use of BLASTN on the NCBI website (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://blast.ncbi.nlm.nih.gov/Blast.cgi\u003c/span\u003e\u003cspan address=\"https://blast.ncbi.nlm.nih.gov/Blast.cgi\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) for online alignment and sequence retrieval; local BLASTN searches used NCBI-blast-2.14.0+ (Zhang et al., \u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e2000\u003c/span\u003e;Morgulis et al., \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2008\u003c/span\u003e).\u003c/p\u003e"},{"header":"Results","content":"\u003ch2\u003eRe-identification of LMW-GS genes in grass genomes is necessary and feasible\u003c/h2\u003e\n\u003cp\u003eLMWgsFinder, developed in this study on the basis of LMW-GS coding region characteristics (Fig. 1A), identifies LMW-GS genes in the genomes of wheat and its related species across the grass family more effectively and precisely than previous annotations (for more details, please refer to the supplementary information). Such a specialized annotation tool is crucial for comparative sequence analysis and further functional research on these genes. In total, this study identified 291 LMW-GS genes with LMWgsFinder across 20 different wheat varieties and related species (Tables 1, 2, and S10). 
However, no LMW-GS genes were found in the genomes of the remaining 6 species (\u003cem\u003eBrachypodium distachyon\u003c/em\u003e, barley, rice, \u003cem\u003eSetaria italica\u003c/em\u003e, sorghum, and corn). Transcriptome data analysis showed that the genes identified by LMWgsFinder, and their expression profiles, were mostly consistent with the TPM and FPKM analysis results (Fig. 2). In contrast, BLASTN searches of these 20 genomes using reference genes (i.e., the LMW-GS genes of CS and xy54 [Xiaoyan54]) as queries detected only 132 genes, approximately 45.36% (132/291) of the number identified by LMWgsFinder (Table 3). Excluding TaCS, only 98 genes (38.13%, 98/257) were detected. This indicates that re-identifying LMW-GS genes with LMWgsFinder in previously published genome data of wheat and related species is both necessary and feasible.\u003c/p\u003e\n\u003ch2\u003eComparative sequence analysis of LMW-GS genes in 14 hexaploid wheat cultivars\u003c/h2\u003e\n\u003cp\u003eRapid advances in DNA sequencing and large-genome assembly, together with falling sequencing costs, have enabled international efforts to sequence and assemble the genomes of nearly 30 wheat varieties (Walkowiak et al., 2020;Sato et al., 2021;Shi et al., 2022;Jia et al., 2023). These include the 10+ Wheat Genomes Project (https://10wheatgenomes.com/) (Walkowiak et al., 2020), the sequencing of \u003cem\u003eTriticum spelta\u003c/em\u003e, and the genomes of notable wheat cultivars such as Aikang58 (Jia et al., 2023), Kenong9204 (Shi et al., 2022), and Fielder (Sato et al., 2021). To evaluate the performance of LMWgsFinder in identifying LMW-GS genes across different wheat cultivars, we applied it to 14 hexaploid wheat genomes assembled at the chromosome level. 
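The structural classes used in Table 1 are defined by simple properties of each CDS: a gap-type gene (Gap) contains unresolved nucleotides (N), a pseudogene (Pse) contains a premature stop codon, an incomplete gene (Inc) lacks a terminal stop codon, and a normal gene (Nor) has an intact coding region. The Python sketch below approximates these checks. It is our own simplification, not LMWgsFinder's implementation; in particular, the order in which the checks are applied and the handling of CDSs whose length is not a multiple of three are assumptions, and the function name `classify_cds` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(cds: str) -> str:
    """Assign a CDS to Gap, Pse, Inc, or Nor using simplified, assumed rules."""
    seq = cds.upper()
    if "N" in seq:
        return "Gap"  # unresolved nucleotides in the assembly
    # Read complete codons in frame 1; trailing partial codons are ignored
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "Pse"  # in-frame stop codon before the last codon
    if not codons or codons[-1] not in STOP_CODONS:
        return "Inc"  # truncated: no terminal stop codon
    return "Nor"


for cds in ["ATGCAACAATAA", "ATGTAACAATAA", "ATGCAACAACAA", "ATGNNNCAATAA"]:
    print(cds, classify_cds(cds))
# ATGCAACAATAA Nor
# ATGTAACAATAA Pse
# ATGCAACAACAA Inc
# ATGNNNCAATAA Gap
```

Checking for N first mirrors the idea that a gap makes the remaining checks unreliable; a real pipeline would also verify the start codon and reading frame.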
Systematic identification and comparative sequence analysis of LMW-GS genes yielded the following results. A total of 234 LMW-GS genes were identified across the 14 wheat cultivars, comprising 38 gap-type genes (Gap; defined by unresolved nucleotides [N] in their CDSs), 75 pseudogenes (Pse; containing premature stop codons), 4 incomplete genes (Inc; truncated coding regions without stop codons), and 117 normal genes (Nor; intact coding regions with functional protein-coding capacity) (Table 1). Structurally normal LMW-GS genes formed the largest class in these genomes, accounting for 50.00% (117/234), and pseudogenes with one or more premature stop codons accounted for 32.05% (75/234). Notably, 16.24% (38/234) of the LMW-GS genes contained gaps; in the TaJagg genome, for instance, 8 LMW-GS genes contained gaps, more than half of the genes identified in that genome. In addition, the cultivars TaAK58 and TaKN9204 carry the 1BL/1RS translocation and therefore lack the 1BS chromosome arm; consequently, no LMW-GS genes were detected in their B subgenomes (Table 1).\u003c/p\u003e\n\u003cp\u003eOverall, the analysis of the 14 wheat cultivars revealed the following patterns: (1) Distribution by subgenome: the A and B subgenomes contain more LMW-GS genes with gaps, while the D subgenome has the most structurally normal LMW-GS genes (Table 2). (2) Functionally expressed genes: six LMW-GS genes, \u003cem\u003eA5, B3, D2, D6, D7\u003c/em\u003e, and \u003cem\u003eD9\u003c/em\u003e, were structurally normal and capable of expression in all wheat genomes analyzed (Table 1, Figs. 3 and 5). (3) Pseudogenes: five LMW-GS genes were identified as pseudogenes (either not expressed or expressed at low levels). 
These are \u003cem\u003eA4\u003c/em\u003e (except in TaKari, where it is an incomplete gene), \u003cem\u003eB1\u003c/em\u003e (except for \u003cem\u003eTaJuli_B1\u003c/em\u003e, which has a gap), \u003cem\u003eB5\u003c/em\u003e (except for \u003cem\u003eTaJagg_B5\u003c/em\u003e, which has a gap), \u003cem\u003eD4\u003c/em\u003e, and \u003cem\u003eD5\u003c/em\u003e (Table 1 and Fig. 3). (4) Special case of the \u003cem\u003eA3\u003c/em\u003e gene: see the section \u0026ldquo;Rare structural variations in the CDS of LMW-GS genes\u0026rdquo;. (5) New genes: four new types of LMW-GS genes (\u003cem\u003eA2.A1, A2.A6, A5.B4\u003c/em\u003e, and \u003cem\u003eD5.B4\u003c/em\u003e) were identified across the 14 wheat cultivars, comprising 12 new LMW-GS genes in total. Of these, two (\u003cem\u003eTaJagg_A5.B4\u003c/em\u003e and \u003cem\u003eTaKari_D5.B4\u003c/em\u003e) are structurally normal. qRT-PCR confirmed that the \u003cem\u003eD5.B4\u003c/em\u003e gene is expressed normally in wheat varieties such as Shanyou 225 (SY225), Zhengmai 9023 (ZM9023), Fielder, and Shijiazhuang 8 (SJZ8) (Figs. 5C, F, I, and M), while the \u003cem\u003eA5.B4\u003c/em\u003e gene is expressed normally in SY225 and SJZ8 (Figs. 5C and M). These findings suggest that homologous LMW-GS genes, within or between wheat subgenomes, may give rise to new functional LMW-GS genes through mechanisms such as homologous recombination, gene duplication, and gene conversion (Wang et al., 2011;Qin et al., 2015;Hu et al., 2020). Further research is needed to elucidate the underlying molecular mechanisms.\u003c/p\u003e\n\u003cp\u003eNotably, this study identified a unique case among the 17 D subgenomes: only the TaMace genome contains both the \u003cem\u003eD1\u003c/em\u003e and \u003cem\u003eD10\u003c/em\u003e genes. 
These two genes show very high DNA and protein sequence similarity, differing by only 8 single nucleotide polymorphisms (SNPs) (Fig. S18A) and 7 amino acids (one of the 8 SNPs is synonymous). This suggests that they are likely different alleles of the same locus in different wheat varieties (Table S1). The detection of two alleles of the same locus within one genome indicates that the TaMace accession used for genome sequencing might be heterozygous at the \u003cem\u003eD1/D10\u003c/em\u003e locus. A similar phenomenon was observed in the expression analysis of LMW-GS genes (Figs. 5A\u0026ndash;I and L\u0026ndash;M). In addition, our research team previously completed PacBio sequencing of LMW-GS genes in the genomes of 57 wheat varieties with varying gluten strength. That analysis revealed that the \u003cem\u003eD1/D10\u003c/em\u003e and \u003cem\u003eD2/D9\u003c/em\u003e loci may be heterozygous in several wheat cultivars, including Xinmai 26 (XM26), Zhengmai 366 (ZM366), ZM9023, and SJZ8. Moreover, there is a clear trend for higher-quality wheat cultivars to show a greater probability of heterozygosity at these loci; experimental validation of these findings is ongoing.\u003c/p\u003e\n\u003ch2\u003eThe duplication of LMW-GS genes in hexaploid wheat genomes\u003c/h2\u003e\n\u003cp\u003eGene duplication is a common phenomenon in biological evolution and an important source of diversity in biological traits (Sanchez-Munoz, 2024;Thomas et al., 2024). In this study, we identified six LMW-GS genes (\u003cem\u003eA3, B2, B5, B6, D6\u003c/em\u003e, and \u003cem\u003eD8\u003c/em\u003e) that may have undergone gene duplication, possibly mediated by EnSpm-like transposons, in the genomes of seven hexaploid wheat cultivars (TaAK58, TaFiel, TaJagg, TaKari, TaLanc, TaNori, and Ts) (Figs. 1B, C, and D; Tables 1 and S3). 
For instance, \u003cem\u003eTaFiel_B2-I\u003c/em\u003e and \u003cem\u003eTaFiel_B2-II\u003c/em\u003e are located approximately 36.2 kb apart on chromosome 1B (Table S3) and share 99.9% coding sequence similarity, differing by only one SNP. Similar cases include \u003cem\u003eTaKari_B6-I\u003c/em\u003e vs. \u003cem\u003eTaKari_B6-II\u003c/em\u003e (~20.7 kb, 99.3%), \u003cem\u003eTaKari_A3-II\u003c/em\u003e vs. \u003cem\u003eTaKari_A3-III\u003c/em\u003e (~38.5 kb, 100%), and \u003cem\u003eTaLanc_B2-I\u003c/em\u003e vs. \u003cem\u003eTaLanc_B2-II\u003c/em\u003e (~13.0 kb, 99.8%) (Table S3). For the \u003cem\u003eA3, B2,\u003c/em\u003e and \u003cem\u003eB5\u003c/em\u003e genes in the CS genome, alignments with the BioNano genome map confirmed that these genes are well assembled (Fig. S19). Furthermore, alignment of the \u003cem\u003eB2\u003c/em\u003e-containing genomic region of the Fielder genome with the CS BioNano map roughly indicates that \u003cem\u003eB2\u003c/em\u003e has been duplicated in this genome (Fig. S19B). Dot plot analyses also revealed that the region harboring the \u003cem\u003eB2\u003c/em\u003e gene in the TaFiel genome has undergone segmental duplication (Fig. S20). In addition, in the TaAK58, TaLanc, and TaNori assemblies, we identified four LMW-GS genes on contigs or scaffolds that were not assembled onto chromosomes but whose coding sequences are identical to those of the corresponding chromosome-assembled genes. These genes are \u003cem\u003eTaAK58_D8-I, TaLanc_D6-I, TaNori_B6-I,\u003c/em\u003e and \u003cem\u003eTaNori_B5-I\u003c/em\u003e (Table S3). 
These genes may not have been anchored to their chromosomes because of the complex sequence composition of these genomic regions, or because the DNA segments carrying these loci were heterozygous in the sequenced materials (Huo et al., 2018). Nonetheless, these genes are genuinely present in the assemblies.\u003c/p\u003e\n\u003cp\u003eIn five materials (TaAK58, TaArin, TaKN9204, TaMatt, and TaNori), the \u003cem\u003eA6\u003c/em\u003e gene shows a distinctive pattern. Whether or not their coding regions contain gaps, the distance between \u003cem\u003eA6-I\u003c/em\u003e and \u003cem\u003eA6-II\u003c/em\u003e on the assembled chromosomes is nearly constant across these genomes, at approximately 71.1 kb. In the two materials whose \u003cem\u003eA6\u003c/em\u003e copies are gap-free (TaAK58 and TaKN9204), the coding-sequence similarity between the two copies is also comparable. It is 84.9% between \u003cem\u003eTaAK58_A6-I\u003c/em\u003e and \u003cem\u003eTaAK58_A6-II\u003c/em\u003e, and 84.4% between \u003cem\u003eTaKN9204_A6-I\u003c/em\u003e and \u003cem\u003eTaKN9204_A6-II\u003c/em\u003e (Table S3 and Fig. S6). Notably, \u003cem\u003eTaKN9204_A6-II\u003c/em\u003e is a structurally normal gene with expression potential, whereas \u003cem\u003eTaKN9204_A6-I\u003c/em\u003e is a pseudogene. In xy54, the \u003cem\u003eA3-2xy54\u003c/em\u003e (structurally normal) and \u003cem\u003eA3-3xy54\u003c/em\u003e (pseudogene) sequences are highly similar (98.5%). Because their Begin_90bp and End_90bp sequences are identical, both genes are numbered \u003cem\u003eA6\u003c/em\u003e in this study. Accordingly, the \u003cem\u003eA6-I\u003c/em\u003e and \u003cem\u003eA6-II\u003c/em\u003e genes found at the corresponding positions on the same chromosome in these materials should be regarded as two distinct LMW-GS genes. 
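Gene numbers such as A6, A2.A1, or B1B3B6.D2 in this study are tied to the Begin_90bp and End_90bp sequences. The sketch below is a hypothetical reconstruction of that numbering logic, not LMWgsFinder's code. It assumes three rules: each end of a query CDS receives the reference number(s) with the highest identity over a 90 bp window; tied best matches are concatenated; and differing begin and end labels are joined with a dot. The 85% cutoff and the reference windows are illustrative assumptions.

```python
def window_identity(a, b):
    """Percent identity of two equal-length, ungapped windows."""
    if not a or len(a) != len(b):
        raise ValueError("windows must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_label(window, ref_windows, min_identity):
    """Reference number(s) with the highest identity to `window`.

    Tied best matches are concatenated in reference order (an assumed
    convention, e.g. "B1B3B6"); "?" means nothing reached min_identity.
    """
    scores = {num: window_identity(window, ref) for num, ref in ref_windows.items()}
    top = max(scores.values())
    if top < min_identity:
        return "?"
    return "".join(num for num, s in scores.items() if s == top)


def assign_number(cds, references, k=90, min_identity=85.0):
    """Hypothetical numbering from the first and last k bp of a CDS.

    references maps a gene number to its (begin_k, end_k) reference windows.
    """
    begin = best_label(cds[:k], {n: w[0] for n, w in references.items()}, min_identity)
    end = best_label(cds[-k:], {n: w[1] for n, w in references.items()}, min_identity)
    return begin if begin == end else f"{begin}.{end}"


if __name__ == "__main__":
    # Toy 6 bp windows instead of 90 bp, for readability.
    refs = {"A5": ("AAAAAA", "CCCCCC"), "B4": ("GGGGGG", "TTTTTT")}
    print(assign_number("AAAAAAGGGTTTTTT", refs, k=6))
```

In this scheme, 6 mismatches in a 90 bp window give 93.33% identity and 10 mismatches give 88.89%. These match the values reported for the A5.B4 gene in TaJagg.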
The structurally normal \u003cem\u003eTaKN9204_A6-II\u003c/em\u003e is a direct ortholog of \u003cem\u003eA3-2xy54\u003c/em\u003e in xy54 (100%) (Fig. S8B), whereas the structurally abnormal \u003cem\u003eTaKN9204_A6-I\u003c/em\u003e is a direct ortholog of \u003cem\u003eA3-3xy54\u003c/em\u003e (85.6%) (Fig. S8A).\u003c/p\u003e\n\u003cp\u003eSimilarly, in TaJagg, TaKari, TaNori, and Ts, approximately half of the \u003cem\u003eB6\u003c/em\u003e copies contain gaps in their coding regions. Even so, the two chromosomal \u003cem\u003eB6\u003c/em\u003e copies lie at a nearly constant distance of approximately 20.7 kb in all four genomes (Table S3). In the TaKari genome, both \u003cem\u003eB6\u003c/em\u003e copies (\u003cem\u003eTaKari_B6-I\u003c/em\u003e vs. \u003cem\u003eTaKari_B6-II\u003c/em\u003e, 99.3%) are normal, gap-free LMW-GS genes. Based on the Ks value of this gene pair (0.01169), the \u003cem\u003eB6\u003c/em\u003e duplication is estimated to have occurred around 0.899 MYA. To roughly date the \u003cem\u003eB6\u003c/em\u003e duplication in the Ts genome, the gaps in the coding region of \u003cem\u003eB6-I\u003c/em\u003e were removed together with the corresponding bases in \u003cem\u003eB6-II\u003c/em\u003e (Fig. S9). The resulting Ks for \u003cem\u003eTs_B6-I\u003c/em\u003e vs. \u003cem\u003eTs_B6-II\u003c/em\u003e is approximately 0.00466, corresponding to a divergence time of ~0.358 MYA. 
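The gap-removal step used for the Ts B6 pair (Fig. S9) can be sketched as follows. This version makes two assumptions: the two CDSs are already aligned codon by codon, and gaps are written as "-" or runs of "N". It drops whole codons rather than single bases, so the reading frame that Ks estimation depends on stays intact. That codon-level choice is an assumption and not necessarily the exact procedure used in the study.

```python
GAP_CHARS = frozenset("-Nn")


def strip_gap_codons(seq_a, seq_b, gap_chars=GAP_CHARS):
    """Remove every codon in which either aligned CDS contains a gap or N.

    Both inputs must be the same length and codon-aligned. Whole codons are
    dropped so that the remaining sequences stay in frame for Ks estimation.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected equal-length, codon-aligned sequences")
    kept_a, kept_b = [], []
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if any(ch in gap_chars for ch in codon_a + codon_b):
            continue  # skip this codon in both sequences
        kept_a.append(codon_a)
        kept_b.append(codon_b)
    return "".join(kept_a), "".join(kept_b)


if __name__ == "__main__":
    # Toy alignment: the second codon of copy I is an assembly gap (NNN).
    print(strip_gap_codons("ATGNNNGCTAAA", "ATGCAGGCTAAG"))
```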
The \u003cem\u003eB2\u003c/em\u003e duplications detected in the TaFiel and TaLanc genomes date to around 0.325 MYA, with Ks values of ~0.00422 for both \u003cem\u003eTaFiel_B2-I\u003c/em\u003e vs. \u003cem\u003eTaFiel_B2-II\u003c/em\u003e and \u003cem\u003eTaLanc_B2-I\u003c/em\u003e vs. \u003cem\u003eTaLanc_B2-II\u003c/em\u003e. These \u003cem\u003eB2\u003c/em\u003e duplications and the \u003cem\u003eB6\u003c/em\u003e duplication in Ts (~0.358 MYA) postdate the estimated divergence between common wheat and Td (approximately 0.432 MYA) (Chen et al., 2013). This indicates that they arose independently in hexaploid wheat after its divergence from tetraploid wheat. It is also consistent with previous studies supporting the view that \u003cem\u003eTriticum spelta\u003c/em\u003e is a derivative of common wheat rather than its ancestor (Blatter et al., 2004; Wang et al., 2024). Furthermore, the \u003cem\u003eA3-II\u003c/em\u003e and \u003cem\u003eA3-III\u003c/em\u003e genes in TaKari, as well as the \u003cem\u003eD6-I\u003c/em\u003e and \u003cem\u003eD6-II\u003c/em\u003e genes in TaLanc, have identical coding sequences (Table S3), indicating very recent duplication. This suggests that LMW-GS gene duplication in hexaploid wheat has continued throughout the evolution and propagation of common wheat.\u003c/p\u003e\n\u003cp\u003eIn TaKari, three copies of the \u003cem\u003eA3\u003c/em\u003e gene were found. Besides \u003cem\u003eTaKari_A3-I\u003c/em\u003e, which is a pseudogene, there are two structurally normal copies (\u003cem\u003eTaKari_A3-II\u003c/em\u003e and \u003cem\u003eTaKari_A3-III\u003c/em\u003e) located approximately 38.5 kb apart with 100% similarity (Table S3, Figs. S7 and S13). 
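The divergence times quoted above follow the usual molecular-clock relation T = Ks / (2r). The rate r is not stated at this point in the text. A value of r = 6.5 × 10⁻⁹ synonymous substitutions per site per year, a rate often used for grasses, reproduces all three reported times, so this sketch assumes it.

```python
# Assumed rate (synonymous substitutions per site per year). It reproduces
# the divergence times reported in the text but is not stated there.
SUBSTITUTION_RATE = 6.5e-9
TD_SPLIT_MYA = 0.432  # common wheat vs. Td divergence (Chen et al., 2013)


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Molecular-clock estimate T = Ks / (2r), returned in million years."""
    return ks / (2.0 * rate) / 1e6


if __name__ == "__main__":
    ks_values = {
        "TaKari_B6-I vs. TaKari_B6-II": 0.01169,
        "Ts_B6-I vs. Ts_B6-II": 0.00466,
        "TaFiel/TaLanc B2 pairs": 0.00422,
    }
    for pair, ks in ks_values.items():
        t = divergence_time_mya(ks)
        print(f"{pair}: T = {t:.3f} MYA, after Td split: {t < TD_SPLIT_MYA}")
```

With these inputs the script prints 0.899, 0.358, and 0.325 MYA. It reports T below 0.432 MYA for the Ts B6 and B2 pairs but not for the TaKari B6 pair.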
In contrast, the homologous \u003cem\u003eA3\u003c/em\u003e gene in CS is a pseudogene, whereas both tetraploid wheat varieties used in this study have normal \u003cem\u003eA3\u003c/em\u003e coding regions (Table 1). This suggests that \u003cem\u003eA3\u003c/em\u003e underwent multiple duplications in the TaKari genome, after which one copy (\u003cem\u003eA3-I\u003c/em\u003e) may have lost function through inactivation. Alternatively, this pattern could result from gene conversion involving donor material that carried the pseudogene, or from an assembly error at the \u003cem\u003eA3\u003c/em\u003e locus. Similarly, multiple duplications of the \u003cem\u003eB6\u003c/em\u003e gene were identified in the TaNori genome (Table 1). Using LMWgsFinder, we detected multiple copies of the same LMW-GS gene in different wheat varieties sequenced and assembled by different research groups. This suggests that repeated duplication of the same LMW-GS gene may be common in wheat genomes. To our knowledge, however, this has not been reported before. One likely reason is that few wheat genomes had been fully sequenced until recently; another is that the complexity of this gene family, combined with the lack of an effective identification method, has hindered such analyses.\u003c/p\u003e\n\u003ch2\u003eRare structural variations in the CDS of LMW-GS genes\u003c/h2\u003e\n\u003cp\u003eSeveral rare structural variations associated with the loss or gain of LMW-GS gene function were identified in this study. Among the 16 hexaploid wheat genomes, the structure of the \u003cem\u003eA3\u003c/em\u003e gene varies. It is structurally normal in the TaAK58 genome (Table 1 and Figs. 4A-C), as are the second and third copies arising from the \u003cem\u003eA3\u003c/em\u003e triplication in the TaKari genome (Fig. S13A, Tables 1 and S3). In most other hexaploid wheat genomes, however, \u003cem\u003eA3\u003c/em\u003e is a pseudogene. This indicates that the gene has lost function in the majority of hexaploid wheat varieties while retaining normal structure and expression in a few (Table 1 and Fig. 5J).\u003c/p\u003e\n\u003cp\u003eIn a previous study, we mined the Wheat Union Database (Wang et al., 2020) (http://wheat.cau.edu.cn/WheatUnion/) and found that only two of 145 MP-numbered materials (labeled \u0026ldquo;MP\u0026rdquo; followed by a number) had normal \u003cem\u003eB5\u003c/em\u003e coding regions: MP39 (Zijiehong) and MP116 (Chuanmai42). In the other 308 materials, all \u003cem\u003eB5\u003c/em\u003e genes were pseudogenes, giving a functional gene ratio of only 0.44% (2/453). In this study, we used experimental evidence to confirm that \u003cem\u003eTaAK58_A3\u003c/em\u003e is a structurally and functionally normal LMW-GS gene (Figs. 4A-C and 5J). To verify our earlier identification of \u003cem\u003eB5\u003c/em\u003e, we also validated its gene structure and expression. As expected, \u003cem\u003eB5\u003c/em\u003e is a functional LMW-GS gene in only a few materials (Figs. 4D-G and 5K) and is a pseudogene in most other wheat genomes (Huo et al., 2018). Moreover, \u003cem\u003eB5\u003c/em\u003e is highly expressed in both MP39 and MP116, suggesting that it may have good application potential.\u003c/p\u003e\n\u003ch2\u003eComparative sequence analysis of LMW-GS genes in four wheat-related species\u003c/h2\u003e\n\u003cp\u003eLMWgsFinder is also effective in predicting LMW-GS genes in the genomes of diploid and tetraploid wheat relatives. Dong et al. 
(2016) characterized the LMW-GS regions of the Aet genome in detail using BAC library screening, Roche/454 Titanium (GS FLX+) sequencing, and BioNano optical genome maps, and identified five LMW-GS genes, all with normal coding regions. In this study (Tables 1 and S5), LMWgsFinder also annotated five LMW-GS genes in the Aet genome, four of which (\u003cem\u003eD9, D3, D6,\u003c/em\u003e and \u003cem\u003eD7.D11\u003c/em\u003e) agree with Dong et al.\u0026rsquo;s results. The fifth, \u003cem\u003eD10\u003c/em\u003e, shares 99.81% similarity with \u003cem\u003eAetG53-LMW1\u003c/em\u003e from Dong et al. (2016). However, it carries two additional bases (AG) after position 295 of \u003cem\u003eAet_1DgD10p5071820a1067bp\u003c/em\u003e. This 2-bp insertion shifts the reading frame and introduces a premature stop codon, making the gene a pseudogene (Figs. S10A-B). Comparison of the CDSs also shows that the Aet_v4.0 annotation of LMW-GS genes is incomplete, as \u003cem\u003eD10\u003c/em\u003e is not annotated (Table S5). Nevertheless, RNA-seq data show that \u003cem\u003eD10\u003c/em\u003e (\u003cem\u003eAET1Gv20018400\u003c/em\u003e) is expressed in Aet (Fig. S10C). These results indicate that LMWgsFinder can effectively annotate LMW-GS genes in the Aet genome, whether or not their structure is intact.\u003c/p\u003e\n\u003cp\u003eLMWgsFinder identified two structurally normal LMW-GS genes in the Tu genome (\u003cem\u003eA2.A1\u003c/em\u003e and \u003cem\u003eA2.A6\u003c/em\u003e), each present as duplicated copies (Table S3). Both pairs (\u003cem\u003eTu_A2.A1-I\u003c/em\u003e vs. \u003cem\u003eTu_A2.A1-II\u003c/em\u003e and \u003cem\u003eTu_A2.A6-I\u003c/em\u003e vs. \u003cem\u003eTu_A2.A6-II\u003c/em\u003e) share 100% sequence similarity. 
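The effect of the AG insertion in Aet D10 can be illustrated with a small check for in-frame stop codons. This is a minimal sketch on a toy sequence, not the real D10 sequence. Any insertion whose length is not a multiple of three shifts the downstream reading frame, and the new frame may expose a stop codon before the end of the gene.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def first_stop_codon(cds):
    """0-based codon index of the first in-frame stop codon, or None."""
    seq = cds.upper()
    for idx in range(len(seq) // 3):
        if seq[3 * idx:3 * idx + 3] in STOP_CODONS:
            return idx
    return None


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the last complete codon."""
    stop = first_stop_codon(cds)
    return stop is not None and stop < len(cds) // 3 - 1


if __name__ == "__main__":
    wild_type = "ATGTCAAACCCGTAA"                  # toy CDS: M S N P *
    mutant = wild_type[:4] + "AG" + wild_type[4:]  # 2-bp insertion after base 4
    print(first_stop_codon(wild_type), has_premature_stop(wild_type))  # 4 False
    print(first_stop_codon(mutant), has_premature_stop(mutant))        # 1 True
```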
Notably, \u003cem\u003eTu_A2.A1-I\u003c/em\u003e was not anchored to a chromosome, whereas the two copies of \u003cem\u003eA2.A6\u003c/em\u003e are located on chromosome 1A (Fig. S13A), approximately 1.4 Mb apart. In the tetraploid TrTu genome, LMWgsFinder identified only five LMW-GS genes. In contrast, the tetraploid Td genome contained eight, most of which were new genes (62.5%, 5/8) (Table 1). These observations suggest that gene duplication and the formation of new LMW-GS genes are ongoing in both hexaploid and tetraploid wheat genomes. They also highlight the dynamic evolution of LMW-GS genes across wheat species and ploidy levels (Zhang et al., 2023).\u003c/p\u003e\n\u003cp\u003eIn total, LMWgsFinder identified 23 LMW-GS genes in the diploid (Aet, Tu) and tetraploid (TrTu, Td) wheat relatives, of which 60.87% (14/23) were structurally normal (Table 1). In hexaploid wheat, the proportion of structurally normal LMW-GS genes is lower, at 50.00% (117/234), while the proportion of pseudogenes rises from 21.74% (5/23) in the relatives to 32.05% (75/234). This suggests that some redundant LMW-GS copies became pseudogenes during the evolution from diploid and tetraploid wheat to hexaploid wheat (Akhunov et al., 2013; Desjardins et al., 2020). The proportion of gap-containing LMW-GS genes in the four relatives (17.39%, 4/23) is similar to that in the 14 wheat varieties (16.24%, 38/234) (Table 1). 
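The proportions compared in this paragraph are simple ratios of the Table 1 counts. The sketch below recomputes them from the counts quoted in the text, rounding to two decimals as the text does.

```python
def percent(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100.0 * part / whole, 2)


# Counts quoted in the text (Table 1).
RELATIVES_TOTAL, HEXAPLOID_TOTAL = 23, 234
relatives = {"structurally normal": 14, "pseudogene": 5, "with gaps": 4}
hexaploid = {"structurally normal": 117, "pseudogene": 75, "with gaps": 38}

if __name__ == "__main__":
    for category in relatives:
        rel = percent(relatives[category], RELATIVES_TOTAL)
        hexa = percent(hexaploid[category], HEXAPLOID_TOTAL)
        print(f"{category}: {rel:.2f}% (relatives) vs. {hexa:.2f}% (hexaploid)")
```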
This suggests that gap-containing LMW-GS genes occur at a similar frequency in wheat relatives and in common wheat.\u003c/p\u003e\n\u003cp\u003eOf the 23 LMW-GS genes identified in the relatives, 11 (47.83%) were new genes with different Begin_90bp and End_90bp sequences (Table 1). By comparison, LMWgsFinder identified only 12 such new genes (5.13%, 12/234) in the 14 common wheat genomes. This marked difference suggests that LMW-GS genes diverged considerably during the evolution from wild diploid and tetraploid ancestors to hexaploid common wheat through hybridization and chromosome doubling (Li et al., 2010). As a result, the 5\u0026rsquo; and 3\u0026rsquo; ends that are conserved among LMW-GS genes in common wheat are less conserved in its diploid and tetraploid relatives. This is reflected in the higher proportion of new genes identified in the relatives.\u003c/p\u003e\n\u003cp\u003eThese findings may be limited by the scarcity of chromosome-level assemblies for diploid and tetraploid wheat species and by the incomplete assembly of some LMW-GS loci. Nevertheless, we consider the overall trends observed here to be reliable.\u003c/p\u003e\n\u003ch2\u003eTypes, cysteines, and disulfide bonds of LMW-GS\u003c/h2\u003e\n\u003cp\u003eIn total, 149 LMW-GS genes with normal open reading frames (ORFs) were identified across 20 genomes of wheat and its relatives. Their deduced proteins range from 285 to 393 amino acids in length (Table S7). 
Consistent with previous studies, their amino acid composition and structure show the typical Sig\u0026ndash;(N-ter)\u0026ndash;Rep\u0026ndash;C-ter organization (Renato and Stefania, 2004; Dong et al., 2010; Ren et al., 2022) (Fig. S11). Across the 20 genomes, m-type LMW-GS were the most frequent (107), followed by s-type (33) and i-type (9). All three types are highly conserved at both the N-terminal and C-terminal ends. The N-terminal end is characterized by motif2, whereas the C-terminal end comprises five motifs: motif4, motif3, motif5, motif8, and motif1 (Fig. S12). Each type contains 8 cysteines. In the i-type, all 8 cysteines lie in the C-ter region, whereas in the m- and s-types only 7 do. The position of the eighth cysteine is the most conserved across all three types, followed by that of the seventh (Table S7). Previous research indicates that in i-type LMW-GS the third and seventh cysteines can form intermolecular disulfide bonds, whereas in the s- and m-types the first and seventh cysteines are involved in intermolecular bonding (Du et al., 2020) (Fig. S11). The remaining cysteines typically form intramolecular disulfide bonds. The position of the first cysteine is relatively stable in s-type LMW-GS (residue 66), whereas in m-type LMW-GS it can occur at one of three positions: residue 25, 46, or 65 (Table S7). In amino acid composition, the m- and s-types are more similar to each other, especially in the C-ter region (Huang and Cloutier, 2008). However, both the N-ter and C-ter regions of the m-type are more diverse than those of the s-type, and the m-type can be further divided into more subtypes. 
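Statements such as "the first cysteine at residue 66" come down to listing cysteine positions in each deduced protein. A minimal sketch follows. The text here does not state whether positions are counted from the start of the precursor (including the signal peptide) or from the start of the mature protein. The numbering origin and the region boundary passed to `count_in_region` are therefore assumptions, and the toy sequence is invented.

```python
def cysteine_positions(protein):
    """1-based positions of cysteine (C) residues in a protein sequence."""
    return [pos for pos, aa in enumerate(protein.upper(), start=1) if aa == "C"]


def count_in_region(positions, start, end=None):
    """Count positions within [start, end] (1-based, inclusive).

    If end is None, the region runs to the end of the protein.
    """
    return sum(1 for p in positions if p >= start and (end is None or p <= end))


if __name__ == "__main__":
    toy = "MKTCAAQQPCQQC"  # invented sequence, not a real LMW-GS
    cys = cysteine_positions(toy)
    # Hypothetical region boundary: residue 8 onwards counted as "C-ter".
    print(cys, len(cys), count_in_region(cys, start=8))
```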
This greater diversity may be related to the ongoing expansion of m-type LMW-GS genes.\u003c/p\u003e\n\u003ch2\u003ePhylogenetic analysis of LMW-GS genes\u003c/h2\u003e\n\u003cp\u003eTo explore the phylogenetic relationships of LMW-GS genes, we constructed a phylogenetic tree of 160 protein sequences (149 identified in this study and 11 previously reported from the xy54 genome) (Fig. 6). The tree reveals a marked expansion of m-type LMW-GS genes. I-type genes occur only in the A subgenome (primarily \u003cem\u003eA2\u003c/em\u003e), while s-type genes are restricted to the B and D subgenomes (mainly \u003cem\u003eB2, B3, B6, D1/D10\u003c/em\u003e, etc.). In contrast, m-type genes occur in all three subgenomes (A, B, and D) and include \u003cem\u003eA3/A5, B4, D2/D9, D3, D6, D7/D11\u003c/em\u003e, and \u003cem\u003eD8\u003c/em\u003e. This is consistent with the results of Zhang et al. (2013). The D subgenome contributes the most m-type genes, accounting for 75.70% (81/107) (Fig. S13). This suggests that the expansion of m-type genes is closely associated with the rapid evolution of the \u003cem\u003eGlu3/Gli1\u003c/em\u003e loci in the D subgenome (Huo et al., 2018). Compared with the homologous \u003cem\u003eGlu3/Gli1\u003c/em\u003e region in \u003cem\u003eAegilops tauschii\u003c/em\u003e, the donor of the wheat D genome, the wheat D subgenome has gained three additional genes in this region: \u003cem\u003eD4, D5\u003c/em\u003e, and \u003cem\u003eD8\u003c/em\u003e, of which \u003cem\u003eD8\u003c/em\u003e is an m-type LMW-GS gene. This gain occurred within only about 8,000 years since the emergence of common wheat (Figs. S13C and S16). 
The homologous \u003cem\u003eGlu3/Gli1\u003c/em\u003e region of wheat contains multiple copies of LMW-GS and gliadin genes (\u0026gamma;, \u0026delta;, \u0026omega;), as well as multi-copy resistance genes such as \u003cem\u003eLr21\u003c/em\u003e and \u003cem\u003ePm3\u003c/em\u003e (Huo et al., 2018). These multi-copy genes may facilitate DNA recombination and gene conversion in this region, further accelerating its evolution (Li et al., 2008). The rapid evolution of this region may also be related to its location near the chromosome end, where recombination is frequent, although the underlying mechanisms require further research (Choulet et al., 2010; Akhunov et al., 2013; Dong et al., 2016).\u003c/p\u003e\n\u003ch2\u003eOrigin and evolutionary analysis of novel LMW-GS genes\u003c/h2\u003e\n\u003cp\u003eTo investigate the origin and evolutionary relationships of the newly identified \u003cem\u003eA2.A1\u003c/em\u003e genes, which appear frequently in this study, we constructed a phylogenetic tree of 23 LMW-GS gene sequences (Fig. S14A). Nine were selected from NCBI-BLASTN results and the rest were identified in this study. Five \u003cem\u003eA2.A1\u003c/em\u003e genes from hexaploid wheat (\u003cem\u003eTaAK58_A2.A1, TaArin_A2.A1, TaMatt_A2.A1, TaNori_A2.A1, TaKN9204_A2.A1\u003c/em\u003e) are closely related to the \u003cem\u003eTu-DQ857249.1\u003c/em\u003e gene of \u003cem\u003eTriticum urartu\u003c/em\u003e (U7). In contrast, four \u003cem\u003eA2.A1\u003c/em\u003e genes (\u003cem\u003eTaStan_A2.A1, TrTu_A2.A1, Tu_A2.A1-II, Tu_A2.A1-I\u003c/em\u003e) are closely related to the \u003cem\u003eTu-KM085254.1\u003c/em\u003e or \u003cem\u003eTu-KM085273.1\u003c/em\u003e genes of \u003cem\u003eTriticum urartu\u003c/em\u003e accessions (PI428255 or PI428200). This indicates that the nine \u003cem\u003eA2.A1\u003c/em\u003e genes identified here fall into two groups with different evolutionary origins. The first five form a haplotype with the \u003cem\u003eA7, A6-I,\u003c/em\u003e and \u003cem\u003eA6-II\u003c/em\u003e genes, whereas the last four form a haplotype with the \u003cem\u003eA2\u003c/em\u003e gene (Fig. S13).\u003c/p\u003e\n\u003cp\u003eThe evolutionary relationships of the \u003cem\u003eA2.A6\u003c/em\u003e genes are complex. We analysed 29 related sequences phylogenetically: 6 \u003cem\u003eA2.A6\u003c/em\u003e genes, 1 \u003cem\u003eA6.A2\u003c/em\u003e gene, 4 \u003cem\u003eA2\u003c/em\u003e genes, 5 each of \u003cem\u003eA6-I\u003c/em\u003e and \u003cem\u003eA6-II\u003c/em\u003e genes, and 8 LMW-GS genes selected from NCBI-BLASTN search results. The analysis (Fig. 
S14B) showed that four \u003cem\u003eA2.A6\u003c/em\u003e genes (\u003cem\u003eTd_A2.A6\u003c/em\u003e, \u003cem\u003eTaFiel_A2.A6\u003c/em\u003e, \u003cem\u003eTu_A2.A6-II\u003c/em\u003e, and \u003cem\u003eTu_A2.A6-I\u003c/em\u003e) are closely related to \u003cem\u003eA2\u003c/em\u003e in the Tu genome (\u003cem\u003eTu_A2\u003c/em\u003e). The other two (\u003cem\u003eTs_A2.A6\u003c/em\u003e and \u003cem\u003eTaLand_A2.A6\u003c/em\u003e) are relatively closely related to \u003cem\u003eA6-II\u003c/em\u003e genes, such as \u003cem\u003eTaArin_A6-II, TaMatt_A6-II\u003c/em\u003e, and \u003cem\u003eTaNori_A6-II\u003c/em\u003e. The \u003cem\u003eTd_A6.A2\u003c/em\u003e gene is closely related to the \u003cem\u003eTrTu_A2\u003c/em\u003e and \u003cem\u003eTaStan_A2\u003c/em\u003e genes and to the \u003cem\u003eA6-I\u003c/em\u003e genes of several genomes (TaArin, TaNori, TaMatt, etc.). This suggests that \u003cem\u003eTd_A6.A2\u003c/em\u003e shares an evolutionary origin with \u003cem\u003eA6-I\u003c/em\u003e and \u003cem\u003eA2\u003c/em\u003e in these materials. However, because some of these genes contain gaps, the inferred relationships may be inaccurate and should be regarded as tentative.\u003c/p\u003e\n\u003cp\u003eIn this study, a new LMW-GS gene, named \u003cem\u003eD5.B4\u003c/em\u003e, was identified in the B subgenomes of three materials: Td, TaKari, and TaJagg (Table 1 and Fig. S13B). NCBI-BLASTN searches showed that a sequence annotated as an LMW-GS gene in the \u003cem\u003eTriticum dicoccoides\u003c/em\u003e genome (XM_037590130.1) is highly similar to the \u003cem\u003eD5.B4\u003c/em\u003e genes from the three genomes. The similarities are 100%, 99.73%, and 99.74%, with query coverages of 100%, 100%, and 90%, respectively. 
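Query coverage, as quoted for the D5.B4 hits, is the percentage of the query covered by the union of its aligned segments (HSPs). The minimal sketch below follows the spirit of BLAST's per-subject query coverage and assumes 1-based inclusive query coordinates. It merges overlapping or adjacent HSPs before summing their lengths, so that shared regions are not counted twice.

```python
def query_coverage(hsps, query_len):
    """Percent of a query covered by the union of HSP intervals.

    hsps: iterable of (qstart, qend) pairs, 1-based and inclusive; either
    orientation is accepted.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_len


if __name__ == "__main__":
    # Two overlapping HSPs on a hypothetical 1,000 bp query cover bases 1-900.
    print(query_coverage([(1, 500), (400, 900)], 1000))
```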
Phylogenetic analysis indicates that the \u003cem\u003eD5.B4\u003c/em\u003e genes from these three materials are indeed closely related to LMW-GS genes in the B and D subgenomes of various wheat cultivars and their relatives (Fig. S14D).\u003c/p\u003e\n\u003cp\u003eThe \u003cem\u003eB1B3B6.D2\u003c/em\u003e gene on chromosome 1B of Td (\u003cem\u003eTd_B1B3B6.D2\u003c/em\u003e) is a pseudogene. NCBI-BLASTN searches showed that it shares very high sequence similarity (99.40%) with an LMW-GS gene (KT156624.1) from the \u003cem\u003eAegilops sharonensis\u003c/em\u003e genome, differing by only six SNPs (Table S9). It is also highly similar (97.99% to 98.8%) to related LMW-GS genes in the \u003cem\u003eTriticum monococcum\u003c/em\u003e and \u003cem\u003eTriticum urartu\u003c/em\u003e genomes: KR024657.1, KM010188.1, KR024658.1, KM085241.1, KM085237.1, and KM085243.1 (Table S9). Unexpectedly, its highest similarity to any \u003cem\u003eTriticum dicoccoides\u003c/em\u003e LMW-GS gene among the search results is only 88.05% (XM_037590130.1) (Table S9). This discrepancy may reflect the relatively small number of LMW-GS genes cloned and sequenced from \u003cem\u003eTriticum dicoccoides\u003c/em\u003e, which limits the relevant database entries. Alternatively, it may indicate that \u003cem\u003eB1B3B6.D2\u003c/em\u003e in the Td genome has a complex origin, possibly involving horizontal gene transfer or introgression from related species (Kumar et al., 2019; Xiang et al., 2019).\u003c/p\u003e\n\u003cp\u003eIn this study, a novel LMW-GS gene with a normal ORF (\u003cem\u003eTd_B5.D5D8\u003c/em\u003e) was identified on chromosome 1B of Td. 
NCBI-BLASTN results showed that a GenBank sequence from the \u003cem\u003eTriticum dicoccoides\u003c/em\u003e genome (accession XM_037591446.1) has 100% sequence identity and query coverage with \u003cem\u003eTd_B5.D5D8\u003c/em\u003e. However, this sequence is annotated as a gamma-gliadin B-I-like gene. A local BLASTN search of XM_037591446.1 against the gliadin genes of the CS and Aet genomes returned no matches. A further NCBI-BLASTN search with this sequence returned only LMW-GS genes apart from the sequence itself. These observations cast doubt on the GenBank annotation of XM_037591446.1, so it was re-annotated as an LMW-GS gene for the evolutionary analysis in this study. NCBI-BLASTN results for \u003cem\u003eTd_B5.D5D8\u003c/em\u003e also showed high DNA sequence similarity to LMW-GS genes from the wheat B subgenome and from several \u003cem\u003eAegilops\u003c/em\u003e species: \u003cem\u003eAegilops sharonensis\u003c/em\u003e, \u003cem\u003eAegilops speltoides\u003c/em\u003e, \u003cem\u003eAegilops tauschii\u003c/em\u003e, \u003cem\u003eAegilops longissima\u003c/em\u003e, and \u003cem\u003eAegilops bicornis\u003c/em\u003e. Phylogenetic analysis of the 12 NCBI-BLASTN hits with the highest bit scores (Fig. S14E) indicated that \u003cem\u003eTd_B5.D5D8\u003c/em\u003e is closely related in origin to LMW-GS genes in the \u003cem\u003eAegilops sharonensis\u003c/em\u003e and \u003cem\u003eAegilops speltoides\u003c/em\u003e genomes. It also showed that many of its orthologs in the wheat genome have become pseudogenes (Chen et al., 2008).\u003c/p\u003e\n\u003cp\u003eIn this study, a new functional LMW-GS gene (\u003cem\u003eA5.B4\u003c/em\u003e) was identified in the TaJagg genome. 
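Selecting the top N NCBI-BLASTN hits by bit score, as done for Td_B5.D5D8 and again for A5.B4, can be sketched from BLAST+ tabular output (`-outfmt 6`). The default twelve columns of that format are listed in `FIELDS` below. Before ranking, this sketch keeps only the best-scoring HSP per subject sequence, which is an assumption about how hits were counted. The example rows are invented.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tabular_text, n=12):
    """Best HSP per subject, returned for the n subjects with the highest bit scores."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        rec["pident"] = float(rec["pident"])
        rec["bitscore"] = float(rec["bitscore"])
        sid = rec["sseqid"]
        if sid not in best or rec["bitscore"] > best[sid]["bitscore"]:
            best[sid] = rec
    ranked = sorted(best.values(), key=lambda r: r["bitscore"], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    example = (
        "q\tsubjA\t100.00\t800\t0\t0\t1\t800\t1\t800\t0.0\t1478\n"
        "q\tsubjB\t95.00\t500\t25\t0\t1\t500\t1\t500\t1e-200\t700\n"
        "q\tsubjC\t98.00\t800\t16\t0\t1\t800\t1\t800\t0.0\t1300\n"
    )
    print([h["sseqid"] for h in top_hits(example, n=2)])
```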
Its Begin_90bp and End_90bp match those of the \u003cem\u003eA5\u003c/em\u003e (93.33% similarity, 6 base mismatches) and \u003cem\u003eB4\u003c/em\u003e (88.89% similarity, 10 base mismatches) genes, respectively, yet the gene was mapped to chromosome 1D. To verify this identification and localization, a phylogenetic analysis was performed using three sets of sequences: the top 20 NCBI-BLASTN hits (Table S8), the \u003cem\u003eA5\u003c/em\u003e and \u003cem\u003eB4\u003c/em\u003e genes identified in this study from various materials, and the \u003cem\u003eA3-1xy54\u003c/em\u003e (\u003cem\u003eA5\u003c/em\u003e) gene from xy54. The analysis showed that the \u003cem\u003eA5.B4\u003c/em\u003e gene from TaJagg (\u003cem\u003eTajagg_1DgA5.B4p3104157s897bp\u003c/em\u003e) is more closely related to LMW-GS genes in the wheat D subgenome (Fig. S14C). Its sequence similarity to the corresponding GenBank LMW-GS genes is 99.89% for common wheat (JF736507.1) and 87.73% for \u003cem\u003eAegilops tauschii\u003c/em\u003e (JX828349.1), both with 100% query coverage (Table S8). This confirms that \u003cem\u003eA5.B4\u003c/em\u003e in the TaJagg genome belongs to the LMW-GS genes on chromosome 1D. The NCBI-BLASTN results also suggest that, among the LMW-GS genes cloned to date, \u003cem\u003eA5.B4\u003c/em\u003e is rare in wheat (Table S8). However, it may be more frequent across wheat genomes than this suggests and simply not yet have been cloned or examined. The qRT-PCR results (Figs. 5C and M) showed that \u003cem\u003eA5.B4\u003c/em\u003e is expressed normally in some wheat varieties. 
Further investigation is therefore needed to assess the gene\u0026rsquo;s potential for practical application in production.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eComparison between LMWgsFinder and other methods for LMW-GS gene identification\u003c/h2\u003e \u003cp\u003eWhen no reference genome sequence is available, traditional methods mostly use conserved or specific primers for PCR amplification, combined with capillary electrophoresis or Sanger sequencing, to identify LMW-GS genes in different wheat varieties and their relatives (Zhao et al., \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). With PCR amplification followed by Sanger sequencing, the complex sequence composition of LMW-GS genes and the high similarity of certain regions among them inevitably introduce base errors during PCR amplification and bacterial cloning. Moreover, when LMW-GS sequences are compared across large numbers of wheat varieties, this approach is costly and time-consuming (Zhang et al., \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Second-generation high-throughput sequencing, such as Illumina, has also been used to study LMW-GS copy number. However, its short reads (generally 150\u0026ndash;250 bp), combined with the high similarity among gene copies, can cause different copies to be assembled into chimeric sequences (Hu et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
In contrast, PacBio sequencing, a third-generation technology, offers high throughput and long reads (for example, the PacBio Sequel platform has an average read length of 8\u0026ndash;12 kb) (Ren et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). It also allows multiple samples to be pooled using barcodes, greatly reducing sequencing costs. Furthermore, it shows little GC bias and avoids the errors introduced by PCR amplification and bacterial cloning. This makes it well suited to the molecular identification of LMW-GS genes in wheat (Zhang et al., \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Ren et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eSome reference genomes have not been assembled at the chromosome level but have resequencing resources available (Guo et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Hao et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Zhou et al., \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Schulthess et al., \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Niu et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). For such cases, a \u003cem\u003ek\u003c/em\u003e-mer-based pangenome analysis method has been used to identify wheat seed storage protein (SSP) genes from resequencing data and to predict the associated phenotypes, with good results (Zhang et al., \u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
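To illustrate the basic idea behind k-mer-based gene detection in resequencing data, the sketch below computes the fraction of a gene's distinct k-mers that occur in a set of reads. It uses canonical k-mers so that reads from either strand count. This is a toy illustration, not the pangenome method of Zhang et al. (2024). It assumes exact matching, a small in-memory read set, and k = 31 by default.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    s = seq.upper()
    out = set()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        out.add(min(kmer, reverse_complement(kmer)))
    return out


def kmer_containment(gene, reads, k=31):
    """Fraction of the gene's distinct canonical k-mers present in any read."""
    gene_kmers = canonical_kmers(gene, k)
    if not gene_kmers:
        return 0.0
    read_kmers = set()
    for read in reads:
        read_kmers |= canonical_kmers(read, k)
    return len(gene_kmers & read_kmers) / len(gene_kmers)


if __name__ == "__main__":
    gene = "ACGTACGTTT"  # toy sequence; k = 4 keeps the example small
    print(kmer_containment(gene, [reverse_complement(gene)], k=4))
    print(kmer_containment(gene, ["ACGTACG"], k=4))
```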
However, this method groups together all SSP genes whose sequences are more than 99% similar. It then keeps the longest sequence of each group as a non-redundant representative for SSP gene identification. As a result, recently duplicated genes whose copies share more than 99% sequence similarity may be collapsed into a single representative and overlooked. Examples include \u003cem\u003eTaKari_B6, TaFiel_B2\u003c/em\u003e, and \u003cem\u003eTaLanc_B2\u003c/em\u003e (Table \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003eS3\u003c/span\u003e). Our research group is currently adapting LMWgsFinder to resequencing data. The improved software is expected to identify recently duplicated LMW-GS genes in newly resequenced wheat varieties and to recover the complete coding sequences of some LMW-GS genes. However, some of the identified LMW-GS genes may contain gaps. How many will depend mainly on the quantity (genome coverage) and quality (the proportion of high-quality reads among total reads) of each variety\u0026rsquo;s resequencing data. Given the sequence characteristics of LMW-GS genes, we expect such gaps to lie mainly in the repetitive region in the middle of the genes, which should make them relatively amenable to gap filling. We believe that the improved LMWgsFinder will help wheat researchers make full use of the extensive genome resequencing data available, advancing the identification and application of LMW-GS genes. 
In doing so, it will also make efficient use of the vast amount of wheat genome resequencing data already generated.\u003c/p\u003e \u003cp\u003eWhere chromosome-level reference genome assemblies are available (Walkowiak et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2020\u003c/span\u003e;Sato et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2021\u003c/span\u003e;Shi et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e;Jia et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), LMWgsFinder identifies LMW-GS genes in different wheat genomes more comprehensively, quickly, and reliably than a direct BLASTN search with known LMW-GS gene sequences (such as the published CS and xy54 sequences used in this study). Excluding the genes identified in TaCS, a BLASTN search against the annotated CDSs of the target genomes detected only 98 of 257 LMW-GS genes (38.13%) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and S10). This indicates that, apart from TaCS, the LMW-GS coding sequences annotated in the other genomes by 18 different genome sequencing groups are incomplete (Table \u003cspan refid=\"MOESM10\" class=\"InternalRef\"\u003eS10\u003c/span\u003e, Table \u003cspan refid=\"MOESM12\" class=\"InternalRef\"\u003eS12\u003c/span\u003e). 
This is related both to the different priorities of the wheat whole-genome sequencing groups and to the intricate sequence composition and multi-copy nature of the LMW-GS gene family (Huo et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBesides BLASTN searches against the annotated CDSs of a target genome, an alternative approach uses reference LMW-GS genes as queries in BLASTN searches against the whole-genome DNA sequence of the target species (Dong et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2010\u003c/span\u003e;Huo et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). This approach works for LMW-GS genes that are highly similar to the reference sequences and contain no gaps, but it performs poorly on gap-containing genes (Table \u003cspan refid=\"MOESM14\" class=\"InternalRef\"\u003eS14\u003c/span\u003e). For example, when the LMW-GS CDSs of CS and xy54 were used as BLASTN queries against whole-genome DNA sequences, only one of the 27 gap-containing LMW-GS genes (3.7%) was accurately identified (Table \u003cspan refid=\"MOESM14\" class=\"InternalRef\"\u003eS14\u003c/span\u003e). Moreover, this method often fails to recover the complete coding sequences of LMW-GS genes and requires extensive manual checking and verification, which greatly reduces research efficiency (Table \u003cspan refid=\"MOESM14\" class=\"InternalRef\"\u003eS14\u003c/span\u003e). 
As DNA sequencing costs continue to fall, an increasing number of wheat varieties have been sequenced and assembled at the whole-genome level (Walkowiak et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2020\u003c/span\u003e;Sato et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2021\u003c/span\u003e;Shi et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e;Jia et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). LMWgsFinder can rapidly and accurately identify LMW-GS genes across entire wheat genomes. It should therefore help clarify the molecular mechanisms underlying the superior end-use quality of certain wheat cultivars. It also provides a new and useful tool for the genetic improvement and molecular design breeding of wheat quality.\u003c/p\u003e \u003cp\u003eHowever, LMWgsFinder has inherent limitations. Its re-identification strategy depends on sequence conservation within the N-terminal 90 bp (Begin_90bp) and C-terminal 90 bp (End_90bp) regions of the LMW-GS CDS. Consequently, if an LMW-GS gene in the target genome lacks either of these regions, LMWgsFinder fails to annotate it. For example, the LMW-GS_\u003cem\u003eB2\u003c/em\u003e gene is present in the genome of the wheat variety Stanley but cannot be annotated because its start region contains a gap (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC). The tool is therefore ineffective for identifying incomplete LMW-GS genes in wheat genomes. 
To address this issue, future versions could integrate alternative algorithms (e.g., homology-based gap filling) to improve robustness.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eSpecies-specificity and functional gain/loss variation of LMW-GS genes\u003c/h2\u003e \u003cp\u003eWe also used LMWgsFinder to search for LMW-GS genes in the genomes of \u003cem\u003eBrachypodium distachyon\u003c/em\u003e, barley, rice, \u003cem\u003eSetaria italica\u003c/em\u003e, sorghum, and corn. No LMW-GS genes were detected in these genomes, even under relaxed search criteria (identity, 60%; e-value, 1e-5). This is consistent with the fact that, among the major cereal crops, common wheat is uniquely suited to making noodles, steamed buns, dumplings, bread, biscuits, and other gluten-containing foods, because the other cereals lack gluten proteins (Singh et al., \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). In contrast, Wang et al. (\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e2012\u003c/span\u003e) used proteomic and molecular genetic approaches to study the grain glutenin proteins of \u003cem\u003eBrachypodium distachyon\u003c/em\u003e and reported that this species contains 4\u0026ndash;5 copies of LMW-GS genes. This suggests that \u003cem\u003eBrachypodium\u003c/em\u003e LMW-GS genes do not share the known coding-sequence features of wheat LMW-GS genes. Either their Begin_90bp and End_90bp sequences differ substantially from those of wheat (less than 60% identity), or their coding sequences exceed 2 kb. 
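A minimal Python sketch can illustrate why both anchors matter. It is not the LMWgsFinder implementation and rests on several assumptions:

- An ungapped identity score stands in for BLASTN, and only one strand is scanned.
- The thresholds are taken from the values quoted in this Discussion: 60% identity for each 90-bp anchor and a coding sequence of at most about 2 kb.
- The function names (identity, best_window, find_candidate) and the random toy sequences are hypothetical.

In the sketch, a gene is accepted only when a start-anchor window and a downstream end-anchor window both reach the identity cut-off within the length limit. A gap in the start region (as for LMW-GS_B2 in Stanley) is therefore enough for the gene to be missed, and so are anchors that diverge from the wheat references (as may be the case in Brachypodium).

```python
import random

# Hypothetical sketch of an anchor-based re-identification check, modelled on the
# Begin_90bp/End_90bp strategy described in the text. It is NOT LMWgsFinder code:
# ungapped identity stands in for BLASTN, and the thresholds are assumptions
# taken from the values quoted in this Discussion (60% identity, CDS up to ~2 kb).
ANCHOR_LEN = 90
MIN_IDENTITY = 0.60
MAX_CDS_LEN = 2000


def identity(window: str, anchor: str) -> float:
    # Fraction of anchor positions matched exactly; an N in the genome never matches.
    matches = sum(1 for g, a in zip(window, anchor) if g == a and g != 'N')
    return matches / len(anchor)


def best_window(genome: str, anchor: str, start: int = 0):
    # Brute-force scan for the most similar ungapped window at or after start.
    best_pos, best_id = -1, 0.0
    for pos in range(start, len(genome) - len(anchor) + 1):
        score = identity(genome[pos:pos + len(anchor)], anchor)
        if score > best_id:
            best_pos, best_id = pos, score
    return best_pos, best_id


def find_candidate(genome: str, begin_90: str, end_90: str):
    # Return (start, end) of a candidate CDS, or None if an anchor or the length check fails.
    b_pos, b_id = best_window(genome, begin_90)
    if b_pos == -1 or MIN_IDENTITY > b_id:
        return None  # no acceptable start anchor, e.g. a gap at the gene start
    e_pos, e_id = best_window(genome, end_90, b_pos + ANCHOR_LEN)
    if e_pos == -1 or MIN_IDENTITY > e_id:
        return None  # no acceptable end anchor downstream of the start anchor
    end = e_pos + ANCHOR_LEN
    if end - b_pos > MAX_CDS_LEN:
        return None  # span too long to be accepted as a single CDS
    return b_pos, end


# Toy data: random sequences stand in for real anchors and flanking DNA.
rng = random.Random(1)


def rand_seq(n: int) -> str:
    return ''.join(rng.choice('ACGT') for _ in range(n))


begin_90, end_90 = rand_seq(ANCHOR_LEN), rand_seq(ANCHOR_LEN)
left, middle, right = rand_seq(200), rand_seq(600), rand_seq(200)

genome = left + begin_90 + middle + end_90 + right
print(find_candidate(genome, begin_90, end_90))  # (200, 980)

# Replace the first 45 bp of the gene with Ns to mimic a gap at its start.
gapped = left + 'N' * 45 + begin_90[45:] + middle + end_90 + right
print(find_candidate(gapped, begin_90, end_90))  # None
```

A real search would align the anchors with a tool such as BLASTN and examine both strands. The brute-force scan shown here is only practical for short toy sequences.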
The exact reasons require further investigation (Wang et al., \u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e2012\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eMicrocollinearity analysis of the regions homologous to the \u003cem\u003eD5\u003c/em\u003e gene in 12 representative grass species (Y. Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) covered the five wheat LMW-GS genes \u003cem\u003eD3, D4, D5, D6\u003c/em\u003e, and \u003cem\u003eD7\u003c/em\u003e. Orthologs of these genes were found only in the \u003cem\u003eThinopyrum elongatum\u003c/em\u003e genome, and there only as a single gene (\u003cem\u003eTel1E01G022700\u003c/em\u003e) (Fig. \u003cspan refid=\"MOESM17\" class=\"InternalRef\"\u003eS17\u003c/span\u003e). Based on the materials examined in this study, we speculate that LMW-GS genes meeting our identification criteria occur only in a few Triticeae genera, such as \u003cem\u003eTriticum\u003c/em\u003e, \u003cem\u003eAegilops\u003c/em\u003e, \u003cem\u003ePsathyrostachys\u003c/em\u003e, \u003cem\u003eDasypyrum\u003c/em\u003e, and \u003cem\u003eElytrigia\u003c/em\u003e (Table \u003cspan refid=\"MOESM9\" class=\"InternalRef\"\u003eS9\u003c/span\u003e, Figs. S16 and S17). The genomes of other Triticeae species may not contain LMW-GS genes with more than 60% similarity to the Begin_90bp and End_90bp sequences of the corresponding wheat LMW-GS genes. This is consistent with the results of our LMWgsFinder searches of the whole-genome sequences of these representative species.\u003c/p\u003e \u003cp\u003eTo meet the growing demand for high-quality wheat, fundamental research on end-use quality-associated genes (e.g., LMW-GS genes) in wheat and its relatives is essential. 
Such research will advance our understanding of wheat quality traits and facilitate molecular breeding strategies, including marker-assisted selection and genomic design, to improve agronomic and processing properties (Ge et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Accurately identifying these genes and obtaining their coding sequences is a prerequisite for functional studies using techniques such as gene editing and genetic transformation (Tyler et al., \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2015\u003c/span\u003e;Qu et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThis study identified rare structural variations associated with the loss or gain of LMW-GS gene function, such as those of the \u003cem\u003eA3\u003c/em\u003e and \u003cem\u003eB5\u003c/em\u003e genes. For these cases, we cannot rule out that assembly errors in the \u003cem\u003eGlu3/Gli1\u003c/em\u003e homologous regions of some genomes led to inaccurate identification of LMW-GS genes. With the aid of technologies such as PacBio, Hi-C, Omni-C, and BioNano, the quality of current wheat genome assemblies has improved substantially, with scaffold N50 values generally above 20 Mb (Shi et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2022\u003c/span\u003e;Jia et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). DNA fragments are therefore relatively unlikely to be misplaced in recently completed genome assemblies (Keeble-Gagnere et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). 
In addition, previous studies have documented the formation of chimeric LMW-GS genes through illegitimate recombination between i-type and m-type genes or between s-type and m-type genes (Hu et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Similarly, we identified 21 new LMW-GS genes in 20 genomes whose Begin_90bp and End_90bp sequences were derived from different LMW-GS genes.\u003c/p\u003e \u003ch2\u003eCopy number variation of LMW-GS genes and evolution of the \u003cem\u003eGlu3/Gli1\u003c/em\u003e homologous region\u003c/h2\u003e \u003cp\u003eCopy number variation (CNV) is widespread in living organisms and is closely related to evolutionary processes and environmental adaptation (Walkowiak et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). It is also a major contributor to phenotypic diversity and is thought to be caused by chromosomal rearrangement, transposable element (TE) activity, or polyploidy (Bariah et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). CNV may have significant biological implications for species-specific genomic composition, evolutionary processes, and phylogenetics, as well as for the expression and regulation of genes in specific genomic regions (Bariah et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWe found substantial CNV of LMW-GS genes among the genomes of different wheat varieties and their close relatives (Table \u003cspan refid=\"MOESM11\" class=\"InternalRef\"\u003eS11\u003c/span\u003e). 
In the materials examined in this study, the LMW-GS copy number ranged from 2 to 6 in each of the A and B subgenomes and from 6 to 9 in the D subgenome. Varieties carrying the 1BL/1RS translocation were excluded from the B-subgenome count. Across the whole genome, the copy number ranged from 13 to 20 (Table \u003cspan refid=\"MOESM11\" class=\"InternalRef\"\u003eS11\u003c/span\u003e). This agrees with previous estimates of 10\u0026ndash;20 copies but differs markedly from earlier estimates of 30\u0026ndash;40 copies (Huang and Cloutier, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2008\u003c/span\u003e;Zhang et al., \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2011\u003c/span\u003e;Zhang et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2013\u003c/span\u003e;Al-Khayri et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Differences among these estimates may partly reflect gene duplications or triplications in the wheat genome that could not be detected at the time, as well as possible heterozygous sites within the LMW-GS gene family. For example, Ikeda et al. (\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2002\u003c/span\u003e) identified 12 LMW-GS genes in Norin61 (3, 2, and 7 in the A, B, and D subgenomes, respectively) by analyzing PCR products amplified with LMW-GS gene-specific primers. 
In contrast, we identified 20 LMW-GS genes in the Norin61 genome, in which \u003cem\u003eB5\u003c/em\u003e is duplicated and \u003cem\u003eB6\u003c/em\u003e triplicated.\u003c/p\u003e \u003cp\u003eCNV of LMW-GS genes may be caused by homologous recombination and by chromosomal rearrangements, such as inversions and translocations, following polyploidization (Bariah et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). These evolutionary events are closely associated with gene duplication, triplication, and loss. The abundance of TEs in the homologous regions may also contribute, because TE-mediated ectopic recombination can lead to dramatic chromosomal rearrangements and gene copy number variation (Bonchev and Willi, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). In three of the 20 genomes analyzed, we detected large segmental inversions/translocations in the \u003cem\u003eGlu3/Gli1\u003c/em\u003e homologous region of the D subgenome. These span \u003cem\u003eD1/D10\u003c/em\u003e to \u003cem\u003eD8\u003c/em\u003e in TaLanc and TaMatt and \u003cem\u003eD10\u003c/em\u003e to \u003cem\u003eD5\u003c/em\u003e in TaMace (Figs. S13 and S15). At least four genomes (TaJagg, TaLanc, TaMatt, and Tu) also showed large segmental inversions in the \u003cem\u003eGlu3/Gli1\u003c/em\u003e homologous region of the A subgenome (Fig. \u003cspan refid=\"MOESM13\" class=\"InternalRef\"\u003eS13\u003c/span\u003e). In addition, structural variations such as DNA fragment insertions and deletions were observed in some wheat genomes (Fig. \u003cspan refid=\"MOESM15\" class=\"InternalRef\"\u003eS15\u003c/span\u003e). Microcollinearity analysis indicated that, compared with plants in the Ehrhartoideae (rice) and Panicoideae (sorghum, corn, etc.) 
subfamilies, the \u003cem\u003eGlu3/Gli1\u003c/em\u003e homologous region of Triticeae plants in the Pooideae subfamily has undergone rapid evolution (Gao et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2007\u003c/span\u003e;Dong et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2016\u003c/span\u003e;Huo et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) (Fig. \u003cspan refid=\"MOESM17\" class=\"InternalRef\"\u003eS17\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWe found that LMW-GS genes are duplicated or triplicated in multiple wheat genomes. In TaNori and TaKari, the \u003cem\u003eB6\u003c/em\u003e and \u003cem\u003eA3\u003c/em\u003e genes, respectively, are triplicated (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and S3). In addition, seven LMW-GS genes (\u003cem\u003eB2, B5, B6, D6, D8, A2.A1\u003c/em\u003e, and \u003cem\u003eA2.A6\u003c/em\u003e) were found to be duplicated in eight genomes (Ts, TaAK58, TaFiel, TaJagg, TaLanc, TaNori, TaKari, and Tu) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and S3). Notably, the TaNori and TaKari genomes each contain both a triplicated and a duplicated gene (TaNori: \u003cem\u003eB6\u003c/em\u003e triplicated, \u003cem\u003eB5\u003c/em\u003e duplicated; TaKari: \u003cem\u003eA3\u003c/em\u003e triplicated, \u003cem\u003eB6\u003c/em\u003e duplicated) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and S3). In addition, two genes each were duplicated in the TaLanc (\u003cem\u003eB2\u003c/em\u003e and \u003cem\u003eD6\u003c/em\u003e) and Tu (\u003cem\u003eA2.A1\u003c/em\u003e and \u003cem\u003eA2.A6\u003c/em\u003e) genomes. 
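Counting how often each gene label occurs in a genome is enough to separate the duplications (two copies) described here from the triplications (three copies). The same counts give per-subgenome copy numbers like those summarized in Table S11. The minimal Python sketch that follows assumes one label per gene copy, with the first letter giving the subgenome. The function names and the gene list are hypothetical: the list is loosely modelled on the TaNori pattern (B6 triplicated, B5 duplicated), and the other labels are placeholders.

```python
from collections import Counter

# Hypothetical sketch: summarize copy-number events from the LMW-GS gene labels
# assigned in one genome. The label list below is illustrative, not real data.
EVENT_NAMES = {2: 'duplication', 3: 'triplication'}


def copy_number_events(gene_labels):
    # Map each gene label seen more than once to the type of copy-number event.
    counts = Counter(gene_labels)
    return {gene: EVENT_NAMES.get(n, str(n) + ' copies')
            for gene, n in counts.items() if n > 1}


def subgenome_totals(gene_labels):
    # Count LMW-GS copies per subgenome from the first letter of each label (A, B or D).
    return dict(Counter(label[0] for label in gene_labels))


# Illustrative genome with B6 in three copies and B5 in two, as reported for TaNori;
# the remaining labels are placeholders, not the actual TaNori gene set.
labels = ['A1', 'A3', 'B5', 'B5', 'B6', 'B6', 'B6', 'D1', 'D3']
print(copy_number_events(labels))  # {'B5': 'duplication', 'B6': 'triplication'}
print(subgenome_totals(labels))    # {'A': 2, 'B': 5, 'D': 2}
```

A Counter keeps the logic independent of how copies are ordered along the chromosome. Unplaced copies, such as the five genes discussed in this section that could not be assigned to chromosomes, are counted as long as they carry a label.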
Five of the triplicated or duplicated genes described above could not be assigned to chromosomes (\u003cem\u003eTaAK58_D8, TaLanc_D6, TaNori_B6, TaNori_B5, Tu_A2.A1\u003c/em\u003e), but these genes are genuinely present in the assemblies. Their unplaced status may reflect the complex DNA composition of the \u003cem\u003eGlu3/Gli1\u003c/em\u003e homologous regions in the corresponding genomes, such as numerous TEs or locally high GC content (Gao et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2007\u003c/span\u003e;Huo et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). The \u003cem\u003eD8\u003c/em\u003e and \u003cem\u003eB5\u003c/em\u003e genes in the CS genome were likewise not assigned to chromosomes at first. With the aid of several technologies, including BioNano, they were later positioned accurately (Zhu et al., \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Both genes proved to lie far from the main LMW-GS gene clusters on their chromosomes: 16,132.4 kb separates \u003cem\u003eB4\u003c/em\u003e from \u003cem\u003eB5\u003c/em\u003e, and 1,783.0 kb separates \u003cem\u003eD7\u003c/em\u003e from \u003cem\u003eD8\u003c/em\u003e (Fig. \u003cspan refid=\"MOESM13\" class=\"InternalRef\"\u003eS13\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe clustering of LMW-GS gene duplication events in particular wheat lineages may be explained by the following mechanisms: (1) Chromosomal structural variation: structural variations (e.g., tandem duplications, inversions) in the genomes of different hexaploid wheat cultivars may increase the probability of homologous recombination, promoting the spread of gene duplications within subpopulations. (2) Artificial selection: LMW-GS genes encode glutenin subunits critical for dough elasticity and processing quality. 
In some lineages, the clustering of duplications may be driven by artificial selection, for example through the preferential retention during breeding of alleles linked to superior gluten traits. This would lead to the fixation and enrichment of certain duplicated variants. (3) Lineage-specific transposon dynamics: transposon activity may vary across wheat lineages. In some cultivated varieties, long-term domestication or breeding selection may have weakened transposon silencing mechanisms, increasing the frequency of duplication in specific genomic regions (e.g., \u003cem\u003eGlu-3\u003c/em\u003e loci). To infer the mechanism of gene duplication, we compared the 1-Mb genomic regions harboring the \u003cem\u003eA3\u003c/em\u003e, \u003cem\u003eB2\u003c/em\u003e, and \u003cem\u003eB6\u003c/em\u003e genes that had been assigned to chromosomes. The comparison indicates that EnSpm-like transposon activity may mediate LMW-GS gene duplication in wheat genomes (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB, C, and D). These data support the view that sequence variations, including gene duplication, deletion, inversion, and translocation, may occur frequently and independently during the evolution of the \u003cem\u003eGlu-3\u003c/em\u003e homologous region in different wheat genomes (Walkowiak et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Figs. 
S13 and S15).\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis work was funded by the National Natural Science Foundation of China (Grants 31571667 and U1204315) and was also partially supported by the Key Scientific and Technological Projects in Henan Province (222102110376 and 242102111156).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAuthor contributions\u003c/p\u003e\n\u003cp\u003eSZ conceived this study. XS and TL participated in the qRT-PCR data collection and gene structural validation. SZ, XS, YW, DX, HG, YG, JZ, HS, and DL analyzed the data and prepared figures and/or tables. XS, TL, and YF prepared the mRNA and cDNA samples. SZ wrote the manuscript. HH, ZR, and YG reviewed the drafts of the paper. All authors have read and approved the final manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAcknowledgments\u003c/p\u003e\n\u003cp\u003eThe authors would like to thank Professor Daowen Wang and Xiuying Kong for their careful review of the manuscript and valuable comments. We thank Professor Mingcheng Luo\u0026rsquo;s group (Department of Plant Sciences, University of California, Davis, CA 95616, U.S.A.) for their help with aligning the BioNano genome maps to the sequences of the genomic regions harboring the LMW-GS genes of the wheat 1A and 1B subgenomes. The authors also wish to thank Professor Jizeng Jia, Lifeng Gao, Chenyang Hao, Guangyao Zhao, Lingli Dong, and Dongcheng Liu for their assistance in collecting wheat cultivars and for their professional advice and opinions related to this article.\u0026nbsp;We also appreciate all the people whose data were used in this study. Costs for open access publishing were funded by the National Natural Science Foundation of China (Grants 31571667 and U1204315) and were also partially supported by the Key Scientific and Technological Projects in Henan Province (222102110376 and 242102111156). 
\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSupporting Information\u003c/p\u003e\n\u003cp\u003eAdditional Supporting Information may be found in the online version of this article. Supporting figures: Figures S1\u0026ndash;S20. Supporting tables: Tables S1\u0026ndash;S14. Text-based Supporting Information: Appendix S1.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eData availability statement\u003c/p\u003e\n\u003cp\u003eThe original contributions presented in this study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eThe authors have declared that no competing interests exist.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAkhunov ED, Sehgal S, Liang H, Wang S, Akhunova AR, Kaur G et al (2013) Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat. Plant Physiol 161:252\u0026ndash;265. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1104/pp.112.205161\u003c/span\u003e\u003cspan address=\"10.1104/pp.112.205161\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAkpinar BA, Biyiklioglu S, Alptekin B, Havrankova M, Vrana J, Dolezel J et al (2018) Chromosome-based survey sequencing reveals the genome organization of wild wheat progenitor Triticum dicoccoides. Plant Biotechnol J 16:2077\u0026ndash;2087. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/pbi.12940\u003c/span\u003e\u003cspan address=\"10.1111/pbi.12940\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAl-Khayri JM, Alshegaihi RM, Mahgoub EI, Mansour E, Atallah OO, Sattar MN et al (2023) Association of High and Low Molecular Weight Glutenin Subunits with Gluten Strength in Tetraploid Durum Wheat (Triticum turgidum spp. Durum L). Plants (Basel) 12:1416. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/plants12061416\u003c/span\u003e\u003cspan address=\"10.3390/plants12061416\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAn X, Zhang Q, Yan Y, Li Q, Zhang Y, Wang A et al (2006) Cloning and molecular characterization of three novel LMW-i glutenin subunit genes from cultivated einkorn (Triticum monococcum L). Theor Appl Genet 113:383\u0026ndash;395. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-006-0299-x\u003c/span\u003e\u003cspan address=\"10.1007/s00122-006-0299-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBariah I, Keidar-Friedman D, Kashkush K (2020) Identification and characterization of large-scale genomic rearrangements during wheat evolution. PLoS ONE 15:e0231323. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0231323\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0231323\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBlatter RH, Jacomet S, Schlumbaum A (2004) About the origin of European spelt (Triticum spelta L.): allelic differentiation of the HMW Glutenin B1-1 and A1-2 subunit genes. Theor Appl Genet 108:360\u0026ndash;367. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-003-1441-7\u003c/span\u003e\u003cspan address=\"10.1007/s00122-003-1441-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBonchev G, Willi Y (2018) Accumulation of transposable elements in selfing populations of Arabidopsis lyrata supports the ectopic recombination model of transposon evolution. New Phytol 219:767\u0026ndash;778. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/nph.15201\u003c/span\u003e\u003cspan address=\"10.1111/nph.15201\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuzm\u0026aacute;n C, Ibba MI, \u0026Aacute;lvarez JB, Sissons M, Morris C (2022) Wheat Quality. In: Reynolds MP, Braun HJ (eds) \u003cem\u003eWheat Improvement\u003c/em\u003e. Springer International Publishing, pp 177\u0026ndash;193\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCavalet-Giorsa E, Gonzalez-Munoz A, Athiyannan N, Holden S, Salhi A, Gardener C et al (2024) Origin and evolution of the bread wheat D genome. Nature. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-024-07808-z\u003c/span\u003e\u003cspan address=\"10.1038/s41586-024-07808-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y et al (2020) TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol Plant 13:1194\u0026ndash;1202. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2020.06.009\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2020.06.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen F, Zhao F, Xu C, Xia G (2008) Molecular characterization of LMW-GS genes from a somatic hybrid introgression line II-12 between Triticum aestivum and Agropyron elongatum in relation to quick evolution. J Genet Genomics 35:743\u0026ndash;749. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/S1673-8527(08)60230-1\u003c/span\u003e\u003cspan address=\"10.1016/S1673-8527(08)60230-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Q, Kang HY, Fan X, Wang Y, Sha LN, Zhang HQ et al (2013) Evolutionary history of Triticum petropavlovskyi Udacz. et Migusch. inferred from the sequences of the 3-phosphoglycerate kinase gene. PLoS ONE 8:e71139. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0071139\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0071139\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Y, Song W, Xie X, Wang Z, Guan P, Peng H et al (2020) A Collinearity-Incorporating Homology Inference Strategy for Connecting Emerging Assemblies in the Triticeae Tribe as a Pilot Practice in the Plant Pangenomic Era. Mol Plant 13:1694\u0026ndash;1708. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2020.09.019\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2020.09.019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoulet F, Wicker T, Rustenholz C, Paux E, Salse J, Leroy P et al (2010) Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell 22:1686\u0026ndash;1701. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1105/tpc.110.074187\u003c/span\u003e\u003cspan address=\"10.1105/tpc.110.074187\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eClarke JL, Qiu Y, Schnable JC (2022) Experimental Design for Controlled Environment High-Throughput Plant Phenotyping. Methods Mol Biol 2539:57\u0026ndash;68. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-1-0716-2537-8_7\u003c/span\u003e\u003cspan address=\"10.1007/978-1-0716-2537-8_7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDesjardins SD, Ogle DE, Ayoub MA, Heckmann S, Henderson IR, Edwards KJ et al (2020) MutS homologue 4 and MutS homologue 5 Maintain the Obligate Crossover in Wheat Despite Stepwise Gene Loss following Polyploidization. Plant Physiol 183:1545\u0026ndash;1558. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1104/pp.20.00534\u003c/span\u003e\u003cspan address=\"10.1104/pp.20.00534\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDizlek H, Awika JM (2023) Determination of basic criteria that influence the functionality of gluten protein fractions and gluten complex on roll bread characteristics. Food Chem 404:134648. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.foodchem.2022.134648\u003c/span\u003e\u003cspan address=\"10.1016/j.foodchem.2022.134648\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDong L, Huo N, Wang Y, Deal K, Wang D, Hu T et al (2016) Rapid evolutionary dynamics in a 2.8-Mb chromosomal region containing multiple prolamin and resistance gene families in Aegilops tauschii. Plant J 87:495\u0026ndash;506\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDong L, Zhang X, Liu D, Fan H, Sun J, Zhang Z et al (2010) New insights into the organization, recombination, expression and functional mechanism of low molecular weight glutenin subunit genes in bread wheat. 
PLoS ONE 5:e13548\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDu X, Wei J, Luo X, Liu Z, Qian Y, Zhu B et al (2020) Low-molecular-weight glutenin subunit LMW-N13 improves dough quality of transgenic wheat. Food Chem 327:127048. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.foodchem.2020.127048\u003c/span\u003e\u003cspan address=\"10.1016/j.foodchem.2020.127048\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFeldman M, Levy AA (2012) Genome evolution due to allopolyploidization in wheat. Genetics 192:763\u0026ndash;774. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1534/genetics.112.146316\u003c/span\u003e\u003cspan address=\"10.1534/genetics.112.146316\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGao S, Gu YQ, Wu J, Coleman-Derr D, Huo N, Crossman C et al (2007) Rapid evolution and complex structural organization in genomic regions harboring multiple prolamin genes in the polyploid wheat genome. Plant Mol Biol 65:189\u0026ndash;203\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGe W, Gao Y, Xu S, Ma X, Wang H, Kong L et al (2021) Genome-wide identification, characteristics and expression of the prolamin genes in Thinopyrum elongatum. BMC Genomics 22:864. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12864-021-08088-x\u003c/span\u003e\u003cspan address=\"10.1186/s12864-021-08088-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo W, Xin M, Wang Z, Yao Y, Hu Z, Song W et al (2020) Origin and adaptation to high altitude of Tibetan semi-wild wheat. Nat Commun 11:5085. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41467-020-18738-5\u003c/span\u003e\u003cspan address=\"10.1038/s41467-020-18738-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHao C, Jiao C, Hou J, Li T, Liu H, Wang Y et al (2020) Resequencing of 145 Landmark Cultivars Reveals Asymmetric Sub-genome Selection and Strong Founder Genotype Effects on Wheat Breeding in China. Mol Plant 13:1733\u0026ndash;1751. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2020.09.001\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2020.09.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe Z, Zhang H, Gao S, Lercher MJ, Chen WH, Hu S (2016) Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees. Nucleic Acids Res 44:W236\u0026ndash;241. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkw370\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkw370\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu X, Dai S, Yan Y, Liu Y, Zhang J, Lu Z et al (2020) The genetic diversity of group-1 homoeologs and characterization of novel LMW-GS genes from Chinese Xinjiang winter wheat landraces (Triticum aestivum L). J Appl Genet 61:379\u0026ndash;389. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s13353-020-00564-6\u003c/span\u003e\u003cspan address=\"10.1007/s13353-020-00564-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang XQ, Cloutier S (2008) Molecular characterization and genomic organization of low molecular weight glutenin subunit genes at the Glu-3 loci in hexaploid wheat (Triticum aestivum L). Theor Appl Genet 116:953\u0026ndash;966. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-008-0727-1\u003c/span\u003e\u003cspan address=\"10.1007/s00122-008-0727-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuo N, Dong L, Zhang S, Wang Y, Zhu T, Mohr T et al (2017) New insights into structural organization and gene duplication in a 1.75-Mb genomic region harboring the alpha-gliadin gene family in Aegilops tauschii, the source of wheat D genome. Plant J 92:571\u0026ndash;583. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/tpj.13675\u003c/span\u003e\u003cspan address=\"10.1111/tpj.13675\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuo N, Zhang S, Zhu T, Dong L, Wang Y, Mohr T et al (2018) Gene Duplication and Evolution Dynamics in the Homeologous Regions Harboring Multiple Prolamin and Resistance Gene Families in Hexaploid Wheat. Front Plant Sci 9:673. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2018.00673\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2018.00673\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIkeda TM, Nagamine T, Fukuoka H, Yano H (2002) Identification of new low-molecular-weight glutenin subunit genes in wheat. Theor Appl Genet 104:680\u0026ndash;687. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s001220100756\u003c/span\u003e\u003cspan address=\"10.1007/s001220100756\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIWGSC (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345:1251788. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.1251788\u003c/span\u003e\u003cspan address=\"10.1126/science.1251788\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJia J, Zhao G, Li D, Wang K, Kong C, Deng P et al (2023) Genome resources for the elite bread wheat cultivar Aikang 58 and mining of elite homeologous haplotypes for accelerating wheat improvement. Mol Plant 16:1893\u0026ndash;1910. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2023.10.015\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2023.10.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJia J, Zhao S, Kong X, Li Y, Zhao G, He W et al (2013) Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496:91\u0026ndash;95. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nature12028\u003c/span\u003e\u003cspan address=\"10.1038/nature12028\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeeble-Gagnere G, Rigault P, Tibbits J, Pasam R, Hayden M, Forrest K et al (2018) Optical and physical mapping with local finishing enables megabase-scale resolution of agronomically important regions in the wheat genome. Genome Biol 19:112. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-018-1475-4\u003c/span\u003e\u003cspan address=\"10.1186/s13059-018-1475-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKumar A, Kapoor P, Chunduri V, Sharma S, Garg M (2019) Potential of Aegilops sp. for Improvement of Grain Processing and Nutritional Quality in Wheat (Triticum aestivum). Front Plant Sci 10:308. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpls.2019.00308\u003c/span\u003e\u003cspan address=\"10.3389/fpls.2019.00308\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee JY, Beom HR, Altenbach SB, Lim SH, Kim YT, Kang CS et al (2016) Comprehensive identification of LMW-GS genes and their protein products in a common wheat variety. Funct Integr Genomics 16:269\u0026ndash;279. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s10142-016-0482-3\u003c/span\u003e\u003cspan address=\"10.1007/s10142-016-0482-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi X, Ma W, Gao L, Zhang Y, Wang A, Ji K et al (2008) A novel chimeric low-molecular-weight glutenin subunit gene from the wild relatives of wheat Aegilops kotschyi and Ae. juvenalis: evolution at the Glu-3 loci. Genetics 180:93\u0026ndash;101. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1534/genetics.108.092403\u003c/span\u003e\u003cspan address=\"10.1534/genetics.108.092403\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi XH, Wang K, Wang SL, Gao LY, Xie XX, Hsam SL et al (2010) Molecular characterization and comparative transcriptional analysis of LMW-m-type genes from wheat (Triticum aestivum L.) and Aegilops species. Theor Appl Genet 121:845\u0026ndash;856. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-010-1354-1\u003c/span\u003e\u003cspan address=\"10.1007/s00122-010-1354-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi Y, Fu J, Shen Q, Yang D (2020) High-Molecular-Weight Glutenin Subunits: Genetics, Structures, and Relation to End Use Qualities. Int J Mol Sci 22:184. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/ijms22010184\u003c/span\u003e\u003cspan address=\"10.3390/ijms22010184\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu L, Ikeda TM, Branlard G, Pena RJ, Rogers WJ, Lerner SE et al (2010) Comparison of low molecular weight glutenin subunits identified by SDS-PAGE, 2-DE, MALDI-TOF-MS and PCR in common wheat. BMC Plant Biol 10:124. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/1471-2229-10-124\u003c/span\u003e\u003cspan address=\"10.1186/1471-2229-10-124\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLivak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25:402\u0026ndash;408. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1006/meth.2001.1262\u003c/span\u003e\u003cspan address=\"10.1006/meth.2001.1262\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLombardi A, Barbante A, Cristina PD, Rosiello D, Castellazzi CL, Sbano L et al (2009) A relaxed specificity in interchain disulfide bond formation characterizes the assembly of a low-molecular-weight glutenin subunit in the endoplasmic reticulum. Plant Physiol 149:412\u0026ndash;423. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1104/pp.108.127761\u003c/span\u003e\u003cspan address=\"10.1104/pp.108.127761\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMa S, Wang M, Wu J, Guo W, Chen Y, Li G et al (2021) WheatOmics: A platform combining multiple omics data to accelerate functional genomics studies in wheat. Mol Plant 14:1965\u0026ndash;1968. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2021.10.006\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2021.10.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMadeira F, Madhusoodanan N, Lee J, Eusebi A, Niewielska A, Tivey ARN et al (2024) The EMBL-EBI Job Dispatcher sequence analysis tools framework in 2024. Nucleic Acids Res 52:W521\u0026ndash;W525. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gkae241\u003c/span\u003e\u003cspan address=\"10.1093/nar/gkae241\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMatsuoka Y (2011) Evolution of polyploid Triticum wheats under cultivation: the role of domestication, natural hybridization and allopolyploid speciation in their diversification. Plant Cell Physiol 52:750\u0026ndash;764. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/pcp/pcr018\u003c/span\u003e\u003cspan address=\"10.1093/pcp/pcr018\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMorgulis A, Coulouris G, Raytselis Y, Madden TL, Agarwala R, Schaffer AA (2008) Database indexing for production MegaBLAST searches. Bioinformatics 24:1757\u0026ndash;1764. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/bioinformatics/btn322\u003c/span\u003e\u003cspan address=\"10.1093/bioinformatics/btn322\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNiu J, Ma S, Zheng S, Zhang C, Lu Y, Si Y et al (2023) Whole-genome sequencing of diverse wheat accessions uncovers genetic changes during modern breeding in China and the United States. Plant Cell 35:4199\u0026ndash;4216. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/plcell/koad229\u003c/span\u003e\u003cspan address=\"10.1093/plcell/koad229\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeng Y, Yu Z, Islam S, Zhang Y, Wang X, Lei Z et al (2016) Allelic variation of LMW-GS composition in Chinese wheat landraces of the Yangtze-River region detected by MALDI-TOF-MS. Breed Sci 66:646\u0026ndash;652\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR et al (2014) Genome interplay in the grain transcriptome of hexaploid bread wheat. Science 345:1250091. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1126/science.1250091\u003c/span\u003e\u003cspan address=\"10.1126/science.1250091\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQin L, Liang Y, Yang D, Sun L, Xia G, Liu S (2015) Novel LMW glutenin subunit genes from wild emmer wheat (Triticum turgidum ssp. dicoccoides) in relation to Glu-3 evolution. Dev Genes Evol 225:31\u0026ndash;37. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00427-014-0484-x\u003c/span\u003e\u003cspan address=\"10.1007/s00427-014-0484-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQu G, Wang K, Mu J, Zhuo J, Wang X, Li S et al (2023) Identifying cis-Acting Elements Associated with the High Activity and Endosperm Specificity of the Promoters of Genes Encoding Low-Molecular-Weight Glutenin Subunits in Common Wheat (Triticum aestivum). J Agric Food Chem. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1021/acs.jafc.3c04209\u003c/span\u003e\u003cspan address=\"10.1021/acs.jafc.3c04209\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRen J, Jiang Z, Li W, Kang X, Bai S, Yang L et al (2022) Characterization of Glutenin Genes in Bread Wheat by Third-Generation RNA Sequencing and the Development of a Glu-1Dx5 Marker Specific for the Extra Cysteine Residue. J Agric Food Chem 70:7211\u0026ndash;7219. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1021/acs.jafc.2c02050\u003c/span\u003e\u003cspan address=\"10.1021/acs.jafc.2c02050\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRenato DO, Stefania M (2004) The low-molecular-weight glutenin subunits of wheat gluten. J Cereal Sci 39:321\u0026ndash;339\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSanchez-Munoz R (2024) From the archives: Tales from evolution-inflorescence diversity, gene duplication, and chromatin-mediated gene regulation. Plant Cell 36:2048\u0026ndash;2050. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/plcell/koae092\u003c/span\u003e\u003cspan address=\"10.1093/plcell/koae092\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSato K, Abe F, Mascher M, Haberer G, Gundlach H, Spannagl M et al (2021) Chromosome-scale genome assembly of the transformation-amenable common wheat cultivar 'Fielder'. DNA Res 28. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/dnares/dsab008\u003c/span\u003e\u003cspan address=\"10.1093/dnares/dsab008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchulthess AW, Kale SM, Liu F, Zhao Y, Philipp N, Rembe M et al (2022) Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nat Genet 54:1544\u0026ndash;1552. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41588-022-01189-7\u003c/span\u003e\u003cspan address=\"10.1038/s41588-022-01189-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi X, Cui F, Han X, He Y, Zhao L, Zhang N et al (2022) Comparative genomic and transcriptomic analyses uncover the molecular basis of high nitrogen-use efficiency in the wheat cultivar Kenong 9204. Mol Plant 15:1440\u0026ndash;1456. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2022.07.008\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2022.07.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSingh N, Shepherd K, Cornish G (1991) A simplified SDS-PAGE procedure for separating LMW subunits of glutenin. J Cereal Sci, 203\u0026ndash;208. 
\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSingh S, Sharma H, Ramankutty R, Ramaswamy S (2024) Review on Nutritional Potential of Underutilized Millets as a Miracle Grain. Curr Pharm Biotechnol 25:1082\u0026ndash;1098. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2174/0113892010248721230921093208\u003c/span\u003e\u003cspan address=\"10.2174/0113892010248721230921093208\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTamura K, Stecher G, Kumar S (2021) MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol Biol Evol 38:3022\u0026ndash;3027. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/molbev/msab120\u003c/span\u003e\u003cspan address=\"10.1093/molbev/msab120\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThomas SK, Hoek KV, Ogoti T, Duong H, Angelovici R, Pires JC et al (2024) Halophytes and heavy metals: A multi-omics approach to understand the role of gene and genome duplication in the abiotic stress tolerance of Cakile maritima. Am J Bot e16310. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/ajb2.16310\u003c/span\u003e\u003cspan address=\"10.1002/ajb2.16310\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics Chapter 2:Unit 2.3. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/0471250953.bi0203s00\u003c/span\u003e\u003cspan address=\"10.1002/0471250953.bi0203s00\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTyler AM, Bhandari DG, Poole M, Napier JA, Jones HD, Lu C et al (2015) Gluten quality of bread wheat is associated with activity of RabD GTPases. Plant Biotechnol J 13:163\u0026ndash;176. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/pbi.12231\u003c/span\u003e\u003cspan address=\"10.1111/pbi.12231\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWalkowiak S, Gao L, Monat C, Haberer G, Kassa MT, Brinton J et al (2020) Multiple wheat genomes reveal global variation in modern breeding. Nature 588:277\u0026ndash;283. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-020-2961-x\u003c/span\u003e\u003cspan address=\"10.1038/s41586-020-2961-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang K, Gao L, Wang S, Zhang Y, Li X, Zhang M et al (2011) Phylogenetic relationship of a new class of LMW-GS genes in the M genome of Aegilops comosa. Theor Appl Genet 122:1411\u0026ndash;1425. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-011-1541-8\u003c/span\u003e\u003cspan address=\"10.1007/s00122-011-1541-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang S, Wang K, Chen G, Lv D, Han X, Yu Z et al (2012) Molecular characterization of LMW-GS genes in Brachypodium distachyon L. reveals highly conserved Glu-3 loci in Triticum and related species. BMC Plant Biol 12:221. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/1471-2229-12-221\u003c/span\u003e\u003cspan address=\"10.1186/1471-2229-12-221\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang W, Wang Z, Li X, Ni Z, Hu Z, Xin M et al (2020) SnpHub: an easy-to-set-up web server framework for exploring large-scale genomic variation data in the post-genomic era with applications in wheat. Gigascience 9. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/gigascience/giaa060\u003c/span\u003e\u003cspan address=\"10.1093/gigascience/giaa060\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Wang Z, Chen Y, Lan T, Wang X, Liu G et al (2024) Genomic insights into the origin and evolution of spelt (Triticum spelta L.) as a valuable gene pool for modern wheat breeding. Plant Commun 5:100883. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.xplc.2024.100883\u003c/span\u003e\u003cspan address=\"10.1016/j.xplc.2024.100883\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiang L, Huang L, Gong F, Liu J, Wang Y, Jin Y et al (2019) Enriching LMW-GS alleles and strengthening gluten properties of common wheat through wide hybridization with wild emmer. 3 Biotech 9:355. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s13205-019-1887-1\u003c/span\u003e\u003cspan address=\"10.1007/s13205-019-1887-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiang Y, Song M, Wei Z, Tong J, Zhang L, Xiao L et al (2011) A jacalin-related lectin-like gene in wheat is a component of the plant defence system. J Exp Bot 62:5471\u0026ndash;5483. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/jxb/err226\u003c/span\u003e\u003cspan address=\"10.1093/jxb/err226\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYue YW, Long H, Liu Q, Wei YM, Yan ZH, Zheng YL (2005) Isolation of low-molecular-weight glutenin subunit genes from wild emmer wheat (Triticum dicoccoides). J Appl Genet 46:349\u0026ndash;355\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang W, Ciclitira P, Messing J (2014) PacBio sequencing of gene families - a case study with wheat gluten genes. Gene 533:541\u0026ndash;546. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.gene.2013.10.009\u003c/span\u003e\u003cspan address=\"10.1016/j.gene.2013.10.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang X, Liu D, Jiang W, Guo X, Yang W, Sun J et al (2011) PCR-based isolation and identification of full-length low-molecular-weight glutenin subunit genes in bread wheat (Triticum aestivum L). Theor Appl Genet 123:1293\u0026ndash;1305. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00122-011-1667-8\u003c/span\u003e\u003cspan address=\"10.1007/s00122-011-1667-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang X, Liu D, Zhang J, Jiang W, Luo G, Yang W et al (2013) Novel insights into the composition, variation, organization, and expression of the low-molecular-weight glutenin subunit gene family in common wheat. J Exp Bot 64:2027\u0026ndash;2040. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/jxb/ert070\u003c/span\u003e\u003cspan address=\"10.1093/jxb/ert070\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang X, Wang H, Sun H, Li Y, Feng Y, Jiao C et al (2023) A chromosome-scale genome assembly of Dasypyrum villosum provides insights into its application as a broad-spectrum disease resistance resource for wheat improvement. Mol Plant 16:432\u0026ndash;451. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molp.2022.12.021\u003c/span\u003e\u003cspan address=\"10.1016/j.molp.2022.12.021\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Z, Liu D, Li B, Wang W, Zhang J, Xin M et al (2024) A k-mer-based pangenome approach for cataloging seed-storage-protein genes in wheat to facilitate genotype-to-phenotype prediction and improvement of end-use quality. Mol Plant 17:1038\u0026ndash;1053. 
10.1016/j.molp.2024.05.006
Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203–214. 10.1089/10665270050081478
Zhao H, Wang R, Guo A, Hu S, Sun G (2004) Development of primers specific for LMW-GS genes located on chromosome 1D and molecular characterization of a gene from Glu-D3 complex locus in bread wheat. Hereditas 141:193–198. 10.1111/j.1601-5223.2004.01852.x
Zhou Y, Zhao X, Li Y, Xu J, Bi A, Kang L et al (2020) Triticum population sequencing provides insights into wheat adaptation. Nat Genet 52:1412–1422. 10.1038/s41588-020-00722-w
Zhu T, Wang L, Rimbert H, Rodriguez JC, Deal KR, De Oliveira R et al (2021) Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly. Plant J 107:303–314. 10.1111/tpj.15289

Keywords: Wheat glutenins, LMW-GS genes, Genome, Gene duplication, Alleles, Triticum aestivum

Version 1 posted April 17th, 2025. Version of record published in Theoretical and Applied Genetics on May 27th, 2025: https://doi.org/10.1007/s00122-025-04919-7