Whole-Genome Sequencing of Gliadin Degrading Genes Bacillus amyloliquefaciens Strain via Nanopore Technology

Research Article (Research Square preprint)

Yeun Kyoung Kim¹, Jincheol Cho¹, Heebal Kim

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6993182/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Background: Gluten, a major protein found in wheat, rye, and barley, plays a crucial role in dough formation. Among its components, α-, β-, and γ-gliadins are resistant to enzymatic hydrolysis and can accumulate in the human digestive tract, potentially triggering celiac disease (CD) or nonceliac gluten sensitivity in genetically predisposed individuals.
Objective: This study aimed to identify and characterize a bacterial strain capable of degrading gliadin through whole-genome sequencing and comparative genomic analysis.
Methods: A gliadin-degrading Bacillus amyloliquefaciens strain, designated SNU-TC2, was isolated on the basis of its ability to form a clear zone on gliadin-containing agar. Whole-genome sequencing was performed using Oxford Nanopore Technology, followed by genome assembly and annotation. Taxonomic classification and genomic comparisons were conducted using FastANI, core gene phylogenetic, and SNP-based analyses.
Results: SNU-TC2 showed >97% nucleotide identity to reference B. amyloliquefaciens strains, with the highest similarity (98.66%) to RD7-7. However, SNP-based comparisons revealed 30,736 variants, including several functionally significant mutations. These findings suggest that, while taxonomically similar, SNU-TC2 may harbor distinct genetic features relevant to gliadin degradation.
Conclusions: SNU-TC2 represents a promising candidate for further functional validation studies aimed at gluten degradation. Its genomic distinctiveness within the B. amyloliquefaciens strain complex warrants continued investigation for potential application in celiac disease management.

Keywords: gluten degradation, gliadin, whole-genome sequencing, Oxford Nanopore, Bacillus amyloliquefaciens, SNP analysis, celiac disease

Introduction

Celiac disease (CD) is an autoimmune condition triggered by gluten ingestion in genetically predisposed individuals [1]. Gluten, a composite of storage proteins in wheat, rye, and barley, contains immunogenic peptides (particularly α-gliadin fragments) that are resistant to gastrointestinal proteolysis and can provoke intestinal inflammation and villous atrophy [2–5]. A strict gluten-free diet (GFD) remains the mainstay of CD management, yet many patients experience persistent symptoms and reduced quality of life despite adhering to their diet [6]. The psychological and social burdens of lifelong dietary restriction further challenge adherence and overall quality of life [7].

Microbial or enzymatic detoxification of gluten has emerged as a complementary strategy to the GFD. Lactic acid bacteria (LAB) have been investigated for their ability to degrade gliadin because of their probiotic safety profile [8]. Although the screening in this study initially targeted LAB candidates, one isolate exhibiting robust gliadin-degrading activity was unexpectedly identified as a Bacillus amyloliquefaciens strain. To further characterize this strain, designated SNU-TC2, whole-genome sequencing was performed using Oxford Nanopore Technology, followed by genome assembly, annotation, and comparative genomic analyses [9]. Given its long-read capability and portability, nanopore sequencing has shown great promise in bacterial genome characterization and functional genomics applications [10]. The results suggest distinct genomic features within the Bacillus amyloliquefaciens lineage, supporting the potential utility of SNU-TC2 in gluten degradation.

Materials and Methods

Gliadin-containing agar preparation

To evaluate the gliadin-degrading activity of the bacterial isolates, gliadin-supplemented agar media were prepared. Initially, a series of optimization tests was performed to determine the maximum soluble concentration of gliadin in 60% ethanol. Briefly, 1 g of gliadin was dissolved in varying volumes (5–15 mL) of 60% ethanol and added to tryptic soy agar (TSA) and de Man, Rogosa and Sharpe (MRS) agar to achieve visible turbidity. Excess gliadin caused growth inhibition, so the concentration was carefully adjusted to balance opacity with bacterial viability. The optimal conditions were achieved by dissolving gliadin in 60% ethanol for 24 hours prior to media incorporation [11].

Screening of lactic acid bacteria for gliadin degradation

Fifteen strains of lactic acid bacteria (LAB), Lactobacillus sakei, Lactobacillus acidophilus, L. helveticus, L. delbrueckii subsp. bulgaricus, L. rhamnosus, L. gallinarum, L. delbrueckii, L.
fermentum, Enterococcus faecium, L. paracasei, L. brevis, L. curvatus, Streptococcus thermophilus, L. plantarum, and L. casei, were tested. Each strain was inoculated into 30 mL of MRS broth at 2% (v/v) and cultured anaerobically at 37°C for 19 hours [12]. After incubation, the cultures were centrifuged at 4,000 × g for 10 min at 4°C, and the resulting cell pellets were washed once with 15 mL of sodium phosphate buffer (pH 7.0). The cells were resuspended in 10 mL of the same buffer, and the optical density was measured at 600 nm (OD₆₀₀). The cell pellets, supernatants, and mixtures of both were separately spotted (5 μL and 10 μL) onto gliadin-containing tryptic soy agar (G_TSA; gliadin-supplemented TSA) and de Man, Rogosa and Sharpe agar (G_MRS). The plates were incubated anaerobically at 37°C for 48 h, and the formation of clear zones was interpreted as an indicator of gliadin-degrading activity [11].

Optimization of media and spot volume

G_TSA was selected over G_MRS as the preferred medium because it better supported visible clear zone formation. After the cells were cultured in 30 mL of medium, the cultures were centrifuged, and the supernatants were removed. The resulting cell suspensions were standardized by optical density at 600 nm (OD₆₀₀), corresponding to approximately 20 CFU/m. A 5 μL volume of each suspension was spotted onto G_TSA plates, which were subsequently incubated at 37°C for 48 h under anaerobic conditions. The formation of clear zones was used as an indicator of gliadin-degrading activity, consistent with prior studies that used gliadin-containing agar plates to screen for gluten-degrading bacteria [11]. Furthermore, the potential of certain strains to mitigate gliadin-induced epithelial disruption has also been demonstrated in epithelial models [12].

Isolation of an unknown gliadin-degrading strain

While screening LAB strains on G_TSA plates stored at room temperature under anaerobic conditions, a distinct colony displaying a clear zone was unexpectedly detected. The colony was isolated, subcultured, and subsequently replated on G_TSA plates under aerobic conditions. After 48 h of incubation at 37°C, the formation of a consistent clear zone was observed, suggesting the potential of the isolate to degrade gliadin [13,14].

Whole-genome sequencing and assembly

The unidentified gliadin-degrading Bacillus strain was cultured in 300 mL of MRS and TSB broth. Genomic DNA was extracted using the standard phenol–chloroform method [15], and its quality was confirmed by electrophoresis on a 0.8% agarose gel at 70 V for 80 minutes. Distinct high-molecular-weight DNA bands were observed. Whole-genome sequencing was performed on the MinION Mk1B platform (Oxford Nanopore Technologies, Oxford, UK) by C&K Genomics (Seoul, Korea) following the manufacturer’s protocols [16,17].

Bioinformatics pipeline for genome assembly and annotation

Basecalling of the raw reads was conducted with Guppy v3.6.1 [18,19], and adapter trimming was performed with Porechop v0.2.4 [20]. Quality and length filtering were carried out with NanoFilt [21] and Filtlong v0.2.1 [22]. Three de novo assemblers (Canu [23], Flye [24], and Trycycler [25]) were applied to generate draft assemblies. The assembled genomes were polished with Medaka v1.11.1 [26] to improve base-level accuracy in accordance with established best practices for long-read assembly refinement [27]. Genome annotation was performed with Prokka [28], and final genome visualization, including circular genome mapping, was completed on the Proksee platform [29].

Assembly quality assessment

Assembly completeness and contiguity were assessed with BUSCO (Benchmarking Universal Single-Copy Orthologs) [30] and QUAST (Quality Assessment Tool for Genome Assemblies) [31].
Among all the assemblies, the Trycycler-based assembly [25] showed the highest BUSCO completeness, with 771 complete genes and only 3 missing genes (Additional file 1: Fig. S1). Although the Canu assembly [23] presented superior contiguity metrics in QUAST (e.g., N50), the Trycycler assembly was ultimately selected for downstream taxonomic analyses owing to its overall superior completeness (Additional file 1: Fig. S2).

16S rRNA gene sequence analysis

To assess the taxonomic position of strain SNU-TC2, phylogenetic analysis based on 16S rRNA gene sequences was performed. 16S rRNA genes were predicted with Barrnap v0.9 [34], and representative sequences were selected from each genome. Multiple sequence alignment was performed with MAFFT v7.490 (L-INS-i) [33], and a maximum-likelihood tree was inferred with IQ-TREE v2.2.0 using ModelFinder and 1,000 ultrafast bootstrap replicates [36]. The resulting tree was visualized with iTOL v6 [42].

Selection of reference genomes for ANI analysis

Representative genomes of the genus Bacillus were retrieved from the Genome Taxonomy Database (GTDB) release R226 [38]. The GTDB metadata file (bac120_metadata_r226.tsv) was used to identify the genomes annotated as representative (is_representative = t) and taxonomically classified as Bacillus. Genomes with a CheckM-estimated completeness of ≥95% were retained to ensure high-quality references. The final reference set included N genomes, with a mean completeness of 99.1%. The FTP links provided in the GTDB metadata were used to download the corresponding .fna.gz FASTA files from NCBI RefSeq. These genomes were compiled into a custom reference panel and analyzed with the FastANI engine [38] integrated into the Proksee web platform [28] for pairwise ANI comparisons against the assembled query genome. A threshold of 95% ANI was applied to delineate species boundaries, which is consistent with widely accepted genomic standards for prokaryotic taxonomy [37].

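The reference-selection step described above can be illustrated with a minimal Python sketch. It is not the authors' script. The column names (`gtdb_representative`, `gtdb_taxonomy`, `checkm_completeness`, `accession`) are assumptions and should be checked against the header of the actual metadata file, in which the text refers to the representative flag as is_representative. The rows in the demo table are invented for illustration.

```python
import csv
import io


def select_references(tsv_handle, genus="Bacillus", min_completeness=95.0,
                      rep_col="gtdb_representative",
                      tax_col="gtdb_taxonomy",
                      comp_col="checkm_completeness"):
    """Return (accession, completeness) pairs for representative genomes of one genus.

    The default column names are assumptions; verify them against the real header.
    """
    selected = []
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        if row[rep_col] != "t":
            continue  # keep representative genomes only
        # Taxonomy strings are semicolon-separated ranks, e.g. "...;g__Bacillus;s__...".
        # Exact matching on the genus rank excludes suffixed genera such as g__Bacillus_A.
        if f"g__{genus}" not in row[tax_col].split(";"):
            continue
        completeness = float(row[comp_col])
        if completeness >= min_completeness:
            selected.append((row["accession"], completeness))
    return selected


# Invented demo table (not real GTDB records).
demo = io.StringIO(
    "accession\tgtdb_representative\tgtdb_taxonomy\tcheckm_completeness\n"
    "GENOME_A\tt\td__Bacteria;g__Bacillus;s__Bacillus example1\t99.2\n"
    "GENOME_B\tf\td__Bacteria;g__Bacillus;s__Bacillus example1\t99.9\n"
    "GENOME_C\tt\td__Bacteria;g__Bacillus_A;s__Bacillus_A example2\t98.0\n"
    "GENOME_D\tt\td__Bacteria;g__Bacillus;s__Bacillus example3\t94.0\n"
    "GENOME_E\tt\td__Bacteria;g__Bacillus;s__Bacillus example4\t97.0\n"
)
refs = select_references(demo)
mean_completeness = sum(c for _, c in refs) / len(refs)
print(refs, round(mean_completeness, 1))
```

In this toy table only GENOME_A and GENOME_E pass all three filters. Whether suffixed genera should be excluded is a design choice that the text does not specify.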
Core genome phylogenetic analysis

To identify closely related genomes, pairwise average nucleotide identity (ANI) was calculated with FastANI (v1.33) [38] across 71 Bacillus amyloliquefaciens reference genomes. Fourteen strains with >97% identity to SNU-TC2 were selected for downstream phylogenetic analysis. Genome annotations were generated with Prokka (v1.14.6) [28] to produce GFF files, which were processed with Roary (v3.13.0) [39] to identify core genes. Multiple sequence alignment of the core genome was performed with MAFFT (v7.490) [40]. A maximum likelihood phylogenetic tree was subsequently constructed with IQ-TREE (v2.2.0) [41] on the basis of the aligned core genes, employing 1,000 bootstrap replicates. The resulting tree was visualized and annotated with the iTOL web tool (v6) [42].

SNP identification and annotation

To assess genome-wide SNP variation between SNU-TC2 and the 14 most closely related Bacillus amyloliquefaciens reference strains identified through FastANI analysis, pairwise SNP distances were calculated with snp-dists (v0.7.0) [43] on the basis of core genome alignments. The resulting distance matrix was visualized as a clustered heatmap with the R package pheatmap [44], enabling exploration of genomic diversity among closely related strains sharing >97% sequence identity with the query. For detailed functional annotation, SNPs were identified by aligning the consensus genome of SNU-TC2 against the reference genome of B. amyloliquefaciens RD7-7 (the most closely related strain on the basis of FastANI similarity) using Snippy (v4.6.0) [45]. The functional effects of the detected variants were predicted with SnpEff (v5.2) [46] using the RD7-7 genome and its corresponding GFF annotation. To refine the gene-level predictions, particularly for functionally ambiguous regions, additional annotations were obtained with Prokka (v1.14.6) [28] and eggNOG-mapper (v2.1.9) [47].

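As an illustration of what a pairwise SNP distance over a core genome alignment measures, the following Python sketch counts alignment columns in which two sequences carry different unambiguous bases. It is not the snp-dists implementation. Ignoring columns that contain gaps or ambiguous characters is an assumption about how such columns should be treated, and the toy sequences are invented.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned columns where both sequences have A/C/G/T and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def distance_matrix(alignment):
    """Build a symmetric nested-dict distance matrix from {name: aligned_sequence}."""
    names = list(alignment)
    matrix = {name: {name: 0} for name in names}
    for first, second in combinations(names, 2):
        d = snp_distance(alignment[first], alignment[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


# Toy alignment (invented sequences, not data from the study).
toy = {
    "query": "ACGTACGTAC",
    "ref_1": "ACGTACGTAT",
    "ref_2": "ACCTAC-TNC",
}
matrix = distance_matrix(toy)
for name, row in matrix.items():
    print(name, [row[other] for other in toy])
```

A matrix of this shape is what a clustered heatmap of strain-to-strain distances would take as input.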
Results

Assembly completeness assessment

Assembly completeness and quality were evaluated with BUSCO and QUAST. Among the Trycycler assemblies, one variant presented the highest completeness, with 771 complete orthologs and only three missing single-copy orthologs. In contrast, assemblies generated with Canu and Flye presented slightly lower BUSCO scores. Although the QUAST analysis indicated superior contiguity metrics (e.g., N50) for the Canu assembly, the Trycycler assembly was ultimately selected for downstream analyses because of its greater completeness (Fig. 2).

16S rRNA gene phylogeny

Phylogenetic analysis of the 16S rRNA gene sequences placed strain SNU-TC2 within the Bacillus amyloliquefaciens clade. All 27 copies of the 16S rRNA gene from SNU-TC2 consistently clustered within a strongly supported monophyletic group (bootstrap ≥98), together with the type strain DSM7 and closely related strains such as RD7-7, PP19, 35 M and YP6. No substantial sequence divergence was observed among the multiple 16S rRNA gene copies of SNU-TC2. The overall tree topology was consistent with the GTDB taxonomy [49] and supported the application of genome-based approaches for higher-resolution taxonomic assignment [50]. To improve visual clarity, a midpoint-rooted tree highlighting the SNU-TC2 clade is shown in Fig. 3A, and a magnified subtree of closely related strains is presented in Fig. 3B. A collapsed version of the full 16S rRNA phylogeny including 74 Bacillus strains is shown in Fig. 3C, whereas the complete expanded tree is provided in Supplementary Fig. S3.

Average nucleotide identity (ANI) analysis

ANI analysis revealed high genomic similarity between SNU-TC2 and reference Bacillus strains. Among the 8,956 complete genome assemblies of the Bacillus genus obtained from the GTDB, 71 reference genomes of B. amyloliquefaciens and the query strain SNU-TC2 were selected for pairwise ANI analysis to assess species-level similarity.
The assembled genome of SNU-TC2 exhibited 98.66% identity with that of B. amyloliquefaciens strain RD7-7, confirming the close taxonomic relationship between the two strains [41,49]. Whole-genome alignment with FastANI further demonstrated syntenic conservation and local divergence between SNU-TC2 and RD7-7, as shown in Fig. 4B. For comparison, alignment with the type strain DSM7 revealed slightly lower similarity (97.56% ANI), as shown in Fig. 4A.

Phylogenetic relationship of SNU-TC2 with related Bacillus strains

To refine the phylogenetic placement of SNU-TC2, a maximum likelihood tree was generated on the basis of core genome alignments (Fig. 5). The analysis included 14 B. amyloliquefaciens strains that presented >97% pairwise ANI with SNU-TC2 [39]. As shown in Fig. 5, SNU-TC2 clustered within a strongly supported monophyletic clade together with strain RD7-7 (SH-aLRT/UFBoot = 98.5/99) [41,42]. Although several strains presented relatively long branch lengths, all taxa included in the tree shared >95% ANI with SNU-TC2, which was consistent with the species-level classification. These findings support the assignment of SNU-TC2 to the B. amyloliquefaciens clade and highlight RD7-7 as its closest sequenced relative on the basis of core genome similarity.

SNP-based genomic comparison

Pairwise single-nucleotide polymorphism (SNP) distances between SNU-TC2 and the top 14 B. amyloliquefaciens reference genomes were calculated with snp-dists [43], and the results are visualized as a clustered heatmap in Fig. 6. SNU-TC2 exhibited high nucleotide-level similarity (≤5,000 SNPs) with several other strains, including Ba13, 35 M, and Bacillus sp. 7D3. Variant annotation was performed using RD7-7 as the reference genome. A total of 30,736 variants were identified between SNU-TC2 and RD7-7, including 27,497 SNPs, corresponding to an average of one variant per 119 base pairs.
According to SnpEff [46] annotations, 66.4% of the SNPs were synonymous, indicating low functional impact. Missense mutations accounted for 21.9%, whereas high-impact mutations (such as frameshift_variant, stop_gained, and start_lost) comprised only 0.05% of the total but may be functionally significant. To further interpret these variants, Prokka [28] and eggNOG-mapper [47] were used for gene annotation and functional classification. The majority of the variants were located in noncoding or regulatory regions (e.g., upstream or downstream of genes), with approximately 8% falling within exonic regions. On the basis of functional impact predictions, most of the variants were classified as MODIFIER (91.95%) or LOW (6.06%), suggesting that despite high genomic similarity, subtle nucleotide differences could contribute to functional divergence between the two strains.

Discussion

The aim of this study was to identify bacterial strains capable of degrading gliadin, a key immunogenic component of gluten associated with celiac disease [11]. During the screening process for gliadin-degrading activity, we isolated a previously uncharacterized strain, SNU-TC2, which formed a distinct clear zone on TSA plates containing gliadin. Notably, this strain was able to grow under both aerobic and anaerobic conditions. To determine its taxonomic position, we first performed phylogenetic analysis on the basis of 16S rRNA gene sequences [40–43]. All 27 copies of the 16S rRNA gene in SNU-TC2 clustered tightly with the type strain Bacillus amyloliquefaciens DSM7, forming a well-supported monophyletic clade. These results suggested that SNU-TC2 belongs to the B. amyloliquefaciens species complex. However, 16S rRNA-based resolution is limited at the strain level [50]. To further clarify its relationship with other B. amyloliquefaciens strains, we conducted whole-genome comparisons with FastANI against 71 reference genomes [38].
The analysis revealed >97% average nucleotide identity (ANI) with several strains, with the highest similarity (98.66%) observed with strain RD7-7. According to species boundary criteria established through large-scale ANI analyses [41] and standardized bacterial taxonomy frameworks [49], these values strongly support the assignment of SNU-TC2 to the B. amyloliquefaciens species complex. Phylogenetic reconstruction based on core gene alignment further supported this relationship, placing SNU-TC2 firmly within the B. amyloliquefaciens clade alongside RD7-7 [44,45]. These findings are consistent with the recently proposed genome-based taxonomy, which revises bacterial phylogeny and species delineation using whole-genome data [49]. Notably, despite the high overall sequence similarity, SNP-based analysis revealed 30,736 variants between SNU-TC2 and RD7-7, including several missense and high-impact mutations [45,46]. This extent of variation between closely related strains suggests the presence of genomic differences that may underlie the gliadin degradation ability of SNU-TC2. Taken together, these results indicate that SNU-TC2 is a genetically distinct strain within the B. amyloliquefaciens species complex. While further studies are needed to determine the functional significance of the observed SNP diversity, SNU-TC2 represents a promising candidate for future investigations into strain-level proteolytic variation in B. amyloliquefaciens.

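The impact classes discussed above (MODIFIER, LOW, missense, high-impact) are reported by SnpEff in the ANN field of the annotated VCF. The Python sketch below shows one way such a breakdown could be tallied; it is an illustration under stated assumptions and not the authors' workflow. It assumes the usual pipe-separated ANN layout (allele, annotation, impact, gene name, and so on) and counts only the first annotation listed for each variant, which is a simplification. The VCF lines are invented.

```python
from collections import Counter


def first_annotation(info_field):
    """Return (effect, impact) from the first ANN entry of a VCF INFO string, or None."""
    for entry in info_field.split(";"):
        if entry.startswith("ANN="):
            fields = entry[len("ANN="):].split(",")[0].split("|")
            return fields[1], fields[2]
    return None


def tally_impacts(vcf_lines):
    """Count effect types and impact classes, one (first-listed) annotation per variant."""
    effects, impacts = Counter(), Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        columns = line.rstrip("\n").split("\t")
        annotation = first_annotation(columns[7])  # INFO is the 8th VCF column
        if annotation is None:
            continue
        effect, impact = annotation
        effects[effect] += 1
        impacts[impact] += 1
    return effects, impacts


# Invented VCF records for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr\t10\t.\tA\tG\t.\t.\tANN=G|synonymous_variant|LOW|geneA|geneA|transcript|t1",
    "chr\t50\t.\tC\tT\t.\t.\tANN=T|missense_variant|MODERATE|geneA|geneA|transcript|t1,"
    "T|upstream_gene_variant|MODIFIER|geneB|geneB|transcript|t2",
    "chr\t90\t.\tG\tA\t.\t.\tDP=20;ANN=A|upstream_gene_variant|MODIFIER|geneB|geneB|transcript|t2",
    "chr\t120\t.\tG\tA\t.\t.\tANN=A|stop_gained|HIGH|geneC|geneC|transcript|t3",
]
effects, impacts = tally_impacts(toy_vcf)
total = sum(impacts.values())
shares = {impact: 100 * count / total for impact, count in impacts.items()}
print(dict(effects), shares)
```

Counting every annotation per variant instead of only the first would shift the shares toward MODIFIER, so the counting rule should be stated alongside any percentages derived this way.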
Abbreviations

CD: Celiac disease
GFD: Gluten-free diet
LAB: Lactic acid bacteria
TSA: Tryptic soy agar
MRS: de Man, Rogosa and Sharpe agar
G_TSA: Gliadin-supplemented tryptic soy agar
G_MRS: Gliadin-supplemented de Man, Rogosa and Sharpe agar
ONT: Oxford Nanopore Technologies
BUSCO: Benchmarking Universal Single-Copy Orthologs
QUAST: Quality Assessment Tool for Genome Assemblies
ANI: Average nucleotide identity
SNP: Single-nucleotide polymorphism
ML: Maximum likelihood
SH-aLRT: Shimodaira–Hasegawa approximate likelihood ratio test
UFBoot: Ultrafast bootstrap
MLST: Multilocus sequence typing
SNU-TC2: Seoul National University – Trycycler cluster 2

Declarations

Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Clinical trial number
Not applicable.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
All authors are affiliated with the College of Agriculture and Life Sciences, Seoul National University. Yeun Kyoung Kim conceived and conducted the study, performed experiments and analyses, and wrote the manuscript. Jincheol Cho contributed to the experimental work and prepared Figure 1. Heebal Kim supervised the study and provided critical guidance. All authors approved the final manuscript.

Availability of data and materials
Sequence data that support the findings of this study have been deposited in the NCBI BioProject under accession code PRJNA1307060 (BioSample SAMN50652994).

References

1. Lebwohl B, Rubio-Tapia A. Epidemiology, presentation, and diagnosis of celiac disease. Gastroenterology. 2021;160(1):63–75. https://doi.org/10.1053/j.gastro.2020.06.098
2. Arentz-Hansen H, Körner R, Molberg Ø, Quarsten H, Vader W, Kooy YM, et al.
The intestinal T-cell response to α-gliadin in adult celiac disease is focused on a single deamidated glutamine targeted by tissue transglutaminase. J Exp Med. 2000;191(4):603–12. https://doi.org/10.1084/jem.191.4.603
3. Shan L, Molberg Ø, Parrot I, Hausch F, Filiz F, Gray GM, et al. Structural basis for gluten intolerance in celiac sprue. Science. 2002;297(5590):2275–9. https://doi.org/10.1126/science.1074129
4. Salmi A, Collin P, Korponay-Szabo IR, Laurila K, Partanen J, Huhtala H, et al. Endomysial antibody-negative coeliac disease: clinical characteristics and intestinal autoantibody deposits. Gut. 2006;55(12):1746–53. https://doi.org/10.1136/gut.2005.071514
5. Salmi A, Collin P. Coeliac disease: villous atrophy and enteropathy following gluten exposure. Gastroenterology. 2009;136(6):1977–83. https://doi.org/10.1053/j.gastro.2008.11.040
6. Dochat C, Afari N, Satherley R-M, Coburn S, McBeth JF. Celiac disease symptom profiles and their relationship to gluten-free diet adherence, mental health, and quality of life. BMC Gastroenterol. 2024;24(1):9. https://doi.org/10.1186/s12876-023-03101-x
7. Hall NJ, Rubin G, Charnock A. Systematic review: adherence to a gluten-free diet in adult patients with coeliac disease. Aliment Pharmacol Ther. 2009;30(4):315–30. https://doi.org/10.1111/j.1365-2036.2009.04053.x
8. De Angelis M, Rizzello CG, Fasano A, Clemente MG, De Simone C, Rizzello F, et al. Lactobacillus fermentum attenuates gliadin-induced toxicity in intestinal epithelial cells. Appl Environ Microbiol. 2006;72(1):130–138. https://doi.org/10.1186/1471-2180-14-19
9. Hong Y-P, Chen B-H, Wang Y-W, Teng R-H, Wei H-L, Chiou C-S. The usefulness of nanopore sequencing in whole-genome sequencing-based genotyping of Listeria monocytogenes and Salmonella enterica serovar Enteritidis. Microbiol Spectr. 2024;10:e00509-24. https://doi.org/10.1128/spectrum.00509-24
10. Kono N, Arakawa K. Nanopore sequencing: review of potential applications in functional genomics.
Wiley Interdiscip Rev Syst Biol Med. 2019;11(4):e1436. https://doi.org/10.1111/dgd.12608
11. Berger M, Sarantopoulos C, Ongchangco D, Sry J, Cesario T. Rapid isolation of gluten-digesting bacteria from human stool and saliva by using gliadin-containing plates. Exp Biol Med (Maywood). 2015;240(7):917–924. https://doi.org/10.1177/1535370214564748
12. Orlando A, Linsalata M, Notarnicola M, Tutino V, Russo F. Lactobacillus GG restoration of the gliadin induced epithelial barrier disruption: the role of cellular polyamines. BMC Microbiol. 2014;14:19. https://doi.org/10.1186/1471-2180-14-19
13. Caminero A, et al. Duodenal bacteria from active celiac patients and healthy individuals hydrolyze gliadin peptides differentially via secreted proteases. Nat Commun. 2019;10:1371. https://doi.org/10.1038/s41467-019-09037-9
14. Caminero A, McCarville JL, Galipeau HJ, Deraison C, Bernalier-Donadille A, Jury J, et al. Duodenal bacterial proteolytic activity determines sensitivity to dietary antigen through protease-activated receptor-2. Nat Commun. 2019;10:1198. https://doi.org/10.1038/s41467-019-09037-9
15. Kim Y, Oh S, Kim SH. Released exopolysaccharides from Lactobacillus plantarum and Bacillus subtilis effectively inhibit foodborne pathogens by enhancing host defense in mice. Food Sci Biotechnol. 2020;29(11):1449–1456. https://doi.org/10.1007/s00253-018-8946-0
16. Jain M, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36(4):338–345. https://doi.org/10.1038/nbt.4060
17. Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol. 2021;39:1348–1365. https://doi.org/10.1038/s41587-021-01108-x
18. Oxford Nanopore Technologies. Guppy basecalling software. 2019. https://community.nanoporetech.com
19. Wick RR, Judd LM, Holt KE. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019;20:129. https://doi.org/10.1186/s13059-019-1727-y
20. Wick R.
Porechop: adapter trimmer for Oxford Nanopore reads. 2017. Available from: https://github.com/rrwick/Porechop
21. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–9. https://doi.org/10.1093/bioinformatics/bty149
22. Wick R. Filtlong: quality filtering for long reads. 2018. Available from: https://github.com/rrwick/Filtlong
23. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. https://doi.org/10.1101/gr.215087.116
24. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37(5):540–6. https://doi.org/10.1038/s41587-019-0072-8
25. Wick RR, Judd LM, Cerdeira LT, Hawkey J, Meric G, Vezina B, et al. Trycycler: consensus long-read assemblies for bacterial genomes. Genome Biol. 2021;22:266. https://doi.org/10.1186/s13059-021-02483-z
26. Oxford Nanopore Technologies. Medaka: sequence correction software. 2020. Available from: https://github.com/nanoporetech/medaka
27. Lee JY, Kong M, Oh J, Lim JS, Chung SH, Kim JM, et al. Comparative evaluation of Nanopore polishing tools for microbial genome assembly and polishing strategies for downstream analysis. Sci Rep. 2021;11:20740. https://doi.org/10.1038/s41598-021-00178-w
28. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–2069. https://doi.org/10.1093/bioinformatics/btu153
29. Grant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen C-Y, Graham M, Van Domselaar G, Stothard P. Proksee: in-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 2023;51(W1):W484–W492. https://doi.org/10.1093/nar/gkad326
30. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics.
2015;31(19):3210–2. https://doi.org/10.1093/bioinformatics/btv351
31. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–5. https://doi.org/10.1093/bioinformatics/btt086
32. Seemann T. Barrnap 0.9: Bacterial ribosomal RNA predictor. 2014. Available from: https://github.com/tseemann/barrnap
33. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80. https://doi.org/10.1093/molbev/mst010
34. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4. https://doi.org/10.1093/molbev/msaa015
35. Letunic I, Bork P. Interactive Tree of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49(W1):W293–6. https://doi.org/10.1093/nar/gkab301
36. Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil PA, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;38(9):1079–86. https://doi.org/10.1038/s41587-020-0501-8
37. Richter M, Rosselló-Móra R. Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A. 2009;106(45):19126–19131. https://doi.org/10.1073/pnas.0906412106
38. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9:5114. https://doi.org/10.1038/s41467-018-07641-9
39. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015;31(22):3691–3693. https://doi.org/10.1093/bioinformatics/btv421
40. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–780.
https://doi.org/10.1093/molbev/mst010
41. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–1534. https://doi.org/10.1093/molbev/msaa015
42. Letunic I, Bork P. Interactive Tree of Life (iTOL) v6: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49(W1):W293–W296. https://doi.org/10.1093/nar/gkab301
43. Seemann T. snp-dists: pairwise SNP distance matrix from a FASTA alignment. https://github.com/tseemann/snp-dists
44. Kolde R. pheatmap: Pretty Heatmaps. R package version 1.0.12. https://cran.r-project.org/package=pheatmap
45. Seemann T. Snippy: fast bacterial variant calling and core genome alignment. https://github.com/tseemann/snippy
46. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin). 2012;6(2):80–92. https://doi.org/10.4161/fly.19695
47. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021;38(12):5825–5829. https://doi.org/10.1093/molbev/msab293
48. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2020;36(6):1925–1927. https://doi.org/10.1093/bioinformatics/btz848
49. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol. 2018;36(10):996–1004. https://doi.org/10.1038/nbt.4229
50. Hassler HB, Probert B, Moore C, Lawson E, Jackson RW, Russell BT, Richards VP. Phylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies. BMC Genomics. 2022;23(1):513.
https://doi.org/10.1186/s40168-022-01295-y Additional Declarations No competing interests reported. Supplementary Files SupplementFigures.zip
Figure 1. Different lactic acid bacterial strains on MRS plates containing 1% gliadin (G_MRS). Individual colonies were propagated from 5 μL of each bacterial strain spotted onto G_MRS medium and incubated for 48 hours under aerobic or anaerobic conditions. Halo formation was evaluated: (+) indicates the presence of a clear zone, and (−) indicates its absence. The image labeled "putative bacterial colony" represents the target strain under investigation for gliadin-degrading activity.
Figure 2. WGS-based annotation map of Bacillus strains constructed via Trycycler. From the innermost to the outermost rings: (1) GC skew, (2) GC content, (3) Prokka annotations (− strand), (4–6) ORFs (−3 to −1), (7) contigs, (8–10) ORFs (+1 to +3), and (11) Prokka annotations (+ strand). Functional features such as CDSs, rRNAs, tRNAs, and hypothetical proteins are color coded. The labels are connected via radial lines to enhance clarity.
Figure 3. Phylogenetic analysis of strain SNU-TC2 on the basis of rRNA gene sequences. (A) Maximum-likelihood tree of 16S rRNA gene sequences from Bacillus species, including all 27 copies from SNU-TC2. All 16S rRNA sequences from SNU-TC2 were placed within the Bacillus amyloliquefaciens clade, forming a monophyletic group with strong bootstrap support (≥98). The tree was midpoint-rooted. The different rRNA gene types used in the analysis are represented by branch colors: dark purple indicates 16S rRNA, and lighter shades represent 23S and 5S rRNA, respectively. (B) Magnified view of the SNU-TC2 clade from the 16S rRNA tree shown in (A). This panel presents a detailed view of the phylogenetic relationships among strains closely related to SNU-TC2 on the basis of 16S rRNA sequences alone. (C) Collapsed full 16S rRNA gene maximum likelihood tree including 74 representative Bacillus strains. To enhance interpretability, major clades were collapsed to highlight the phylogenetic context of SNU-TC2 within the B. amyloliquefaciens lineage. The clade containing SNU-TC2 was retained in expanded form for clarity. The full version of the tree with all individual labels is provided in Supplementary Figure S3.
Figure 4. Whole-genome alignments of strain SNU-TC2 with two reference Bacillus amyloliquefaciens strains via FastANI. (A, left panel) Pairwise whole-genome alignment between SNU-TC2 and B. amyloliquefaciens DSM 7 (type strain). The analysis revealed an average nucleotide identity (ANI) of 97.56%, supporting species-level relatedness. (B, right panel) Alignment between SNU-TC2 and strain RD7-7, which presented the highest genomic similarity among all tested strains (98.66% ANI) and was used as the reference for downstream SNP and functional analyses. Red blocks indicate syntenic regions with high sequence identity; white gaps represent local genomic divergence. The color gradient corresponds to the percentage identity (from 74% to 100%).
Figure 5. Phylogenetic relationship of SNU-TC2 with related Bacillus strains. A maximum likelihood tree was reconstructed on the basis of core genome alignment. Model selection was performed via ModelFinder, and branch support was assessed with 1,000 replicates of both the SH-aLRT test and the ultrafast bootstrap approximation. The support values are shown as SH-aLRT%/UFBoot%. The branch lengths represent the number of substitutions per site, and nodes with SH-aLRT support below 50% were collapsed. Bacillus sp. 7D3 was used as an outgroup.
Figure 6. Heatmap of pairwise SNP differences between the query strain Bacillus amyloliquefaciens SNU-TC2 and the reference strain RD7-7. SNPs were identified via whole-genome alignment between the two genomes, and a distance matrix was constructed on the basis of the distribution of SNP positions. The heatmap visualizes SNP differences across the genome, with color gradients ranging from turquoise (low SNP density) through warm ivory to indigo (high SNP density).
from any funding agency in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical trial number:\u003c/strong\u003e not applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate:\u003c/strong\u003e not applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication:\u003c/strong\u003e not applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests:\u003c/strong\u003e The authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAll authors are affiliated with the College of Agriculture and Life Sciences, Seoul National University.\u003c/p\u003e\n\u003cp\u003eYeun Kyoung Kim conceived and conducted the study, performed experiments and analyses, and wrote the manuscript.\u003c/p\u003e\n\u003cp\u003eJincheol Cho contributed to the experimental work and prepared Figure 1.\u003c/p\u003e\n\u003cp\u003eHeebal Kim supervised the study and provided critical guidance.\u003c/p\u003e\n\u003cp\u003eAll authors approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSequence data that support the findings of this study have been deposited in the NCBI BioProject under accession code PRJNA1307060 (BioSample SAMN50652994).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eLebwohl B, Rubio-Tapia A. \u003cstrong\u003eEpidemiology, presentation, and diagnosis of celiac disease\u003c/strong\u003e. \u003cem\u003eGastroenterology.\u003c/em\u003e 2021;160(1):63\u0026ndash;75. 
https://doi.org/10.1053/j.gastro.2020.06.098\u003c/li\u003e\n \u003cli\u003eArentz-Hansen H, K\u0026ouml;rner R, Molberg \u0026Oslash;, Quarsten H, Vader W, Kooy YM, et al. \u003cstrong\u003eThe intestinal T-cell response to \u0026alpha;-gliadin in adult celiac disease is focused on a single deamidated glutamine targeted by tissue transglutaminase.\u003c/strong\u003e \u003cem\u003eJ Exp Med.\u003c/em\u003e 2000;191(4):603\u0026ndash;12. https://doi.org/10.1084/jem.191.4.603\u003c/li\u003e\n \u003cli\u003eShan L, Molberg \u0026Oslash;, Parrot I, Hausch F, Filiz F, Gray GM, et al. \u003cstrong\u003eStructural basis for gluten intolerance in celiac sprue.\u003c/strong\u003e \u003cem\u003eScience.\u003c/em\u003e 2002;297(5590):2275\u0026ndash;9. https://doi.org/10.1126/science.1074129\u003c/li\u003e\n \u003cli\u003eSalmi A, Collin P, Korponay-Szabo IR, Laurila K, Partanen J, Huhtala H, et al. \u003cstrong\u003eEndomysial antibody-negative coeliac disease: clinical characteristics and intestinal autoantibody deposits.\u003c/strong\u003e \u003cem\u003eGut.\u003c/em\u003e 2006;55(12):1746\u0026ndash;53. https://doi.org/10.1136/gut.2005.071514\u003c/li\u003e\n \u003cli\u003eSalmi A, Collin P. \u003cstrong\u003eCoeliac disease: villous atrophy and enteropathy following gluten exposure.\u003c/strong\u003e \u003cem\u003eGastroenterology.\u003c/em\u003e 2009;136(6):1977\u0026ndash;83. https://doi.org/10.1053/j.gastro.2008.11.040\u003c/li\u003e\n \u003cli\u003eDochat C, Afari N, Satherley R-M, Coburn S, McBeth JF. \u003cstrong\u003eCeliac disease symptom profiles and their relationship to gluten-free diet adherence, mental health, and quality of life.\u003c/strong\u003e \u003cem\u003eBMC Gastroenterol.\u003c/em\u003e 2024;24(1):9. https://doi.org/10.1186/s12876-023-03101-x\u003c/li\u003e\n \u003cli\u003eHall NJ, Rubin G, Charnock A. 
\u003cstrong\u003eSystematic review: adherence to a gluten-free diet in adult patients with coeliac disease.\u003c/strong\u003e \u003cem\u003eAliment Pharmacol Ther\u003c/em\u003e. 2009;30(4):315\u0026ndash;30. https://doi.org/10.1111/j.1365-2036.2009.04053.x\u003c/li\u003e\n \u003cli\u003eDe Angelis M, Rizzello CG, Fasano A, Clemente MG, De Simone C, Rizzello F, et al. \u003cstrong\u003e\u003cem\u003eLactobacillus fermentum\u003c/em\u003e attenuates gliadin-induced toxicity in intestinal epithelial cells\u003c/strong\u003e. \u003cem\u003eAppl Environ Microbiol\u003c/em\u003e. 2006;72(1):130\u0026ndash;138. https://doi.org/10.1186/1471-2180-14-19\u003c/li\u003e\n \u003cli\u003eHong Y-P, Chen B-H, Wang Y-W, Teng R-H, Wei H-L, Chiou C-S. \u003cstrong\u003eThe usefulness of nanopore sequencing in whole-genome sequencing-based genotyping of \u003cem\u003eListeria monocytogenes\u003c/em\u003e and \u003cem\u003eSalmonella enterica\u003c/em\u003e serovar Enteritidis.\u003c/strong\u003e \u003cem\u003eMicrobiol Spectr\u003c/em\u003e. 2024;10:e00509-24. https://doi.org/10.1128/spectrum.00509-24\u003c/li\u003e\n \u003cli\u003eKono N, Arakawa K. \u003cstrong\u003eNanopore sequencing: review of potential applications in functional genomics.\u003c/strong\u003e \u003cem\u003eWiley Interdiscip Rev Syst Biol Med\u003c/em\u003e. 2019;11(4):e1436. https://doi.org/10.1111/dgd.12608\u003c/li\u003e\n \u003cli\u003eBerger M, Sarantopoulos C, Ongchangco D, Sry J, Cesario T. \u003cstrong\u003eRapid isolation of gluten-digesting bacteria from human stool and saliva by using gliadin-containing plates\u003c/strong\u003e. \u003cem\u003eExp Biol Med (Maywood).\u003c/em\u003e 2015;240(7):917\u0026ndash;924. https://doi.org/10.1177/1535370214564748\u003c/li\u003e\n \u003cli\u003eOrlando A, Linsalata M, Notarnicola M, Tutino V, Russo F. 
\u003cstrong\u003e\u003cem\u003eLactobacillus GG\u003c/em\u003e restoration of the gliadin induced epithelial barrier disruption: the role of cellular polyamines.\u003c/strong\u003e \u003cem\u003eBMC Microbiol\u003c/em\u003e. 2014;14:19. https://doi.org/10.1186/1471-2180-14-19\u003c/li\u003e\n \u003cli\u003eCaminero A, et al. \u003cstrong\u003eDuodenal bacteria from active celiac patients and healthy individuals hydrolyze gliadin peptides differentially via secreted proteases.\u003c/strong\u003e \u003cem\u003eNat Commun.\u003c/em\u003e 2019;10:1371. https://doi.org/10.1038/s41467-019-09037-9\u003c/li\u003e\n \u003cli\u003eCaminero A, McCarville JL, Galipeau HJ, Deraison C, Bernalier-Donadille A, Jury J, et al. \u003cstrong\u003eDuodenal bacterial proteolytic activity determines sensitivity to dietary antigen through protease-activated receptor-2.\u003c/strong\u003e \u003cem\u003eNat Commun.\u003c/em\u003e 2019;10:1198. https://doi.org/10.1038/s41467-019-09037-9\u003c/li\u003e\n \u003cli\u003eKim Y, Oh S, Kim SH. \u003cstrong\u003eReleased exopolysaccharides from \u003cem\u003eLactobacillus plantarum\u003c/em\u003e and \u003cem\u003eBacillus subtilis\u003c/em\u003e effectively inhibit foodborne pathogens by enhancing host defense in mice.\u003c/strong\u003e \u003cem\u003eFood Sci Biotechnol.\u003c/em\u003e 2020;29(11):1449\u0026ndash;1456. https://doi.org/10.1007/s00253-018-8946-0\u003c/li\u003e\n \u003cli\u003eJain M, et al. \u003cstrong\u003eNanopore sequencing and assembly of a human genome with ultralong reads.\u003c/strong\u003e \u003cem\u003eNat Biotechnol.\u003c/em\u003e 2018;36(4):338\u0026ndash;345. https://doi.org/10.1038/nbt.4060\u003c/li\u003e\n \u003cli\u003eWang Y, Zhao Y, Bollas A, Wang Y, Au KF. \u003cstrong\u003eNanopore sequencing technology, bioinformatics and applications.\u003c/strong\u003e \u003cem\u003eNat Biotechnol\u003c/em\u003e. 2021;39:1348\u0026ndash;1365. 
https://doi.org/10.1038/s41587-021-01108-x\u003c/li\u003e\n \u003cli\u003eOxford Nanopore Technologies. (2019). \u003cstrong\u003eGuppy basecalling software.\u003c/strong\u003e https://community.nanoporetech.com\u003c/li\u003e\n \u003cli\u003eWick RR, Judd LM, Holt KE\u003cstrong\u003e. Performance of neural network basecalling tools for Oxford Nanopore sequencing.\u0026nbsp;\u003c/strong\u003e\u003cem\u003eGenome Biol\u003c/em\u003e. 2019;20:129. https://doi.org/10.1186/s13059-019-1727-y\u003c/li\u003e\n \u003cli\u003eWick R. \u003cstrong\u003ePorechop: adapter trimmer for Oxford Nanopore reads.\u003c/strong\u003e 2017. Available from: https://github.com/rrwick/Porechop\u003c/li\u003e\n \u003cli\u003eDe Coster W, D\u0026rsquo;Hert S, Schultz DT, Cruts M, Van Broeckhoven C. \u003cstrong\u003eNanoPack: visualizing and processing long-read sequencing data.\u003c/strong\u003e \u003cem\u003eBioinformatics\u003c/em\u003e. 2018;34(15):2666\u0026ndash;9. https://doi.org/10.1093/bioinformatics/bty149\u003c/li\u003e\n \u003cli\u003eWick R. \u003cstrong\u003eFiltlong: quality filtering for long reads\u003c/strong\u003e. 2018. Available from: https://github.com/rrwick/Filtlong\u003c/li\u003e\n \u003cli\u003eKoren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. \u003cstrong\u003eCanu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.\u003c/strong\u003e \u003cem\u003eGenome Res\u003c/em\u003e. 2017;27(5):722\u0026ndash;36. https://doi.org/10.1101/gr.215087.116\u003c/li\u003e\n \u003cli\u003eKolmogorov M, Yuan J, Lin Y, Pevzner PA. \u003cstrong\u003eAssembly of long, error-prone reads using repeat graphs.\u003c/strong\u003e \u003cem\u003eNat Biotechnol\u003c/em\u003e. 2019;37(5):540\u0026ndash;6. https://doi.org/10.1038/s41587-019-0072-8\u003c/li\u003e\n \u003cli\u003eWick RR, Judd LM, Cerdeira LT, Hawkey J, Meric G, Vezina B, et al\u003cstrong\u003e. 
Trycycler: consensus long-read assemblies for bacterial genomes.\u003c/strong\u003e \u003cem\u003eGenome Biol\u003c/em\u003e. 2021;22:266. https://doi.org/10.1186/s13059-021-02483-z\u003c/li\u003e\n \u003cli\u003eOxford Nanopore Technologies. \u003cstrong\u003eMedaka: sequence correction software\u003c/strong\u003e. 2020. Available from: https://github.com/nanoporetech/medaka\u003c/li\u003e\n \u003cli\u003eLee JY, Kong M, Oh J, Lim JS, Chung SH, Kim JM, et al. \u003cstrong\u003eComparative evaluation of Nanopore polishing tools for microbial genome assembly and polishing strategies for downstream analysis.\u003c/strong\u003e \u003cem\u003eSci Rep.\u003c/em\u003e 2021;11:20740. https://doi.org/10.1038/s41598-021-00178-w\u003c/li\u003e\n \u003cli\u003eSeemann T\u003cstrong\u003e. Prokka: rapid prokaryotic genome annotation.\u0026nbsp;\u003c/strong\u003e\u003cem\u003eBioinformatics.\u003c/em\u003e 2014;30(14):2068\u0026ndash;2069. https://doi.org/10.1093/bioinformatics/btu153\u003c/li\u003e\n \u003cli\u003eGrant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen C-Y, Graham M, Van Domselaar G, Stothard P. \u003cstrong\u003eProksee: in-depth characterization and visualization of bacterial genomes.\u003c/strong\u003e \u003cem\u003eNucleic Acids Res.\u003c/em\u003e 2023;51(W1): W484\u0026ndash;W492. https://doi.org/10.1093/nar/gkad326\u003c/li\u003e\n \u003cli\u003eSim\u0026atilde;o FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. \u003cstrong\u003eBUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.\u003c/strong\u003e \u003cem\u003eBioinformatics\u003c/em\u003e. 2015;31(19):3210\u0026ndash;2. https://doi.org/10.1093/bioinformatics/btv351\u003c/li\u003e\n \u003cli\u003eGurevich A, Saveliev V, Vyahhi N, Tesler G. \u003cstrong\u003eQUAST: quality assessment tool for genome assemblies.\u003c/strong\u003e \u003cem\u003eBioinformatics.\u003c/em\u003e 2013;29(8):1072\u0026ndash;5. 
https://doi.org/10.1093/bioinformatics/btt086\u003c/li\u003e\n \u003cli\u003eSeemann T. \u003cstrong\u003eBarrnap 0.9: Bacterial ribosomal RNA predictor\u003c/strong\u003e. 2014. Available from: https://github.com/tseemann/barrnap\u003c/li\u003e\n \u003cli\u003eKatoh K, Standley DM. \u003cstrong\u003eMAFFT multiple sequence alignment software version 7: improvements in performance and usability\u003c/strong\u003e. \u003cem\u003eMol Biol Evol.\u003c/em\u003e 2013;30(4):772\u0026ndash;80. https://doi.org/10.1093/molbev/mst010\u003c/li\u003e\n \u003cli\u003eMinh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. \u003cstrong\u003eIQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era.\u003c/strong\u003e \u003cem\u003eMol Biol Evol.\u003c/em\u003e 2020;37(5):1530\u0026ndash;4. https://doi.org/10.1093/molbev/msaa015\u003c/li\u003e\n \u003cli\u003eLetunic I, Bork P. \u003cstrong\u003eInteractive Tree of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation.\u003c/strong\u003e \u003cem\u003eNucleic Acids Res\u003c/em\u003e. 2021;49(W1):W293\u0026ndash;6. https://doi.org/10.1093/nar/gkab301\u003c/li\u003e\n \u003cli\u003eParks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil PA, Hugenholtz P. \u003cstrong\u003eA complete domain-to-species taxonomy for Bacteria and Archaea.\u003c/strong\u003e \u003cem\u003eNat Biotechnol\u003c/em\u003e. 2020;38(9):1079\u0026ndash;86. https://doi.org/10.1038/s41587-020-0501-8\u003c/li\u003e\n \u003cli\u003eRichter M, Rossell\u0026oacute;-M\u0026oacute;ra R. \u003cstrong\u003eShifting the genomic gold standard for the prokaryotic species definition.\u003c/strong\u003e \u003cem\u003eProc Natl Acad Sci U S A\u003c/em\u003e. 2009;106(45):19126\u0026ndash;19131. https://doi.org/10.1073/pnas.0906412106\u003c/li\u003e\n \u003cli\u003eJain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 
\u003cstrong\u003eHigh throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries\u003c/strong\u003e. \u003cem\u003eNat Commun\u003c/em\u003e. 2018;9:5114. https://doi.org/10.1038/s41467-018-07641-9\u003c/li\u003e\n \u003cli\u003ePage AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, et al. \u003cstrong\u003eRoary: rapid large-scale prokaryote pan genome analysis.\u003c/strong\u003e \u003cem\u003eBioinformatics.\u0026nbsp;\u003c/em\u003e2015;31(22):3691\u0026ndash;3693. https://doi.org/10.1093/bioinformatics/btv421\u003c/li\u003e\n \u003cli\u003eKatoh K, Standley DM. \u003cstrong\u003eMAFFT multiple sequence alignment software version 7: improvements in performance and usability.\u0026nbsp;\u003c/strong\u003e\u003cem\u003eMol Biol Evol.\u003c/em\u003e 2013;30(4):772\u0026ndash;780. https://doi.org/10.1093/molbev/mst010\u003c/li\u003e\n \u003cli\u003eMinh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Haeseler A von, Lanfear R. \u003cstrong\u003eIQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era.\u003c/strong\u003e \u003cem\u003eMol Biol Evol.\u003c/em\u003e 2020;37(5):1530\u0026ndash;1534. https://doi.org/10.1093/molbev/msaa015\u003c/li\u003e\n \u003cli\u003eLetunic I, Bork P. \u003cstrong\u003eInteractive Tree of Life (iTOL) v6: an online tool for phylogenetic tree display and annotation.\u0026nbsp;\u003c/strong\u003e\u003cem\u003eNucleic Acids Res\u003c/em\u003e. 2021;49(W1): W293\u0026ndash;W296. https://doi.org/10.1093/nar/gkab301\u003c/li\u003e\n \u003cli\u003eSeemann T. \u003cstrong\u003esnp-dists: pairwise SNP distance matrix from a FASTA alignment\u003c/strong\u003e. https://github.com/tseemann/snp-dists\u003c/li\u003e\n \u003cli\u003eKolde R. pheatmap: \u003cstrong\u003ePretty Heatmaps. R package version 1.0.12.\u003c/strong\u003e https://cran.r-project.org/package=pheatmap\u003c/li\u003e\n \u003cli\u003eSeemann T. 
\u003cstrong\u003eSnippy: fast bacterial variant calling and core genome alignment.\u003c/strong\u003e https://github.com/tseemann/snippy\u003c/li\u003e\n \u003cli\u003eCingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. \u003cstrong\u003eA program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff.\u003c/strong\u003e \u003cem\u003eFly (Austin).\u003c/em\u003e 2012;6(2):80\u0026ndash;92. https://doi.org/10.4161/fly.19695\u003c/li\u003e\n \u003cli\u003eCantalapiedra CP, Hern\u0026aacute;ndez-Plaza A, Letunic I, Bork P, Huerta-Cepas J. \u003cstrong\u003eeggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale.\u003c/strong\u003e \u003cem\u003eMol Biol Evol\u003c/em\u003e. 2021;38(12):5825\u0026ndash;5829. https://doi.org/10.1093/molbev/msab293\u003c/li\u003e\n \u003cli\u003eChaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. \u003cstrong\u003eGTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database\u003c/strong\u003e. \u003cem\u003eBioinformatics\u003c/em\u003e. 2020;36(6):1925\u0026ndash;1927. https://doi.org/10.1093/bioinformatics/btz848\u003c/li\u003e\n \u003cli\u003eParks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, Hugenholtz P. \u003cstrong\u003eA standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life.\u003c/strong\u003e \u003cem\u003eNat Biotechnol\u003c/em\u003e. 2018;36(10):996\u0026ndash;1004. https://doi.org/10.1038/nbt.4229\u003c/li\u003e\n \u003cli\u003eHassler HB, Probert B, Moore C, Lawson E, Jackson RW, Russell BT, Richards VP. \u003cstrong\u003ePhylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies.\u003c/strong\u003e \u003cem\u003eBMC Genomics\u003c/em\u003e. 2022;23(1):513. 
https://doi.org/10.1186/s40168-022-01295-y\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"gluten degradation, gliadin, whole-genome sequencing, Oxford Nanopore, Bacillus amyloliquefaciens, SNP analysis, celiac disease","lastPublishedDoi":"10.21203/rs.3.rs-6993182/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6993182/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground: \u003c/strong\u003eGluten, a major protein found in wheat, rye, and barley, plays a crucial role in dough formation. Among its components, α-, β-, and γ-gliadins are resistant to enzymatic hydrolysis and can accumulate in the human digestive tract, potentially triggering celiac disease (CD) or nonceliac gluten sensitivity in genetically predisposed individuals.\u003cstrong\u003e\u003cbr\u003e\nObjective: \u003c/strong\u003eThis study aimed to identify and characterize a bacterial strain capable of degrading gliadin via whole-genome sequencing and comparative genomic analysis.\u003cbr\u003e\n\u003cstrong\u003eMethods: \u003c/strong\u003eA gliadin-degrading \u003cem\u003eBacillus amyloliquefaciens strain\u003c/em\u003e, designated SNU-TC2, was isolated on the basis of its ability to form a clear zone on gliadin-containing agar. Whole-genome sequencing was performed viaOxford Nanopore Technology, followed by genome assembly and annotation. 
Taxonomic classification and genomic comparisons were conducted viaFastANI, core gene phylogenetic, and SNP-based analyses.\u003cbr\u003e\n\u003cstrong\u003eResults: \u003c/strong\u003eSNU-TC2 showed \u0026gt;97% nucleotide identity to reference \u003cem\u003eB. amyloliquefaciens\u003c/em\u003estrains, with the highest similarity (98.66%) to RD7-7. However, SNP-based comparisonsrevealed 30,736 variants, including several functionally significant mutations. These findings suggest that, while taxonomically similar, SNU-TC2 may harbor distinct genetic features relevant to gliadin degradation.\u003cbr\u003e\n\u003cstrong\u003eConclusions: \u003c/strong\u003eSNU-TC2 represents a promising candidate for further functional validation studies aimed at gluten degradation. Its genomic distinctiveness within the \u003cem\u003eB. amyloliquefaciens\u003c/em\u003e strain complex warrants continued investigation for potential application in celiac disease management\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e","manuscriptTitle":"Whole-Genome Sequencing of Gliadin Degrading Genes Bacillus amyloliquefaciens Strain via Nanopore Technology","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-18 16:51:40","doi":"10.21203/rs.3.rs-6993182/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a392304e-45c7-4dc5-a03b-a0bacca2beb1","owner":[],"postedDate":"September 18th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-10-13T04:38:55+00:00","versionOfRecord":[],"versionCreatedAt":"2025-09-18 16:51:40","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6993182","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6993182","identity":"rs-6993182","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}