Longitudinal Gut Microbiota Tracking Reveals the Persistent Spread of Mobile Genes and HGT-Driven Community Stabilization

Jingyuan Fu, Haoran Peng, Sergio Andreu-Sánchez, Angel Ruiz-Moreno, and 5 more

This is a Research Square preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6509357/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 22 Nov, 2025; the published version is in Nature Communications. Version 1 posted.

Abstract

Horizontal gene transfer (HGT) is a major driver of bacterial evolution, but its role in shaping the human gut microbiome over time remains poorly understood. Here, we present a longitudinal metagenomic analysis of 676 fecal samples from 338 individuals collected ~4 years apart, using a newly developed workflow to detect recent HGT events from metagenome-assembled genomes.
We identified 5,644 high-confidence HGT events occurring within the past ~10,000 years across 116 gut bacterial species. We find that species pairs with an HGT relationship were significantly more likely to maintain stable ecological relationships over the 4-year period, suggesting that gene exchange contributes to ecological stability. Notably, HGT and strain replacement act together to disseminate mobile genes in the population. Furthermore, our observation that an individual's mobile gene pool remains highly personalized and stable over time indicates that host lifestyles drive specific gene transfer. For example, proton pump inhibitor usage was linked to increased transfer of multidrug transporter genes. Our findings demonstrate, at the level of the individual gut microbiome, that HGT is both an integral and stabilizing force in the human gut ecosystem and an important mechanism for disseminating adaptive functions, underscoring the potential of mobile genes for tracking host lifestyle.

Subject areas: Biological sciences/Microbiology/Microbial genetics/Bacterial genetics; Biological sciences/Ecology/Ecological networks; Health sciences/Medical research/Epidemiology; Health sciences/Risk factors; Biological sciences/Microbiology/Bacteria/Metagenomics

Keywords: human gut microbiome; horizontal gene transfer; microbial community; mobile genes; co-abundance; drugs; strain transmission; gene transmission

INTRODUCTION

Horizontal gene transfer (HGT) is a microbial mechanism for the acquisition of genetic material from non-parental lineages that plays a significant role in microbial adaptive evolution and interactions [1, 2]. Although it has been proposed that bacteria rarely work together [3], highly abundant co-occurring bacteria tend to exchange more genes [4, 5], and HGT has been shown to help competitive bacteria co-exist [6] and to maintain the stability of the microbial community [7, 8].
The human gut microbiome harbors one of the densest, most interactive microbial ecosystems in the body. Gene exchange via HGT is pervasive among gut bacteria [9], and it has been linked to host lifestyle factors such as geography, diet, and medication use [10–13]. Non-industrialized populations, for instance, exhibit higher rates of HGT involving carbohydrate-active enzymes than industrialized populations [5]. Moreover, HGT can contribute to bacterial functionality at the level of both individual bacteria and the community. Evidence suggests that the metabolic potential of the infant gut microbiome can be influenced by mobile genetic elements from the maternal microbiome [14]. Accordingly, there is increasing interest in understanding the role of HGT in shaping the dynamics of microbial ecosystems from global to individual scales [12], particularly for recent human history. Prior work indicates that genes transferred within the last ~10,000 years often confer contemporary adaptive advantages, such as defense mechanisms and antimicrobial resistance, whereas more ancient transfers tend to involve core metabolic functions [4]. However, few studies have investigated HGT dynamics within individual gut microbiomes over time. One recent study that assessed HGTs in 26 samples longitudinally over a maximum of 1 year concluded that HGT events are both unique within individuals and change over time [13]. To date, no comprehensive long-term study has addressed how ongoing HGT shapes gut community structure or how these transfers relate to host phenotypic traits [15]. In this study, we profiled HGT events within a longitudinal microbiome cohort of 338 participants from the Lifelines-DEEP study. For these participants, fecal microbiome samples had been collected and sequenced at two time points 4 years apart [16], accompanied by collection of detailed phenotype information.
To capture recent HGT events (within ~10,000 years), we developed HGT Detection from MAGs at Individual level (HDMI), a dedicated HGT detection pipeline based on metagenome-assembled genomes (MAGs). Using this approach, we profiled thousands of HGT events and evaluated their persistence, spread, and impact on the gut microbial community. We show that HGT events contribute to stabilizing species interactions over time and can disseminate adaptive functions across hosts. We also link specific gene transfers to host factors such as medication use, providing new insight into how lifestyle pressures drive microbiome evolution.

RESULTS

Cohort and the detection of HGTs in the gut microbiome

This study included 676 paired microbiome samples from 338 individuals in the Lifelines-DEEP cohort from the northern Netherlands, collected at two time points approximately 4 years apart. At baseline, participants had a mean age of 48.2 years (range = 18–80, SD = 11.7) and a mean BMI of 25.4 (range = 17.6–43.3, SD = 4.08). At follow-up, the mean age was 51.7 years (range = 22–84, SD = 11.7) and the mean BMI was 25.6 (range = 16.1–37.6, SD = 4.0). The phenotypic data assessed included anthropometric traits (e.g., age, sex, BMI, height) and the use of 14 medications (e.g., proton pump inhibitors [PPIs], oral contraceptives, beta-blockers, statins). From the baseline samples (n = 338), we reconstructed 1,473 high-quality MAGs (mean completeness = 95.14 ± 2.68%, mean contamination = 0.69 ± 0.75%) (see Methods, Fig. 1a–c) that represent 192 distinct bacterial species (Table S1). To identify HGT events, we developed the HDMI pipeline, which reliably identifies recent HGT events between MAGs. We defined an HGT event as a DNA transfer detected between two distinct genomes that is characterized by the presence of a pair of highly similar DNA regions (> 500 bp, > 99% identity, and < 95% average nucleotide identity (ANI) between genomes).
Each transferred DNA region in the donor or recipient genome is referred to as an HGT segment. Thus, a single HGT event involves two corresponding segments, each located in a separate genome, which may differ in length, genomic context, or precise boundaries (Fig. S1, see Methods). In brief, HDMI combines sequence similarity searches (identifying > 500 bp regions with > 99% identity between genomes of < 95% ANI) with multiple quality control steps (e.g., split-read validation and conserved-gene filtering) to ensure high-confidence HGT detection. Compared to other HGT detection tools, such as the recently developed WAAFLE [13], HDMI offers several advantages: 1) it focuses on detecting recent HGT events (occurring within the past 0–10,000 years), 2) it enables HGT detection at the individual level, and 3) it has high sensitivity for detecting HGT between bacteria at various phylogenetic distances, especially for intra-genus events (Fig. S2, Supplementary Note 1). Detailed descriptions of all methods, including metagenomic assembly, genome binning, HGT detection, and benchmarking, are provided in the Methods and Supplementary Note 1.

Using our HGT detection pipeline, we identified 5,644 high-confidence, recent HGT events (occurring within the past ~10,000 years, Table S2). Our data show that HGT is common in the gut microbiome. Out of 1,473 MAGs, 901 MAGs (61.2%), representing 116 unique species, were involved in at least one HGT event, with a total of 7,581 HGT segments identified (Fig. 1a, Table S3). Segment sizes ranged from 0.5 to 64 kb (median ~1.65 kb), and the median nucleotide identity between donor and recipient segments was 99.36%, consistent with transfers occurring within the past 10,000 years.
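The three numeric criteria in the HGT definition above can be illustrated with a minimal Python sketch. This is not the HDMI implementation: the `Hit` record, its field names, and the example values are hypothetical, and the quality control steps mentioned in the text (split-read validation, conserved-gene filtering) are not modeled. Only the thresholds (> 500 bp, > 99% identity, < 95% ANI between genomes) come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the constant names are hypothetical.
MIN_LENGTH_BP = 500          # shared region must be longer than this
MIN_SEGMENT_IDENTITY = 99.0  # percent identity of the shared region
MAX_GENOME_ANI = 95.0        # percent ANI between the two genomes


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment between two MAGs (hypothetical record layout)."""
    genome_a: str
    genome_b: str
    length_bp: int
    identity: float    # percent identity of the aligned region
    genome_ani: float  # percent ANI between the two genomes


def is_candidate_hgt(hit: Hit) -> bool:
    """Apply only the three numeric criteria from the HGT definition."""
    return (
        hit.genome_a != hit.genome_b
        and hit.length_bp > MIN_LENGTH_BP
        and hit.identity > MIN_SEGMENT_IDENTITY
        and hit.genome_ani < MAX_GENOME_ANI
    )


# Made-up hits, for illustration only.
hits = [
    Hit("MAG_001", "MAG_002", 1650, 99.36, 82.0),  # meets all three criteria
    Hit("MAG_001", "MAG_003", 420, 99.90, 80.0),   # region too short
    Hit("MAG_004", "MAG_005", 3000, 99.80, 97.5),  # genomes too similar (ANI >= 95%)
    Hit("MAG_006", "MAG_007", 2000, 98.50, 75.0),  # region identity too low
]
candidates = [h for h in hits if is_candidate_hgt(h)]
print([f"{h.genome_a}-{h.genome_b}" for h in candidates])
```

The ANI ceiling keeps the filter from flagging regions that are similar simply because the two genomes are close relatives, so a highly similar region only counts when the surrounding genomes are clearly distinct.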
Gene functions enriched in HGT

Next, we hypothesized that two sets of genes would likely be enriched in recent HGT events in the human gut: genes facilitating gene transfer (e.g., mobilome components such as prophages and transposons) and those involved in defense mechanisms, including antimicrobial resistance. To investigate this, we performed a functional enrichment analysis comparing the genes present in HGT segments to those in the pangenome (see Methods, Table S4). In agreement with previous studies [17], functions associated with gene transfer were significantly enriched, including transposase, recombinase, MobA/MobL, and phage integrase (Fig. 2a). These self-transmissible genes, which can move among various species and occasionally carry host genes, tend to have conserved transfer elements despite the variability in the host genes they transport. For example, we observed that a 5-kb HGT segment carrying a recombinase was shared among seven genera (Fig. S3). Additionally, a bacteriophage of the class Caudoviricetes, identified in a 2.7-kb HGT segment using geNomad [18], was shared among four families (Fig. 2b, Fig. S3, Supplemental Note 2). Moreover, 154 HGT events showed 100% nucleotide identity, which we estimated to correspond to transfers within the past 100 years. These included four recent transfer events linked to the spread of tetracycline resistance genes that showed 100% identity across four different genera: Prevotella, Anaerobutyricum, Bacteroides, and Phocaeicola (Fig. 2c, Fig. S4, Supplemental Note 3). This observation aligns with the widespread use of tetracyclines starting in the 1950s [19]. Furthermore, we observed significant enrichment for functions involved in the biosynthesis of the molybdenum cofactor (MoCo), including MoCF biosynthesis, MoaC, and FdhD-NarQ (Fig. 2a).
Proteins in the FdhD-NarQ family, such as FdhD, facilitate the insertion of MoCo into target enzymes, while MoaC and MoCF_biosynth catalyze the early and later steps of MoCo biosynthesis, respectively. Together, these components are critical for the MoCo pathway, and MoCo-dependent enzymes are essential for anaerobic respiration in microorganisms [20, 21]. Additionally, motile bacteria (likelihood-ratio test [LRT], p-value = 0.00228) and bacteria harboring MoaC (LRT, p-value = 0.0233) displayed significantly higher degree centralities within the HGT network, suggesting that these species act as hubs capable of engaging in gene transfer with a diverse array of other species (Fig. 2d, Fig. S5, Supplemental Note 4). Previous studies have shown that overexpression of MoCo synthesis genes promotes the growth of Escherichia coli in inflamed intestines [21] and that disruption of these genes leads to poor E. coli growth [22]. This suggests that MoCo plays a critical role in enabling microorganisms to thrive in the hypoxic environment of the intestine. Our findings extend this observation by demonstrating that HGT can facilitate the widespread prevalence of MoCo synthesis genes among intestinal microorganisms, not just in E. coli, thereby enhancing their adaptation to hypoxic conditions. However, further research is needed to fully elucidate the role of HGT in this process.

HGT fosters stable microbial communities

We next hypothesized that species with an HGT relationship may show stable co-abundance relationships over time due to their ecological interactions. To test this, we calculated pairwise correlations among 192 species based on their abundance across individuals at each time point. We identified 1,968 species pairs with significant positive correlations (p-value < 0.01, rho > 0.2) and 1,702 species pairs with significant negative correlations (p-value < 0.01, rho < -0.2) at both time points (Fig. 3a, see Methods).
Notably, 73.7% (73/99) of species pairs with an HGT relationship at baseline exhibited stable positive co-abundance relationships, a 2.24-fold enrichment (p-value = 4.5 × 10^-21) over the 32.9% (1979/6001) observed for species pairs without an HGT relationship (Fig. 3b). These findings suggest that HGT may indeed reinforce stable ecological relationships. Focusing on species with stable co-abundance relationships (both positive-positive and negative-negative relationships), we constructed a co-abundance network that was further clustered into seven sub-communities using Leiden clustering [23] (Fig. 3c, see Methods). The three largest sub-communities were Communities A (n = 47), B (n = 44), and C (n = 92). Within each sub-community, nearly all species were positively correlated (Fig. 3d), whereas species across different communities were predominantly negatively correlated. Interestingly, the occurrence of HGT events was not evenly distributed among these communities. In Community C, about 10.5% (135/1280) of the co-abundance relationships also exhibited an HGT relationship, compared to 1.4% (5/353) in Community A (Fisher's exact test, p-value = 5.88 × 10^-10) and 3.1% (11/354) in Community B (Fisher's exact test, p-value = 2.65 × 10^-6) (Fig. 3d). Furthermore, Community C was significantly enriched in virulence factors (VFs) (LRT, p-value = 2.0 × 10^-5), antibiotic resistance genes (ARGs) (LRT, p-value = 3.6 × 10^-2), and antimicrobial peptide resistance genes (AMPRs) (LRT, p-value = 7.84 × 10^-5) (Fig. S6, Table S5) compared to Communities A and B, while no significant differences in these gene categories were observed between Communities A and B (LRT, p-value > 0.1). In addition, temporal changes in the composition of Community C, as measured by Bray-Curtis distances, were significantly lower than those observed for Communities A and B (p-value = 1.13 × 10^-20, Fig. 3e, see Methods).
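The enrichment figures quoted above can be checked with a short, standard-library Python sketch. It recomputes the fold enrichment from the stated counts and runs a generic two-sided Fisher's exact test on the Community C versus Community A counts. The function names are hypothetical, the test is a textbook implementation rather than the authors' analysis code, and the resulting p-value may differ from the reported one depending on the sidedness and software used.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x: int) -> float:
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(total, col1))

    observed = log_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(exp(lp) for lp in map(log_prob, range(lo, hi + 1))
            if lp <= observed + 1e-7)  # small tolerance for float ties
    return min(1.0, p)


# Counts quoted in the text.
# Stable positive co-abundance: 73 of 99 HGT pairs vs 1979 of 6001 non-HGT pairs.
fold = (73 / 99) / (1979 / 6001)

# Co-abundance relationships that also show HGT: Community C 135/1280, A 5/353.
odds_ratio = (135 * (353 - 5)) / (5 * (1280 - 135))
p_c_vs_a = fisher_exact_two_sided(135, 1280 - 135, 5, 353 - 5)

print(f"fold enrichment: {fold:.2f}")
print(f"odds ratio, Community C vs A: {odds_ratio:.2f}")
print(f"two-sided Fisher p-value, C vs A: {p_c_vs_a:.2e}")
```

Working in log space with `lgamma` avoids overflow in the binomial coefficients for margins of this size.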
Consistent with previous reports that HGT is more likely to occur between phylogenetically close species [5, 13], our data show that phylogenetically close bacteria exhibited a higher HGT rate (partial Mantel test with 9,999 permutations, p-value = 0.0002, Fig. 3f, Table S6). We also observed a negative relationship between phylogenetic distance and co-abundance strength (partial Mantel test, p-value = 0.002, Fig. 3g). Importantly, even after controlling for phylogenetic distance, species pairs with an HGT relationship still demonstrated stronger positive co-abundance relationships (partial Mantel test with 9,999 permutations, p-value = 0.001, Fig. 3h). Together, these findings suggest that species pairs involved in HGT tend to accumulate more resistance genes and to form more stable ecological interactions.

Propagation of HGT segments in the gut community is mediated by both HGT and strain replacement

To quantify the spread of HGT segments over time, we compared the prevalence of each HGT segment across individuals at baseline versus follow-up. On average, an HGT segment was present in 25.6% of the 338 individuals at baseline, and this prevalence increased to 27.1% at follow-up (p-value = 3.57 × 10^-200, Fig. 4a). Overall, 4,696 of the 7,581 segments (62.0%) were detected in a greater fraction of individuals after 4 years compared to baseline, while 2,530 segments (33.4%) became less prevalent (Fig. 4b). This indicates a net propagation of transferable segments within the population over the study period. The segment with the largest increase was a 2.7-kb region annotated as bacteriophage in Agathobacter rectalis, with a prevalence that increased from 7.0% to 30.2% after 4 years (Fig. 4c). Another notable increase in Anaerobutyricum hallii was linked to a 5.2-kb segment encoding β-lactamase, with its prevalence rising from 40.5% to 52.1%.
As β-lactamase is involved in bacterial resistance to β-lactam antibiotics, this observation suggests a potential risk of widespread dissemination of ARGs within the population. Note that the observed increase in HGT segment prevalence may be explained by two scenarios. In the first, a segment is directly transferred from one species to another. In the second, a strain lacking the HGT segment is replaced by one that carries it. We hypothesized that phylogenetic analysis can distinguish between these scenarios. If the strain acquired the segment during the 4 years, the samples from both time points should cluster together because their genetic background is the same. If a baseline strain without the segment is replaced by another strain with the segment, samples from the two time points would cluster in different clades because their genetic backgrounds differ. To assess this, we first visualized changes in HGT segments within individuals over the 4-year period (Fig. S7), which identified an interesting cluster of HGT segments encoding glycoside hydrolase genes associated with A. rectalis. For 24 individuals, the presence or absence status of this A. rectalis segment had changed in 4 years, and we constructed the phylogenetic tree of A. rectalis based on its marker genes for those individuals at both time points (Fig. 4d, see Methods). Interestingly, for 12 individuals, the A. rectalis strains at the two time points clustered together (phylogenetic distance < 0.1), indicating a true HGT event resulting in the gain (or loss) of the segment within the 4-year interval (Fig. 4e). For the other 12 individuals, the A. rectalis strains with or without the segment from the two time points clustered into different clades (Fig. 4f), demonstrating that the change in the presence of HGT segments was due to strain replacement.
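The decision rule described above (same strain background implies gene gain or loss, different background implies strain replacement) can be written as a small Python sketch. The 0.1 phylogenetic distance cutoff is the value quoted in the text. The function name, the labels, and the example records are hypothetical, and reducing "clustering together" to a single pairwise distance is a simplifying assumption rather than the authors' tree-based procedure.

```python
# Cutoff quoted in the text: strains from the two time points are treated as
# the same strain background when their phylogenetic distance is below 0.1.
DISTANCE_THRESHOLD = 0.1


def classify_segment_change(present_baseline: bool,
                            present_followup: bool,
                            strain_distance: float) -> str:
    """Label a change in segment presence for one individual and species.

    strain_distance is the phylogenetic distance between the baseline and
    follow-up strains of the species (assumed to be precomputed).
    """
    if present_baseline == present_followup:
        return "unchanged"
    if strain_distance < DISTANCE_THRESHOLD:
        # Same genetic background at both time points: the segment itself moved.
        return "segment gained" if present_followup else "segment lost"
    # Different genetic background: a different strain took over.
    return "strain replacement"


# Made-up records: (individual, present at baseline, present at follow-up, distance).
records = [
    ("ind_01", False, True, 0.02),
    ("ind_02", True, False, 0.05),
    ("ind_03", False, True, 0.45),
    ("ind_04", True, True, 0.01),
]
for individual, before, after, distance in records:
    print(individual, classify_segment_change(before, after, distance))
```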
An individual's HGT remains highly personalized and stable over time, indicating that HGT may serve as a lasting record of host lifestyle

Although the overall prevalence of HGT segments increased over time, the prevalence of 5,454 segments (71.9% of all HGT segments) changed minimally over the 4-year period, varying by no more than 5% from baseline (Fig. 5a). We therefore investigated whether overall HGT profiles are individual-specific and persistent over time. Using Jaccard distance to assess the similarity of HGT profiles between the two time points, we found that these profiles were highly individual-specific, with temporal changes smaller than inter-individual differences (Wilcoxon-Mann-Whitney test with 9,999 permutations, p-value < 1 × 10^-4, Fig. 5b). Based on HGT profiles, we could correctly match 194 out of 338 paired samples from the same donor, a significantly better performance than could be achieved using taxonomic composition alone (42/338, Fig. 5c) or microbial pathways alone (16/338). These findings suggest that microbial individual-specificity [16] is reflected not only in microbial composition and genetic makeup but also in ecological interactions among microbes. Since mobile genes are stable and highly individual-specific, we hypothesized that specific lifestyle factors might drive specific gene transfer events. Given that antibiotic use is already well known to promote the transfer of ARGs [24], we instead focused on the often underestimated influence of non-antibiotic drug use on HGT in the gut microbiome. To estimate the effects of non-antibiotic drugs on HGT changes, we calculated the abundance of mobile genes in HGT segments (see Methods) and applied linear mixed models (LMMs) that accounted for age, sex, and read count as fixed effects and treated individual ID and time point as random effects. This analysis identified 48 positive and 14 negative associations (false discovery rate (FDR) < 0.05, Table S7).
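The donor-matching analysis described earlier in this subsection can be sketched as follows in Python. The exact matching procedure is not spelled out in the text, so this sketch assumes a nearest-neighbor rule: each baseline sample is assigned to the follow-up sample with the smallest Jaccard distance. The function names and the toy presence/absence profiles are hypothetical.

```python
from typing import Dict, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets of detected HGT segments."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def match_donors(baseline: Dict[str, Set[str]],
                 followup: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each baseline sample to its nearest follow-up sample.

    Nearest-neighbor matching is an assumption made for this sketch.
    """
    return {
        donor: min(followup,
                   key=lambda other: jaccard_distance(profile, followup[other]))
        for donor, profile in baseline.items()
    }


# Toy HGT presence profiles (segment identifiers are made up).
baseline = {
    "P1": {"s1", "s2", "s3"},
    "P2": {"s4", "s5"},
    "P3": {"s1", "s6"},
}
followup = {
    "P1": {"s1", "s2", "s3", "s7"},
    "P2": {"s4", "s5", "s6"},
    "P3": {"s6", "s8"},
}

matches = match_donors(baseline, followup)
n_correct = sum(donor == matched for donor, matched in matches.items())
print(f"correctly matched {n_correct} of {len(baseline)} donors")
```

Jaccard distance uses only presence or absence, so it suits HGT profiles where the question is which segments a person carries rather than how abundant they are.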
Among the associations, PPI usage was significantly linked to an increased abundance of mobile genes encoding the ABC-2 membrane transporter (Beta = 0.51, p-value = 0.0011), the ABC transporter (Beta = 0.54, p-value = 0.00034), and the multi-antimicrobial extrusion protein (Mate) (Beta = 0.43, p-value = 0.00052) (Fig. 5d–f). Notably, the genes encoding the ABC transporter and the ABC-2 membrane transporter, which were shared between Blautia_A wexlerae and Anaerobutyricum hallii, were located on the same HGT segment. Protein modeling of these genes confirmed that they code for a complete ABC transporter structure comprising two nucleotide-binding domains and two transmembrane domains (Fig. 5g). Similarly, modeling verified that Mate, which was also shared between Blautia_A wexlerae and Anaerobutyricum hallii, encodes a complete membrane transporter (Fig. 5h). Both the ABC transporter and Mate can translocate a variety of toxic compounds across membranes, suggesting a potential mechanism by which bacteria gain beneficial genes from other bacteria to mitigate PPI-induced toxicity [25]. Additionally, we observed that the abundance of streptomycin adenylyltransferase within mobile regions was significantly positively associated with age (p-value = 0.00021, Fig. 5i). This association may reflect increased antibiotic exposure in elderly individuals, but it could also indicate that resistance genes accumulate and disseminate more readily within the microbial communities of older hosts. In contrast, the abundance of beta-galactosidase within mobile regions showed a significant negative association with age (p-value = 0.000067, Fig. 5j). This observation might be explained by higher dairy product consumption among younger individuals, which stimulates microbial beta-galactosidase production for lactose degradation, while reduced dairy intake in older adults could lead to decreased enzyme abundance.
Overall, our results suggest that the mobile gene pool within the human microbiome can serve as a reflection of host lifestyle.

DISCUSSION

This study presents a large-scale, longitudinal metagenomic investigation into the dynamics of recent HGT in the human gut microbiome. Our findings collectively demonstrate that HGT segments can disseminate among hosts, either directly through gene transfer events or indirectly via strain transmission, thereby enhancing their prevalence within populations. Additionally, we observe that species engaged in HGT frequently establish stable ecological associations and tend to accumulate resistance genes. Our results also highlight the individual-specific nature of HGT events and their responsiveness to host lifestyle factors such as aging and non-antibiotic drug usage. These insights expand current knowledge regarding microbial interactions mediated by gene transfer within the human gut and underscore the potential of HGT as a valuable record of recent human lifestyle.

HDMI represents a novel workflow capable of detecting recent HGT events (0–10,000 years) at the individual level. Although recent HGT in the human gut microbiome has been shown to be closely related to human lifestyle and metabolic capabilities [5, 12, 14], most HGT detection tools cannot distinguish between recent and long-term events [13, 26, 27]. MetaCHIP is one of the few methods capable of distinguishing between recent and long-term HGT events through phylogenetic tree comparisons, but it cannot be applied at the individual level [28]. In contrast, the recently published tool WAAFLE [13] can infer HGT events from assembled contigs at the individual level and has demonstrated superior performance to MetaCHIP [13]. Our benchmark results show that HDMI outperforms WAAFLE, especially in detecting intra-genus HGT events.
As both previous studies [5, 13] and our findings indicate, phylogenetically close bacteria are more likely to exchange genes. Intra-genus HGT events are therefore a major component of recent gene transfer, which further highlights the advantage of our tool. Building on this approach, we identified previously unrecognized gene transfer events in longitudinal samples from 338 individuals, significantly broadening our understanding of the mobile gene pool in the human gut microbiome. This methodological advance enables the analysis of microbial gene flow at the individual level and lays the groundwork for future investigations into the ecological role of HGT.

This study also demonstrates that HGT promotes positive ecological relationships and enhances the stability of microbial community structure. While previous cross-sectional studies have indicated that co-occurring microorganisms are more likely to participate in HGT [29], no research had examined the role of HGT in the temporal stability of the human gut microbiota. Theoretical models and experimental studies support the notion that HGT fosters cooperative coexistence among microorganisms [6, 30]. For instance, ecological stability analyses based on the generalized Lotka–Volterra model have shown that introducing highly transferable mobile genes, such as resistance genes, can improve the community's resilience to disturbances [31]. By enabling the widespread dissemination of beneficial functions, HGT allows more species to share critical survival advantages, thereby promoting multi-species coexistence and enhancing overall system homeostasis. Our longitudinal observations revealed that species involved in HGT tend to accumulate more resistance genes and exhibit more stable ecological relationships, strongly confirming the predictions of both experimental and modeling studies [30–32].
Our findings provide new evidence for the long-term impact of HGT on human gut microbial communities and suggest that, in addition to species diversity and relative abundance, the extent of interspecies genetic exchange may be a key factor in maintaining intestinal microecological homeostasis.

In previous studies, HGT has been reported to facilitate the spread of mobile genes through two primary mechanisms. First, mobile genes can be directly transferred between individuals via mobile elements [14]. For example, in a mother-to-child cohort, phage-associated gene fragments in maternal strains were transferred to distinct strains in infants without direct strain transfer, enabling the infant's intestinal microbes to acquire additional functions [14]. Second, mobile genes may be acquired from new strains through HGT. In fecal microbiota transplantation studies, newly introduced strains were observed to transfer genes to other species in the recipient microbiota [33], illustrating a scenario involving both strain and gene transfer. Our results add another dimension by showing that strain replacement, where strains carrying mobile genes supplant the original strains, leads to the acquisition of mobile genes by the species as a whole. Moreover, we demonstrate that, in addition to exceptional cases like mother-to-child transmission and fecal microbiota transplantation, the natural spread of mobile genes via HGT is a common phenomenon in the population. Together, HGT and strain replacement ensure the persistence and dissemination of key genes within individuals and across populations [10].

It is noteworthy that numerous previous studies have demonstrated that the mobile gene pool in the microbiome is largely shaped by environmental factors and lifestyle [5, 12]. These findings suggest that each individual's intestinal microbiome may accumulate a unique series of HGT events that reflect their lifestyle history.
Our data strongly support this view: we observed that HGTs are highly individual-specific and remain relatively stable over multiple years of follow-up, transcending the effects of short-term fluctuations in microbial communities. These long-term preserved mobile genes act as a "history book" of the microbiome, recording the host's environmental exposures and selection pressures. For instance, we detected the recent transfer of tetracycline resistance genes and the enrichment of specific resistance genes associated with PPI usage. Although our current detection accuracy has not yet achieved full personalization, we believe that an individual's microbial mobile gene pool holds promise for future applications, such as inferring disease exposure history and evaluating the efficacy of personalized medical interventions, thereby expanding the role of the microbiome in precision medicine.

Limitations of the study

We acknowledge several limitations of our study. The Lifelines-DEEP cohort comprises participants of Dutch ancestry from the northern region of the Netherlands, so the results might be biased toward a region-specific microbial background and local environmental exposures. In addition, the Lifelines-DEEP cohort includes mainly healthy individuals, which limits its power to detect associations between HGT, diseases, and drug usage. We also acknowledge that our workflow, like the MAG-based tool MetaCHIP, is affected by MAG quality. Since MAGs are constructed by assembling short sequence reads and binning genomic contigs, reads from HGT regions may map to multiple genomes. This can lead to inaccurate binning and the formation of incomplete or erroneous MAGs, with such contamination (i.e., heterologous sequences) potentially being misidentified as HGT events [34]. Multi-sample metagenomic binning can significantly improve both the number and quality of MAGs [35], and subsequent detection [34] and removal of chimerism and contamination can help reduce these issues.
However, contigs containing heterogeneous regions may not be binned with the rest of the genome [17], causing many HGT events to be lost. Moreover, HDMI does not utilize all available contigs. In the future, integrating un-binned contigs with those assigned to MAGs could enhance HGT recovery, suggesting that complementary methods such as WAAFLE may continue to play an important role. Additionally, although HGT can integrate into a recipient genome through both recombination and insertion, our individual-level detection only captures HGT events via insertion, as our method does not detect gene replacement through recombination. This limitation, combined with the low coverage depth of low-abundance species, results in numerous NA values in the temporal analysis of HGT segments between baseline and follow-up.

Declarations

Supplemental Information

Supplementary Information is available for this paper.

Acknowledgements

We thank all the volunteers in the Lifelines cohort (https://www.lifelines.nl/) for their participation and the project staff for their help and management. We thank Kate Mc Intyre for English editing. We also thank Hongyu Jin, Johannes Björk, Yue Zhang, and Jiqiu Wu for their suggestions that inspired this study. This study is supported by the Netherlands Organization for Scientific Research (NWO) VICI grant VI.C.202.022 (J.F.), NWO VIDI grant 016.178.056 (A.Z.), NWO VENI grant 222.016 (D.W.), European Research Council (ERC) Consolidator grant 101001678 (J.F.), ERC Starting Grant 715772 (A.Z.), and Dutch Heart Foundation grant IN-CONTROL (CVON2018-27 to J.F. and A.Z.). In addition, H.P. is supported by a joint fellowship from the University Medical Center Groningen and the China Scholarship Council with grant number CSC202208060107. J.F.
is supported by a 2023 AMMODO Science Award for Biomedical Sciences from Stichting Ammodo and the Netherlands Organ-on-Chip Initiative, an NWO Gravitation project (024.003.001) funded by the Ministry of Education, Culture and Science of the government of the Netherlands. A.Z. is further supported by the NWO Gravitation grant Exposome-NL (024.004.017) and the EU Horizon Europe Program grant INITIALISE (101094099). Author Contributions J.F. coordinated and supervised the study. J.F., H.P., S.A.-S., and D.W. conceptualized the study. H.P. performed data analysis. A.J.R.-M. and H.P. conducted the protein-structure-based analysis. A.F.-P. performed plasmid and virus annotation. J.W. helped with the statistical analysis. R.G. performed metagenomic data assembly. J.F. and A.Z. set up the Lifelines-DEEP cohort. H.P. and J.F. drafted the manuscript. All authors reviewed and edited the manuscript. Competing Interests A.Z. received a speaker fee from Nestlé. All other authors declare no competing interests. Data and Code Availability All relevant data supporting the key findings of this study are available within the article and its supplementary information files. The raw metagenomic sequencing data and basic phenotypes (i.e., age and sex) of the Lifelines-DEEP participants at both time points are available from the European Genome-Phenome Archive (https://ega-archive.org) via accession numbers EGAD00001001991 and EGAD00001006959, respectively. Due to informed consent regulations, detailed phenotypic data for the Lifelines-DEEP cohort can be requested from Lifelines (https://www.lifelines.nl/researcher) by submitting an intention letter to the Lifelines Data Access Committee responsible for the Lifelines-DEEP data (contact: Jackie Dekens, email: [email protected] ). The availability of datasets is subject to a data transfer agreement and specific rules and guidelines regulate data usage. The code of the workflow is available via: https://github.com/HaoranPeng21/HGT-workflow. 
Analysis code is available via: https://github.com/HaoranPeng21/HGT-Project.

METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Biological samples
Fecal samples | This study | https://www.Lifelines.nl
Deposited data
LLD baseline metagenomics | European Genome-Phenome Archive (EGA) | EGAD00001001991
LLD follow-up metagenomics | EGA | EGAD00001006959
Software and algorithms
R (version 4.0.3) | R Foundation | https://www.r-project.org/
Python (version 3.7.4) | Python | https://www.python.org
KneadData (version 0.4.6.1) | 36 | https://huttenhower.sph.harvard.edu/kneaddata/
Megahit | 37 | https://github.com/voutcn/megahit
dRep (version 3.4.3) | 38 | https://github.com/MrOlm/drep
Metawrap (version 1.3.2) | 39 | https://github.com/bxlab/metaWRAP
CheckM (version 1.2.2) | 40 | https://github.com/Ecogenomics/CheckM
GTDBTk (version 2.1.1) | 41 | https://github.com/Ecogenomics/GTDBTk
MASH (version 2.3) | 42 | https://github.com/marbl/Mash
FastANI (version 1.33) | 43 | https://github.com/ParBLiSS/FastANI
Bowtie 2 (version 2.5.1) | 44 | https://github.com/BenLangmead/bowtie2
CD-HIT (version 4.8.1) | 45 | https://sites.google.com/view/cd-hit
Prodigal (version 2.6.3) | 46 | https://github.com/hyattpd/Prodigal
Barrnap (version 0.9) | 47 | https://github.com/tseemann/barrnap
Prokka (version 1.14.6) | 48 | https://github.com/tseemann/prokka
Blastn (version 2.6.0) | 49 | https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
eggNOG-mapper (version 2.1.11) | 50 | http://eggnog5.embl.de
Resistance Gene Identifier (RGI) | 51 | https://github.com/arpcard/rgi
Traitar (version 1.1.1) | 52 | https://github.com/hzi-bifo/traitar
Fastspar (version 1.0.0) | 53 | https://github.com/scwatts/fastspar
UBCGtree (version 2.0) | 54 | http://leb.snu.ac.kr/ubcg2/about
vegan R package | 55 | https://cran.r-project.org/web/packages/vegan/index.html
lme4 R package | 56 | https://cran.r-project.org/web/packages/lme4/index.html
lmtest R package | 57 | https://cran.r-project.org/web/packages/lmtest/index.html
ggplot2 | 58 | https://cran.r-project.org/web/packages/ggplot2/index.html
igraph v1.2.6 | 59 | https://cran.r-project.org/web/packages/igraph/index.html
ggraph v2.0.5 | 60 | https://cran.r-project.org/web/packages/ggraph/index.html
ncf v 1.3-2 | | https://github.com/objornstad/ncf
lme4qtl v0.2.2 | 61 | https://rdrr.io/github/variani/lme4qtl/man/relmatLmer.html
ComplexHeatmap v2.18.0 | 62 | https://github.com/jokergoo/ComplexHeatmap
Database
Pfam database | 63 | https://www.ebi.ac.uk/interpro/entry/pfam
dbCAN | 64 | https://bcb.unl.edu/dbCAN/
CARD database | 51 | https://card.mcmaster.ca/analyze/rgi

RESOURCE AVAILABILITY

Lead contact

Further information and requests for resources and reagents should be directed to the Lead Contact, Jingyuan Fu ( [email protected] ).

Materials availability

The fecal samples of Lifelines participants can be requested via Lifelines biobank (https://www.lifelines.nl/researcher).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Human subjects

The Lifelines-DEEP study, part of the Lifelines biobank with over 167,729 participants, focuses on a select group of 1,539 individuals to explore various factors affecting health outcomes in the northern Netherlands. The study has been approved by the institutional ethics review board of the University Medical Center Groningen (ref. M12.113965). A follow-up study was recently conducted on 338 cohort participants who had been analyzed in 2013 65. As described previously 16, follow-up stool samples were collected for these 338 individuals (55.6% female and 44.4% male) at the second time point. The duration between the two time points ranged from 3.33 to 3.92 years (mean = 3.53, SD = 0.12). At baseline, the mean age of participants was 48.2 years (range = 18‒80, SD = 11.7) and their mean BMI was 25.4 (range = 17.6‒43.3, SD = 4.08). At follow-up, the mean age was 51.7 years (range = 22‒84, SD = 11.7), and the mean BMI was 25.6 (range = 16.1‒37.6, SD = 4.0).
Phenotypic data assessed in this study included anthropometric traits (e.g., age, sex, BMI, height) and usage of 14 medications (e.g., PPIs, oral contraceptives, beta-blockers, statins).

METHOD DETAILS

Metagenomic data generation and preprocessing

Stool sample collection and processing followed the same protocol at both time points. All participants were asked to collect fecal samples at home and to place them in their home freezer (-20 °C) within 15 minutes after production. Subsequently, a nurse visited the participant to pick up the fecal samples on dry ice and transfer them to the laboratory. Aliquots were then made and stored at -80 °C until further processing. The same protocol for fecal DNA isolation and metagenomics sequencing was used at both time points. Fecal DNA isolation was performed using the AllPrep DNA/RNA Mini Kit (QIAGEN cat. 80204). After DNA extraction, fecal DNA was sent to the Broad Institute of Harvard and MIT in Cambridge, Massachusetts, USA, where library preparation and whole genome shotgun sequencing were performed on the Illumina HiSeq platform. From the raw metagenomics sequencing data, low-quality reads were discarded by the sequencing facility and reads belonging to the human genome were removed by mapping the data to the human reference genome (version NCBI37) with KneadData (version 0.4.6.1) and Bowtie2 (version 2.1.0) 36. After filtering, the average read depth was 12.3 million for both baseline and follow-up samples. The read depths of all samples at both time points were not significantly different (paired Wilcoxon test, p-value = 0.89).

De novo assembly and quality control

MetaSPAdes 66 was used to perform de novo assembly for each sample in Lifelines-DEEP Baseline (n = 338). The assembled contigs were further binned and refined using MetaWRAP 39. The quality of the MAGs was assessed by CheckM 40. Genomes with >90% completeness and <5% contamination were retained.
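The retention rule stated above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the record layout, the field names `completeness` and `contamination`, and the example values are hypothetical stand-ins for per-MAG quality estimates such as those reported by CheckM.

```python
from typing import List

def filter_high_quality(mags: List[dict],
                        min_completeness: float = 90.0,
                        max_contamination: float = 5.0) -> List[dict]:
    """Keep MAGs with completeness > 90% and contamination < 5% (values in percent)."""
    return [
        m for m in mags
        if m["completeness"] > min_completeness
        and m["contamination"] < max_contamination
    ]

# Hypothetical example records (not real data).
example_mags = [
    {"id": "mag_a", "completeness": 97.2, "contamination": 1.1},
    {"id": "mag_b", "completeness": 88.0, "contamination": 0.5},  # too incomplete
    {"id": "mag_c", "completeness": 95.0, "contamination": 6.3},  # too contaminated
]
kept = filter_high_quality(example_mags)
```

Both thresholds are applied as strict inequalities, matching the ">90%" and "<5%" wording of the text.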
dRep 38 was used to dereplicate MAGs with the options -sa 0.998 -pa 0.95 to ensure non-identical genomes were included.

MAG clustering and taxonomy classification

dRep compare was used to compare and cluster MAGs. MASH 42 was used to form primary clusters with a threshold of 0.95. fastANI 43 was used to create secondary clusters with a threshold of 0.99. The primary and secondary clusters were used to group genomes at the species and strain level, respectively. Taxonomic classification of the genomes was performed using GTDB-Tk 41 with default parameters. All genome taxonomies and groups are compiled in Table S1. We used UBCGtree V2.0 54 to construct the MAG phylogenomic trees based on its 81 previously defined, nearly universal single-copy core genes.

Species abundance calculation

Species relative abundance was calculated as the sum of the genome abundances within each species in each sample. For species with more than five genomes, we randomly selected five genomes to calculate the relative and median abundance of the species. We mapped reads against all selected genomes using Bowtie2 44 and calculated the depth of coverage at every position of every contig in each genome as:

$$D_{\mathrm{genome}_i} = (\mathrm{Coverage}_{1,1}, \mathrm{Coverage}_{1,2}, \ldots, \mathrm{Coverage}_{1,L_1}, \mathrm{Coverage}_{2,1}, \ldots, \mathrm{Coverage}_{N,L_N})$$

where $N$ is the number of contigs in the genome and $L_N$ is the last position in contig $N$. The median abundance of each genome in the metagenome was calculated as $\mathrm{Median}(D)$. The per-base depth of coverage $K$, the average read length $L$, the size of each genome $S$, and the total read number $T$ in the shotgun data were used to calculate the relative abundance $A$ of each genome in the metagenome according to:

$$A = \frac{K \cdot S / L}{T}$$

HGT detecting workflow

In this work we introduce a workflow to identify recent HGT at the individual level. In brief, HDMI works as follows. First, HDMI searches for genomic regions of at least 500 bp that show >99% identity between any pair of MAGs with ANI <95%.
These regions are likely to be transfer events that occurred within the last 10,000 years (hereafter referred to as “recent” HGTs), assuming a molecular clock of 1 SNP/genome/year for a genome size of $10^6$ bp 67-69 (at this rate, 10,000 years corresponds to about $10^4$ SNPs per $10^6$ bp, i.e., roughly 1% divergence). Second, HDMI performs several quality checks. It first checks MAG contamination and identifies split reads that span the junction between the HGT regions and their flanking regions. In addition, we required at least 90% breadth of coverage and a median 1x genome depth for HGT segments. Third, HDMI excludes false positive HGTs due to highly conserved genes with >99% identity. For this, we extracted and calculated the identity of 81 nearly universal single-copy core genes (including 42 ribosomal proteins) 70 between all cross-species genome pairs. Lastly, inspired by the detection of differential alternative splicing events in transcriptomes, we inferred whether each HGT in the sample is inserted and determined the ratio of insertion to non-insertion by calculating the number of reads mapping to the split sites of sequences with HGT insertion and the junction site with no HGT insertion (Fig. S1). Each of these steps is described in more detail below.

1. HGT candidate detection

In this pipeline, we focused only on transfers occurring between bacterial species (ANI < 0.95), ignoring within-species (ANI ≥ 0.95) gene recombination events. We screened all genomes and used Blastn v2.6.0 49 to identify genomic segments of at least 500 bp that were shared between any pair of genomes from different species with an identity >99%. To exclude segments carrying potentially conserved genes, which evolve slowly within species and are more likely to have a high identity, we referred to the UBCG2 resource 70. This resource defines 81 nearly universal single-copy core genes, including 42 ribosomal proteins, that are thought to be vertically transmitted genes.
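The candidate criteria described so far (segments of at least 500 bp, identity >99%, genome pairs with ANI < 0.95) can be sketched as a simple filter. This is an illustrative sketch under stated assumptions, not the HDMI implementation: the `Hit` record, the `ani` lookup keyed by sorted genome pairs, and the choice to skip pairs without an ANI value are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple

class Hit(NamedTuple):
    """One pairwise alignment between two genomes (hypothetical record layout)."""
    query_genome: str
    subject_genome: str
    identity: float  # percent identity of the aligned segment
    length: int      # aligned length in bp

def candidate_hgts(hits: List[Hit],
                   ani: Dict[Tuple[str, str], float],
                   min_len: int = 500,
                   min_identity: float = 99.0,
                   species_ani: float = 0.95) -> List[Hit]:
    """Keep long, near-identical segments shared by genomes of different species."""
    kept = []
    for h in hits:
        if h.query_genome == h.subject_genome:
            continue  # self-hits carry no transfer signal
        pair = tuple(sorted((h.query_genome, h.subject_genome)))
        pair_ani = ani.get(pair)
        # Assumption of this sketch: pairs without an ANI value are skipped.
        if pair_ani is None or pair_ani >= species_ani:
            continue  # same species (ANI >= 0.95) or unknown
        if h.length >= min_len and h.identity > min_identity:
            kept.append(h)
    return kept

# Hypothetical example values.
hits = [
    Hit("g1", "g2", 99.6, 812),   # passes all criteria
    Hit("g1", "g3", 99.9, 1500),  # same species (ANI 0.97)
    Hit("g1", "g2", 98.7, 2000),  # identity too low
    Hit("g2", "g4", 99.8, 300),   # too short
]
ani = {("g1", "g2"): 0.82, ("g1", "g3"): 0.97, ("g2", "g4"): 0.80}
candidates = candidate_hgts(hits, ani)
```

The core-gene exclusion and contig-end checks belong to separate steps and are intentionally left out of this sketch.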
We then calculated the identity of each vertically transmitted gene in all species pairs and excluded between-species genome pairs containing any vertically transmitted gene with >99% identity. For the remaining genome pairs, the best explanation for these highly homologous segments is HGT rather than vertical inheritance, because even the highly conserved, vertically inherited genes of these species pairs do not exceed the 99% identity threshold used in our approach to retain HGT candidates. In addition, assembly algorithms based on de Bruijn graphs, such as metaSPAdes, can produce contamination for regions with sequencing errors but high similarity 71. This process produces 'bubbles' that can result in the generation of two contigs with overlapping sequences at their ends. If not handled properly, such duplications may lead to erroneous conclusions in HGT analysis. To mitigate this risk, we disregarded HGT candidates located within 100 bp of the end of the contig. We also excluded any putative HGT candidates found on contigs that matched >90% of their full length with a longer contig, as these are likely to be artificial duplicates created during the assembly process 28.

2. Cohort-based HGT event validation

If the median abundance of all MAGs was 0 among the five randomly selected MAGs, the abundance of the species was considered NA. To make sure the transferred region exists in the genome, the HGT region was required to have a minimum of 90% breadth of coverage. Subsequently, we counted those reads where one part mapped in the putative HGT and the other part mapped to the flanking region at either end of the HGT to suggest true HGT events. Bowtie2 with the options -a --very-sensitive --no-unal was used for read mapping. For each transferred sequence, we required at least three reads mapping to each of its start and end sites, with at least 10 bp of overhang on either side.
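The read-support requirement above (at least three junction-spanning reads at both the start and the end of the transferred segment, each with at least 10 bp of overhang) can be sketched as follows. This is a simplified illustration, not the workflow's code: reads are represented as hypothetical half-open alignment intervals `(start, end)` on the contig, whereas the actual workflow operates on Bowtie2 alignments.

```python
from typing import List, Tuple

Read = Tuple[int, int]  # half-open alignment interval [start, end) on the contig

def spans(read: Read, junction: int, min_overhang: int = 10) -> bool:
    """True if the read covers at least min_overhang bp on both sides of the junction."""
    start, end = read
    return (junction - start) >= min_overhang and (end - junction) >= min_overhang

def junction_support(reads: List[Read], junction: int, min_overhang: int = 10) -> int:
    """Count reads that span the junction with sufficient overhang."""
    return sum(spans(r, junction, min_overhang) for r in reads)

def hgt_supported(reads: List[Read], hgt_start: int, hgt_end: int,
                  min_reads: int = 3, min_overhang: int = 10) -> bool:
    """Require enough spanning reads at both ends of the putative HGT."""
    return (junction_support(reads, hgt_start, min_overhang) >= min_reads
            and junction_support(reads, hgt_end, min_overhang) >= min_reads)

# Hypothetical example: a putative HGT occupying positions 100..1000 of a contig.
reads = [(0, 150), (50, 200), (80, 230), (95, 245),  # near the start junction
         (900, 1050), (950, 1100), (985, 1135)]      # near the end junction
supported = hgt_supported(reads, hgt_start=100, hgt_end=1000)
```

In this example the read `(95, 245)` is not counted at the start junction because it extends only 5 bp into the flanking side.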
If any end lacked sufficient read support, the median abundance of all selected genomes of the species was 0, or the HGT region had < 90% breadth coverage, the presence of this putative HGT in that genome was considered NA. We then conducted detection for both genomes involved in each HGT event. If we observed the HGT in only one of the two genomes of an HGT event, across the entire cohort, the HGT event was considered a false positive.

3. HGT profiling

Each HGT event is considered to be two HGTs in two species due to their unique insertion sites (start, end) and the potential to be acquired or lost. First, we extracted sequences containing a HGT and concatenated the flanking regions of the HGT to get sequences without the HGT. We then counted the reads spanning the start and end of the split site with HGT insertion ($\mathrm{HI}_1$, $\mathrm{HI}_2$) and the reads spanning the site with no HGT insertion ($\mathrm{nHI}$) in each sample using the same strategy we used in the second step above. Finally, we calculated the HGT’s presence/absence by:

$$\mathrm{HI} = \begin{cases} \min(\mathrm{HI}_1, \mathrm{HI}_2) & \text{if } \min(\mathrm{HI}_1, \mathrm{HI}_2) \geq 3 \\ 0 & \text{otherwise} \end{cases}$$

$$\mathrm{nHI} = \begin{cases} \mathrm{nHI} & \text{if } \mathrm{nHI} \geq 3 \\ 0 & \text{otherwise} \end{cases}$$

Here, if the HI and nHI are both 0, it is NA.

Benchmark HDMI

Selected representative genomes were downloaded from NCBI with GCA ID (see Table S8). Seqkit was used to randomly select genes from the donor genome 72. HgtSIM 73 was used to simulate the insertion of genes with genetic divergence varying between 99% and 100% (-f genome -r 1-0-1-1 -x fna -mixed 0-1 -keep_cds -a genebank). ART (Version 2.5.8) 74 was used to simulate metagenomic sequencing data at 6X, 9X, and 12X coverage (art_illumina -ss HS25 -l 150 -f 9 -p -m 500 -s 10). In our comparison of WAAFLE and HDMI, we assessed the recovery rates of both tools at various taxonomic levels by simulating gene insertions in genome pairs spanning different evolutionary distances (i.e., intra-genus, intra-order, and intra-phylum, Table S8).
For each genome pair, we generated three metagenomic datasets with distinct ratios (9:12, 12:6, and 6:10). Simulated reads were further assembled by metaSPAdes 66, and the resulting contigs binned using MetaWRAP 39 to obtain MAGs. In HDMI, a correct detection is defined as the identified insertion gene matching the simulated insertion gene, with a maximum length discrepancy of no more than 1% of the full gene length. Similarly, in WAAFLE, a correct detection event is defined as the transferred gene (i.e., directional HGT event) identified by WAAFLE as being identical to the simulated insertion gene, allowing for a length difference of no more than 1% of the complete gene length. Since real genomes exhibit both recent and ancient HGT events beyond the simulated gene insertions, our comparison focused exclusively on the recovery rates of the simulated gene insertions.

HGT clustering, gene annotation, and the direction in the cluster

HGT sequences were clustered using cd-hit 45 (-aS 0.9 -aL 0.9 -c 0.9), representing HGTs with similar gene content and function. The coding sequences (CDS) were assigned to all HGTs using Prodigal V2.6.3 46 in metagenome mode to capture gene segments. eggNOG-mapper 50 was used to assign putative function predictions to genes, and the queries were realigned to the Pfam 63 domains to get the Pfam function domain annotation (--evalue 0.001 --score 60 --pident 40 --query_cover 20 --pfam_realign realign). The RGI and CARD database 51 was used to predict ARGs with default parameters. In each cluster, sequences were aligned using the auto option in mafft 75, and the gene tree was constructed with IQ-TREE 76 (-m MFP). We then subsampled the species tree from the comprehensive MAGs tree using ETE Toolkit. The subtree was used to infer the root of the gene tree using the OptRoot module from RANGER-DTL v.2.0. We then ran RANGER-DTL with default settings to reconcile the gene tree and the genomic tree a total of 500 times.
Reconciliations from each optimal root were aggregated using the AggregateRanger module from RANGER-DTL v.2.0. The transfer direction was then extracted for annotation information in each cluster.

Pangenome construction and trait prediction for each species

To represent the function of each species, we constructed the pangenome 77. Based on the MAG’s taxonomic classification, in each species we performed gene calling using Prodigal 78 in all MAGs belonging to the species and clustered genes using cd-hit (-c 0.9). The set of non-redundant genes comprising all MAGs from the same species is the meta-pangenome of that species. Traitar 52 was used to predict different phenotype traits for each species, including energy resources for growth, enzymatic activities, and morphology. The Phypat and PGL algorithms were used to predict traits. To avoid over-interpretation of false positive traits, we only considered the 22 traits with >90% predictive accuracy using the Phypat + PGL method, following the accuracy evaluation in the original paper 52. ARGs were predicted using strict mode in SRID 51. VF identification was done using the core set of Virulence Factors of Pathogenic Bacteria Database (VFDB 2022 79) with the BLASTP option of the Diamond software with strict parameters (e-value 50% identity at the protein level, and 70% query sequence coverage). AMPRs were identified by performing a BLASTP sequence similarity search against the manually curated list of AMPRs 80 with the same parameters (e-value 50% identity at the protein level, and 70% query sequence coverage). Genes encoding CAZymes were identified using dbCAN (CAZyDB.08062022.dmnd). The proportion of CAZyme genes for a particular substrate was calculated as the number of the CAZyme genes involved in its utilization divided by the total number of the CAZyme genes. CAZyme classification was described in a previous study 81.
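As a small worked illustration of the substrate proportion defined above, the following Python sketch divides the number of CAZyme genes assigned to each substrate by the total number of CAZyme genes. It is a simplification under stated assumptions: each gene carries exactly one substrate label, and the labels used here are hypothetical examples rather than the classification from the cited study.

```python
from collections import Counter
from typing import Dict, List

def substrate_proportions(gene_substrates: List[str]) -> Dict[str, float]:
    """Proportion of CAZyme genes per substrate: count for substrate / total CAZyme genes."""
    counts = Counter(gene_substrates)
    total = sum(counts.values())
    if total == 0:
        return {}  # no CAZyme genes, so no proportions to report
    return {substrate: n / total for substrate, n in counts.items()}

# Hypothetical substrate labels, one per CAZyme gene in a species pangenome.
example = ["fiber", "fiber", "mucin", "starch"]
proportions = substrate_proportions(example)
```

If a gene can act on several substrates, the denominator would need to remain the number of genes rather than the number of labels; that case is not handled in this sketch.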
HGT network construction and degree calculation

The HGT network was constructed based on the transfer between species. The network’s edges are unweighted. The R package igraph v1.2.6 59 was used to construct the network and calculate the betweenness and degree centrality, with ggraph v2.0.5 60 used to visualize the network. An LMM with phylogenetic distance matrices as a random effect was fitted to examine the association between a bacterial trait and its centrality in the HGT network, adjusting for the effects of abundance and phylogenetic distance with the function relmatLmer in the R package lme4qtl (version 0.2.2) 61.

Model1: relmatLmer (Centrality ~ Abundance + (1 | Phylogenetics Distance), REML = FALSE)
Model2: relmatLmer (Centrality ~ Trait + Abundance + (1 | Phylogenetics Distance), REML = FALSE)

LRT was used to measure the effect of traits on HGT network centrality: LRT_Trait = lrtest(model1, model2)

The effect of correlation and phylogenetic distance on HGT rates

Based on the species abundance calculated above, we used Fastspar 53, a C++ implementation of the SparCC algorithm 82, to calculate the correlation between species at each time point. With 1000 permutations, species correlations were considered reliable only if, at both time points, the p-value of the species correlation was < 0.01 and the correlations were either both positive or both negative. The baseline correlation value was used to represent the correlation of each species pair. For MAG-based HGT rate estimation, we implemented a previously published conservative approach in isolates 9, defining the HGT rate as the ratio of between-species genome pairs that share at least one HGT to all between-species genome pairs. Species with < 3 MAGs were not considered in calculating HGT rates. Based on the phylogenetic tree built using 81 nearly universal single-copy genes, we further calculated the phylogenetic distance across all species pairs using the Python package ete3.
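The MAG-based HGT rate defined above can be sketched as the fraction of between-species genome pairs that share at least one HGT. This is a minimal illustration rather than the authors' implementation: the genome identifiers, the representation of HGT-sharing pairs as a set of `frozenset` objects, and the use of `None` for species with fewer than three MAGs are assumptions of this sketch.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Set

def hgt_rate(genomes_a: List[str], genomes_b: List[str],
             hgt_pairs: Set[FrozenSet[str]],
             min_mags: int = 3) -> Optional[float]:
    """Fraction of genome pairs (one per species) sharing at least one HGT.

    Returns None when either species has fewer than min_mags genomes.
    """
    if len(genomes_a) < min_mags or len(genomes_b) < min_mags:
        return None
    total_pairs = len(genomes_a) * len(genomes_b)
    sharing = sum(1 for a, b in product(genomes_a, genomes_b)
                  if frozenset((a, b)) in hgt_pairs)
    return sharing / total_pairs

# Hypothetical example: two species with three MAGs each.
species_a = ["a1", "a2", "a3"]
species_b = ["b1", "b2", "b3"]
pairs_with_hgt = {frozenset(("a1", "b1")), frozenset(("a2", "b3")),
                  frozenset(("a1", "b2"))}
rate = hgt_rate(species_a, species_b, pairs_with_hgt)
```

Here three of the nine between-species genome pairs share an HGT, so the rate is one third; a genome pair counts once regardless of how many HGTs it shares.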
The phylogenetic distances between species were calculated by averaging the distances of all genome pairs in each species pair. We used the generalized linear and logistic regression models to measure the effect of within-species microbial interaction on HGT rates and occurrence. The partial.mantel.test function from the ncf R package (version 1.3-2) 56 was used with pairwise deletion.

HGT_Occur: At least one between-species genome pair shares at least one HGT.

Model1: partial.mantel.test (Co-abundant correlation matrix, HGT_Rates matrix, Phylogenetics Distance matrix, resamp = 9999)
Model2: partial.mantel.test (Co-abundant correlation matrix, HGT_Occur matrix, Phylogenetics Distance matrix, resamp = 9999)

The LMM with phylogenetic distance matrices was fitted to examine the association of the number of function genes in the species pangenome and their communities.

Model3: relmatLmer (Genes numbers ~ Abundance + (1 | Phylogenetics Distance), REML = FALSE)
Model4: relmatLmer (Genes numbers ~ Community + Abundance + (1 | Phylogenetics Distance), REML = FALSE)
Model5: relmatLmer (Bray-Curtis distance ~ Community + (1 | Phylogenetics Distance), REML = FALSE)
Model6: relmatLmer (Bray-Curtis distance ~ (1 | Phylogenetics Distance), REML = FALSE)

To assess whether Community is significantly contributing to genes enrichment, we performed the following LRT: Trait = lrtest(model3, model4). To assess whether Community is significantly contributing to temporal abundance changes, we performed the following LRT: Trait = lrtest(model5, model6).

Microbial co-abundance and HGT network

The microbial co-abundance and HGT network was constructed by correlation, using cluster_leiden function with objective_function Constant Potts Model in igraph 59 to define communities. The layout of the network was visualized by ggraph 60, and the length of the edges in the network was represented by the transformed correlation: 1 − ((correlation + 1) / 2). Within Community C ( Fig.
S6 ), three Enterobacteriaceae bacteria were seen to contain a greater than 10-fold higher abundance of ARGs, AMPRs, and VF genes compared to other species, which might explain the enrichment in Community C. To avoid introducing bias to the following analysis, these three Enterobacteriaceae species were excluded from our analysis.

Transmission profiling

Each HGT event (n = 5,644) refers to two HGTs (n = 11,288) with identical sequences shared between two genomes, resulting in 7,581 non-redundant HGTs distinguished by genomes and insertion locations. First, we extracted the genomic sequences containing the detected HGTs from the MAGs. We then removed the HGT region from the genome, concatenating the flanking regions as if the HGT was not present. Finally, we searched for reads that span this region and align with the flanking regions with no read split, further confirming the absence of HGTs within the samples, as described in the third step of the workflow. We then calculated the prevalence of each HGT across all individuals at baseline and follow-up, respectively, and determined the delta prevalence of each HGT. Based on the matrix of HGT presence and absence, we applied the vegdist() function from the R package vegan (version 2.5.5) 55 to calculate the Jaccard dissimilarity matrix. To compare the Jaccard dissimilarity between and within individuals, we used the R package coin 61 to calculate the empirical p-value by permuting samples of the HGT matrix 9999 times.

Strain-level profiling of samples

Here we first applied Phylophlan3 to assign a MetaPhlan 4 v Jan21 species genome bin label to MAGs. We only looked at the species Agathobacter rectalis, which was assigned to SGB4933. We then ran StrainPhlan4 in t__SGB4933_group with the options --marker_in_n_samples 50 --sample_with_n_markers 50 --secondary_sample_with_n_markers 50 --sample_with_n_markers_after_filt 33.
The multiple sequence alignment was built on 187 available markers, and 551/676 samples were used to build the phylogenetic tree. Only those individuals with two timepoints in the resulting tree were included. Samples in which strains’ phylogenetic distance was < 0.1 were considered to have the same strain. The tree was visualized with the R package ggtree 83.

Function enrichment

To avoid overestimating the frequency of HGTs, we established a non-redundant HGT gene database for each species. First, we tagged the HGTs with information about their host species. We then separately predicted the genes of the HGTs present in each species and used cd-hit to de-replicate these genes, using parameters identical to those used in constructing the species pangenome. We assigned the highest scoring Pfam functional domain annotations to each species' non-redundant genes and each species' non-redundant mobile genes using eggNOG-mapper 50 (--pfam_realign realign --evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20). Subsequently, we merged the non-redundant genes and non-redundant mobile genes of each species and calculated the number and frequency of each Pfam functional domain across all genes and all mobile genes. Fold change was calculated as the frequency of a Pfam functional domain in HGTs divided by the frequency of that Pfam functional domain in all genes in each species. Fisher’s exact test was used to determine the significance of the annotation enrichment. Two-tailed p-values were corrected using the Benjamini-Hochberg FDR method.

Mobile gene profile as a fingerprint

To test how well the mobile gene profiles distinguish samples from the same individual, we used the mobile gene abundance matrix described previously and generated the Bray-Curtis distances between all samples at two timepoints. If two samples (and only these two samples) from the same individual had the closest distance, we considered them correctly linked.
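The fingerprint test described above can be sketched as a mutual nearest-neighbour check on Bray-Curtis distances. This is an illustrative sketch, not the authors' code: the sample names and abundance vectors are hypothetical, and reading "and only these two samples" as "each sample is the other's unique closest sample" is an assumption of this example.

```python
from typing import Dict, List, Optional

def bray_curtis(x: List[float], y: List[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0

def nearest(sample: str, profiles: Dict[str, List[float]]) -> Optional[str]:
    """Return the unique closest other sample, or None if the minimum is tied."""
    distances = {other: bray_curtis(profiles[sample], vec)
                 for other, vec in profiles.items() if other != sample}
    best = min(distances.values())
    winners = [s for s, d in distances.items() if d == best]
    return winners[0] if len(winners) == 1 else None

def correctly_linked(baseline: str, followup: str,
                     profiles: Dict[str, List[float]]) -> bool:
    """Two samples are linked if each is the other's unique nearest neighbour."""
    return (nearest(baseline, profiles) == followup
            and nearest(followup, profiles) == baseline)

# Hypothetical mobile gene abundance profiles for two individuals at two time points.
profiles = {
    "p1_t1": [10.0, 0.0, 5.0],
    "p1_t2": [9.0, 1.0, 5.0],
    "p2_t1": [0.0, 10.0, 2.0],
    "p2_t2": [1.0, 9.0, 2.0],
}
linked = correctly_linked("p1_t1", "p1_t2", profiles)
```

The fraction of individuals for whom `correctly_linked` holds would then give the fingerprinting accuracy under this reading of the criterion.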
Phenotype association

Mobile genes were predicted from HGTs using Prodigal and dereplicated using cd-hit with the same parameters as described above. CoverM 84 was used to calculate the abundance of those mobile genes with the parameters --min-read-aligned-percent 50 --min-read-percent-identity 99 --min-covered-fraction 4. To measure associations between mobile genes and drug usage, smoking, and alcohol intake frequency, we used the glmer function in the lme4 R package 56 to fit the generalized linear mixed effects models. The lmerTest R package 57 was used to estimate the p-value.

Model: Gene abundance association (joint association)
Genes abundance ~ Phenotype + clean_reads + sex + age + (1 | IndividualID) + (1 | TimePoint)

References

Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral gene transfer and the nature of bacterial innovation. Nature 405 , 299-304 (2000). https://doi.org/10.1038/35012500 Arnold, B. J., Huang, I. T. & Hanage, W. P. Horizontal gene transfer and adaptive evolution in bacteria. Nature Reviews Microbiology 20 , 206-218 (2022). https://doi.org/10.1038/s41579-021-00650-4 Palmer, J. D. & Foster, K. R. Bacterial species rarely work together. Science 376 , 581-582 (2022). https://doi.org/10.1126/science.abn5093 Dmitrijeva, M. et al. A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer. Nature Ecology & Evolution 8 , 986-998 (2024). https://doi.org/10.1038/s41559-024-02357-0 Groussin, M. et al. Elevated rates of horizontal gene transfer in the industrialized human microbiome. Cell 184 , 2053-2067.e2018 (2021). https://doi.org/10.1016/j.cell.2021.02.052 Lee, I. P. A., Eldakar, O. T., Gogarten, J. P. & Andam, C. P. Bacterial cooperation through horizontal gene transfer. Trends in Ecology & Evolution 37 , 223-232 (2022). Fan, Y., Xiao, Y., Momeni, B. & Liu, Y.-Y. Horizontal gene transfer can help maintain the equilibrium of microbial communities.
Journal of Theoretical Biology 454 , 53-59 (2018). https://doi.org/https://doi.org/10.1016/j.jtbi.2018.05.036 Wang, T. et al. Horizontal gene transfer enables programmable gene stability in synthetic microbiota. Nat Chem Biol 18 , 1245-1252 (2022). https://doi.org/10.1038/s41589-022-01114-3 Smillie, C. S. et al. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480 , 241-244 (2011). https://doi.org/10.1038/nature10571 Hehemann, J. H. et al. Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota. Nature 464 , 908-912 (2010). https://doi.org/10.1038/nature08937 Lester, C. H., Frimodt-Møller, N., Sørensen, T. L., Monnet, D. L. & Hammerum, A. M. In vivo transfer of the vanA resistance gene from an Enterococcus faecium isolate of animal origin to an E. faecium isolate of human origin in the intestines of human volunteers. Antimicrob Agents Chemother 50 , 596-599 (2006). https://doi.org/10.1128/aac.50.2.596-599.2006 Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535 , 435-439 (2016). https://doi.org/10.1038/nature18927 Hsu, T. Y. et al. Profiling lateral gene transfer events in the human microbiome using WAAFLE. Nature Microbiology (2025). https://doi.org/10.1038/s41564-024-01881-w Vatanen, T. et al. Mobile genetic elements from the maternal microbiome shape infant gut microbial assembly and metabolism. Cell 185 , 4921-4936.e4915 (2022). https://doi.org/10.1016/j.cell.2022.11.023 Peng, H. & Fu, J. Unveiling horizontal gene transfer in the gut microbiome: bioinformatic strategies and challenges in metagenomics analysis. National Science Review , nwaf128 (2025). https://doi.org/10.1093/nsr/nwaf128 Chen, L. et al. The long-term genetic stability and individual specificity of the human gut microbiome. Cell 184 , 2302-2315.e2312 (2021). https://doi.org/10.1016/j.cell.2021.03.024 Hsu, T. Y. et al. 
Profiling novel lateral gene transfer events in the human microbiome. bioRxiv (2023). https://doi.org/10.1101/2023.08.08.552500 Camargo, A. P. et al. Identification of mobile genetic elements with geNomad. Nature Biotechnology (2023). https://doi.org/10.1038/s41587-023-01953-y Hutchings, M. I., Truman, A. W. & Wilkinson, B. Antibiotics: past, present and future. Current Opinion in Microbiology 51 , 72-80 (2019). https://doi.org/https://doi.org/10.1016/j.mib.2019.10.008 Levillain, F. et al. Horizontal acquisition of a hypoxia-responsive molybdenum cofactor biosynthesis pathway contributed to Mycobacterium tuberculosis pathoadaptation. PLOS Pathogens 13 , e1006752 (2017). https://doi.org/10.1371/journal.ppat.1006752 Hughes, E. R. et al. Microbial respiration and formate oxidation as metabolic signatures of inflammation-associated dysbiosis. Cell host & microbe 21 , 208-219 (2017). Zhu, W. et al. Precision editing of the gut microbiota ameliorates colitis. Nature 553 , 208-211 (2018). https://doi.org/10.1038/nature25172 Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports 9 , 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z Brito, I. L. Examining horizontal gene transfer in microbial communities. Nature Reviews Microbiology 19 , 442-453 (2021). https://doi.org/10.1038/s41579-021-00534-7 Akhtar, A. A. & Turner, D. P. J. The role of bacterial ATP-binding cassette (ABC) transporters in pathogenesis and virulence: Therapeutic and vaccine potential. Microbial Pathogenesis 171 , 105734 (2022). https://doi.org/https://doi.org/10.1016/j.micpath.2022.105734 Trappe, K., Marschall, T. & Renard, B. Y. Detecting horizontal gene transfer by mapping sequencing reads across species boundaries. Bioinformatics 32 , i595-i604 (2016). https://doi.org/10.1093/bioinformatics/btw423 Li, C., Jiang, Y. & Li, S. LEMON: a method to construct the local strains at horizontal gene transfer sites in gut metagenomics. 
BMC Bioinformatics 20, 702 (2019).
Song, W., Wemheuer, B., Zhang, S., Steensen, K. & Thomas, T. MetaCHIP: community-level horizontal gene transfer identification through the combination of best-match and phylogenetic approaches. Microbiome 7 (2019). https://doi.org/10.1186/s40168-019-0649-y
Dmitrijeva, M. et al. A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer. Nature Ecology & Evolution (2024). https://doi.org/10.1038/s41559-024-02357-0
Stecher, B. et al. Gut inflammation can boost horizontal gene transfer between pathogenic and commensal Enterobacteriaceae. Proceedings of the National Academy of Sciences 109, 1269-1274 (2012). https://doi.org/10.1073/pnas.1113246109
Coyte, K. Z. et al. Horizontal gene transfer and ecological interactions jointly control microbiome stability. PLoS Biol 20, e3001847 (2022). https://doi.org/10.1371/journal.pbio.3001847
Granato, E. T. et al. Horizontal gene transfer can reshape bacterial warfare. bioRxiv, 2024.08.28.610076 (2024). https://doi.org/10.1101/2024.08.28.610076
Behling, A. H. et al. Horizontal gene transfer after faecal microbiota transplantation in adolescents with obesity. Microbiome 12, 26 (2024). https://doi.org/10.1186/s40168-024-01748-6
Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biology 22, 178 (2021). https://doi.org/10.1186/s13059-021-02393-0
Shaw, J. & Yu, Y. W. Fairy: fast approximate coverage for multi-sample metagenomic binning. Microbiome 12, 151 (2024). https://doi.org/10.1186/s40168-024-01861-6
Langmead, B., Wilks, C., Antonescu, V. & Charles, R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics 35, 421-432 (2019). https://doi.org/10.1093/bioinformatics/bty648
Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.
Bioinformatics 31, 1674-1676 (2015). https://doi.org/10.1093/bioinformatics/btv033
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. The ISME Journal 11, 2864-2868 (2017). https://doi.org/10.1038/ismej.2017.126
Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018). https://doi.org/10.1186/s40168-018-0541-1
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25, 1043-1055 (2015). https://doi.org/10.1101/gr.186072.114
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315-5316 (2022). https://doi.org/10.1093/bioinformatics/btac672
Ondov, B. D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology 17, 132 (2016). https://doi.org/10.1186/s13059-016-0997-x
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9, 5114 (2018). https://doi.org/10.1038/s41467-018-07641-9
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359 (2012). https://doi.org/10.1038/nmeth.1923
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152 (2012). https://doi.org/10.1093/bioinformatics/bts565
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119
Seemann, T.
barrnap 0.9: rapid ribosomal RNA prediction. Google Scholar (2013).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068-2069 (2014). https://doi.org/10.1093/bioinformatics/btu153
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009). https://doi.org/10.1186/1471-2105-10-421
Huerta-Cepas, J. et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Molecular Biology and Evolution 34, 2115-2122 (2017). https://doi.org/10.1093/molbev/msx148
Alcock, B. P. et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res 51, D690-D699 (2023). https://doi.org/10.1093/nar/gkac920
Weimann, A. et al. From genomes to phenotypes: Traitar, the microbial trait analyzer. mSystems 1, e00101-00116 (2016).
Watts, S. C., Ritchie, S. C., Inouye, M. & Holt, K. E. FastSpar: rapid and scalable correlation estimation for compositional data. Bioinformatics 35, 1064-1066 (2019). https://doi.org/10.1093/bioinformatics/bty734
Na, S. I. et al. UBCG: Up-to-date bacterial core gene set and pipeline for phylogenomic tree reconstruction. J Microbiol 56, 280-285 (2018). https://doi.org/10.1007/s12275-018-8014-6
Dixon, P. VEGAN, a package of R functions for community ecology. Journal of Vegetation Science 14, 927-930 (2003).
Bates, D. M. (Springer New York, 2010).
Hothorn, T. et al. Package ‘lmtest’. Testing linear regression models. https://cran.r-project.org/web/packages/lmtest/lmtest.pdf. Accessed 6 (2015).
Wickham, H., Chang, W. & Wickham, M. H. Package ‘ggplot2’. Create elegant data visualisations using the grammar of graphics. Version 2, 1-189 (2016).
Csardi, M. G. Package ‘igraph’. Last accessed 3, 2013 (2013).
Pedersen, T. L., Pedersen, M., LazyData, T., Rcpp, I. & Rcpp, L. Package ‘ggraph’. Retrieved January 1, 2018 (2017).
Ziyatdinov, A. et al.
lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals. BMC Bioinformatics 19, 68 (2018). https://doi.org/10.1186/s12859-018-2057-x
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847-2849 (2016). https://doi.org/10.1093/bioinformatics/btw313
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res 42, D222-230 (2014). https://doi.org/10.1093/nar/gkt1223
Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 40, W445-451 (2012). https://doi.org/10.1093/nar/gks479
Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565-569 (2016). https://doi.org/10.1126/science.aad3369
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res 27, 824-834 (2017). https://doi.org/10.1101/gr.213959.116
Didelot, X., Walker, A. S., Peto, T. E., Crook, D. W. & Wilson, D. J. Within-host evolution of bacterial pathogens. Nature Reviews Microbiology 14, 150-162 (2016). https://doi.org/10.1038/nrmicro.2015.13
Duchêne, S. et al. Genome-scale rates of evolutionary change in bacteria. Microb Genom 2, e000094 (2016). https://doi.org/10.1099/mgen.0.000094
Zhao, S. et al. Adaptive Evolution within Gut Microbiomes of Healthy People. Cell Host Microbe 25, 656-667.e658 (2019). https://doi.org/10.1016/j.chom.2019.03.007
Kim, J., Na, S. I., Kim, D. & Chun, J. UBCG2: Up-to-date bacterial core genes and pipeline for phylogenomic analysis. J Microbiol 59, 609-615 (2021). https://doi.org/10.1007/s12275-021-1231-4
Iqbal, Z., Caccamo, M., Turner, I., Flicek, P. & McVean, G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet 44, 226-232 (2012). https://doi.org/10.1038/ng.1028
Shen, W., Le, S., Li, Y. & Hu, F.
SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11, e0163962 (2016).
Song, W., Steensen, K. & Thomas, T. HgtSIM: a simulator for horizontal gene transfer (HGT) in microbial communities. PeerJ 5, e4015 (2017). https://doi.org/10.7717/peerj.4015
Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593-594 (2012). https://doi.org/10.1093/bioinformatics/btr708
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059-3066 (2002). https://doi.org/10.1093/nar/gkf436
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268-274 (2015). https://doi.org/10.1093/molbev/msu300
Ma, B., France, M. & Ravel, J. Meta-pangenome: at the crossroad of pangenomics and metagenomics. The Pangenome, 205 (2020).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068-2069 (2014). https://doi.org/10.1093/bioinformatics/btu153
Liu, B., Zheng, D., Zhou, S., Chen, L. & Yang, J. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Research 50, D912-D917 (2022). https://doi.org/10.1093/nar/gkab1107
Kintses, B. et al. Phylogenetic barriers to horizontal transfer of antimicrobial peptide resistance genes in the human gut microbiota. Nature Microbiology 4, 447-458 (2019). https://doi.org/10.1038/s41564-018-0313-5
Wu, G. et al. Two Competing Guilds as a Core Microbiome Signature for Health Recovery. bioRxiv, 2022.05.02.490290 (2022).
Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS Computational Biology 8, e1002687 (2012).
Xu, S. et al. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta 1, e56 (2022).
https://doi.org/10.1002/imt2.56
Aroney, S. T. N. et al. CoverM: Read coverage calculator for metagenomics, version 0.7.0 (2024). https://doi.org/10.5281/zenodo.10531253
Additional Declarations
Yes, there is a potential competing interest. Alexandra Zhernakova received a speaker fee from Nestlé. All other authors declare no competing interests.
Supplementary Files
SupplementaryTable.xlsx: Supplementary Table
Supplementarymaterial.docx: Supplementary Information
{"props":{"pageProps":{"initialData":{"identity":"rs-6509357","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":452201282,"identity":"ee92fd2f-6d81-4125-8cec-5eb78c7faf6d","order_by":0,"name":"Jingyuan Fu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAlElEQVRIiWNgGAWjYBACxgbGBwwfoOwDRGphNmCcAeUQp4WBgdmAmYckLcwNzGzStm3b5OUdeA8Q7TA26dy224YbD/AlEKuF/xhIS4JhA48BCbZYkq6FEahFnoFoLc3MzJY9524bbmAmVothezPjjR9lt+Xl23sMHxCnpRnKMDhMlHogkIczGojVMgpGwSgYBSMOAABhJCqng9mXTgAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0001-5578-1236","institution":"University Medical Center Groningen","correspondingAuthor":true,"prefix":"","firstName":"Jingyuan","middleName":"","lastName":"Fu","suffix":""},{"id":452201283,"identity":"1f4ea25c-3520-4c8a-9aa9-4bf61dd0fec5","order_by":1,"name":"Haoran Peng","email":"","orcid":"https://orcid.org/0009-0002-7779-7073","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Haoran","middleName":"","lastName":"Peng","suffix":""},{"id":452201284,"identity":"5cdac3e6-e95e-49c1-867f-1dd72a0ac5e3","order_by":2,"name":"Sergio Andreu-Sánchez","email":"","orcid":"","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Sergio","middleName":"","lastName":"Andreu-Sánchez","suffix":""},{"id":452201285,"identity":"1c8a51ae-505c-4804-8bca-cdfb9cf35068","order_by":3,"name":"Angel Ruiz-Moreno","email":"","orcid":"","institution":"University Medical Center 
Groningen","correspondingAuthor":false,"prefix":"","firstName":"Angel","middleName":"","lastName":"Ruiz-Moreno","suffix":""},{"id":452201286,"identity":"9793015c-468b-4310-bc62-64c88e62c5dd","order_by":4,"name":"Asier Fernández-Pato","email":"","orcid":"","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Asier","middleName":"","lastName":"Fernández-Pato","suffix":""},{"id":452201287,"identity":"fe296c30-88de-4a4b-bbbd-13b96885481c","order_by":5,"name":"Jiafei Wu","email":"","orcid":"","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Jiafei","middleName":"","lastName":"Wu","suffix":""},{"id":452201288,"identity":"fae375c6-c734-4340-8f19-34513c4ec2b5","order_by":6,"name":"Ranko Gacesa","email":"","orcid":"https://orcid.org/0000-0003-2119-0539","institution":"University of Groningen and University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Ranko","middleName":"","lastName":"Gacesa","suffix":""},{"id":452201289,"identity":"f40294eb-a8f5-4967-870a-b444b5cbf063","order_by":7,"name":"Alexandra Zhernakova","email":"","orcid":"","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Alexandra","middleName":"","lastName":"Zhernakova","suffix":""},{"id":452201290,"identity":"ea211a5e-00f6-4774-9718-cfc215ec260e","order_by":8,"name":"Daoming Wang","email":"","orcid":"","institution":"University Medical Center Groningen","correspondingAuthor":false,"prefix":"","firstName":"Daoming","middleName":"","lastName":"Wang","suffix":""}],"badges":[],"createdAt":"2025-04-23 
06:10:42","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6509357/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6509357/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41467-025-66612-z","type":"published","date":"2025-11-22T05:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":82140742,"identity":"b0025186-0adb-4230-82e4-e3de20baf2fc","added_by":"auto","created_at":"2025-05-07 06:31:38","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":71386,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eRecent horizontal gene transfers (HGTs) are commonly present in the gut microbiome.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e, Phylogenomic tree of the 1474 metagenome-assembled genomes (MAGs) derived from the Lifelines-DEEP baseline cohort (n=338). Branches are colored according to the phylum classification. The outer ring indicates whether the genome is involved in at least one HGT event.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eb\u003c/strong\u003e, Distribution of HGT events. 
Two-dimensional histogram depicting the distribution of length (y-axis, right marginal histogram) against identity (x-axis, bottom marginal histogram) of HGT events (n = 5644).\u003c/p\u003e","description":"","filename":"Binder21.png","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/78533f5835af4f9e8754f9a3.png"},{"id":82140743,"identity":"60e9f96e-828e-46dc-b87a-379dd7d63611","added_by":"auto","created_at":"2025-05-07 06:31:38","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":56876,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCharacterization of recently transferred functions in the gut microbiome.\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ea, Functions enriched in transferred genes. 
The plot shows fold enrichment for Pfam domains among genes transferred within species pairs relative to genes in all species. Fisher’s exact test was used to determine the significance of the annotation enrichment. The two-tailed p-values were corrected using the Benjamini-Hochberg FDR method. Dot size and color gradient indicate -log10 of the FDR values. Only domains with -log10(FDR) ≥ 8 and log2 fold change ≥ 3 are shown. Domains highlighted in blue are functionally related to molybdenum cofactor biosynthesis. Domains highlighted in green are functionally related to mobile elements.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eb-c, Gene transfer network of the same segment shared between multiple genomes encoding (b) phage integrase and (c) antibiotic resistance genes. Each node represents a genome, with the corresponding taxonomy classification indicated by color. Each edge represents an HGT event observed between the genomes. The identity of the HGT event is labeled on the edge. The gene encoded in the segments is shown beside the network.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003ed, Association between species' degree centrality in the HGT network and the presence of the MoaC gene. Relevant to\u0026nbsp;\u003c/p\u003e","description":"","filename":"Binder22.png","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/8113c9ad3e394677c2433589.png"},{"id":82144718,"identity":"c6ee66f8-4e2b-488c-89e0-d11bfce2a6bb","added_by":"auto","created_at":"2025-05-07 06:47:39","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":365183,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHGT fosters stable microbial communities.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e,\u003cstrong\u003e \u003c/strong\u003eDistribution of various types of correlation among species pairs. 
Each two-letter code represents the correlation observed at baseline and follow-up (N for negative, P for positive, U for uncorrelated). Correlations that are stable over time (N-N and P-P) are highlighted in the red box.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eb\u003c/strong\u003e, Plots showing the change in correlation in species pairs at baseline and follow-up. Temporal changes in species pairs’ correlation with no HGT are shown at left and with HGT at right.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ec\u003c/strong\u003e, Microbial Co-Abundance Network. In this network, edge lengths reflect the strength of species correlations, with shorter edges indicating stronger associations. Nodes are color-coded based on Leiden clustering, and edges are colored according to the type of relationship (blue for negative correlations, red for positive correlations, green for HGT transfers).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ed\u003c/strong\u003e, The three largest sub-communities identified are Community A (n = 47), Community B (n = 44), and Community C (n = 92). Upper panel shows the co-abundance relationship, where edge colors indicate positive correlations in red and negative correlations in blue. Lower panel shows the HGT relationships, with green edges representing HGT events.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ee\u003c/strong\u003e, Temporal Stability of Sub-Communities. The y-axis represents the Bray-Curtis distance of species abundance within each sub-community between baseline and follow-up.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ef\u003c/strong\u003e, Association between HGT rates and phylogenetic distance of species pairs. Bars show the average HGT rates across all species pairs in each bin.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eg\u003c/strong\u003e, Association between co-abundance and phylogenetic distance of species pairs. Bars show the average correlation value across all species pairs in each bin. 
A generalized linear regression model was used to measure the effect size and \u003cem\u003ep-\u003c/em\u003evalue (see Methods).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eh\u003c/strong\u003e, Association between HGT occurrence and co-abundance of species pairs. The half violin presents the co-abundance distribution when HGT occurred (blue) and did not occur (green) between species pairs. A logistic regression model was fitted to examine the association of HGT occurrence (0, 1) with co-abundance in species pairs corrected for phylogenetic distance. The model formula and \u003cem\u003ep\u003c/em\u003e-value are shown at the top (see Methods). Box plot centers show median value with boxes indicating their inter-quartile ranges (IQRs). Upper and lower whiskers indicate 1.5 times the IQR from above the upper quartile and below the lower quartile, respectively. \u003cem\u003ep\u003c/em\u003e-values were corrected by the FDR method.\u003c/p\u003e","description":"","filename":"Binder23.png","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/6a6432fedca78648eaa29ab7.png"},{"id":82140746,"identity":"e9512476-4f6b-4510-8440-a7ea4a92497c","added_by":"auto","created_at":"2025-05-07 06:31:38","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":231589,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePropagation of HGT segments in the gut community is mediated by both HGT and strain replacement.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e, Comparison of HGT prevalence at baseline (blue) and follow-up (green) (Wilcoxon Signed Rank Tests, n = 338).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eb\u003c/strong\u003e, Distribution of HGT segment prevalence at baseline (blue, 338 individuals, n = 7581) and follow-up (green, 338 individuals, n = 7581).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ec\u003c/strong\u003e, Prevalence of HGT Segment 35 at 
baseline and follow-up.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ed\u003c/strong\u003e, Phylogenetic tree of \u003cem\u003eAgathobacter rectalis\u003c/em\u003e (112) (SGB4933) for individuals in selected clusters at baseline and follow-up (see Figure S6).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ee\u003c/strong\u003e, As in (\u003cstrong\u003ed\u003c/strong\u003e) but with individuals hosting the same \u003cem\u003eA. rectalis\u003c/em\u003e strain at baseline and follow-up highlighted in red.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ef\u003c/strong\u003e, As in (\u003cstrong\u003ed\u003c/strong\u003e) but highlighting individuals with different \u003cem\u003eA. rectalis\u003c/em\u003e strains at baseline and follow-up. Samples in which \u003cem\u003eA. rectalis\u003c/em\u003e carries the HGT segments are indicated in yellow. Samples in which \u003cem\u003eA. rectalis\u003c/em\u003e lacks the HGT segments are indicated in green. Red dashed lines indicate the strain replacement direction of individuals who gained HGT segments at follow-up. Blue dashed lines indicate individuals who lost HGT segments at follow-up. 
In (\u003cstrong\u003ed\u003c/strong\u003e−\u003cstrong\u003ee)\u003c/strong\u003e, the tag _B indicates a baseline sample and _F indicates a follow-up sample.\u003c/p\u003e","description":"","filename":"Binder24.png","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/72a8e8fc2c4476185dbe3664.png"},{"id":82140747,"identity":"abe1703e-55b5-4e30-884d-c9f865b41fc1","added_by":"auto","created_at":"2025-05-07 06:31:39","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":387322,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAssociation between mobile genes and human lifestyle.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ea\u003c/strong\u003e, The delta prevalence of HGT segments from baseline to follow-up (338 individuals, n = 7581).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eb\u003c/strong\u003e, Comparison of the inter-individual Jaccard distance of the HGT profile at baseline (blue) and follow-up (green), as well as the Jaccard distance within paired samples at the two time points (cyan). The empiric \u003cem\u003ep-\u003c/em\u003evalue was calculated by permuting samples 9999 times.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ec\u003c/strong\u003e, Accuracy in classifying longitudinal samples of 338 individuals using their HGT profile, species abundance, and microbial pathway abundance.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ed-f\u003c/strong\u003e, Association between proton pump inhibitor (PPI) intake and mobile gene abundance. Gene abundance was transformed by log10.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eg\u003c/strong\u003e, Structure of the ABC transporter. Gray surface: membrane. Green: two transmembrane domains (TMD). Orange: two nucleotide-binding domains (NBD). Blue sticks: ATP molecules. Purple: ABC signature sequence.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eh\u003c/strong\u003e, Structure of Mate. Gray surface: membrane. 
Green: TMD.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ei\u003c/strong\u003e−\u003cstrong\u003ej\u003c/strong\u003e, Association between age and mobile gene abundance. See Methods.\u003c/p\u003e","description":"","filename":"Binder25.png","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/aab1c2dd21f4fe6f00c0aa78.png"},{"id":99211898,"identity":"4b5e148c-1e57-44e8-9904-04f6ba0405aa","added_by":"auto","created_at":"2025-12-30 08:17:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2828762,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/fa21bbcd-e731-4768-ab04-826ce705d39b.pdf"},{"id":82143041,"identity":"d661c3c3-173e-4e3b-a21a-b586aa77090d","added_by":"auto","created_at":"2025-05-07 06:39:38","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":2206442,"visible":true,"origin":"","legend":"Supplementary Table","description":"","filename":"SupplementaryTable.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/8802406993337625a4d3eae4.xlsx"},{"id":82140751,"identity":"9749b993-a310-4188-b6da-fae7293e74c0","added_by":"auto","created_at":"2025-05-07 06:31:39","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":8303789,"visible":true,"origin":"","legend":"Supplementary Information","description":"","filename":"Supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-6509357/v1/dc0891cc9178ab6a219e34c3.docx"}],"financialInterests":"\u003cb\u003eYes\u003c/b\u003e there is potential Competing Interest.\nAlexandra Zhernakova received a speaker fee from Nestlé. 
All other authors declare no competing interests.","formattedTitle":"Longitudinal Gut Microbiota Tracking Reveals the Persistent Spread of Mobile Genes and HGT-Driven Community Stabilization","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eHorizontal gene transfer (HGT) is a microbial mechanism for the acquisition of genetic material from non-parental lineages that plays a significant role in microbial adaptive evolution and interactions\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e,\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. Although it has been proposed that bacteria rarely work together\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e, highly abundant co-occurring bacteria tend to exchange more genes\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e, and HGT has been shown to help competitive bacteria co-exist\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e and to maintain the stability of the microbial community\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe human gut microbiome harbors one of the densest, most interactive microbial ecosystems in the body. 
Gene exchange via HGT is pervasive among gut bacteria\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e, and it has been linked to host lifestyle factors such as geography, diet, and medication use\u003csup\u003e\u003cspan additionalcitationids=\"CR11 CR12\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. Non-industrialized populations, for instance, exhibit higher rates of HGT involving carbohydrate-active enzymes than industrialized populations\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Moreover, HGT can contribute to bacterial functionality at the level of both individual bacteria and the whole community. For example, there is evidence that the metabolic potential of the infant gut microbiome can be influenced by mobile genetic elements from the maternal microbiome\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Accordingly, there is increasing interest in understanding the role of HGT in shaping the dynamics of microbial ecosystems from global to individual scales\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, particularly over recent human history. Prior work indicates that genes transferred within the last\u0026thinsp;~\u0026thinsp;10,000 years often confer contemporary adaptive advantages, such as defense mechanisms and antimicrobial resistance, whereas more ancient transfers tend to involve core metabolic functions\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. However, few studies have investigated HGT dynamics within individual gut microbiomes over time. 
One recent study that assessed HGT longitudinally in 26 samples collected over a maximum of 1 year concluded that HGT events are unique to individuals and change over time\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. To date, no comprehensive long-term study has addressed how ongoing HGT shapes gut community structure or how these transfers relate to host phenotypic traits\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn this study, we profiled HGT events within a longitudinal microbiome cohort of 338 participants from the Lifelines-DEEP study. For these participants, fecal microbiome samples had been collected and sequenced at two time points approximately 4 years apart\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e, together with detailed phenotype information. To capture recent HGT events (within the last ~\u0026thinsp;10,000 years), we developed HGT Detection from MAGs at Individual level (HDMI), a dedicated HGT detection pipeline based on metagenome-assembled genomes (MAGs). Using this approach, we profiled thousands of HGT events and evaluated their persistence, spread, and impact on the gut microbial community. We show that HGT events contribute to stabilizing species interactions over time and can disseminate adaptive functions across hosts. 
We also link specific gene transfers to host factors such as medication use, providing new insight into how lifestyle pressures drive microbiome evolution.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eCohort and the detection of HGTs in the gut microbiome\u003c/h2\u003e \u003cp\u003eThis study included 676 paired microbiome samples from 338 individuals in the Lifelines-DEEP cohort from the northern Netherlands, collected at two time points approximately 4 years apart. At baseline, participants had a mean age of 48.2 years (range\u0026thinsp;=\u0026thinsp;18\u0026ndash;80, SD\u0026thinsp;=\u0026thinsp;11.7) and a mean BMI of 25.4 (range\u0026thinsp;=\u0026thinsp;17.6\u0026ndash;43.3, SD\u0026thinsp;=\u0026thinsp;4.08). At follow-up, the mean age was 51.7 years (range\u0026thinsp;=\u0026thinsp;22\u0026ndash;84, SD\u0026thinsp;=\u0026thinsp;11.7) and the mean BMI was 25.6 (range\u0026thinsp;=\u0026thinsp;16.1\u0026ndash;37.6, SD\u0026thinsp;=\u0026thinsp;4.0). The phenotypic data assessed included anthropometric traits (e.g., age, sex, BMI, height) and the use of 14 medications (e.g., proton pump inhibitors [PPIs], oral contraceptives, beta-blockers, statins). 
From the baseline samples (n\u0026thinsp;=\u0026thinsp;338), we reconstructed 1,473 high-quality MAGs (mean completeness\u0026thinsp;=\u0026thinsp;95.14\u0026thinsp;\u0026plusmn;\u0026thinsp;2.68%, mean contamination\u0026thinsp;=\u0026thinsp;0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;0.75%) (\u003cb\u003esee Methods\u003c/b\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea\u0026ndash;c) that represent 192 distinct bacterial species (\u003cb\u003eTable \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo identify HGT events, we developed the HDMI pipeline, which reliably identifies recent HGT events between MAGs. We defined an HGT event as a DNA transfer detected between two distinct genomes that is characterized by the presence of a pair of highly similar DNA regions (\u0026gt;\u0026thinsp;500 bp, \u0026gt;\u0026thinsp;99% identity, and \u0026lt;\u0026thinsp;95% average nucleotide identity (ANI) between genomes). Each transferred DNA region in the donor or recipient genome is referred to as an HGT segment. Thus, a single HGT event involves two corresponding segments, each located in a separate genome, which may differ in length, genomic context, or precise boundaries. (\u003cb\u003eFig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e\u003c/b\u003e, \u003cb\u003esee Methods\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eIn brief, HDMI combines sequence similarity searches (identifying\u0026thinsp;\u0026gt;\u0026thinsp;500 bp regions with \u0026gt;\u0026thinsp;99% identity between genomes of \u0026lt;\u0026thinsp;95% ANI) with multiple quality control steps (e.g., split-read validation and conserved-gene filtering) to ensure high-confidence HGT detection. 
Compared to other HGT detection tools, such as the recently developed WAAFLE\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, HDMI offers several advantages: 1) it focuses on detecting recent HGT events (occurring within the past 0\u0026ndash;10,000 years), 2) it enables HGT detection at individual level, and 3) it has high sensitivity for detecting HGT between bacteria with various phylogenetic distances, especially for intra-genus events (\u003cb\u003eFig. \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e\u003c/b\u003e, \u003cb\u003eSupplementary Note 1\u003c/b\u003e). Detailed descriptions of all methods, including metagenomic assembly, genome binning, HGT detection, and benchmarking, are provided in the Methods and Supplementary Note 1.\u003c/p\u003e \u003cp\u003eUsing our HGT detection pipeline, we identified 5,644 high-confidence, recent HGT events (occurring within the past ~\u0026thinsp;10,000 years, \u003cb\u003eTable \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e\u003c/b\u003e). Our data show that HGT is common in the gut microbiome. Out of 1,473 MAGs, 901 MAGs (61.2%), representing 116 unique species, were involved in at least one HGT event, with a total of 7,581 HGT segments identified (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ea, \u003cb\u003eTable S3\u003c/b\u003e). 
Segment sizes ranged from 0.5 to 64 kb (median\u0026thinsp;~\u0026thinsp;1.65 kb), and the median nucleotide identity between donor and recipient segments was 99.36%, consistent with transfers occurring within the past 10,000 years.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eGene functions enriched in HGT\u003c/h3\u003e\n\u003cp\u003eNext, we hypothesized that two sets of genes would likely be enriched in recent HGT events in the human gut: genes facilitating gene transfer (e.g., mobilome components such as prophages and transposons) and those involved in defense mechanisms, including antimicrobial resistance. To investigate this, we performed a functional enrichment analysis comparing the genes present in HGT segments to those in the pangenome (\u003cb\u003esee Methods\u003c/b\u003e, \u003cb\u003eTable S4\u003c/b\u003e). In agreement with previous studies\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e, functions associated with gene transfer were significantly enriched for transposase, recombinase, MobA/MobL, and phage integrase (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). These self-transmissible genes, which can move among various species and occasionally carry host genes, tend to have conserved transfer elements despite the variability in the host genes they transport. For example, we observed that a 5-kb HGT segment carrying a recombinase was shared among seven genera (\u003cb\u003eFig. S3\u003c/b\u003e). Additionally, we found a bacteriophage belonging to class \u003cem\u003eCaudoviricetes\u003c/em\u003e that was identified in a 2.7-kb HGT segment using geNomad\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e to be shared among four families (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb, \u003cb\u003eFig. 
S3\u003c/b\u003e, \u003cb\u003eSupplemental Note 2\u003c/b\u003e). Moreover, we estimated that 154 HGT events had occurred within the past 100 years, showing 100% nucleotide identity. This included four recent transfer events linked to the spread of tetracycline resistance genes that showed 100% identity across four different genera: \u003cem\u003ePrevotella, Anaerobutyricum\u003c/em\u003e, \u003cem\u003eBacteroides\u003c/em\u003e, and \u003cem\u003ePhocaeicola\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec, \u003cb\u003eFig. S4\u003c/b\u003e, \u003cb\u003eSupplemental Note 3\u003c/b\u003e). This observation aligns with the widespread use of tetracyclines starting in the 1950s\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFurthermore, we observed significant enrichment for functions involved in the biosynthesis of the molybdenum cofactor (MoCo), including MoCF biosynthesis, MoaC, and FdhD-NarQ (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). Proteins in the FdhD-NarQ family, such as FdhD, facilitate the insertion of MoCo into target enzymes, while MoaC and MoCF_biosynth catalyze the early and later steps of MoCo biosynthesis, respectively. Together, these components are critical for the MoCo pathway, and MoCo-dependent enzymes are essential for anaerobic respiration in microorganisms\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e. 
Additionally, motile bacteria (Likelihood-ratio test (LRT), \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.00228) and bacteria harboring MoaC (LRT, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.0233) displayed significantly higher degree centralities within the HGT network, suggesting that these species act as hubs capable of engaging in gene transfer with a diverse array of other species (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ed, \u003cb\u003eFig. S5\u003c/b\u003e, \u003cb\u003eSupplemental Note 4\u003c/b\u003e). Previous studies have shown that overexpression of MoCo synthesis genes promotes the growth of \u003cem\u003eEscherichia coli\u003c/em\u003e in inflamed intestines\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e and that disruption of these genes leads to poor \u003cem\u003eE. coli\u003c/em\u003e growth\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. This suggests that MoCo plays a critical role in enabling microorganisms to thrive in the hypoxic environment of the intestine. Our findings extend this observation by demonstrating that HGT can facilitate the widespread prevalence of MoCo synthesis genes among intestinal microorganisms, not just in \u003cem\u003eE. coli\u003c/em\u003e, thereby enhancing their adaptation to hypoxic conditions. However, further research is needed to fully elucidate the role of HGT in this process.\u003c/p\u003e\n\u003ch3\u003eHGT fosters stable microbial communities\u003c/h3\u003e\n\u003cp\u003eWe next hypothesized that species with an HGT relationship may show stable co-abundance relationships over time due to their ecological interactions. To test this, we calculated pairwise correlations among 192 species based on their abundance across individuals at each time point. 
We identified 1,968 species pairs with significant positive correlations (\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01, rho\u0026thinsp;\u0026gt;\u0026thinsp;0.2) and 1,702 species pairs with significant negative correlations (\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01, rho \u0026lt; -0.2) at both time points (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea, see \u003cb\u003eMethods\u003c/b\u003e). Notably, 73.7% (73/99) of species pairs with an HGT relationship at baseline (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb) exhibited stable positive co-abundance relationships, showing a 2.24-fold enrichment (\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;4.5x10\u003csup\u003e\u0026minus;\u0026thinsp;21\u003c/sup\u003e) compared to only 32.9% (1979/6001) for species pairs without an HGT relationship (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb). These findings suggest that HGT may indeed reinforce stable ecological relationships.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFocusing on species with stable co-abundance relationships (both positive-positive and negative-negative relationships), we constructed a co-abundance network that was further clustered into seven sub-communities using Leiden clustering\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec, \u003cb\u003esee Methods\u003c/b\u003e). The three largest sub-communities were Communities A (n\u0026thinsp;=\u0026thinsp;47), B (n\u0026thinsp;=\u0026thinsp;44), and C (n\u0026thinsp;=\u0026thinsp;92). 
Within each sub-community, nearly all species were positively correlated (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed), whereas species across different communities were predominantly negatively correlated. Interestingly, the occurrence of HGT events was not evenly distributed among these communities. In Community C, about 10.5% (135/1280) of the co-abundance relationships also exhibited an HGT relationship, compared to 1.4% (5/353) in Community A (Fisher\u0026rsquo;s exact test, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;5.88x10\u003csup\u003e\u0026minus;\u0026thinsp;10\u003c/sup\u003e) and 3.1% (11/354) in Community B (Fisher\u0026rsquo;s exact test, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;2.65x10\u003csup\u003e\u0026minus;\u0026thinsp;6\u003c/sup\u003e) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed). Furthermore, Community C was significantly enriched in virulence factors (VFs) (LRT, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;2.0x10\u003csup\u003e\u0026minus;\u0026thinsp;5\u003c/sup\u003e), antibiotic resistance genes (ARGs) (LRT, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;3.6x10\u003csup\u003e\u0026minus;\u0026thinsp;2\u003c/sup\u003e), and antimicrobial peptides resistance genes (AMPRs) (LRT, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;7.84x10\u003csup\u003e\u0026minus;\u0026thinsp;5\u003c/sup\u003e) (\u003cb\u003eFig. S6\u003c/b\u003e, \u003cb\u003eTable S5\u003c/b\u003e) compared to Communities A and B, while no significant differences in these gene categories were observed between Communities A and B (LRT, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;\u0026gt;\u0026thinsp;0.1). 
In addition, temporal changes in the composition of Community C, as measured by Bray-Curtis distances, were significantly lower than those observed for Communities A and B (\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;1.13x10\u003csup\u003e\u0026minus;\u0026thinsp;20\u003c/sup\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee, \u003cb\u003esee Methods\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eConsistent with previous reports that HGT is more likely to occur between phylogenetically close species\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e,\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, our data show that phylogenetically close bacteria exhibited a higher HGT rate (9999 permutation partial Mantel tests, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.0002, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ef, \u003cb\u003eTable S6\u003c/b\u003e). We also observed a negative relationship between phylogenetic distance and co-abundance strength (partial Mantel tests, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.002, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eg). Importantly, even after controlling for phylogenetic distance, species pairs with an HGT relationship still demonstrated stronger positive co-abundance relationships (9999 permutation partial Mantel tests, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.001, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eh). 
Together, these findings suggest that species pairs involved in HGT tend to accumulate more resistance genes and to form more stable ecological interactions.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePropagation of HGT segments in the gut community is mediated by both HGT and strain replacement\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo quantify the spread of HGT segments over time, we compared the prevalence of each HGT segment across individuals at baseline versus follow-up. Out of 338 individuals, an HGT segment was present in 25.6% of individuals on average at baseline, and this prevalence increased to 27.1% at follow-up (\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;3.57x10\u003csup\u003e\u0026minus;\u0026thinsp;200\u003c/sup\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea). Overall, 4,696 of the 7,581 segments (62.0%) were detected in a greater fraction of individuals after 4 years compared to baseline, while 2,530 segments (33.4%) became less prevalent (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). This indicates a net propagation of transferable segments within the population over the study period. The segment with the largest increase was a 2.7-kb region annotated as bacteriophage in \u003cem\u003eAgathobacter rectalis\u003c/em\u003e, with a prevalence that increased from 7.0% to 30.2% after 4 years (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec). Another notable increase in \u003cem\u003eAnaerobutyricum hallii\u003c/em\u003e was linked to a 5.2-kb segment encoding β-lactamase, with its prevalence rising from 40.5% to 52.1%. 
As β-lactamase is involved in bacterial resistance to β-lactam antibiotics, this observation suggests a potential risk of widespread dissemination of ARGs within the population.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eNote that the observed increase in HGT segment prevalence may be explained by two scenarios. In the first, a segment is directly transferred from one species to another. In the second, a strain lacking the HGT segment is replaced by one that carries it. We hypothesized that phylogenetic analysis can distinguish between these scenarios. If the strain acquired the segments during the 4 years, the samples at both time points should cluster together because their genetic background is the same. If a baseline strain without the segment is replaced by another strain with the segment, samples from the two timepoints would cluster in different clades due to their diverse genetic backgrounds. To assess this, we first visualized changes in HGT segments within individuals over the 4-year period (\u003cb\u003eFig. S7\u003c/b\u003e), which identified an interesting cluster of HGT segments encoding glycoside hydrolase genes associated with \u003cem\u003eA. rectalis\u003c/em\u003e (\u003cb\u003eFig. S7\u003c/b\u003e). For 24 individuals, the presence or absence status of this \u003cem\u003eA. rectalis\u003c/em\u003e segment had changed in 4 years, and we constructed the phylogenetic tree of \u003cem\u003eA. rectalis\u003c/em\u003e based on its marker genes for those individuals at both time points (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ed, see \u003cb\u003eMethods\u003c/b\u003e). Interestingly, for 12 individuals, the \u003cem\u003eA. 
rectalis\u003c/em\u003e strains at the two timepoints clustered together (phylogenetic distance\u0026thinsp;\u0026lt;\u0026thinsp;0.1) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ee), indicating a true HGT event resulting in the gain (or loss) of the segment within the 4-year interval (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ee). For the other 12 individuals, the \u003cem\u003eA. rectalis\u003c/em\u003e strains with or without the segment from the two timepoints clustered into different clades (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ef), demonstrating that the change in the presence of HGT segments was due to strain replacement.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAn individual's HGT remains highly personalized and stable over time, indicating that HGT may serve as a lasting record of host lifestyle\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAlthough the overall prevalence of HGT segments increased over time, the prevalence of 5,454 segments (71.9% of all HGT segments) changed minimally over the 4-year period, varying by no more than 5% from baseline (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea). We therefore investigated whether overall HGT profiles are individual-specific and persistent over time. Using Jaccard distance to assess the similarity of HGT profiles between the two time points, we found that these profiles were highly individual-specific, with temporal changes smaller than inter-individual differences (9999 permutation Wilcoxon-Mann-Whitney test, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;\u0026lt;\u0026thinsp;1x10\u003csup\u003e\u0026minus;\u0026thinsp;4\u003c/sup\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb). 
Based on HGT profiles, we could correctly match 194 out of 338 paired samples from the same donor, a significantly better performance than could be achieved using taxonomic composition alone (42/338, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec) or microbial pathways alone (16/338). These findings suggest that microbial individual-specificity\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e is reflected not only in microbial composition and genetic makeup but also in their ecological interactions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBecause mobile genes are stable and highly individual-specific, we hypothesized that specific lifestyle factors might drive specific gene transfer events. Given that antibiotic use is already well known to promote the transfer of ARGs\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, we instead focused on the often-underestimated influence of non-antibiotic drug use on HGT in the gut microbiome. To estimate the effects of non-antibiotic drugs on HGT changes, we calculated the abundance of mobile genes in HGT segments (\u003cb\u003esee Methods\u003c/b\u003e) and applied linear mixed models (LMMs) that accounted for age, sex, and read count as fixed effects and treated individual ID and time point as random effects. 
This analysis identified 48 positive and 14 negative associations (false discovery rate (FDR)\u0026thinsp;\u0026lt;\u0026thinsp;0.05, \u003cb\u003eTable S7\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eAmong the associations, PPI usage was significantly linked to an increased abundance of mobile genes encoding the ABC-2 membrane transporter (Beta\u0026thinsp;=\u0026thinsp;0.51, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.0011), the ABC transporter (Beta\u0026thinsp;=\u0026thinsp;0.54, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.00034), and the multi-antimicrobial extrusion protein (Mate) (Beta\u0026thinsp;=\u0026thinsp;0.43, \u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.00052) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed\u0026ndash;f). Notably, the genes encoding the ABC transporter and the ABC-2 membrane transporter, which were shared between \u003cem\u003eBlautia_A wexlerae\u003c/em\u003e and \u003cem\u003eAnaerobutyricum hallii\u003c/em\u003e, were located on the same HGT segment. Protein modeling of these genes confirmed that they code for a complete ABC transporter structure comprising two nucleotide-binding domains and two transmembrane domains (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eg). Similarly, modeling verified that Mate, which was also shared between \u003cem\u003eBlautia_A wexlerae\u003c/em\u003e and \u003cem\u003eAnaerobutyricum hallii\u003c/em\u003e, encodes a complete membrane transporter (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eh). Both the ABC transporter and Mate can translocate a variety of toxic compounds across membranes, suggesting a potential mechanism by which bacteria gain beneficial genes from other bacteria to mitigate PPI-induced toxicity\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e. 
Additionally, we observed that the abundance of streptomycin adenylyltransferase within mobile regions was significantly positively associated with age \u003cb\u003e(\u003c/b\u003e\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.00021, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ei). This association may reflect increased antibiotic exposure in elderly individuals, but it could also indicate that resistance genes accumulate and disseminate more readily within the microbial communities of older hosts. In contrast, the abundance of beta-galactosidase within mobile regions showed a significant negative association with age (\u003cem\u003ep\u003c/em\u003e-value\u0026thinsp;=\u0026thinsp;0.000067, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ej). This observation might be explained by higher dairy product consumption among younger individuals, which stimulates microbial beta-galactosidase production for lactose degradation, while reduced dairy intake in older adults could lead to decreased enzyme abundance. Overall, our results suggest that the mobile gene pool within the human microbiome can serve as a reflection of host lifestyle.\u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eThis study presents a large-scale, longitudinal metagenomic investigation into the dynamics of recent HGT in the human gut microbiome. Our findings collectively demonstrate that HGT segments can disseminate among hosts, either directly through gene transfer events or indirectly via strain transmission, thereby enhancing their prevalence within populations. Additionally, we observe that species engaged in HGT frequently establish stable ecological associations and tend to accumulate resistance genes. Our results also highlight the individual-specific nature of HGT events and their responsiveness to host lifestyle factors such as aging and non-antibiotic drug usage. 
These insights expand current knowledge regarding microbial interactions mediated by gene transfer within the human gut and underscore the potential of HGT as a valuable record of recent human lifestyle.\u003c/p\u003e\n\u003cp\u003eHDMI represents a novel workflow capable of detecting recent HGT events (0–10,000 years) at the individual level. Although recent HGT in the human gut microbiome has been shown to be closely related to human lifestyle and metabolic capabilities\u003csup\u003e5,12,14\u003c/sup\u003e, most HGT detection tools cannot distinguish between recent and long-term events\u003csup\u003e13,26,27\u003c/sup\u003e. MetaCHIP is one of the few methods capable of distinguishing between recent and long-term HGT events through phylogenetic tree comparisons, but it cannot be applied at the individual level\u003csup\u003e28\u003c/sup\u003e. In contrast, the recently published tool WAAFLE\u003csup\u003e13\u003c/sup\u003e can infer HGT events from assembled contigs at the individual level and has demonstrated superior performance to MetaCHIP\u003csup\u003e13\u003c/sup\u003e. Our benchmark results show that HDMI outperforms WAAFLE, especially in detecting intra-genus HGT events. As both previous studies\u003csup\u003e5,13\u003c/sup\u003e and our findings indicate, phylogenetically close bacteria are more likely to exchange genes, making intra-genus HGT events a major component of recent gene transfer, further highlighting the advantage of our tool. Building on this approach, we identified previously unrecognized gene transfer events in longitudinal samples from 338 individuals, significantly broadening our understanding of the mobile gene pool in the human gut microbiome. 
This methodological advance enables the analysis of microbial gene flow at the individual level and lays the groundwork for future investigations into the ecological role of HGT.\u003c/p\u003e\n\u003cp\u003eThis study also demonstrates that HGT promotes positive ecological relationships and enhances the stability of microbial community structure. While previous cross-sectional studies have indicated that co-occurring microorganisms are more likely to participate in HGT\u003csup\u003e29\u003c/sup\u003e, no research had examined the role of HGT in human gut microbiota temporal stability. Theoretical models and experimental studies support the notion that HGT fosters cooperative coexistence among microorganisms\u003csup\u003e6,30\u003c/sup\u003e. For instance, ecological stability analyses based on the generalized Lotka–Volterra model have shown that introducing highly transferable mobile genes, such as resistance genes, can improve the community's resilience to disturbances\u003csup\u003e31\u003c/sup\u003e. By enabling the widespread dissemination of beneficial functions, HGT allows more species to share critical survival advantages, thereby promoting multi-species coexistence and enhancing overall system homeostasis. Our longitudinal observations revealed that species involved in HGT tend to accumulate more resistance genes and exhibit more stable ecological relationships, strongly confirming the predictions of both experimental and modeling studies\u003csup\u003e30-32\u003c/sup\u003e. Our findings provide new evidence for the long-term impact of HGT on human gut microbial communities and suggest that, in addition to species diversity and relative abundance, the extent of interspecies genetic exchange may be a key factor in maintaining intestinal microecological homeostasis.\u003c/p\u003e\n\u003cp\u003eIn previous studies, HGT has been reported to facilitate the spread of mobile genes through two primary mechanisms. 
First, mobile genes can be directly transferred between individuals via mobile elements\u003csup\u003e14\u003c/sup\u003e. For example, in a mother-to-child cohort, phage-associated gene fragments in maternal strains were transferred to distinct strains in infants without direct strain transfer, enabling the infant’s intestinal microbes to acquire additional functions\u003csup\u003e14\u003c/sup\u003e. Second, mobile genes may be acquired from new strains through HGT. In fecal microbiota transplantation studies, newly introduced strains were observed to transfer genes to other species in the recipient microbiota\u003csup\u003e33\u003c/sup\u003e, illustrating a scenario involving both strain and gene transfer. Our results add another dimension by showing that strain replacement, where strains carrying mobile genes supplant the original strains, leads to the acquisition of mobile genes by the species as a whole. Moreover, we demonstrate that, in addition to exceptional cases like mother-to-child transmission and fecal microbiota transplantation, the natural spread of mobile genes via HGT is a common phenomenon in the population. Together, HGT and strain replacement ensure the persistence and dissemination of key genes within individuals and across populations\u003csup\u003e10\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eIt is noteworthy that numerous previous studies have demonstrated that the mobile gene pool in the microbiome is largely shaped by environmental factors and lifestyle\u003csup\u003e5,12\u003c/sup\u003e. These findings suggest that each individual’s intestinal microbiome may accumulate a unique series of HGT events that reflect their lifestyle history. Our data strongly support this view: we observed that HGTs are highly individual-specific and remain relatively stable over multiple years of follow-up, transcending the effects of short-term fluctuations in microbial communities. 
These long-term preserved mobile genes act as a \"history book\" of the microbiome, recording the host’s environmental exposures and selection pressures. For instance, we detected the recent transfer of tetracycline resistance genes and the enrichment of specific resistance genes associated with PPI usage. Although our current detection accuracy has not yet achieved full personalization, we believe that an individual’s microbial mobile gene pool holds promise for future applications, such as inferring disease exposure history and evaluating the efficacy of personalized medical interventions, thereby expanding the role of the microbiome in precision medicine.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLimitations of the study\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe acknowledge several limitations of our study. The Lifelines-DEEP cohort comprises participants of Dutch ancestry from the northern region of the Netherlands, so the results might be biased toward a region-specific microbial background and local environmental exposures. In addition, the Lifelines-DEEP cohort includes mainly healthy individuals, which limits its power to detect associations between HGT, diseases, and drug usage.\u0026nbsp;We also acknowledge that our workflow, like the MAG-based tool MetaCHIP, is affected by MAG quality. Since MAGs are constructed by assembling short sequence reads and binning genomic contigs, reads from HGT regions may map to multiple genomes. This can lead to inaccurate binning and the formation of incomplete or erroneous MAGs, with such contamination (i.e., heterologous sequences) potentially being misidentified as HGT events\u003csup\u003e34\u003c/sup\u003e. Multi-sample metagenomic binning can significantly improve both the number and quality of MAGs\u003csup\u003e35\u003c/sup\u003e, and subsequent detection\u003csup\u003e34\u003c/sup\u003e and removal of chimerism and contamination can help reduce these issues. 
However, contigs containing heterogeneous regions may not be binned with the rest of the genome\u003csup\u003e17\u003c/sup\u003e, causing many HGT events to be lost. Moreover, HDMI does not utilize all available contigs. In the future, integrating un-binned contigs with those assigned to MAGs could enhance HGT recovery, suggesting that complementary methods such as WAAFLE may continue to play an important role. Additionally, although horizontally transferred DNA can integrate into a recipient genome through both recombination and insertion, our individual-level detection captures only insertion events, because our method does not detect gene replacement through recombination. This limitation, combined with the low coverage depth of low-abundance species, results in numerous NA values in the temporal analysis of HGT segments between baseline and follow-up.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eSupplemental Information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSupplementary Information is available for this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank all the volunteers in the Lifelines cohort (https://www.lifelines.nl/) for their participation and the project staff for their help and management. We thank Kate Mc Intyre for English editing. We also thank Hongyu Jin, Johannes Björk, Yue Zhang, and Jiqiu Wu for their suggestions that inspired this study. This study is supported by the Netherlands Organization for Scientific Research (NWO) VICI grant VI.C.202.022 (J.F.), NWO VIDI grant 016.178.056 (A.Z.), NWO VENI grant 222.016 (D.W.), European Research Council (ERC) Consolidator grant 101001678 (J.F.), ERC Starting Grant 715772 (A.Z.), and Dutch Heart Foundation grant IN-CONTROL (CVON2018-27 to J.F. and A.Z.). In addition, H.P. 
is supported by a joint fellowship from the University Medical Center Groningen and the China Scholarship Council with grant number CSC202208060107. J.F. is supported by a 2023 AMMODO Science Award for Biomedical Sciences from Stichting Ammodo and the Netherlands Organ-on-Chip Initiative, an NWO Gravitation project (024.003.001) funded by the Ministry of Education, Culture and Science of the government of the Netherlands. A.Z. is further supported by the NWO Gravitation grant Exposome-NL (024.004.017) and the EU Horizon Europe Program grant INITIALISE (101094099).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eJ.F. coordinated and supervised the study. J.F., H.P., S.A.-S., and D.W. conceptualized the study. H.P. performed data analysis. A.J.R.-M. and H.P. conducted the protein-structure-based analysis. A.F.-P. performed plasmid and virus annotation. J.W. helped with the statistical analysis. R.G. performed metagenomic data assembly. J.F. and A.Z. set up the Lifelines-DEEP cohort. H.P. and J.F. drafted the manuscript. All authors reviewed and edited the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA.Z. received a speaker fee from Nestlé. All other authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData and Code Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll relevant data supporting the key findings of this study are available within the article and its supplementary information files. The raw metagenomic sequencing data and basic phenotypes (i.e., age and sex) of the Lifelines-DEEP participants at both time points are available from the European Genome-Phenome Archive (https://ega-archive.org) via accession numbers EGAD00001001991 and EGAD00001006959, respectively. 
Due to informed consent regulations, detailed phenotypic data for the Lifelines-DEEP cohort can be requested from Lifelines (https://www.lifelines.nl/researcher) by submitting an intention letter to the Lifelines Data Access Committee responsible for the Lifelines-DEEP data (contact: Jackie Dekens, email: [email protected]). The availability of datasets is subject to a data transfer agreement and specific rules and guidelines regulate data usage.\u003c/p\u003e\n\u003cp\u003eThe code of the workflow is available via: https://github.com/HaoranPeng21/HGT-workflow. Analysis code is available via: https://github.com/HaoranPeng21/HGT-Project.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cp\u003e\u003cstrong\u003eKEY RESOURCES TABLE\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eREAGENT or RESOURCE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eSOURCE\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003eIDENTIFIER\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eBiological samples\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eFecal samples\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eThis study\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://www.Lifelines.nl\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eDeposited data\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eLLD baseline metagenomics\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eEuropean Genome-Phenome Archive (EGA)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003eEGAD00001001991\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eLLD follow-up metagenomics\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eEGA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003eEGAD00001006959\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eSoftware and algorithms\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eR (version 4.0.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003eR Foundation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://www.r-project.org/\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003ePython (version 3.7.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003ePython\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://www.python.org\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eKneadData (version 0.4.6.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e36\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://huttenhower.sph.harvard.edu/kneaddata/\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eMegahit\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e37\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/voutcn/megahit\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003edRep (version 3.4.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e38\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/MrOlm/drep\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eMetawrap (version 1.3.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e39\u003c/sup\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/bxlab/metaWRAP\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eCheckM (version 1.2.2)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e40\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/Ecogenomics/CheckM\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eGTDBTk (version 2.1.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e41\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/Ecogenomics/GTDBTk\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eMASH (version 2.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e42\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/marbl/Mash\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eFastANI (version 1.33)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e43\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/ParBLiSS/FastANI\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n 
\u003cp\u003eBowtie 2 (version 2.5.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e44\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/BenLangmead/bowtie2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eCD-HIT (version 4.8.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e45\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://sites.google.com/view/cd-hit\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eProdigal (version 2.6.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e46\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/hyattpd/Prodigal\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eBarrnap (version 0.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e47\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/tseemann/barrnap\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eProkka (version 1.14.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e48\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/tseemann/prokka\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eBlastn (version 2.6.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e49\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eeggNOG-mapper (version 2.1.11)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e50\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttp://eggnog5.embl.de\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eResistance Gene Identifier (RGI)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e51\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/arpcard/rgi\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eTraitar (version 1.1.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e52\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/hzi-bifo/traitar\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n 
\u003cp\u003eFastspar (version 1.0.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e53\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/scwatts/fastspar\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eUBCGtree (version 2.0)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e54\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttp://leb.snu.ac.kr/ubcg2/about\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003evegan R package\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e55\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://cran.r-project.org/web/packages/vegan/index.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003elme4 R package\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e56\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://cran.r-project.org/web/packages/lme4/index.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003elmtest R package\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e57\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://cran.r-project.org/web/packages/lmtest/index.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eggplot2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e58\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://cran.r-project.org/web/packages/ggplot2/index.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eigraph v1.2.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e59\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://cran.r-project.org/web/packages/igraph/index.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eggraph v2.0.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e60\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://cran.r-project.org/web/packages/ggraph/index.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003encf v 1.3-2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e\u0026nbsp;\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/objornstad/ncf\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 
189px;\"\u003e\n \u003cp\u003elme4qtl v0.2.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e61\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://rdrr.io/github/variani/lme4qtl/man/relmatLmer.html\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eComplexHeatmap v2.18.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e62\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://github.com/jokergoo/ComplexHeatmap\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eDatabase\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003ePfam database\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e63\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://www.ebi.ac.uk/interpro/entry/pfam\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003edbCAN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e64\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n 
\u003cp\u003ehttps://bcb.unl.edu/dbCAN/\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 189px;\"\u003e\n \u003cp\u003eCARD database\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003csup\u003e51\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 242px;\"\u003e\n \u003cp\u003ehttps://card.mcmaster.ca/analyze/rgi\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eRESOURCE AVAILABILITY\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLead contact\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFurther information and requests for resources and reagents should be directed to the Lead Contact, Jingyuan Fu ([email protected]).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe fecal samples of Lifelines participants can be requested via Lifelines biobank (https://www.lifelines.nl/researcher).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEXPERIMENTAL MODEL AND SUBJECT DETAILS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHuman subjects\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe Lifelines-DEEP study, part of the Lifelines biobank with over 167,729 participants, focuses on a select group of 1,539 individuals to explore various factors affecting health outcomes in the northern Netherlands. The study has been approved by the institutional ethics review board of the University Medical Center Groningen (ref. M12.113965). A follow-up study was recently conducted on 338 cohort participants who had been analyzed in 2013\u003csup\u003e65\u003c/sup\u003e. As described previously\u003csup\u003e16\u003c/sup\u003e, follow-up stool samples were collected for these 338 individuals (55.6% female and 44.4% male) at the second time point. 
The duration between the two time points ranged from 3.33 to 3.92 years (mean = 3.53, SD = 0.12). At baseline, the mean age of participants was 48.2 years (range = 18‒80, SD = 11.7) and their mean BMI was 25.4 (range = 17.6‒43.3, SD = 4.08). At follow-up, the mean age was 51.7 years (range = 22‒84, SD = 11.7), and the mean BMI was 25.6 (range = 16.1‒37.6, SD = 4.0). Phenotypic data assessed in this study included anthropometric traits (e.g., age, sex, BMI, height) and usage of 14 medications (e.g., PPIs, oral contraceptives, beta-blockers, statins).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMETHOD DETAILS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMetagenomic data generation and preprocessing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eStool sample collection and processing followed the same protocol at both time points. All participants were asked to collect fecal samples at home and to place them in their home freezer (-20 °C) within 15 minutes after production. Subsequently, a nurse visited the participant to pick up the fecal samples on dry ice and transfer them to the laboratory. 
Aliquots were then made and stored at -80 °C until further processing. The same protocol for fecal DNA isolation and metagenomics sequencing was used at both time points. Fecal DNA isolation was performed using the AllPrep DNA/RNA Mini Kit (QIAGEN cat. 80204). After DNA extraction, fecal DNA was sent to the Broad Institute of Harvard and MIT in Cambridge, Massachusetts, USA, where library preparation and whole genome shotgun sequencing were performed on the Illumina HiSeq platform. From the raw metagenomics sequencing data, low-quality reads were discarded by the sequencing facility and reads belonging to the human genome were removed by mapping the data to the human reference genome (version NCBI37) with KneadData (version 0.4.6.1) and Bowtie2 (version 2.1.0)\u003csup\u003e36\u003c/sup\u003e. After filtering, the average read depth was 12.3 million for both baseline and follow-up samples. 
Read depth did not differ significantly between the two time points (paired Wilcoxon test, \u003cem\u003ep\u003c/em\u003e-value = 0.89).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eDe novo\u003c/em\u003e\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;assembly and quality control\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMetaSPAdes\u003csup\u003e66\u003c/sup\u003e was used to perform \u003cem\u003ede novo\u003c/em\u003e assembly for each sample in Lifelines-DEEP Baseline (n = 338). The assembled contigs were further binned and refined using MetaWRAP\u003csup\u003e39\u003c/sup\u003e. The quality of the MAGs was assessed by CheckM\u003csup\u003e40\u003c/sup\u003e. Genomes with \u0026gt;90% completeness and \u0026lt;5% contamination were retained. dRep\u003csup\u003e38\u003c/sup\u003e was used to dereplicate MAGs with the options -sa 0.998 -pa 0.95 to ensure that only non-identical genomes were included.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMAG clustering and taxonomy classification\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003edRep compare was used to compare and cluster MAGs. MASH\u003csup\u003e42\u003c/sup\u003e was used to form primary clusters with a threshold of 0.95. fastANI\u003csup\u003e43\u003c/sup\u003e was used to create secondary clusters with a threshold of 0.99. The primary and secondary clusters were used to group genomes at the species and strain level, respectively. Taxonomic classification of the genomes was performed using GTDB-Tk\u003csup\u003e41\u003c/sup\u003e with default parameters. All genome taxonomies and groups are compiled in \u003cstrong\u003eTable S1\u003c/strong\u003e. 
We used UBCGtree V2.0\u003csup\u003e54\u003c/sup\u003e to construct the MAG phylogenomic trees based on its 81 previously defined, nearly universal single-copy core genes.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSpecies abundance calculation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSpecies relative abundance was calculated as the sum of the genome abundances within each species in each sample. For species with more than five genomes, we randomly selected five genomes to calculate the relative and median abundance of the species. We mapped reads against all selected genomes using Bowtie2\u003csup\u003e44\u003c/sup\u003e and calculated the depth of coverage at every position of every contig in each genome as:\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eD\u003csub\u003egenome_i\u003c/sub\u003e\u003c/em\u003e = (Coverage\u003csub\u003e1,1\u003c/sub\u003e, Coverage\u003csub\u003e1,2\u003c/sub\u003e, ..., Coverage\u003csub\u003e1,\u003cem\u003eL\u003c/em\u003e\u003csub\u003e1\u003c/sub\u003e\u003c/sub\u003e, Coverage\u003csub\u003e2,1\u003c/sub\u003e, ..., Coverage\u003csub\u003e\u003cem\u003eN\u003c/em\u003e,\u003cem\u003eL\u003c/em\u003e\u003csub\u003e\u003cem\u003eN\u003c/em\u003e\u003c/sub\u003e\u003c/sub\u003e)\u003c/p\u003e\n\u003cp\u003eHere, N represents the number of contigs and L\u003csub\u003eN\u003c/sub\u003e the number of positions in contig N, so that the first index denotes the contig and the second the position within that contig. The median abundance of each genome in the metagenome was calculated as Median(D). The per-base depth of coverage K, the average read length L, the size of each genome S, and the total read number T in the shotgun data were used to calculate the relative abundance A of each genome in the metagenome according to A = (K*S/L) / T.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHGT detection workflow\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn this work, we introduce HDMI, a workflow to identify recent HGT at the individual level. In brief, HDMI works as follows. First, HDMI searches for genomic regions of at least 500 bp that show \u0026gt;99% identity between any pair of MAGs with ANI \u0026lt;95%. 
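\u003c/p\u003e\n\u003cp\u003eThe search criteria in this first step can be illustrated with a minimal Python sketch. It is not the HDMI implementation: the Hit record, the candidate_hgt_hits function, and the ANI lookup keyed by genome pair are hypothetical names introduced here for illustration, and skipping genome pairs that have no ANI estimate is a simplifying assumption of the sketch. Only the three thresholds (at least 500 bp, more than 99% identity, ANI below 95%) come from the text.\u003c/p\u003e

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment between contigs of two genomes (hypothetical record)."""
    query_genome: str
    subject_genome: str
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp


def candidate_hgt_hits(
    hits: List[Hit],
    ani: Dict[FrozenSet[str], float],
    min_length: int = 500,
    min_identity: float = 99.0,
    max_ani: float = 95.0,
) -> List[Hit]:
    """Keep long, near-identical alignments shared by genomes of different species."""
    kept = []
    for hit in hits:
        pair = frozenset((hit.query_genome, hit.subject_genome))
        if len(pair) < 2:
            continue  # alignment of a genome against itself
        pair_ani = ani.get(pair)
        # Assumption of this sketch: pairs without an ANI estimate are skipped.
        if pair_ani is None or pair_ani >= max_ani:
            continue  # same species (ANI >= 95%) or unknown relationship
        if hit.length >= min_length and hit.identity > min_identity:
            kept.append(hit)
    return kept
```

\u003cp\u003eIn practice the Hit records would be parsed from the alignment output and the ANI values from the genome comparison step; both are left out of the sketch.\u003c/p\u003e\n\u003cp\u003e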
These regions are likely to be transfer events that occurred within the last 10,000 years (hereafter referred to as \u0026ldquo;recent\u0026rdquo; HGTs), assuming a molecular clock of 1 SNP/genome/year for a genome size of 10\u003csup\u003e6\u003c/sup\u003e bp\u003csup\u003e67-69\u003c/sup\u003e. Second, HDMI performs several quality checks. It first checks MAG contamination and identifies split reads that span the junction between the HGT regions and their flanking regions. In addition, we required at least 90% breadth of coverage and a median genome depth of at least 1x for HGT segments. Third, HDMI excludes false-positive HGTs caused by highly conserved genes with \u0026gt;99% identity. For this, we extracted and calculated the identity of 81 nearly universal single-copy core genes (including 42 ribosomal proteins)\u003csup\u003e70\u003c/sup\u003e between all cross-species genome pairs. Lastly, inspired by the detection of differential alternative splicing events in transcriptomes, we inferred whether each HGT is inserted in a given sample and determined the ratio of insertion to non-insertion by counting the reads that map to the split sites of sequences with the HGT insertion and to the junction site of sequences without it (\u003cstrong\u003eFig. S1\u003c/strong\u003e). Each of these steps is described in more detail below.\u003c/p\u003e\n\u003cp\u003e1. HGT candidate detection\u003c/p\u003e\n\u003cp\u003eIn this pipeline, we focused only on transfers occurring between bacterial species (ANI \u0026lt; 0.95), ignoring within-species (ANI \u0026ge; 0.95) gene recombination events. 
We screened all genomes and used Blastn v2.6.0\u003csup\u003e49\u003c/sup\u003e to identify genomic segments of at least 500 bp that were shared between any pair of genomes from different species with an identity \u0026gt;99%.\u003c/p\u003e\n\u003cp\u003eTo exclude segments carrying potentially conserved genes, which evolve slowly and are therefore more likely to show high identity, we referred to the UBCG2 resource\u003csup\u003e70\u003c/sup\u003e. This resource defines 81 nearly universal single-copy core genes, including 42 ribosomal proteins, that are thought to be vertically transmitted. We then calculated the identity of each vertically transmitted gene in all species pairs and excluded between-species genome pairs containing any vertically transmitted gene with \u0026gt;99% identity. For the remaining genome pairs, the best explanation for these highly homologous segments is HGT rather than vertical inheritance, because even the highly conserved, vertically inherited genes of the two species do not exceed the 99% identity threshold used in our approach to retain HGT candidates.\u003c/p\u003e\n\u003cp\u003eIn addition, the assembly algorithm (metaSPAdes), which is based on de Bruijn graphs, can produce contamination for regions with sequencing errors but high similarity\u003csup\u003e71\u003c/sup\u003e. This process produces \u0026apos;bubbles\u0026apos; that can result in the generation of two contigs with overlapping sequences at their ends. If not handled properly, such duplications may lead to erroneous conclusions in HGT analysis. To mitigate this risk, we disregarded HGT candidates located within 100 bp of the end of a contig. We also excluded any putative HGT candidates found on contigs that matched a longer contig over \u0026gt;90% of their full length, as these are likely to be artificial duplicates created during the assembly process\u003csup\u003e28\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e2. 
Cohort-based HGT event validation\u003c/p\u003e\n\u003cp\u003eIf the median abundance of all MAGs was 0 among the five randomly selected MAGs, the abundance of the species was considered NA. To make sure the transferred region exists in the genome, the HGT region should have a minimum of 90% breadth of coverage. Subsequently, we counted those reads where one part mapped in the putative HGT and the other part mapped to the flanking region at either end of the HGT, as evidence of true HGT events.\u003c/p\u003e\n\u003cp\u003eBowtie2 with options -a --very-sensitive --no-unal was used for read mapping. For each transferred sequence, we required at least three reads mapping to each of its start and end sites, with at least 10 bp of overhang on either side. If any end lacked sufficient read support, the median abundance of all selected genomes of the species was 0, or the HGT region had \u0026lt; 90% breadth of coverage, the presence of this putative HGT in that genome was considered NA. We then conducted detection for both genomes involved in each HGT event. If we observed the HGT in only one of the two genomes of an HGT event across the entire cohort, the HGT event was considered a false positive.\u003c/p\u003e\n\u003cp\u003e3. HGT profiling\u003c/p\u003e\n\u003cp\u003eEach HGT event is considered to be two HGTs in two species due to their unique insertion sites (start, end) and the potential to be acquired or lost. First, we extracted sequences containing an HGT and concatenated the flanking regions of the HGT to get sequences without the HGT. We then counted the reads spanning the start and end of the split site with HGT insertion (HI\u003csub\u003e1\u003c/sub\u003e, HI\u003csub\u003e2\u003c/sub\u003e) and the reads spanning the site with no HGT insertion (nHI) in each sample using the same strategy we used in the second step above. 
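\u003c/p\u003e\n\u003cp\u003eThe junction read-support criterion from the second step (at least three reads per site, each with at least 10 bp of overhang on either side) can be sketched as follows. This is an illustrative sketch only: it assumes read alignments have already been reduced to (start, end) intervals on the reference, and the names spanning_reads and junctions_supported are hypothetical rather than taken from HDMI.\u003c/p\u003e

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # aligned read as (start, end), 0-based, end exclusive


def spanning_reads(reads: Iterable[Interval], junction: int, overhang: int = 10) -> int:
    """Count reads covering `junction` with at least `overhang` bp on both sides."""
    return sum(
        1
        for start, end in reads
        if junction - start >= overhang and end - junction >= overhang
    )


def junctions_supported(
    reads: Iterable[Interval],
    hgt_start: int,
    hgt_end: int,
    min_reads: int = 3,
    overhang: int = 10,
) -> bool:
    """True if both ends of the HGT region have enough spanning reads."""
    reads = list(reads)  # allow two passes over a generator
    return (
        spanning_reads(reads, hgt_start, overhang) >= min_reads
        and spanning_reads(reads, hgt_end, overhang) >= min_reads
    )
```

\u003cp\u003e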
Finally, we calculated the HGT\u0026rsquo;s presence/absence by:\u003c/p\u003e\n\u003cp\u003eHI = min(HI\u003csub\u003e1\u003c/sub\u003e, HI\u003csub\u003e2\u003c/sub\u003e) if min(HI\u003csub\u003e1\u003c/sub\u003e, HI\u003csub\u003e2\u003c/sub\u003e) \u0026ge; 3 else 0\u003c/p\u003e\n\u003cp\u003enHI = nHI if nHI \u0026ge; 3 else 0\u003c/p\u003e\n\u003cp\u003e[Equation image: HGT presence/absence computed from HI and nHI]\u003c/p\u003e\n\u003cp\u003eIf HI and nHI are both 0, the HGT is recorded as NA.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBenchmark HDMI\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSelected representative genomes were downloaded from NCBI by GCA ID (see \u003cstrong\u003eTable S8\u003c/strong\u003e). Seqkit was used to randomly select genes from the donor genome\u003csup\u003e72\u003c/sup\u003e. 
+fzCQgIf3atWvY3t5GEAQol8vwPA9CCGxsbODOnTuRvL7vR6YUmVvSNLhBTp48aSYNVa/X+46tb+Y0IdP58+f7eiVxGyU7ffp03/sVt/3www/mSyNOnz4NxPSGlFqtllqP8UgyI+9ejdpSimvRSinDu+9mepy0WrSqZV0oFMxdst1u95Wp6mzbdiSvolrGqqXqOE5sXrPsYe+l53nSsqywVWK2WnVx+9RsBvOmnNTKbjQaiZ+VYlmWtG07MluC9jd1Uzfu2mo0GhITnv1z1O2rFi16N2aEEFhZWUGxWIxt2aq5n2ouYFrU2KPuwYMHZlLo9evXfS3etbU1BEEAz/MiY6GvX7/uu4mnlnSq+cHz8/OwLAu1Wq2v3GaziVKphJ2dHVy8eDGyb1Se5wExdWo2m1hfX8fOzg5mZ2fxyy+/AAD++OOPvvm1qn5TU1PhjULa/6anp+E4Tuy1tbKyAgD47bffIum0B2bk3athrTBlUCspCIKwtarGRX3fl35v7qZqiQkhYl+v01uIo1LTmVQ9/N78UsuywilVepl6ixa9MWZfm79qzjNVLQY9b6FQCI+pj5vp82jVuejzefXpWnGt1kH79Hm0ej3jzkNN49LPWZ8nbE77ov1Pv7bUZ6qu7VHvk9BoxotAI5hEoFUqlYrM5XKRm1Iq8JbL5dgur0m9blyNRiNy40stLmi32+H807gue6VSCc9X9Cb3x52nKl//pZHL5WKHQYIgiARX9Lp1Zt64YDpsX7vdjnzBksqWvTL090TVj0MGB1fcdW5eI7R3/OOMREQp23djtETD6HNO9edO6Olq/Nt8NsD3kHRMPV1/dnFSfegQMZu4RPudGuM2xxHVMyTM2SKqazzqEAeGzGQZRdIx1Zh43Ji9WR86PNiipQNHLQM2V8AdO3YMAHD8+PFI+rlz5wBt+fT3kHTMuD9hnlQfOjwYaInGoLr4q6urWF1dRab32EpzihSRjoGWDqyVlZXI+OalS5fMLBOnHia0sbGBq1evQkqJhYUFLC0tmVmBmGcz1Go1MwsdAQy0dGCZS5Pr9bqZZSRxN68uXbrUlwZtKODmzZvhkmM1TBDHXKbMJcZHEwMtHXnLy8uRYIiY5zoQ7QUDLRFRyhho6cBRz1swn4r27du32PS3b98Cvaey7ZUqQz/G169fI/sw4Jjq2R16elJ96BAx53sR7XdquSiMJcV6upoHq5Z6q20Ug+bRmsdQS5v18pOOqafrS9T1vFz+ejhxCS4RUco4dEBElDIGWiKilDHQEhGljIGWiChlDLRERCljoCUiShkDLRFRyhhoiYhSxkBLRJQyBloiopT9P6CvVIixLNqcAAAAAElFTkSuQmCC\"\u003e\u003c/p\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cp\u003eHere, if the HI and nHI are both 0, it is NA.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBenchmark HDMI\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSelected representative genomes were downloaded from NCBI with GCA ID (see \u003cstrong\u003eTable S8\u003c/strong\u003e). Seqkit was used to randomly select genes from the donor genome\u003csup\u003e72\u003c/sup\u003e. 
HgtSIM\u003csup\u003e73\u003c/sup\u003e was used to simulate the insertion of genes with genetic divergence varying between 99% and 100% (-f genome -r 1-0-1-1 -x fna -mixed 0-1 -keep_cds -a genebank). ART (Version 2.5.8)\u003csup\u003e74\u003c/sup\u003e was used to simulate metagenomic sequencing data at 6X, 9X, and 12X coverage (art_illumina -ss HS25 -l 150 -f 9 -p -m 500 -s 10).\u003c/p\u003e\n\u003cp\u003eIn our comparison of WAAFLE and HDMI, we assessed the recovery rates of both tools at various taxonomic levels by simulating gene insertions in genome pairs spanning different evolutionary distances (i.e., intra-genus, intra-order, and intra-phylum, \u003cstrong\u003eTable S8\u003c/strong\u003e). For each genome pair, we generated three metagenomic datasets with distinct ratios (9:12, 12:6, and 6:10). Simulated reads were further assembled by metaSPAdes\u003csup\u003e66\u003c/sup\u003e, and the resulting contigs binned using MetaWRAP\u003csup\u003e39\u003c/sup\u003e to obtain MAGs. In HDMI, a correct detection is defined as the identified insertion gene matching the simulated insertion gene, with a maximum length discrepancy of no more than 1% of the full gene length. Similarly, in WAAFLE, a correct detection event is defined as the transferred gene (i.e., directional HGT event) identified by WAAFLE as being identical to the simulated insertion gene, allowing for a length difference of no more than 1% of the complete gene length. Since real genomes exhibit both recent and ancient HGT events beyond the simulated gene insertions, our comparison focused exclusively on the recovery rates of the simulated gene insertions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHGT clustering, gene annotation, and the direction in the cluster\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHGT sequences were clustered using cd-hit\u003csup\u003e45\u003c/sup\u003e (-aS 0.9 -aL 0.9 -c 0.9), representing HGTs with similar gene content and function. 
The coding sequences (CDS) were assigned to all HGTs using Prodigal V2.6.3\u003csup\u003e46\u003c/sup\u003e in metagenome mode to capture gene segments. eggNOG-mapper\u003csup\u003e50\u003c/sup\u003e was used to assign putative function predictions to genes, and the queries were realigned to the Pfam\u003csup\u003e63\u003c/sup\u003e domains to get the Pfam function domain annotation (--evalue 0.001 --score 60 --pident 40 --query_cover 20 --pfam_realign realign). The RGI and CARD database\u003csup\u003e51\u003c/sup\u003e was used to predict ARGs with default parameters.\u003c/p\u003e\n\u003cp\u003eIn each cluster, sequences were aligned using the auto option in mafft\u003csup\u003e75\u003c/sup\u003e, and the gene tree was constructed with IQ-TREE\u003csup\u003e76\u003c/sup\u003e (-m MFP). We then subsampled the species tree from the comprehensive MAGs tree using ETE Toolkit. The subtree was used to infer the root of the gene tree using the OptRoot module from RANGER-DTL v.2.0. We then ran RANGER-DTL with default settings to reconcile the gene tree and the genomic tree a total of 500 times. Reconciliations from each optimal root were aggregated using the AggregateRanger module from RANGER-DTL v.2.0. The transfer direction was then extracted for annotation information in each cluster.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePangenome construction and trait prediction for each species\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo represent the function of each species, we constructed the pangenome\u003csup\u003e77\u003c/sup\u003e. Based on the MAG\u0026rsquo;s taxonomic classification, in each species we performed gene calling using Prodigal\u003csup\u003e78\u003c/sup\u003e in all MAGs belonging to the species and clustered genes using cd-hit (-c 0.9). The set of non-redundant genes comprising all MAGs from the same species is the meta-pangenome of that species. 
Traitar\u003csup\u003e52\u003c/sup\u003e was used to predict phenotypic traits for each species, including energy resources for growth, enzymatic activities, and morphology. The Phypat and PGL algorithms were used to predict traits. To avoid over-interpretation of false positive traits, we only considered the 22 traits with \u0026gt;90% predictive accuracy using the Phypat + PGL method, following the accuracy evaluation in the original paper\u003csup\u003e52\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eARGs were predicted using strict mode in RGI\u003csup\u003e51\u003c/sup\u003e. VF identification was done using the core set of the Virulence Factors of Pathogenic Bacteria Database (VFDB 2022\u003csup\u003e79\u003c/sup\u003e) with the BLASTP option of the Diamond software with strict parameters (e-value \u0026lt; 10\u003csup\u003e\u0026minus;5\u003c/sup\u003e, \u0026gt;50% identity at the protein level, and 70% query sequence coverage). AMPRs were identified by performing a BLASTP sequence similarity search against the manually curated list of AMPRs\u003csup\u003e80\u003c/sup\u003e with the same parameters. Genes encoding CAZymes were identified using dbCAN (CAZyDB.08062022.dmnd). The proportion of CAZyme genes for a particular substrate was calculated as the number of CAZyme genes involved in its utilization divided by the total number of CAZyme genes. CAZyme classification was described in a previous study\u003csup\u003e81\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHGT network construction and degree calculation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe HGT network was constructed based on the transfer between species. The network\u0026rsquo;s edges are unweighted. 
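\u003c/p\u003e\n\u003cp\u003eAs a rough illustration of an unweighted species-level network and its degree centrality, the following Python sketch uses only the standard library. It treats edges as undirected, which is an assumption made here for simplicity, and the function names build_network and degree are hypothetical; it does not compute betweenness and is not the implementation used in this study.\u003c/p\u003e

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(transfers: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected, unweighted adjacency map from species pairs.

    Repeated pairs collapse into a single edge because edges carry no weight.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for species_a, species_b in transfers:
        if species_a == species_b:
            continue  # within-species pairs are not edges in this sketch
        adjacency[species_a].add(species_b)
        adjacency[species_b].add(species_a)
    return dict(adjacency)


def degree(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Degree centrality as the number of distinct partner species."""
    return {species: len(partners) for species, partners in adjacency.items()}
```

\u003cp\u003e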
The R package igraph v1.2.6\u003csup\u003e59\u003c/sup\u003e was used to construct the network and calculate the betweenness and degree centrality, with ggraph v2.0.5\u003csup\u003e60\u003c/sup\u003e used to visualize the network. An LMM with phylogenetic distance matrices as a random effect was fitted to examine the association between a bacterial trait and its centrality in the HGT network, adjusting for the effect of abundance and phylogenetic distance with the function relmatLmer in the R package lme4qtl (version 0.2.2)\u003csup\u003e61\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eModel1: relmatLmer (Centrality ~ Abundance + (1 | Phylogenetics Distance), REML = FALSE)\u003c/p\u003e\n\u003cp\u003eModel2: relmatLmer (Centrality ~ Trait + Abundance + (1 | Phylogenetics Distance), REML = FALSE)\u003c/p\u003e\n\u003cp\u003eLRT was used to measure the effect of traits on HGT network centrality:\u003c/p\u003e\n\u003cp\u003eLRT_Trait = lrtest(model1, model2)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe effect of correlation and phylogenetic distance on HGT rates\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBased on the species abundance calculated above, we used Fastspar\u003csup\u003e53\u003c/sup\u003e, a C++ implementation of the SparCC algorithm\u003csup\u003e82\u003c/sup\u003e, to calculate the correlation between species at each time point. With 1000 permutations, species correlations were considered reliable only if, at both time points, the \u003cem\u003ep\u003c/em\u003e-value of the species correlation was \u0026lt; 0.01 and the correlations were either both positive or both negative. The baseline correlation value was used to represent the correlation of each species pair.\u003c/p\u003e\n\u003cp\u003eFor MAG-based HGT rate estimation, we implemented a previously published conservative approach developed for isolates\u003csup\u003e9\u003c/sup\u003e, defining the HGT rate of a species pair as the ratio of between-species genome pairs that share at least one HGT to all between-species genome pairs. 
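\u003c/p\u003e\n\u003cp\u003eA minimal Python sketch of this rate for one species pair is given below. It assumes that each species is represented by a list of unique genome identifiers and that the set of genome pairs sharing at least one HGT is already known; the function name hgt_rate and its arguments are hypothetical. The min_genomes default mirrors the three-MAG minimum stated in the text.\u003c/p\u003e

```python
from itertools import product
from typing import FrozenSet, Optional, Sequence, Set


def hgt_rate(
    genomes_a: Sequence[str],
    genomes_b: Sequence[str],
    sharing_pairs: Set[FrozenSet[str]],
    min_genomes: int = 3,
) -> Optional[float]:
    """Fraction of between-species genome pairs that share at least one HGT.

    Returns None when either species has fewer than `min_genomes` genomes.
    """
    if len(genomes_a) < min_genomes or len(genomes_b) < min_genomes:
        return None
    # Every pairing of one genome from species A with one from species B.
    pairs = [frozenset(pair) for pair in product(genomes_a, genomes_b)]
    shared = sum(1 for pair in pairs if pair in sharing_pairs)
    return shared / len(pairs)
```

\u003cp\u003e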
Species with \u0026lt; 3 MAGs were not considered in calculating HGT rates.\u003c/p\u003e\n\u003cp\u003eBased on the phylogenetic tree built using 81 nearly universal single-copy genes, we further calculated the phylogenetic distance across all species pairs using the Python package ete3. The phylogenetic distances between species were calculated by averaging the distances of all genome pairs in each species pair.\u003c/p\u003e\n\u003cp\u003eWe used the generalized linear and logistic regression models to measure the effect of within-species microbial interaction on HGT rates and occurrence. The partial.mantel.test function from the ncf R package (version 1.3-2)\u003csup\u003e56\u003c/sup\u003e was used with pairwise deletion.\u003c/p\u003e\n\u003cp\u003eHGT_Occur: At least one between-species genome pair shares at least one HGT.\u003c/p\u003e\n\u003cp\u003eModel1: partial.mantel.test (Co-abundant correlation matrix, HGT_Rates matrix, Phylogenetics Distance matrix, resamp = 9999)\u003c/p\u003e\n\u003cp\u003eModel2: partial.mantel.test (Co-abundant correlation matrix, HGT_Occur matrix, Phylogenetics Distance matrix, resamp = 9999)\u003c/p\u003e\n\u003cp\u003eThe LMM with phylogenetic distance matrices was fitted to examine the association of the number of function genes in the species pangenome and their communities.\u003c/p\u003e\n\u003cp\u003eModel3: relmatLmer (Genes numbers ~ Abundance + (1 | Phylogenetics Distance), REML = FALSE)\u003c/p\u003e\n\u003cp\u003eModel4: relmatLmer (Genes numbers ~ Community + Abundance + (1 | Phylogenetics Distance), REML = FALSE)\u003c/p\u003e\n\u003cp\u003eModel5: relmatLmer (Bray-Curtis distance ~ Community + (1 | Phylogenetics Distance), REML = FALSE)\u003c/p\u003e\n\u003cp\u003eModel6: relmatLmer (Bray-Curtis distance ~ (1 | Phylogenetics Distance), REML = FALSE)\u003c/p\u003e\n\u003cp\u003eTo assess whether Community is significantly contributing to genes enrichment, we performed the following LRT: Trait = lrtest(model3, 
model4). To assess whether Community is significantly contributing to temporal abundance changes, we performed the following LRT: Trait = lrtest(model5, model6).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMicrobial co-abundance and HGT network\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe microbial co-abundance and HGT network\u0026nbsp;was constructed by\u0026nbsp;correlation, using cluster_leiden function with objective_function Constant Potts Model in igraph\u003csup\u003e59\u003c/sup\u003e to define communities. The layout of the network was visualized by ggraph\u003csup\u003e60\u003c/sup\u003e, and the length of the edges in the network was represented by the transformed correlation: 1 \u0026minus; ((correlation + 1) / 2). Within Community C (\u003cstrong\u003eFig. S6\u003c/strong\u003e), three \u003cem\u003eEnterobacteriaceae\u003c/em\u003e bacteria were seen to contain a greater than 10-fold higher abundance of ARGs, AMPRs, and VF genes compared to other species, which might explain the enrichment in Community C. To avoid introducing bias to the following analysis, these three \u003cem\u003eEnterobacteriaceae\u003c/em\u003e species were excluded from our analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTransmission profiling\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEach HGT event (n = 5,644) refers to two HGTs (n = 11,288) with identical sequences shared between two genomes, resulting in 7,581 non-redundant HGTs distinguished by genomes and insertion locations. First, we extracted the genomic sequences containing the detected HGTs from the MAGs. We then removed the HGT region from the genome, concatenating the flanking regions as if the HGT was not present. Finally, we searched for reads that span this region and align with the flanking regions with no read split, further confirming the absence of HGTs within the samples, as described in the 3\u003csup\u003erd\u003c/sup\u003e step of the workflow. 
We then calculated the prevalence of each HGT across all individuals at baseline and follow-up, respectively, and determined the delta prevalence of each HGT. Based on the matrix of HGT presence and absence, we applied the vegdist() function from R package vegan (version 2.5.5)\u003csup\u003e55\u003c/sup\u003e to calculate the Jaccard distance dissimilarity matrix. To compare the Jaccard distance dissimilarity between and within individuals, we used the R package coin\u003csup\u003e61\u003c/sup\u003e to calculate the empiric \u003cem\u003ep\u003c/em\u003e-value by permuting samples of the HGT\u0026rsquo;s matrix 9999 times.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStrain-level profiling of samples\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHere we first applied Phylophlan3 to assign a MetaPhlan 4 v Jan21 species genome bin label to MAGs. We only looked at the species \u003cem\u003eAgathobacter rectalis\u003c/em\u003e,\u003cem\u003e\u0026nbsp;\u003c/em\u003ewhich was assigned to SGB4933. We then\u0026nbsp;ran StrainPhlan4 in t__SGB4933_group with option --marker_in_n_samples 50 --sample_with_n_markers 50 --secondary_sample_with_n_markers 50 --sample_with_n_markers_after_filt 33. The multiple sequence alignment was built on 187 available markers, and 551/676 samples were used to build the phylogenetic tree. Only those individuals with two timepoints in the resulting tree were included. Samples in which strains\u0026rsquo; phylogenetic distance was \u0026lt; 0.1 were considered to have the same strain. The tree was visualized with R package ggtree\u003csup\u003e83\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunction enrichment\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo avoid overestimating the frequency of HGTs, we established a non-redundant HGT gene database for each species. First, we tagged the HGTs with information about their host species. 
We then separately predicted the genes of the HGTs present in each species and used cd-hit to de-replicate these genes, using parameters identical to those used in constructing the species pangenome. We assigned the highest scoring Pfam functional domain annotations to each species\u0026apos; non-redundant genes and each species\u0026apos; non-redundant mobile genes using eggNOG-mapper\u003csup\u003e50\u003c/sup\u003e (--pfam_realign realign --evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20). Subsequently, we merged the non-redundant genes and non-redundant mobile genes of each species and calculated the number and frequency of each Pfam functional domain across all genes and all mobile genes. Foldchange was calculated as: the frequency of Pfam function domain in HGTs / the frequency of Pfam function domain in all genes in each species. Fisher\u0026rsquo;s exact test was used to determine the significance of the annotation enrichment. Two-tailed p-values were corrected using the Benjamini-Hochberg FDR method.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMobile gene profile as a fingerprint\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo test how well the mobile gene profiles distinguish samples from the same individual, we used the mobile gene abundance matrix described previously and generated the Bray-Curtis distances between all samples at two timepoints. If two samples (and only these two samples) from the same individual had the closest distance, we considered them correctly linked.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePhenotype association\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe mobile genes were predicted from HGTs using Prodigal, using cd-hit to dereplicate with the same parameters, as described above. CoverM\u003csup\u003e84\u003c/sup\u003e was used to calculate the abundance of those mobile genes with the parameters --min-read-aligned-percent 50 --min-read-percent-identity 99 --min-covered-fraction 4. 
To measure associations between mobile genes and drug usage, smoking, and alcohol intake frequency, we used the glmer function in the lme4 R package\u003csup\u003e56\u003c/sup\u003e to fit the generalized linear mixed effects models. The lmerTest R package\u003csup\u003e57\u003c/sup\u003e was used to estimate the \u003cem\u003ep-\u003c/em\u003evalue.\u003c/p\u003e\n\u003cp\u003eModel: Gene abundance Association (joint Association)\u003c/p\u003e\n\u003cp\u003eGenes abundance ~ Phenotype + clean_reads + sex + age + (1 | IndividualID) + (1 | TimePoint)\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eOchman, H., Lawrence, J. G. \u0026amp; Groisman, E. A. Lateral gene transfer and the nature of bacterial innovation. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e405\u003c/strong\u003e, 299-304 (2000). https://doi.org/10.1038/35012500\u003c/li\u003e\n \u003cli\u003eArnold, B. J., Huang, I. T. \u0026amp; Hanage, W. P. Horizontal gene transfer and adaptive evolution in bacteria. \u003cem\u003eNature Reviews Microbiology\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 206-218 (2022). https://doi.org/10.1038/s41579-021-00650-4\u003c/li\u003e\n \u003cli\u003ePalmer, J. D. \u0026amp; Foster, K. R. Bacterial species rarely work together. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e376\u003c/strong\u003e, 581-582 (2022). https://doi.org/doi:10.1126/science.abn5093\u003c/li\u003e\n \u003cli\u003eDmitrijeva, M.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer. \u003cem\u003eNature Ecology \u0026amp; Evolution\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 986-998 (2024). https://doi.org/10.1038/s41559-024-02357-0\u003c/li\u003e\n \u003cli\u003eGroussin, M.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Elevated rates of horizontal gene transfer in the industrialized human microbiome. 
\u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 2053-2067.e2018 (2021). https://doi.org/10.1016/j.cell.2021.02.052\u003c/li\u003e\n \u003cli\u003eLee, I. P. A., Eldakar, O. T., Gogarten, J. P. \u0026amp; Andam, C. P. Bacterial cooperation through horizontal gene transfer. \u003cem\u003eTrends in Ecology \u0026amp; Evolution\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 223-232 (2022).\u003c/li\u003e\n \u003cli\u003eFan, Y., Xiao, Y., Momeni, B. \u0026amp; Liu, Y.-Y. Horizontal gene transfer can help maintain the equilibrium of microbial communities. \u003cem\u003eJournal of Theoretical Biology\u003c/em\u003e \u003cstrong\u003e454\u003c/strong\u003e, 53-59 (2018). https://doi.org/10.1016/j.jtbi.2018.05.036\u003c/li\u003e\n \u003cli\u003eWang, T.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Horizontal gene transfer enables programmable gene stability in synthetic microbiota. \u003cem\u003eNat Chem Biol\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1245-1252 (2022). https://doi.org/10.1038/s41589-022-01114-3\u003c/li\u003e\n \u003cli\u003eSmillie, C. S.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Ecology drives a global network of gene exchange connecting the human microbiome. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e480\u003c/strong\u003e, 241-244 (2011). https://doi.org/10.1038/nature10571\u003c/li\u003e\n \u003cli\u003eHehemann, J. H.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e464\u003c/strong\u003e, 908-912 (2010). https://doi.org/10.1038/nature08937\u003c/li\u003e\n \u003cli\u003eLester, C. H., Frimodt-M\u0026oslash;ller, N., S\u0026oslash;rensen, T. L., Monnet, D. L. \u0026amp; Hammerum, A. M. In vivo transfer of the vanA resistance gene from an Enterococcus faecium isolate of animal origin to an E. 
faecium isolate of human origin in the intestines of human volunteers. \u003cem\u003eAntimicrob Agents Chemother\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, 596-599 (2006). https://doi.org/10.1128/aac.50.2.596-599.2006\u003c/li\u003e\n \u003cli\u003eBrito, I. L.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Mobile genes in the human microbiome are structured from global to individual scales. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e535\u003c/strong\u003e, 435-439 (2016). https://doi.org/10.1038/nature18927\u003c/li\u003e\n \u003cli\u003eHsu, T. Y.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Profiling lateral gene transfer events in the human microbiome using WAAFLE. \u003cem\u003eNature Microbiology\u003c/em\u003e (2025). https://doi.org/10.1038/s41564-024-01881-w\u003c/li\u003e\n \u003cli\u003eVatanen, T.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Mobile genetic elements from the maternal microbiome shape infant gut microbial assembly and metabolism. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e185\u003c/strong\u003e, 4921-4936.e4915 (2022). https://doi.org/10.1016/j.cell.2022.11.023\u003c/li\u003e\n \u003cli\u003ePeng, H. \u0026amp; Fu, J. Unveiling horizontal gene transfer in the gut microbiome: bioinformatic strategies and challenges in metagenomics analysis. \u003cem\u003eNational Science Review\u003c/em\u003e, nwaf128 (2025). https://doi.org/10.1093/nsr/nwaf128\u003c/li\u003e\n \u003cli\u003eChen, L.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e The long-term genetic stability and individual specificity of the human gut microbiome. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 2302-2315.e2312 (2021). https://doi.org/10.1016/j.cell.2021.03.024\u003c/li\u003e\n \u003cli\u003eHsu, T. Y.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Profiling novel lateral gene transfer events in the human microbiome. \u003cem\u003ebioRxiv\u003c/em\u003e (2023). 
https://doi.org/10.1101/2023.08.08.552500\u003c/li\u003e\n \u003cli\u003eCamargo, A. P.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Identification of mobile genetic elements with geNomad. \u003cem\u003eNature Biotechnology\u003c/em\u003e (2023). https://doi.org/10.1038/s41587-023-01953-y\u003c/li\u003e\n \u003cli\u003eHutchings, M. I., Truman, A. W. \u0026amp; Wilkinson, B. Antibiotics: past, present and future. \u003cem\u003eCurrent Opinion in Microbiology\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, 72-80 (2019). https://doi.org/https://doi.org/10.1016/j.mib.2019.10.008\u003c/li\u003e\n \u003cli\u003eLevillain, F.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Horizontal acquisition of a hypoxia-responsive molybdenum cofactor biosynthesis pathway contributed to Mycobacterium tuberculosis pathoadaptation. \u003cem\u003ePLOS Pathogens\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, e1006752 (2017). https://doi.org/10.1371/journal.ppat.1006752\u003c/li\u003e\n \u003cli\u003eHughes, E. R.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Microbial respiration and formate oxidation as metabolic signatures of inflammation-associated dysbiosis. \u003cem\u003eCell host \u0026amp; microbe\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 208-219 (2017).\u003c/li\u003e\n \u003cli\u003eZhu, W.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Precision editing of the gut microbiota ameliorates colitis. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e553\u003c/strong\u003e, 208-211 (2018). https://doi.org/10.1038/nature25172\u003c/li\u003e\n \u003cli\u003eTraag, V. A., Waltman, L. \u0026amp; van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. \u003cem\u003eScientific Reports\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z\u003c/li\u003e\n \u003cli\u003eBrito, I. L. Examining horizontal gene transfer in microbial communities. 
\u003cem\u003eNature Reviews Microbiology\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 442-453 (2021). https://doi.org/10.1038/s41579-021-00534-7\u003c/li\u003e\n \u003cli\u003eAkhtar, A. A. \u0026amp; Turner, D. P. J. The role of bacterial ATP-binding cassette (ABC) transporters in pathogenesis and virulence: Therapeutic and vaccine potential. \u003cem\u003eMicrobial Pathogenesis\u003c/em\u003e \u003cstrong\u003e171\u003c/strong\u003e, 105734 (2022). https://doi.org/https://doi.org/10.1016/j.micpath.2022.105734\u003c/li\u003e\n \u003cli\u003eTrappe, K., Marschall, T. \u0026amp; Renard, B. Y. Detecting horizontal gene transfer by mapping sequencing reads across species boundaries. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, i595-i604 (2016). https://doi.org/10.1093/bioinformatics/btw423\u003c/li\u003e\n \u003cli\u003eLi, C., Jiang, Y. \u0026amp; Li, S. LEMON: a method to construct the local strains at horizontal gene transfer sites in gut metagenomics. \u003cem\u003eBMC bioinformatics\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 702 (2019).\u003c/li\u003e\n \u003cli\u003eSong, W., Wemheuer, B., Zhang, S., Steensen, K. \u0026amp; Thomas, T. MetaCHIP: community-level horizontal gene transfer identification through the combination of best-match and phylogenetic approaches. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e (2019). https://doi.org/10.1186/s40168-019-0649-y\u003c/li\u003e\n \u003cli\u003eDmitrijeva, M.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer. \u003cem\u003eNature Ecology \u0026amp; Evolution\u003c/em\u003e (2024). 
https://doi.org/10.1038/s41559-024-02357-0\u003c/li\u003e\n \u003cli\u003eStecher, B.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Gut inflammation can boost horizontal gene transfer between pathogenic and commensal \u0026lt;i\u0026gt;Enterobacteriaceae\u0026lt;/i\u0026gt;. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e \u003cstrong\u003e109\u003c/strong\u003e, 1269-1274 (2012). https://doi.org/doi:10.1073/pnas.1113246109\u003c/li\u003e\n \u003cli\u003eCoyte, K. Z.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Horizontal gene transfer and ecological interactions jointly control microbiome stability. \u003cem\u003ePLoS Biol\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, e3001847 (2022). https://doi.org/10.1371/journal.pbio.3001847\u003c/li\u003e\n \u003cli\u003eGranato, E. T.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Horizontal gene transfer can reshape bacterial warfare. \u003cem\u003ebioRxiv\u003c/em\u003e, 2024.2008.2028.610076 (2024). https://doi.org/10.1101/2024.08.28.610076\u003c/li\u003e\n \u003cli\u003eBehling, A. H.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Horizontal gene transfer after faecal microbiota transplantation in adolescents with obesity. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 26 (2024). https://doi.org/10.1186/s40168-024-01748-6\u003c/li\u003e\n \u003cli\u003eOrakov, A.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e GUNC: detection of chimerism and contamination in prokaryotic genomes. \u003cem\u003eGenome Biology\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 178 (2021). https://doi.org/10.1186/s13059-021-02393-0\u003c/li\u003e\n \u003cli\u003eShaw, J. \u0026amp; Yu, Yun W. Fairy: fast approximate coverage for multi-sample metagenomic binning. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 151 (2024). https://doi.org/10.1186/s40168-024-01861-6\u003c/li\u003e\n \u003cli\u003eLangmead, B., Wilks, C., Antonescu, V. 
\u0026amp; Charles, R. Scaling read aligners to hundreds of threads on general-purpose processors. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 421-432 (2019). https://doi.org/10.1093/bioinformatics/bty648\u003c/li\u003e\n \u003cli\u003eLi, D., Liu, C. M., Luo, R., Sadakane, K. \u0026amp; Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 1674-1676 (2015). https://doi.org/10.1093/bioinformatics/btv033\u003c/li\u003e\n \u003cli\u003eOlm, M. R., Brown, C. T., Brooks, B. \u0026amp; Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. \u003cem\u003eThe ISME Journal\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 2864-2868 (2017). https://doi.org/10.1038/ismej.2017.126\u003c/li\u003e\n \u003cli\u003eUritskiy, G. V., DiRuggiero, J. \u0026amp; Taylor, J. MetaWRAP\u0026mdash;a flexible pipeline for genome-resolved metagenomic data analysis. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 158 (2018). https://doi.org/10.1186/s40168-018-0541-1\u003c/li\u003e\n \u003cli\u003eParks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. \u0026amp; Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. \u003cem\u003eGenome Res\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 1043-1055 (2015). https://doi.org/10.1101/gr.186072.114\u003c/li\u003e\n \u003cli\u003eChaumeil, P.-A., Mussig, A. J., Hugenholtz, P. \u0026amp; Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 5315-5316 (2022). 
https://doi.org/10.1093/bioinformatics/btac672\u003c/li\u003e\n \u003cli\u003eOndov, B. D.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Mash: fast genome and metagenome distance estimation using MinHash. \u003cem\u003eGenome Biology\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 132 (2016). https://doi.org/10.1186/s13059-016-0997-x\u003c/li\u003e\n \u003cli\u003eJain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. \u0026amp; Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. \u003cem\u003eNature Communications\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 5114 (2018). https://doi.org/10.1038/s41467-018-07641-9\u003c/li\u003e\n \u003cli\u003eLangmead, B. \u0026amp; Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. \u003cem\u003eNat Methods\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 357-359 (2012). https://doi.org/10.1038/nmeth.1923\u003c/li\u003e\n \u003cli\u003eFu, L., Niu, B., Zhu, Z., Wu, S. \u0026amp; Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 3150-3152 (2012). https://doi.org/10.1093/bioinformatics/bts565\u003c/li\u003e\n \u003cli\u003eHyatt, D.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Prodigal: prokaryotic gene recognition and translation initiation site identification. \u003cem\u003eBMC Bioinformatics\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 119 (2010). https://doi.org/10.1186/1471-2105-11-119\u003c/li\u003e\n \u003cli\u003eSeemann, T. barrnap 0.9: rapid ribosomal RNA prediction. \u003cem\u003eGoogle Scholar\u003c/em\u003e (2013).\u003c/li\u003e\n \u003cli\u003eSeemann, T. Prokka: rapid prokaryotic genome annotation. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 2068-2069 (2014). 
https://doi.org/10.1093/bioinformatics/btu153\u003c/li\u003e\n \u003cli\u003eCamacho, C.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e BLAST+: architecture and applications. \u003cem\u003eBMC Bioinformatics\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 421 (2009). https://doi.org/10.1186/1471-2105-10-421\u003c/li\u003e\n \u003cli\u003eHuerta-Cepas, J.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. \u003cem\u003eMolecular Biology and Evolution\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 2115-2122 (2017). https://doi.org/10.1093/molbev/msx148\u003c/li\u003e\n \u003cli\u003eAlcock, B. P.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, D690-d699 (2023). https://doi.org/10.1093/nar/gkac920\u003c/li\u003e\n \u003cli\u003eWeimann, A.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e From genomes to phenotypes: Traitar, the microbial trait analyzer. \u003cem\u003eMSystems\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, e00101-00116 (2016).\u003c/li\u003e\n \u003cli\u003eWatts, S. C., Ritchie, S. C., Inouye, M. \u0026amp; Holt, K. E. FastSpar: rapid and scalable correlation estimation for compositional data. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 1064-1066 (2019). https://doi.org/10.1093/bioinformatics/bty734\u003c/li\u003e\n \u003cli\u003eNa, S. I.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e UBCG: Up-to-date bacterial core gene set and pipeline for phylogenomic tree reconstruction. \u003cem\u003eJ Microbiol\u003c/em\u003e \u003cstrong\u003e56\u003c/strong\u003e, 280-285 (2018). https://doi.org/10.1007/s12275-018-8014-6\u003c/li\u003e\n \u003cli\u003eDixon, P. 
VEGAN, a package of R functions for community ecology. \u003cem\u003eJournal of vegetation science\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 927-930 (2003).\u003c/li\u003e\n \u003cli\u003eBates, D. M. (Springer New York, 2010).\u003c/li\u003e\n \u003cli\u003eHothorn, T.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Package \u0026lsquo;lmtest\u0026rsquo;. \u003cem\u003eTesting linear regression models.\u0026nbsp;\u003c/em\u003e\u003cem\u003ehttps://cran\u003c/em\u003e\u003cem\u003e. r-project. org/web/packages/lmtest/lmtest. pdf. Accessed\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e (2015).\u003c/li\u003e\n \u003cli\u003eWickham, H., Chang, W. \u0026amp; Wickham, M. H. Package \u0026lsquo;ggplot2\u0026rsquo;. \u003cem\u003eCreate elegant data visualisations using the grammar of graphics. Version\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 1-189 (2016).\u003c/li\u003e\n \u003cli\u003eCsardi, M. G. Package \u0026lsquo;igraph\u0026rsquo;. \u003cem\u003eLast accessed\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, 2013 (2013).\u003c/li\u003e\n \u003cli\u003ePedersen, T. L., Pedersen, M., LazyData, T., Rcpp, I. \u0026amp; Rcpp, L. Package \u0026lsquo;ggraph\u0026rsquo;. \u003cem\u003eRetrieved January\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, 2018 (2017).\u003c/li\u003e\n \u003cli\u003eZiyatdinov, A.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals. \u003cem\u003eBMC Bioinformatics\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 68 (2018). https://doi.org/10.1186/s12859-018-2057-x\u003c/li\u003e\n \u003cli\u003eGu, Z., Eils, R. \u0026amp; Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 2847-2849 (2016). 
https://doi.org/10.1093/bioinformatics/btw313\u003c/li\u003e\n \u003cli\u003eFinn, R. D.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Pfam: the protein families database. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e42\u003c/strong\u003e, D222-230 (2014). https://doi.org/10.1093/nar/gkt1223\u003c/li\u003e\n \u003cli\u003eYin, Y.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e dbCAN: a web resource for automated carbohydrate-active enzyme annotation. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e40\u003c/strong\u003e, W445-451 (2012). https://doi.org/10.1093/nar/gks479\u003c/li\u003e\n \u003cli\u003eZhernakova, A.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e352\u003c/strong\u003e, 565-569 (2016). https://doi.org/10.1126/science.aad3369\u003c/li\u003e\n \u003cli\u003eNurk, S., Meleshko, D., Korobeynikov, A. \u0026amp; Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. \u003cem\u003eGenome Res\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 824-834 (2017). https://doi.org/10.1101/gr.213959.116\u003c/li\u003e\n \u003cli\u003eDidelot, X., Walker, A. S., Peto, T. E., Crook, D. W. \u0026amp; Wilson, D. J. Within-host evolution of bacterial pathogens. \u003cem\u003eNature Reviews Microbiology\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 150-162 (2016). https://doi.org/10.1038/nrmicro.2015.13\u003c/li\u003e\n \u003cli\u003eDuch\u0026ecirc;ne, S.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Genome-scale rates of evolutionary change in bacteria. \u003cem\u003eMicrob Genom\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, e000094 (2016). https://doi.org/10.1099/mgen.0.000094\u003c/li\u003e\n \u003cli\u003eZhao, S.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Adaptive Evolution within Gut Microbiomes of Healthy People. 
\u003cem\u003eCell Host Microbe\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 656-667.e658 (2019). https://doi.org/10.1016/j.chom.2019.03.007\u003c/li\u003e\n \u003cli\u003eKim, J., Na, S. I., Kim, D. \u0026amp; Chun, J. UBCG2: Up-to-date bacterial core genes and pipeline for phylogenomic analysis. \u003cem\u003eJ Microbiol\u003c/em\u003e \u003cstrong\u003e59\u003c/strong\u003e, 609-615 (2021). https://doi.org/10.1007/s12275-021-1231-4\u003c/li\u003e\n \u003cli\u003eIqbal, Z., Caccamo, M., Turner, I., Flicek, P. \u0026amp; McVean, G. De novo assembly and genotyping of variants using colored de Bruijn graphs. \u003cem\u003eNat Genet\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, 226-232 (2012). https://doi.org/10.1038/ng.1028\u003c/li\u003e\n \u003cli\u003eShen, W., Le, S., Li, Y. \u0026amp; Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. \u003cem\u003ePloS one\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, e0163962 (2016).\u003c/li\u003e\n \u003cli\u003eSong, W., Steensen, K. \u0026amp; Thomas, T. HgtSIM: a simulator for horizontal gene transfer (HGT) in microbial communities. \u003cem\u003ePeerJ\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, e4015 (2017). https://doi.org/10.7717/peerj.4015\u003c/li\u003e\n \u003cli\u003eHuang, W., Li, L., Myers, J. R. \u0026amp; Marth, G. T. ART: a next-generation sequencing read simulator. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 593-594 (2012). https://doi.org/10.1093/bioinformatics/btr708\u003c/li\u003e\n \u003cli\u003eKatoh, K., Misawa, K., Kuma, K. \u0026amp; Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. \u003cem\u003eNucleic Acids Res\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 3059-3066 (2002). https://doi.org/10.1093/nar/gkf436\u003c/li\u003e\n \u003cli\u003eNguyen, L. T., Schmidt, H. A., von Haeseler, A. \u0026amp; Minh, B. Q. 
IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. \u003cem\u003eMol Biol Evol\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 268-274 (2015). https://doi.org/10.1093/molbev/msu300\u003c/li\u003e\n \u003cli\u003eMa, B., France, M. \u0026amp; Ravel, J. Meta-pangenome: at the crossroad of pangenomics and metagenomics. \u003cem\u003eThe Pangenome\u003c/em\u003e, 205 (2020).\u003c/li\u003e\n \u003cli\u003eSeemann, T. Prokka: rapid prokaryotic genome annotation. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 2068-2069 (2014). https://doi.org/10.1093/bioinformatics/btu153\u003c/li\u003e\n \u003cli\u003eLiu, B., Zheng, D., Zhou, S., Chen, L. \u0026amp; Yang, J. VFDB 2022: a general classification scheme for bacterial virulence factors. \u003cem\u003eNucleic Acids Research\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, D912-D917 (2022). https://doi.org/10.1093/nar/gkab1107\u003c/li\u003e\n \u003cli\u003eKintses, B.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Phylogenetic barriers to horizontal transfer of antimicrobial peptide resistance genes in the human gut microbiota. \u003cem\u003eNature Microbiology\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 447-458 (2019). https://doi.org/10.1038/s41564-018-0313-5\u003c/li\u003e\n \u003cli\u003eWu, G.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Two Competing Guilds as a Core Microbiome Signature for Health Recovery. \u003cem\u003ebioRxiv\u003c/em\u003e, 2022.2005. 2002.490290 (2022).\u003c/li\u003e\n \u003cli\u003eFriedman, J. \u0026amp; Alm, E. J. Inferring correlation networks from genomic survey data. \u003cem\u003ePLoS computational biology\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, e1002687 (2012).\u003c/li\u003e\n \u003cli\u003eXu, S.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. 
\u003cem\u003eiMeta\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, e56 (2022). https://doi.org/https://doi.org/10.1002/imt2.56\u003c/li\u003e\n \u003cli\u003eAroney, S. T. N.\u003cem\u003e\u0026nbsp;et al.\u003c/em\u003e CoverM: Read coverage calculator for metagenomics , version = 0.7.0. (2024). https://doi.org/10.5281/zenodo.10531253\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"human gut microbiome, horizontal gene transfer, microbial community, mobile genes, co-abundance, drugs, strain transmission, gene transmission","lastPublishedDoi":"10.21203/rs.3.rs-6509357/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6509357/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eHorizontal gene transfer (HGT) is a major driver of bacterial evolution, but its role in shaping the human gut microbiome over time remains poorly understood. Here, we present a longitudinal metagenomic analysis of 676 fecal samples from 338 individuals collected ~4 years apart, using a newly developed workflow to detect recent HGT events from metagenome-assembled genomes. 
We identified 5,644 high-confidence HGT events occurring within the past ~10,000 years across 116 gut bacterial species. We find that species pairs with a HGT relationship were significantly more likely to maintain stable ecological relationships over the 4-year period, suggesting that gene exchange contributes to ecological stability. Notably, HGT and strain replacement act together to disseminate mobile genes in the population. Furthermore, our observation that an individual's mobile gene pool remains highly personalized and stable over time indicates that host lifestyles drive specific gene transfer. For example, proton pump inhibitor usage was linked to increased transfer of multidrug transporter genes. Our findings demonstrate, at individual gut microbiome level, that HGT is both an integral and stabilizing force in the human gut ecosystem and an important mechanism for disseminating adaptive functions, underscoring their potential for tracking host lifestyle.\u003c/p\u003e","manuscriptTitle":"Longitudinal Gut Microbiota Tracking Reveals the Persistent Spread of Mobile Genes and HGT-Driven Community Stabilization","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-07 06:31:34","doi":"10.21203/rs.3.rs-6509357/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"e1b17506-21a6-482f-ac56-7f2da55b2328","owner":[],"postedDate":"May 7th, 
2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":48091676,"name":"Biological sciences/Microbiology/Microbial genetics/Bacterial genetics"},{"id":48091677,"name":"Biological sciences/Ecology/Ecological networks"},{"id":48091678,"name":"Health sciences/Medical research/Epidemiology"},{"id":48091679,"name":"Health sciences/Risk factors"},{"id":48091680,"name":"Biological sciences/Microbiology/Bacteria/Metagenomics"}],"tags":[],"updatedAt":"2025-12-30T08:17:18+00:00","versionOfRecord":{"articleIdentity":"rs-6509357","link":"https://doi.org/10.1038/s41467-025-66612-z","journal":{"identity":"nature-communications","isVorOnly":false,"title":"Nature Communications"},"publishedOn":"2025-11-22 05:00:00","publishedOnDateReadable":"November 22nd, 2025"},"versionCreatedAt":"2025-05-07 06:31:34","video":"","vorDoi":"10.1038/s41467-025-66612-z","vorDoiUrl":"https://doi.org/10.1038/s41467-025-66612-z","workflowStages":[]},"version":"v1","identity":"rs-6509357","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6509357","identity":"rs-6509357","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

License: CC-BY-4.0