Microbial Antioxidants and Their Interactions with Gastrointestinal Tract Epithelial Cells in the Cattle | Research Square

Hui-Zeng Sun, Senlin Zhu, Minghui Jia, Hou-Cheng Li, Bo Han, Tao Shi, and 12 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4193125/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1.

Abstract

Oxidative stress is a pivotal trigger of immune responses and cellular dysfunction. The ruminant gastrointestinal tract (GIT) with complex microbial community demonstrated strong metabolic capabilities and close relationships with host oxidative stress. However, microbial antioxidant secondary metabolites in the GIT and their interactions with the host are still under-studied.
Here, based on metagenome assembled genomes (MAGs) resources, deep learning, single-cell RNA-sequencing, and large number of protein-metabolites interactions inferring, we discovered the antioxidants from the microbial secondary metabolites and deciphered their potential interactions with the GIT epithelial cells. Totally 26,503 biosynthetic gene clusters (BGCs, 8,672 novel ones) were identified from 14,093 non-redundant MAGs distributed in 10 segments of cattle GIT. From the 436 BGCs’ products, totally 396 secondary metabolites were predicted into 5 categories of antioxidants using a custom-trained deep learning tool. The GIT epithelial cells showed higher expression of antioxidant genes among 1,006 clusters (belong to 126 cell types) of 51 tissues in cattle, especially the spinous cells and basal cells in the forestomach. Moreover, using metabolite-protein interaction inference, we predicted over 6 million pairs of interactive scores between 396 secondary metabolites and 14,976 marker proteins in the GIT cell types. Significant interactive scores between Cys-Cys-Cys and marker proteins participating in antioxidative metabolism such as CYC1, MGST1, GSTA1 in rumen and omasum spinous cells were highlighted. Our study presented a comprehensive computational framework for exploring natural antioxidants from MAGs, revealed the potential antioxidants from cattle GITs microbiota, and inferred their potential interactions with host GIT cell types, which will provide novel insights into the under-investigated antioxidant potential of cattle GIT microbiota and reshaping our comprehension of the symbiotic interplay between the gut microbiota and host antioxidant defense mechanisms. 
Biological sciences/Chemical biology/Biosynthesis/Oxidoreductases
Biological sciences/Microbiology/Microbial communities/Metagenomics
Biological sciences/Microbiology/Microbial communities/Microbiome
Biosynthetic gene clusters database; Microbial secondary metabolites; Antioxidants prediction; Cattle single-cell atlas; Host-microbiota interaction

Main

Reactive oxygen species (ROS) are highly reactive molecules generated in living organisms as byproducts of cellular metabolism, such as mitochondrial respiration 1, immune responses 2, and enzymatic reactions 2. However, different concentrations of ROS can result in completely different cellular responses, referred to as eustress or distress 3. In oxidative eustress, oxidants at modest levels engage with specific molecular targets to facilitate physiological redox signaling. By contrast, during oxidative distress, an escalated concentration of oxidants leads ROS to attack non-specific cellular components 3. Although methods for detecting ROS at the cellular level have emerged 3, the distribution of ROS across cell types is unclear. Owing to its inevitable exposure to exogenous nutrients and microbial pathogens, the gastrointestinal tract (GIT) is recognized as a significant source of ROS and antioxidants 4. Therefore, various nutritional supplements are used as antioxidants for the treatment of oxidative stress in the livestock industry 5. Among these, microbial agents have increasingly been used for antioxidant purposes, producing antioxidants directly or indirectly 6. Microbial secondary metabolites have been reported to have antimicrobial 7, anticancer 8, antioxidant 9, and microbial community-regulating 10,11 functions, representing a significant source of bioactive antioxidants.
Serving as the smallest unit for studying microbial metabolism 12, biosynthetic gene clusters (BGCs) are groups of genes in a microbial genome that work together to produce certain secondary metabolites. Through interpretation of newly identified BGCs, natural bioactive products and drug resources are continuously being uncovered; for example, the biosynthetic mechanism of mangicol-like diterpenoid compounds (ester diterpenoids) with significant anti-inflammatory activity was discovered in Aspergillus oryzae 13. The vast metabolic and functional diversity of environmental microbiota presents a promising opportunity for the exploration of novel enzymes and biochemical compounds from BGCs 14–16, including the potential for discovering new antioxidants. It is well acknowledged that ruminants are the most typical animals exhibiting symbiotic relationships with GIT microorganisms, which together determine host production and health. Approximately 70% of cattle's energy requirements are provided by short-chain fatty acids (SCFAs) from rumen microbial fermentation of human-inedible plant biomass 17. In the bovine GIT, the complex microbiome and diet intensify microbial interactions, which could generate a significant diversity of microbial secondary metabolites 10. However, research on BGCs in ruminants remains scarce 18, which is largely attributed to the unculturable nature of the GIT microbiota 19. It is now possible to reconstruct the genomes of vast numbers of uncultivated microbes 18 and to explore BGCs from abundant metagenomic data using emerging MAG databases and BGC-identification tools. Another major challenge is how to assess the potential properties of new natural products encoded by GIT BGCs.
To overcome the time-consuming and labor-intensive nature of wet-lab methods, deep learning models may facilitate the determination of molecular properties by capturing the most concise and specific molecular structures 20,21, yet models that identify antioxidant features from the structure of small molecules still need to be trained. In this study, to explore the antioxidant biosynthetic potential of the cattle GIT microbiome and its interactions with host cell types, we first established a comprehensive BGC dataset from the largest cattle GIT MAG database to date. By training a structure-based deep learning model to predict antioxidant properties, we were able to identify potential antioxidants among BGC products. Next, to identify which cell types are more readily regulated by antioxidants, we characterized the oxidative stress signaling patterns of each cell type in our cattle single-cell atlas (Han et al., 2024, in preparation; Shi et al., 2024, in preparation). Subsequently, a computational tool was applied to infer the interactions between antioxidants and key cell types. Our work discovered abundant microbial genes, antioxidants, and interacting cell types in the cattle GIT, and provides novel insights into cattle GIT-derived oxidative stress.

Results

The biosynthetic potential of the cattle GIT microbiome based on high-quality MAGs

We expanded our previous database, BGMGM (Bovine Gastro Microbial Genome Map), by integrating an additional 2,114 non-redundant large intestine MAGs from a published study 22, as well as 109 MAGs newly sequenced and assembled from third-generation sequencing of omasum, jejunum, and rectum content samples. After dereplication, the number of high-quality MAGs increased from 13,572 to 14,093 (44.88% of total MAGs, completeness > 90% and contamination < 10%; Fig. 1 A, Table S1). The comprehensive MAG dataset from 10 segments of the GIT provides extensive coverage (Fig. 1 B, Table S1) and allows for a comparison of microbial components across different GIT locations. Approximately a quarter of MAGs (3,759) span different segments of the GIT, with 3.0% and 0.60% of MAGs concurrently found in the stomach and small intestine, and in the small and large intestine regions, respectively (Fig. 1 B, Table S2). To improve the quality of genes in these MAGs, pre-assembled contigs were scaffolded using long reads as a backbone. This yielded a high-quality MAG database with more complete gene sets: the average number of contigs per genome decreased from 154.8 to 86.6 (P < 0.001), the length of the maximum scaffold increased from 144,318.9 to 276,725.2 bp (P < 0.001), and the mean N50 rose from 62,223.5 to 138,958.2 bp (P < 0.001; Fig. 1 C, Table S1). After classification with the Genome Taxonomy Database Toolkit (GTDB-Tk), a total of 11,141 (21.9%) MAGs were annotated (Fig. 1 A, Table S1).

Next, the antiSMASH framework was applied to identify BGCs in the MAGs. From all 14,093 MAGs, a total of 26,503 BGCs (clustered into 328 gene cluster clans (GCCs)) were identified (Fig. 2 A, Table S2), with lengths ranging from 1 to 117 kb (46.5% were 15 to 25 kb) (Fig. 2 B, Table S2). After comparison with BIG-FAM/RefSeq 23, a total of 8,672 (32.7%) BGCs were assessed as novel (distance > 0.6), and 1,031 (3.9%) were completely novel (distance = 1.0), representing BGCs unique to the cattle GIT (Figs. 2 A & C). A large number of BGCs were found to encode natural products classified as NRPS (Non-Ribosomal Peptide Synthetases, 5,096 BGCs) and RiPPs (Ribosomally-synthesized and Post-translationally modified Peptides, 13,689 BGCs) (Figure S1). Moreover, among the 167 types of natural products derived from the 26,503 BGCs, a series of Betalactone (3,027 BGC products) and Arylpolyene (2,637 BGC products) were identified.
Betalactone and Arylpolyene are recognized as substances involved in food fermentation and plant defense mechanisms 24, with potential antioxidant, antibacterial, and anticancer activities. The 1,031 completely novel BGCs encoded 15 distinct product categories, including 20% NRPS, 15.1% RRE-containing, and 10.7% cyclic-lactone-autoinducer (Fig. 2 C, Table S3), indicating that the cattle GIT harbors a deep reservoir of BGCs. Of the 167 types of natural products, 95 originated from a specific microbial phylum (Figure S2, Table S3). Actinobacteria accounted for 5.3% of the 1,031 completely novel BGCs and encompassed 7 distinct types. Notably, Methanobacteriota, one of the most abundant archaeal phyla in the rumen and in BGMGM 25,26, specifically encodes novel subtypes such as NRPS-like.NRPS.thioamitides, NRPS-like.thioamitides, and thioamitides.LAP. CowSGB_9022, an unannotated species (92.54% completeness and 2.92% contamination for its genome) from the Pseudonocardiaceae family and the Saccharopolyspora genus, encodes the highest number of BGCs and produces the greatest variety of products (Fig. 1 D, Table S4). A total of 7 novel BGCs were identified in CowSGB_9022: c00003, c00046, c00102, c00137, c00245, c00350, and c00368. The c00046 BGC encodes an amino acid analogue that contains a substitution of the R group of glycine in an NRPS-like BGC, whereas the c00368 BGC specifically encodes a Lanthipeptide-class-i peptide (Fig. 1 E). Interestingly, the genes of CowSGB_9022 found in the small intestine were enriched in acetate oxidation, ethanol oxidation, hydrogen oxidation, and fermentation (Figure S3). Its diverse BGCs and abundance of oxidation-reduction-related genes suggest a significant role in anaerobic fermentation within the small intestine and strong adaptability within the intestinal microbiome.
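Two of the quantities reported in this subsection are simple to state in code: the N50 used to compare assemblies before and after long-read scaffolding, and the distance thresholds used to call a BGC novel. The following is a minimal sketch, not the authors' pipeline. The function names and the toy contig lengths are hypothetical, and the labels are made mutually exclusive here for illustration, whereas in the text the completely novel BGCs are counted among the novel ones.

```python
def n50(lengths):
    """Return the N50 of contig or scaffold lengths given in bp.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def novelty_class(distance):
    """Label a BGC by its distance to the closest reference family.

    Thresholds follow the text: > 0.6 is novel, 1.0 is completely novel.
    The labels are exclusive in this sketch (an assumption for clarity).
    """
    if distance >= 1.0:
        return "completely novel"
    if distance > 0.6:
        return "novel"
    return "known"


# Hypothetical contig lengths (bp) before and after scaffolding.
before = [100, 80, 60, 40, 20]
after = [180, 60, 40, 20]  # the two longest contigs joined into one scaffold

print(n50(before), n50(after))
print(novelty_class(0.7), "|", novelty_class(1.0))
```

Joining contigs leaves the total length unchanged but raises N50, which is the direction of change reported for the scaffolded MAGs.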
In sum, we characterized the distribution of secondary metabolite classes in the cattle GIT and discovered novel cattle-specific BGCs, their encoded products, and the MAGs they belong to.

The evaluation of antioxidative properties of metabolites encoded by microbial BGCs

To explore the antioxidative properties of the BGC products, we trained an interpretable ensemble model for identifying potential antioxidants using graph neural networks 27,28. A total of 60,621 antioxidant assays with the corresponding Simplified Molecular Input Line Entry System (SMILES) information were collected from the PubChem, ChEMBL, and AODB databases 29. These metabolites were further classified into five binary classification datasets: electron transfer (ET), hydrogen atom transfer (HAT), inhibition of lipid peroxidation-based assay (Lipid), NRF2-antioxidant response element signaling pathway (NRF2), and others (Others), following the "collection of small molecule antioxidant data" method 29 (Fig. 3 A, Table S4). Ensembles of Message Passing Neural Networks (MPNNs) were applied to each of the five datasets to learn features directly from molecular graphs. Hyperparameter optimization selected a depth of 6, a dropout rate of 0.05, a feedforward neural network (FFN) hidden size of 2,000, 2 FFN layers, and an overall model hidden size of 2,000 for the MPNNs of the ET models (the optimal parameters of the remaining four models are shown in Table S4). To improve model performance, molecular features including Morgan fingerprints, Morgan counts, and RDKit 2D normalized features were supplied as additional input features. As a result, five MPNN architectures that predict antioxidants through successive convolution over the encoded atoms and bonds, each with a hidden size larger than 1,000, were established (Fig. 3 B, Table S5).
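To make "successive convolution over atoms and bonds" concrete, here is a deliberately simplified sketch of message passing on a molecular graph. It is not the trained model: there are no learned weights, messages flow between atoms rather than along directed bonds, and the two-element atom features and the three-atom C-C-S fragment are hypothetical. The `depth` argument plays the role of the depth hyperparameter mentioned above.

```python
def message_passing(features, bonds, depth):
    """Run `depth` rounds of unweighted message passing.

    features: dict mapping atom index -> list of floats
    bonds:    list of undirected (atom_a, atom_b) pairs
    Each round replaces an atom's vector with
    ReLU(own vector + sum of neighbor vectors).
    """
    neighbors = {atom: [] for atom in features}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    hidden = {atom: list(vec) for atom, vec in features.items()}
    for _ in range(depth):
        updated = {}
        for atom, vec in hidden.items():
            message = [
                sum(hidden[n][i] for n in neighbors[atom])
                for i in range(len(vec))
            ]
            updated[atom] = [max(0.0, x + m) for x, m in zip(vec, message)]
        hidden = updated
    return hidden


def readout(hidden):
    """Sum-pool atom vectors into a single molecule-level vector."""
    size = len(next(iter(hidden.values())))
    return [sum(vec[i] for vec in hidden.values()) for i in range(size)]


# Hypothetical fragment C-C-S with one-hot features [is_carbon, is_sulfur].
atoms = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 2)]

molecule_vector = readout(message_passing(atoms, bonds, depth=1))
print(molecule_vector)
```

In a trained MPNN the pooled vector would be passed, together with any extra molecular features, to the feedforward layers that output the class probability.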
A ten-fold cross-validation strategy with an 80%-10%-10% split for training, validation, and testing, respectively, was used, combined with 60 ensembled models. To deal with the unbalanced training data, a binary cross-entropy loss function was employed. Precision-Recall Area Under the Curve (PRAUC), Area Under the Receiver Operating Characteristic Curve (AUROC), F1 score, and Matthews Correlation Coefficient (MCC) were used as evaluation metrics across the five datasets. On the training datasets, the PRAUC reached 0.970, 0.990, 1.000, 0.990, and 0.840, and the AUROC reached 0.974, 0.978, 0.998, 0.999, and 0.969 for the ET, HAT, Lipid, NRF2, and Others datasets, respectively (Fig. 3 B, Table S5). In the ten-fold validation, owing to their more balanced datasets, the ET, HAT, and Lipid models achieved high average PRAUC (ET 0.718, HAT 0.907, Lipid 0.844). Despite their imbalanced datasets, the NRF2 and Others models also performed well, achieving PRAUC values of 0.539 ± 0.0512 and 0.413 ± 0.0992, respectively (Fig. 3 B, Table S5), which is higher than the PRAUC (0.364) of the antibiotic MPNNs in a study, recently published in Nature 27, that trained on 512 antibiotic-active and 38,800 non-antibiotic-active compounds.

To predict antioxidant properties, we compiled the structures of secondary metabolites as follows: SMILES for 244 metabolites from our 26,503 BGCs, 134 from a natural products database, and an additional 58 from mapped MIBiG records (detailed in the "exploration of microbial-derived small molecules" methodology). This resulted in a dataset of 436 secondary metabolite SMILES strings that were used for antioxidant prediction (Fig. 3 C, Table S6). The majority of these 436 molecules exhibited low similarity to known antioxidants (average Morgan fingerprint similarity = 0.0553; Fig. 3 C, Table S6).
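The evaluation metrics named above can be written out in a few lines. This is a minimal pure-Python sketch with hypothetical labels and scores. Average precision is used here as a step-wise estimate of PRAUC, which is an assumption, since the text does not say how the area was computed.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    tp, fp, _, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    tp, fp, tn, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical imbalanced toy set: 3 positives among 8 compounds.
labels = [1, 0, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
calls = [1 if s > 0.5 else 0 for s in scores]

print(f1_score(labels, calls), mcc(labels, calls), average_precision(labels, scores))
```

When reading PRAUC values on imbalanced data, it helps to remember that a model ranking compounds at random has an expected average precision close to the fraction of positives, so the same PRAUC means more on a dataset with few positives.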
Subsequently, the five fine-tuned graph neural networks performed convolutional operations over the atoms and bonds of each input molecular structure, thereby assessing the antioxidant properties of the 436 metabolites. In total, 245, 121, 266, 2, and 47 potential antioxidants were predicted with ET, HAT, Lipid, NRF2, and Others antioxidant properties, respectively (Fig. 3 D, Table S7). The substructures underlying high prediction scores were identified by Monte Carlo tree search 27. We found that 229 (52.5%) secondary metabolites were predicted to have at least two types of antioxidant properties. Among them, four metabolites (NPA007583, CMNPD14685, BGC0002074, and BGC0000621) possessed 4 types of potential antioxidant properties (Table S7). Overall, our deep learning model provides a new approach to predicting antioxidants from molecular structure, offering an important avenue for exploring and utilizing secondary metabolites.

The distribution of BGCs and their encoded metabolites across GIT segments

After evaluating the BGCs and secondary metabolites from the MAGs, we focused on the distribution of BGCs and their secondary metabolites in different GIT segments (Fig. 4 A). Most of the BGCs were identified in the rumen (10,836) and rectum (10,049) (Fig. 4 A, Table S8). Interestingly, 896 BGCs were shared between the forestomach and the small intestine, far more than between the small and large intestines (169 shared BGCs) or between the forestomach and large intestine (108 shared BGCs) (Fig. 4 A, Table S8), suggesting a closer relationship in secondary metabolic potential between the forestomach and the small intestine than between the small and large intestines. In other words, an independent microbial metabolic community may reside in the large intestine.
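The comparison of shared BGCs between GIT regions amounts to pairwise set intersections. The sketch below shows that computation under stated assumptions: the BGC identifiers and region assignments are invented, and a BGC shared by all three regions is counted in every pair, which may differ from how the counts above were derived.

```python
from itertools import combinations


def shared_counts(bgcs_by_region):
    """Count BGC identifiers shared by each pair of regions."""
    regions = sorted(bgcs_by_region)
    return {
        (a, b): len(bgcs_by_region[a] & bgcs_by_region[b])
        for a, b in combinations(regions, 2)
    }


# Hypothetical BGC identifiers detected in each region.
toy_regions = {
    "forestomach": {"bgc1", "bgc2", "bgc3", "bgc4"},
    "small_intestine": {"bgc2", "bgc3", "bgc4", "bgc5"},
    "large_intestine": {"bgc4", "bgc6"},
}

for pair, count in shared_counts(toy_regions).items():
    print(pair, count)
```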
When considering the distribution of novel BGCs, BGCs in the small intestine showed significantly higher novelty (mean dist = 0.396) than those in the forestomach (mean dist = 0.358, P.adjust < 0.001) and large intestine (mean dist = 0.297, P.adjust < 0.001; Fig. 4 B, Tables S2 and S8). Notably, the novelty of rumen BGCs remained consistently high and did not differ significantly from that of the small intestine (compared with the duodenum, jejunum, and ileum, P.adjust was 0.168, 0.478, and 1.000, respectively; Fig. 4 B, Table S8). Moreover, BGCs encoding NRPS and RiPPs were the most common types in all segments of the GIT, while the forestomach and small intestine had a higher proportion of NRPS (25.9% and 27.9%, respectively) than the large intestine (9.4%), suggesting a richer diversity of multi-domain protein complexes among forestomach and small intestinal secondary metabolites.

Furthermore, we focused on the distribution of potential antioxidants and their source microbial species in the GIT (Fig. 4 D, Table S9). A total of 95 potential antioxidants with at least two antioxidative properties and predicted scores higher than 0.7 were selected for GIT distribution analysis (Fig. 4 E, Table S9). Among them, 33 antioxidants were found to be specific to the rumen, 4 to the rectum, and 1 to the ileum and cecum. Among the 4 potential antioxidants unique to the rectum, one molecule with conjugated double bonds achieved ET and HAT prediction scores of 0.747 and 0.848, respectively. It was encoded by the FixCowSGB-3326 species of the CAG-100 genus within the Firmicutes. Its conjugated double bond structure together with the carboxyl group was calculated to contribute 79.7% of the antioxidant prediction in the Monte Carlo tree search-based interpretability analysis (Fig. 4 E, Table S9).
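The selection rule for the 95 candidates (at least two antioxidative properties, predicted scores above 0.7) can be sketched as a simple filter. The interpretation that each counted property must itself exceed 0.7 is an assumption. The scores for the rectum molecule, Fuscachelin C, and Cys-Cys-Cys are the ones quoted in this article, while `hypothetical_compound` and the function name are invented for illustration.

```python
def select_candidates(predictions, threshold=0.7, min_properties=2):
    """Keep compounds with enough categories scoring above the threshold.

    predictions: dict mapping compound name -> {category: score}
    Returns a dict mapping each kept compound to its passing categories.
    """
    selected = {}
    for name, scores in predictions.items():
        passing = sorted(cat for cat, s in scores.items() if s > threshold)
        if len(passing) >= min_properties:
            selected[name] = passing
    return selected


predictions = {
    "rectum_conjugated_molecule": {"ET": 0.747, "HAT": 0.848},
    "Fuscachelin C": {"ET": 0.928, "Lipid": 0.941, "Others": 0.710},
    "Cys-Cys-Cys": {"ET": 0.770, "Lipid": 0.822},
    "hypothetical_compound": {"ET": 0.910, "HAT": 0.400},  # invented example
}

for name, categories in select_candidates(predictions).items():
    print(name, categories)
```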
The potential antioxidants specific to the rumen originated from 18 species, including six from Methanobrevibacter_A and Methanobrevibacter_B and nine from seven genera such as Ruminococcus, UBA1213, and Thermobifida. Many of the 95 potential antioxidants had peptide structures. Among them, Fuscachelin C, encoded by orphan NRPS BGCs from Thermobifida fusca, a degrader of plant cell walls 30, was predicted to possess antioxidant properties in the ET (0.928), Lipid (0.941), and Others (0.710) categories, suggesting antioxidant potential in some cellulose-degrading microbiota of the rumen. Interpretability analysis of the ET prediction suggests that 98.2% of the contribution derived from the dipeptide backbone carbon chain structures shown in Fig. 4 E (Table S9). The interpretability analysis of the oligopeptide Cys-Cys-Cys suggests a contribution score of 0.993 from the thiol-containing structure, which is the primary antioxidant group of glutathione (GSH) (Fig. 4 E, Table S9). In addition, we discovered peptide structures containing multiple cysteines with thiol groups among the segment-specific metabolic products from the rumen, omasum, abomasum, and small intestine, derived from 8 species in the Firmicutes phylum (Fig. 4 E). These potentially potent antioxidants suggest a strong capacity for antioxidant compound secretion within the bovine GIT microbiota, which may regulate the host's antioxidant status as previously reported 31,32.

The oxidative stress signaling patterns of the cattle single-cell atlas

The oxygen signal is essential for life activities 33, yet the regulation of oxygen-signal gene expression at the single-cell level is currently unknown.
Using our cattle single-cell atlas, we compared the mRNA expression levels of antioxidant, ROS generation, and ROS signal response in 1,803,004 cells belonged to 126 cell types and 1,006 clusters across 51 tissues (Han et al., 2024, in preparation; Shi et al., 2024, in preparation). After scoring the 16 ROS generation, the 19 antioxidant, and the 22 ROS response pathways from Gene Ontology (GO) database, we have identified the top 50 cell clusters in three aspects of oxygen signal, among which 19 clusters were from the digestive system, 13 from the immune system, and 7 from the reproductive system (Fig. 5 A, Table S10). Neutrophils were found to be the major cell types in the top 50, which align with the understanding that neutrophils produce ROS to eliminate pathogens 34 . As for ROS responding pathways, the average scores of gene sets were highly expressed in the digestive system, reproductive system, and immune system (Fig. 5 A, Table S10), with enterocytes of the ileum and duodenum sorted into the top 10 clusters (Fig. 5 A, Table S10). In terms of the antioxidant pathways, a total of 31 out of the top 50 clusters belonged to the digestive system, most of which were identified as epithelial cells in the esophagus and forestomach. Spinous cells of the reticulum, esophagus, rumen, and omasum were ordered at the top 4, 7, 8, and 9 among 1,006 clusters, highlighting the oxidative stress environment of GIT epithelium cells (Figure S4, Fig. 5 A, Table S10). We next focused on the function of GIT cells with the highest scores in the antioxidant, ROS generation, and ROS signal response pathways. Interestingly, spinous cells in the forestomach and enterocytes in the small intestine especially in the ileum showed the highest score in the GIT cell types. 
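Ranking clusters by pathway score, as done above to obtain the top 50 clusters, can be illustrated with a very simple scoring rule: the mean expression of a pathway's genes in each cluster. This is only a sketch under that assumption, since the scoring method is not specified in this section. The cluster names and expression values are invented, and the three genes are antioxidant-related genes named elsewhere in the article.

```python
def pathway_score(cluster_expression, gene_set):
    """Mean expression of the pathway genes measured in one cluster."""
    values = [cluster_expression[g] for g in gene_set if g in cluster_expression]
    return sum(values) / len(values) if values else 0.0


def top_clusters(expression_by_cluster, gene_set, n):
    """Return the n cluster names with the highest pathway score."""
    scores = {
        cluster: pathway_score(expr, gene_set)
        for cluster, expr in expression_by_cluster.items()
    }
    return sorted(scores, key=lambda c: scores[c], reverse=True)[:n]


# Hypothetical average expression per cluster.
antioxidant_genes = ["GSTA1", "MGST1", "GLRX"]
toy_atlas = {
    "rumen_spinous": {"GSTA1": 3.0, "MGST1": 2.0, "GLRX": 1.0},
    "ileum_enterocyte": {"GSTA1": 2.0, "MGST1": 1.0, "GLRX": 0.0},
    "muscle_myocyte": {"GSTA1": 0.5, "MGST1": 0.5, "GLRX": 0.5},
}

print(top_clusters(toy_atlas, antioxidant_genes, n=2))
```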
In particular, spinous cells in the reticulum and enterocytes in the ileum showed the most significant enrichment and the largest number of genes in pathways related to mitochondria, electron transport, respiration, antioxidant activity, and ROS generation (Figs. 5 B & C). Additionally, genes highly expressed in neutrophils located in the rectum were enriched in pathways related to response to other organisms. Similar results were found in the hindgut neutrophils. These results indicate that GIT epithelial cells, which act as a selective barrier between tissue and the gut environment, are challenged by the lumen microbiota. To elucidate the regulatory mechanisms behind gene expression in these cell types, we determined the regulons specific to each cell type across all GIT segments using SCENIC 35 (Han et al., 2024, in preparation). In the spinous cells of the forestomach, ESRRA consistently ranked among the top 5 regulons by regulon specificity score (RSS) in the rumen, reticulum, and omasum. Among its top 20 target genes, many are related to energy metabolism and antioxidation, such as HSD17B13, SLC25A34, ATP5ME, TST, MGST3, and GLRX. For the first time, we analyzed oxygen signal gene expression in the largest cattle single-cell atlas to date and discovered that cell types in the digestive system, especially the epithelial cells in the forestomach, exhibited a higher response to ROS signals than those in other tissues.

Cell metabolism of high antioxidant cell types in the forestomach

Enhanced cellular antioxidant activity often signifies that the cells have been subjected to oxidative stress 36. To further explore the basis of the strong antioxidant state in the GIT, we selected spinous and basal mitotic cells using 3 major antioxidant pathways and average antioxidant scores as primary reference metrics (Fig. 6 A).
Apart from spinous and basal mitotic cells in the forestomach, smooth muscle cells in the rectum and principal cells in the epididymis were in the top 50 for the three major antioxidant pathway scores and the average antioxidant scores (Fig. 6 A), indicating strong oxidative stress in these cells. Afterward, using high-dimensional weighted gene co-expression network analysis (hdWGCNA), we found that spinous and basal cells simultaneously exhibited higher correlations with modules 1, 2, 4, and 5 (Fig. 6 B; module 1: R = 0.164, P < 0.001 and R = 0.152, P < 0.001; module 2: R = 0.113, P < 0.001 and R = 0.156, P < 0.001; module 4: R = 0.204, P < 0.001 and R = 0.170, P < 0.001; module 5: R = 0.117, P < 0.001 and R = 0.170, P < 0.001) than with other modules, and specific expression of gene modules 1, 2, and 4 in the rumen (Fig. 6 B, Figure S5, Table S11), which were enriched in the functions of aerobic electron transport chain (P < 0.001, enrichment score = 111942.7) and fatty acid beta-oxidation (P < 0.001, enrichment score = 623.48). Moreover, these modules are also strongly correlated with important antioxidant pathways such as antioxidant activity (R = 0.535, P < 0.001), peroxidase activity (R = 0.493, P < 0.001), and peroxiredoxin activity (R = 0.576, P < 0.001). Similarly, we identified high enrichment of modules related to fatty acid metabolism and energy metabolism in the reticulum (modules 1 and 3) and omasum (module 7) (Figure S5, Table S11). Since spinous and mitotic basal cells use fatty acids as their major energy source, these results demonstrate the co-expression of antioxidant genes and fatty acid metabolism genes, suggesting that antioxidant genes may protect these cells during robust fatty acid metabolism. Because the rumen is the primary site for fermentation and production of SCFAs in the forestomach, we next analyzed cellular antioxidant metabolism and fatty acid metabolism in the rumen using a flux balance analysis tool.
The metabolic activity of spinous and basal cells in the forestomach was significantly higher (P < 0.05) than that of other cell types. Specifically, across all flux balance analysis results, a total of 5,696 up-regulated reactions (P < 0.05) in spinous and mitotic basal cells and 1,012 down-regulated reactions in other cell types (P < 0.05) were found, highlighting the strong metabolic status of the spinous and mitotic basal cells in the forestomach. Pathways related to energy metabolism, antioxidant, and respiration-related processes were highly active, including fatty acid oxidation (631 upregulated reactions in spinous and basal cells, compared with 3 in other cell types), glutathione metabolism (21 upregulated reactions in spinous and basal cells, none in other cell types), and ROS detoxification (7 upregulated reactions in spinous and basal cells, none in other cell types) (Fig. 6 C, Table S12). We focused on glutathione metabolism and fatty acid oxidation, where the oxidation of GSH to GSSG in glutathione metabolism and the elongation of short fatty acids into longer fatty acids showed the highest Cohen's d values (Fig. 6 D, Table S12). These two reactions not only show how the cell regulates its oxidative stress by consuming GSH but also illustrate the synthesis of longer fatty acids from short fatty acids (Fig. 6 D). To further assess the subclusters that regulate the fatty acid and antioxidant processes, the two cell types were re-clustered into 6 subclusters (Fig. 6 E, Table S13). Interestingly, we observed that subclusters 3 and 4, with high expression of KRT6A, KRT14, and KRT5, could be intermediate cell types in the differentiation from basal cells to spinous cells. Through pseudo-time analysis, we observed a cell differentiation trajectory from subcluster 4 to subcluster 3 and finally to subcluster 1 (Fig. 6 F).
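The effect size used above to rank reactions can be computed as follows, assuming the reported value is the standard Cohen's d with a pooled standard deviation. The flux values are hypothetical and the function name is illustrative.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical reaction fluxes in two groups of cells.
spinous_flux = [2.0, 4.0, 6.0]
other_flux = [1.0, 2.0, 3.0]

effect = cohens_d(spinous_flux, other_flux)
print(round(effect, 3))
```

A positive value means the reaction carries more flux in the first group. Unlike a P value, the effect size does not grow simply because more cells are sampled, which makes it a natural choice for ranking reactions across cell types of very different sizes.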
Hallmark pathway scoring showed that subcluster 3 exhibited significantly higher expression of fatty acid metabolism (u = 0.21, P < 0.01), while subcluster 4 exhibited strong cell division potential (u = 0.13, P < 0.01) compared with other subclusters (Fig. 6 G). These results suggest that specialized phases of fatty acid metabolism and antioxidant defense exist during the known differentiation process from basal cells to spinous cells (Fig. 6 G). Moreover, two TFs, ESRRA and PPARG, along with the genes they regulate related to fatty acid metabolism and antioxidation, are highly expressed in subcluster 3 (Fig. 6 H). In summary, increased antioxidant metabolism acted as a defense against intense fatty acid metabolism in a specific cell subtype dedicated to robust fatty acid metabolism.

Deriving interactions between potential antioxidants and high GIT oxidative stress cells

One of the most relevant types of interactions that modulate host metabolic status is the metabolite-mediated interaction among microbes, metabolites, and host cell proteins 37. Here, we employed a structure-based virtual screening method, TransformerCPI2.0 38, to score the interactions between 436 secondary metabolites and 14,976 marker proteins from genes (LogFC > 1) of all cell types in the GIT. A total of 6,484,608 interaction scores were obtained (Fig. 7 A). Furthermore, we examined the interaction between distinct cell types in the rumen and Cys-Cys-Cys, an oligopeptide with predicted scores of 0.770 and 0.822 for ET and lipid antioxidation, respectively. This compound is specifically prevalent in the rumen, omasum, and rectum (Fig. 4 E) and possesses the same antioxidant moiety, thiol (-SH), as GSH.
First, we calculated its absorption properties using the ADMETlab 2.0 (https://admetmesh.scbdd.com/) tool and found that it has acceptable HIA, Lipinski Rule, and MDCK permeability, indicating the potential to pass through the GIT epithelium. Moreover, in the rumen spinous cells, 151 of the 263 marker protein sequences were predicted to interact with Cys-Cys-Cys (score > 0.5) (Fig. 7 B). Interestingly, these 151 proteins were enriched in more than 10 pathways, half of which were associated with mitochondria and antioxidant metabolism ( P.adjust 10, Fig. 7 C). Several pathways co-occurred in both the marker gene enrichment results and the interacting gene enrichment results, such as the mitochondrial membrane, mitochondrial envelope, organelle envelope, mitochondrial inner membrane, and cellular detoxification pathways (Fig. 7 C). In the cellular detoxification pathway (P.adjust < 0.001, Count = 8), 6 out of 8 marker genes have the potential to interact with Cys-Cys-Cys (P.adjust = 0.002, average interaction = 0.700), while in the mitochondrial membrane pathway, 13 out of 26 genes have the potential for interaction (P.adjust = 0.003, average interaction = 0.703) (Figs. 7 D, E, F). Moreover, the 19 genes from the cellular detoxification and mitochondrial membrane pathways are highly expressed in spinous and mitotic basal cells (clusters 6 and 7, P < 0.001) in the rumen (Table S13; Fig. 7 D). Similar results were found for the 310 marker proteins from spinous cells (cluster 5) of the omasum (Figure S6). This indicates a potential mechanism by which microbes interact with the host through the secretion of antioxidant substances. Subsequently, to determine the potential functions of the spinous cells, we correlated the cell types in the rumen with dairy cow phenotype data from GWAS.
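The pathway enrichment of interacting proteins described in this subsection is commonly assessed with an over-representation test. The sketch below shows the one-sided hypergeometric test as one such approach. It is an assumption that a test of this kind underlies the reported P.adjust values, and the gene counts in the example are invented rather than taken from the article.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: total number of genes in the background
    successes:  background genes annotated to the pathway
    draws:      number of genes in the list being tested
    k:          observed overlap between the list and the pathway
    """
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


# Hypothetical example: 20 background genes, 5 in the pathway,
# 4 genes in the tested list, 2 of which fall in the pathway.
p_value = hypergeom_sf(2, population=20, successes=5, draws=4)
print(round(p_value, 4))
```

In practice the raw P values from many pathways are then corrected for multiple testing, which is what an adjusted P value such as P.adjust reflects.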
The results revealed a strong association between the high fatty acid metabolism subcluster (subcluster 3 of spinous and mitotic basal cells) in the forestomach and the milk protein (bp value = 0.0101, tp value = 2.32) and milk fat (bp value = 0.0729, tp value = 1.45) traits in cows (Fig. 7 G, Table S14). This finding suggests that subcluster 3 of spinous and mitotic basal cells not only shows high fatty acid metabolism and cellular antioxidant activity, but may also influence the production traits of dairy cows.

Discussion

The GIT microbiome of ruminants plays a crucial role in their unique ability to convert fiber into meat and milk for human consumption. Although extensive research has been conducted on rumen microbial fiber degradation 39 and trace element metabolism 40, the understanding of microbial secondary metabolite production is still limited. As one of the most complex microbial communities, the microbiota of the ruminant GIT may possess a greater biosynthetic potential than microorganisms found in other environments 14. To obtain a more comprehensive BGC set, fewer gaps between contigs in the MAGs are needed. Third-generation long-read sequencing can be used to improve the gene quality and length of MAGs 41. Following gap-filling, we reduced the percentage of BGCs located at sequence edges from 60% to 48.9%. Our study discovered 26,503 BGCs with high-quality gene sets, a 79% increase over the current largest rumen BGC database 42. The complete BGC dataset allows for a deeper exploration of the basal composition of the BGC categories. For instance, NRPS and RiPPs were consistently found to be the dominant types among the total BGCs as well as the novel BGCs, implying a profound potential for peptide synthesis within the ruminant GIT environment. These functional peptides have not yet been studied extensively in comparison to ruminal microbial proteins 43,44.
In our research, over 1,000 BGCs were found to be completely different from those in the known database (BiG-FAM) 23, which largely expanded our understanding of the metabolic potential of the bovine GIT microbiota. Meanwhile, BGCs are distributed across various species, with different MAGs encoding varied numbers and degrees of novel BGCs, which may be attributed to survival competition and ecological balance within the microbial community 45. For the metabolites encoded by BGCs, we observed a rich presence of betalactones and arylpolyenes alongside the predominant production of RiPPs and NRPS in the others category. More than 55% of product categories were phylum-specific, suggesting the presence of abundant ecological adaptation mechanisms in more than half of the phyla. Notably, within the Methanobacteriota phylum, highly specific metabolites were identified, such as NRPS.thioamitides and thiopeptide.thioamitides. In addition, a high abundance of compounds containing the rare thioamitide substructure 46 was discovered in the Methanobacteriota (96, 89.7% of the total thioamitide products, Table S3). Thioamitides characteristically possess macrocycles interconnected by sulfur atoms and diverse functional groups 47,48. In particular, thioamitides and thiopeptides exhibit remarkable antioxidant 49 and antibacterial 46 potency, highlighting the antioxidant and antimicrobial biosynthesis potential of species in the Methanobacteriota. The abundance of novel products and rare compounds aligns with the understanding that methane-producing archaea play a crucial role in the intricate ecology of the GIT 50. Moreover, several species from the Actinobacteria phylum that encode diverse and novel BGCs were identified for the first time in our study, and these may serve as core metabolic species in the GIT ecosystem. Microbiota-encoded secondary metabolites possess various biological activities, such as antimicrobial, anti-inflammatory, and anticancer properties 7,9,10.
A growing number of studies indicate that the GIT microbiota may contribute to the host oxidative status 32,51; in fact, there is also intense competition among microorganisms over antioxidant metabolism. Dumitrescu et al. 31 reported that microbes within the human hindgut can compete for antioxidants through a transporter of the antioxidant ergothioneine. However, there is no large-scale method for identifying antioxidant molecules based on the structural properties of small molecules. We trained novel antioxidant prediction models for small molecules using SMILES structures, built on a comprehensive collection of antioxidants from databases and a thorough review of related references. The models can predict both antioxidants and their mechanisms, such as ET and HAT, and interpret their structures. These models enable us to predict antioxidants from the products of BGCs, thereby identifying BGCs with antioxidant properties and microbial species with strong antioxidant capabilities. When we explored the distribution of BGCs and their potential antioxidant metabolites in the 10 GIT locations, we identified segment-specific potential antioxidant compounds. Interestingly, we discovered a peptide composed of three cysteine residues in the forestomach, L-Cysteine, L-Cysteine, L-Cysteine (Cys-Cys-Cys). Specifically, this oligopeptide has a molecular weight of 327.5 and contains three consecutive cysteine residues with three thiol groups, the primary antioxidant functional group of glutathione, which carries only one 52,53. Cys-Cys-Cys was derived from Ruminococcus flavefaciens , Ruminococcus_D sp902788785 , UBA2942 sp900321525 , Clostridium_AI polysaccharolyticum , Butyrivibrio hungatei_A , Blautia_A massiliensis and Acetitomaculum ruminis of the Firmicutes phylum in the rumen, omasum, abomasum and rectum. How this peptide regulates oxidative stress in the gastric mucosal epithelial cells is yet to be explored.
Moreover, although the substance can be synthesized (CAS number: 206058-60-8) and is found in conserved sequences in the coat protein (CP) and helper component-proteinase (HC-Pro) of potyviruses 54, information on its antioxidant properties is very limited. Therefore, it is necessary to conduct ET and HAT antioxidant experiments, such as the 2,2-diphenyl-1-picrylhydrazyl (DPPH) assay and the Oxygen Radical Absorbance Capacity (ORAC) assay, and even cellular-level antioxidant experiments on Cys-Cys-Cys. In this section, we explored the potential antioxidants secreted by the rumen microbiota, screened for tissue-specific potential antioxidant compounds, and discovered potential antioxidants with unique structures. In subsequent experiments, we will experimentally validate the currently predicted and synthesizable antioxidant substances. As basic physiological reactions, the responses to oxygen signals and antioxidants determine the normal living status of cells. While real-time measurement of ROS levels across various cell types in vivo remains a major challenge, single-cell transcriptomic profiles provide a unique view through which to discern distinct stages of cellular ROS responses. Based on the collaborative single-cell atlases of cattle covering 51 tissues, 126 cell types and 1,006 clusters (Han et al., 2024, in preparation; Shi et al., 2024, in preparation), we further investigated cellular responses in defending against, responding to, or generating ROS at the mRNA level. Interestingly, we found that cells in the GIT, particularly the epithelial cells, comprise the majority of cell types with high expression of the three key antioxidant-related pathways. This could be attributed to the abundant exogenous metabolites and the highly prevalent mechanical damage from crude fiber faced by the GIT of ruminants 55,56.
Spinous and mitotic basal cells in the forestomach, as well as enterocytes in the small intestine, exhibited high expression of pathways that defend against and respond to ROS. For spinous and mitotic basal cells, more active fatty acid metabolism was revealed through hdWGCNA and single-cell flux balance analysis, which is consistent with the knowledge that forestomach epithelial cells primarily use fatty acids, especially SCFAs, as their energy source 57. Subsequently, to elucidate the regulatory elements involved in fatty acid metabolism, we identified specific TFs, such as ESRRA 58 and PPARG 59, that could regulate fatty acid metabolism and antioxidant genes in the forestomach. Interestingly, ESRRA has been reported to play a pivotal role in mouse intestinal homeostasis by orchestrating gut microbiota composition, thereby safeguarding the host against detrimental mitochondrial dysfunction 58. Additionally, PPARG plays a crucial role in the urothelium by controlling mitochondrial function, development, and regeneration 60. The results suggest that both ESRRA and PPARG may serve as crucial TFs in the spinous and basal cells for maintaining energy and fatty acid metabolism homeostasis. Moreover, the interactions between ESRRA and the forestomach microbial community need to be explored. It is known that spinous and basal cells in the forestomach are in a continuous process of differentiation 61; therefore, the intermediate cell states and functions must be characterized to understand the dynamic cellular transitions and cellular functional specialization. We observed the differentiation from basal cells to spinous cells, with 5 intermediate subclusters playing their respective roles throughout this process. A specialized cell subtype focused on fatty acid utilization was further identified, which requires further experimental exploration.
This suggests that there is cellular functional specialization among the subtypes of spinous and mitotic basal cells, and these particular cell subtypes merit further experimental exploration. In short, using our comprehensive single-cell atlas in cattle, we discovered that the GIT epithelial cells displayed heightened ROS response and defense among 51 tissues, and the spinous and basal cells in the forestomach were especially highlighted. Additionally, the forestomach epithelial cells under oxidative stress may be regulated by antioxidants derived from the microbiota, which we explored through metabolite-protein interaction prediction. Currently, bioinformatic tools for predicting protein-metabolite interactions based on metabolite structure and protein sequence are rapidly evolving 62–64. The metabolite-mediated interactions between microbes and host cells represent one of the most pertinent types of molecular interplay capable of modulating the gene expression of host cells 65,66. The potential interactions between microbial metabolites and proteins of ruminal GIT cells identified in our study were established using molecular structures (SMILES) and the protein sequences of the cell markers. We focused on the predicted interactions between spinous cells exhibiting high oxidative stress in the forestomach and Cys-Cys-Cys, a potential antioxidant with three thiol groups. Our analysis of the interactions between Cys-Cys-Cys and various epithelial cell types in the forestomach suggests that Cys-Cys-Cys potentially regulates over 50% of the proteins that are highly expressed in these cells and involved in antioxidant and energy metabolic processes. Cysteine is the second least abundant amino acid following tryptophan 67,68 and Cys-containing peptides are highly conserved 69, being present in over 97% of mammal proteins. However, future research is required to identify its binding sites with marker proteins in the spinous cells.
Taken together, the large number of interaction scores between potential antioxidants and host marker proteins provides a computational resource for further exploration of how the microbiota regulate host status, especially oxidative stress, through metabolites.

Conclusion

In this study, we constructed the most extensive BGC collection in the cattle GIT to explore microbial secondary metabolites. Using newly trained structure-based deep learning models for small antioxidant molecules, a total of 396 BGC products were predicted to have potential antioxidant properties. We found that GIT epithelial cells exhibit strong antioxidant gene expression among 126 cell types and 1,006 clusters across 51 tissues in dairy cows. In particular, a high occurrence of fatty acid metabolism and antioxidant defense was observed in rumen epithelial cells, especially the spinous and mitotic basal cell types. We predicted over 6 million interaction scores between BGC metabolites and marker proteins in the GIT cell types, and Cys-Cys-Cys was identified as potentially regulating cellular energy metabolism and detoxification in the rumen epithelium. Our work not only facilitates further exploration of microbial secondary metabolism in the cattle GIT but also offers theoretical and model bases for the discovery of novel antioxidants and microbes. Our results suggest that the cattle GIT, particularly the forestomach, serves as a critical site of oxidative stress and could be subject to regulation by microbial antioxidants, pointing towards the potential of regulating host oxidative stress responses through microbial manipulation.

Methods

Library preparation and next-generation sequencing (NGS)

We extracted metagenomic DNA from the omasum, jejunum and rectum content samples using the QIAamp DNA Stool minikit (Qiagen, cat.no. 51604) for metagenomic sequencing.
The quality and quantity of the obtained DNA were assessed by electrophoresis on a 0.5% agarose gel and with the Qubit dsDNA assay kit (Thermo Fisher Scientific Inc.). Finally, sequencing was performed using DNA samples of high molecular weight (modal size > 2 kbp) and sufficient quantity (> 10 µg). For Illumina sequencing, 1 µg of total DNA extracted from the omasum, jejunum and rectum content samples was used as the starting material for library construction. The DNA was fragmented to approximately 400 bp using the Covaris M220 ultrasonicator. The TruSeq™ DNA Sample Prep Kit was used for paired-end (PE) library construction. In this process, the DNA undergoes ligation of “Y” adapters, removal of adapter dimers, enrichment through PCR amplification, and sodium hydroxide denaturation to generate single-stranded DNA fragments. Subsequently, bridge PCR, facilitated by the cBot TruSeq PE Cluster Kit, results in the formation of DNA clusters and their linearization into single strands. The DNA fragments were then sequenced, generating high-quality sequencing data for downstream analysis. Trimmomatic 70 ( http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic ) was then used for adapter trimming, trimming low-quality sequence ends, discarding reads with an N ratio exceeding 10%, and removing sequences with a trimmed length of less than 75 bp (ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:75).

PacBio sequencing and hybrid assembly using HiFi and NGS sequencing data

For PacBio sequencing, 5 µg of DNA from the omasum, jejunum and rectum content was used to prepare the SMRTbell library with the PacBio SMRTbell prep kit 3.0 (Pacific Biosciences, Part Number: 102-182-700). Prior to library preparation, damaged double-stranded DNA in the initial sample was repaired using the New England Biolabs PreCR Repair Mix Kit (M0309SVIAL).
Subsequently, the BluePippin system (Sage Science) was employed to select repaired DNA molecules larger than 3 kb in size. The SMRTbell library was then sequenced on the PacBio Sequel IIe instrument (Pacific Biosciences) using SMRT 2M cells (Part Number: 101-389-001) with v3 chemistry. A total of 3,271,891, 2,926,668 and 2,512,929 raw long reads were obtained for the omasum, jejunum and rectum samples, respectively, and were further processed using the CCS mode in the SMRT Link package (Pacific Biosciences, v10.0) with the following parameters: --min-length 200, --min-passes 3, and --min-rq 0.99, to generate HiFi reads. Hybrid assembly was performed using three software tools with the following parameters: MaSuRCA 71 ( https://github.com/alekseyzimin/masurca , version 4.0.3, corrected long-read mode with the default parameters), hybridmetaSPAdes 72 ( https://github.com/ablab/spades , version 3.15.3, key parameters: ‘--meta --pacbio -m 500’) and OPERA-MS 73 ( https://github.com/CSB5/OPERA-MS , version: 2.11-r797, default parameters). Finally, a total of 109 MAGs were assembled from second- and third-generation sequencing data (mean completeness 67.2%, mean contamination 2.4%, mean N50 305,988.5 bp, mean max-scaffold 419,980.8 bp).

Metagenomic assembled genomes collection and quality improvement

Based on our bovine gastrointestinal microbial MAG database, the “Bovine Gastro Microbial Genome Map” (BGMGM), which is the largest to date, we further expanded and improved it with newly published resources and newly sequenced samples. Specifically, we collected 2,114 new MAGs from cattle hindgut 22 and 106 MAGs hybrid-assembled from long-read sequencing data obtained from omasum, jejunum, and rectum content, resulting in a total of 47,241 MAGs. After dereplication using dRep (v.2.5.4) 74 with a 95% ANI threshold (-comp 0 -con 1000 -sa 0.95 -nc 0.2), 14,093 non-redundant MAGs were retained. Taxonomy information was annotated by GTDB-Tk 75 (v202, https://github.com/Ecogenomics/GTDBTk ).
After dereplication, we used the HiFi data to close gaps within each of the 14,093 MAGs with minimap2 76 ( https://github.com/lh3/minimap2 ) and SSPACE-LongRead 77 ( https://github.com/Runsheng/sspace_longread ). We extracted the HiFi FASTQ data corresponding to the raw bins and then performed direct scaffolding using the SSPACE-LongRead 77 tool.

Identifying and clustering BGCs from MAGs

We performed rapid genome-wide identification, annotation and analysis of secondary metabolite BGCs in our non-redundant BGMGM database of 14,093 MAGs, using antiSMASH 78 (v.6.1.1, https://github.com/antismash/antismash/releases ). All BGCs were identified by antiSMASH 78 with default parameters and subsequently processed with BiG-SLiCE 79 (v1.1.0, https://github.com/medema-group/bigslice ) and BiG-SCAPE 80 ( https://github.com/medema-group/BiG-SCAPE ). Each BGC was functionally characterized based on the predicted product types defined in antiSMASH or the broader product classes defined in BiG-SCAPE. The diversity and novelty of the eight BGC classes, including terpenes, RiPPs, NRPS, PKSI, PKSother, PKS-NRP hybrids, saccharides, and others, were estimated by calculating their distances to computationally predicted databases (the RefSeq database within BiG-FAM 23 ) and experimentally validated databases (MIBiG 2.0) using BiG-SLiCE. A BGC with a distance greater than 0.6 was considered novel, and one with a distance greater than 1.0 was considered completely novel. We applied BiG-SCAPE to calculate pairwise cosine distances between all BGCs and clustered them using average linkage into GCFs and GCCs, with distance thresholds of 0.3 and 0.9, respectively.

Exploration of microbial-derived small molecules

Among the 26,503 BGCs, a total of 244 small-molecule compounds with SMILES structures were identified after removing redundancy (Supplementary table 4).
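The novelty cut-offs stated in the BGC clustering subsection (distance greater than 0.6 for novel, greater than 1.0 for completely novel) can be expressed as a small helper. The following is a minimal sketch, assuming each BGC already has a distance to its nearest reference family; the function names `classify_novelty` and `summarize` and the example identifiers are hypothetical and are not part of BiG-SLiCE or any other tool used here.

```python
from collections import Counter

# Cut-offs taken from the text; both comparisons are strict ("greater than").
NOVEL_CUTOFF = 0.6
COMPLETELY_NOVEL_CUTOFF = 1.0


def classify_novelty(distance):
    """Map a BGC-to-reference distance onto the novelty labels used in the text."""
    if distance > COMPLETELY_NOVEL_CUTOFF:
        return "completely novel"
    if distance > NOVEL_CUTOFF:
        return "novel"
    return "known"


def summarize(distances):
    """Count how many BGCs fall into each novelty label."""
    return Counter(classify_novelty(d) for d in distances.values())


# Hypothetical distances for illustration only.
example = {"bgc_a": 0.25, "bgc_b": 0.75, "bgc_c": 1.2, "bgc_d": 0.6}
print(summarize(example))
```

In this sketch a completely novel BGC is reported under its own label, although by the stated cut-offs it also satisfies the criterion for novel; whether the two counts are reported as nested or disjoint is a presentation choice.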
Subsequently, aligning the rumen transcriptome to the MIBiG database showed that 58 BGCs with product SMILES structures were expressed in the rumen microbiota (Supplementary table 5). Finally, 134 products from 23 species in BGMGM were identified by searching the Natural Products Atlas 2.0 81 (NPA, https://www.npatlas.org/ ) and the Comprehensive Marine Natural Products Database 82 (CMNPD, https://cmnpd.org/ ) by species (Supplementary table 6). In summary, the metabolites were collected from our NRPS and RiPPs BGCs (244 metabolites), the Minimum Information about a Biosynthetic Gene cluster database (MIBiG, 58 metabolites), the Comprehensive Marine Natural Products Database (CMNPD, 82 metabolites) and the Natural Products Atlas (NPA, 52 metabolites). In total, 436 microbial-derived small molecules in the cattle GIT were collected.

Collection of small molecule antioxidant data

Antioxidant data were collected from three databases: AODB 29, CHEMBL 83 and PubChem 84. All the antioxidant assay information downloaded from AODB and manually collected and curated from PubChem and CHEMBL was merged, resulting in 60,621 antioxidant assay records. Through manual examination of the experimental descriptions, the 60,621 antioxidant assays were classified into 5 classes (electron transfer, ET; hydrogen atom transfer, HAT; inhibition of lipid oxidation; targeting NRF2-ARE, NRF2; and others) based on the classification criteria in the AODB database 29. Within each category, molecules demonstrating ROS scavenging capability less than 50% of Trolox, or an inhibitory concentration (IC50) value higher than 1,000 nM, were classified as non-antioxidants.
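The labeling rule above can be written as a short function. This is only a sketch under stated assumptions: it assumes each assay record reports either a scavenging capability expressed as a fraction of Trolox, an IC50 in nM, or both, that a record failing either criterion is a non-antioxidant, and that all remaining records are antioxidants. The function name `label_antioxidant` and its parameter names are hypothetical; the source does not describe how records with both readouts were handled.

```python
from typing import Optional


def label_antioxidant(trolox_ratio: Optional[float] = None,
                      ic50_nm: Optional[float] = None) -> Optional[int]:
    """Return 1 (antioxidant), 0 (non-antioxidant) or None (no usable readout).

    trolox_ratio: ROS scavenging capability relative to Trolox (1.0 = equal).
    ic50_nm: inhibitory concentration (IC50) in nM.
    """
    if trolox_ratio is None and ic50_nm is None:
        return None  # nothing to judge the molecule on
    if trolox_ratio is not None and trolox_ratio < 0.5:
        return 0  # scavenging below 50% of Trolox
    if ic50_nm is not None and ic50_nm > 1000:
        return 0  # IC50 above 1,000 nM
    return 1  # assumption: everything that passes is labeled antioxidant


print(label_antioxidant(trolox_ratio=0.8))
print(label_antioxidant(ic50_nm=2500))
```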
Finally, five datasets with antioxidant labels were obtained, with 5,118 antioxidants and 5,741 non-antioxidants in the ET class, 1,516 antioxidants and 862 non-antioxidants in the HAT class, 1,762 antioxidants and 1,810 non-antioxidants in the inhibition of lipid oxidation class, 1,985 antioxidants and 15,600 non-antioxidants in the NRF2-ARE targeting class, and 1,584 antioxidants and 13,651 non-antioxidants in the class of other antioxidant assays.

Construction of antioxidants identification graph neural network

To construct a deep learning model for predicting molecular properties, we used the graph neural network framework chemprop 28 ( https://github.com/chemprop/chemprop ) to train the antioxidant prediction models. The graph neural network performs convolution on the fingerprint matrix of atoms and bonds from each molecule 20. To address data imbalance, we used the area under the precision-recall curve (PRC-AUC) as the primary evaluation metric and binary cross-entropy as the loss function, and enhanced the features using morgan and rdkit 2D normalized data. The five binary classification datasets that we annotated were then used for training. Ten-fold cross-validation was applied for model training with 80%-10%-10% splits for the training, testing, and validation datasets, respectively. The features were augmented with a list of RDKit-computed molecular features (morgan, morgan_count, rdkit_2d_normalized) to improve model performance. The chemprop hyperparameters were configured as follows: data were split into 80% for training, 10% for validation, and 10% for testing, using scaffold balancing; 10-fold cross-validation was conducted with an ensemble size of 6; the aggregation method was normalization with a norm value of 50; and the loss function was binary cross-entropy.
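To make the class imbalance and the loss function concrete, the sketch below computes the fraction of antioxidants in each of the five datasets from the counts reported above, and gives a plain-Python version of mean binary cross-entropy. It is illustrative only and is not the chemprop implementation; the names `DATASETS`, `positive_fraction` and `binary_cross_entropy` are hypothetical. The positive fraction is a useful reference point because a classifier that scores molecules at random has an expected precision equal to it.

```python
import math

# (antioxidants, non-antioxidants) per dataset, as reported in the text.
DATASETS = {
    "ET": (5118, 5741),
    "HAT": (1516, 862),
    "lipid oxidation": (1762, 1810),
    "NRF2-ARE": (1985, 15600),
    "other": (1584, 13651),
}


def positive_fraction(n_pos, n_neg):
    """Share of antioxidants in a dataset."""
    return n_pos / (n_pos + n_neg)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


for name, (pos, neg) in DATASETS.items():
    print(f"{name}: {positive_fraction(pos, neg):.3f} antioxidants")
```

The NRF2-ARE and other classes are the most imbalanced, which is where a precision-recall based metric is most informative compared with plain accuracy.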
Evaluation metrics encompassed the area under the precision-recall curve (PRC-AUC), alongside supplementary metrics such as the F1 score, Matthews correlation coefficient (MCC), and area under the curve (AUC). The prediction score of each of the five antioxidant models was obtained by averaging the scores of the 60 models (an ensemble of 6 models in each of the 10 folds).

Single-cell dimensionality reduction, clustering and identification of marker genes

The clustering and dimensionality reduction methods for single-cell analysis across all tissues are the same as described in our companion papers (Han et al., 2024, in preparation; Shi et al., 2024, in preparation). In brief, CellRanger (v7.0.1, 10x Genomics) was used to perform sample demultiplexing, barcode processing, and single-cell 3’ gene counting. The scRNA-seq data were aligned to the ARS-UCD1.2 cattle reference genome to obtain gene expression profiles at the single-cell level. The features, barcodes, and count matrix were then loaded into Seurat 85 (4.3.0, https://github.com/satijalab/seurat ) for downstream single-cell analysis and visualization. Each library underwent cell quality control using ddqcR (v0.1.0, https://github.com/ayshwaryas/ddqc ). Initially, the cells were clustered using standard scRNA-seq preprocessing and clustering steps. Within each cluster, cells with values of n.counts and n.genes less than 2 median absolute deviations were filtered out. After excluding cells with a mitochondrial gene ratio above 10%, we employed the DoubletFinder package 86 (v2.0.3, https://github.com/chris-mcginnis-ucsf/DoubletFinder ) to eliminate doublets. The conventional dimensionality reduction and clustering workflow was performed according to the following steps. The “NormalizeData” function was employed to calculate gene expression values with the ‘LogNormalize’ method and a ‘scale.factor’ of 10,000.
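For readers unfamiliar with the normalization step just described, Seurat documents the ‘LogNormalize’ method as dividing the feature counts of each cell by the total counts for that cell, multiplying by the scale factor, and applying a natural-log transform with log1p. A minimal plain-Python sketch of that formula for a single cell is shown below; the function name `log_normalize` is hypothetical and the snippet does not call Seurat.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Apply log1p(count / total_counts * scale_factor) to one cell's gene counts."""
    total = sum(counts)
    if total == 0:
        # A cell with no counts has nothing to normalize.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical counts for one cell across four genes.
cell = [0, 3, 12, 85]
print([round(v, 3) for v in log_normalize(cell)])
```

Because each cell is scaled by its own total, cells sequenced to different depths become comparable before variable gene selection and scaling.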
The “FindVariableFeatures” function was used to select the top 2,000 highly variable genes, and the expression levels of these genes were scaled using the “ScaleData” function. PCA (principal component analysis) was performed for dimensionality reduction using the “RunPCA” function, and 30 to 50 PCs were tested to determine the most suitable number of PCs. Harmony 87 (v0.1.0, https://github.com/immunogenomics/harmony ) was used to correct batch effects. Cell clustering was performed using the “FindClusters” function, with resolutions from 0.4 to 1.3 tested to find an appropriate resolution. Cell visualization was achieved using the “RunUMAP” function. The “FindAllMarkers” function was used to identify differentially expressed genes (DEGs) or marker genes (|‘avg_logFC’| > 0.25 and ‘p_val_adj’ < 0.05).

Enrichment analysis and gene pathways scoring analysis in single-cell atlas

Enrichment analysis was conducted using the ‘enrichGO’ function in the R package ‘clusterProfiler’, with OrgDb set to ‘org.Bt.eg.db’ and both ‘pvalueCutoff’ and ‘qvalueCutoff’ set to 0.05. Genes within the 16 oxidase generation, 19 antioxidant, and 22 response to oxygen signaling gene sets are listed in Supplementary Table 10. The ‘score_genes’ function within Scanpy 88 (v1.9.3, https://github.com/scverse/scanpy ) was applied to compute pathway expression scores for all 1,803,004 cells. The differences in signature scores among cell types were assessed using a two-sided Wilcoxon rank-sum test at a significance level of 0.05.

Pseudo-time analysis of cell subclusters in the forestomach

To model differentiation trajectories, we performed trajectory analysis using Monocle2 89 (v2.26.0, https://github.com/cole-trapnell-lab/monocle-release ) for all the spinous and mitotic basal cells in the rumen, reticulum, and omasum, according to the general pipeline ( http://cole-trapnell-lab.github.io/monocle-release/docs/ ).
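The two-sided Wilcoxon rank-sum test used above to compare signature scores between cell types can be illustrated with a short standard-library sketch. It uses the normal approximation with average ranks for ties, and it deliberately omits the tie correction to the variance and the continuity correction that statistical packages may apply, so its p-values can differ slightly from theirs. The function names `average_ranks` and `rank_sum_test` and the example scores are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test (normal approximation). Returns (U, z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Hypothetical signature scores for two cell types.
scores_a = [0.10, 0.15, 0.12, 0.18, 0.11]
scores_b = [0.30, 0.35, 0.28, 0.33, 0.40]
u, z, p = rank_sum_test(scores_a, scores_b)
print(u, round(z, 3), p < 0.05)
```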
Cell metabolic state estimation by Compass algorithm

To characterize the metabolic state of cells using single-cell sequencing data and flux balance analysis, Compass 90 ( https://github.com/YosefLab/Compass ) was applied for cell metabolism analysis with the default parameters ( https://yoseflab.github.io/Compass/tutorial.html ).

Single-cell transcription factors regulatory network inference

We computed the gene regulatory networks (GRNs) of these tissue cell types. By calculating the regulon specificity score (RSS), we were able to determine which regulons were specific to each cell type. The GRN, consisting primarily of genes and their transcription factors (TFs), in spinous and basal cells was calculated using pySCENIC 35 (v 0.12.1, https://github.com/aertslab/pySCENIC ). The activity of each TF in single cells was analyzed using the AUCell function of pySCENIC. The results obtained from pySCENIC are presented in Supplementary Table S3. To identify the main TFs in spinous and basal cells, we employed the RSS method implemented in the 'calcRSS' function of the R package SCENIC (v 1.3.1, https://github.com/aertslab/SCENIC ) for the identification of cell type-specific TFs.

Single-cell high-dimensional weighted gene co-expression network analysis (hdWGCNA) and cellular metabolism analysis

The “hdWGCNA” package 91 (0.2.19, https://github.com/smorabit/hdWGCNA ) was employed to perform hdWGCNA on the rumen, reticulum and omasum single-cell atlas with the default parameters.

Inferring interactions between microbial antioxidants and cell types using a deep learning model

A total of 6,484,608 unique pairs were identified, comprising protein-metabolite interactions between 436 different microbial metabolites and 14,976 marker proteins across all cell types within the GIT.
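As an illustration of how such metabolite-protein pairs might be assembled before scoring, the sketch below enumerates combinations of metabolite SMILES strings and protein sequences and then keeps the pairs whose score exceeds 0.5, the cut-off used in the Results. It assumes an all-against-all pairing, which the source does not state explicitly; the identifiers, toy SMILES strings, toy sequences and scores are hypothetical, and the scoring model itself (TransformerCPI 2.0 in this study) is not called.

```python
from itertools import product


def build_pairs(metabolites, proteins):
    """Yield (metabolite_id, smiles, protein_id, sequence) for every combination."""
    for (m_id, smiles), (p_id, seq) in product(metabolites.items(), proteins.items()):
        yield m_id, smiles, p_id, seq


def filter_hits(scores, cutoff=0.5):
    """Keep pairs whose interaction score is strictly above the cut-off."""
    return {pair: s for pair, s in scores.items() if s > cutoff}


# Toy inputs for illustration only (not data from the study).
metabolites = {"met_1": "CCO", "met_2": "NCC(=O)O"}
proteins = {"prot_1": "MKTAYIAK", "prot_2": "MSDNELQ", "prot_3": "MGSSHHH"}

pairs = list(build_pairs(metabolites, proteins))
print(len(pairs))  # number of metabolites times number of proteins

# Scores would come from the interaction model; these values are made up.
toy_scores = {
    ("met_1", "prot_1"): 0.82,
    ("met_1", "prot_2"): 0.31,
    ("met_2", "prot_3"): 0.50,
}
print(filter_hits(toy_scores))
```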
These pairs were used for inferring potential interactions with the inference model TransformerCPI 2.0 ( https://github.com/lifanchen-simm/transformerCPI2.0 ) 38, which scored the interactions using peptide sequences and SMILES structures by end-to-end differentiable learning. ‘Featurizer.py’ was used for tokenizing and encoding the protein sequences and compounds. The interaction scores were predicted by the trained model ‘Virtual Screening.pt’ available at https://drive.google.com/drive/folders/1X7i1eO-EykCQcvqMeWeB7QXT3E9eLG08?usp=sharing .

Correlations between cell types and production traits in Chinese Holstein dairy cows

A total of 56 cell types were correlated with complex production traits (milk protein, milk fat and milk yield) using genome-wide association study (GWAS) data from Chinese Holstein dairy cattle. Dairy cow GWAS data and all the cell types within the rumen, reticulum and omasum single-cell atlas were used for the correlation analysis. ‘scPagwas’ ( https://github.com/sulab-wmu/scPagwas ) 92 was used to perform this analysis; it employs a polygenic regression model to prioritize a set of trait-relevant genes. Moreover, scPagwas uncovers trait-relevant cell subpopulations by combining pathway activity in scRNA-seq data with GWAS summary data. In detail, to obtain comprehensive results, we employed 317 human KEGG pathways after the removal of duplicates and the conversion of homologous genes. For the cell type association, scPagwas applied the block bootstrap to estimate standard errors and calculate a t-statistic with an associated P-value for each cell type. A total of 200 iterations of the block bootstrap procedure were used for each cell-type association study. Additionally, it offers optional parameters that users can customize for the execution of the block bootstrap process.
The Boot evaluate function was applied to calculate the Pearson correlation between cell types and traits, while the ‘scPagwas_perform_score’ function was applied to define the enrichment level of pathways for each trait. A P-value lower than 0.05 was considered significant.

Declarations

Data and materials availability

All raw third-generation sequencing data have been deposited in the Genome Sequence Archive (GSA) database (accession number: PRJCA022361). Derived data supporting the findings of this study are available from the corresponding author upon request (HS: [email protected] ).

Acknowledgments

We thank all the members of the Institute of Dairy Science, College of Animal Sciences, Zhejiang University for their assistance in the sample collection. This work was supported by the following funds: Natural Science Foundation of Zhejiang Province (LR23C170001), National Key R&D Program of China (2022YFD130100106, 2022YFD1301700).

Contributions

S.L.Z, J.X.L, George E. L., Y.J., D.X.S, L.Z.F and H.Z.S. designed the research. S.L.Z, M.H.J., J.H.X, L.C, Y.N.Y., W.Q, F.F.G and H.Z.S. improved the Bovine Gastro Microbial Genome Map (BGMGM). S.L.Z., M.H.J., X.J., Y.N.Y. and W.Q. performed the biosynthesis gene cluster identification from the BGMGM. S.L.Z., M.H.J., H.C.L., B.H., Q.Z., W.J.Z., and T.S. performed the single cell downstream analysis, cell type annotation, pathway analysis and pseudo-time analysis. S.L.Z., J.H.X, L.C. collected the antioxidant assays. S.L.Z. trained the deep-learning models, performed the interactive analysis and visualized the results. H.Z.S., S.L.Z, L.Z.F., D.X.S. and Y.J. interpreted the data and wrote the manuscript with input from all other authors.

Ethics declarations

Competing interests

The authors declare no competing interests.

References

Vercellino, I. & Sazanov, L. A. The assembly, regulation and function of the mitochondrial respiratory chain. Nat. Rev. Mol. Cell Biol. 23 , 141–161 (2022).
Lee, Y. M., He, W. & Liou, Y.-C.
The redox language in neurodegenerative diseases: oxidative post-translational modifications by hydrogen peroxide. Cell Death Dis. 12 , 58 (2021). Gouda, M., Chen, K., Li, X., Liu, Y. & He, Y. Detection of microalgae single-cell antioxidant and electrochemical potentials by gold microelectrode and Raman micro-spectroscopy combined with chemometrics. Sens. Actuators B 329 , 129229 (2021). Bhattacharyya, A., Chattopadhyay, R., Mitra, S. & Crowe, S. E. Oxidative Stress: An Essential Factor in the Pathogenesis of Gastrointestinal Mucosal Diseases. Physiol. Rev. 94 , 329–354 (2014). Wang, Y., Chen, Y., Zhang, X., Lu, Y. & Chen, H. New insights in intestinal oxidative stress damage and the health intervention effects of nutrients: A review. J. Funct. Foods 75 , 104248 (2020). Chandra, P., Sharma, R. K. & Arora, D. S. Antioxidant compounds from microbial sources: A review. Food Res. Int. 129 , 108849 (2020). Kalelkar, P. P., Riddick, M. & García, A. J. Biomaterial-based antimicrobial therapies for the treatment of bacterial infections. Nat. Rev. Mater. 7 , 39–54 (2021). Zhang, X. & Jia, Y. Recent Advances in β-lactam Derivatives as Potential Anticancer Agents. Curr. Top. Med. Chem. 20 , 1468–1480 (2020). Belhadj Slimen, I., Najar, T. & Abderrabba, M. Chemical and Antioxidant Properties of Betalains. J. Agric. Food Chem. 65 , 675–689 (2017). Geller-McGrath, D. et al. Diverse secondary metabolites are expressed in particle-associated and free-living microorganisms of the permanently anoxic Cariaco Basin. Nat. Commun. 14 , 656 (2023). Chevrette, M. G. et al. Microbiome composition modulates secondary metabolism in a multispecies bacterial community. Proc. Natl. Acad. Sci. 119 , e2212930119 (2022). Blin, K. et al. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res. 51 , W46–W50 (2023). Yuan, Y. et al. Efficient exploration of terpenoid biosynthetic gene clusters in filamentous fungi. Nat. Catal. 
5 , 277–287 (2022). Paoli, L. et al. Biosynthetic potential of the global ocean microbiome. Nature 607 , 111–118 (2022). Zhang, J. et al. Microbial enzymes induce colitis by reactivating triclosan in the mouse gastrointestinal tract. Nat. Commun. 13 , 136 (2022). Cui, Y. et al. Development of a versatile and efficient C–N lyase platform for asymmetric hydroamination via computational enzyme redesign. Nat. Catal. 4 , 364–373 (2021). Deng, W., Xi, D., Mao, H. & Wanapat, M. The use of molecular techniques based on ribosomal RNA and DNA for rumen microbial ecosystem studies: a review. Mol. Biol. Rep. 35 , 265–274 (2008). Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37 , 953–961 (2019). Hungate1000 project collaborators et al. Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. Nat. Biotechnol. 36 , 359–367 (2018). Zaidi, S. et al. Pre-training via Denoising for Molecular Property Prediction. Preprint at http://arxiv.org/abs/2206.00133 (2022). Wang, Y., Wang, J., Cao, Z. & Barati Farimani, A. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence 4 , 279–287 (2022). Teseo, S. et al. A global phylogenomic and metabolic reconstruction of the large intestine bacterial community of domesticated cattle. Microbiome 10 , 155 (2022). Kautsar, S. A., Blin, K., Shaw, S., Weber, T. & Medema, M. H. BiG-FAM: the biosynthetic gene cluster families database. Nucleic Acids Res. 49 , D490–D497 (2021). Du, R., Xiong, W., Xu, L., Xu, Y. & Wu, Q. Metagenomics reveals the habitat specificity of biosynthetic potential of secondary metabolites in global food fermentations. Microbiome 11 , 115 (2023). Carberry, Ci. A., Waters, S. M., Kenny, D. A. & Creevey, C. J. Rumen Methanogenic Genotypes Differ in Abundance According to Host Residual Feed Intake Phenotype and Diet Type. Appl. Environ. Microbiol. 
80 , 586–594 (2014). Moss, A. R., Jouany, J.-P. & Newbold, J. Methane production by ruminants:its contribution to global warming. Annales de Zootechnie 49 , 231–253 (2000). Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature (2023) doi:10.1038/s41586-023-06887-8. Stokes, J. M. et al. A Deep Learning Approach to Antibiotic Discovery. Cell 180 , 688-702.e13 (2020). Deng, W., Chen, Y., Sun, X. & Wang, L. AODB: A comprehensive database for antioxidants including small molecules, peptides and proteins. Food Chem. 418 , 135992 (2023). Dimise, E. J., Widboom, P. F. & Bruner, S. D. Structure elucidation and biosynthesis of fuscachelins, peptide siderophores from the moderate thermophile Thermobifida fusca . Proc. Natl. Acad. Sci. 105 , 15311–15316 (2008). Dumitrescu, D. G. et al. A microbial transporter of the dietary antioxidant ergothioneine. Cell 185 , 4526-4540.e18 (2022). Gu, F. et al. The hindgut microbiome contributes to host oxidative stress in postpartum dairy cows by affecting glutathione synthesis process. Microbiome 11 , 87 (2023). Sies, H. et al. Defining roles of specific reactive oxygen species (ROS) in cell biology and physiology. Nat. Rev. Mol. Cell Biol. 23 , 499–515 (2022). Winterbourn, C. C., Kettle, A. J. & Hampton, M. B. Reactive Oxygen Species and Neutrophil Function. Annu. Rev. Biochem. 85 , 765–792 (2016). Van De Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15 , 2247–2276 (2020). Pisoschi, A. M. & Pop, A. The role of antioxidants in the chemistry of oxidative stress: A review. Eur. J. Med. Chem. 97 , 55–74 (2015). Sudhakar, P. et al. Targeted interplay between bacterial pathogens and host autophagy. Autophagy 15 , 1620–1633 (2019). Chen, L. et al. Sequence-based drug design as a concept in computational drug design. Nat. Commun. 14 , 4217 (2023). Xue, M.-Y. et al. 
Investigation of fiber utilization in the rumen of dairy cows based on metagenome-assembled genomes and single-cell RNA sequencing. Microbiome 10 , 11 (2022). Lin, L. et al. Genome-centric investigation of bile acid metabolizing microbiota of dairy cows and associated diet-induced functional implications. The ISME journal (2022) doi:10.1038/s41396-022-01333-5. Huang, B. et al. Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph. BMC Bioinf. 22 , 533 (2021). Anderson, C. L. & Fernando, S. C. Insights into rumen microbial biosynthetic gene cluster diversity through genome-resolved metagenomics. Communications Biology 4 , 818 (2021). Xue, M.-Y., Sun, H.-Z., Wu, X.-H., Liu, J.-X. & Guan, L. L. Multi-omics reveals that the rumen microbiome and its metabolome together with the host metabolome contribute to individualized dairy cow performance. Microbiome 8 , 64 (2020). Liu, K. et al. Ruminal microbiota–host interaction and its effect on nutrient metabolism. Animal Nutrition 7 , 49–55 (2021). Perry, E. K., Meirelles, L. A. & Newman, D. K. From the soil to the clinic: the impact of microbial secondary metabolites on antibiotic tolerance and resistance. Nat. Rev. Microbiol. 20 , 129–142 (2022). Mahanta, N., Szantai-Kis, D. M., Petersson, E. J. & Mitchell, D. A. Biosynthesis and Chemical Applications of Thioamides. ACS Chem. Biol. 14 , 142–163 (2019). Eyles, T. H., Vior, N. M., Lacret, R. & Truman, A. W. Understanding thioamitide biosynthesis using pathway engineering and untargeted metabolomics. Chem. Sci. 12 , 7138–7150 (2021). Chan, D. C. K. & Burrows, L. L. Thiopeptides: antibiotics with unique chemical structures and diverse biological activities. J. Antibiot. 74 , 161–175 (2021). Chernov’yants, M. S., Kolesnikova, T. S. & Karginova, A. O. Thioamides as radical scavenging compounds: Methods for screening antioxidant activity and detection. Talanta 149 , 319–325 (2016). Li, Q. S. et al. 
Dietary selection of metabolically distinct microorganisms drives hydrogen metabolism in ruminants. The ISME Journal (2022) doi:10.1038/s41396-022-01294-9. Uchiyama, J., Akiyama, M., Hase, K., Kumagai, Y. & Kim, Y.-G. Gut microbiota reinforce host antioxidant capacity via the generation of reactive sulfur species. Cell Rep. 38 , 110479 (2022). Zhang, W. et al. Intracellular GSH/GST antioxidants system change as an earlier biomarker for toxicity evaluation of iron oxide nanoparticles. NanoImpact 23 , 100338 (2021). Giustarini, D. et al. Assessment of glutathione/glutathione disulphide ratio and S-glutathionylated proteins in human blood, solid tissues, and cultured cells. Free Radic. Biol. Med. 112 , 360–375 (2017). Flasinski, S. & Cassidy, B. G. Potyvirus aphid transmission requires helper component and homologous coat protein for maximal efficiency. Arch. Virol. 143 , 2159–2172 (1998). Gonzales, K. A. U. & Fuchs, E. Skin and Its Regenerative Powers: An Alliance between Stem Cells and Their Niche. Developmental Cell 43 , 387–401 (2017). Zhang, K. et al. Early concentrate starter introduction induces rumen epithelial parakeratosis by blocking keratinocyte differentiation with excessive ruminal butyrate accumulation. J. Adv. Res. S2090123223004010 (2023) doi:10.1016/j.jare.2023.12.016. Beckett, L. et al. Rumen volatile fatty acid molar proportions, rumen epithelial gene expression, and blood metabolite concentration responses to ruminally degradable starch and fiber supplies. J. Dairy Sci. 104 , 8857–8869 (2021). Kim, S. et al. ESRRA (estrogen related receptor alpha) is a critical regulator of intestinal homeostasis through activation of autophagic flux via gut microbiota. Autophagy 17 , 2856–2875 (2021). Cipolletta, D. et al. PPAR-γ is a major driver of the accumulation and phenotype of adipose tissue Treg cells. Nature 486 , 549–553 (2012). Liu, C. et al. Pparg promotes differentiation and regulates mitochondrial gene expression in bladder epithelial cells. Nat. 
Commun. 10 , 4589 (2019). Wu, J.-J. et al. Cross-tissue single-cell transcriptomic landscape reveals the key cell subtypes and their potential roles in the nutrient absorption and metabolism in dairy cattle. J. Adv. Res. 37 , 1–18 (2022). Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601 , 452–459 (2022). Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566 , 224–229 (2019). Gorgulla, C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature 580 , 663–668 (2020). Michellod, D. & Liebeke, M. Host–microbe metabolic dialogue. Nature Microbiology 9 , 318–319 (2024). Zhang, Y., Chen, R., Zhang, D., Qi, S. & Liu, Y. Metabolite interactions between host and microbiota during health and disease: Which feeds the other? Biomed. Pharmacother. 160 , 114295 (2023). Huang, H. et al. Simultaneous Enrichment of Cysteine-containing Peptides and Phosphopeptides Using a Cysteine-specific Phosphonate Adaptable Tag (CysPAT) in Combination with titanium dioxide (TiO2) Chromatography. Mol. Cell. Proteomics 15 , 3282–3296 (2016). Jones, D. P. Radical-free biology of oxidative stress. American Journal of Physiology-Cell Physiology 295 , C849–C868 (2008). Giron, P., Dayon, L. & Sanchez, J. Cysteine tagging for MS‐based proteomics. Mass Spectrom. Rev. 30 , 366–395 (2011). Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30 , 2114–2120 (2014). Zimin, A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii , a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 27 , 787–792 (2017). Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27 , 824–834 (2017). Bertrand, D. et al. 
Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37 , 937–944 (2019). Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. The ISME Journal 11 , 2864–2868 (2017). Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36 , 996–1004 (2018). Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34 , 3094–3100 (2018). Boetzer, M. & Pirovano, W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinf. 15 , 211 (2014). Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49 , W29–W35 (2021). Kautsar, S. A., van der Hooft, J. J. J., de Ridder, D. & Medema, M. H. BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters. GigaScience 10 , giaa154 (2021). Navarro-Muñoz, J. C. et al. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol. 16 , 60–68 (2020). van Santen, J. A. et al. The Natural Products Atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res. 50 , D1317–D1323 (2022). Lyu, C. et al. CMNPD: a comprehensive marine natural products database towards facilitating drug discovery from the ocean. Nucleic Acids Res. 49 , D509–D515 (2021). Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40 , D1100–D1107 (2012). Kim, S. et al. PubChem Substance and Compound databases. Nucleic Acids Res. 44 , D1202–D1213 (2016). Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184 , 3573-3587.e29 (2021). McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. 
DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems 8 , 329-337.e4 (2019). Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16 , 1289–1296 (2019). Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19 , 15 (2018). Han, L. et al. Cell transcriptomic atlas of the non-human primate Macaca fascicularis. Nature 604 , 723–731 (2022). Wagner, A. et al. Metabolic modeling of single Th17 cells reveals regulators of autoimmunity. Cell 184 , 4168-4185.e21 (2021). Morabito, S., Reese, F., Rahimzadeh, N., Miyoshi, E. & Swarup, V. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Reports Methods 3 , 100498 (2023). Ma, Y. et al. Polygenic regression uncovers trait-relevant cellular contexts through pathway activation transformation of single-cell RNA sequencing data. Cell Genomics 3 , 100383 (2023). Supplementary Tables Supplementary Tables 1-14 are not available with this version. Additional Declarations There is NO Competing Interest. Supplementary Files SUNnrreportingsummary.pdf Article File - Reporting Summary Extendeddata.docx Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4193125","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":291968761,"identity":"6de1be6d-14f6-4e13-9c68-517de80d8041","order_by":0,"name":"Hui-Zeng Sun","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA00lEQVRIiWNgGAWjYBACAwbGBhAtx8BwAEixkaDFmBQtEJAI1kiUFnOJ5OYPH3fUps9vPGPA8KHsMAP/7Ab8WixnJDYYzjxzPLex4YwB44xzhxkk7hwg4LAbiQ3JvG3HcpsZzhgw87YdZjCQSCCs5fDftmPpbCAtf4nU0tjM2FaTwAPSwkiUljMPmxl72w4YzmA4VnCw51w6j8QNQlqOpz/+8LOtTl5+xuGND36UWcvxzyCgBQoOMzBIHABHJg9R6oGgjoGBv4FYxaNgFIyCUTDSAADQm0koEpFnrQAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0001-5380-6030","institution":"Zhejiang University","correspondingAuthor":true,"prefix":"","firstName":"Hui-Zeng","middleName":"","lastName":"Sun","suffix":""},{"id":291968762,"identity":"c9f2fe77-f13d-43c5-be95-d874905c7961","order_by":1,"name":"Senlin Zhu","email":"","orcid":"","institution":"Institute of Dairy Science, College of Animal Sciences, Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Senlin","middleName":"","lastName":"Zhu","suffix":""},{"id":291968763,"identity":"c2e3da84-84e5-46b1-991f-511a9d94d0c6","order_by":2,"name":"Minghui Jia","email":"","orcid":"","institution":"Institute of Dairy Science, College of Animal Sciences, Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Minghui","middleName":"","lastName":"Jia","suffix":""},{"id":291968764,"identity":"f5a38323-1d05-4a4f-85f6-4b6725ad3fa4","order_by":3,"name":"Hou-Cheng 
Li","email":"","orcid":"","institution":"Aarhus University","correspondingAuthor":false,"prefix":"","firstName":"Hou-Cheng","middleName":"","lastName":"Li","suffix":""},{"id":291968765,"identity":"34ab0fbd-6d72-499f-8e4f-1cf673a602ea","order_by":4,"name":"Bo Han","email":"","orcid":"","institution":"China Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Bo","middleName":"","lastName":"Han","suffix":""},{"id":291968766,"identity":"994ccdd0-d55d-4669-9d63-d61345baae3e","order_by":5,"name":"Tao Shi","email":"","orcid":"","institution":"Northwest A\u0026F University","correspondingAuthor":false,"prefix":"","firstName":"Tao","middleName":"","lastName":"Shi","suffix":""},{"id":291968767,"identity":"bd8ebbe7-ebb7-4382-b0fc-7fd9e68d6733","order_by":6,"name":"Qi Zhang","email":"","orcid":"","institution":"China Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Qi","middleName":"","lastName":"Zhang","suffix":""},{"id":291968768,"identity":"3efea92c-9ff6-4e2e-afad-b2b86dfda33a","order_by":7,"name":"Wei-Jie Zheng","email":"","orcid":"","institution":"China Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Wei-Jie","middleName":"","lastName":"Zheng","suffix":""},{"id":291968769,"identity":"e606dbf6-b651-4be4-91db-ec14822d0967","order_by":8,"name":"Jing-Hong Xu","email":"","orcid":"","institution":"Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Jing-Hong","middleName":"","lastName":"Xu","suffix":""},{"id":291968770,"identity":"b9e280ef-86a7-4a4e-bec1-e67b07de3991","order_by":9,"name":"Liang Chen","email":"","orcid":"","institution":"Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Liang","middleName":"","lastName":"Chen","suffix":""},{"id":291968771,"identity":"3a392d30-4fa0-4ad4-b164-a7ed36d779c6","order_by":10,"name":"Yu-Nan Yan","email":"","orcid":"","institution":"Zhejiang 
University","correspondingAuthor":false,"prefix":"","firstName":"Yu-Nan","middleName":"","lastName":"Yan","suffix":""},{"id":291968772,"identity":"d9130f51-bac9-4a23-8abf-6064f8f772ee","order_by":11,"name":"Wenlingli Qi","email":"","orcid":"","institution":"Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Wenlingli","middleName":"","lastName":"Qi","suffix":""},{"id":291968773,"identity":"95b2dcf8-9b8b-4243-b080-e14dc5a49b93","order_by":12,"name":"Gu Feng-Fei","email":"","orcid":"","institution":"Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Gu","middleName":"","lastName":"Feng-Fei","suffix":""},{"id":291968774,"identity":"9f50ee56-1edc-47ce-982e-65ea1c315f8f","order_by":13,"name":"Jian-Xin Liu","email":"","orcid":"https://orcid.org/0000-0002-5812-5186","institution":"Institute of Dairy Science, College of Animal Sciences, Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Jian-Xin","middleName":"","lastName":"Liu","suffix":""},{"id":291968775,"identity":"ffe82b6f-aec5-440c-b1b2-4359d0de0281","order_by":14,"name":"George E. 
Liu","email":"","orcid":"https://orcid.org/0000-0003-0192-6705","institution":"Agricultural Research Service","correspondingAuthor":false,"prefix":"","firstName":"George","middleName":"E.","lastName":"Liu","suffix":""},{"id":291968776,"identity":"7140e4bf-52b4-4b60-a521-b84583716323","order_by":15,"name":"Yu Jiang","email":"","orcid":"https://orcid.org/0000-0003-4821-3585","institution":"Northwest A\u0026F University","correspondingAuthor":false,"prefix":"","firstName":"Yu","middleName":"","lastName":"Jiang","suffix":""},{"id":291968777,"identity":"16d5f872-3628-4250-a32d-1e08ffd9b19f","order_by":16,"name":"Dong-Xiao Su","email":"","orcid":"","institution":"China Agricultural University","correspondingAuthor":false,"prefix":"","firstName":"Dong-Xiao","middleName":"","lastName":"Su","suffix":""},{"id":291968778,"identity":"5f46fa03-a738-408f-9fea-4e528f08fc24","order_by":17,"name":"Lingzhao Fang","email":"","orcid":"https://orcid.org/0000-0003-1103-3679","institution":"Aarhus University","correspondingAuthor":false,"prefix":"","firstName":"Lingzhao","middleName":"","lastName":"Fang","suffix":""}],"badges":[],"createdAt":"2024-03-30 17:20:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4193125/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4193125/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":54853529,"identity":"b8d403bc-3f3a-480f-95bd-575842532703","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":3387115,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAugmented reference of cattle gastrointestinal tract microbiome with 14,093 non-redundant genomes (ANI\u0026gt;95%). \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Phylogenetic tree of 2,952 metagenome-assembled genomes (MAGs) with concrete annotated species from improved BGMGM. 
(\u003cstrong\u003eB\u003c/strong\u003e) MAGs distribution in the 10 segments of GIT. (\u003cstrong\u003eC\u003c/strong\u003e) Improvement of genes’ quality after elongation of contigs within the MAGs using third-generation of sequencing. BGMGM: Bovine Gastro Microbial Genome Map.\u003c/p\u003e","description":"","filename":"Fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/96fe3a94722f81c653a64276.png"},{"id":54853864,"identity":"4be9a3d9-f263-4287-a677-9bf51ef55d96","added_by":"auto","created_at":"2024-04-17 17:26:08","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":2833985,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eBiosyntheticgene clusters (BGCs) collection of cattle GIT microbiome with 26,503 BGCs and their taxonomy distribution. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) A total of 328 gene cluster clans (GCCs) and 18,435 gene cluster families (GCFs) clustered from 26,503 BGCs and their novelty, product count, taxonomy attribution. (\u003cstrong\u003eB\u003c/strong\u003e) Lengths distribution of all BGCs and novel BGCs. (\u003cstrong\u003eC\u003c/strong\u003e) Phylum distribution of 1,031 novel BGCs and their products categories. (\u003cstrong\u003eD\u003c/strong\u003e) BGCs count and products count distribution of novel BGCs (novelty = 0.6, novelty = 1.0). 
(\u003cstrong\u003eE\u003c/strong\u003e) Structures and products of BGCs with in CowSGB-9022.\u003c/p\u003e","description":"","filename":"Fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/f695c3db8b3dcf883fb969b3.png"},{"id":54853537,"identity":"663f048f-3baa-4287-b515-b1e5422033b3","added_by":"auto","created_at":"2024-04-17 17:18:01","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1533744,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAntioxidant ensemble deep learning model for antioxidants identification in BGCs products. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Five categories of antioxidant assays from three databases. (\u003cstrong\u003eB\u003c/strong\u003e) Deep learning model structure and performance. (\u003cstrong\u003eC\u003c/strong\u003e) Collection of 436 BGCs’ products with SMILES structures and their relationship with antioxidant. (\u003cstrong\u003eD\u003c/strong\u003e) Prediction result of 436 BGCs’ products using 5 deep learning models.\u003c/p\u003e","description":"","filename":"Fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/7b641b4bef13685518e5b190.png"},{"id":54853535,"identity":"c028a5ae-df7f-4d74-a9e5-dd3a17abd791","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1157610,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe distribution of BGCs and their encoded metabolites in 10 segments of cattle GIT. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Distribution of BGCs number in 10 segments of cattle GIT. (\u003cstrong\u003eB\u003c/strong\u003e) Novelty of BGCs in 10 segments of cattle GIT. (\u003cstrong\u003eC\u003c/strong\u003e) Eight categories of BGCs in 10 segments of cattle GIT. 
(\u003cstrong\u003eD\u003c/strong\u003e) Predicted antioxidants distribution in 10 segments of cattle GIT. (\u003cstrong\u003eE\u003c/strong\u003e) The 95 potential antioxidant compounds, gastrointestinal location, species attribution.\u003c/p\u003e","description":"","filename":"Fig4.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/026a30e0b6ef8b401dc27c6e.png"},{"id":54853536,"identity":"a1c2f725-b3e7-4858-8a6c-51698bdfae4b","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":2615671,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe oxidative stress signaling patterns of the cattle single-cell atlas with 51 tissue and 1,006 clusters. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Pathways scores of antioxidant, ROS generation, and ROS signal response in 126 cell types and 1,006 clusters. (\u003cstrong\u003eB\u003c/strong\u003e) Gene Ontology (GO) enrichment results of three GIT cell types sorted in top 50 of antioxidant, ROS generation, and ROS signal scores. (\u003cstrong\u003eC\u003c/strong\u003e) Genes distribution of enriched GO pathways in (B). (D) Specific Transcription factors (TFs) within three cell types.\u003c/p\u003e","description":"","filename":"Fig5.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/e4dd0e5032c479bb6cf67355.png"},{"id":54853532,"identity":"57da668a-4ce2-4681-9271-7e900c23c252","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":2012410,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCell metabolism and subclusters of high antioxidant cell types in the forestomach. 
\u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Top50 Pathways scores of all antioxidants pathways, peroxiredoxin activity, antioxidant activity, peroxidase activity. (\u003cstrong\u003eB\u003c/strong\u003e) High Dimension Weighted Gene Co-expression Network Analysis (hdWGCNA) analysis of rumen. (\u003cstrong\u003eC\u003c/strong\u003e) Cell metabolism analysis of spinous and mitotic basal cells. (\u003cstrong\u003eD\u003c/strong\u003e) Cell metabolic reactions sorted in the top of glutathione metabolism and fatty acid metabolism. (\u003cstrong\u003eE\u003c/strong\u003e) Subclusters and marker genes of spinous and mitotic basal cells. (\u003cstrong\u003eF\u003c/strong\u003e) Pseudotime analysis of subclusters of spinous and mitotic basal cells in the forestomach. (\u003cstrong\u003eG\u003c/strong\u003e) Pathways scoring of subclusters of spinous and mitotic basal cells. (\u003cstrong\u003eH\u003c/strong\u003e) TFs and regulatory genes expression within three cell types.\u003c/p\u003e","description":"","filename":"FIg6.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/f79fd3bea2d715bbf6b26fef.png"},{"id":54853533,"identity":"fe56713a-ee91-4f18-8fbe-a3175a154b24","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":854174,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDeriving interactions between potential antioxidants and spinous cells. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Workflow of protein-metabolites interaction. (\u003cstrong\u003eB\u003c/strong\u003e) Predicted antioxidants properties of potential antioxidants Cys-Cys-Cys and interactive proteins in the rumen spinous cells. (\u003cstrong\u003eC\u003c/strong\u003e) Enriched GO pathways in the marker proteins of rumen spinous cells and marker proteins predicted to interact with Cys-Cys-Cys. 
(\u003cstrong\u003eD\u003c/strong\u003e) Interactive scores between Cys-Cys-Cys and rumen marker proteins in two pathways. (\u003cstrong\u003eE\u003c/strong\u003e) Rumen single-cell expression of interactive marker genes in two pathways. (\u003cstrong\u003eF\u003c/strong\u003e) Interactive genes percentages in thetwo enriched pathways in rumen marker genes. (\u003cstrong\u003eG\u003c/strong\u003e) Correlation between cell types in the forestomach to dairy cow phenotypes.\u003c/p\u003e","description":"","filename":"Fig7.png","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/477355b4b71d8a87f58899ed.png"},{"id":59037730,"identity":"947ad2e5-be8e-4d9f-a5b9-1755196d6755","added_by":"auto","created_at":"2024-06-25 15:39:15","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":13315991,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/9563bf34-763e-4ee2-9797-76934d7d5c24.pdf"},{"id":54853528,"identity":"94829043-a460-49e2-9690-89ece6031ca4","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1636975,"visible":true,"origin":"","legend":"Article File - Reporting Summary","description":"","filename":"SUNnrreportingsummary.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/73f53652d7b92eea0f22ae7b.pdf"},{"id":54853531,"identity":"7ae139d0-c947-4990-b6c8-19d830e3b373","added_by":"auto","created_at":"2024-04-17 17:18:00","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":5285242,"visible":true,"origin":"","legend":"","description":"","filename":"Extendeddata.docx","url":"https://assets-eu.researchsquare.com/files/rs-4193125/v1/2dfa65e4d03595c48ae4d3c8.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing 
Interest.","formattedTitle":"Microbial Antioxidants and Their Interactions with Gastrointestinal Tract Epithelial Cells in the Cattle","fulltext":[{"header":"Main","content":"\u003cp\u003eReactive oxygen species (ROS) are high reactive molecules generated as byproducts of cellular metabolism, such as mitochondrial respiration\u003csup\u003e1\u003c/sup\u003e, immune responses\u003csup\u003e2\u003c/sup\u003e, and enzymatic reactions\u003csup\u003e2\u003c/sup\u003e, in living organisms. However, different concentrations of ROS could result in completely different cellular response, referred to eustress or distress\u003csup\u003e3\u003c/sup\u003e. In the context of oxidative eustress, oxidants at modest levels are engaging with specific molecular targets to facilitate physiological redox signaling. By contrast, during oxidative distress, an escalated concentration of oxidants leads to ROS attack non-specific cellular components\u003csup\u003e3\u003c/sup\u003e. Although methods for detecting ROS at the cellular level have emerged\u003csup\u003e3\u003c/sup\u003e, the distribution of ROS in various cell types is unclear. Due to the inevitable exposure to exogenous nutrients and microbial pathogens, the gastrointestinal tract (GIT) is recognized as a significant source of ROS and antioxidants\u003csup\u003e4\u003c/sup\u003e. Therefore, various nutritional supplements are used as antioxidants for the treatment of oxidative stress in livestock industry\u003csup\u003e5\u003c/sup\u003e. Amongst, microbial agents were gradually used for effective antioxidant purposes by producing antioxidants directly or indirectly\u003csup\u003e6\u003c/sup\u003e. 
Microbial secondary metabolites have been reported to possess antimicrobial\u003csup\u003e7\u003c/sup\u003e, anticancer\u003csup\u003e8\u003c/sup\u003e, and antioxidant\u003csup\u003e9\u003c/sup\u003e activities, as well as the potential to regulate microbial communities\u003csup\u003e10,11\u003c/sup\u003e, and thus represent a significant source of bioactive antioxidants. Serving as the smallest unit for studying microbial metabolism\u003csup\u003e12\u003c/sup\u003e, biosynthetic gene clusters (BGCs) are groups of genes in a microbial genome that work together to produce certain secondary metabolites. Through interpretation of newly identified BGCs, natural bioactive products and drug resources are continuously being uncovered; for example, the biosynthetic mechanism of mangicol-like diterpenoid compounds (ester diterpenoids) with significant anti-inflammatory activity was discovered in \u003cem\u003eAspergillus oryzae\u003c/em\u003e\u003csup\u003e13\u003c/sup\u003e. The vast metabolic and functional diversity of environmental microbiota presents a promising opportunity for discovering novel enzymes and biochemical compounds from BGCs\u003csup\u003e14\u0026ndash;16\u003c/sup\u003e, including new antioxidants.\u003c/p\u003e \u003cp\u003eIt is well acknowledged that ruminants are the most typical animals exhibiting symbiotic relationships with GIT microorganisms, which together determine host production and health. Approximately 70% of cattle\u0026rsquo;s energy requirements are provided by short-chain fatty acids (SCFAs) from rumen microbial fermentation of human-inedible plant biomass\u003csup\u003e17\u003c/sup\u003e. In the bovine GIT, the complex microbiome and diet intensify microbial interactions, which could generate a significant diversity of microbial secondary metabolites\u003csup\u003e10\u003c/sup\u003e. 
However, research on ruminant BGCs remains scarce\u003csup\u003e18\u003c/sup\u003e, which is largely attributed to the unculturable nature of the GIT microbiota\u003csup\u003e19\u003c/sup\u003e. It is now possible to reconstruct the genomes of large numbers of uncultivated microbes\u003csup\u003e18\u003c/sup\u003e and to explore BGCs in abundant metagenomic data using emerging MAG databases and BGC-identification tools. Another major challenge is how to assess the potential properties of new natural products encoded by GIT BGCs. To overcome the time-consuming and labor-intensive nature of wet-lab methods, deep learning models may facilitate the determination of molecular properties by capturing the most concise and specific molecular structures\u003csup\u003e20,21\u003c/sup\u003e, yet models that identify antioxidant features from the structure of small molecules still need to be trained.\u003c/p\u003e \u003cp\u003eIn this study, to explore the antioxidant biosynthetic potential of the cattle GIT microbiome and its interactions with host cell types, we first established a comprehensive BGC dataset from the largest cattle GIT MAG database to date. By training a structure-based deep learning model to predict antioxidant properties, we were able to identify potential antioxidants among BGC products. Next, to identify which cell types are more readily regulated by antioxidants, we characterized the oxidative stress signaling patterns of each cell type in our cattle single-cell atlas (Han et al., 2024, in preparation; Shi et al., 2024, in preparation). Subsequently, a computational tool was applied to infer the interactions between antioxidants and key cell types. 
Our work discovered abundant microbial genes, antioxidants, and interacting cell types in the cattle GIT, and provides novel insights into cattle GIT-derived oxidative stress.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eThe biosynthetic potential of the cattle GIT microbiome based on high-quality MAGs\u003c/h2\u003e \u003cp\u003eWe expanded our previous database, BGMGM (Bovine Gastro Microbial Genome Map), by integrating an additional 2,114 non-redundant large intestine MAGs from a published study\u003csup\u003e22\u003c/sup\u003e, as well as 109 MAGs newly sequenced and assembled from third-generation sequencing of omasum, jejunum, and rectum content samples. After dereplication, the number of high-quality MAGs increased from 13,572 to 14,093 (44.88% of total MAGs, completeness\u0026thinsp;\u0026gt;\u0026thinsp;90% and contamination\u0026thinsp;\u0026lt;\u0026thinsp;10%; Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA, Table S1). The comprehensive MAG dataset from 10 segments of the GIT provides extensive coverage (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB, Table S1) and allows for a comparison of microbial components across different GIT locations. Approximately a quarter of MAGs (3,759) spanned different segments of the GIT, with 3.0% and 0.60% of MAGs concurrently found in the stomach and small intestine regions and in the small and large intestine regions, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB, Table S2). To improve the quality of genes in these MAGs, pre-assembled contigs were scaffolded using long reads as a backbone. 
This yielded a high-quality MAG database with more complete gene sets: the average number of contigs per genome decreased from 154.8 to 86.6 (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), the length of the maximum scaffold increased from 144,318.9 to 276,725.2 bp (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), and the mean N50 rose from 62,223.5 to 138,958.2 bp (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC, Table S1). After classification with the Genome Taxonomy Database Toolkit (GTDB-Tk), a total of 11,141 (21.9%) MAGs were annotated (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA, Table S1).\u003c/p\u003e \u003cp\u003eNext, the antiSMASH framework was applied to identify BGCs in the MAGs. From all 14,093 MAGs, a total of 26,503 BGCs (clustered into 328 gene cluster clans (GCCs)) were identified (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, Table S2), with lengths ranging from 1 to 117 kb (46.5% were 15 to 25 kb) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB, Table S2). After comparison with BIG-FAM/RefSeq\u003csup\u003e23\u003c/sup\u003e, a total of 8,672 (32.7%) BGCs were assessed as novel (distance\u0026thinsp;\u0026gt;\u0026thinsp;0.6), and 1,031 (3.9%) were completely novel (distance\u0026thinsp;=\u0026thinsp;1.0), representing BGCs unique to the cattle GIT (Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA \u0026amp; C). A large number of BGCs were found to encode natural products classified as NRPS (Non-Ribosomal Peptide Synthetases, 5,096 BGCs) and RiPPs (Ribosomally-synthesized and Post-translationally modified Peptides, 13,689 BGCs) (Figure S1). 
Moreover, among the 167 types of natural products derived from the 26,503 BGCs, a series of Betalactone (3,027 BGC products) and Arylpolyene (2,637 BGC products) types were identified. Betalactone and Arylpolyene are recognized as substances involved in food fermentation and plant defense mechanisms\u003csup\u003e24\u003c/sup\u003e, with potential antioxidant, antibacterial, and anticancer activities. The 1,031 completely novel BGCs encoded 15 distinct product categories, including 20% NRPS, 15.1% RRE-containing, and 10.7% cyclic-lactone-autoinducer (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC, Table S3), indicating that the cattle GIT harbors a rich reservoir of BGCs.\u003c/p\u003e \u003cp\u003eOf the 167 types of natural products, 95 originated from a specific microbial phylum (Figure S2, Table S3). Actinobacteria accounted for 5.3% of the 1,031 completely novel BGCs and encompassed 7 distinct types. Notably, Methanobacteriota, one of the most abundant archaeal phyla in the rumen and BGMGM\u003csup\u003e25,26\u003c/sup\u003e, specifically encodes novel subtypes such as NRPS-like.NRPS.thioamitides, NRPS-like.thioamitides, and thioamitides.LAP. CowSGB_9022, an unannotated species (92.54% completeness and 2.92% contamination for its genome) from the Pseudonocardiaceae family and the \u003cem\u003eSaccharopolyspora\u003c/em\u003e genus, encodes the highest number of BGCs and produces the greatest variety of products (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD, Table S4). A total of 7 novel BGCs were identified in CowSGB_9022: c00003, c00046, c00102, c00137, c00245, c00350, and c00368. 
The c00046 BGC, an NRPS-like BGC, encodes an amino acid analogue containing a substitution of the R group of glycine, whereas the c00368 BGC specifically encodes a Lanthipeptide-class-i peptide (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE). Interestingly, the genes of CowSGB_9022, which was found in the small intestine, were enriched in acetate oxidation, ethanol oxidation, hydrogen oxidation, and fermentation (Figure S3). Its diverse BGCs and abundance of oxidation-reduction-related genes suggest a significant role in anaerobic fermentation within the small intestine and strong adaptability within the intestinal microbiome. In sum, we characterized the distribution of secondary metabolite classes in the cattle GIT and discovered novel cattle-specific BGCs, their encoded products, and the MAGs to which they belong.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eThe evaluation of antioxidative properties of metabolites encoded by microbial BGCs\u003c/h2\u003e \u003cp\u003eTo explore the antioxidative properties of the BGC products, we trained an interpretable ensemble model for identifying potential antioxidants using graph neural networks\u003csup\u003e27,28\u003c/sup\u003e. A total of 60,621 antioxidant assays with the corresponding Simplified Molecular Input Line Entry System (SMILES) information were collected from the PubChem, ChEMBL, and AODB databases\u003csup\u003e29\u003c/sup\u003e. 
These molecules were further divided into five binary classification datasets: electron transfer (ET), hydrogen atom transfer (HAT), inhibition of lipid peroxidation-based assay (Lipid), NRF2-antioxidant response element signaling pathway (NRF2), and others (Others), following the method\u003csup\u003e29\u003c/sup\u003e described in \u0026ldquo;collection of small molecule antioxidant data\u0026rdquo; (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, Table S4). Ensembles of Message Passing Neural Networks (MPNNs) were trained on each of the five datasets to learn features directly from molecular graphs. Hyperparameter optimization was performed to determine the optimal values of depth (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e), dropout rate (0.05), feedforward neural network hidden size (FFN, 2,000), number of FFN layers (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e), and the overall model\u0026rsquo;s hidden size (2,000) in the MPNNs of the ET models (the optimal parameters of the remaining four models are shown in Table S4). To improve model performance, molecular features including Morgan fingerprint, Morgan count, and RDKit 2D normalized features were provided as data augmentation strategies. As a result, five MPNN architectures with hidden sizes larger than 1,000 were established, which predict antioxidants through successive convolutions over digitized atoms and bonds (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, Table S5). A ten-fold cross-validation strategy with an 80%-10%-10% split for training, validation, and testing, respectively, was used, combined with 60 ensembled models. To deal with the unbalanced training data, a binary cross-entropy loss function was employed. 
Precision-Recall Area Under the Curve (PRAUC), Area Under the Receiver Operating Characteristic Curve (AUROC), F1 score, and Matthews Correlation Coefficient (MCC) were used as evaluation metrics across the five datasets. In the training datasets, the PRAUC reached 0.970, 0.990, 1.000, 0.990, and 0.840, and the AUROC reached 0.974, 0.978, 0.998, 0.999, and 0.969, for the ET, HAT, Lipid, NRF2, and Others datasets, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, Table S5). In the ten-fold validation, owing to their more balanced datasets, the ET, HAT, and Lipid models achieved high average PRAUC values (ET 0.718, HAT 0.907, Lipid 0.844). Notably, despite their imbalanced datasets, the NRF2 and Others models achieved PRAUC values of 0.539\u0026thinsp;\u0026plusmn;\u0026thinsp;0.0512 and 0.413\u0026thinsp;\u0026plusmn;\u0026thinsp;0.0992, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, Table S5), which are higher than the PRAUC (0.364) of antibiotic MPNNs trained on 512 antibiotic-active and 38,800 non-antibiotic-active compounds in a study recently published in Nature\u003csup\u003e27\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eTo predict antioxidant properties, we compiled the structures of secondary metabolites as follows: 244 metabolite SMILES from our 26,503 BGCs, 134 from natural products databases, and an additional 58 from mapped MIBiG records (detailed in the \u0026ldquo;exploration of microbial-derived small molecules\u0026rdquo; methodology). This resulted in a dataset of 436 secondary metabolite SMILES strings that were used for antioxidant prediction (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, Table S6). 
The majority of these 436 molecules exhibited low similarity to known antioxidants (average Morgan fingerprint similarity\u0026thinsp;=\u0026thinsp;0.0553, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, Table S6). Subsequently, the five fine-tuned graph neural networks performed convolutional operations over the specific atoms and bonds present within each input molecular structure, thereby assessing the antioxidant properties of the 436 metabolites. A total of 245, 121, 266, 2, and 47 potential antioxidants were predicted with ET, HAT, Lipid, NRF2, and Others antioxidant properties, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD, Table S7). The interpretable substructures of potential antioxidants with high prediction scores were identified by conducting a Monte Carlo tree search\u003csup\u003e27\u003c/sup\u003e. We found that 229 (52.5%) secondary metabolites were predicted to have at least two types of antioxidant properties. Among them, four metabolites possessed 4 types of potential antioxidant properties: NPA007583, CMNPD14685, BGC0002074, and BGC0000621 (Table S7). Overall, our deep-learning model provides a new approach for predicting antioxidants from their molecular structure, offering an important avenue for exploring and utilizing secondary metabolites.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eThe distribution of BGCs and their encoded metabolites across GIT segments\u003c/h2\u003e \u003cp\u003eAfter evaluating the BGCs and secondary metabolites from the MAGs, we focused on the distribution of BGCs and their secondary metabolites in different GIT segments (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). 
Most of the BGCs were identified in the rumen (10,836) and rectum (10,049) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA, Table S8). Interestingly, we found that 896 BGCs were shared between the forestomach and the small intestine, far more than between the small and large intestines (169 common BGCs) or between the forestomach and large intestine (108 common BGCs) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA, Table S8), suggesting a closer relationship between the secondary metabolic potential of the forestomach and the small intestine than between the small and large intestines. In other words, an independent microbial metabolic community may reside in the large intestine. Regarding the distribution of novel BGCs, BGCs in the small intestine showed significantly higher novelty (Mean dist\u0026thinsp;=\u0026thinsp;0.396) than those in the forestomach (Mean dist\u0026thinsp;=\u0026thinsp;0.358, \u003cem\u003eP.adjust\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) and large intestine (Mean dist\u0026thinsp;=\u0026thinsp;0.297, \u003cem\u003eP.adjust\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB, Tables S2 and S8). In particular, the novelty of rumen BGCs remained consistently high and did not differ significantly from that of the small intestine (compared with the duodenum, jejunum, and ileum, the \u003cem\u003eP.adjust\u003c/em\u003e values were 0.168, 0.478, and 1.000, respectively; Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB, Table S8). 
Moreover, we found that BGCs encoding NRPS and RiPP were the most common types in all segments of the GIT, while the forestomach and small intestine had a higher proportion of NRPS (25.9% and 27.9%, respectively) than the large intestine (9.4%), suggesting a greater diversity of multi-domain protein complexes underlying the secondary metabolites of the forestomach and small intestine.\u003c/p\u003e \u003cp\u003eFurthermore, we focused on the distribution of potential antioxidants and their source microbial species in the GIT (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD, Table S9). A total of 95 potential antioxidants with at least two antioxidative properties and predicted scores higher than 0.7 were selected for GIT distribution analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE, Table S9). Among them, 33 antioxidants were found to be specific to the rumen, 4 to the rectum, and 1 to the ileum and cecum. Among the 4 potential antioxidants unique to the rectum, one molecule with conjugated double bonds achieved ET and HAT prediction scores of 0.747 and 0.848, respectively. It was encoded by the \u003cem\u003eFixCowSGB-3326\u003c/em\u003e species of the \u003cem\u003eCAG-100\u003c/em\u003e genus within the Firmicutes. Its conjugated double bond structure, together with the carboxyl group, was calculated to contribute 79.7% of the antioxidant prediction in the interpretability analysis of the Monte Carlo tree-based model (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE, Table S9). The rumen-specific potential antioxidants originated from 18 species, including six from \u003cem\u003eMethanobrevibacter_A\u003c/em\u003e and \u003cem\u003eMethanobrevibacter_B\u003c/em\u003e, and nine from seven genera such as \u003cem\u003eRuminococcus\u003c/em\u003e, \u003cem\u003eUBA1213\u003c/em\u003e, and \u003cem\u003eThermobifida\u003c/em\u003e. 
Many of the 95 potential antioxidants showed peptide structures. Among them, Fuscachelin C, encoded by orphan NRPS BGCs from \u003cem\u003eThermobifida fusca\u003c/em\u003e, a degrader of plant cell walls\u003csup\u003e30\u003c/sup\u003e, was predicted to possess antioxidant properties in the ET (0.928), Lipid (0.941), and Others (0.710) categories, suggesting the antioxidant potential of some cellulose-degrading microbiota in the rumen. Interpretability analysis of the ET prediction suggests that 98.2% of the contribution derives from the dipeptide backbone carbon chain structures shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE (Table S9). The interpretability analysis of the oligopeptide Cys-Cys-Cys suggests a contribution score of 0.993 from the thiol group-containing structure, which serves as the primary antioxidant group in glutathione (GSH) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE, Table S9). In addition, we discovered peptide structures containing multiple cysteines with thiol groups among the segment-specific metabolic products from the rumen, omasum, and abomasum to the small intestine, encoded by 8 species in the Firmicutes phylum (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE). The presence of these potentially potent antioxidants suggests a strong capacity for antioxidant secretion by the bovine GIT microbiota, which may regulate the host\u0026rsquo;s antioxidant status as previously reported\u003csup\u003e31,32\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eThe oxidative stress signaling patterns of the cattle single-cell atlas\u003c/h2\u003e \u003cp\u003eOxygen signaling is essential for life activities\u003csup\u003e33\u003c/sup\u003e, yet the expression of genes regulating oxygen signals at the single-cell level is currently unknown. 
Using our cattle single-cell atlas, we compared the mRNA expression levels of antioxidant, ROS generation, and ROS signal response genes in 1,803,004 cells belonging to 126 cell types and 1,006 clusters across 51 tissues (Han et al., 2024, in preparation; Shi et al., 2024, in preparation). After scoring the 16 ROS generation, 19 antioxidant, and 22 ROS response pathways from the Gene Ontology (GO) database, we identified the top 50 cell clusters in three aspects of oxygen signaling, among which 19 clusters were from the digestive system, 13 from the immune system, and 7 from the reproductive system (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, Table S10). Neutrophils were found to be the major cell type in the top 50, which aligns with the understanding that neutrophils produce ROS to eliminate pathogens\u003csup\u003e34\u003c/sup\u003e. As for ROS response pathways, the average gene set scores were high in the digestive, reproductive, and immune systems (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, Table S10), with enterocytes of the ileum and duodenum ranked among the top 10 clusters (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, Table S10). In terms of the antioxidant pathways, 31 of the top 50 clusters belonged to the digestive system, most of which were identified as epithelial cells in the esophagus and forestomach. Spinous cells of the reticulum, esophagus, rumen, and omasum ranked 4th, 7th, 8th, and 9th, respectively, among the 1,006 clusters, highlighting the oxidative stress environment of GIT epithelial cells (Figure S4, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, Table S10).\u003c/p\u003e \u003cp\u003eWe next focused on the function of GIT cells with the highest scores in the antioxidant, ROS generation, and ROS signal response pathways. 
Interestingly, spinous cells in the forestomach and enterocytes in the small intestine, especially in the ileum, showed the highest scores among the GIT cell types. In particular, spinous cells in the reticulum and enterocytes in the ileum showed the most significant enrichment and the largest number of genes in pathways related to mitochondria, electron transport, respiration, antioxidant activity, and ROS generation (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB \u0026amp; C). Additionally, the genes highly expressed in neutrophils located in the rectum were enriched in pathways related to response to other organisms. Similar results were also found in the hindgut neutrophils. These results indicate that the GIT epithelial cells act as a selective barrier between tissue and the gut environment and undergo challenge from the luminal microbiota. To elucidate the regulatory mechanisms behind the gene expression of these cell types, we ascertained the cell-type-specific regulons across all GIT segments using SCENIC\u003csup\u003e35\u003c/sup\u003e (Han et al., 2024, in preparation). In the spinous cells of the forestomach, we found that ESRRA consistently maintained a top-5 regulon specificity score (RSS) in the rumen, reticulum, and omasum. Among the top 20 genes it regulates, many are related to energy metabolism and antioxidation, such as \u003cem\u003eHSD17B13\u003c/em\u003e, \u003cem\u003eSLC25A34\u003c/em\u003e, \u003cem\u003eATP5ME\u003c/em\u003e, \u003cem\u003eTST\u003c/em\u003e, \u003cem\u003eMGST3\u003c/em\u003e, and \u003cem\u003eGLRX\u003c/em\u003e. 
For the first time, we analyzed oxygen signal gene expression in the largest cattle single-cell atlas to date and discovered that cell types in the digestive system, especially the epithelial cells in the forestomach, exhibited a higher response to ROS signals than those in other tissues.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eCell metabolism of high antioxidant cell types in the forestomach\u003c/h2\u003e \u003cp\u003eCellular antioxidant enhancement often signifies that the cells have been subjected to oxidative stress\u003csup\u003e36\u003c/sup\u003e. To further explore the causes of the strong antioxidant state in the GIT, we selected spinous and basal mitotic cells using the 3 major antioxidant pathways and the average antioxidant scores as primary reference metrics (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). Apart from spinous and basal mitotic cells in the forestomach, smooth muscle cells in the rectum and principal cells in the epididymis were also in the top 50 for the three major antioxidant pathway scores and the average antioxidant scores (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA), indicating strong oxidative stress in these cells. 
Afterward, through high-dimensional weighted gene co-expression network analysis (hdWGCNA), we found that spinous and basal cells simultaneously exhibited higher correlations with modules 1, 2, 4, and 5 (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, module1 R\u0026thinsp;=\u0026thinsp;0.164, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, R\u0026thinsp;=\u0026thinsp;0.152, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001; module2 R\u0026thinsp;=\u0026thinsp;0.113, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, R\u0026thinsp;=\u0026thinsp;0.156, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001; module4 R\u0026thinsp;=\u0026thinsp;0.204, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, R\u0026thinsp;=\u0026thinsp;0.170, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001; module5 R\u0026thinsp;=\u0026thinsp;0.117, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, R\u0026thinsp;=\u0026thinsp;0.170, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) than with other modules, and specific expression of gene modules 1, 2, and 4 in the rumen (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, Figure S5, Table S11), which were enriched in the functions of aerobic electron transport chain (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, Enrichment score\u0026thinsp;=\u0026thinsp;111942.7) and fatty acid beta-oxidation (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, Enrichment score\u0026thinsp;=\u0026thinsp;623.48). 
Moreover, these modules were also strongly correlated with important antioxidant pathways such as antioxidant activity (R\u0026thinsp;=\u0026thinsp;0.535, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), peroxidase activity (R\u0026thinsp;=\u0026thinsp;0.493, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001), and peroxiredoxin activity (R\u0026thinsp;=\u0026thinsp;0.576, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Similarly, we identified high enrichment of functions related to fatty acid metabolism and energy metabolism in modules of the reticulum (modules 1 and 3) and omasum (module 7) (Figure S5, Table S11). Since spinous and mitotic basal cells utilize fatty acids as their major energy source, these results demonstrate the co-expression of antioxidant genes and fatty acid metabolism genes, suggesting that antioxidant genes could protect the robust fatty acid metabolism in these cells.\u003c/p\u003e \u003cp\u003eAs the rumen is the primary site for fermentation and production of SCFAs in the forestomach, we next analyzed the cellular antioxidant metabolism and fatty acid metabolism in the rumen using the flux balance analysis tool. We found that the metabolic activity of spinous and basal cells in the forestomach was significantly higher (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05) than that of other cell types. Specifically, among all the results of the flux balance analysis, a total of 5,696 up-regulated reactions (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05) in spinous and mitotic basal cells and 1,012 down-regulated reactions in other cell types were found (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), highlighting the strong metabolic status of the spinous and mitotic basal cells in the forestomach. 
Pathways related to energy metabolism, antioxidant, and respiration-related processes showed high activity, including fatty acid oxidation (631 upregulated reactions in spinous and basal cells, compared to 3 in other cell types), glutathione metabolism (21 upregulated reactions in spinous and basal cells, none in other cell types), and ROS detoxification (7 upregulated reactions in spinous and basal cells, none in other cell types) (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, Table S12). We focused on glutathione metabolism and fatty acid oxidation, where the oxidation of GSH to GSSG in glutathione metabolism and the elongation of short fatty acids into longer fatty acids showed the highest Cohen's d values (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eD, Table S12). These two reactions not only demonstrate how the cell regulates its oxidative stress by consuming GSH but also illustrate the process by which longer fatty acids are synthesized from short fatty acids (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003eTo further assess the subclusters that regulate the fatty acid and antioxidant processes, the two cell types were re-clustered into 6 subclusters (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eE, Table S13). Interestingly, we observed that subclusters 3 and 4, with high expression of \u003cem\u003eKRT6A\u003c/em\u003e, \u003cem\u003eKRT14\u003c/em\u003e, and \u003cem\u003eKRT5\u003c/em\u003e, could be intermediate cell types in the differentiation process from basal cells to spinous cells. Through pseudo-time analysis, we observed a cell differentiation trajectory from subcluster 4 to subcluster 3 and finally to subcluster 1 (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eF). 
Hallmark pathway scoring showed that subcluster 3 exhibited significantly higher expression of fatty acid metabolism genes (u\u0026thinsp;=\u0026thinsp;0.21, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.01), while subcluster 4 exhibited strong cell division potential (u\u0026thinsp;=\u0026thinsp;0.13, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.01) when compared to other subclusters (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eG). These results suggest that a specialized stage of fatty acid metabolism and antioxidant defense exists during the known differentiation process from basal cells to spinous cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eG). Moreover, two TFs, ESRRA and PPARG, along with the genes they regulate related to fatty acid metabolism and antioxidation, are highly expressed in subcluster 3 (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eH). In summary, we found that increased antioxidant metabolism acted as a defense against intense fatty acid metabolism in a specific subtype of cells dedicated to robust fatty acid metabolism.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eDeriving interactions between potential antioxidants and high GIT oxidative stress cells\u003c/h2\u003e \u003cp\u003eOne of the most relevant types of interactions that modulate host metabolic status is the metabolite-mediated interaction among microbes, metabolites, and host cell proteins\u003csup\u003e37\u003c/sup\u003e. Here, we employed a structure-based virtual screening method, TransformerCPI 2.0\u003csup\u003e38\u003c/sup\u003e, to score the interactions between 436 secondary metabolites and 14,976 marker proteins derived from marker genes (LogFC\u0026thinsp;\u0026gt;\u0026thinsp;1) of all cell types in the GIT. 
A total of 6,484,608 interactive scores were obtained (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003eFurthermore, we examined the interactions between distinct cell types in the rumen and Cys-Cys-Cys, a potential antioxidant oligopeptide with predictive scores of 0.770 and 0.822 for ET and lipid antioxidation, respectively. This compound, which is specifically prevalent in the rumen, omasum, and rectum (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE), possesses the same antioxidant moiety, thiol (-SH), as GSH. First, we calculated its absorption properties using the ADMETlab 2.0 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://admetmesh.scbdd.com/\u003c/span\u003e\u003cspan address=\"https://admetmesh.scbdd.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) tool and found that it has acceptable human intestinal absorption (HIA), Lipinski rule compliance, and MDCK permeability, indicating the potential to pass through the GIT epithelium. Moreover, in the rumen spinous cells, 151 of the 263 marker proteins were predicted to interact with Cys-Cys-Cys (score\u0026thinsp;\u0026gt;\u0026thinsp;0.5) (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eB). Interestingly, these 151 proteins were enriched in more than 10 pathways, half of which were associated with mitochondria and antioxidant metabolism (\u003cem\u003eP.adjust\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05, Count\u0026thinsp;\u0026gt;\u0026thinsp;10, Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eC). 
Several pathways were found to co-occur in both the marker gene enrichment results and the interactive gene enrichment results, such as mitochondrial membrane, mitochondrial envelope, organelle envelope, mitochondrial inner membrane and cellular detoxification pathways (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eC). In the cellular detoxification pathway (\u003cem\u003eP.adjust\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, Count\u0026thinsp;=\u0026thinsp;8), 6 out of 8 marker genes have the potential to interact with Cys-Cys-Cys (\u003cem\u003eP.adjust\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.002, Average Interaction\u0026thinsp;=\u0026thinsp;0.700), while in the mitochondrial membrane pathway, 13 out of 26 genes have the potential for interaction (\u003cem\u003eP.adjust\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.003, Average Interaction\u0026thinsp;=\u0026thinsp;0.703) (Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eD, E, F). Moreover, the 19 genes from the cellular detoxification and mitochondrial membrane pathways were highly expressed in spinous and mitotic basal cells (clusters 6 and 7, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001) in the rumen (Table S13) (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eD). Similar results were identified for the 310 marker proteins from spinous cells (cluster 5) of the omasum (Figure S6). This indicates a potential mechanism for microbial interaction with the host through the secretion of antioxidant substances.\u003c/p\u003e \u003cp\u003eSubsequently, to determine the potential functions of the spinous cells, we correlated the cell types in the rumen with dairy cow phenotype data from GWAS. 
The result revealed a strong association between the high fatty acid metabolism subcluster (subcluster 3 of spinous and mitotic basal cells) in the forestomach and the milk protein (bp value\u0026thinsp;=\u0026thinsp;0.0101, tp value\u0026thinsp;=\u0026thinsp;2.32) and milk fat (bp value\u0026thinsp;=\u0026thinsp;0.0729, tp value\u0026thinsp;=\u0026thinsp;1.45) traits in cows (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eG, Table S14). This finding suggests that subcluster 3 of spinous and mitotic basal cells not only shows high fatty acid metabolism and cellular antioxidant activity but may also influence the production traits of dairy cows.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe GIT microbiome of ruminants plays a crucial role in their unique ability to convert fiber into meat and milk for human consumption. Although extensive research has been conducted on rumen microbial fiber degradation\u003csup\u003e39\u003c/sup\u003e and trace element metabolism\u003csup\u003e40\u003c/sup\u003e, the understanding of microbial secondary metabolite production is still limited. As one of the most complex microbial communities, the microbiota of the ruminant GIT may possess a greater biosynthetic potential than microorganisms found in other environments\u003csup\u003e14\u003c/sup\u003e. To obtain a more comprehensive BGC set, fewer gaps between contigs in the MAGs are needed. Third-generation long-read sequencing can be used to improve the gene quality and length of MAGs\u003csup\u003e41\u003c/sup\u003e. Following gap-filling processing, we successfully reduced the percentage of BGCs located at sequence edges from 60% to 48.9%. Our study discovered 26,503 BGCs with high-quality gene sets, a 79% increase over the current largest rumen BGC database\u003csup\u003e42\u003c/sup\u003e. 
The complete BGC dataset allows for a deeper exploration of the basal composition of the BGC categories. For instance, NRPS and RiPPs were consistently found to be the dominant types among total BGCs as well as the novel BGCs, implying a profound potential for peptide synthesis within the ruminant GIT environment. These functional peptides have not yet been studied extensively in comparison to ruminal microbial proteins\u003csup\u003e43,44\u003c/sup\u003e. In our research, over 1,000 BGCs were found to be completely different from those in the known databases (BiG-FAM)\u003csup\u003e23\u003c/sup\u003e, which largely expanded our understanding of the metabolic potential of the bovine GIT microbiota. Meanwhile, BGCs are distributed across various species, with different MAGs encoding varied numbers and degrees of novel BGCs, which may be attributed to the survival competition and ecological balance within the microbial community\u003csup\u003e45\u003c/sup\u003e. For the metabolites encoded by BGCs, we observed a rich presence of betalactones and arylpolyenes alongside the predominant production of RiPPs and NRPS in the others category. More than 55% of product categories were phylum-specific, suggesting the presence of abundant ecological adaptation mechanisms in more than half of the phyla. Notably, within the Methanobacteriota phylum, highly specific metabolites were identified, such as NRPS.thioamitides and thiopeptide.thioamitides. In addition, a high abundance of compounds containing the rare thioamitide substructures\u003csup\u003e46\u003c/sup\u003e was discovered in the Methanobacteriota (96, 89.7% of the total thioamitide products, Table S3). The thioamitides characteristically possess macrocycles interconnected by sulfur atoms and diverse functional groups\u003csup\u003e47,48\u003c/sup\u003e. 
In particular, thioamitides and thiopeptides exhibit remarkable antioxidant\u003csup\u003e49\u003c/sup\u003e and antibacterial\u003csup\u003e46\u003c/sup\u003e potency, highlighting the antioxidant and antimicrobial biosynthesis potential of species in the Methanobacteriota. The abundant novel products and rare compounds align with the understanding that methane-producing archaea play a crucial role in the intricate ecology of the GIT\u003csup\u003e50\u003c/sup\u003e. Moreover, several species from the Actinobacteria phylum that encode diverse and novel BGCs were first identified in our study, which may serve as core metabolic species in the GIT ecosystem.\u003c/p\u003e \u003cp\u003eMicrobiota-encoded secondary metabolites possess various biological activities, such as antimicrobial, anti-inflammatory, and anticancer properties\u003csup\u003e7,9,10\u003c/sup\u003e. A growing number of studies indicate that the GIT microbiota may contribute to the host oxidative status\u003csup\u003e32,51\u003c/sup\u003e; there is also intense competition among microorganisms for antioxidants. Dumitrescu et al.\u003csup\u003e31\u003c/sup\u003e reported that microbes within the human hindgut can compete for antioxidants through a transporter of the antioxidant ergothioneine. However, there is no large-scale method for identifying antioxidant molecules based on the structural properties of small molecules. We trained a novel antioxidant prediction model for small molecules using SMILES structures, built on a comprehensive collection of antioxidants from databases and a thorough review of related references. The model can predict both antioxidant activity and the underlying mechanisms, such as ET and HAT, and interpret the associated structures. These models enable us to predict antioxidants from the products of BGCs, thereby identifying BGCs with antioxidant properties and microbial species with strong antioxidant capabilities. 
When exploring the distribution of BGCs and their potential antioxidant metabolites in the 10 GIT locations, we identified segment-specific potential antioxidant compounds. Interestingly, we discovered a peptide composed of three cysteine residues in the forestomach, L-Cysteine, L-Cysteine, L-Cysteine (Cys-Cys-Cys). Specifically, this oligopeptide has a molecular weight of 327.5 and contains three consecutive cysteine residues and thus three thiol groups, the primary antioxidant functional group of glutathione, which has only one\u003csup\u003e52,53\u003c/sup\u003e. Cys-Cys-Cys was derived from \u003cem\u003eRuminococcus flavefaciens\u003c/em\u003e, \u003cem\u003eRuminococcus_D sp902788785\u003c/em\u003e, \u003cem\u003eUBA2942 sp900321525\u003c/em\u003e, \u003cem\u003eClostridium_AI polysaccharolyticum\u003c/em\u003e, \u003cem\u003eButyrivibrio hungatei_A\u003c/em\u003e, \u003cem\u003eBlautia_A massiliensis\u003c/em\u003e and \u003cem\u003eAcetitomaculum ruminis\u003c/em\u003e of the Firmicutes phylum in the rumen, omasum, abomasum and rectum. The mechanism by which this peptide regulates oxidative stress in the gastric mucosal epithelial cells is yet to be explored. Moreover, although the substance can be synthesized (CAS number: 206058-60-8) and is found in conserved sequences in the coat protein (CP) and helper component-proteinase (HC-Pro) of potyviruses\u003csup\u003e54\u003c/sup\u003e, information on its antioxidant properties is very limited. Therefore, it is necessary to conduct ET and HAT antioxidant experiments such as the 2,2-diphenyl-1-picrylhydrazyl (DPPH) assay, the Oxygen Radical Absorbance Capacity (ORAC) assay, and even cellular-level antioxidant experiments on Cys-Cys-Cys. In this section, we explored the potential antioxidants secreted by the rumen microbiota, screened for tissue-specific potential antioxidant compounds, and discovered potential antioxidants with unique structures. 
Moreover, in subsequent experiments, we will validate the currently predicted and synthesizable antioxidant substances.\u003c/p\u003e \u003cp\u003eAs basic physiological reactions, responses to oxygen signals and antioxidants determine the normal living status of cells. While real-time measurement of ROS levels across various cell types in vivo remains a major challenge, single-cell transcriptomic profiles provide a unique view through which to discern distinct stages of cellular ROS responses. Based on the collaborative single-cell atlases of cattle covering 51 tissues, 126 cell types and 1,006 clusters (Han et al., 2024, in preparation; Shi et al., 2024, in preparation), we further investigated cellular responses in defending against, responding to, or generating ROS at the mRNA level. Interestingly, we found that cells in the GIT, particularly the epithelial cells, comprise the majority of cell types with high expression of the three key antioxidant-related pathways. This could be attributed to the abundant exogenous metabolites and highly prevalent mechanical damage from crude fiber faced by the GIT of ruminants\u003csup\u003e55,56\u003c/sup\u003e. Spinous and mitotic basal cells in the forestomach, as well as enterocytes in the small intestines, exhibited high expression of pathways that defend against and respond to ROS. For spinous and mitotic basal cells, more active fatty acid metabolism was revealed through hdWGCNA and single-cell flux balance analysis, which is consistent with the knowledge that forestomach epithelial cells primarily use fatty acids, especially SCFAs, as their energy source\u003csup\u003e57\u003c/sup\u003e. Subsequently, to elucidate the regulatory elements involved in fatty acid metabolism, we identified specific TFs, such as ESRRA\u003csup\u003e58\u003c/sup\u003e and PPARG\u003csup\u003e59\u003c/sup\u003e, that could regulate fatty acid metabolism and antioxidant genes in the forestomach. 
Interestingly, ESRRA has been reported to play a pivotal role in mouse intestinal homeostasis by orchestrating gut microbiota composition, thereby safeguarding the host against detrimental mitochondrial dysfunction\u003csup\u003e58\u003c/sup\u003e. Additionally, PPARG plays a crucial role in the urothelium by controlling mitochondrial function, development, and regeneration\u003csup\u003e60\u003c/sup\u003e. The results suggest that both ESRRA and PPARG may serve as crucial TFs in the spinous and basal cells for maintaining energy and fatty acid metabolism homeostasis. Moreover, the interactions between ESRRA and the forestomach microbial community warrant exploration. It is known that spinous and basal cells in the forestomach are in a continuous process of differentiation\u003csup\u003e61\u003c/sup\u003e; therefore, characterizing the intermediate cell states and their functions is necessary to understand the dynamic cellular transitions and cellular functional specialization. We observed the differentiation process from basal cells to spinous cells, with 5 intermediate subclusters playing their respective roles throughout this process. A specialized cell subtype focused on fatty acid utilization was further identified. This suggests that there is a cellular functional specialization among the subtypes of spinous and mitotic basal cells, and these particular cell subtypes merit further experimental exploration. In short, using our comprehensive single-cell atlas in cattle, we discovered that the GIT epithelial cells displayed heightened ROS response and defense among 51 tissues, and the spinous and basal cells in the forestomach were especially highlighted. 
Additionally, the forestomach epithelial cells under oxidative stress may be regulated by antioxidants derived from the microbiota, which we explored through metabolite-protein interaction prediction.\u003c/p\u003e \u003cp\u003eCurrently, bioinformatic tools for predicting protein-metabolite interactions based on metabolite structure and protein sequence are rapidly evolving\u003csup\u003e62\u0026ndash;64\u003c/sup\u003e. The metabolite-mediated interactions between microbes and host cells represent one of the most pertinent types of molecular interplay capable of modulating the gene expression of host cells\u003csup\u003e65,66\u003c/sup\u003e. The potential interactions between microbial metabolites and proteins of ruminant GIT cells, as identified in our study, were established using molecular structures (SMILES) and the protein sequences of the cell markers. We focused on the predicted interactions between spinous cells exhibiting high oxidative stress in the forestomach and Cys-Cys-Cys, a potential antioxidant with three thiol groups. Our analysis of the interactions between Cys-Cys-Cys and various epithelial cell types in the forestomach indicated that it potentially regulates over 50% of the proteins that are highly expressed in these cells and involved in antioxidant and energy metabolic processes. Cysteine is the second least abundant amino acid after tryptophan\u003csup\u003e67,68\u003c/sup\u003e, and Cys-containing peptides are highly conserved\u003csup\u003e69\u003c/sup\u003e, being present in over 97% of mammalian proteins. However, future research is required to identify its binding sites on marker proteins in the spinous cells. 
Taken together, the large number of interaction scores between potential antioxidants and host marker proteins provides computational resources for further exploration of how the microbiota regulates host status, especially oxidative stress, through metabolites.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we constructed the most extensive BGC collection in the cattle GIT to explore the microbial secondary metabolites. Using the newly trained structure-based deep learning models for small-molecule antioxidants, a total of 396 BGC products were predicted to have potential antioxidant properties. We found that GIT epithelial cells exhibit strong antioxidant gene expression among 126 cell types and 1,006 clusters across 51 tissues in dairy cows. In particular, a high occurrence of fatty acid metabolism and antioxidant defense was observed in rumen epithelial cells, especially the spinous and mitotic basal cell types. We predicted over 6\u0026nbsp;million interaction scores between BGC metabolites and marker proteins in the GIT cell types, and Cys-Cys-Cys was identified as potentially regulating cellular energy metabolism and detoxification in the rumen epithelium. Our work not only facilitates further exploration of microbial secondary metabolism in the cattle GIT but also offers theoretical and model bases for the discovery of novel antioxidants and microbes. 
Our results suggest that the cattle GIT, particularly the forestomach, serves as a critical site of oxidative stress occurrence and could be subjected to antioxidant regulation by microbial antioxidants, pointing towards the potential of regulating host oxidative stress responses through microbial manipulation.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n\u003ch2\u003eLibrary preparation and next-generation sequencing (NGS)\u003c/h2\u003e\n\u003cp\u003eWe extracted metagenomic DNA from the omasum, jejunum and rectum content samples using the QIAamp DNA Stool Mini Kit (Qiagen, cat. no. 51604) for metagenomic sequencing. The quality and quantity of the obtained DNA were assessed by running on a 0.5% agarose gel and using the Qubit dsDNA assay kit (Thermo Fisher Scientific Inc.). DNA samples of high molecular weight (modal size\u0026thinsp;\u0026gt;\u0026thinsp;2 kbp) and sufficient quantity (\u0026gt;\u0026thinsp;10 \u0026micro;g) were used for sequencing. For Illumina sequencing, 1 \u0026micro;g of total DNA extracted from the omasum, jejunum and rectum content samples was used as the starting material for library construction. The DNA was then fragmented to approximately 400 bp using the Covaris M220 ultrasonicator. The TruSeq\u0026trade; DNA Sample Prep Kit was used for paired-end (PE) library construction. In this process, the DNA underwent ligation of \u0026ldquo;Y\u0026rdquo; adapters, removal of adapter dimers, enrichment through PCR amplification, and sodium hydroxide denaturation to generate single-stranded DNA fragments. 
Subsequently, bridge PCR, facilitated by the cBot TruSeq PE Cluster Kit, generated DNA clusters, which were linearized into single strands.\u003c/p\u003e\n\u003cp\u003eThe DNA fragments were then sequenced, generating high-quality sequencing data for downstream analysis.\u003c/p\u003e\n\u003cp\u003eTrimmomatic\u003csup\u003e70\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.usadellab.org/cms/uploads/supplementary/Trimmomatic\u003c/span\u003e\u003c/span\u003e) was then used for adapter trimming, trimming low-quality sequence ends, discarding reads with an N ratio exceeding 10%, and removing sequences with a trimmed length of less than 75 bp (ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:75).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n\u003ch2\u003ePacBio sequencing and hybrid assembly using HiFi and NGS sequencing data\u003c/h2\u003e\n\u003cp\u003eFor PacBio sequencing, 5 \u0026micro;g of DNA from omasum, jejunum and rectum content was used to prepare the SMRTbell library with the PacBio SMRTbell prep kit 3.0 (Pacific Biosciences, Part Number: 102-182-700). Prior to library preparation, damaged double-stranded DNA in the initial sample was repaired using the New England Biolabs PreCR Repair Mix Kit (M0309SVIAL). Subsequently, the BluePippin system (Sage Science) was employed to select repaired DNA molecules larger than 3 kb in size. The SMRTbell library was then sequenced on the PacBio Sequel IIe instrument (Pacific Biosciences) using SMRT 2M cells (Part Number: 101-389-001) with v3 chemistry. 
A total of 3,271,891, 2,926,668 and 2,512,929 raw long reads were obtained from the omasum, jejunum and rectum samples, respectively, and were further processed using the CCS mode in the SMRT Link package (Pacific Biosciences, v10.0) with the following parameters: --min-length 200, --min-passes 3, and --min-rq 0.99, to generate HiFi reads. Hybrid assembly was performed using three software tools with the following parameters: MaSuRCA\u003csup\u003e71\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/alekseyzimin/masurca\u003c/span\u003e\u003c/span\u003e, version 4.0.3, corrected long-read mode with default parameters), hybridmetaSPAdes\u003csup\u003e72\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ablab/spades\u003c/span\u003e\u003c/span\u003e, version 3.15.3, key parameters: \u0026lsquo;--meta --pacbio -m 500\u0026rsquo;) and OPERA-MS\u003csup\u003e73\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/CSB5/OPERA-MS\u003c/span\u003e\u003c/span\u003e, version: 2.11-r797, default parameters). Finally, a total of 109 MAGs were assembled from second- and third-generation sequencing data (mean completeness 67.2%, mean contamination 2.4%, mean N50 305,988.5 bp, mean max-scaffold 419,980.8 bp).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n\u003ch2\u003eMetagenome-assembled genomes collection and quality improvement\u003c/h2\u003e\n\u003cp\u003eBased on our bovine gastrointestinal microbial MAG database, the \u0026ldquo;Bovine Gastro Microbial Genome Map\u0026rdquo; (BGMGM), which is the largest to date, we further expanded and improved it with newly published resources and sequenced samples. 
Specifically, we collected 2,114 new MAGs from cattle hindgut\u003csup\u003e22\u003c/sup\u003e and 106 MAGs hybrid assembled from long-read sequencing data obtained from omasum, jejunum, and rectum content, resulting in a total of 47,241 MAGs. After dereplication using dRep (v.2.5.4)\u003csup\u003e74\u003c/sup\u003e with a 95% ANI threshold (-comp 0 -con 1000 -sa 0.95 -nc 0.2), 14,093 non-redundant MAGs were retained. Taxonomy information was annotated by GTDB-Tk\u003csup\u003e75\u003c/sup\u003e (v202, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Ecogenomics/GTDBTk\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eAfter dereplication, we used HiFi data to close gaps within each of the 14,093 MAGs with minimap2\u003csup\u003e76\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/lh3/minimap2\u003c/span\u003e\u003c/span\u003e) and SSPACE-LongRead\u003csup\u003e77\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Runsheng/sspace_longread\u003c/span\u003e\u003c/span\u003e). We extracted the HiFi FASTQ reads corresponding to the raw bins and then performed direct scaffolding using the SSPACE-LongRead\u003csup\u003e77\u003c/sup\u003e tool.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n\u003ch2\u003eIdentifying and clustering BGCs from MAGs\u003c/h2\u003e\n\u003cp\u003eWe applied rapid genome-wide identification, annotation and analysis of secondary metabolite BGCs in our non-redundant BGMGM database with 14,093 MAGs, using antiSMASH\u003csup\u003e78\u003c/sup\u003e (v.6.1.1, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/antismash/antismash/releases\u003c/span\u003e\u003c/span\u003e). 
All BGCs were identified by antiSMASH\u003csup\u003e78\u003c/sup\u003e with default parameters and subsequently processed with BiG-SLiCE\u003csup\u003e79\u003c/sup\u003e (v1.1.0, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/medema-group/bigslice\u003c/span\u003e\u003c/span\u003e) and BiG-SCAPE\u003csup\u003e80\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/medema-group/BiG-SCAPE\u003c/span\u003e\u003c/span\u003e). Each BGC was functionally characterized based on predicted product types defined in antiSMASH or broader product classes defined in BiG-SCAPE. The diversity and novelty of the eight BGC classes, including terpenes, RiPPs, NRPS, PKSI, PKSother, PKS-NRP hybrids, saccharides, and others, were estimated by calculating their distances to computationally predicted databases (RefSeq database within BiG-FAM\u003csup\u003e23\u003c/sup\u003e) and experimentally validated databases (MIBiG 2.0) using BiG-SLiCE. BGCs with distances greater than 0.6 and 1.0 were considered novel and completely novel, respectively. We applied BiG-SCAPE to calculate pairwise cosine distances between all BGCs and clustered them using average linkage into GCFs and GCCs, with distance thresholds at 0.3 and 0.9, respectively.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n\u003ch2\u003eExploration of microbial-derived small molecules\u003c/h2\u003e\n\u003cp\u003eAmong the 26,503 BGCs, a total of 244 small molecule compounds with SMILES structures were identified after removing redundancy (Supplementary table 4). Subsequently, aligning the rumen transcriptome to the MIBiG database showed that 58 BGCs with product SMILES structures were expressed in the rumen microbiota (Supplementary table 5). 
Finally, 134 products from 23 species in BGMGM were identified by searching the Natural Products Atlas 2.0\u003csup\u003e81\u003c/sup\u003e (NPA, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.npatlas.org/\u003c/span\u003e\u003c/span\u003e) and the Comprehensive Marine Natural Products Database\u003csup\u003e82\u003c/sup\u003e (CMNPD, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://cmnpd.org/\u003c/span\u003e\u003c/span\u003e) by species (Supplementary table 6). These metabolites were collected from our NRPS and RiPPs BGCs (244 metabolites), the Minimum Information about a Biosynthetic Gene cluster database (MIBiG, 58 metabolites), the Comprehensive Marine Natural Products Database (CMNPD, 82 metabolites) and the Natural Products Atlas (NPA, 52 metabolites). In total, 436 microbial-derived small molecules in the cattle GIT were collected.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n\u003ch2\u003eCollection of small molecule antioxidant data\u003c/h2\u003e\n\u003cp\u003eAntioxidant data were collected from three databases: AODB\u003csup\u003e29\u003c/sup\u003e, ChEMBL\u003csup\u003e83\u003c/sup\u003e, and PubChem\u003csup\u003e84\u003c/sup\u003e. All the antioxidant assay information downloaded from the AODB and manually collected and curated from PubChem and ChEMBL was merged, resulting in 60,621 antioxidant assay records. Through manual examination of the experimental descriptions, the 60,621 antioxidant assays were classified into 5 classes (electron transfer, ET; hydrogen atom transfer, HAT; inhibition of lipid oxidation; targeting NRF2-ARE, NRF2; and others) based on the classification criteria in the AODB database\u003csup\u003e29\u003c/sup\u003e. 
Within each category, molecules demonstrating ROS scavenging capability less than 50% of Trolox, or an inhibitory concentration (IC50) value higher than 1,000 nM, were classified as non-antioxidants. Finally, five antioxidant datasets were obtained, with 5,118 antioxidants and 5,741 non-antioxidants in the ET class, 1,516 antioxidants and 862 non-antioxidants in the HAT class, 1,762 antioxidants and 1,810 non-antioxidants in the inhibition of lipid oxidation class, 1,985 antioxidants and 15,600 non-antioxidants in the NRF2-ARE targeting class, and 1,584 antioxidants and 13,651 non-antioxidants in the other antioxidant assays.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\n\u003ch2\u003eConstruction of the antioxidant identification graph neural network\u003c/h2\u003e\n\u003cp\u003eTo construct a deep learning model for molecular property prediction, we applied the graph neural network framework chemprop\u003csup\u003e28\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/chemprop/chemprop\u003c/span\u003e\u003c/span\u003e) for antioxidant prediction model training. The graph neural network performs convolution on the fingerprint matrix of atoms and bonds from each molecule\u003csup\u003e20\u003c/sup\u003e. To address the issue of data imbalance, we utilized the Area Under the Precision-Recall Curve (PRAUC) as the model\u0026rsquo;s primary evaluation metric and binary cross-entropy as the loss function, and enhanced the features using Morgan and RDKit 2D normalized data. Then, the five binary classification datasets that we annotated were used for training. Ten-fold cross-validation was applied for model training with 80%-10%-10% splits of the training, testing, and validation datasets, respectively. 
The features were augmented with a list of RDKit-computed molecular features (morgan, morgan_count, rdkit_2d_normalized) to improve model performance.\u003c/p\u003e\n\u003cp\u003eThe hyperparameters for chemprop were configured as follows. Data were split 80%/10%/10% into training, validation, and test sets using scaffold balancing. Ten-fold cross-validation was conducted with an ensemble size of 6. The aggregation method was normalization with a norm value of 50. The loss function was binary cross-entropy. The evaluation metrics were the area under the precision-recall curve (PRC-AUC), supplemented by the F1 score, the Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUC). For each of the five antioxidant models, the final prediction score was the average of the scores from the 60 models (6 ensemble members in each of the 10 folds).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\n\u003ch2\u003eSingle-cell dimensionality reduction, clustering and identification of marker genes\u003c/h2\u003e\n\u003cp\u003eThe clustering and dimensionality reduction methods for single-cell analysis across all tissues are the same as those described in our companion papers (Han et al., 2024, in preparation; Shi et al., 2024, in preparation). In brief, CellRanger (v7.0.1, 10x Genomics) was used to perform sample demultiplexing, barcode processing, and single-cell 3\u0026rsquo; gene counting. The scRNA-seq data were then aligned to the ARS-UCD1.2 cattle reference genome to obtain gene expression profiles at the single-cell level. Then, the features, barcodes, and count matrix were loaded into Seurat\u003csup\u003e85\u003c/sup\u003e (v4.3.0, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/satijalab/seurat\u003c/span\u003e\u003c/span\u003e) for downstream single-cell analysis and visualization. 
Each library underwent cell quality control using ddqcR (v0.1.0, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ayshwaryas/ddqc\u003c/span\u003e\u003c/span\u003e). Initially, the cells were clustered using standard scRNA-seq preprocessing and clustering steps. Within each cluster, cells with n.counts and n.genes values more than 2 median absolute deviations below the cluster median were filtered out. After excluding cells with a mitochondrial gene ratio above 10%, we used the DoubletFinder package\u003csup\u003e86\u003c/sup\u003e (v2.0.3, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/chris-mcginnis-ucsf/DoubletFinder\u003c/span\u003e\u003c/span\u003e) to eliminate doublets. The conventional dimensionality reduction and clustering workflow was performed as follows. The \u0026ldquo;NormalizeData\u0026rdquo; function was used to calculate gene expression values with the \u0026lsquo;LogNormalize\u0026rsquo; method and a \u0026lsquo;scale.factor\u0026rsquo; of 10,000. The \u0026ldquo;FindVariableFeatures\u0026rdquo; function was used to select the top 2,000 highly variable genes, and the expression levels of these genes were scaled using the \u0026ldquo;ScaleData\u0026rdquo; function. PCA (principal component analysis) was performed for dimensionality reduction of the count matrix using the \u0026ldquo;RunPCA\u0026rdquo; function, and numbers of PCs from 30 to 50 were tested to select the most suitable number. Harmony\u003csup\u003e87\u003c/sup\u003e (v0.1.0, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/immunogenomics/harmony\u003c/span\u003e\u003c/span\u003e) was used to correct batch effects. Cell clustering was performed using the \u0026ldquo;FindClusters\u0026rdquo; function, with resolutions from 0.4 to 1.3 tested to select an appropriate value. 
Cell visualization was achieved using the \u0026ldquo;RunUMAP\u0026rdquo; function. The \u0026ldquo;FindAllMarkers\u0026rdquo; function was used to identify differentially expressed genes (DEGs) or marker genes (|\u0026lsquo;avg_logFC\u0026rsquo;| \u0026gt; 0.25 and \u0026lsquo;p_val_adj\u0026rsquo; \u0026lt; 0.05).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEnrichment analysis and gene pathway scoring analysis in the single-cell atlas.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEnrichment analysis was conducted using the \u0026lsquo;enrichGO\u0026rsquo; function in the R package \u0026lsquo;clusterProfiler\u0026rsquo;, with OrgDb set to \u0026lsquo;org.Bt.eg.db\u0026rsquo; and both \u0026lsquo;pvalueCutoff\u0026rsquo; and \u0026lsquo;qvalueCutoff\u0026rsquo; set to 0.05. Genes in the 16 oxidase generation, 19 antioxidant, and 22 response-to-oxygen signaling gene sets are listed in Supplementary Table\u0026nbsp;10. The \u0026lsquo;score_genes\u0026rsquo; function in Scanpy\u003csup\u003e88\u003c/sup\u003e (v1.9.3, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/scverse/scanpy\u003c/span\u003e\u003c/span\u003e) was applied to compute the expression score of each pathway for all 1,803,004 cells. The differences in signature scores among cell types were assessed using a two-sided Wilcoxon rank-sum test. 
A significance level of 0.05 was used.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\n\u003ch2\u003ePseudo-time analysis of cell subclusters in the forestomach\u003c/h2\u003e\n\u003cp\u003eTo model differentiation trajectories, we performed trajectory analysis using Monocle2\u003csup\u003e89\u003c/sup\u003e (v2.26.0, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/cole-trapnell-lab/monocle-release\u003c/span\u003e\u003c/span\u003e) for all the spinous and mitotic basal cells in the rumen, reticulum, and omasum, according to the general pipeline (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://cole-trapnell-lab.github.io/monocle-release/docs/\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\n\u003ch2\u003eCell metabolic state estimation by the Compass algorithm\u003c/h2\u003e\n\u003cp\u003eTo characterize the metabolic state of cells using single-cell sequencing data and flux balance analysis, Compass\u003csup\u003e90\u003c/sup\u003e (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/YosefLab/Compass\u003c/span\u003e\u003c/span\u003e) was applied for cell metabolism analysis with the default parameters (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://yoseflab.github.io/Compass/tutorial.html\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\n\u003ch2\u003eSingle-cell transcription factor regulatory network inference\u003c/h2\u003e\n\u003cp\u003eWe computed the gene regulatory networks (GRNs) of these tissue cell types. By calculating the regulon specificity score (RSS), we identified the regulons specific to each cell type. 
The GRN, consisting primarily of transcription factors (TFs) and their target genes in spinous and basal cells, was inferred using pySCENIC\u003csup\u003e35\u003c/sup\u003e (v0.12.1, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/aertslab/pySCENIC\u003c/span\u003e\u003c/span\u003e). The activity of each TF in single cells was analyzed using the AUCell function of pySCENIC. The results obtained from pySCENIC are presented in Supplementary Table S3. To identify the main cell type-specific TFs in spinous and basal cells, we used the RSS method implemented in the 'calcRSS' function of the R package SCENIC (v1.3.1, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/aertslab/SCENIC\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\n\u003ch2\u003eSingle-cell high-dimensional weighted gene co-expression network analysis (hdWGCNA) and cellular metabolism analysis\u003c/h2\u003e\n\u003cp\u003eThe \u0026ldquo;hdWGCNA\u0026rdquo; package\u003csup\u003e91\u003c/sup\u003e (v0.2.19, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/smorabit/hdWGCNA\u003c/span\u003e\u003c/span\u003e) was used to perform hdWGCNA on the rumen, reticulum, and omasum single-cell atlas with the default parameters.\u003c/p\u003e\n\u003cdiv id=\"Sec23\" class=\"Section3\"\u003e\n\u003ch2\u003eInference of interactions between microbial antioxidants and cell types using a deep learning model\u003c/h2\u003e\n\u003cp\u003eA total of 6,484,608 unique protein-metabolite pairs were identified between 436 different microbial metabolites and 14,976 marker proteins across all cell types within the GIT. 
Potential interactions for these pairs were inferred with TransformerCPI 2.0 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/lifanchen-simm/transformerCPI2.0\u003c/span\u003e\u003c/span\u003e)\u003csup\u003e38\u003c/sup\u003e, which scores each interaction from the protein sequence and the compound SMILES structure through end-to-end differentiable learning. \u0026lsquo;Featurizer.py\u0026rsquo; was used to tokenize and encode the protein sequences and compounds. The interaction scores were predicted by the trained model \u0026lsquo;Virtual Screening.pt\u0026rsquo;, available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://drive.google.com/drive/folders/1X7i1eO-EykCQcvqMeWeB7QXT3E9eLG08?usp=sharing\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e\n\u003ch2\u003eCorrelations between cell types and production traits in Chinese Holstein dairy cows\u003c/h2\u003e\n\u003cp\u003eIn total, 56 cell types were tested for association with complex production traits (milk protein, milk fat, and milk yield) using genome-wide association study (GWAS) data from Chinese Holstein dairy cattle. The dairy cow GWAS data and all the cell types within the rumen, reticulum, and omasum single-cell atlas were used for the correlation analysis. \u0026lsquo;scPagwas\u0026rsquo; (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/sulab-wmu/scPagwas\u003c/span\u003e\u003c/span\u003e)\u003csup\u003e92\u003c/sup\u003e, which employs a polygenic regression model to prioritize a set of trait-relevant genes, was used to perform this analysis. Moreover, scPagwas uncovers trait-relevant cell subpopulations by integrating pathway activity in scRNA-seq data with GWAS summary data. 
In detail, to obtain comprehensive results, we used 317 human KEGG pathways after the removal of duplicates and the conversion of homologous genes. For the cell type association, scPagwas applied the block bootstrap to estimate standard errors and calculate a t-statistic with an associated \u003cem\u003eP-value\u003c/em\u003e for each cell type. A total of 200 iterations of the block bootstrap procedure were run for each cell-type association study. Additionally, scPagwas offers optional parameters that users can customize for the block bootstrap process. The Boot evaluate function was applied to calculate the Pearson correlation between cell types and traits, while the \u0026lsquo;scPagwas_perform_score\u0026rsquo; function was applied to define the enrichment level of pathways for each trait. A P-value below 0.05 was considered significant.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData and materials availability \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll raw third-generation sequencing data have been deposited in the Genome Sequence Archive (GSA) database (accession number: PRJCA022361). Derived data supporting the findings of this study are available from the corresponding author upon request (HS:
[email protected]).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank all the members of the Institute of Dairy Science, College of Animal Sciences, Zhejiang University for their assistance in the sample collection. This work was supported by the following funds: Natural Science Foundation of Zhejiang Province (LR23C170001), National Key R\u0026amp;D Program of China (2022YFD130100106, 2022YFD1301700).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributions \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eS.L.Z., J.X.L., George E. L., Y.J., D.X.S., L.Z.F. and H.Z.S. designed the research. S.L.Z., M.H.J., J.H.X., L.C., Y.N.Y., W.Q., F.F.G. and H.Z.S. improved the Bovine Gastro Microbial Genome Map (BGMGM). S.L.Z., M.H.J., X.J., Y.N.Y. and W.Q. performed the biosynthetic gene cluster identification from the BGMGM. S.L.Z., M.H.J., H.C.L., B.H., Q.Z., W.J.Z. and T.S. performed the single-cell downstream analysis, cell type annotation, pathway analysis and pseudo-time analysis. S.L.Z., J.H.X. and L.C. collected the antioxidant assays. S.L.Z. trained the deep-learning models, performed the interaction analysis and visualized the results. H.Z.S., S.L.Z., L.Z.F., D.X.S. and Y.J. interpreted the data and wrote the manuscript with input from all other authors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics declarations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eVercellino, I. \u0026amp; Sazanov, L. A. The assembly, regulation and function of the mitochondrial respiratory chain. \u003cem\u003eNat. Rev. Mol. Cell Biol.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 141\u0026ndash;161 (2022).\u003c/li\u003e\n\u003cli\u003eLee, Y. M., He, W. \u0026amp; Liou, Y.-C. 
The redox language in neurodegenerative diseases: oxidative post-translational modifications by hydrogen peroxide. \u003cem\u003eCell Death Dis.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 58 (2021).\u003c/li\u003e\n\u003cli\u003eGouda, M., Chen, K., Li, X., Liu, Y. \u0026amp; He, Y. Detection of microalgae single-cell antioxidant and electrochemical potentials by gold microelectrode and Raman micro-spectroscopy combined with chemometrics. \u003cem\u003eSens. Actuators B\u003c/em\u003e \u003cstrong\u003e329\u003c/strong\u003e, 129229 (2021).\u003c/li\u003e\n\u003cli\u003eBhattacharyya, A., Chattopadhyay, R., Mitra, S. \u0026amp; Crowe, S. E. Oxidative Stress: An Essential Factor in the Pathogenesis of Gastrointestinal Mucosal Diseases. \u003cem\u003ePhysiol. Rev.\u003c/em\u003e \u003cstrong\u003e94\u003c/strong\u003e, 329\u0026ndash;354 (2014).\u003c/li\u003e\n\u003cli\u003eWang, Y., Chen, Y., Zhang, X., Lu, Y. \u0026amp; Chen, H. New insights in intestinal oxidative stress damage and the health intervention effects of nutrients: A review. \u003cem\u003eJ. Funct. Foods\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 104248 (2020).\u003c/li\u003e\n\u003cli\u003eChandra, P., Sharma, R. K. \u0026amp; Arora, D. S. Antioxidant compounds from microbial sources: A review. \u003cem\u003eFood Res. Int.\u003c/em\u003e \u003cstrong\u003e129\u003c/strong\u003e, 108849 (2020).\u003c/li\u003e\n\u003cli\u003eKalelkar, P. P., Riddick, M. \u0026amp; Garc\u0026iacute;a, A. J. Biomaterial-based antimicrobial therapies for the treatment of bacterial infections. \u003cem\u003eNat. Rev. Mater.\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 39\u0026ndash;54 (2021).\u003c/li\u003e\n\u003cli\u003eZhang, X. \u0026amp; Jia, Y. Recent Advances in \u0026beta;-lactam Derivatives as Potential Anticancer Agents. \u003cem\u003eCurr. Top. Med. 
Chem.\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1468\u0026ndash;1480 (2020).\u003c/li\u003e\n\u003cli\u003eBelhadj Slimen, I., Najar, T. \u0026amp; Abderrabba, M. Chemical and Antioxidant Properties of Betalains. \u003cem\u003eJ. Agric. Food Chem.\u003c/em\u003e \u003cstrong\u003e65\u003c/strong\u003e, 675\u0026ndash;689 (2017).\u003c/li\u003e\n\u003cli\u003eGeller-McGrath, D. \u003cem\u003eet al.\u003c/em\u003e Diverse secondary metabolites are expressed in particle-associated and free-living microorganisms of the permanently anoxic Cariaco Basin. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 656 (2023).\u003c/li\u003e\n\u003cli\u003eChevrette, M. G. \u003cem\u003eet al.\u003c/em\u003e Microbiome composition modulates secondary metabolism in a multispecies bacterial community. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e119\u003c/strong\u003e, e2212930119 (2022).\u003c/li\u003e\n\u003cli\u003eBlin, K. \u003cem\u003eet al.\u003c/em\u003e antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, W46\u0026ndash;W50 (2023).\u003c/li\u003e\n\u003cli\u003eYuan, Y. \u003cem\u003eet al.\u003c/em\u003e Efficient exploration of terpenoid biosynthetic gene clusters in filamentous fungi. \u003cem\u003eNat. Catal.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 277\u0026ndash;287 (2022).\u003c/li\u003e\n\u003cli\u003ePaoli, L. \u003cem\u003eet al.\u003c/em\u003e Biosynthetic potential of the global ocean microbiome. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e607\u003c/strong\u003e, 111\u0026ndash;118 (2022).\u003c/li\u003e\n\u003cli\u003eZhang, J. \u003cem\u003eet al.\u003c/em\u003e Microbial enzymes induce colitis by reactivating triclosan in the mouse gastrointestinal tract. \u003cem\u003eNat. 
Commun.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 136 (2022).\u003c/li\u003e\n\u003cli\u003eCui, Y. \u003cem\u003eet al.\u003c/em\u003e Development of a versatile and efficient C\u0026ndash;N lyase platform for asymmetric hydroamination via computational enzyme redesign. \u003cem\u003eNat. Catal.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 364\u0026ndash;373 (2021).\u003c/li\u003e\n\u003cli\u003eDeng, W., Xi, D., Mao, H. \u0026amp; Wanapat, M. The use of molecular techniques based on ribosomal RNA and DNA for rumen microbial ecosystem studies: a review. \u003cem\u003eMol. Biol. Rep.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 265\u0026ndash;274 (2008).\u003c/li\u003e\n\u003cli\u003eStewart, R. D. \u003cem\u003eet al.\u003c/em\u003e Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 953\u0026ndash;961 (2019).\u003c/li\u003e\n\u003cli\u003eHungate1000 project collaborators \u003cem\u003eet al.\u003c/em\u003e Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 359\u0026ndash;367 (2018).\u003c/li\u003e\n\u003cli\u003eZaidi, S. \u003cem\u003eet al.\u003c/em\u003e Pre-training via Denoising for Molecular Property Prediction. Preprint at http://arxiv.org/abs/2206.00133 (2022).\u003c/li\u003e\n\u003cli\u003eWang, Y., Wang, J., Cao, Z. \u0026amp; Barati Farimani, A. Molecular contrastive learning of representations via graph neural networks. \u003cem\u003eNature Machine Intelligence\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 279\u0026ndash;287 (2022).\u003c/li\u003e\n\u003cli\u003eTeseo, S. \u003cem\u003eet al.\u003c/em\u003e A global phylogenomic and metabolic reconstruction of the large intestine bacterial community of domesticated cattle. 
\u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 155 (2022).\u003c/li\u003e\n\u003cli\u003eKautsar, S. A., Blin, K., Shaw, S., Weber, T. \u0026amp; Medema, M. H. BiG-FAM: the biosynthetic gene cluster families database. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, D490\u0026ndash;D497 (2021).\u003c/li\u003e\n\u003cli\u003eDu, R., Xiong, W., Xu, L., Xu, Y. \u0026amp; Wu, Q. Metagenomics reveals the habitat specificity of biosynthetic potential of secondary metabolites in global food fermentations. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 115 (2023).\u003c/li\u003e\n\u003cli\u003eCarberry, Ci. A., Waters, S. M., Kenny, D. A. \u0026amp; Creevey, C. J. Rumen Methanogenic Genotypes Differ in Abundance According to Host Residual Feed Intake Phenotype and Diet Type. \u003cem\u003eAppl. Environ. Microbiol.\u003c/em\u003e \u003cstrong\u003e80\u003c/strong\u003e, 586\u0026ndash;594 (2014).\u003c/li\u003e\n\u003cli\u003eMoss, A. R., Jouany, J.-P. \u0026amp; Newbold, J. Methane production by ruminants: its contribution to global warming. \u003cem\u003eAnnales de Zootechnie\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, 231\u0026ndash;253 (2000).\u003c/li\u003e\n\u003cli\u003eWong, F. \u003cem\u003eet al.\u003c/em\u003e Discovery of a structural class of antibiotics with explainable deep learning. \u003cem\u003eNature\u003c/em\u003e (2023) doi:10.1038/s41586-023-06887-8.\u003c/li\u003e\n\u003cli\u003eStokes, J. M. \u003cem\u003eet al.\u003c/em\u003e A Deep Learning Approach to Antibiotic Discovery. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e180\u003c/strong\u003e, 688-702.e13 (2020).\u003c/li\u003e\n\u003cli\u003eDeng, W., Chen, Y., Sun, X. \u0026amp; Wang, L. AODB: A comprehensive database for antioxidants including small molecules, peptides and proteins. 
\u003cem\u003eFood Chem.\u003c/em\u003e \u003cstrong\u003e418\u003c/strong\u003e, 135992 (2023).\u003c/li\u003e\n\u003cli\u003eDimise, E. J., Widboom, P. F. \u0026amp; Bruner, S. D. Structure elucidation and biosynthesis of fuscachelins, peptide siderophores from the moderate thermophile \u003cem\u003eThermobifida fusca\u003c/em\u003e. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e105\u003c/strong\u003e, 15311\u0026ndash;15316 (2008).\u003c/li\u003e\n\u003cli\u003eDumitrescu, D. G. \u003cem\u003eet al.\u003c/em\u003e A microbial transporter of the dietary antioxidant ergothioneine. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e185\u003c/strong\u003e, 4526-4540.e18 (2022).\u003c/li\u003e\n\u003cli\u003eGu, F. \u003cem\u003eet al.\u003c/em\u003e The hindgut microbiome contributes to host oxidative stress in postpartum dairy cows by affecting glutathione synthesis process. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 87 (2023).\u003c/li\u003e\n\u003cli\u003eSies, H. \u003cem\u003eet al.\u003c/em\u003e Defining roles of specific reactive oxygen species (ROS) in cell biology and physiology. \u003cem\u003eNat. Rev. Mol. Cell Biol.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 499\u0026ndash;515 (2022).\u003c/li\u003e\n\u003cli\u003eWinterbourn, C. C., Kettle, A. J. \u0026amp; Hampton, M. B. Reactive Oxygen Species and Neutrophil Function. \u003cem\u003eAnnu. Rev. Biochem.\u003c/em\u003e \u003cstrong\u003e85\u003c/strong\u003e, 765\u0026ndash;792 (2016).\u003c/li\u003e\n\u003cli\u003eVan De Sande, B. \u003cem\u003eet al.\u003c/em\u003e A scalable SCENIC workflow for single-cell gene regulatory network analysis. \u003cem\u003eNat. Protoc.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 2247\u0026ndash;2276 (2020).\u003c/li\u003e\n\u003cli\u003ePisoschi, A. M. \u0026amp; Pop, A. The role of antioxidants in the chemistry of oxidative stress: A review. \u003cem\u003eEur. J. Med. 
Chem.\u003c/em\u003e \u003cstrong\u003e97\u003c/strong\u003e, 55\u0026ndash;74 (2015).\u003c/li\u003e\n\u003cli\u003eSudhakar, P. \u003cem\u003eet al.\u003c/em\u003e Targeted interplay between bacterial pathogens and host autophagy. \u003cem\u003eAutophagy\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 1620\u0026ndash;1633 (2019).\u003c/li\u003e\n\u003cli\u003eChen, L. \u003cem\u003eet al.\u003c/em\u003e Sequence-based drug design as a concept in computational drug design. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 4217 (2023).\u003c/li\u003e\n\u003cli\u003eXue, M.-Y. \u003cem\u003eet al.\u003c/em\u003e Investigation of fiber utilization in the rumen of dairy cows based on metagenome-assembled genomes and single-cell RNA sequencing. \u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 11 (2022).\u003c/li\u003e\n\u003cli\u003eLin, L. \u003cem\u003eet al.\u003c/em\u003e Genome-centric investigation of bile acid metabolizing microbiota of dairy cows and associated diet-induced functional implications. \u003cem\u003eThe ISME journal\u003c/em\u003e (2022) doi:10.1038/s41396-022-01333-5.\u003c/li\u003e\n\u003cli\u003eHuang, B. \u003cem\u003eet al.\u003c/em\u003e Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph. \u003cem\u003eBMC Bioinf.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 533 (2021).\u003c/li\u003e\n\u003cli\u003eAnderson, C. L. \u0026amp; Fernando, S. C. Insights into rumen microbial biosynthetic gene cluster diversity through genome-resolved metagenomics. \u003cem\u003eCommunications Biology\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 818 (2021).\u003c/li\u003e\n\u003cli\u003eXue, M.-Y., Sun, H.-Z., Wu, X.-H., Liu, J.-X. \u0026amp; Guan, L. L. Multi-omics reveals that the rumen microbiome and its metabolome together with the host metabolome contribute to individualized dairy cow performance. 
\u003cem\u003eMicrobiome\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 64 (2020).\u003c/li\u003e\n\u003cli\u003eLiu, K. \u003cem\u003eet al.\u003c/em\u003e Ruminal microbiota\u0026ndash;host interaction and its effect on nutrient metabolism. \u003cem\u003eAnimal Nutrition\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 49\u0026ndash;55 (2021).\u003c/li\u003e\n\u003cli\u003ePerry, E. K., Meirelles, L. A. \u0026amp; Newman, D. K. From the soil to the clinic: the impact of microbial secondary metabolites on antibiotic tolerance and resistance. \u003cem\u003eNat. Rev. Microbiol.\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 129\u0026ndash;142 (2022).\u003c/li\u003e\n\u003cli\u003eMahanta, N., Szantai-Kis, D. M., Petersson, E. J. \u0026amp; Mitchell, D. A. Biosynthesis and Chemical Applications of Thioamides. \u003cem\u003eACS Chem. Biol.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 142\u0026ndash;163 (2019).\u003c/li\u003e\n\u003cli\u003eEyles, T. H., Vior, N. M., Lacret, R. \u0026amp; Truman, A. W. Understanding thioamitide biosynthesis using pathway engineering and untargeted metabolomics. \u003cem\u003eChem. Sci.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 7138\u0026ndash;7150 (2021).\u003c/li\u003e\n\u003cli\u003eChan, D. C. K. \u0026amp; Burrows, L. L. Thiopeptides: antibiotics with unique chemical structures and diverse biological activities. \u003cem\u003eJ. Antibiot.\u003c/em\u003e \u003cstrong\u003e74\u003c/strong\u003e, 161\u0026ndash;175 (2021).\u003c/li\u003e\n\u003cli\u003eChernov\u0026rsquo;yants, M. S., Kolesnikova, T. S. \u0026amp; Karginova, A. O. Thioamides as radical scavenging compounds: Methods for screening antioxidant activity and detection. \u003cem\u003eTalanta\u003c/em\u003e \u003cstrong\u003e149\u003c/strong\u003e, 319\u0026ndash;325 (2016).\u003c/li\u003e\n\u003cli\u003eLi, Q. S. 
\u003cem\u003eet al.\u003c/em\u003e Dietary selection of metabolically distinct microorganisms drives hydrogen metabolism in ruminants. \u003cem\u003eThe ISME Journal\u003c/em\u003e (2022) doi:10.1038/s41396-022-01294-9.\u003c/li\u003e\n\u003cli\u003eUchiyama, J., Akiyama, M., Hase, K., Kumagai, Y. \u0026amp; Kim, Y.-G. Gut microbiota reinforce host antioxidant capacity via the generation of reactive sulfur species. \u003cem\u003eCell Rep.\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 110479 (2022).\u003c/li\u003e\n\u003cli\u003eZhang, W. \u003cem\u003eet al.\u003c/em\u003e Intracellular GSH/GST antioxidants system change as an earlier biomarker for toxicity evaluation of iron oxide nanoparticles. \u003cem\u003eNanoImpact\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 100338 (2021).\u003c/li\u003e\n\u003cli\u003eGiustarini, D. \u003cem\u003eet al.\u003c/em\u003e Assessment of glutathione/glutathione disulphide ratio and S-glutathionylated proteins in human blood, solid tissues, and cultured cells. \u003cem\u003eFree Radic. Biol. Med.\u003c/em\u003e \u003cstrong\u003e112\u003c/strong\u003e, 360\u0026ndash;375 (2017).\u003c/li\u003e\n\u003cli\u003eFlasinski, S. \u0026amp; Cassidy, B. G. Potyvirus aphid transmission requires helper component and homologous coat protein for maximal efficiency. \u003cem\u003eArch. Virol.\u003c/em\u003e \u003cstrong\u003e143\u003c/strong\u003e, 2159\u0026ndash;2172 (1998).\u003c/li\u003e\n\u003cli\u003eGonzales, K. A. U. \u0026amp; Fuchs, E. Skin and Its Regenerative Powers: An Alliance between Stem Cells and Their Niche. \u003cem\u003eDevelopmental Cell\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 387\u0026ndash;401 (2017).\u003c/li\u003e\n\u003cli\u003eZhang, K. \u003cem\u003eet al.\u003c/em\u003e Early concentrate starter introduction induces rumen epithelial parakeratosis by blocking keratinocyte differentiation with excessive ruminal butyrate accumulation. \u003cem\u003eJ. Adv. 
Supplementary Tables

Supplementary Tables 1-14 are not available with this version.

Keywords: Biosynthetic gene clusters database, Microbial secondary metabolites, Antioxidants prediction, Cattle single-cell atlas, Host-microbiota interaction

Posted: April 17th, 2024