{"paper_id":"254fdb75-4fab-4a96-b652-e7a3b0a3b8ae","body_text":"Automated genome mining predicts structural diversity and taxonomic distribution of \npeptide metallophores across bacteria \n \nZachary L. Reitz* 1,2, Bita Pourmohsenin* 3, Melanie Susman 4, Emil Thomsen 4, Daniel Roth 4, \nAlison Butler 4, Nadine Ziemert# 3, Marnix H. Medema# 1 \n \n1. Bioinformatics Group, Wageningen University, 6708 PB Wageningen, the Netherlands \n2. Department of Evolution, Ecology, and Marine Biology, University of California, Santa \nBarbara, CA 93117, USA \n3. Interfaculty Institute of Microbiology and Infection Medicine, Institute for Bioinformatics and \nMedical Informatics, Tübingen, Germany \n4. Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA \n93117, USA \n \n* Contributed equally \n# Corresponding authors \n \nbioRxiv preprint doi: https://doi.org/10.1101/2025.07.29.667501; this version posted August 1, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. \n\nAbstract  \nMicrobial competition for trace metals shapes their communities and interactions with humans \nand plants. Many bacteria scavenge trace metals with metallophores, small molecules that \nchelate environmental metal ions. Metallophore production may be predicted by genome \nmining, where genomes are scanned for homologs of known biosynthetic gene clusters (BGCs). \nHowever, accurately detecting non-ribosomal peptide (NRP) metallophore biosynthesis requires \nexpert manual inspection, stymieing large-scale investigations. Here, we introduce automated \nidentification of NRP metallophore BGCs through a comprehensive algorithm, implemented in \nantiSMASH, that detects chelator biosynthesis genes with 97% precision and 78% recall against \nmanual curation. 
We showcase the utility of the detection algorithm by experimentally \ncharacterizing metallophores from several taxa. High-throughput NRP metallophore BGC \ndetection enabled metallophore detection across 69,929 genomes spanning the bacterial \nkingdom. We predict that 25% of all bacterial non-ribosomal peptide synthetases encode \nmetallophore production and that significant chemical diversity remains undiscovered. A \nreconstructed evolutionary history of NRP metallophores supports that some chelating groups \nmay predate the Great Oxygenation Event. The inclusion of NRP metallophore detection in \nantiSMASH will aid non-expert researchers and continue to facilitate large-scale investigations \ninto metallophore biology. \n \nIntroduction \nAcross environments, microbes compete for a scarce pool of trace metals. Many microbes \nscavenge metal ions with small-molecule chelators called metallophores, which diffuse through \nthe environment and chelate metal ions with high affinity. 1,2 A microbe possessing the right \nmembrane transporters will be able to recognize and import a metallophore–metal complex, \nwhile other strains are unable to access the chelated metal ions. Thus, the metallophore \nexcreted by one microbe can either support or inhibit growth of a neighboring strain, driving \ncomplex community dynamics in marine, freshwater, soil, and host environments. 3 The most \nwell studied metallophores are the Fe(III)-binding siderophores, which have found applications \nin biocontrol, bioremediation, and medicine. 
4 Two recent studies demonstrated that the disease \nsuppression ability of a rhizosphere microbiome is strongly determined by whether or not the \npathogen can use siderophores produced by the community; a microbiome can even encourage \npathogen growth when a compatible siderophore is produced. 5,6 Compared to siderophores, \nother metallophore classes are relatively understudied, but they likely play equally important \nbiological roles, as exemplified by recent reports of both commensal and pathogenic bacteria \nrelying on zincophores to effectively colonize human hosts.7,8  \nHundreds of unique metallophore structures have been characterized, each with specific \nchemical properties (e.g., effective pH range, hydrophobicity, and metal selectivity) and \nbiological effects on other microbes (based on membrane transporter compatibility). \nExperimentally characterizing metallophores can be time-consuming and costly, and thus \nresearchers often use genome mining to predict metallophore production in silico.9 Taxonomy \nalone is not sufficient to predict what metallophores will be produced by a microbe, as \nproduction can vary significantly even within a single species. 10 Instead, metallophores must be \npredicted from each genome based on the presence of biosynthetic gene clusters (BGCs) that \nencode their biosynthesis. The majority of known metallophores are non-ribosomal peptides \n(NRPs), a broad class of natural products that also includes many antibiotics, antitumor \ncompounds, and toxins. Specialized chelating moieties bind directly to the metal ion (in the case \nof siderophores, Fe³⁺), while other amino acids in the peptide chain give the metallophore the \nrequired flexibility for chelation. 
Nearly all NRP metallophores contain one or more of the \nsubstructures shown in Fig. 1A: 2,3-dihydroxybenzoate (catechol, 2,3-DHB), hydroxamates, \nsalicylate, β-hydroxyaspartate (β-OHAsp), β-hydroxyhistidine (β-OHHis), graminine, Dmaq \n(1,1-dimethyl-3-amino-1,2,3,4-tetrahydro-7,8-dihydroxy-quinoline), and the pyoverdine \nchromophore. Biosynthetic pathways are known for each of the chelating groups (Fig. 1B), and \nthe presence of a chelator pathway may be used as a marker for metallophore production. \nMining genomes for metallophore BGCs has facilitated the discovery of chemically and \nbiologically diverse metallophore systems; however, automated detection tools are still severely \nlacking.9 The peptidic backbones of NRP metallophores are produced by non-ribosomal peptide \nsynthetases (NRPSs), large multi-domain enzymes that activate and condense amino acids and \nother substrates in an assembly-line manner. 11 In the past two decades, a variety of \nbioinformatic tools have been developed to identify NRPS BGCs in a genome. One of the most \npopular is the secondary metabolite prediction platform antiSMASH, which uses a library of \nprofile hidden Markov models (pHMMs) to identify (combinations of) enzyme-coding genes that \nare indicative of certain classes of specialized metabolite biosynthetic pathways. 12,13 For \nexample, antiSMASH identifies an NRPS BGC region by the minimum requirement of a gene \ncontaining at least one condensation and one adenylation domain. 
NRP metallophore BGCs are \ntechnically detected by this rule as well; however, NRPSs also produce many other families of \ncompounds, and additional manual annotation has still been required to identify NRP \nmetallophore BGCs specifically. Accordingly, accurate prediction of BGCs encoding \nsiderophores and other metallophores was limited to experts in natural product biosynthesis, \nand even experts cannot manually curate the thousands of BGCs produced by high-throughput \nmetagenomic or comparative genomic analyses. To date, no global analysis of NRP \nmetallophores has been performed, and thus the prevalence, combinatorics, and taxonomic \ndistribution of different chelating groups are unknown. \n \nFigure 1. Chelating substructures found in bacterial NRP metallophores and their biosynthetic \npathways. (A) Representative NRP metallophore structures. Nearly all known NRP \nmetallophores contain one or more of the eight labeled chelating groups. Most chelating groups \nprovide bidentate metal chelation, as shown for ferric pyoverdine L48. (B) Chelator biosynthesis \npathways that form the basis for the new antiSMASH detection algorithm, as described in the \ntext. The same chelator colors are used in each figure. \n \nHere, we describe the development and application of a high-accuracy \nantiSMASH-integrated method for the automated detection of NRP metallophore BGCs, using \nthe presence of chelator biosynthesis genes within NRPS BGCs as key markers for predicting \nmetallophore production. 
The new detection rules were applied to 15,562 representative \nbacterial genomes, allowing us to take the first census of NRP metallophore production across \nbacteria. At least 25% of all NRPS clusters in these representative genomes code for the \nproduction of metallophores and significant biosynthetic diversity remains undiscovered. We \nthen leveraged our computational analyses to guide characterization of siderophores from \nmultiple bacterial taxa, finding structures that matched our genome-based predictions. By \nmapping NRP metallophore BGCs from 59,851 genomes to the Genome Taxonomy Database \n(GTDB) phylogeny, we identified myxobacterial and cyanobacterial metallophores as \nunderstudied and reconstructed a possible evolutionary history of the chelating groups. \n \nResults  \nA chelator-based strategy for detection of NRP metallophore biosynthetic gene clusters \nThe specialized chelating moieties found in NRP metallophores are rarely found in other natural \nproducts, and thus we sought to automate metallophore BGC prediction by searching for genes \nencoding their biosynthesis. An extensive review of published NRP metallophore structures \nrevealed that nearly all contain one or more of just eight chelator substructures (Fig. 1A). \nProtein domains responsible for their biosyntheses have been reported (Fig. 1B), and thus \npHMMs could be constructed to allow detection of putative chelator biosynthesis genes. 
\nGenerally, draft pHMMs were built from alignments of known and predicted NRP metallophore \nbiosynthesis genes collected from literature, and cutoffs were manually determined (see \nSupplemental Discussion 1). The final multiple sequence alignments, pHMMs, and cutoffs are \nprovided in the Supplemental Dataset.  \nA full description of each biosynthetic pathway detection strategy, including caveats and \nknown limitations, is provided in Supplemental Discussion 1 and briefly summarized here. The \nprofile HMMs implemented within antiSMASH are given in monospaced bold font. The \nbiosynthetic cassette for 2,3-DHB is detected by an isochorismate synthase (EntC) and \n2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EntA).14 Two salicylate biosynthesis \npathways are detected by the presence of either an isochorismate pyruvate-lyase (IPL)15 or a \nbifunctional salicylate synthase (SalSyn).16 We also included detection of two condensation \ndomain subtypes specific to catecholic and phenolic metallophores: VibH-like enzymes \n(VibH)17,18 and tandem heterocyclization domains (Cy_tandem).19 Peptidic hydroxamate \npathways are detected by an ornithine (Orn) or Lys N-monooxygenase (Orn_monoox or \nLys_monoox, respectively). 20 We could not accurately detect the vicibactin hydroxylase VbsO \nusing a pHMM, 21 and so the characteristic acyl-hydroxyornithine epimerase VbsL is used to \ndetect vicibactin biosynthesis. 
21 We previously identified three families of siderophore-specific \nFe(II)/α-ketoglutarate-dependent enzymes responsible for β-OHAsp (TBH_Asp and IBH_Asp) or \nβ-OHHis (IBH_His).22 Based on the recent discovery of β-OHAsp-containing cyanochelins from \ncyanobacteria,23 we have now identified two new clades that are putatively \nmetallophore-specific and tentatively named CyanoBH_Asp1 and CyanoBH_Asp2. The \ndiazeniumdiolate-containing graminine may be detected by the presence of the cryptic \nnecessary enzymes GrbD and GrbE.24,25 The quinoline chelator Dmaq is detected by FbnL and \nFbnM, which initiate Dmaq biosynthesis. 26 The chromophore of pyoverdines is detected by the \npresence of a tyrosinase PvdP and/or an oxidoreductase PvdO.27,28 \nSeveral known chelating group pathways are not currently detected. Our detection \nstrategy is limited to clades or combinations of biosynthetic enzymes that are distinct to NRP \nmetallophore pathways. Several chelators are synthesized by the core NRPS and/or polyketide \nsynthase (PKS) machinery and could not be detected without also retrieving many false \npositives, including NRPS-derived thiazol(id)ine and oxazol(id)ine heterocycles (see pyochelin, \nFig. 1A) and PKS-derived 5-alkylsalicylate (e.g. in micocacidin 29). We also did not include \ndetection of a pathway currently only reported in fabrubactins that produces two \nα-hydroxycarboxylate chelating moieties (Fig. 1A, bolded atoms). 26 Finally, we have not yet \ndesigned detection rules for the recently discovered chelating groups 5-aminosalicylate of \npseudonochelin30 or 2-naphthoate of ecteinamines; 31 however, we expect that their biosyntheses \nwill be amenable to detection by the method used herein (Fig. S1). The NRP metallophore \ndetection algorithm is publicly available in the antiSMASH web server and command line tool \n(https://antismash.secondarymetabolites.org, version 7 and upwards).  
\nValidating antiSMASH NRP metallophore detection against manually curated BGCs \nIn order to assess the performance of our NRP metallophore BGC detection strategy, we \nmanually predicted metallophore production among a large set of BGCs. A total of 758 NRPS \nBGC regions from 330 genera were annotated with default antiSMASH v6.1 and inspected for \nknown markers of metallophore production, including genes encoding transporters, iron \nreductases, chelator biosynthesis, and known metallophore NRPS domain motifs. We thus \nmanually classified 176 BGC regions (23%) as metallophore BGCs (Supplemental Table 2). The \nnew antiSMASH detection rules were applied to the same BGC regions, resulting in 145 \nputative metallophore BGC regions (F1 = 0.86; Table\n 1 and Supplemental Table 2). Nine \nmetallophore BGC regions were only detected by antiSMASH. Upon reinvestigation, four were \ndetermined to likely represent genuine metallophore BGC regions missed during manual \nanalysis, leaving only five putative false positives in which seemingly unrelated genes matched \nthe pHMMs (97% precision). Conversely, a total of 40 metallophore BGC regions could only be \ndetected manually (78% recall). In the majority of false negatives, NRP metallophore BGCs \nwere missed because chelator biosynthesis genes, on which the detection strategy is based, \nwere not present in the cluster. In 21 cases, genes encoding catechol, salicylate, or \nhydroxamate biosynthesis were located elsewhere in the genome. 
In ten cases, chelator \nbiosynthesis pathways were not found anywhere in the genome; these clusters may be \nnon-functional fragments, rely on exogenous precursors (as seen in equibactin biosynthesis 32), \nor have evolved to use novel chelator biosyntheses. Two of the false negatives encoded the \n5-alkylsalicylate PKS that is currently undetectable, as described above. Finally, seven manually \nassigned NRP metallophore BGC regions had no genes corresponding to known chelator \npathways (Supplemental Table 2); if correctly annotated, they may represent novel structural \nclasses. In one particularly promising case, a salicylate pathway appears to have been replaced \nwith a partial menaquinone pathway to produce a putative 1,4-dihydroxy-2-naphthoate chelating \ngroup (Supplemental Discussion 2). \n \nColumns 2-4: performance metrics (a). Columns 5-7: number of NRP metallophore BGC regions detected in representative bacterial genomes (b). \nDetection method | Precision | Recall | F1 (c) | Complete NRPS regions (n=11,704) | Partial NRPS regions (n=8,403) | Total NRPS regions (n=20,107) \nAntiSMASH rules | 0.97 | 0.78 | 0.86 | 2,485 (21%) (d) | 725 (8.6%) | 3,210 (16%) \nTransporter genes | 0.93 | 0.56 | 0.69 | 1,723 (15%) | 376 (4.5%) | 2,099 (10%) \nEither/or ensemble | 0.92 | 0.88 | 0.90 | 2,948 (25%) | 855 (10%) | 3,803 (19%) \nTable 1. Summary of NRP metallophore BGC detection, comparing the chelator-based rules newly \nimplemented in antiSMASH, the transporter-based method of Crits-Christoph et al.,33 and a combined \neither/or ensemble. a Detection methods were each tested on a set of 758 manually annotated NRPS \nBGC regions (180 true positives). Full results are given in Supplemental Table 2. b Detection methods \nwere applied to 15,562 NCBI RefSeq representative bacterial genomes. The full results are given in \nSupplemental Table 3. A region is “complete” if it is not on a contig edge, as determined by antiSMASH. \nc F1 score is equal to 2×(Precision×Recall)/(Precision+Recall). 
d Percentages indicate the fraction of \nNRPS regions that were predicted to encode NRP metallophores. \nAntiSMASH outperforms transporter-based detection, although both techniques are \ncomplementary \nCrits-Christoph et al. found that the presence of transporters could be used to predict \nsiderophore BGCs among other NRPS clusters. 33 Specifically, the Pfam families for \nTonB-dependent receptors, FecCD domains, and periplasmic binding proteins (PF00593, \nPF01032, and PF01497, respectively) were determined to be highly siderophore-specific, and \nthe authors used the presence of two of the three domains to predict a “siderophore-like” BGC \nregion (metallophores that transport other metals were also coded as siderophores in their \ndataset). We used a modified version of antiSMASH to detect the three transporter families \namong the 758 manually annotated NRPS BGC regions (Table\n 1 and Supplemental Table 2). In \ntotal, the transporter-based method detected 108 metallophore clusters (F1 = 0.69), including \neight putative false positives (93% precision), and had 80 false negatives (56% recall). One \nfalse positive was noted in the manual annotation as a likely “cheater”: while several Bordetella \ngenomes encode the synthesis of a putative graminine-containing metallophore, B. petrii   DSM \n12804 has retained only the transporter genes alongside a small fragment of the NRPS. In the \nseven other false positives, BGC regions appeared to coincidentally contain transporter genes \nin their periphery, as they were not conserved in homologous NRPS clusters. 
In one case, the \ntriggering genes were part of a putative vitamin B12 import and biosynthesis locus. Combining \nthe two methods in an either/or ensemble approach slightly improved overall performance \nversus the antiSMASH rules alone, achieving 92% precision, 88% recall, and an F1 score of \n0.90 (Table 1). \nCharting NRP metallophore biosynthesis across bacteria \nThe implementation of NRP metallophore BGC detection into antiSMASH allowed us to take the \nfirst bacterial census of NRP metallophore biosynthesis. The finalized detection rules were \napplied to 15,562 representative bacterial genomes from NCBI RefSeq (25 June 2022). In total, \n3,264 NRP metallophore BGC regions were detected (Table 1 and Supplemental Table 3), \nincluding 54 Type II (non- or semi-modular 34) NRPS regions that would otherwise not be \ndetected by antiSMASH, such as BGCs for acinetobactin and brucebactin. 35,36 NRP \nmetallophores comprised 16% of all NRPS BGC regions in the genomes. Among complete \nregions (not located on a contig edge), 21% of NRPS BGC regions were classified as NRP \nmetallophores, compared to just 8.6% of incomplete NRPS regions. This is consistent with \nprevious reports that low-quality, fragmented genomes result in low-quality BGC annotations in \nantiSMASH.37 The transporter-based approach predicted siderophore activity for 15% of \ncomplete NRPS regions, including 463 BGC regions without detectable chelator genes; when \nthe two methods are combined, over 25% of NRPS BGCs are predicted to produce NRP \nmetallophores (Table  1). 
Only complete NRP metallophore BGC regions detected by \nantiSMASH were used for downstream analyses. \nFrequency and hybridization of NRP metallophore chelating groups \nComplete NRP metallophore regions from the representative genomes were categorized by the \ntype(s) of chelator biosynthesis genes detected within (Fig.  2). Hydroxamates and catechols \nwere the most common pathways, present in 44% and 36% of BGC regions, respectively. In \ncontrast, β-OHHis, graminine, and Dmaq biosyntheses were rare in representative genomes, \neach present in less than 2% of detected regions. About 20% of regions contained genes for at \nleast two pathways and putatively produce a hybrid metallophore. Only 42 BGC regions (1.7%) \ncontained three different chelating groups: each encoded genes for the pyoverdine \nchromophore, a hydroxamate, and either β-OHAsp or β-OHHis. The proportion of hybrid \nmetallophores is likely higher than estimated here. As described above, some chelating \nmoieties could not be captured by the pHMM-based rules. Furthermore, metallophore \nbiosynthesis may require genes from multiple BGCs. Pyoverdine genes may be located in up to \nfive different loci, 38 and all 56 regions with only the pyoverdine chromophore pathway are \nexpected to produce hybrid siderophores. Representative characterized siderophores that \ncontain the chelator combinations in our dataset are shown in Fig. 3.  \n \n \nFigure 2. An upset plot of chelator frequency among 2,489 complete NRP metallophore BGC \nregions from RefSeq representative genomes. An additional 38 BGC regions were detected by \nmetallophore-specific NRPS domains (VibH-like or tandem Cy) rather than chelator \nbiosynthesis, and may produce catechol and/or salicylate metallophores using biosyntheses \nencoded elsewhere in the genome.  \nThe most widespread NRP metallophore families have likely been found, yet significant \ndiversity remains unexplored \nDifferent species of bacteria can contain highly similar metallophore BGCs. To gauge the \nbiosynthetic diversity of the putative NRP metallophores (and thereby the structural diversity), \nthe complete BGC regions were organized into a sequence similarity network using \nBiG-SCAPE, which clusters BGCs based on their shared gene content and identity. An \nadditional 75 reported NRP metallophore BGCs were included as reference nodes \n(Supplemental Table 1), and a distance cutoff of 0.5 was chosen to allow highly similar \nreference BGCs to form connected components (gene cluster families; GCFs) in the network. \nThe final network, colored and organized by chelator type, is presented in Fig.\n 3. The majority of \nBGC regions (57%) clustered with the reference BGCs in just 45 GCFs, suggesting that many of \nthe most widespread NRP metallophore families with known chelating groups already have \ncharacterized representatives (Fig.  S4). However, 1093 BGC regions did not cluster with any \nreference BGC, forming 619 separate GCFs in the network (93% of all GCFs). Some of these \nmay encode orphan metallophores previously isolated from unsequenced strains, or be similar \nto known BGCs that were not included in our non-exhaustive literature search. Nevertheless, \nsignificant NRP metallophore structural diversity remains undiscovered, particularly among the \n484 BGC regions distinct enough to form isolated nodes in the network. \n \nFigure 3. BiG-SCAPE similarity network of complete NRP metallophore BGC \nregions from RefSeq representative genomes. Numbered square nodes indicate published \nBGCs, as given in Supplemental Table 1. 
Select hybrid metallophore BGC nodes are \nhighlighted yellow, and their corresponding structures are shown. Nodes are colored by the \ntype(s) of chelator biosynthesis detected therein. BGC regions colored light gray contain only \nmetallophore-specific NRPS domains (VibH-like or tandem Cy) and may produce catechol \nand/or salicylate metallophores using biosyntheses encoded elsewhere in the genome. The \nnetwork was constructed in BiG-SCAPE v1.1.2 using 2,596 BGC regions as input, including 78 \nreference BGCs, and a distance cutoff of 0.5.  \n \nChemical identification of genome-predicted siderophores across taxa \nTo showcase how our large-scale automated genome mining methodology can be used to \neffectively predict functional metallophore biosynthetic pathways across taxa, we characterized \nthe siderophores of three bacterial strains with genomes containing BGCs that were closely \nconnected to reference BGCs in the BiG-SCAPE network (Fig. 3). Two strains belong to genera \nthat have no reported metallophores: Buttiauxella brennerae DSM 9396 was predicted to \nproduce enterobactin (Fig. 4A), and Terasakiispira papahanaumokuakeensis DSM 29361 was \npredicted to produce both marinobactin(s) (Fig. 4A) and enantio-pyochelin (Fig. 1A). 
The third \nstrain, Pseudomonas brassicacearum DSM 13227, was selected because its genome contains \na BGC that clustered with the histicorrugatin reference BGC. We predicted that the BGC may \nencode the biosynthesis of ornicorrugatin (Fig. 4A), 39 a previously reported siderophore with no \nknown BGC. A fragmented pyoverdine BGC was also present in the strain’s genome, which was \npredicted to produce the known siderophore pyoverdine A214 (Fig. 4A).40,41  \nEach strain was grown in low-iron conditions to induce siderophore production, then \norganic compounds were extracted from the culture supernatants using adsorbent resin prior to \nspectral analyses by electrospray ionization mass spectrometry (ESI-MS) and ESI-MS/MS; full \ndetails are provided in the Supplemental Methods and Results. From B. brennerae, we \nidentified four catecholic compounds (Fig. 4B): the predicted enterobactin (Fig. 4A), as well as \nthe enterobactin fragments 2,3-DHB–Ser (DHBS), (DHBS)2, and linear (DHBS)3. The crude \nextract of T. papahanaumokuakeensis indeed contained molecular ions consistent with \nmarinobactins A-E (Fig. 4A and C). Tandem ESI-MS/MS yielded expected fragmentation \npatterns for marinobactins A-D, while the peak at m/z 988.5421, putatively marinobactin E, was \nof low abundance and did not give a clear spectrum. No peaks consistent with enantio-pyochelin \n(m/z 324.4; Fig. 1A) could be observed. From P. brassicacearum, we identified both \nsiderophores predicted from the BGC analyses: ornicorrugatin and pyoverdine A214 (Fig. 4A \nand D). Fragmentation patterns closely matched those previously reported.39,40  \nThus, our method was able to successfully identify the putative BGC for the orphan \nsiderophore ornicorrugatin and also correctly predict the potential to produce known \nsiderophores by taxa that were not yet studied for their metallophore biosynthetic capacities. 
\n \nFigure 4. Identification of siderophores predicted from genome mining. (A) Chemical structures \nof marinobactins A-E, 42 produced by Terasakiispira papahanaumokuakeensis DSM 29361; \nenterobactin,43 produced by Buttiauxella brennerae DSM 9396; and pyoverdine A214 40 and \nornicorrugatin,39 both produced by Pseudomonas brassicacearum  DSM 13227. The position \nand orientation of the fatty acid desaturation in marinobactins B and D was not determined in \nthis work. (B-D) High pressure liquid chromatography / high-resolution mass spectrometry \n(HPLC-HRMS) total ion chromatograms of culture supernatant extracts, overlaid with extracted \nion chromatograms for siderophore features. Additional details and spectra are provided in the \nSupplemental Methods and Results. \n \n \nTaxonomic distribution of NRP-Metallophores \nWe investigated the taxonomic distribution and evolution of NRP siderophore biosynthesis \nwithin the bacterial kingdom by applying our antiSMASH detection rules to 59,851 \nrepresentative bacterial genomes from the Genome Taxonomy Database (GTDB). 44 Among \nthese, 4,098 genomes (6.8%) were predicted to contain at least one NRP metallophore BGC. A \ntotal of 5,366 BGC regions were detected, representing 14% of all detected NRPS regions.  \nTaxonomic distribution analysis using the GTDB phylogeny highlighted the uneven prevalence \nof NRP-metallophores across bacterial phyla (Table 2). Proteobacteria and Actinomycetota were \noverrepresented in the GTDB representatives, together accounting for 89% of all detected NRP \nmetallophore regions. 
After correcting for the number of representative genomes in each \nphylum, NRP metallophore BGCs were most abundant in Actinomycetota, with 23% of genomes \ncontaining at least one detectable region. Proteobacteria, Cyanobacteria, and Myxococcota \neach had similar proportions of genomes with NRP metallophore BGCs; however, due to biased \ncoverage in the GTDB, 49% of the detected BGC regions were from Proteobacteria, \ncompared to only 4% and 1.1% found in Cyanobacteria and Myxococcota, respectively. Thus, we expect that \nfurther sequencing efforts directed at these two phyla will yield many new NRP metallophore \nBGCs.  \n \nPhylum | Number of detected NRP metallophore BGC regions | Percentage of total detected NRP-met regions | Proportion of genomes with ≥1 NRP-met region \nProteobacteria | 2439 | 49% | 2042/16536 (12%) \nActinomycetota | 1986 | 40% | 1561/6931 (23%) \nCyanobacteria | 200 | 4.0% | 176/1318 (13%) \nFirmicutes_I | 192 | 3.9% | 191/4013 (4.8%) \nMyxococcota | 55 | 1.1% | 52/418 (12%) \nFirmicutes | 28 | 0.6% | 28/9026 (0.3%) \nChloroflexota | 18 | 0.4% | 14/1317 (1.1%) \nNitrospirota | 16 | 0.3% | 15/307 (4.9%) \nAcidobacteriota | 9 | 0.2% | 9/836 (1.1%) \nDesulfobacterota | 5 | 0.1% | 5/847 (0.6%) \nVerrucomicrobiota | 2 | <0.1% | 2/1304 (0.2%) \nPlanctomycetota | 1 | <0.1% | 1/1034 (0.1%) \nBdellovibrionota | 1 | <0.1% | 1/248 (0.4%) \nGemmatimonadota | 1 | <0.1% | 1/345 (0.3%) \nTable 2. Taxonomic distribution of 4,953 NRP-metallophore BGC regions detected in 59,851 GTDB \nrepresentative bacterial genomes. Phylum nomenclature is preserved from GTDB r207. An additional \n413 BGC regions with “unknown” taxonomy are not included here. Phyla not listed had zero detected \nregions. 
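The two normalizations reported in Table 2 can be reproduced directly from its counts. The following is a minimal sketch, assuming only the numbers transcribed from the table; the names `TABLE2` and `summarize` are illustrative and do not come from the Supplemental Dataset.

```python
# Counts transcribed from Table 2:
# phylum -> (detected BGC regions, genomes with >=1 region, genomes surveyed)
TABLE2 = {
    "Proteobacteria": (2439, 2042, 16536),
    "Actinomycetota": (1986, 1561, 6931),
    "Cyanobacteria": (200, 176, 1318),
    "Firmicutes_I": (192, 191, 4013),
    "Myxococcota": (55, 52, 418),
    "Firmicutes": (28, 28, 9026),
    "Chloroflexota": (18, 14, 1317),
    "Nitrospirota": (16, 15, 307),
    "Acidobacteriota": (9, 9, 836),
    "Desulfobacterota": (5, 5, 847),
    "Verrucomicrobiota": (2, 2, 1304),
    "Planctomycetota": (1, 1, 1034),
    "Bdellovibrionota": (1, 1, 248),
    "Gemmatimonadota": (1, 1, 345),
}


def summarize(table):
    """Return the total region count and, per phylum, a pair of percentages:
    (share of all listed regions, share of surveyed genomes with >=1 region)."""
    total_regions = sum(regions for regions, _, _ in table.values())
    rows = {}
    for phylum, (regions, genomes_hit, genomes_surveyed) in table.items():
        share_of_regions = 100 * regions / total_regions
        genome_prevalence = 100 * genomes_hit / genomes_surveyed
        rows[phylum] = (share_of_regions, genome_prevalence)
    return total_regions, rows


total, rows = summarize(TABLE2)
print(f"Total regions with assigned taxonomy: {total}")
for phylum, (share, prevalence) in rows.items():
    print(f"{phylum:<20}{share:6.1f}% of regions  {prevalence:6.1f}% of genomes")
```

The first percentage depends on how many genomes were sequenced for a phylum, whereas the second is normalized by the number of representative genomes, which is why Proteobacteria and Myxococcota differ greatly in the former but not in the latter.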
\n \nTo map the distribution of NRP-metallophore producers across the bacterial kingdom, we \nemployed Relative Evolutionary Divergence (RED) values, a framework proposed by Parks et \nal. and utilized within the GTDB. 45 Building on this, Gavriilidou et al. defined \nREDgroups—phylogenetically consistent clusters based on RED values—that provide a \nstandardized framework analogous to genera. 46 Unlike traditional genera, which can vary \nsignificantly in their evolutionary distances, REDgroups offer greater consistency in evolutionary \nrelationships among their members. This framework allowed us to summarize the data as the \naverage number of NRP-metallophore BGC regions per genome within each group, enabling \neffective visualization and more equitable comparative analyses of biosynthetic potential across \nbacterial lineages. By collapsing the GTDB tree to the REDgroup level, we annotated each \ngroup with the average number of putative NRP-metallophore clusters (Fig. 5). The analysis \nrevealed that 16% of REDgroups encoded detected NRP-metallophores, and within each \nREDgroup, the number of NRP-metallophores was relatively consistent (standard deviation: \n0.3425). This observation aligns with the findings of Gavriilidou et al., who demonstrated that \nBGC diversity is consistent at the genus level. 46 While most REDgroups with \nNRP-metallophores averaged one per genome, several REDgroups, particularly within \nActinomycetota, Proteobacteria, and Cyanobacteria, exhibited higher averages, with some \nexceeding two per genome. 
These results reveal lineage-specific patterns in siderophore \nbiosynthesis and highlight the utility of REDgroups as an alternative to traditional taxonomic \nunits. \nEvolution of Gene Families and Phylogenetic Reconciliation to Uncover the Evolutionary \nHistory of NRP-Metallophores \nTo investigate the evolution and origins of NRP-metallophores, we conducted a detailed \nphylogenetic analysis of each chelator group. Elucidating the evolutionary history of bacterial \ngene families is complicated by gene duplications, horizontal gene transfers (HGTs), and \ndeletions that cause discordance between the bacterial species phylogeny and each chelator \ngene phylogeny. To reconcile the trees, we used the software package eMPRess, which infers \nthe most likely series of duplication, HGT, and deletion events (maximum parsimony \nreconciliation) to reconstruct the evolutionary history of the gene family. 47 We first extracted \nnon-fragmented BGC regions from the GTDB representative genomes, then clustered them with \nBiG-SCAPE to yield 1,108 representative BGCs. From these BGCs, we extracted chelator \nbiosynthesis genes and reconstructed gene trees, which were then compared to the GTDB \nspecies tree with eMPRess.47 \nEstimates for the origins and early HGTs of the chelating groups are presented in the \ncenter of Fig. 5. Reconciliation indicates that the most widespread chelating \ngroups—catechols, hydroxamates, and salicylates—are among the most ancient. Genes for \nproducing 2,3-DHB may have originated in a common ancestor of Actinobacteriota (ca. 
2.7 Ga, \naccording to rough estimates from TimeTree 48) and were then transferred stepwise to \nProteobacteria and to Firmicutes. Salicylate biosynthesis genes have an estimated origin in a \ncommon ancestor of Gammaproteobacteria (ca. 1.9 Ga 48), with early HGT to Cyanobacteria \nand Actinobacteriota. Hydroxamate NRP metallophores appear to have originated in the \ncommon ancestor of Alpha- and Gammaproteobacteria (ca. 2.3 Ga 48) and were transferred into \nActinobacteriota, while Lys-based hydroxamates evolved in Actinobacteriota. The other chelator \ngroups display a more phylum-specific distribution, with HGT predominantly occurring within the \nsame phylum (see Supplemental Dataset, empress_reconciliations). Dmaq is predicted to be \namong the oldest chelating groups and may have been produced by the common ancestor of \nCyanobacteria (ca. 2.7 Ga 48), while the pyoverdine chromophore, exclusively observed within \nthe order Pseudomonadales, likely represents one of the most recent siderophore biosynthetic \npathways (ca. 1.2 Ga 48).  \n \nFigure 5. NRP metallophore biosynthesis across the bacterial kingdom. Center: The Genome \nTaxonomy Database (GTDB) phylogenetic tree (version r207), with strains collapsed to the \nREDgroup level. 46 Numbered circles indicate the most parsimonious origins of chelator \npathways, as determined by reconciliation with eMPRess. 47 The bottom-right legend lists the \nspecific profile hidden Markov models (pHMMs) associated with each estimated origin. Arrows indicate \nancient horizontal gene transfers predicted by eMPRess. Ring A: Phylum divisions. 
Phyla with \ndetected chelating groups are labeled using nomenclature from GTDB r207. Ring B: Chelator \nbiosynthetic pathways detected in at least one member of each REDgroup. Ring C: Average \nnumber of detected NRP metallophore BGC regions per genome for each REDgroup. \nAnnotations were mapped to the phylogenetic tree using iTOL v6.49 \n \nDiscussion  \nTrace metal starvation shapes interactions within microbial communities and between bacteria \nand the host; therefore, natural and synthetic microbiomes cannot be understood without \nknowing the metallophore biosynthetic potential of the community. High-throughput \nbiotechnological applications will benefit from in silico metallophore prediction due to the \nprohibitively high cost of isolation and characterization. To date, distinguishing peptidic \nmetallophore BGCs from other NRPS BGCs has been largely limited to manual expert analyses, \nleading to blind spots in our understanding of microbes and their communities. We have now \nautomated bacterial NRP metallophore prediction by extending the secondary metabolite \nprediction platform antiSMASH to detect genes involved in the biosynthesis of metal chelating \nmoieties, enabling the first global analysis of bacterial metallophore biosynthetic diversity.  \nThe presence of genes encoding catechols, hydroxamates, and other chelating groups \nis one of the most frequently used markers of a metallophore BGC. 
9 We have formalized and \nautomated the identification of eight chelator pathways, allowing us to detect 78% of NRP \nmetallophore BGCs with 97% precision (3% of predicted metallophore BGCs were false positives) \nagainst a manually annotated set of NRPS \nclusters. Biosynthetic genes are detected with custom pHMMs and significance score cutoffs \ncalibrated for accurate metallophore discovery, diminishing the ambiguity of interpreting gene \nannotations, protein families, and BLAST results. We acknowledge that human biases may have \ninfluenced which clusters were coded as putative metallophores during both algorithm \ndevelopment and testing; however, expert manual curation remains the most reliable method for \nNRP metallophore BGC detection. Unfortunately, 22% of manually identifiable metallophore \nBGCs could not be automatically distinguished from other NRPS clusters, as the algorithm \ndeveloped (for the purpose of being easily integrated into antiSMASH) relies on the presence of \none or more known chelator biosynthesis genes colocalized with the NRPS genes.  \nRecently, Crits-Christoph et al. demonstrated the use of transporter families to predict \nthat a BGC encodes siderophore (or metallophore) biosynthesis. 33 On our test dataset, the \nbiosynthesis-based antiSMASH rules outperformed the transporter-based approach (F1 = 0.86 \nversus F1 = 0.69). However, some putative metallophore BGCs were only found using the \ntransporter-based approach, and a combined either/or ensemble approach slightly \noutperformed the antiSMASH rules alone (F1 = 0.90). Biosynthetic- and transporter-based \ntechniques are thus complementary, and future work could incorporate transporter genes into \nantiSMASH metallophore prediction. We note that the reported transporter-based approach \nuses just three pHMMs from Pfam, while our biosynthetic detection requires many custom \npHMMs. 
An extended set of metallophore-specific transporter pHMMs designed according to \nthe same principles as those followed for the biosynthesis-related pHMMs could significantly \nimprove detection by reducing false positives and capturing other families of transporters. The \nNRP metallophore BGCs discovered in this study could serve as a dataset for developing a \nmore comprehensive model for metallophore transporter detection.  \nThe diverse enzyme families responsible for the biosynthesis of NRP metallophore \nchelating groups (Figure 1B) evince that metal chelation has evolved multiple times, and we \nexpect that more NRPS chelator substructures remain undiscovered. In fact, during manuscript \npreparation, the novel chelator 5-aminosalicylate was reported in the structure of the \nPseudonocardia NRP siderophore pseudonochelin, 30 and we found several unexplored clades \nof Fe(II)/α-ketoglutarate-dependent amino acid β-hydroxylases that are likely involved in \nmetallophore biosyntheses (Figure S2). Additionally, we have likely identified a new biosynthetic \npathway in the genome of Sporomusa termitida DSM 4440, which encodes a partial \nmenaquinone pathway in place of a salicylate synthase to putatively produce a novel \nkaramomycin-like metallophore (Figure S3).50 The modular nature of the pHMM-based detection \nrules will allow for new chelating groups to be added as their biosyntheses are experimentally \ncharacterized.  \nMetallophore BGC regions from representative genomes were compared to reference \nBGCs and organized into gene cluster families (GCFs) with BiG-SCAPE (Figure 3). 
We found \n1,093 metallophore BGC regions that were dissimilar from any reference BGC, and almost 500 \ndistinct BGC regions were found in only a single strain. Although significant biosynthetic \ndiversity remains undiscovered, cluster de-replication will become increasingly important to \navoid re-isolating known compounds. We also assessed the taxonomic distribution of \nNRP-metallophore BGC regions by mapping their presence onto a GTDB REDgroup phylogeny. \nWe found that Cyanobacteria and Myxococcota were underrepresented in our analyses due to a \nrelatively low number of published genomes. Considering that only a handful of NRP \nmetallophores have been isolated from these two phyla, we suggest that Cyanobacteria and \nMyxococcota deserve coordinated efforts of genomic sequencing and experimental work to \nfurther characterize their metallophore diversity. \n Finally, we used our dataset of detected BGCs and paired taxonomic data from GTDB to \ninvestigate the complex evolutionary history of chelating group biosynthesis by reconstructing \nthe most likely origin and major HGT events for each pathway with eMPRess (Fig. 5). 47 \nCatechols, hydroxamates, and salicylates were among the most widespread and ancient \nchelators with evidence of HGT between phyla. This widespread distribution suggests \nsignificant ecological relevance for these chelators in diverse bacterial lineages. Intriguingly, our \ntimeline estimates place the origin of 2,3-DHB and Dmaq prior to the Great Oxygenation Event \n(~2.4–2.1 Ga), during an era of abundant, soluble ferrous iron. 
This result lends credence to \nthe hypothesis that chelating molecules first evolved as metal detoxification mechanisms and \nwere repurposed when oxidized iron became scarce. 3 Tracing ancient evolutionary events, \nparticularly those involving multiple gene gains and losses, remains challenging due to the \nexponential increase in complexity as the number of possible events grows. More detailed \nexaminations dedicated to each individual chelating group may yield deeper insights into the \ncomplex evolutionary history of these pathways. For example, the origin of hydroxamates must \nconsider the homologous enzymes in NRPS-independent siderophore pathways, and we cannot \nyet state if metallophore-specific β-OHAsp biosynthesis is polyphyletic due to repeated \nincorporation into metallophores or a single incorporation followed by repeated transfer into \nnon-chelating roles. Nevertheless, this study represents, to our knowledge, the first attempt at a \nlarge-scale phylogenetic analysis of the origin of chelating groups in bacteria.  \nBy integrating chelator detection into antiSMASH, we have taken a major step towards \naccurate, automated NRP metallophore BGC detection. The new strategy affords a clear \npractical improvement over manual curation, and has already allowed for the high-throughput \nidentification of thousands of likely NRP metallophore BGC regions, both in this study and in \nseveral other recently published analyses that have been enabled by early availability of our \nmethodology.51,52 A future antiSMASH module might predict metallophore activity more \naccurately with a machine learning algorithm that considers multiple forms of genomic evidence, \nincluding the presence of transporter genes, NRPS domain architecture and sequence, \nmetal-responsive regulatory elements, and other markers of metallophore biosynthesis that are \nstill limited to manual inspection. 
9 In particular, regulatory elements will likely be required to \naccurately distinguish siderophores, zincophores, and other classes of metallophores. \nImplementation of NRP metallophore BGC detection into antiSMASH will enable scientists of \ndiverse expertise to identify and quantify NRP metallophore biosynthetic pathways in their \nbacterial genomes of interest and promote large-scale investigations into the chemistry and \nbiology of metallophores. \n \nMethods \nFor all software, default parameters were used unless otherwise specified. All python, R, and \nbash scripts used in this paper, as well as underlying data, are available in the Supplemental \nDataset, published to Zenodo: 10.5281/zenodo.16581519.  \nProfile hidden Markov model construction \nProfile hidden Markov models (pHMMs) were built from biosynthetic genes in known \nmetallophore pathways, supplemented with putative BGC genes where required (Figure S1 and \nSupplemental Dataset, 1_development/). Amino acid sequences were aligned with MUSCLE \n(v3.8)53 and pHMMs were constructed using hmmbuild (HMMER3). 54 pHMMs were tested \nagainst the MIBiG database (v2.0)55 and an additional 37 NRP siderophore BGCs from literature \n(Supplemental Table 1) using hmmsearch (HMMER3). Rough bitscore significance cutoffs were \ndetermined for each pHMM. More precise cutoffs were assigned by testing against 28,688 \nNRPS BGC regions from the antiSMASH database (v3). 56 BGC regions containing genes near \nthe rough cutoff were manually inspected to determine if these were likely metallophore BGCs. 
\nIf no clear bitscore cutoff could be discerned, representative low-scoring putative true hits were \nadded to the pHMM seed alignment. This process was repeated until a precise bitscore cutoff \ncould be determined.  \nPhylogenetic analysis of Asp and His β-hydroxylases \nAdequately performing pHMMs for Asp and His β-hydroxylase subtypes could not be \nconstructed using the above method. Siderophore β-hydroxylase functional subtypes were \npreviously shown to form distinct phylogenetic clades.22 An expanded phylogenetic analysis was \nperformed to serve as a guide for pHMM construction (Supplemental Dataset, \n1_development/hydroxylase_tree/). NRPS BGC regions from the antiSMASH database (v3) \nwere scanned for matches to previously reported β-hydroxylase pHMMs 22 and Pfam pHMMs for \nsiderophore-related transporters (PF00593, PF01032, and PF01497 33,57) using a modified \nversion of antiSMASH v6.0. β-Hydroxylase genes meeting a relaxed bitscore cutoff of 300 (1,070 \ntotal) were dereplicated with the CD-HIT web server 58 and a sequence identity cutoff of 70%, giving \n425 representative amino acid sequences. A multiple sequence alignment was created using \nhmmalign (HMMER3) and the TauD Pfam (PF02668), 57 and a maximum-likelihood phylogenetic \ntree was reconstructed with IQ-TREE (multicore v2.2.0-beta) 59 using the WAG+F+I+G4 \nevolutionary model. The presence of nearby transporters was mapped onto the phylogenetic \ntree to identify clades or paraphyletic groups putatively involved in siderophore biosynthesis. 
\nSequences in groups corresponding to previously reported TBH_Asp, IBH_Asp, and IBH_His \nsubtypes and the novel putative CyanoBH_Asp1 and CyanoBH_Asp2 subtypes were extracted, \nand pHMMs were constructed and tested as described above. \nIncorporation into antiSMASH \nThe pHMMs and cutoffs were added to antiSMASH as a single detection rule called \n“NRP-metallophore” with the following logic: \nVibH_like or Cy_tandem or  \n(cds(Condensation and AMP-binding) and ( \n(IBH_Asp and not SBH_Asp) or IBH_His or TBH_Asp or \n    CyanoBH_Asp1 or CyanoBH_Asp2 or  \nIPL or SalSyn or (EntA and EntC) or \n(GrbD and GrbE) or (FbnL and FbnM) or PvdO or PvdP or \n(Orn_monoox and not (KtzT or MetRS-like)) or \nLys_monoox or VbsL)) \nManual validation \nRefSeq representative bacterial genomes were dereplicated at the genus level using R, \nrandomly selecting one genome for each of the 330 genera determined by GTDB \n(Supplemental Dataset, 3_manual_testing/). All NRPS BGC regions in the genomes were \nannotated with antiSMASH v6.1, yielding 758 BGC regions in the final testing dataset \n(Supplemental Table 2). The antiSMASH output for each BGC was manually inspected for \nevidence of NRP metallophore production, including genes encoding transporters, iron \nreductases, chelator biosynthesis, and known metallophore NRPS domain motifs. The same \n758 BGC regions were classified as NRP metallophores using the chelator-based strategy \ndescribed above, as well as the transporter-based strategy of Crits-Christoph et al. 33 Genes \nmatching Pfam pHMMs for siderophore-related transporters (PF00593, PF01032, and \nPF01497 33,57) were identified using a modified version of antiSMASH v6.1, and BGC regions \nwere classified as metallophores if two of the three transporter families were present. 33 Each \nputative false positive was re-investigated before performance statistics were calculated, \nresulting in the reannotation of four BGCs. 
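The detection logic and the two classification strategies compared above can be illustrated with a short sketch. This is not the antiSMASH implementation: it treats a BGC region as a flat set of pHMM hit names and ignores genomic distances and bitscore cutoffs. It assumes that all chelator terms inside the inner parentheses are alternatives joined by `or`, and that "two of the three transporter families" means at least two. The function names are hypothetical.

```python
from typing import Iterable

# Pfam transporter families used by the transporter-based strategy
TRANSPORTER_PFAMS = {"PF00593", "PF01032", "PF01497"}


def chelator_rule(hits: Iterable[str], has_nrps_cds: bool) -> bool:
    """Simplified reading of the "NRP-metallophore" rule.

    hits: names of pHMMs scoring above their cutoffs in the region.
    has_nrps_cds: True if one CDS carries both a Condensation and an
        AMP-binding domain (the cds(...) term of the rule).
    """
    h = set(hits)
    # These two profiles are sufficient on their own.
    if "VibH_like" in h or "Cy_tandem" in h:
        return True
    chelator = (
        ("IBH_Asp" in h and "SBH_Asp" not in h)
        or "IBH_His" in h or "TBH_Asp" in h
        or "CyanoBH_Asp1" in h or "CyanoBH_Asp2" in h
        or "IPL" in h or "SalSyn" in h
        or {"EntA", "EntC"} <= h
        or {"GrbD", "GrbE"} <= h
        or {"FbnL", "FbnM"} <= h
        or "PvdO" in h or "PvdP" in h
        or ("Orn_monoox" in h and not ({"KtzT", "MetRS-like"} & h))
        or "Lys_monoox" in h or "VbsL" in h
    )
    return has_nrps_cds and chelator


def transporter_rule(pfam_hits: Iterable[str]) -> bool:
    """Call a metallophore when at least two of the three families are present."""
    return len(TRANSPORTER_PFAMS & set(pfam_hits)) >= 2


def ensemble_rule(hits, has_nrps_cds, pfam_hits) -> bool:
    """Either/or combination of the two strategies."""
    return chelator_rule(hits, has_nrps_cds) or transporter_rule(pfam_hits)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical region: EntA/EntC pair next to an NRPS gene, one transporter family
print(chelator_rule({"EntA", "EntC"}, has_nrps_cds=True))
print(transporter_rule({"PF00593"}))
print(f"F1 from 97% precision and 78% recall: {f1_score(0.97, 0.78):.2f}")
```

With the precision and recall reported for the chelator-based rules, the harmonic mean evaluates to approximately 0.86, consistent with the F1 value given for the antiSMASH rules in the Discussion.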
\nBiG-SCAPE clustering \nNRP metallophore BGC regions from RefSeq representative genomes (Supplemental Dataset, \n2_refseq_reps_results/metallophores_Jun25.tar.gz) were filtered to remove clusters on \ncontig edges. The resulting 2,523 BGC regions, as well as 78 previously reported BGCs \n(Supplemental Table 2), were clustered using BiG-SCAPE v1.1.2 with the following settings: \n“--no_classify --mix --cutoffs 0.3 0.4 0.5 --clans-off ”. The network \n(Supplemental Dataset, 6_bigscape/mix_c0.50.network) was imported to Cytoscape for \nfigure preparation. \n \nPhylogenetic mapping \nGenome mining was performed on 62,291 GTDB representative genomes (59,851 after filtering; \nversion r207) 44 using antiSMASH v7.0beta, 13,44 with the inclusion of the NRP metallophore \ndetection module. The outputs were analysed to identify predicted NRP-metallophore producers \nand categorized into distinct chelator groups based on predefined detection criteria. A total of \n5,366 NRP-metallophores were identified, representing approximately 14% of all detected \nNRPS regions. To map the distribution patterns of these producers, the results were integrated \nwith the GTDB tree. Due to the size of the tree, visualization tools such as iTOL 49 were \nimpractical, prompting dereplication to a higher taxonomic rank. 
The GTDB tree was collapsed \nto the REDgroup level—a phylogenetically defined rank analogous to genera—allowing \nnormalization to reflect the average number of NRP-metallophore biosynthetic gene clusters \n(BGCs) per genome within each REDgroup.46  \nTo uncover the evolutionary history of siderophore biosynthesis, phylogenetic analyses \nand reconciliation were performed. Gene sequences for each chelator group were extracted \nfrom 4,060 complete BGCs, filtered to exclude clusters located on contig edges, and clustered \ninto Gene Cluster Families (GCFs) using BiG-SCAPE 60 with a 0.5 cutoff. From each GCF, one \nrepresentative BGC was selected, resulting in a dataset of 1,108 clusters. Multiple sequence \nalignments (MSAs) were generated using MAFFT v7, 61 and phylogenetic trees were \nconstructed using FastTree 2 with the WAG model. 62 Evolutionary events, including gene \nduplication, loss, and horizontal gene transfer, were identified using phylogenetic reconciliation \nin eMPRess47 by comparing gene trees to species trees. Reconciliation results were annotated \nusing iTOL v6 49 for visualization, manually mapping key evolutionary events onto the GTDB \ntree. Individual gene tree reconciliations are available in the Supplemental Dataset.  \n \nData availability statement \nAll python, R, and bash scripts used in this paper, as well as underlying data, are available in the \nSupplemental Dataset, published to Zenodo: 10.5281/zenodo.16581519. The enterobactin, \nmarinobactin, and ornicorrugatin BGCs have been submitted to the MIBiG repository with \naccession numbers BGC0003172, BGC0003173, BGC0003174, respectively. 
\n \nConflict of interest statement \nThe authors declare the following financial interests/personal relationships that may be \nconsidered as potential competing interests: M.H.M. is a member of the Scientific Advisory \nBoard of Hexagon Bio. \n \nAcknowledgements \nThis project has received funding from the European Research Council under the European \nUnion’s Horizon 2020 research and innovation programme (Starting Grant 948770-DECIPHER; \nZR and MM), as well as from the US National Science Foundation (CHE-2108596; AB). BT and \nNZ were supported by H2020-FNR-11-2020: SECRETED—Grant agreement: 101000794. NZ \nwas supported by the German Center for Infection Research TTU09.717. This work was \nsupported by the Office of Naval Research Award Number N00014-23-2197. We thank Allegra \nAron for providing useful feedback on the manuscript. \n \nReferences \n1. Hider, R. C. & Kong, X. Chemistry and biology of siderophores. Nat. Prod. Rep. 27, \n637–657 (2010). \n2. Kraemer, S. M., Duckworth, O. W., Harrington, J. M. & Schenkeveld, W. D. C. \nMetallophores and Trace Metal Biogeochemistry. Aquat. Geochem. 21, 159–195 (2015). \n3. Kramer, J., Özkaya, Ö. & Kümmerli, R. Bacterial siderophores in community and host \ninteractions. Nat. Rev. Microbiol. 18, 152–163 (2020). \n4. Soares, E. V. Perspective on the biotechnological production of bacterial siderophores and \ntheir use. Appl. Microbiol. Biotechnol. 106, 3985–4004 (2022). \n5. Gu, S. et al. Competition for iron drives phytopathogen control by natural rhizosphere \nmicrobiomes. Nat Microbiol 5, 1002–1010 (2020). \n6. Gu, S. et al. Siderophore-Mediated Interactions Determine the Disease Suppressiveness of \nMicrobial Consortia. mSystems 5, e00811–19 (2020). \n7. Behnsen, J. et al. Siderophore-mediated zinc acquisition enhances enterobacterial \ncolonization of the inflamed gut. Nat. Commun. 12, 7016 (2021). \n8. Mehdiratta, K. et al. 
Kupyaphores are zinc homeostatic metallophores required for \ncolonization of Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. U. S. A. 119, (2022). \n9. Reitz, Z. L. & Medema, M. H. Genome mining strategies for metallophore discovery. Curr. \nOpin. Biotechnol. 77, 102757 (2022). \n10. Cézard, C., Farvacques, N. & Sonnet, P. Chemistry and biology of pyoverdines, \nPseudomonas primary siderophores. Curr. Med. Chem. 22, 165–186 (2015). \n11. Süssmuth, R. D. & Mainz, A. Nonribosomal Peptide Synthesis-Principles and Prospects. \nAngew. Chem. Int. Ed Engl. 56, 3770–3821 (2017). \n12. Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. \nNucleic Acids Res. 49, W29–W35 (2021). \n13. Blin, K. et al. antiSMASH 7.0: new and improved predictions for detection, regulation, \nchemical structures and visualisation. Nucleic Acids Res. 51, W46–W50 (2023). \n14. Raymond, K. N., Dertz, E. A. & Kim, S. S. Enterobactin: an archetype for microbial iron \ntransport. Proc. Natl. Acad. Sci. U. S. A. 100, 3584–3588 (2003). \n15. Serino, L. et al. Structural genes for salicylate biosynthesis from chorismate in \nPseudomonas aeruginosa. Mol. Gen. Genet. 249, 217–228 (1995). \n16. Pelludat, C., Brem, D. & Heesemann, J. Irp9, encoded by the high-pathogenicity island of \nYersinia enterocolitica, is able to convert chorismate into salicylate, the precursor of the \nsiderophore yersiniabactin. J. Bacteriol. 185, 5648–5653 (2003). \n17. Keating, T. A., Marshall, C. G., Walsh, C. T. & Keating, A. E. The structure of VibH \nrepresents nonribosomal peptide synthetase condensation, cyclization and epimerization \ndomains. 
Nat. Struct. Biol. 9, 522–526 (2002). \n18. Reitz, Z. L. & Butler, A. Precursor-directed biosynthesis of catechol compounds in \nAcinetobacter bouvetii DSM 14964. Chem. Commun.  56, 12222–12225 (2020). \n19. Bloudoff, K., Fage, C. D., Marahiel, M. A. & Schmeing, T. M. Structural and mutational \nanalysis of the nonribosomal peptide synthetase heterocyclization domain provides insight \ninto catalysis. Proceedings of the National Academy of Sciences 114, 95–100 (2017). \n20. Olucha, J. & Lamb, A. L. Mechanistic and structural studies of the N-hydroxylating \nflavoprotein monooxygenases. Bioorg. Chem. 39, 171–177 (2011). \n21. Heemstra, J. R., Walsh, C. T. & Sattely, E. S. Enzymatic Tailoring of Ornithine in the \nBiosynthesis of the Rhizobium Cyclic Trihydroxamate Siderophore Vicibactin. J. Am. Chem. \nSoc. 131, 15317–15329 (2009). \n22. Reitz, Z. L., Hardy, C. D., Suk, J., Bouvet, J. & Butler, A. Genomic analysis of siderophore \nβ-hydroxylases reveals divergent stereocontrol and expands the condensation domain \nfamily. Proc. Natl. Acad. Sci. U. S. A. 116, 19805–19814 (2019). \n23. Galica, T. et al. Cyanochelins, an Overlooked Class of Widely Distributed Cyanobacterial \nSiderophores, Discovered by Silent Gene Cluster Awakening. Appl. Environ. Microbiol. 87, \ne0312820 (2021). \n24. Hermenau, R. et al. Genomics-Driven Discovery of NO-Donating Diazeniumdiolate \nSiderophores in Diverse Plant-Associated Bacteria. Angew. Chem. Int. Ed Engl. 58, \n13024–13029 (2019). \n25. Makris, C., Carmichael, J. R., Zhou, H. & Butler, A. C-Diazeniumdiolate Graminine in the \nSiderophore Gramibactin Is Photoreactive and Originates from Arginine. ACS Chem. Biol. \n17, 3140–3147 (2022). \n26. Vinnik, V. et al. Structural and Biosynthetic Analysis of the Fabrubactins, Unusual \nSiderophores from Agrobacterium fabrum Strain C58. ACS Chem. Biol. 16, 125–135 \n(2021). \n27. Nadal-Jimenez, P. et al. 
PvdP is a tyrosinase that drives maturation of the pyoverdine \nchromophore in Pseudomonas aeruginosa. J. Bacteriol. 196, 2681–2690 (2014). \n28. Ringel, M. T., Dräger, G. & Brüser, T. PvdO is required for the oxidation of \ndihydropyoverdine as the last step of fluorophore formation in Pseudomonas fluorescens. J. \nBiol. Chem. 293, 2330–2341 (2018). \n29. Kage, H., Kreutzer, M. F., Wackler, B., Hoffmeister, D. & Nett, M. An iterative type I \npolyketide synthase initiates the biosynthesis of the antimycoplasma agent micacocidin. \nChem. Biol. 20, 764–771 (2013). \n30. Zhang, F. et al. Genome Mining and Metabolomics Unveil Pseudonochelin: A Siderophore \nContaining 5-Aminosalicylate from a Marine-Derived Pseudonocardia sp. Bacterium. Org. \nLett. 24, 3998–4002 (2022). \n31. Wu, Q. et al. Metabolomics and Genomics Enable the Discovery of a New Class of \nNonribosomal Peptidic Metallophores from a Marine Micromonospora. J. Am. Chem. Soc. \n145, 58–69 (2023). \n32. Heather, Z. et al. A novel streptococcal integrative conjugative element involved in iron \nacquisition. Mol. Microbiol. 70, 1274–1292 (2008). \n33. Crits-Christoph, A., Bhattacharya, N., Olm, M. R., Song, Y. S. & Banfield, J. F. Transporter \ngenes in biosynthetic gene clusters predict metabolite characteristics and siderophore \nactivity. Genome Res. 31, 239–250 (2020). \n34. Jaremko, M. J., Davis, T. D., Corpuz, J. C. & Burkart, M. D. Type II non-ribosomal peptide \nsynthetase proteins: structure, mechanism, and protein-protein interactions. Nat. Prod. Rep. \n37, 355–379 (2020). \n35. Mihara, K. et al. 
Identification and transcriptional organization of a gene cluster involved in \nbiosynthesis and transport of acinetobactin, a siderophore produced by Acinetobacter \nbaumannii ATCC 19606T. Microbiology 150, 2587–2597 (2004). \n36. González Carreró, M. I., Sangari, F. J., Agüero, J. & García Lobo, J. M. Brucella abortus \nstrain 2308 produces brucebactin, a highly efficient catecholic siderophore. Microbiology \n148, 353–360 (2002). \n37. Blin, K., Medema, M. H., Kottmann, R., Lee, S. Y. & Weber, T. The antiSMASH database, a \ncomprehensive database of microbial secondary metabolite biosynthetic gene clusters. \nNucleic Acids Res. 45, D555–D559 (2017). \n38. Gross, H. & Loper, J. E. Genomics of secondary metabolite production by Pseudomonas \nspp. Nat. Prod. Rep. 26, 1408–1446 (2009). \n39. Matthijs, S., Budzikiewicz, H., Schäfer, M., Wathelet, B. & Cornelis, P. Ornicorrugatin, a \nNew Siderophore from Pseudomonas fluorescens AF76. Zeitschrift für Naturforschung C \n63, (2008). \n40. Uría Fernández, D., Geoffroy, V., Schäfer, M., Meyer, J.-M. & Budzikiewicz, H. Bacterial \nconstituents CXIII structure revision of several pyoverdins produced by plant-growth \npromoting and plant-deleterious Pseudomonas species. Monatsh. Chem. 134, 1421–1431 \n(2003). \n41. Matthijs, S. et al. Pyoverdine and histicorrugatin-mediated iron acquisition in Pseudomonas \nthivervalensis. Biometals (2016) doi:10.1007/s10534-016-9929-1. \n42. Martinez, J. S. & Butler, A. Marine amphiphilic siderophores: marinobactin structure, \nuptake, and microbial partitioning. J. Inorg. Biochem. 101, 1692–1698 (2007). \n43. Winkelmann, G., Cansier, A., Beck, W. & Jung, G. HPLC separation of enterobactin and \nlinear 2,3-dihydroxybenzoylserine derivatives: a study on mutants of Escherichia coli \ndefective in regulation (fur), esterase (fes) and transport (fepA). Biometals 7, 149–154 \n(1994). \n44. Parks, D. H. et al. 
GTDB: an ongoing census of bacterial and archaeal diversity through a \nphylogenetically consistent, rank normalized and complete genome-based taxonomy. \nNucleic Acids Res. 50, D785–D794 (2022). \n45. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny \nsubstantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018). \n46. Gavriilidou, A. et al. Compendium of specialized metabolite biosynthetic diversity encoded \nin bacterial genomes. Nat. Microbiol. 7, 726–735 (2022). \n47. Santichaivekin, S. et al. eMPRess: a systematic cophylogeny reconciliation tool. \nBioinformatics 37, 2481–2482 (2021). \n48. Marin, J., Battistuzzi, F. U., Brown, A. C. & Hedges, S. B. The timetree of prokaryotes: New \ninsights into their evolution and speciation. Mol. Biol. Evol. 34, 437–446 (2017). \n49. Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic \ntree display and annotation tool. Nucleic Acids Res. 52, W78–W82 (2024). \n50. Shaaban, K. A. et al. Karamomycins A-C: 2-Naphthalen-2-yl-thiazoles from Nonomuraea \nendophytica. J. Nat. Prod. 82, 870–877 (2019). \n51. Mohite, O. S. et al. Pangenome mining of the Streptomyces genus redefines their \nbiosynthetic potential. bioRxiv 2024.02.20.581055 (2024) doi:10.1101/2024.02.20.581055. \n52. Jørgensen, T. S. et al. A treasure trove of 1034 actinomycete genomes. Nucleic Acids Res. \n52, 7487–7503 (2024). \n53. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high \nthroughput. Nucleic Acids Res. 32, 1792–1797 (2004). \n54. Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput. Biol. 
7, e1002195 (2011). \n55. Kautsar, S. A. et al. MIBiG 2.0: a repository for biosynthetic gene clusters of known \nfunction. Nucleic Acids Res. 48, D454–D458 (2020). \n56. Blin, K., Shaw, S., Kautsar, S. A., Medema, M. H. & Weber, T. The antiSMASH database \nversion 3: increased taxonomic coverage and new query features for modular enzymes. \nNucleic Acids Res. 49, D639–D643 (2021). \n57. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, \nD427–D432 (2019). \n58. Huang, Y., Niu, B., Gao, Y., Fu, L. & Li, W. CD-HIT Suite: a web server for clustering and \ncomparing biological sequences. Bioinformatics 26, 680–682 (2010). \n59. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective \nstochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, \n268–274 (2015). \n60. Navarro-Muñoz, J. C. et al. A computational framework to explore large-scale biosynthetic \ndiversity. Nat. Chem. Biol. 16, 60–68 (2020). \n61. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: \nimprovements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013). \n62. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2--approximately maximum-likelihood \ntrees for large alignments. PLoS One 5, e9490 (2010).","source_license":"CC-BY-4.0","license_restricted":false}