iDOM: Statistical analysis of dissolved organic matter based on high-resolution mass spectrometry

Method Article

Fanfan Meng, Ang Hu, Kyoung-Soon Jang, Jianjun Wang

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4660944/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 14 Apr, 2025 in mLife. This is preprint version 1.

Abstract

Dissolved organic matter (DOM) is a complex mixture of thousands of molecules and plays crucial roles in aquatic and terrestrial ecosystems. The study of DOM has been advanced and accelerated by developments in instrumental and statistical approaches over the last decade. Due to the complexity of molecular data and the underlying ecological mechanisms, statistical analysis, visualization, and theoretical interpretation remain substantial challenges.
Here, we developed an R package, iDOM, with functions for basic and advanced statistical analyses and for the visualization of DOM data derived from Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS). The iDOM package can handle various DOM data types, including molecular compositional data, molecular traits, and unclassified molecules (that is, dark matter). It integrates additional explanatory data types, such as environmental and microbial data, to explore the interactions of DOM with abiotic and biotic drivers. To illustrate its use, we present case studies with an example dataset of DOM under experimental warming. The case studies cover basic functions for molecular trait calculation, molecular class assignment, and compositional analyses of chemical diversity and dissimilarity. They also cover advanced functions for DOM assemblages, such as quantifying and exploring their assembly processes, the effects of dark matter on their ecological networks, and the associations between DOM and microbes under warming. We expect that iDOM will serve as a comprehensive pipeline for DOM statistical analyses and bridge the gap between chemical characterization and ecological interpretation.

Keywords: R package, Dissolved organic matter, Statistical analysis, FT-ICR MS

Introduction

Dissolved organic matter (DOM) is a large and complex mixture of thousands of molecules that play crucial roles in biogeochemical cycles (Thurman 2012). The high heterogeneity of DOM composition has long been a great challenge for understanding its reactivity, fate, and functional significance (Cooper et al. 2022, Ruan et al. 2023). However, DOM studies have recently made substantial progress owing to advances in high-resolution mass spectrometry and statistical approaches.
For instance, Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was first applied to the molecular characterization of humic and fulvic acids from the Suwannee River (Fievre et al. 1997). In the last decade, the use of FT-ICR MS for DOM analysis has increased dramatically, with the number of relevant publications growing by approximately 21 per year during 2014–2023 (Fig. S1). Compared to conventional bulk measurements, such as those based on absorbance and fluorescence spectroscopy (Kellerman et al. 2015), FT-ICR MS provides more detailed information on the elemental composition and structural features of individual DOM molecules across global environments. Statistical approaches and the associated graphical methods have also advanced, for example the modified aromaticity index (AImod) (Koch and Dittmar 2006) and the van Krevelen diagram (Kim et al. 2003). These tools are widely applied to quantify the structural features of DOM molecules and to visualize molecular composition. Additionally, DOM studies have progressed by integrating concepts and tools from community ecology, including diversity metrics from functional ecology (Kellerman et al. 2014), ecological processes from metacommunity ecology (Danczak et al. 2020, Hu et al. 2022b), and ecological networks from community ecology (Hu et al. 2022a, Hu et al. 2023). For instance, the chemodiversity of DOM in different lake systems was quantified using the Chao 1 diversity index to understand the effects of climate and hydrology (Kellerman et al. 2014). The assembly of DOM composition can be partitioned into deterministic processes, such as environmental selection, and stochastic processes, such as ecological drift and dispersal, based on ecological null models (Hu et al. 2022b).
The interactions among DOM molecules, including those between assigned molecules and unclassified molecules (that is, dark matter), and the associations between molecules and microbes can be quantified using DOM co-occurrence networks and DOM-microbe bipartite networks, respectively (Hu et al. 2022a, Hu et al. 2023). So far, a few open-source R packages (Bramer et al. 2020) and pipelines (Ayala-Ortiz et al. 2023) have been developed for analyzing and visualizing FT-ICR MS data. However, no software is available that integrates basic statistical analyses with the advanced analyses mentioned above.

Here, we developed an R package, iDOM, to bridge the current gap in the analysis of FT-ICR MS data (Fig. 1). iDOM is a multifunctional tool that facilitates basic analyses, such as the calculation of molecular traits, the assignment of molecular classes, and the evaluation of chemical diversity and dissimilarity. It also includes functions for advanced analyses that quantify the assembly processes of DOM assemblages (Hu et al. 2022b), the effect of molecular dark matter on DOM molecular interactions (Hu et al. 2023), and the associations between DOM molecules and microbial taxa (Hu et al. 2022a). Additionally, iDOM includes visualization functions, such as van Krevelen diagrams, which display FT-ICR MS data by molecular H/C and O/C ratios in a two-dimensional diagram, and elemental composition plots, which represent the relative abundances of different molecular classes. Finally, we illustrate the application of the iDOM package using an example dataset of DOM in microcosm sediments under experimental warming. The experimental microcosms contained a common sterilized sediment inoculated with different microbial communities from lake sediments in two contrasting climate zones, and were incubated for one month under temperature gradients ranging from 5 to 30°C.
Description of functions in the R package

The iDOM package is written in the R scientific computing language and relies on the R packages “vegan”, “iCAMP”, “SpiecEasi”, and “ftmsRanalysis”. The currently implemented functions are listed in Table 1. The functions in iDOM are designed to process complex matrices, including molecular composition data and molecular trait data of DOM assemblages. Additionally, these functions can integrate other types of data that affect DOM, such as environmental variables and microbial data (Fig. 1).

Table 1. The functions of the R package “iDOM”

| Type | Function | Description |
| --- | --- | --- |
| Molecular properties and class assignment | molTrait() | Computes various molecular traits, such as molecular weight, stoichiometry, chemical structure, energy content, and oxidation state |
| Molecular properties and class assignment | molTrans() | Estimates putative biochemical transformations for each molecule |
| Molecular properties and class assignment | molGroup() | Partitions DOM molecules into four fractions based on two orthogonal trait dimensions of molecular reactivity and activity: labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive |
| Diversity and dissimilarity of DOM | commTD() | Calculates selected types of taxonomic diversity and evenness measures |
| Diversity and dissimilarity of DOM | commFD() | Calculates selected types of functional diversity measures, such as Rao’s quadratic entropy |
| Diversity and dissimilarity of DOM | commDendro() | Generates three relational metabolite dendrograms based on molecular traits and putative biochemical transformations |
| Diversity and dissimilarity of DOM | commDD() | Calculates selected types of dendrogram-based diversity measures (based on metabolite dendrograms) |
| Diversity and dissimilarity of DOM | commDis() | Calculates the differences in DOM compositions between samples |
| Community assembly of DOM | commProc() | Assesses how deterministic and stochastic processes influence DOM assemblages |
| The effect of DOM dark matter | iDME() | Assesses the effect of dark matter on DOM assemblages (based on intraspecific interactions) |
| Microbial mechanisms | H2() | Calculates the network-level specialization in DOM-microbe bipartite networks (based on interspecific interactions) |
| Visualization | plotVK() | Generates van Krevelen diagrams using the O/C and H/C ratios of elemental formulas |
| Visualization | plotRA() | Plots the relative abundance of different molecular groups |

The functions of iDOM can be grouped into four aims. The first aim is to use molecular compositional data and trait data to describe molecular traits, classify groups of molecules based on their traits, and calculate the relative abundance of these groups. The second aim is to integrate environmental variables to describe the distribution of diversity and dissimilarity and to explain the community assembly of DOM molecules along environmental gradients or across spatial scales. The third aim is to incorporate unknown molecular data to assess the effect of DOM dark matter on whole DOM assemblages. The fourth aim is to include microbial data to evaluate DOM-microbe associations and, further, the microbial mechanisms influencing the production and degradation of DOM molecules.

Datasets

The iDOM package provides example datasets of DOM under experimental warming. These datasets are derived from a laboratory microcosm experiment that used sterilized Taihu Lake sediments as the organic carbon source, with distinct microbial communities inoculated from lake sediments in China's subtropical and temperate climate zones (Hu et al. 2024). The microcosms were incubated in the dark for one month at six temperature levels (5, 10, 15, 20, 25, and 30°C), with each temperature treatment replicated three times, resulting in a total of 36 samples across the two climate zones. Five example datasets are used: mol.data, mol.trait, envi, mol.dark.matter, and micro.data. The datasets mol.data and mol.trait include the intensities of 5,474 assigned molecular formulae (referred to as “molecules” hereafter), out of a total of 11,253 peaks, and their corresponding molecular traits across the 36 samples.
The dataset envi contains experimental variables, such as incubation temperature, and provides the metadata needed to analyze DOM across environmental conditions. The dataset mol.dark.matter comprises the intensities of 5,779 uncharacterized molecules, offering additional insight into the molecular composition of DOM. The dataset micro.data contains the relative abundance of 463 bacterial genera across the samples and can be used to investigate the interactions between DOM and microbes.

Examples of FTICR-MS datasets

Calculation of molecular traits and class assignment of molecules

The R package iDOM provides the molTrait function to calculate molecular properties, the molTrans function to estimate putative biochemical transformations, and the molGroup function to classify groups of DOM molecules based on these traits. The function molTrait calculates chemical characteristics of molecules related to molecular weight, stoichiometry, chemical structure, and oxidation state (Table S1). These traits are mass, the number of carbon atoms (C), Kendrick defect (kdefectCH2), O/C ratio, H/C ratio, N/C ratio, P/C ratio, S/C ratio, the modified aromaticity index (AImod), double bond equivalent (DBE), DBE minus oxygen (DBE-O), DBE minus AI (DBE-AI), standard Gibbs free energy (GFE), nominal oxidation state of carbon (NOSC), and carbon use efficiency (Ymet) (Hughey et al. 2001, Koch and Dittmar 2006, LaRowe and Van Cappellen 2011, Koch and Dittmar 2016, Song et al. 2020). Furthermore, the function molTrans estimates putative biochemical transformations for each molecule by aligning mass differences between molecules to a database of known transformations (Danczak et al. 2020). The molGroup function classifies DOM assemblages into different groups based on various molecular traits and graphical methods, such as molecular properties, putative biochemical transformations, and van Krevelen diagrams (Kim et al. 2003, Danczak et al. 2020).
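Several of the stoichiometric and structural traits listed for molTrait follow directly from the elemental counts of a formula. The sketch below is a language-neutral illustration in Python, not the iDOM implementation: the `Formula` container and the function names are hypothetical, the formulas for DBE, AImod, and NOSC are the commonly published ones (Koch and Dittmar 2006, 2016; LaRowe and Van Cappellen 2011), and reporting negative AImod values as 0 is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """Elemental counts of a molecular formula (hypothetical container)."""
    C: int
    H: int
    O: int = 0
    N: int = 0
    S: int = 0
    P: int = 0


def ratios(f: Formula) -> dict:
    """Element-to-carbon ratios, e.g. the O/C and H/C axes of a van Krevelen diagram."""
    return {"O/C": f.O / f.C, "H/C": f.H / f.C, "N/C": f.N / f.C,
            "S/C": f.S / f.C, "P/C": f.P / f.C}


def dbe(f: Formula) -> float:
    """Double bond equivalent: 1 + (2C - H + N + P) / 2."""
    return 1 + (2 * f.C - f.H + f.N + f.P) / 2


def ai_mod(f: Formula) -> float:
    """Modified aromaticity index.

    Assumed convention: a non-positive numerator or denominator yields 0.
    """
    num = 1 + f.C - 0.5 * f.O - f.S - 0.5 * (f.N + f.P + f.H)
    den = f.C - 0.5 * f.O - f.N - f.S - f.P
    if num <= 0 or den <= 0:
        return 0.0
    return num / den


def nosc(f: Formula, charge: int = 0) -> float:
    """Nominal oxidation state of carbon for a formula with net charge `charge`."""
    e = -charge + 4 * f.C + f.H - 3 * f.N - 2 * f.O + 5 * f.P - 2 * f.S
    return 4 - e / f.C
```

For glucose, `Formula(C=6, H=12, O=6)`, these functions give H/C = 2, O/C = 1, DBE = 1, and NOSC = 0; for benzene, `Formula(C=6, H=6)`, DBE is 4 and NOSC is -1.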
For instance, molGroup can classify the assigned molecules into the CHO, CHON, CHOS, CHOP, CHONS, CHONP, CHOSP, and CHONSP formula groups based on their elemental composition. Each molecule plotted on a van Krevelen diagram can be related to a class of natural biomolecules (Kim et al. 2003). Molecules in different regions of the diagram can be categorized into distinct classes, such as lipids, proteins, amino sugars, carbohydrates, unsaturated hydrocarbons, condensed aromatics, lignin, and tannins (Sleighter and Hatcher 2007, Hockaday et al. 2009). Recently, a new method was developed to divide molecules into four fractions based on the molecular H/C ratio and the number of putative biochemical transformations, which indicate molecular reactivity and activity, respectively (Hu et al. 2022b). The four fractions are labile-active (H/C ≥ 1.5, transformations > 10), recalcitrant-active (H/C < 1.5, transformations > 10), recalcitrant-inactive (H/C < 1.5, transformations ≤ 1), and labile-inactive (H/C ≥ 1.5, transformations ≤ 1).

For the example dataset, the function molGroup revealed that the CHO and CHON groups consistently exhibited higher relative abundances than the other formula groups across the temperature gradient from 5°C to 30°C. Additionally, condensed aromatics and lignin were consistently dominant throughout the temperature range (Fig. 2a). The function molGroup further classified 4,150 of the 5,474 molecules into four fractions based on molecular reactivity and activity: labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive. Molecular activity provides information beyond molecular reactivity, as shown by the broad overlap of active and inactive molecules on both sides of H/C = 1.5 in the van Krevelen diagram (Fig. 2b). Compared to active molecules, the inactive molecules showed lower relative abundance in both the labile and recalcitrant fractions (Fig. S2).
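The formula groups and the four reactivity-activity fractions described in this section reduce to simple threshold rules, sketched here in Python. This is an illustration rather than the molGroup code: the function names are hypothetical, the recalcitrant-active rule is assumed to mirror the labile-active one (H/C < 1.5, transformations > 10), and leaving molecules with 2 to 10 transformations unassigned is an assumption that would be consistent with only part of the molecules being classified.

```python
from typing import Optional

LABILE_HC = 1.5    # H/C >= 1.5 is treated as labile, otherwise recalcitrant
ACTIVE_MIN = 10    # more than 10 putative transformations: active
INACTIVE_MAX = 1   # at most 1 putative transformation: inactive


def formula_group(n: int = 0, s: int = 0, p: int = 0) -> str:
    """Name the heteroatom group of a C, H, O formula from its N, S, P counts."""
    return ("CHO"
            + ("N" if n > 0 else "")
            + ("S" if s > 0 else "")
            + ("P" if p > 0 else ""))


def fraction(h_to_c: float, n_transformations: int) -> Optional[str]:
    """Assign one of the four fractions, or None if the molecule falls outside them."""
    reactivity = "labile" if h_to_c >= LABILE_HC else "recalcitrant"
    if n_transformations > ACTIVE_MIN:
        activity = "active"
    elif n_transformations <= INACTIVE_MAX:
        activity = "inactive"
    else:
        # Assumed: intermediate molecules (2 to 10 transformations) stay unassigned.
        return None
    return f"{reactivity}-{activity}"
```

For example, `formula_group(n=1, s=1)` returns `"CHONS"`, and `fraction(1.2, 11)` returns `"recalcitrant-active"`.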
Diversity and dissimilarity of DOM

The R package iDOM provides the commTD, commFD, and commDD functions to calculate within-assemblage diversity, and the commDis function to assess between-assemblage compositional differences of DOM molecules. To apply diversity metrics originally designed for ecological species to molecular data, individual compounds are treated as species, and the relative intensities of their peaks represent species abundances. The function commTD calculates taxonomic diversity using the most common indices of α-diversity and evenness. These include molecular richness, which is the number of molecular formulas, and abundance-based metrics such as the Shannon, Gini-Simpson (or Simpson’s), and Chao 1 indices (Kellerman et al. 2014, Li et al. 2018), which are based on molecular richness and relative intensity. The function commFD calculates functional diversity based on molecular traits using Rao’s quadratic entropy, which measures the average abundance-weighted trait difference between any two molecules in a community. Greater trait differences between any two individuals in a community result in higher quadratic entropy (Mentges et al. 2017, Tanentzap et al. 2019). To further understand the relationships among molecules, the function commDendro generates relational molecular dendrograms, analogous to phylogenetic trees, based on molecular properties and putative biochemical transformations. These dendrograms are the molecular characteristics dendrogram (MCD), the transformation-based dendrogram (TD), and the transformation-weighted characteristics dendrogram (TWCD), and they represent shared and divergent molecular traits among molecules. After generating molecular dendrograms, the function commDD calculates dendrogram-based diversity measures, including dendrogram diversity (DD), mean pairwise distance (MPD), and mean nearest taxon distance (MNTD).
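To make the species analogy concrete, the following Python sketch computes richness, Shannon, Gini-Simpson, and Rao's quadratic entropy from a vector of peak intensities. It is a minimal illustration under stated assumptions, not the commTD or commFD code: the function names are hypothetical, and Rao's Q is written as the full double sum over all ordered pairs of molecules (some implementations sum over unordered pairs, which halves the value).

```python
import math
from typing import List, Sequence


def relative_abundance(intensities: Sequence[float]) -> List[float]:
    """Convert peak intensities to relative abundances that sum to 1."""
    total = sum(intensities)
    return [x / total for x in intensities]


def richness(intensities: Sequence[float]) -> int:
    """Number of molecules with non-zero intensity."""
    return sum(1 for x in intensities if x > 0)


def shannon(intensities: Sequence[float]) -> float:
    """Shannon index: -sum(p * ln p) over molecules that are present."""
    return -sum(p * math.log(p) for p in relative_abundance(intensities) if p > 0)


def gini_simpson(intensities: Sequence[float]) -> float:
    """Gini-Simpson index: 1 - sum(p^2)."""
    return 1 - sum(p * p for p in relative_abundance(intensities))


def rao_q(intensities: Sequence[float], dist: Sequence[Sequence[float]]) -> float:
    """Rao's quadratic entropy: sum over i, j of d_ij * p_i * p_j.

    `dist` is a symmetric trait-distance matrix with zeros on the diagonal.
    """
    p = relative_abundance(intensities)
    n = len(p)
    return sum(dist[i][j] * p[i] * p[j] for i in range(n) for j in range(n))
```

With all pairwise trait distances set to 1, `rao_q` under this convention equals the Gini-Simpson index, which is one way to see that Rao's Q generalizes Simpson-type diversity by weighting pairs of molecules by how different their traits are.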
The DD quantifies the total dendrogram branch length occupied by a given molecular assemblage, analogous to Faith’s Phylogenetic Diversity (Faith 1992). Higher DD values indicate molecular assemblages that span a broader range of molecular properties (MCD), a more extensive biochemical transformation network (TD), or both (TWCD). MPD is the average dendrogram distance between molecules, while MNTD is the average dendrogram distance between nearest neighbors (Danczak et al. 2020). As a complement to α-diversity, the function commDis compares DOM compositions between samples by generating a dissimilarity matrix. The dissimilarity metrics include incidence-based Jaccard, abundance-based Bray-Curtis, and dendrogram-based UniFrac dissimilarity. For each dissimilarity matrix, non-metric multidimensional scaling (NMDS) or principal coordinate analysis (PCoA) can subsequently be used to depict the relationships among samples along the first two major axes of variation.

For the example dataset, the functions commTD, commFD, and commDD revealed that DOM molecular diversity showed different correlations with experimental temperature in the temperate and subtropical regions. Molecular richness significantly decreased with rising temperature in the temperate region (P < 0.05). Meanwhile, RaoQ and MNTD significantly decreased with rising temperature in the subtropical region (P < 0.05). Further, the function commDis revealed that molecular composition did not differ significantly between the subtropical and temperate regions (Permutational Multivariate Analysis of Variance, PERMANOVA, P > 0.05) (Fig. 3d).

Community assembly of DOM assemblages

The assembly processes underlying molecular assemblages can be quantified using dendrogram-based β-diversity null modeling (Danczak et al. 2020, Hu et al. 2022b).
The function commProc quantifies the relative influences of deterministic and stochastic processes governing the assembly of DOM assemblages. The function calculates the dendrogram-based β-nearest taxon index (βNTI) to quantify tip-level clustering or overdispersion on a molecular dendrogram, analogous to its application to phylogenetic trees of ecological communities (Stegen et al. 2012, Wang et al. 2013). The βNTI is calculated by comparing the observed β-mean nearest taxon distance (βMNTD) between pairs of local DOM assemblages to a null expectation generated by randomizing the observed dendrogram associations (Danczak et al. 2020). When the comparison between two DOM assemblages deviates significantly from the null expectation (|βNTI| > 2), deterministic processes are likely responsible for the observed pattern. Deterministic processes can lead to divergent molecular composition across local assemblages via “variable selection” (βNTI > 2) or to convergent molecular composition via “homogeneous selection” (βNTI < -2) (Stegen et al. 2015). Conversely, if the pairwise comparison mirrors the null expectation (|βNTI| < 2), stochastic processes are likely responsible for the observed differences.

Based on molecular incidence data and the relevant trait-based dendrograms, we applied the function commProc to the example dataset. Most of the |βNTI| values for MCD, TD, and TWCD were larger than 2, indicating that deterministic processes predominantly governed molecular assembly (Fig. 4a). To further understand the assembly mechanisms of different DOM fractions compared to whole DOM assemblages, we applied the molGroup function to partition the DOM composition into labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive fractions based on the molecular trait dimensions of reactivity and activity (Fig. 4b).
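The βNTI decision rule described above can be sketched in a few lines of Python. This is an illustration, not the commProc implementation: it assumes the usual standardized effect size, that is, the observed βMNTD minus the mean of the null βMNTD values, divided by their standard deviation, and the function names and the handling of values exactly at ±2 are assumptions.

```python
from statistics import fmean, stdev
from typing import Sequence


def beta_nti(observed_bmntd: float, null_bmntd: Sequence[float]) -> float:
    """Standardized deviation of the observed betaMNTD from its null distribution.

    `null_bmntd` holds betaMNTD values from randomized dendrograms
    (at least two values are required for the sample standard deviation).
    """
    return (observed_bmntd - fmean(null_bmntd)) / stdev(null_bmntd)


def assembly_process(bnti: float, threshold: float = 2.0) -> str:
    """Map a betaNTI value to the process categories used in the text."""
    if bnti > threshold:
        return "variable selection"      # deterministic, divergent composition
    if bnti < -threshold:
        return "homogeneous selection"   # deterministic, convergent composition
    # Assumed: values with |betaNTI| <= threshold are treated as stochastic.
    return "stochastic"
```

For example, an observed βMNTD of 5.0 against null values of 1.0, 2.0, and 3.0 gives a βNTI of 3.0, which falls under variable selection.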
In both the subtropical and temperate regions, deterministic processes caused by variable selection dominated the assembly of labile and recalcitrant molecules in the active fractions, while stochastic processes were more important for the assembly of molecules in the inactive fractions. Homogeneous selection showed little importance across the fractions in both regions (Fig. 4c).

Effect of DOM dark matter

DOM peaks can be assigned to identifiable molecular formulae using FT-ICR MS, yet a large proportion of DOM remains uncharacterized and is often referred to as chemical “dark matter” (Hu et al. 2023). The role of molecular dark matter and its relationship with assigned (i.e., known) molecules represent a major challenge for a complete understanding of biogeochemical cycles. The function iDME quantifies the effect of dark matter on DOM assemblages by constructing co-occurrence networks with and without dark matter (Hu et al. 2023). In each network, the nodes represent individual molecules, and the edges represent the interactions among molecules. Specifically, two types of networks are constructed: “KK” networks, which include only known molecules, and “DK” networks, which include both dark matter and known molecules at a 1:1 ratio or at the ratio observed in a DOM assemblage (Fig. 5a). The two networks have an identical number of nodes, which are randomly subsampled from the whole DOM molecule pool, and the subsampling is bootstrapped 100 times. The function iDME calculates the indicator of dark matter effects (iDME) as the percentage change in the mean value of a given network metric, such as degree centrality, between “KK” and “DK” networks. Degree is defined as the number of edges connecting a focal node to other nodes (Proulx et al. 2005), and molecules with a higher degree have more interactions within an assemblage.
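The percentage-change calculation just described can be sketched as follows in Python. This is not the iDME function itself: the function names are hypothetical, the networks are given as plain edge lists, and taking the “KK” network as the baseline of the percentage change is an assumption about the direction of the comparison.

```python
from collections import Counter
from statistics import fmean
from typing import Hashable, Iterable, Sequence, Tuple


def mean_degree(nodes: Sequence[Hashable],
                edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Average number of edges attached to each node of an undirected network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Nodes without any edge contribute a degree of 0.
    return fmean(degree[n] for n in nodes)


def idme(metric_kk: float, metric_dk: float) -> float:
    """Percentage change of a network metric from the KK to the DK network.

    Assumed baseline: the KK (known-only) network.
    """
    return 100.0 * (metric_dk - metric_kk) / metric_kk
```

In practice the metric would be averaged over the bootstrapped subsamples before the comparison. Under the assumed baseline, a mean degree of 4.0 in the “KK” networks and 3.0 in the “DK” networks gives an iDME of -25%.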
Thus, positive and negative iDME values indicate that dark matter enhances and reduces network interactions within DOM assemblages, respectively, while an iDME of zero suggests a neutral effect. The iDME can be further divided into intra-iDME and inter-iDME to clarify whether the effects of dark matter result from changes in interactions between dark-dark nodes or between dark-known nodes.

In the example dataset, the function iDME showed that DOM dark matter substantially decreased network connectivity in both the temperate and subtropical regions along the temperature gradients (Figs. 5b, c). All iDME values for the network metric of degree were negative and significantly different from zero, with values ranging from −24.3% to −17.9% for temperate DOM assemblages and from −22.7% to −20.7% for subtropical DOM assemblages. The iDME values of the temperate region increased significantly along the temperature gradient (P < 0.05). This result suggests that the negative effect of dark matter on temperate DOM assemblages weakened as the temperature increased. Furthermore, the partitioning of iDME showed that the effects of dark matter were mainly due to changes in links between dark-known nodes, followed by changes in links between dark-dark nodes, for both the temperate and subtropical regions.

Microbial mechanisms influencing DOM production and degradation

The fate of DOM is intimately linked to the metabolism of complex microbial communities, as microbes regulate the production and degradation of specific molecules and thus play a crucial role in sustaining biogeochemical cycles (Hu et al. 2022a). The function H2 quantifies the degree of specialization between DOM molecules and microbial taxa by constructing DOM-microbe bipartite networks based on resource-consumer theory.
In the DOM-microbe networks, individual DOM molecules are connected exclusively to the microbial taxa that use those specific molecules, while direct interactions within molecules or within taxa are not explicitly considered. According to resource-consumer relationships, negative network interactions likely indicate the degradation of larger molecules into smaller structures, while positive network interactions may relate to the production of new molecules, either through degradation or through biosynthetic processes (Hu et al. 2022a). The function H2 calculates the specialization index H2' to quantify the degree of specialization between DOM and microbes, and standardizes H2' using null modeling, such as the shuffle.web algorithm, so that network indices can be compared directly across samples (Hu et al. 2022a). An elevated H2' indicates a high degree of specialization between DOM and microbes (Bluthgen et al. 2006), with the extreme case being a single bacterial taxon that consumes or produces just one specific DOM molecule. Conversely, lower H2' values suggest a more generalized bipartite network in which different DOM molecules can be used by a wide range of bacterial taxa (Hu et al. 2022a).

We applied the H2 function to the example dataset to examine how DOM-microbe associations vary under experimental temperatures (Fig. 6). In total, there were 1,108 and 1,938 interactions in the negative and positive networks (|SparCC ρ| > 0.5), respectively (Figs. 6a-b). The standardized H2' values were negative and significantly lower than expected by chance (P < 0.05), indicating that the interactions between DOM and bacteria were non-random (Figs. 6c-d). Experimental warming showed divergent effects on the H2' of negative and positive networks between the two regions. Specifically, for the positive networks, experimental warming significantly decreased H2' in both the temperate and subtropical regions (P < 0.05).
For the negative networks, experimental warming significantly increased H2' in the temperate region (P < 0.05), while there was no significant correlation in the subtropical region. Experimental warming in the temperate region could thus contribute to greater recalcitrance of DOM by increasing the production of molecules (i.e., less specialized positive networks) and reducing their decomposition (i.e., more specialized negative networks) (Hu et al. 2022a).

Availability

The iDOM open-source software package is implemented in R and available for download via GitHub (https://github.com/jianjunwang/iDOM).

Conclusion

The package iDOM is a comprehensive set of functions developed to facilitate the chemical characterization and ecological interpretation of DOM based on high-resolution mass spectrometry. iDOM supports chemical characterization, such as molecular trait calculation, molecular class assignment, and compositional analyses of chemical diversity and dissimilarity. Further, iDOM integrates concepts and tools from community ecology to facilitate the theoretical interpretation of the community assembly of DOM assemblages, the effect of molecular dark matter on DOM molecular interactions, and DOM-microbe associations. iDOM is expected to promote standardized methodologies and reproducible research in DOM studies, and its extensibility makes it suitable for a wide range of applications across global environments.

Declarations

Author contributions

JW and AH designed the study. FM analyzed the data with contributions from JW and AH. FM and JW finished the first draft. JW, AH and FM finished the manuscript with contributions from KJ.

Acknowledgements

This study was supported by the National Natural Science Foundation of China (42225708, 92251304, 42377122), the Research Program of the Sino-Africa Joint Research Center, Chinese Academy of Sciences (151542KYSB20210007), and the Science and Technology Planning Project of NIGLAS (NIGLAS2022GS09).
References

Ayala-Ortiz, C., N. Graf-Grachet, V. Freire-Zapata, J. Fudyma, G. Hildebrand, R. AminiTabrizi, C. Howard-Varona, Y. E. Corilo, N. Hess, M. B. Duhaime, M. B. Sullivan, and M. M. Tfaily. 2023. MetaboDirect: an analytical pipeline for the processing of FT-ICR MS-based metabolomic data. Microbiome 11:28.
Bluthgen, N., F. Menzel, and N. Bluthgen. 2006. Measuring specialization in species interaction networks. BMC Ecology 6:9.
Bramer, L. M., A. M. White, K. G. Stratton, A. M. Thompson, D. Claborne, K. Hofmockel, and L. A. McCue. 2020. ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data. PLoS Comput Biol 16:e1007654.
Cooper, W. T., J. C. Chanton, J. D'Andrilli, S. B. Hodgkins, D. C. Podgorski, A. C. Stenson, M. M. Tfaily, and R. M. Wilson. 2022. A history of molecular level analysis of natural organic matter by FTICR mass spectrometry and the paradigm shift in organic geochemistry. Mass Spectrometry Reviews 41:215-239.
Danczak, R. E., R. K. Chu, S. J. Fansler, A. E. Goldman, E. B. Graham, M. M. Tfaily, J. Toyoda, and J. C. Stegen. 2020. Using metacommunity ecology to understand environmental metabolomes. Nat Commun 11:6369.
Faith, D. P. 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation 61:1-10.
Fievre, A., T. Solouki, A. G. Marshall, and W. T. Cooper. 1997. High-resolution Fourier transform ion cyclotron resonance mass spectrometry of humic and fulvic acids by laser desorption/ionization and electrospray ionization. Energy & Fuels 11:554-560.
Friedman, J., and E. J. Alm. 2012. Inferring correlation networks from genomic survey data. PLoS Comput Biol 8:e1002687.
Hockaday, W. C., J. M. Purcell, A. G. Marshall, J. A. Baldock, and P. G. Hatcher. 2009. Electrospray and photoionization mass spectrometry for the characterization of organic matter in natural waters: a qualitative assessment. Limnology and Oceanography: Methods 7:81-95.
Hu, A., M. Choi, A. J. Tanentzap, J. Liu, K. S. Jang, J. T. Lennon, Y. Liu, J. Soininen, X. Lu, Y. Zhang, J. Shen, and J. Wang. 2022a. Ecological networks of dissolved organic matter and microorganisms under global change. Nat Commun 13:3600.
Hu, A., K. S. Jang, F. Meng, J. Stegen, A. J. Tanentzap, M. Choi, J. T. Lennon, J. Soininen, and J. Wang. 2022b. Microbial and Environmental Processes Shape the Link between Organic Matter Functional Traits and Composition. Environmental Science & Technology 56:10504-10516.
Hu, A., K. S. Jang, A. J. Tanentzap, W. Zhao, J. T. Lennon, J. Liu, M. Li, J. Stegen, M. Choi, Y. Lu, X. Feng, and J. Wang. 2024. Thermal responses of dissolved organic matter under global change. Nat Commun 15:576.
Hu, A., F. Meng, A. J. Tanentzap, K. S. Jang, and J. Wang. 2023. Dark Matter Enhances Interactions within Both Microbes and Dissolved Organic Matter under Global Change. Environmental Science & Technology 57:761-769.
Hughey, C. A., C. L. Hendrickson, R. P. Rodgers, A. G. Marshall, and K. Qian. 2001. Kendrick mass defect spectrum: a compact visual analysis for ultrahigh-resolution broadband mass spectra. Analytical Chemistry 73:4676-4681.
Kellerman, A. M., T. Dittmar, D. N. Kothawala, and L. J. Tranvik. 2014. Chemodiversity of dissolved organic matter in lakes driven by climate and hydrology. Nat Commun 5:3804.
Kellerman, A. M., D. N. Kothawala, T. Dittmar, and L. J. Tranvik. 2015. Persistence of dissolved organic matter in lakes related to its molecular characteristics. Nature Geoscience 8:454-U452.
Kim, S., R. W. Kramer, and P. G. Hatcher. 2003. Graphical method for analysis of ultrahigh-resolution broadband mass spectra of natural organic matter, the van Krevelen diagram. Anal Chem 75:5336-5344.
Koch, B. P., and T. Dittmar. 2006. From mass to structure: An aromaticity index for high-resolution mass data of natural organic matter. Rapid Communications in Mass Spectrometry 20:926-932.
Koch, B. P., and T. Dittmar. 2016. From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter. Rapid Communications in Mass Spectrometry 30:250-250.
LaRowe, D. E., and P. Van Cappellen. 2011. Degradation of natural organic matter: a thermodynamic analysis. Geochimica et Cosmochimica Acta 75:2030-2042.
Li, X. M., G. X. Sun, S. C. Chen, Z. Fang, H. Y. Yuan, Q. Shi, and Y. G. Zhu. 2018. Molecular Chemodiversity of Dissolved Organic Matter in Paddy Soils. Environ Sci Technol 52:963-971.
Mentges, A., C. Feenders, M. Seibt, B. Blasius, and T. Dittmar. 2017. Functional molecular diversity of marine dissolved organic matter is reduced during degradation. Frontiers in Marine Science 4:194.
Proulx, S. R., D. E. Promislow, and P. C. Phillips. 2005. Network thinking in ecology and evolution. Trends in Ecology & Evolution 20:345-353.
Ruan, M., F. Wu, F. Sun, F. Song, T. Li, C. He, and J. Jiang. 2023. Molecular-level exploration of properties of dissolved organic matter in natural and engineered water systems: A critical review of FTICR-MS application. Critical Reviews in Environmental Science and Technology 53:1534-1562.
Sleighter, R. L., and P. G. Hatcher. 2007. The application of electrospray ionization coupled to ultrahigh resolution mass spectrometry for the molecular characterization of natural organic matter. Journal of Mass Spectrometry 42:559-574.
Song, H.-S., J. C. Stegen, E. B. Graham, J.-Y. Lee, V. A. Garayburu-Caruso, W. C. Nelson, X. Chen, J. D. Moulton, and T. D. Scheibe. 2020. Representing organic matter thermodynamics in biogeochemical reactions via substrate-explicit modeling. Frontiers in Microbiology 11:531756.
Stegen, J. C., X. Lin, J. K. Fredrickson, and A. E. Konopka. 2015. Estimating and mapping ecological processes influencing microbial community assembly. Frontiers in Microbiology 6:370.
Stegen, J. C., X. Lin, A. E. Konopka, and J. K. Fredrickson. 2012. Stochastic and deterministic assembly processes in subsurface microbial communities.
ISME J 6 :1653-1664. Tanentzap, A. J., A. Fitch, C. Orland, E. J. S. Emilson, K. M. Yakimovich, H. Osterholz, and T. Dittmar. 2019. Chemical and microbial diversity covary in fresh water to influence ecosystem functioning. Proc Natl Acad Sci U S A 116 :24689-24695. Thurman, E. M. 2012. Organic geochemistry of natural waters. Springer Science & Business Media. Wang, J., J. Shen, Y. Wu, C. Tu, J. Soininen, J. C. Stegen, J. He, X. Liu, L. Zhang, and E. Zhang. 2013. Phylogenetic beta diversity in bacterial assemblages across ecosystems: deterministic versus stochastic processes. ISME J 7 :1310-1321. Additional Declarations The authors declare no competing interests. Supplementary Files SupplementaryMaterials.docx Cite Share Download PDF Status: Published Journal Publication published 14 Apr, 2025 Read the published version in mLife → Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. 
{"props":{"pageProps":{"initialData":{"identity":"rs-4660944","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Method Article","associatedPublications":[],"authors":[{"id":320687617,"identity":"acc0c72c-a189-48b5-a16c-a42e5cfa2d82","order_by":0,"name":"Fanfan Meng","email":"","orcid":"","institution":"Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Fanfan","middleName":"","lastName":"Meng","suffix":""},{"id":320687618,"identity":"3924e8c4-9935-4d7e-be74-ab976940c4bc","order_by":1,"name":"Ang Hu","email":"","orcid":"","institution":"Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Ang","middleName":"","lastName":"Hu","suffix":""},{"id":320687619,"identity":"2c9488c3-285e-4972-b16d-5e4eaced878b","order_by":2,"name":"Kyoung-Soon Jang","email":"","orcid":"","institution":"Korea Basic Science Institute","correspondingAuthor":false,"prefix":"","firstName":"Kyoung-Soon","middleName":"","lastName":"Jang","suffix":""},{"id":320687620,"identity":"258a15f8-16b7-4285-a705-53b80b92dbf2","order_by":3,"name":"Jianjun
Wang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4klEQVRIiWNgGAWjYBACxoYDIMqGgY2BuYEkLWlALYxEaoGCw2DdxKllbjxj+Lng1/k8Pv6FjR9/MNjJM7CfPUDAYWeMpWf23S5mk3jYLM3DkGzYwJOXQEDL2Q3SvD23E9skDjZIA21NYJDgMSCkZfNv3p5zIC3NP38w1BOlZZs0z48DiW38jW0SPAyHidFy/ps1b0My0BbGNmseg+OGbTw5+LUYzjiWfJvnj13i/P7Dh2/+qKiW52c/Q0jLAaBVbUCWRAKQACpmw6seCOT5G4DkHyDmP0BI7SgYBaNgFIxUAACm9EcgSwKP4wAAAABJRU5ErkJggg==","orcid":"","institution":"Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences","correspondingAuthor":true,"prefix":"","firstName":"Jianjun","middleName":"","lastName":"Wang","suffix":""}],"badges":[],"createdAt":"2024-06-30 02:09:26","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-4660944/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4660944/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1002/mlf2.70002","type":"published","date":"2025-04-15T00:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":59516173,"identity":"0f5fca16-cad7-43b2-8dc0-5dbfcd0b7710","added_by":"auto","created_at":"2024-07-02 17:35:55","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":780530,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eConcept figure of R package “\u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eiDOM\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e”. 
\u003c/strong\u003eThe \u003cem\u003eiDOM \u003c/em\u003epackage provides five datasets, which are used here as examples: \u003cem\u003emol.data\u003c/em\u003e, \u003cem\u003emol.trait\u003c/em\u003e, \u003cem\u003eenvi\u003c/em\u003e, \u003cem\u003emol.dark.matter\u003c/em\u003e, and \u003cem\u003emicro.data\u003c/em\u003e. Based on these datasets, the \u003cem\u003eiDOM \u003c/em\u003eprovides functions to 1) calculate molecular properties and assign molecular classes; 2) quantify the diversity and dissimilarity of DOM and explain the assembly process of DOM assemblages; 3) quantify the effect of molecular dark matter on DOM molecular interactions; 4) evaluate the association between DOM and microbes.\u003c/p\u003e","description":"","filename":"Figure1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/731d226539826c4c6ba5c32a.jpg"},{"id":59516176,"identity":"eec53ce9-ce57-4463-b9ab-885cb74f815c","added_by":"auto","created_at":"2024-07-02 17:35:56","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1340543,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe relative abundance of molecular groups and the Van Krevelen diagrams. \u003c/strong\u003e(a) The relative abundance of different molecular groups based on the composition of molecular elements. (b) The relative abundance of different molecular groups based on H/C ratios and O/C ratios (Kim et al. 2003), namely, lipids (O/C = 0-0.3, H/C = 1.5-2.0), proteins (O/C = 0.3-0.55, H/C = 1.5-2.2), amino sugars (O/C = 0.55-0.67, H/C = 1.5-2.2), carbohydrates (Carb; O/C = 0.67-1.2, H/C = 1.5-2), unsaturated hydrocarbons (UnsatHC; O/C = 0-0.1, H/C = 0.7-1.5), lignin (O/C = 0.1-0.67, H/C = 0.7-1.5), tannin (O/C = 0.67-1.2, H/C = 0.5-1.5), and condensed aromatics (ConHC; O/C = 0-0.67, H/C = 0.2-0.7). (c) The distribution of the molecules in the Van Krevelen diagrams. 
Different colored dots represent different molecular groups, that is LA: labile-active, LI: labile-inactive, RA: recalcitrant-active, RI: recalcitrant-inactive, and other.\u003c/p\u003e","description":"","filename":"Figure2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/3b465c0cddb488d53fb07167.jpg"},{"id":59516178,"identity":"a05772c3-ce6e-405a-a1a4-c72b6dea2368","added_by":"auto","created_at":"2024-07-02 17:35:56","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":263170,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe relationships between various biodiversity metrics and experimental temperature across temperate and subtropical regions\u003c/strong\u003e. The molecular richness (a), Rao's quadratic entropy (RaoQ) based on AI\u003csub\u003emod\u003c/sub\u003e (b), and Mean Nearest Taxon Distance (MNTD) based on MCD (c) change with experimental temperature in the temperate and subtropical regions. The compositional dissimilarity among samples is illustrated based on NMDS using Bray-Curtis distance (d).\u003c/p\u003e","description":"","filename":"Figure3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/ada74d21782abf14c43742b0.jpg"},{"id":59516879,"identity":"abc666ad-b7da-4ea1-96df-224181c00e0c","added_by":"auto","created_at":"2024-07-02 17:43:56","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":612918,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003e\u0026nbsp;The community assembly of DOM.\u003c/strong\u003e (a) The bNTI based on three trait-based dendrograms: MCD, molecular characteristics dendrogram; TD, transformation-based dendrograms; TWCD, transformation-weighted characteristics dendrogram. (b) The molecular characteristic dendrogram (MCD) was generated using 16 molecular traits (Table S1). 
For the illustrated dataset, 4,150 of 5,474 molecules in the 36 samples were parsed into four fractions based on their reactivity and activity: LA: Labile-Active; RA: Recalcitrant-Active; RI: Recalcitrant-Inactive; LI: Labile-Inactive. (c) The assembly process underlying the DOM fractions.\u003c/p\u003e","description":"","filename":"Figure4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/a499eacfd387647d7a17cb76.jpg"},{"id":59516174,"identity":"946cdbb2-af6a-440d-9410-94293e0e27bc","added_by":"auto","created_at":"2024-07-02 17:35:55","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":907619,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe molecular co-occurrence network and indicator of molecular dark matter effect.\u003c/strong\u003e (a) Illustration of the “KK” and “DK” molecular co-occurrence networks with 10 nodes. The “KK” network includes only known molecules, while the “DK” network replaces half of the known molecules in the “KK” network with molecular dark matter. (b) The indicator of dark matter effect (iDME) was calculated based on the “KK” and “DK” co-occurrence networks, which were inferred using SparCC (Sparse Correlations for Compositional data) (Friedman and Alm 2012). The networks were constructed using molecules observed in more than 30% of the total samples, with a threshold value for SparCC ρ correlations set at |ρ| = 0.30 to filter out uncorrelated or weakly correlated interactions. For each sample, the “KK” and “DK” co-occurrence networks were obtained with 400 nodes randomly subsampled from its DOM composition and further bootstrapped 100 times. The “DK” network maintains a 1:1 ratio of dark and known molecules.
(c) The partition of iDME along experimental temperature gradients.\u003c/p\u003e","description":"","filename":"Figure5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/a6b205ef20f6ae7bb67a508b.jpg"},{"id":59516175,"identity":"54016df2-bf05-4acd-a3df-f091db28130b","added_by":"auto","created_at":"2024-07-02 17:35:55","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":1262646,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe specialization H\u003c/strong\u003e\u003csub\u003e\u003cstrong\u003e2\u003c/strong\u003e\u003c/sub\u003e\u003cstrong\u003e’ of the DOM-microbe bipartite network. \u003c/strong\u003eThe bipartite networks of negative (a) and positive (b) interactions between DOM molecules and bacterial genera were built using Sparse Correlations for Compositional data (SparCC) (Friedman and Alm 2012). The negative and positive networks were constructed using DOM molecules and bacterial genera observed in more than 30% of the total samples, and negative and positive correlation coefficients (SparCC ρ \u0026lt; −0.50 and ρ \u0026gt; 0.50, respectively). The SparCC ρ values were multiplied by 100,000 and rounded to integers, and the absolute values were taken for negative networks to calculate standardized specialization indices (H\u003csub\u003e2\u003c/sub\u003e’). For each sample, a separate negative and positive sub-network was obtained by selecting the DOM molecules and bacterial taxa in each sample based on its bacterial and DOM compositions. Upper nodes represent the top 10 bacterial genera, colored by their genus, which showed more associations with molecules than other bacterial genera. Lower nodes represent DOM molecules, colored from gray to black, with darker shades indicating molecules that have more associations with bacterial genera. 
The H\u003csub\u003e2\u003c/sub\u003e’ was calculated based on negative (c) and positive (d) bipartite networks for subtropical and temperate samples, respectively, along experimental temperature gradients.\u003c/p\u003e","description":"","filename":"Figure6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/bd58a17357fad91acfd6558e.jpg"},{"id":92883690,"identity":"de2be0ed-c451-47fe-8853-40523da0b5f9","added_by":"auto","created_at":"2025-10-06 16:07:55","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5810209,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/fcd68e16-7655-46c7-ad86-bd480073ee9d.pdf"},{"id":59516179,"identity":"e2228834-f165-40dc-a9bb-2d618359bdcc","added_by":"auto","created_at":"2024-07-02 17:35:56","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":125615,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryMaterials.docx","url":"https://assets-eu.researchsquare.com/files/rs-4660944/v1/df621ef9f40bc152c0253552.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cem\u003eiDOM\u003c/em\u003e: Statistical analysis of dissolved organic matter based on high-resolution mass spectrometry\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eDissolved organic matter (DOM) is a large and complex mixture of thousands of molecules that play crucial roles in biogeochemical cycles (Thurman \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). The high heterogeneity of DOM composition has previously presented a great challenge for our understanding of their reactivity, fate, and functional significance (Cooper et al. \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2022\u003c/span\u003e, Ruan et al. 
\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). However, DOM studies have made substantial progress recently owing to advances in high-resolution mass spectrometry and statistical approaches. For instance, Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was first applied to the molecular characterization of humic and fulvic acids from the Suwannee River (Fievre et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1997\u003c/span\u003e). In the last decade, the use of FT-ICR MS for DOM analysis has increased dramatically, and the number of relevant publications increased by approximately 21 per year during 2014\u0026ndash;2023 (Fig. S1). Compared to conventional bulk measurements such as those based on absorbance and fluorescence spectroscopy (Kellerman et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2015\u003c/span\u003e), FT-ICR MS provides more detailed information on the elemental composition and structural features of individual DOM molecules across global environments.\u003c/p\u003e \u003cp\u003eThere have also been advances in statistical approaches and the associated graphical methods, such as the modified aromaticity index (AI\u003csub\u003emod\u003c/sub\u003e) (Koch and Dittmar \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2006\u003c/span\u003e) and the van Krevelen diagram (Kim et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). These tools are widely applied to quantify the structural features of DOM molecules and visualize molecular composition. Additionally, DOM studies have progressed by integrating concepts and tools from community ecology, including the diversity metrics from functional ecology (Kellerman et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2014\u003c/span\u003e), the ecological processes from metacommunity ecology (Danczak et al.
\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e, Hu et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022b\u003c/span\u003e), and the ecological networks in community ecology (Hu et al. 2022a, Hu et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). For instance, the chemodiversity of DOM in different lake systems was quantified using the Chao1 diversity index to understand the effects of climate and hydrology (Kellerman et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). The assembly of DOM composition can be partitioned into deterministic processes, such as environmental selection, and stochastic processes, such as ecological drift and dispersal, based on ecological null models (Hu et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022b\u003c/span\u003e). The interactions among DOM molecules, including those between assigned molecules and unclassified molecules (that is, dark matter), and the associations between molecules and microbes, can be quantified using DOM co-occurrence networks and DOM-microbe bipartite networks, respectively (Hu et al. 2022a, Hu et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). So far, a few open-source R packages (Bramer et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) and pipelines (Ayala-Ortiz et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) have been developed for analyzing and visualizing FT-ICR MS data. However, no software is available that integrates both the basic statistical analyses and, especially, the advanced analyses mentioned above.\u003c/p\u003e \u003cp\u003eHere, we developed an R package \u003cem\u003eiDOM\u003c/em\u003e to bridge this gap in the analysis of FT-ICR MS data (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).
\u003cem\u003eiDOM\u003c/em\u003e is a multifunctional tool that facilitates basic analyses, such as the calculation of molecular traits, the assignment of molecular classes, and the evaluation of chemical diversity and dissimilarity. It also includes functions for advanced analyses to quantify the assembly processes of DOM assemblages (Hu et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022b\u003c/span\u003e), the effect of molecular dark matter on DOM molecular interactions (Hu et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), and the associations between DOM molecules and microbial taxa (Hu et al. 2022a). Additionally, \u003cem\u003eiDOM\u003c/em\u003e includes visualization functions, such as van Krevelen diagrams, which plot the molecular H/C and O/C ratios of FT-ICR MS data in two dimensions, and elemental composition plots, which represent the relative abundances of different molecular classes. Finally, we illustrate the application of the \u003cem\u003eiDOM\u003c/em\u003e package using an example dataset of DOM in microcosm sediments under experimental warming. The experimental microcosms contained a common sterilized sediment with different microbial communities inoculated from lake sediments in two contrasting climate zones, and were incubated for one month along a temperature gradient from 5 to 30\u0026deg;C.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Description of functions in the R package","content":"\u003cp\u003eThe \u003cem\u003eiDOM\u003c/em\u003e package is written in the R scientific computing language and relies on the R packages \u0026ldquo;vegan\u0026rdquo;, \u0026ldquo;iCAMP\u0026rdquo;, \u0026ldquo;SpiecEasi\u0026rdquo;, and \u0026ldquo;ftmsRanalysis\u0026rdquo;. The currently implemented functions are listed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.
The functions in \u003cem\u003eiDOM\u003c/em\u003e are designed to process complex matrices, including molecular composition data and molecular trait data of DOM assemblages. Additionally, these functions could integrate other types of data that affect DOM, such as environmental variables and microbial data (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe functions of R package \u003cem\u003e\u0026ldquo;iDOM\u0026rdquo;\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eType\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFunction\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e\u003cb\u003eMolecular properties and class assignment\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003emolTrait ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eComputes various molecular traits, such as molecular weight, stoichiometry, chemical structure, energy content, and oxidation 
state\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003emolTrans ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEstimates putative biochemical transformations for each molecule\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003emolGroup ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePartitions DOM molecules into four fractions based on two orthogonal trait dimensions of molecular reactivity and activity: labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"4\" rowspan=\"5\"\u003e \u003cp\u003e\u003cb\u003eDiversity and dissimilarity of DOM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecommTD ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCalculates selected types of taxonomic diversity and evenness measures\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecommFD ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCalculates selected types of functional diversity measures, such as Rao\u0026rsquo;s quadratic entropy\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecommDendro ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGenerates three relational metabolite dendrograms based on molecular traits and putative biochemical transformations\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecommDD ()\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCalculates selected types of dendrogram-based diversity measures (Based on metabolite dendrograms)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecommDis ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCalculates the differences in DOM compositions between samples\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCommunity assembly of DOM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecommProc()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAssesses how deterministic and stochastic processes influence DOM assemblages\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eThe effect of DOM dark matter\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eiDME ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAssesses the effect of dark matter on DOM assemblages (Based on intraspecific interactions)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eMicrobial mechanisms\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eH2 ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCalculates the network-level specialization in DOM-microbe bipartite networks (Based on interspecific interactions)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e\u003cb\u003eVisualization\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eplotVK ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGenerates van Krevelen diagrams using the O/C and H/C ratios of elemental formulas\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eplotRA ()\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePlots the relative abundance of different molecular groups\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThese functions of \u003cem\u003eiDOM\u003c/em\u003e could be grouped into four aims. The first aim is to use molecular compositional data and trait data to describe molecular traits, classify groups of molecules based on their traits, and calculate the relative abundance of these groups. The second aim is to integrate environmental variables to describe the distribution of diversity and dissimilarity and explain the community assembly of DOM molecules along environmental gradients or across spatial scales. The third aim is to incorporate unknown molecular data to assess the effect of DOM dark matter on whole DOM assemblages. The fourth aim is to include the microbial data to evaluate the DOM-microbe associations and further the microbial mechanisms influencing DOM molecules production and degradation.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eDatasets\u003c/h2\u003e \u003cp\u003eThe \u003cem\u003eiDOM\u003c/em\u003e package provides example datasets of DOM under experimental warming. 
These datasets are derived from a laboratory microcosm experiment using sterilized Taihu Lake sediments as the organic carbon source, with distinct microbial communities inoculated from China\u0026rsquo;s lake sediments in subtropical and temperate climate zones, respectively (Hu et al. \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). The microcosms were incubated in the dark for one month at six different temperature levels (5, 10, 15, 20, 25, and 30\u0026deg;C), with each temperature treatment replicated three times, resulting in a total of 36 samples across two climate zones.\u003c/p\u003e \u003cp\u003eFive example datasets are used: \u003cem\u003emol.data\u003c/em\u003e, \u003cem\u003emol.trait\u003c/em\u003e, \u003cem\u003eenvi, mol.dark.matter\u003c/em\u003e, and \u003cem\u003emicro.data\u003c/em\u003e. The datasets \u003cem\u003emol.data\u003c/em\u003e and \u003cem\u003emol.trait\u003c/em\u003e include the intensities of 5,474 assigned molecular formulae (referred to as \u0026ldquo;molecules\u0026rdquo; hereafter) from a total of 11,253 peaks and their corresponding molecular traits across 36 samples. The dataset \u003cem\u003eenvi\u003c/em\u003e contains experimental variables, such as incubation temperature, providing meta-information to enrich the analysis under varied environmental conditions. The dataset \u003cem\u003emol.dark.matter\u003c/em\u003e comprises the intensities of 5,779 uncharacterized molecules, offering additional insight into the molecular composition of DOM. 
The dataset \u003cem\u003emicro.data\u003c/em\u003e contains the relative abundance of 463 bacterial genera across the samples and can be used to investigate the interactions between DOM and microbes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eExamples of FTICR-MS datasets\u003c/h2\u003e \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e \u003ch2\u003eCalculation of molecular traits and class assignment of molecules\u003c/h2\u003e \u003cp\u003eThe R package \u003cem\u003eiDOM\u003c/em\u003e provides the \u003cem\u003emolTrait\u003c/em\u003e function to calculate molecular properties, the \u003cem\u003emolTrans\u003c/em\u003e function to estimate putative biochemical transformations, and the \u003cem\u003emolGroup\u003c/em\u003e function to classify groups of DOM molecules based on these traits. The function \u003cem\u003emolTrait\u003c/em\u003e can calculate chemical characteristics of molecules related to molecular weight, stoichiometry, chemical structure, and oxidation state (Table S1). These traits are mass, the number of carbon atoms (C), Kendrick Defect (kdefect\u003csub\u003eCH2\u003c/sub\u003e), O/C ratio, H/C ratio, N/C ratio, P/C ratio, S/C ratio, the modified aromaticity index (AI\u003csub\u003emod\u003c/sub\u003e), double bond equivalent (DBE), DBE minus oxygen (DBE\u003csub\u003eO\u003c/sub\u003e), DBE minus AI (DBE\u003csub\u003eAI\u003c/sub\u003e), standard Gibbs Free Energy (GFE), nominal oxidation state of carbon (NOSC), and carbon use efficiency (Y\u003csub\u003emet\u003c/sub\u003e) (Hughey et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2001\u003c/span\u003e, Koch and Dittmar \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2006\u003c/span\u003e, LaRowe and Van Cappellen \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2011\u003c/span\u003e, Koch and Dittmar \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2016\u003c/span\u003e, Song et al. 2020). 
Furthermore, the function \u003cem\u003emolTrans\u003c/em\u003e estimates putative biochemical transformations for each identified molecule by aligning mass differences to a database of known transformations (Danczak et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe \u003cem\u003emolGroup\u003c/em\u003e function can classify DOM assemblages into different groups based on various molecular traits and graphical methods, such as molecular properties, putative biochemical transformations, and Van Krevelen diagrams (Kim et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2003\u003c/span\u003e, Danczak et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). For instance, the assigned molecules can be classified into the CHO, CHON, CHOS, CHOP, CHONS, CHONP, CHOSP, and CHONSP formula groups based on their elemental composition. Each molecule plotted on a Van Krevelen diagram can be related to a specific class of natural biomolecules (Kim et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). Molecules in different regions of the diagram can be categorized into distinct classes, such as lipids, proteins, amino sugars, carbohydrates, unsaturated hydrocarbons, condensed aromatics, lignin, and tannins (Sleighter and Hatcher \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2007\u003c/span\u003e, Hockaday et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). More recently, a method was developed to divide molecules into four fractions based on molecular H/C ratio and putative biochemical transformations, which indicate molecular reactivity and activity, respectively (Hu et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022b\u003c/span\u003e). 
The four fractions are labile-active (H/C\u0026thinsp;\u0026ge;\u0026thinsp;1.5, transformations\u0026thinsp;\u0026gt;\u0026thinsp;10), recalcitrant-active (H/C\u0026thinsp;\u0026lt;\u0026thinsp;1.5, transformations\u0026thinsp;\u0026gt;\u0026thinsp;10), recalcitrant-inactive (H/C\u0026thinsp;\u0026lt;\u0026thinsp;1.5, transformations\u0026thinsp;\u0026le;\u0026thinsp;1), and labile-inactive (H/C\u0026thinsp;\u0026ge;\u0026thinsp;1.5, transformations\u0026thinsp;\u0026le;\u0026thinsp;1).\u003c/p\u003e \u003cp\u003eThe function \u003cem\u003emolGroup\u003c/em\u003e revealed that CHO and CHON groups consistently exhibited higher relative abundances compared to other formula groups across the temperature gradient from 5℃ to 30℃. Additionally, condensed aromatics and lignin consistently showed dominance throughout the temperature range (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). The function \u003cem\u003emolGroup\u003c/em\u003e further classified 4,150 out of 5,474 molecules into four fractions based on molecular reactivity and activity: labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive. The molecular activity could provide new insights in addition to molecular reactivity, which is supported by the overall overlap between active and inactive molecules with an H/C ratio above or below 1.5 in the Van Krevelen diagram (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). Compared to active molecules, the inactive molecules showed lower relative abundance in both labile and recalcitrant fractions (Fig. 
S2).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eDiversity and dissimilarity of DOM\u003c/h2\u003e \u003cp\u003eThe R package \u003cem\u003eiDOM\u003c/em\u003e provides the \u003cem\u003ecommTD\u003c/em\u003e, \u003cem\u003ecommFD\u003c/em\u003e, and \u003cem\u003ecommDD\u003c/em\u003e functions to calculate within-assemblage diversity, and the \u003cem\u003echemoDis\u003c/em\u003e function to assess between-assemblage compositional differences of DOM molecules. To apply diversity metrics originally designed for ecological species to molecular data, individual compounds are treated as species, with the relative intensities of their peaks representing species abundance. The function \u003cem\u003ecommTD\u003c/em\u003e calculates taxonomic diversity using the most common indices of α-diversity and evenness. These include molecular richness, which is the number of molecular formulae, and abundance-based metrics such as the Shannon, Gini-Simpson (or Simpson\u0026rsquo;s), and Chao 1 indices (Kellerman et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2014\u003c/span\u003e, Li et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), which are based on both molecular richness and relative intensity. The function \u003cem\u003ecommFD\u003c/em\u003e calculates functional diversity based on molecular traits using Rao\u0026rsquo;s quadratic entropy, which measures the average abundance-weighted trait-based difference between any two molecules in a community. Greater differences in traits between any two molecules in a community result in higher quadratic entropy (Mentges et al. \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2017\u003c/span\u003e, Tanentzap et al. 
\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo further understand the relationships among molecules, the function \u003cem\u003ecommDendro\u003c/em\u003e generates relational molecular dendrograms, analogous to phylogenetic trees, based on molecular properties and putative biochemical transformations. These dendrograms include the molecular characteristics dendrogram (MCD), the transformation-based dendrogram (TD), and the transformation-weighted characteristics dendrogram (TWCD), which represent shared and divergent molecular traits among molecules. After generating molecular dendrograms, the function \u003cem\u003ecommDD\u003c/em\u003e calculates dendrogram-based diversity measurements, including dendrogram diversity (DD), mean pairwise distance (MPD), and mean nearest taxon distance (MNTD). The DD quantifies the total dendrogram branch length occupied by a given molecular assemblage, analogous to Faith\u0026rsquo;s Phylogenetic Diversity (Faith \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e1992\u003c/span\u003e). Higher DD values indicate molecular assemblages that span a broader range of molecular properties (MCD), a more extensive biochemical transformation network (TD), or both (TWCD). MPD is the average dendrogram distance between all pairs of molecules, while MNTD is the average dendrogram distance between each molecule and its nearest neighbor (Danczak et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eAs a complement to α-diversity, the function \u003cem\u003ecommDis\u003c/em\u003e quantifies differences in DOM composition between samples by generating a dissimilarity matrix. The dissimilarity metrics include incidence-based Jaccard, abundance-based Bray-Curtis, and dendrogram-based UniFrac dissimilarity. 
For each dissimilarity matrix, non-metric multidimensional scaling (NMDS) or principal coordinate analysis (PCoA) can then be used to visualize the relationships among samples along the first two major axes of variation.\u003c/p\u003e \u003cp\u003eFor the illustrated dataset, the functions \u003cem\u003ecommTD\u003c/em\u003e, \u003cem\u003ecommFD\u003c/em\u003e, and \u003cem\u003ecommDD\u003c/em\u003e revealed that DOM molecular diversity correlated differently with experimental temperature in the temperate and subtropical regions. The molecular richness significantly decreased with rising temperature in the temperate region (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), while there was no significant change in the subtropical region (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05). Meanwhile, RaoQ and MNTD significantly decreased with rising temperature in the subtropical region (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), whereas they showed no significant change in the temperate region (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05). Further, the function \u003cem\u003ecommDis\u003c/em\u003e revealed that the molecular composition was similar between the subtropical and temperate regions, as indicated by Permutational Multivariate Analysis of Variance (PERMANOVA, \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eCommunity assembly of DOM assemblages\u003c/h2\u003e \u003cp\u003eThe assembly processes underlying molecular assemblages can be quantified using dendrogram-based β-diversity null modeling (Danczak et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e, Hu et al. 
\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022b\u003c/span\u003e). The function \u003cem\u003ecommProc\u003c/em\u003e quantifies the relative influences of deterministic and stochastic processes governing the assembly of DOM assemblages. The function calculates the dendrogram-based β-nearest taxon index (βNTI) to quantify tip-level clustering or overdispersion of a molecular dendrogram, analogous to its application in phylogenetic trees in ecological communities (Stegen et al. \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2012\u003c/span\u003e, Wang et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). The βNTI is calculated by comparing the observed β-mean nearest taxon distance (βMNTD) between pairs of local DOM assemblages to a null expectation generated by randomizing observed dendrogram associations (Danczak et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). When the comparison between two DOM assemblages significantly deviates from the null expectation (|\u003cem\u003eβ\u003c/em\u003eNTI| \u0026gt; 2), deterministic processes are likely responsible for the observed pattern. Deterministic processes could lead to a pattern of divergent molecular composition across local assemblages via \u0026ldquo;variable selection\u0026rdquo; (\u003cem\u003eβ\u003c/em\u003eNTI\u0026thinsp;\u0026gt;\u0026thinsp;2) or convergent molecular composition via \u0026ldquo;homogeneous selection\u0026rdquo; (\u003cem\u003eβ\u003c/em\u003eNTI \u0026lt; -2) (Stegen et al. \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). 
Conversely, if the pairwise comparison instead mirrors the null expectation (|\u003cem\u003eβ\u003c/em\u003eNTI| \u0026lt; 2), stochastic processes are likely responsible for the observed differences.\u003c/p\u003e \u003cp\u003eBased on molecular incidence data and the relevant trait-based dendrograms, we applied the function \u003cem\u003ecommProc\u003c/em\u003e to quantify the relative influences of deterministic and stochastic processes governing the assembly of DOM assemblages. Most of the |\u003cem\u003eβ\u003c/em\u003eNTI| values for MCD, TD, and TWCD were larger than 2, indicating that deterministic processes predominantly governed the molecular assembly (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea). To further understand the assembly mechanisms of different DOM fractions compared to whole DOM assemblages, we applied the \u003cem\u003emolGroup\u003c/em\u003e function to partition the DOM composition into labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive fractions based on the molecular trait dimensions of reactivity and activity (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). In both subtropical and temperate regions, deterministic processes caused by variable selection dominated the assembly of both labile and recalcitrant molecules in the active fractions, while stochastic processes were more important for the assembly of molecules within the inactive fractions. 
Homogeneous selection showed little importance across the fractions in both regions (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ec).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eEffect of DOM dark matter\u003c/h2\u003e \u003cp\u003eDOM peaks can be assigned to identifiable molecular formulae using FT-ICR MS, yet a large proportion of DOM remains uncharacterized, often referred to as chemical \u0026ldquo;dark matter\u0026rdquo; (Hu et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). The role of molecular dark matter and its relationship with assigned (i.e., known) molecules represent a major challenge for a complete understanding of biogeochemical cycles. The function \u003cem\u003eiDME\u003c/em\u003e quantifies the effect of dark matter on DOM assemblages by constructing co-occurrence networks based on the presence and absence of dark matter (Hu et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). In each network, the nodes represent individual molecules, and the edges represent the interactions among molecules. Specifically, two types of networks are constructed: \u0026ldquo;KK\u0026rdquo; networks, which include only known molecules, and \u0026ldquo;DK\u0026rdquo; networks, which encompass both dark matter and known molecules at a 1:1 ratio or at the observed ratio in a DOM assemblage (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea). 
These two networks have an identical number of nodes that are randomly subsampled from the whole DOM molecule pool and are further bootstrapped 100 times.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe function \u003cem\u003eiDME\u003c/em\u003e calculates the indicator of dark matter effects (iDME) by quantifying the percentage change in the mean value of a given network metric, such as degree centrality, between \u0026ldquo;KK\u0026rdquo; and \u0026ldquo;DK\u0026rdquo; networks. Degree is defined as the number of edges connecting a focal node to other nodes (Proulx et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2005\u003c/span\u003e), and molecules with a higher degree have more interactions within an assemblage. Thus, positive and negative iDME values indicate that dark matter enhances and reduces network interactions within DOM assemblages, respectively, while an iDME of zero suggests a neutral effect. The iDME can be further divided into intra-iDME and inter-iDME to clarify whether the effects of dark matter result from changes in interactions between dark-dark nodes or between dark-known nodes.\u003c/p\u003e \u003cp\u003eIn the example dataset, the function \u003cem\u003eiDME\u003c/em\u003e showed that DOM dark matter substantially decreased the network connectivity in both temperate and subtropical regions along the temperature gradients (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb, c). All iDME values for the network metric of degree were negative and significantly different from zero, with values ranging from \u0026minus;24.3% to \u0026minus;17.9% for temperate DOM assemblages and from \u0026minus;22.7% to \u0026minus;20.7% for subtropical DOM assemblages. 
The iDME values of temperate regions exhibited a significant increase along the temperature gradient (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), while those of subtropical regions showed no significant trend (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05). This result suggests that the negative effect of dark matter on temperate DOM assemblages weakened as the temperature increased. Furthermore, the partitioning of iDME showed that the effects of dark matter were mainly due to changes in links between dark-known nodes, followed by changes in links between dark-dark nodes, for both temperate and subtropical regions.\u003c/p\u003e \u003cp\u003e \u003cb\u003eMicrobial mechanisms influencing DOM production and degradation\u003c/b\u003e\u003c/p\u003e \u003cp\u003eThe fate of DOM is intimately linked to the metabolism of complex microbial communities, as microbes regulate the production and degradation of specific molecules, thus playing a crucial role in sustaining biogeochemical cycles (Hu et al. 2022a). The function \u003cem\u003eH2\u003c/em\u003e quantifies the degree of specialization between DOM molecules and microbial taxa by constructing DOM-microbe bipartite networks based on resource-consumer theory. In the DOM-microbe networks, individual DOM molecules are connected exclusively to microbial taxa that use those specific molecules, while direct interactions among molecules or among taxa are not explicitly considered. According to resource-consumer relationships, negative network interactions likely indicate the degradation of larger molecules into smaller structures, while positive network interactions may relate to the production of new molecules, either through degradation or biosynthetic processes (Hu et al. 
2022a).\u003c/p\u003e \u003cp\u003eThe function \u003cem\u003eH2\u003c/em\u003e calculates the specialization index H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; to quantify the degree of specialization between DOM and microbes, and standardizes H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; using null modeling, such as the shuffle.web algorithm, so that network indices can be compared directly across samples (Hu et al. 2022a). An elevated H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; indicates a high degree of specialization between DOM and microbes (Bluthgen et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2006\u003c/span\u003e), with extreme cases where a single bacterial taxon might consume or produce just one specific DOM molecule. Conversely, lower H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; values suggest a more generalized bipartite network where different DOM molecules can be used by a wide range of bacterial taxa (Hu et al. 2022a).\u003c/p\u003e \u003cp\u003eWe applied the \u003cem\u003eH2\u003c/em\u003e function to the example dataset to examine how DOM-microbe associations vary under experimental temperatures (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). In total, there were 1,108 and 1,938 interactions for the negative and positive networks (|SparCC ρ| \u0026gt; 0.5), respectively (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea-b). The standardized H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; values were negative and significantly lower than expected by chance (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), indicating that the interactions between DOM and bacteria were non-random (Figs. 6c-d). Experimental warming showed divergent effects on the H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; of negative or positive networks between the two regions. 
Specifically, for the positive networks, experimental warming significantly decreased H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; for both temperate and subtropical regions (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05). For the negative networks, experimental warming significantly increased H\u003csub\u003e2\u003c/sub\u003e\u0026rsquo; for the temperate region (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05), while there was no significant correlation in the subtropical region. Experimental warming in the temperate region could thus contribute to the greater recalcitrance of DOM by increasing production (i.e., less specialized positive networks) and reducing decomposition of molecules (i.e., more specialized negative networks) (Hu et al. 2022a).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e \u003ch2\u003eAvailability\u003c/h2\u003e \u003cp\u003eThe \u003cem\u003eiDOM\u003c/em\u003e open-source software package is implemented in R and available for download via GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/jianjunwang/iDOM\u003c/span\u003e\u003cspan address=\"https://github.com/jianjunwang/iDOM\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThe package \u003cem\u003eiDOM\u003c/em\u003e is a comprehensive set of functions developed to facilitate the chemical characterization and ecological interpretation of DOM based on high-resolution mass spectrometry. \u003cem\u003eiDOM\u003c/em\u003e supports chemical characterization, such as molecular trait calculation, molecular class assignment, and compositional analyses of chemical diversity and dissimilarity. 
Further, \u003cem\u003eiDOM\u003c/em\u003e integrates concepts and tools from community ecology to facilitate the theoretical interpretation of community assembly of DOM assemblages, the effect of molecular dark matter on DOM molecular interactions, and the DOM-microbe associations. The \u003cem\u003eiDOM\u003c/em\u003e package is expected to promote standardized methodologies and reproducible research in DOM studies, and its extensibility makes it suitable for a wide range of applications across global environments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor contributions\u003c/h2\u003e \u003cp\u003eJW and AH designed the study. FM analyzed the data with contributions from JW and AH. FM and JW finished the first draft. JW, AH and FM finished the manuscript with contributions from KJ.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis study was supported by the National Natural Science Foundation of China (42225708, 92251304, 42377122), the Research Program of Sino-Africa Joint Research Center, Chinese Academy of Sciences (151542KYSB20210007), and the Science and Technology Planning Project of NIGLAS (NIGLAS2022GS09).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAyala-Ortiz, C., N. Graf-Grachet, V. Freire-Zapata, J. Fudyma, G. Hildebrand, R. AminiTabrizi, C. Howard-Varona, Y. E. Corilo, N. Hess, M. B. Duhaime, M. B. Sullivan, and M. M. Tfaily. 2023. MetaboDirect: an analytical pipeline for the processing of FT-ICR MS-based metabolomic data. Microbiome \u003cstrong\u003e11\u003c/strong\u003e:28.\u003c/li\u003e\n \u003cli\u003eBluthgen, N., F. Menzel, and N. Bluthgen. 2006. Measuring specialization in species interaction networks. BMC Ecology \u003cstrong\u003e6\u003c/strong\u003e:9.\u003c/li\u003e\n \u003cli\u003eBramer, L. M., A. M. White, K. G. Stratton, A. M. Thompson, D. Claborne, K. Hofmockel, and L. A. McCue. 2020. 
ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data. PLoS Comput Biol \u003cstrong\u003e16\u003c/strong\u003e:e1007654.\u003c/li\u003e\n \u003cli\u003eCooper, W. T., J. C. Chanton, J. D\u0026apos;Andrilli, S. B. Hodgkins, D. C. Podgorski, A. C. Stenson, M. M. Tfaily, and R. M. Wilson. 2022. A history of molecular level analysis of natural organic matter by FTICR mass spectrometry and the paradigm shift in organic geochemistry. Mass spectrometry reviews \u003cstrong\u003e41\u003c/strong\u003e:215-239.\u003c/li\u003e\n \u003cli\u003eDanczak, R. E., R. K. Chu, S. J. Fansler, A. E. Goldman, E. B. Graham, M. M. Tfaily, J. Toyoda, and J. C. Stegen. 2020. Using metacommunity ecology to understand environmental metabolomes. Nat Commun \u003cstrong\u003e11\u003c/strong\u003e:6369.\u003c/li\u003e\n \u003cli\u003eFaith, D. P. 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation \u003cstrong\u003e61\u003c/strong\u003e:1-10.\u003c/li\u003e\n \u003cli\u003eFievre, A., T. Solouki, A. G. Marshall, and W. T. Cooper. 1997. High-resolution Fourier transform ion cyclotron resonance mass spectrometry of humic and fulvic acids by laser desorption/ionization and electrospray ionization. Energy \u0026amp; Fuels \u003cstrong\u003e11\u003c/strong\u003e:554-560.\u003c/li\u003e\n \u003cli\u003eFriedman, J., and E. J. Alm. 2012. Inferring correlation networks from genomic survey data. PLoS Comput Biol \u003cstrong\u003e8\u003c/strong\u003e:e1002687.\u003c/li\u003e\n \u003cli\u003eHockaday, W. C., J. M. Purcell, A. G. Marshall, J. A. Baldock, and P. G. Hatcher. 2009. Electrospray and photoionization mass spectrometry for the characterization of organic matter in natural waters: a qualitative assessment. Limnology and Oceanography: Methods \u003cstrong\u003e7\u003c/strong\u003e:81-95.\u003c/li\u003e\n \u003cli\u003eHu, A., M. Choi, A. J. Tanentzap, J. Liu, K. S. Jang, J. T. Lennon, Y. Liu, J. Soininen, X. Lu, Y. 
Zhang, J. Shen, and J. Wang. 2022a. Ecological networks of dissolved organic matter and microorganisms under global change. Nat Commun \u003cstrong\u003e13\u003c/strong\u003e:3600.\u003c/li\u003e\n \u003cli\u003eHu, A., K. S. Jang, F. Meng, J. Stegen, A. J. Tanentzap, M. Choi, J. T. Lennon, J. Soininen, and J. Wang. 2022b. Microbial and Environmental Processes Shape the Link between Organic Matter Functional Traits and Composition. Environmental Science \u0026amp; Technology \u003cstrong\u003e56\u003c/strong\u003e:10504-10516.\u003c/li\u003e\n \u003cli\u003eHu, A., K. S. Jang, A. J. Tanentzap, W. Zhao, J. T. Lennon, J. Liu, M. Li, J. Stegen, M. Choi, Y. Lu, X. Feng, and J. Wang. 2024. Thermal responses of dissolved organic matter under global change. Nat Commun \u003cstrong\u003e15\u003c/strong\u003e:576.\u003c/li\u003e\n \u003cli\u003eHu, A., F. Meng, A. J. Tanentzap, K. S. Jang, and J. Wang. 2023. Dark Matter Enhances Interactions within Both Microbes and Dissolved Organic Matter under Global Change. Environmental Science \u0026amp; Technology \u003cstrong\u003e57\u003c/strong\u003e:761-769.\u003c/li\u003e\n \u003cli\u003eHughey, C. A., C. L. Hendrickson, R. P. Rodgers, A. G. Marshall, and K. Qian. 2001. Kendrick mass defect spectrum: a compact visual analysis for ultrahigh-resolution broadband mass spectra. Analytical chemistry \u003cstrong\u003e73\u003c/strong\u003e:4676-4681.\u003c/li\u003e\n \u003cli\u003eKellerman, A. M., T. Dittmar, D. N. Kothawala, and L. J. Tranvik. 2014. Chemodiversity of dissolved organic matter in lakes driven by climate and hydrology. Nat Commun \u003cstrong\u003e5\u003c/strong\u003e:3804.\u003c/li\u003e\n \u003cli\u003eKellerman, A. M., D. N. Kothawala, T. Dittmar, and L. J. Tranvik. 2015. Persistence of dissolved organic matter in lakes related to its molecular characteristics. Nature Geoscience \u003cstrong\u003e8\u003c/strong\u003e:454-U452.\u003c/li\u003e\n \u003cli\u003eKim, S., R. W. Kramer, and P. G. Hatcher. 2003. 
Graphical method for analysis of ultrahigh-resolution broadband mass spectra of natural organic matter, the van Krevelen diagram. Anal Chem \u003cstrong\u003e75\u003c/strong\u003e:5336-5344.\u003c/li\u003e\n \u003cli\u003eKoch, B. P., and T. Dittmar. 2006. From mass to structure: An aromaticity index for high‐resolution mass data of natural organic matter. Rapid communications in mass spectrometry \u003cstrong\u003e20\u003c/strong\u003e:926-932.\u003c/li\u003e\n \u003cli\u003eKoch, B. P., and T. Dittmar. 2016. From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter. Rapid Communications in Mass Spectrometry \u003cstrong\u003e30\u003c/strong\u003e:250-250.\u003c/li\u003e\n \u003cli\u003eLaRowe, D. E., and P. Van Cappellen. 2011. Degradation of natural organic matter: a thermodynamic analysis. Geochimica et Cosmochimica Acta \u003cstrong\u003e75\u003c/strong\u003e:2030-2042.\u003c/li\u003e\n \u003cli\u003eLi, X. M., G. X. Sun, S. C. Chen, Z. Fang, H. Y. Yuan, Q. Shi, and Y. G. Zhu. 2018. Molecular Chemodiversity of Dissolved Organic Matter in Paddy Soils. Environ Sci Technol \u003cstrong\u003e52\u003c/strong\u003e:963-971.\u003c/li\u003e\n \u003cli\u003eMentges, A., C. Feenders, M. Seibt, B. Blasius, and T. Dittmar. 2017. Functional molecular diversity of marine dissolved organic matter is reduced during degradation. Frontiers in Marine Science \u003cstrong\u003e4\u003c/strong\u003e:194.\u003c/li\u003e\n \u003cli\u003eProulx, S. R., D. E. Promislow, and P. C. Phillips. 2005. Network thinking in ecology and evolution. Trends in Ecology \u0026amp; Evolution \u003cstrong\u003e20\u003c/strong\u003e:345-353.\u003c/li\u003e\n \u003cli\u003eRuan, M., F. Wu, F. Sun, F. Song, T. Li, C. He, and J. Jiang. 2023. Molecular-level exploration of properties of dissolved organic matter in natural and engineered water systems: A critical review of FTICR-MS application. 
Critical Reviews in Environmental Science and Technology \u003cstrong\u003e53\u003c/strong\u003e:1534-1562.\u003c/li\u003e\n \u003cli\u003eSleighter, R. L., and P. G. Hatcher. 2007. The application of electrospray ionization coupled to ultrahigh resolution mass spectrometry for the molecular characterization of natural organic matter. Journal of Mass Spectrometry \u003cstrong\u003e42\u003c/strong\u003e:559-574.\u003c/li\u003e\n \u003cli\u003eSong, H.-S., J. C. Stegen, E. B. Graham, J.-Y. Lee, V. A. Garayburu-Caruso, W. C. Nelson, X. Chen, J. D. Moulton, and T. D. Scheibe. 2020. Representing organic matter thermodynamics in biogeochemical reactions via substrate-explicit modeling. Frontiers in microbiology \u003cstrong\u003e11\u003c/strong\u003e:531756.\u003c/li\u003e\n \u003cli\u003eStegen, J. C., X. Lin, J. K. Fredrickson, and A. E. Konopka. 2015. Estimating and mapping ecological processes influencing microbial community assembly. Frontiers in microbiology \u003cstrong\u003e6\u003c/strong\u003e:370.\u003c/li\u003e\n \u003cli\u003eStegen, J. C., X. Lin, A. E. Konopka, and J. K. Fredrickson. 2012. Stochastic and deterministic assembly processes in subsurface microbial communities. ISME J \u003cstrong\u003e6\u003c/strong\u003e:1653-1664.\u003c/li\u003e\n \u003cli\u003eTanentzap, A. J., A. Fitch, C. Orland, E. J. S. Emilson, K. M. Yakimovich, H. Osterholz, and T. Dittmar. 2019. Chemical and microbial diversity covary in fresh water to influence ecosystem functioning. Proc Natl Acad Sci U S A \u003cstrong\u003e116\u003c/strong\u003e:24689-24695.\u003c/li\u003e\n \u003cli\u003eThurman, E. M. 2012. Organic geochemistry of natural waters. Springer Science \u0026amp; Business Media.\u003c/li\u003e\n \u003cli\u003eWang, J., J. Shen, Y. Wu, C. Tu, J. Soininen, J. C. Stegen, J. He, X. Liu, L. Zhang, and E. Zhang. 2013. Phylogenetic beta diversity in bacterial assemblages across ecosystems: deterministic versus stochastic processes. 
ISME J \u003cstrong\u003e7\u003c/strong\u003e:1310-1321.\u003c/li\u003e\n\u003c/ol\u003e"}],"institution":"Nanjing Institute of Geography and Limnology","keywords":"R package, Dissolved organic matter, Statistical analysis, FT-ICR MS","lastPublishedDoi":"10.21203/rs.3.rs-4660944/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptTitle":"iDOM: Statistical analysis of dissolved organic matter based on high-resolution mass spectrometry","postedDate":"July 2nd, 2024","status":"published-in-journal","versionOfRecord":{"articleIdentity":"rs-4660944","link":"https://doi.org/10.1002/mlf2.70002","journal":{"identity":"mlife","isVorOnly":true,"title":"mLife"},"publishedOn":"2025-04-15 00:00:00","publishedOnDateReadable":"April 15th, 2025"},"versionCreatedAt":"2024-07-02 
17:35:50","video":"","vorDoi":"10.1002/mlf2.70002","vorDoiUrl":"https://doi.org/10.1002/mlf2.70002","workflowStages":[]},"version":"v1","identity":"rs-4660944","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4660944","identity":"rs-4660944","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
