Spatially explicit terrestrial carbon densities for calibrating the carbon cycle in human-Earth system Models

doi:10.21203/rs.3.rs-6123546/v1


2025 · doi:10.21203/rs.3.rs-6123546/v1

preprint OA: closed


Research Article

Kanishka B Narayan, Alan V. Di Vittorio, Evan Margiotta, Seth Spawn-Lee, and 1 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6123546/v1 This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 15 Apr, 2025 (published version in Scientific Data). Version 1 posted.

Abstract

Soil and vegetation carbon stocks play a critical role in human-Earth system models. These stocks (denominated as densities in MgC/ha) affect variables such as land use change emissions and also influence land use change pathways under climate forcing scenarios where terrestrial carbon is assigned a carbon price.
Here we present reharmonized soil and vegetation carbon densities both at the 5-arcmin resolution grid cell level and aggregated to 235 watersheds for 4 land use types (Cropland, Grazed land, Urban land and Unmanaged vegetation) and 15 unmanaged land cover types. Moreover, we use the distribution of carbon within and across pixels to define statistical “states” of carbon, again differentiated by land type. These statistical states define a range of possible carbon values that can be used to set initial conditions of soil and vegetation carbon in human-Earth system models. We implement these data in a state-of-the-art multisector dynamics model, namely the Global Change Analysis Model (GCAM), and show that these new data improve several land use responses, especially when terrestrial carbon is assigned a carbon price.

Forestry, Geographic Information Systems

1. Background & Summary

Global and regional models such as Multisector dynamics (MSD) models and Integrated human-Earth system (IHES) models are routinely used to assess alternative socio-economic, land use and energy transition pathways (ref. 1). MSD and IHES models belong to the same family of models but have one key difference. MSD models tend to be more economic in nature and lack representation of biophysical processes (e.g. agricultural land use is well represented but the nitrogen cycle is not). IHES models generally represent these biophysical processes more fully by coupling the MSD model with finer resolution land models. As a simple example, GCAM is an MSD model (Calvin et al., 2019; Binsted et al., 2021), while GCAM coupled with a detailed land model like GLM (under the GCAM-E3SM framework) represents an IHES model (Calvin et al., 2018). These models also examine interactions between natural systems (e.g.
land, water systems) and human systems (food and energy demand). Soil and vegetation carbon densities play a critical role in these models by influencing the productivity of and profitability from land types (e.g., forest yields, pasture yields, and crop yields) and land use change emissions. Moreover, these densities affect land use change pathways under climate forcing scenarios and low carbon transition scenarios implemented in these models (Thomson et al., 2010; Wise et al., 2009). IHES models, MSD models and economic models generally need to be calibrated with specific carbon densities to initialize the carbon cycle, since these models cannot simulate carbon densities through spin-ups in the way that process-based Earth system models (e.g. the Community Land Model, CLM) can. Note that we define the term “densities” as the stock of carbon denominated in MgC/ha, which has already been normalized to bulk density. This term is distinct from the bulk density, which is a volumetric term.

We also note that data on carbon densities are useful for models other than the above-mentioned IHES and MSD models. As an example, the Global Trade Analysis Project (GTAP), which utilizes a CGE framework, is also calibrated with data from the FAO HWSD compiled by Gibbs, Yui et al. (2014), aggregated to the GTAP AEZ boundaries (Aguiar et al., 2022). Land use focused models such as the Global Biosphere Management Model (GLOBIOM) also make use of emissions factors from the IPCC, in addition to fine resolution data on carbon stocks, to estimate carbon emissions from land use change (Frank et al., 2024). SI Table 4 shows the carbon inputs used by different models for initializing the carbon cycle for both soil and vegetation carbon.

Another challenge is that these models need to be initialized with densities that represent long term potential maximum carbon values, since these values are used to spin up the model in historical years.
Thus, carbon densities are used in these models to spin up the historical carbon cycle (e.g. 1700–2015) and to model future land use change emissions (e.g. 2016–2100). There have been efforts to reconstruct long term potential carbon densities for soil and vegetation which can inform such calibration efforts. These studies have found that the long term potential carbon densities are much higher than the contemporary values due to ongoing land use and cover change (Erb et al., 2018; Walker et al., 2022). Moreover, studies have also highlighted difficulties in estimating long term potential carbon densities, since these estimations require long spin-up periods themselves (Fang et al., 2014).

To address the above issues, models currently make use of carbon density data that are differentiated by land type but are not always spatially explicit. These carbon densities are often representative of undisturbed land, and thus represent a long-term potential maximum. For example, models have previously used estimates of carbon values on undisturbed land from Houghton (1999) and the IPCC (Jackson et al., 2017), among others, to initialize the carbon cycle. Because these data are not spatially explicit or spatially differentiated, however, they often lead to over- and underestimation of carbon sequestration potentials, especially in future land use scenarios.

Recently, spatially explicit contemporary data on soil and vegetation carbon have become available. For example, soil carbon density data have been made available by the FAO (Nachtergaele et al., 2010) at a 1 km resolution and at 250 m resolution by the SoilGrids team at the International Soil Reference and Information Centre (ISRIC) (Batjes et al., 2017; Hengl et al., 2014). Use of spatially distinct, fine resolution data such as these has the potential to significantly improve results from global and regional models by better capturing the geographies of soil and vegetation carbon stocks (Jungkunst et al., 2022).
Even though these data represent contemporary carbon, they can be used to derive potential maximum carbon values that are spatially explicit. However, these carbon data need to be transformed significantly before regional and global models can use them robustly. This is because each of these fine resolution datasets relies on its own assumptions of land use and land cover, which may be distinct from the land use and land cover definitions used by the models in question. For example, many of these fine resolution data use land cover definitions from the European Space Agency Climate Change Initiative (ESA CCI) dataset (Li et al., 2018; Liu et al., 2018), while models may use land use definitions from the Historical Database of the Global Environment (HYDE) dataset (Klein Goldewijk et al., 2017) and/or land cover definitions from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Barnes et al., 2003; Justice et al., 2002).

Resolution mismatch between data and models provides an additional challenge. The new, spatially distinct carbon densities are available at a very fine resolution (250 m / 300 m), while models are often configured to use coarser data that better match their working resolution. For example, consistent land datasets that have frequently been used for climate modelling are available at a resolution of 5 arcmin (i.e. ~10 km at the equator) (van Asselen & Verburg, 2012), and many regional models operate on land units defined by geopolitical and/or geophysical boundaries (see descriptions above for GTAP and GLOBIOM) (Aguiar et al., 2022; Frank et al., 2024).
Given the difference in resolutions, and the above-mentioned differences in land classifications, a harmonization method is required to match the fine resolution carbon data with the appropriate land uses and land cover types within a model. As a simple example, a method is required to assign forest carbon correctly to the forest portion of pixels and grass carbon to the grass portion of pixels, and to ensure that this occurs across the three pools: soil carbon, above ground biomass and below ground biomass. To our knowledge, there is no custom dataset or consistent method representative of spatially explicit carbon that can be used to calibrate the carbon cycle in the above-mentioned global and regional models.

To address the above limitations, we prepare and present a harmonized dataset of fine resolution organic carbon densities for soil and vegetation biomass to initialize the carbon cycle in IHES and MSD models. The soil data are based on the 250 m-resolution SoilGrids dataset and represent a depth of 0–30 cm (topsoil carbon) (Hengl et al., 2014). While multiple depths of soil carbon data are available (e.g. 30–200 cm), we use the depth of 0–30 cm, which refers to topsoil carbon, since most regional models only make use of carbon stock data at this depth. We note that our programmatic method can produce soil carbon data at different depths if required. These original soil carbon data from the SoilGrids dataset are denominated in MgC of soil carbon per hectare and are derived from several soil properties including the bulk density, clay content, and sand and silt content. The aboveground and belowground carbon data are based on the 300 m-resolution Spawn et al. biomass dataset (Spawn et al., 2020).

The harmonization process associates spatially explicit carbon densities with specific land types to avoid errors due to mismatches in land type distributions between carbon data and models.
For example, the carbon data in a given pixel may be associated with forest while the model considers this pixel to be grassland. The harmonized data associate pixel level carbon with its appropriate land type such that it can be aggregated appropriately to model land types and grids. As a simple example, forest carbon is correctly assigned to the forest portion of pixels and grass carbon is assigned to the grass portion of pixels, and we ensure that this occurs across the three pools: soil carbon, above ground biomass and below ground biomass.

We implemented this carbon reharmonization programmatically in the Moirai land data system (Di Vittorio et al., 2020), which can be used to update the data, validate the data (e.g. by generating fine resolution and tabular data which can be compared to other sources), define alternate land unit boundaries (e.g., water basins or agro-ecological zones), and harmonize the source carbon data with a generic land type distribution at coarser resolution that is consistent with other land data. The carbon dataset is effectively harmonized with the current Moirai land use and land cover definitions (Table 1), which can be aggregated to model-specific land types. Moirai is a software system that can generate tabular land use and land cover data for any year based on fine resolution datasets (see section 2 for details).

To demonstrate the application of the harmonized data, we use it to initialize the Global Change Analysis Model (GCAM). GCAM’s carbon cycle needs to be initialized via a spin-up from 1700–2015, during which land change is applied to the pre-industrial state to determine the distribution and carbon contents of land types in 2015. The model uses a bookkeeping approach to track carbon changes during spin-up and during future simulations (Calvin et al., 2019).
Note that in GCAM’s bookkeeping approach the model begins by tracking a total stock of carbon for each water basin and each land type for two pools (soil and vegetation) in 1700. From 1700 onwards, based on historical and future land use, the model calculates fluxes from this initial state. Fluxes are based on net land use change (e.g. change in cropland, change in forests). The net land use change is calculated from calibrated data before 2015 and is modelled based on economic profitability after 2015. Sigmoidal growth curves are used to track regrowth of vegetation, and exponential decay functions are used to track gain and loss in soil carbon.

Using the harmonized dataset described above, we derived data-driven, long-term potential carbon values for soil and vegetation for GCAM’s land units, which are defined by geopolitical and watershed boundaries. By analyzing the distribution of carbon values within each land type-watershed combination, we found two potential options for initializing GCAM’s carbon cycle. The first is the Q3 state (3rd quartile of all 300 m pixels in a given land type in a given watershed), which represents a low carbon initialization (2144 PgC of global terrestrial carbon in 1700). The second is the 90th percentile state (90th percentile of all pixels in a given land type in a given watershed), which represents a high carbon initialization (3028 PgC in 1700). We also calculated five additional data-driven statistical states: area weighted average, minimum, maximum, median, and Q1. Users can calculate any percentile within the distribution of carbon, both at a pixel level and at a land region/water-basin level, for any land type (for example, the 95th percentile). Note that such a calculation would not be time intensive given that the six summary states are already available at multiple scales. This dataset can be effectively used to characterize uncertainty in carbon estimates in models such as GCAM.
For validation, we also compared the Q3 and the 90th percentile carbon states in our dataset at a 5 arcmin resolution (which are intended to represent pre-industrial carbon states) with similar estimates of potential pre-industrial topsoil (0–30 cm) carbon by grid cell from Sanderman et al. (2017) and with similar estimates of vegetation carbon from Walker et al. (2022) at the same resolution. We also perform global-level validation of our carbon data, recognizing that there is a high degree of uncertainty in carbon estimates from different datasets (Scharlemann et al., 2014; Tifafi et al., 2018). We also implemented these data in GCAM and found that utilizing this new carbon dataset for the spin-up improved several responses in GCAM, especially under forcing scenarios where terrestrial carbon is priced using a carbon tax.

The harmonized data are available as rasters at 5 arcmin resolution because the Moirai land data integration is performed at this resolution (Di Vittorio et al., 2020). We also present an easy-to-use tabular output summarizing the six carbon density states for each carbon pool for each land type within each of the 235 watersheds intersected with 207 country (ISO) boundaries that are modelled by GCAM. The final available dataset includes raster files for the different statistical states for each land use type (Cropland, Urban land, Pasture and Unmanaged land) and each carbon pool, bringing the total to 72 distinct raster files. Unmanaged land here refers to land that is currently not grazed, cropped or used as urban land, and is segregated into 15 different types. A thematic file labels each cell with the dominant biome for Unmanaged land (out of 15, Table 1). We also present the tabulated text file with the six carbon state values for each land type and carbon pool aggregated to 699 land regions (235 water basins intersected with 207 country boundaries).
Making the data available at these different resolutions should help facilitate effective multiscale modelling of terrestrial carbon.

2. Methods

Our carbon data processing method can be organized into three stages:

Stage 1- Resampling source datasets based on fine resolution land cover
Stage 2- Re-mapping the carbon to Moirai land use and land cover
Stage 3- Aggregating raster carbon data to basin boundaries

Stage 1 – resampling source data

This stage combines the 250 m resolution organic soil carbon and 300 m vegetation carbon data (MgC/ha) with the 300 m resolution ESA CCI input land cover data corresponding with the carbon data. We resample both carbon datasets to match the ESA CCI 300 m grid before this stage. We use a simple GDAL resampling approach to align the 250 m and 300 m grids, which makes use of a weighted average value for each land type. We first generate land cover masks (1 = respective land type present, 0 = otherwise) for each of 22 aggregated ESA CCI land cover types (SI Table 1). We combine the land cover masks with the carbon data to create 66 rasters (22 land types X 3 carbon pools), each representing a carbon data mask for an ESA land type. The resulting rasters are calculated as follows:

$${Carbon\_LT\_300m}_{pool,j,LT} = {Carbon\_300m}_{pool,j} \times {LT\_mask\_300m}_{j}$$ (1)

where j is the index of a 300 m grid cell, pool is the carbon pool (soil, aboveground biomass, belowground biomass), and LT is the ESA land type.

Next we use six distinct resampling methods to re-grid these data to a 5 arcmin resolution. Each method is applied to each of the land types, and thus we derive 6 statistical states for each land type in each 5 arcmin grid cell.
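As a minimal illustration of the masking and re-gridding idea in Stage 1, the sketch below works on a toy coarse cell made of six fine cells. It is not the GDAL/bash workflow used for the dataset. The function names `mask_carbon` and `six_states` are hypothetical, masked-out cells are treated as no-data rather than zero, the average is unweighted, and the quartile interpolation rule may differ from the one GDAL applies; all of these are assumptions of the sketch.

```python
from statistics import mean, median, quantiles


def mask_carbon(carbon, land_cover, land_type):
    """Keep carbon only where the fine cell has the given land type.

    Assumption: masked-out cells become None (no data) instead of 0,
    so that they do not distort the summary statistics.
    """
    return [c if lc == land_type else None for c, lc in zip(carbon, land_cover)]


def six_states(fine_values):
    """Summarise the fine cells that fall inside one coarse cell."""
    data = [v for v in fine_values if v is not None]
    if not data:
        return None  # land type absent from this coarse cell
    if len(data) == 1:
        q1 = q3 = data[0]
    else:
        # Assumption: "inclusive" quartile interpolation.
        q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "average": mean(data),  # unweighted: fine cells treated as equal in size
        "median": median(data),
        "min": min(data),
        "max": max(data),
        "q1": q1,
        "q3": q3,
    }


# Toy coarse cell containing six fine cells (values in MgC/ha).
carbon = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
land_cover = ["forest", "forest", "grass", "forest", "forest", "forest"]
forest_states = six_states(mask_carbon(carbon, land_cover, "forest"))
```

In this toy case the grass cell is excluded, so the forest states are computed from five values only.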
These aggregated rasters are calculated as follows:

$${Carbon\_LT\_5arcmin}_{pool,i,state,LT} = state\left({Carbon\_LT\_300m}_{pool,j,LT},\ {Carbon\_LT\_300m}_{pool,j+1,LT},\ {Carbon\_LT\_300m}_{pool,j+2,LT},\ \ldots,\ {Carbon\_LT\_300m}_{pool,j+n,LT}\right)$$ (2)

where i is the index of a 5 arcmin grid cell, pool is the carbon pool (soil, aboveground biomass, belowground biomass), state is the resampling method (weighted average, median, min, max, q1, q3), LT is the ESA land type, j is the index of each 300 m grid cell within aggregated cell i, and n is the total number of 300 m cells that are aggregated into cell i. Thus, we generate 366 (22 land cover types X 3 types of carbon X 6 states) layers of carbon that correspond to the aggregated ESA CCI land cover types. This processing is largely conducted through the GDAL software (Warmerdam, 2008) and implemented using bash scripts.

Stage 2 – remapping the carbon data to Moirai land use/cover

Harmonization of ESA land cover with Moirai land cover at 5 arcmins using a prioritization matrix

Next, the 366 layers described above are aligned with the default initial Moirai land use/cover for 2010 at a 5 arcmin resolution. These initial land use/cover data are based on land use data from the HYDE database (Klein Goldewijk et al., 2017) and a one-half degree land cover product (Meiyappan & Jain, 2012). Moirai can generate land use and land cover maps for any year based on these datasets combined with a potential vegetation dataset from Ramankutty et al. (1999). The potential vegetation is that which would most likely exist now in the absence of human activities. The Moirai land use and land cover types are listed in Table 1. It is important to note that carbon values are independently assigned to each of the four Moirai land use types in each cell, and that the unmanaged land use type can be only one of the Moirai land cover types in each cell. Moirai is described in more detail in Di Vittorio et al. (2020).
Ultimately Moirai generates land use and land cover data for 18 different land types, which are data-specific but not model-specific, and which can be aggregated to the coarser land types required by models such as GCAM. When these data are implemented in regional models like GCAM, they are aggregated to coarser land types (e.g. 7 aggregate land types in GCAM).

Table 1. Land use and land cover types for Moirai/GCAM. Total of 4 land use types; 15 types of land cover are tracked for the Unmanaged land use type.

Land use     Land cover
Cropland     Cropland
Pasture      Pasture
Urbanland    Urbanland
Unmanaged    TropicalEvergreenForest/Woodland
             TropicalDeciduousForest/Woodland
             TemperateBroadleafEvergreenForest/Woodland
             TemperateNeedleleafEvergreenForest/Woodland
             TemperateDeciduousForest/Woodland
             BorealEvergreenForest/Woodland
             BorealDeciduousForest/Woodland
             Evergreen/DeciduousMixedForest/Woodland
             Savanna
             Grassland/Steppe
             DenseShrubland
             OpenShrubland
             Tundra
             Desert
             Polardesert/Rock/Ice

The carbon for each Moirai land type in a cell needs to be selected from the 366 rasters generated in Stage 1 described above. We use a rule-based harmonization approach where we select the appropriate carbon values by matching the Moirai land type with the corresponding ESA land cover type (Table 2). We assign 6 possible ESA land cover types to each Moirai land type and rank them according to their similarity with the Moirai land type. This means that carbon values for a particular Moirai land type can come from any of six ESA land cover types, as long as they are present in a given cell. For example, a Tropical Evergreen Forest cell in Moirai may be assigned carbon values from the Evergreen_Combined, Mixed_Forests, Mosaic_Tree, Flood_Tree_Cover, Unknown_Tree_Cover, or Sparse_Treecover ESA land cover types. The similarity ranking both maximizes the number of Moirai land type assignments and ensures that the most appropriate carbon values are selected.
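The ranked matching can be read as a first-match lookup: for a given Moirai land type, take the highest-ranked ESA land cover that is actually present in the cell. The sketch below illustrates this with the Tropical Evergreen Forest example. The function name `select_esa_cover` is hypothetical, and the ordering of the list is assumed to follow the order in which the ESA types are listed in the text; the authoritative ranking is the one in Table 2.

```python
# Assumed ranking for Tropical Evergreen Forest (order as listed in the text).
TROPICAL_EVERGREEN_PRIORITY = [
    "Evergreen_Combined",
    "Mixed_Forests",
    "Mosaic_Tree",
    "Flood_Tree_Cover",
    "Unknown_Tree_Cover",
    "Sparse_Treecover",
]


def select_esa_cover(priority_list, present_in_cell):
    """Return the highest-ranked ESA land cover present in the cell, or None."""
    for cover in priority_list:
        if cover in present_in_cell:
            return cover
    return None  # no candidate present: the cell stays without data for now


# A cell holding two candidate covers: the higher-ranked one wins.
chosen = select_esa_cover(TROPICAL_EVERGREEN_PRIORITY, {"Sparse_Treecover", "Mixed_Forests"})

# A cell holding only the lowest-ranked candidate still receives a match.
fallback = select_esa_cover(TROPICAL_EVERGREEN_PRIORITY, {"Sparse_Treecover"})

# A cell with no candidate at all receives no assignment.
unmatched = select_esa_cover(TROPICAL_EVERGREEN_PRIORITY, {"Grassland"})
```

Keeping the ranking as plain data (one list per Moirai land type) means the selection logic stays identical across land types, including those with fewer or more than six candidates.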
The first ESA land cover in the ranked list that is present in each cell provides the carbon values for the corresponding Moirai land type in the same cell (Table 2). In the example above, the Evergreen_Combined carbon data would be chosen over all other ESA land covers if it existed in a given cell, and the Sparse_Treecover carbon data would be chosen only if it were the sole ESA land cover from the list that existed in that cell. These prioritization rules are designed such that carbon data from one biome are not assigned to a different biome when reharmonizing and re-gridding the carbon. The ESA land cover selection is done once for each cell and Moirai land type, and then the data from the corresponding carbon pool and state rasters are assigned to the Moirai land type in the target cell. This results in 72 rasters that become input files for Moirai.

We used expert judgement when developing the matrix to best represent the Moirai land types when selecting from the ESA land types. For certain land types we allow fewer than six choices (Table 2). For example, carbon for a Moirai Desert cell can only be chosen from a corresponding desert cell in the ESA masks. On the other hand, Moirai Tundra includes eight ESA land covers because ESA does not have an explicit Tundra class. The increased number of options aims to provide adequate data coverage for Tundra. Tundra data selection prioritizes polar desert/rock/ice pixels. The location of these pixels coincides with the Tundra land cover, and they also represent pixels with high values for soil carbon densities. Furthermore, certain biome types that are not represented explicitly in Moirai or not modelled by GCAM receive low priority rankings. For example, Flooded land types are never included as a first priority choice for any land type since Moirai does not explicitly include flooded land types. Conversely, the ESA land cover data do not include any explicit representation of pastures or rangeland.
Our rules assign pasture carbon values based on proximate grassland or shrubland carbon values. Note that we do treat pastures (grazed grassland) and unmanaged grasslands as separate land types. However, the carbon for pastures has to be imputed from the carbon for grasslands, since both the vegetation and soil carbon data are based on the ESA land cover data, which do not differentiate between unmanaged grassland and pastures. SI Fig. 7 shows an illustrative example of how the hierarchical rules are applied for one land type (Tropical Evergreen Forests).

Table 2: Prioritization matrix to match ESA land cover with Moirai land types

Implementation of nearest neighbor algorithm to increase data coverage

After implementing the prioritization rules, there remain 5 arcmin cells with no carbon data coverage for a given land type and carbon pool. This is expected since the land cover data used to generate the carbon masks (ESA CCI land cover data) may be different from the land cover data used in HYDE and SAGE. We therefore implement a nearest neighbor algorithm to interpolate data to each ‘no data’ cell based on availability in 40 neighboring cells. This algorithm fills the target cell and land type with the corresponding carbon data of the closest cell with a matching land type. If no matches are found within the prescribed window, then the target cell remains without data for that particular carbon pool and that particular land type. Environmental and topographical criteria are not considered at this stage, but the source carbon data have included topographical characteristics in sampling their values.

Carbon data coverage after interpolation is reasonable except for a few land types. Table 3 shows the data coverage by land type after implementation of the nearest neighbor algorithm. All but three land types have over 80% data coverage for soil and vegetation carbon. At least 25% of Tundra and Polar desert cells remain without carbon data.
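The gap-filling step described above can be sketched as a bounded nearest neighbor search on a small grid. This is an illustrative sketch, not the implementation used for the dataset. The function name `fill_nearest` and the `max_radius` parameter are hypothetical, the grid is assumed to hold values for a single land type and carbon pool with None marking 'no data', the search window is assumed to be square, and longitude wrap-around is ignored.

```python
def fill_nearest(grid, max_radius):
    """Fill None cells with the value of the closest cell that has data.

    Only original values are used as sources, so a filled cell never
    propagates further. Cells with no data inside the window stay None.
    """
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not None:
                continue
            best = None  # (squared distance, value) of the closest source so far
            for rr in range(max(0, r - max_radius), min(rows, r + max_radius + 1)):
                for cc in range(max(0, c - max_radius), min(cols, c + max_radius + 1)):
                    value = grid[rr][cc]
                    if value is None:
                        continue
                    dist2 = (rr - r) ** 2 + (cc - c) ** 2
                    if best is None or dist2 < best[0]:
                        best = (dist2, value)
            if best is not None:
                filled[r][c] = best[1]
    return filled


# One-row toy grid: only the second cell has carbon data (MgC/ha).
toy = [[None, 5.0, None, None, None]]
result = fill_nearest(toy, max_radius=1)
```

With a window of one cell, the two cells adjacent to the data cell are filled and the remaining two stay empty, which mirrors how some cells remain without data after interpolation.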
The incomplete coverage for Tundra and Polar desert is likely a result of differences in the way Tundra land cover is defined by different datasets. There have been more recent efforts to collect soil carbon data specifically for the permafrost and Tundra regions, such as that by Hugelius et al. (2014). This suggests that a future area of work would be to incorporate these more detailed datasets into either the source data or our processing workflow. Along with Tundra and Polar deserts, over 20% of the Urban land cells do not have carbon data. This is once again likely due to the different definitions of Urban land cover in different datasets. Our data coverage suggests that there is more uncertainty in the Tundra, Polar, and Urban carbon values purely because of limited data availability. Recognizing and quantifying data availability by land type enables users to apply their own judgement when using the carbon values for these land types.

Table 3. Details of NODATA cells after nearest neighbor interpolation. The last two columns give the percentage of cells unfound (NO DATA cells after interpolation) for vegetation carbon and soil carbon.

Land type                                      Total 5 arcmin grid cells   Vegetation carbon (% unfound)   Soil carbon (% unfound)
Pasture                                        1195396                     2.3                             2.3
Cropland                                       952850                      17                              17
Grassland/Steppe                               498404                      15                              14.6
OpenShrubland                                  274296                      16                              16
Desert                                         195579                      1                               1.1
TropicalEvergreenForest/Woodland               190780                      0                               0.3
Savanna                                        173776                      8                               7.6
BorealEvergreenForest/Woodland                 148756                      0                               0
Polardesert/rock/ice                           132021                      29                              24.9
Urban                                          119597                      22.3                            22.3
TemperateDeciduousForest/Woodland              86922                       1                               1.1
DenseShrubland                                 78065                       10                              9.5
TemperateNeedleleafEvergreenForest/Woodland    71600                       1                               0.5
BorealDeciduousForest/Woodland                 65824                       0                               0.4
TropicalDeciduousForest/Woodland               56377                       1                               1.4
Tundra                                         25000                       29                              24.9
TemperateBroadleafEvergreenForest/Woodland     14395                       0                               0.3

Stage 3 - Aggregating raster carbon data to 699 land regions

As a final step, we use the 72 rasters generated in Stage 2 as inputs to the Moirai land data system.
Moirai integrates these data with other land data (e.g., protected area, agricultural suitability, and specific crop data) and aggregates all the data from the 5 arcmin grid cell level to 699 land regions. The 699 land regions are the intersection of 235 water basins and 207 countries and are shown as a map in SI Fig. 1. GCAM uses water basin definitions from the Community Land Model (CLM) (Tesfa et al., 2014). The definition of the water basins is a user-specified feature in Moirai and can be changed to any desired boundary set. For example, an alternative set of boundaries based on agro-ecological zones used by GTAP is included with Moirai (and in the final data product). The final carbon state values for each land type are aggregated to each land region for each carbon pool (aboveground biomass, belowground biomass, soil 0–30 cm). These outputs are available as a tabular text file. The Moirai land data system performs this aggregation using the same land masks for the year 2010 that are used in the Stage 2 processing. The basic aggregation performed by Moirai is summarized in Eq. 3 below:

$${Carbon\_tabular}_{pool,GLU,state,LT} = state\left({Carbon\_5arcmin\_LT}_{pool,j},\ {Carbon\_5arcmin\_LT}_{pool,j+1},\ {Carbon\_5arcmin\_LT}_{pool,j+2},\ \ldots,\ {Carbon\_5arcmin\_LT}_{pool,j+n}\right)$$ (3)

where pool is the carbon pool (aboveground biomass, belowground biomass, topsoil (0–30 cm)), state is the aggregation method (area-weighted average, median, min, max, q1, q3), GLU represents a land region, which is an intersection of 207 country boundaries and 235 watershed boundaries, j is the grid cell index for each 5 arcmin grid cell in a basin with land type LT, n is the total number of cells in a basin for a given land type, and LT is the land type.
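The aggregation in Eq. 3 can be illustrated as a group-by over grid cells. The sketch below is not Moirai code (Moirai is a separate software system). The function name `aggregate_to_regions`, the tuple layout of the input, and the region labels are hypothetical, and it is assumed that only the average is area-weighted while the other states are computed on unweighted cell values.

```python
from collections import defaultdict
from statistics import median, quantiles


def aggregate_to_regions(cells):
    """Aggregate grid cells to (region, land type) summary states.

    cells: iterable of (region_id, land_type, carbon_MgC_per_ha, area_ha).
    """
    groups = defaultdict(list)
    for region, land_type, carbon, area in cells:
        groups[(region, land_type)].append((carbon, area))

    table = {}
    for key, pairs in groups.items():
        values = [carbon for carbon, _ in pairs]
        total_area = sum(area for _, area in pairs)
        if len(values) > 1:
            # Assumption: "inclusive" quartile interpolation.
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]
        table[key] = {
            "weighted_average": sum(c * a for c, a in pairs) / total_area,
            "median": median(values),
            "min": min(values),
            "max": max(values),
            "q1": q1,
            "q3": q3,
        }
    return table


# Toy input: three Savanna cells in one region and one in another.
cells = [
    ("basin_A", "Savanna", 10.0, 100.0),
    ("basin_A", "Savanna", 30.0, 300.0),
    ("basin_A", "Savanna", 20.0, 100.0),
    ("basin_B", "Savanna", 5.0, 50.0),
]
table = aggregate_to_regions(cells)
```

Because the large cell carries the highest density, the area-weighted average for the first region (24 MgC/ha) sits above its median (20 MgC/ha).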
Stage 4 – Deriving any other percentile using our six statistical states

Using our six summary states, users can calculate any percentile for the carbon value in any pixel for each of our 19 land types and three carbon pools (soil, above ground biomass, below ground biomass). These values can also be calculated directly for a land region/water basin. The percentile values can be calculated assuming that carbon values are lognormally distributed (this is established in our analysis below; see section 4.1). The steps to calculate any percentile are as follows:

1. Compute a mean value as the natural log of the median state. Since the distribution of carbon values is lognormal, the natural log of our median is an estimated mean for the logged distribution.
2. Compute an estimated standard deviation from the natural log of the Q3 state and the mean value from step 1. Specifically, we use the formula (LN(Q3) - mean)/0.675, where mean is the log-scale value from step 1, i.e. LN(median). Note that this formula is simple and assumes that the logged carbon values are normally distributed. However, the statistics available can be used to fit any distribution.
3. Estimate the percentile value from the mean and standard deviation above. Since the logged distribution is normal, users can compute this value using a z table for a normal distribution.
4. Calculate the exponent of the value in step 3.
5. Constrain this value to the maximum observed value in our dataset.

This method enables a timely calculation of percentiles and is much faster than re-running the code to derive individual percentiles using re-sampling.

3. Data records

Final data are available for download here: https://zenodo.org/records/13988220 (Narayan et al., 2024)

The data repository contains the following:

1. 72 rasters (4 land use types X 6 states X 3 carbon pools) at a 5 arcmin resolution, representative of carbon in 2010.
2. 1 thematic raster which tracks 15 vegetation biomes for the Unmanaged land use type (from 1. above).
3. Tabular data file showing aggregated carbon stocks for 6 states of carbon for 699 land regions for soil (0–30 cm), aboveground biomass and belowground biomass.
4. Tabular data file showing aggregated carbon stocks for 6 states of carbon for GTAP AEZ for soil (0–30 cm), aboveground biomass and belowground biomass.

All data files are available as binary raster files stored as .bil files, which can be opened in any GIS software (such as ArcGIS or QGIS) or using programming languages (such as R, Python, C or C++). We do not release the intermediate data described in Fig. 1, given their size (~300 GB). These can be reproduced programmatically (see section 6 below) and saved if the user requires. However, we recommend that users regenerate these data only selectively, given their size.

4. Technical Validation

In this section we present the technical validation of our dataset. We begin by exploring our main data products. In section 4A, we show how we select different carbon initializations for GCAM from our dataset. In section 4B, we validate our initialization densities at the pixel level against similar estimates in the literature. In section 4C, we compare the global and regional carbon densities with similar estimates in the literature. In section 4D, we explore spatial uncertainties in our dataset by comparing carbon values across biomes. Finally, in section 4E, we use our dataset to explore alternative scenarios in GCAM with a low and a high carbon initialization.

We first evaluate our main data products, namely the maps of soil and vegetation carbon across grid cells by land type (e.g., Fig. 2 and Fig. 3), with the goal of identifying the most appropriate carbon state for GCAM modeling, and then take a closer look at data uncertainty and spatial variability. Note that the authors of the source data on soil (Hengl et al., 2014) and vegetation (Spawn et al., 2020) did a detailed spatial validation of the data in their respective papers.
Our validation will focus on uncertainties that have been introduced through our re-harmonization process. We will also compare our Q3 and 90th percentile (determined as described above) estimates with similar estimates from the literature since these estimates will be used to initialize GCAM.

4A Selecting potential carbon states for initializing GCAM

This dataset provides several carbon levels to choose from because different models have different data needs. GCAM requires a potential maximum carbon state that represents mature ecosystems that have not been affected by land use. This state is used for both the pre-industrial initialization in 1700 and for the asymptotic parameters of the vegetation growth and soil carbon accumulation curves. The pre-industrial carbon state has been estimated to be much higher than the contemporary carbon stored in land (Erb et al., 2018) due to a long history of land use. Various studies have highlighted the difficulties in calculating the long-term potential maximum (Fang et al., 2014), and our statistical aggregation method enables a systematic approach to selecting a data-driven maximum value that we can use to initialize GCAM. The provided statistical states and the opportunity to calculate intermediate ones also enable systematic selection of other carbon levels corresponding to other models. To select potential pre-industrial equilibrium states, we compared the frequency distributions of carbon by pool within each land region for each land type with the final statistical states calculated. The frequency distributions represent a heterogeneous landscape at different stages of growth and management. The average or median values may be representative of the contemporary landscape, but not of an undisturbed landscape that has been allowed to equilibrate its carbon stocks. The maximum value in a land region may be an extreme outlier and likewise would not be representative of the undisturbed landscape.
Our goal then is to find a value between the contemporary average and the maximum that is representative of a long-term potential maximum value. Fortunately, most distributions of soil carbon generally follow a log-normal shape with a long tail. We present the distributions of soil carbon in the Amazon basin (Fig. 4) for different land types as an example. One option for GCAM initialization is the Q3 statistical state. The soil Q3 values fall between the average and the maximum, as expected. Given the lognormal shape, the observations above the Q3 value are infrequent and can stretch to extremely high values. Most vegetation carbon distributions also follow a log-normal shape within each basin for each land type, but forests have distributions that are more bimodal (Fig. 5). Nonetheless, the Q3 state provides carbon estimates that are reasonably higher than the contemporary average or median value.

Table 4 Initial potential terrestrial carbon stock calculated from different sources. Sources from Moirai are calculated using land maps in 1700. Sanderman et al. represents a carbon stock in 1800 given no land use. Walker and Erb et al. are based on potential vegetation carbon estimations.

| Data source | Topsoil (0–30 cm) carbon (PgC) | Vegetation (above + belowground biomass) carbon (PgC) |
|---|---|---|
| Erb et al. 2019 | | 916 |
| Moirai (Q3) | 1553 | 591.7 |
| Moirai (90th percentile) | 2063 | 966 |
| Walker 2022 | | 795 |
| Sanderman et al. 2017 | 2119 | |
| Houghton 1999 | 1462 | 662 |

We selected two carbon states, Q3 and 90th percentile, to compare with published pre-industrial carbon estimates to inform a final selection for GCAM initialization. Using the Q3 values sets the initial global carbon stock in the year 1700 to 2144 PgC (1553 PgC of carbon in topsoil and 591 PgC of vegetation biomass). This estimate is on the lower end of other similar estimates (Table 4). We also use the estimated 90th percentile state in order to represent a higher initialization of carbon in 1700.
This 90th percentile is estimated from our six summary states using the methodology outlined in section 2.4 and sets an initial carbon stock of 3028 PgC (2063 PgC of carbon from topsoil and 966 PgC of vegetation biomass). Using these two states for initialization helps us understand the sensitivity of the model to the initial value. One reason why the Q3 vegetation values may be low while the 90th percentile values are high is that we derive carbon values for forests as a whole and do not differentiate between primary forests and secondary forests due to lack of available data. This means that our forest carbon distributions include the impact of harvesting, especially in regions with high levels of forest harvests, resulting in lower quartile values yet maintaining relatively high 90th percentile and maximum values. As more fine-resolution data on different types of forests become available, a logical next step would be to derive separate carbon densities for primary and secondary forest types.

4B. Grid cell comparison of carbon values to other estimates of long-term potential carbon

To evaluate the spatial distribution of our method we compare the 90th percentile values at the pixel level with other gridded data, because the 90th percentile global values match the reference data better than the Q3 values. Sanderman et al. (2017) generated a pre-industrial soil carbon map for topsoil in the year 1800. This map assumed no land use in that year. Similarly, Walker et al. (2022) generated a map for potential carbon in above and belowground vegetation. For a valid comparison we compared only our unmanaged land carbon values with these estimates (Figs. 6 and 7). We found that in the case of soil carbon, even though our maps track well with the maps from Sanderman et al. (2017) in terms of the overall spatial distribution, the mean error (Moirai 90th percentile – Sanderman et al. 2017) across gridcells is about −23%.
There are some higher-latitude pixels from the Sanderman et al. (2017) dataset that show almost 100% higher values compared to our data. In the case of aboveground vegetation carbon, the mean percent error (Moirai 90th percentile – Walker et al. 2022) is −17%, which is smaller in magnitude than for soil carbon. The largest errors were observed for forest pixels. This is likely due to the combination of primary and secondary forests into a single forest category in our dataset (as described above), which lowers the carbon values. The highest differences between datasets are observed in forest pixels with high levels of forest harvesting (Central and West Africa and South and East Asia). Note that SI Fig. 8 shows pixel-level absolute value (MgC/ha) differences between datasets.

4C Comparison to C values previously used in GCAM by land type and to aggregate contemporary estimates

We compared the distribution of Moirai carbon densities across water basins by land type with global carbon densities from Houghton (1999) that were previously used for GCAM initialization. The Houghton carbon densities represent contemporary carbon on undisturbed land differentiated by biome. We also compared our statistical states with other contemporary values where available (e.g., Jackson et al. 2017 for soil carbon and Vlek et al. 2017 for vegetation carbon). These distributions represent all of the statistical states across all basins differentiated by land type. For soil carbon (Fig. 8 and SI Table 2), we found that our global values (Q3, 90th percentile) are generally higher than the Houghton values for most land types. The values are especially higher for shrublands located in Boreal regions, where the difference is approximately 80 MgC/ha. This is likely because the SoilGrids dataset shows high carbon values at high latitudes and includes peat soils in its estimates (e.g., Fig. 3).
The high values of soil carbon at high latitudes may also be driven by low levels of predicted bulk density at those locations (Tifafi et al., 2018). A more recent version of SoilGrids has produced lower values in these regions (Poggio et al., 2021). Our Q3 soil carbon values are generally higher than the contemporary estimates. For cropland, our Q3 estimates of carbon are as high as forest soil carbon (Fig. 9 and SI Table 3). This is investigated in more detail in the sections below. Similarly, the soil carbon under Urban land cover is extremely high. This is likely due to how the samples were collected for Urban land cover. These samples are collected in parks, where soil carbon is relatively high, as opposed to built-up areas with little exposed soil, where soil carbon is relatively low due to development. As expected, Q3 values are higher than the contemporary values from Jackson et al. (2017), especially in the Boreal regions. However, the Q1 values are closest to contemporary values for soils. Forest vegetation carbon densities are significantly scaled down across Moirai states when compared to the literature (Fig. 9; Houghton, 1999; Vlek et al., 2017). This is not surprising because the spatial distribution of forest carbon is unknown for the Houghton data, especially for tropical forests (Houghton, 2005), while in our data there is significant variation in values across basins due to management and environmental conditions. Also, as noted above, our Moirai values for forest carbon densities are a combination of primary and secondary forests and therefore provide lower estimates than obtained from unmanaged forests. As expected, the Q3 and 90th percentile grassland and pasture vegetation carbon estimates are higher than the literature values (Fig. 9). The median values match the contemporary global estimates well. These land types also have a much narrower distribution of values across space.
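The 90th percentile values compared in the sections above are derived from the median and Q3 states under the lognormal assumption described in Stage 4. A minimal Python sketch of those steps is given here for readers who want to derive other percentiles. The function name `estimate_percentile` and the example densities are hypothetical, and the constant 0.675 follows the approximation quoted in the text rather than the exact normal quantile.

```python
import math
from statistics import NormalDist

Z_Q3 = 0.675  # approximate z-score of the 75th percentile, as used in the text


def estimate_percentile(median, q3, max_value, p):
    """Estimate the p-th quantile (0 < p < 1) assuming lognormal carbon values."""
    mu = math.log(median)                # step 1: mean on the log scale
    sigma = (math.log(q3) - mu) / Z_Q3   # step 2: standard deviation on the log scale
    z = NormalDist().inv_cdf(p)          # step 3: z-score (replaces a z table lookup)
    value = math.exp(mu + z * sigma)     # step 4: back-transform to MgC/ha
    return min(value, max_value)         # step 5: cap at the observed maximum


# Hypothetical densities in MgC/ha for one land region and land type.
p90 = estimate_percentile(median=50.0, q3=80.0, max_value=300.0, p=0.90)
print(round(p90, 1))
```

Because 0.675 is a rounded constant, requesting p = 0.75 returns a value very slightly different from the input Q3; the cap in step 5 matters mainly for very high percentiles in regions with a large spread.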
4D Uncertainties in re-harmonized carbon data (spatially and across land types)

Here we explore uncertainty in the available data by further examining spatial distributions, aggregation statistics, and land type considerations.

i. Do managed land types show a depletion in carbon compared to unmanaged land types?

Studies show that managed land (i.e., Cropland, Pasture, Urban land) has depleted carbon stocks in relation to undisturbed land (Cooper et al., 2021; Sanderman et al., 2017; Wei et al., 2014). The aim of processing the spatial managed land carbon data and adding it to Moirai is to obtain contemporary estimates for these lands that can be used in modeling rather than assuming a global value or that managed lands have a fixed fraction of unmanaged land carbon. Carbon data values do not correspond with a long-term potential maximum for these managed land types by definition, as these lands are actively disturbed. However, we still want higher than average carbon values for the parameters that define the limits of carbon accumulation for these land types. We expect that the carbon data reflect the effects of these managed land types and that our desired values would be lower than those for the surrounding unmanaged land types. We checked this expectation by first comparing Q3 carbon values for soil and aboveground biomass for cropland with the corresponding values for unmanaged land cover in each of our land regions (Fig. 10). We found that cropland soil carbon values do not show a consistent depletion compared to unmanaged land, whereas cropland vegetation carbon values are lower. The reason for these differences among carbon pools is rooted in the source data sampling and processing methodologies. For the SoilGrids dataset, the authors state that cropland soil carbon samples were largely collected in the US.
For the vegetation carbon dataset from Spawn et al., the vegetation carbon was calculated for each crop type based on yields, which explains the low values on cropland compared to unmanaged land. In GCAM, crop yields are determined from harvested area and production data, while the carbon data are used for land use emissions and for valuing land carbon in global warming target scenarios. To address the relatively high cropland soil carbon data in our modeling experiments we reduce these data by 30% before using them in GCAM. Previous studies have found a similar loss of soil carbon through agricultural practices and land cover conversion from unmanaged land types to cropland (Cooper et al., 2021; Wei et al., 2014). We performed a similar analysis for pasture carbon densities and found that pasture carbon shows depletion or lower values compared to unmanaged land cover for both soil and vegetation. In this case, it is reasonable to use the Q3 soil and vegetation carbon values for pasture in GCAM without adjustment. This is an interesting case because pasture is not one of the source land types and its carbon values were assigned based on the same land types as for grassland. Multiple factors could contribute to this result, including sampling bias, pasture location bias, and uncertainties in data and processing. Nonetheless, these data capture the expected difference between pasture and unmanaged grassland.

ii. Assessing spatial variability in soil and vegetation carbon within and across basins

We have established that carbon distributions within land regions generally follow a lognormal pattern for soil carbon and for vegetation carbon for most land types, except that forest vegetation carbon has a more bimodal distribution. However, there may be more dispersion across values in some basins for some land types compared to others.
To assess this systematically, we computed a quartile coefficient of dispersion (QCD) for each basin and land type as:

$${QCD}_{GLU,LT,pool}=\frac{{Q3}_{GLU,LT,pool}-{Q1}_{GLU,LT,pool}}{{Q3}_{GLU,LT,pool}+{Q1}_{GLU,LT,pool}} \quad (5)$$

where pool is the carbon pool (aboveground biomass, belowground biomass, topsoil (0–30 cm)), GLU represents a land region which is an intersection of basin boundaries and country boundaries, and LT is the land type. The QCD values range from 0 to 1, where a value towards 0 indicates less dispersion within a region-land type-carbon pool combination and a value towards 1 indicates more dispersion. The QCD values for soil carbon (Fig. 11) are generally similar across most basins and across land types. This is expected since the distributions of soil carbon are generally lognormal. However, in some basins the QCD value is consistently high and similar across land types. This mainly occurs in individual basins in Russia and Indonesia with high levels of peat soils, where dispersion across cells is high because some cells contain peat soils whereas others do not. Based on QCD values across basins and land types for vegetation carbon (Fig. 12), we observe that there is significant variation in the QCD values within and across basins for tundra (with values ranging from 0 to 1). This is likely due to the way tundra pixels are defined in our dataset (they encompass different vegetation types). Similarly, there is significant variation within and across basins for grasslands, savannah and pastures, which is once again likely due to the definitions of what constitutes grasslands in the base land cover dataset. While there are also variations in vegetation carbon values for cropland and urban land, the overall range of vegetation carbon values for these land types is low (Fig. 7). QCD values for forests across and between basins are lower.
This may be due to the narrower definitions of what constitutes forests across datasets.

4E Results from implementation of spatially explicit carbon in GCAM

As a final validation step, we separately implemented two carbon density sets from Moirai (3rd quartile, 90th percentile) and the Houghton densities in GCAM and compared these three cases. We make the following assumptions when implementing the carbon densities in GCAM, based on the analyses above:

a) Each set of carbon values is used throughout to reflect two potential options for a long-term potential maximum state of carbon in 1700.
b) Cropland soil carbon is reduced by a factor of 0.3 (30% reduction) for all basins to reflect the effects of management. This is because we found that the soil carbon values do not show a depletion of carbon when comparing unmanaged soil carbon and crop carbon (see the findings of Section 4D above).
c) Tundra, urban, desert, and polar desert/rock/ice do not change in GCAM and so the assigned carbon values do not influence model simulations. If a model does include dynamics for these land types, then the associated uncertainties should be addressed.

We emphasize that the results described here are GCAM-specific and would be different based on the model selected.

4E (i) Results from historical spin up

We initialized GCAM using each of the three cases identified above. This resulted in a pre-spin up carbon stock of 1912 PgC (1320 PgC in soil and 591.7 PgC in vegetation) when using the Q3 state and a carbon stock of 2718 PgC (1753 PgC in topsoil and 965 PgC in vegetation) when using the 90th percentile (Fig. 13 and Table 5). Note that these initialization values are calculated using the land cover in 1700, which does include some managed area, and the spatially explicit carbon. The corresponding pre-spin up value using the Houghton densities is 1905 PgC (1243 PgC in soil and 662 PgC in vegetation).
During spin up this carbon is reduced to 1735 PgC in 2015 when using the Q3 state (1249 PgC of topsoil carbon and 486 PgC of vegetation carbon) as a result of historical land transitions (Fig. 13 and Table 5). Similarly, during the spin up, this carbon is reduced to 2448 PgC when using the 90th percentile values (1655 PgC of topsoil carbon and 793 PgC of vegetation carbon). The corresponding value using the Houghton densities is 1697 PgC (1181 PgC in soil and 516 PgC in vegetation). An important point to note is that while the 90th percentile generates results more in line with independent pre-industrial estimates (Table 4), the Q3 state results in more realistic contemporary values in 2015 during the GCAM spin up (Figs. 8 and 9). For example, the Q3 state results in a contemporary value of 486 PgC of vegetation carbon in 2015, which is closer to contemporary vegetation carbon stock estimates. The Houghton values produce a contemporary estimate of 516 PgC, which is higher, likely due to high estimates of tropical vegetation carbon, whereas the 90th percentile results in a global vegetation carbon stock of 793 PgC. Using the 90th percentile would effectively result in an unrealistically high initial vegetation carbon stock that is close to equilibrium in 2015. Another point to note is that the amount of global historical emissions (1700–2015) produced by the Q3 initialization is 176 PgC, which is much lower than the global historical emissions using the 90th percentile of 270 PgC (Fig. 14). For context, the Global Carbon Project (as of 2021) produced an estimate of LUC emissions over 1700–2015 of 196 PgC (Friedlingstein et al., 2022). The 90th percentile produces consistently higher annual LUC emissions than the other estimates, except for the dip in 2005. This dip is due to a shift in land use that accumulates excessive soil carbon rapidly in certain regions because of higher carbon densities in specific land types in the new spatially-explicit data.
Based on these spin up results and our validation analyses, we found that the Q3 value from our dataset is appropriate for initialization and use in GCAM when using the model to estimate contemporary C dynamics. While the 90th percentile better resembles independent estimates of pre-settlement stocks, it results in substantial overestimation when used to estimate contemporary C fluxes. This is a result of assumptions and processes within GCAM pertaining to carbon dynamics. Furthermore, the Q3 data provide a much less dramatic shift from the Houghton data previously used than the 90th percentile data. The appropriate carbon state for other models, which may implement carbon dynamics differently, could be different and would require a similar analysis. GCAM uses a simple bookkeeping approach to modeling carbon dynamics, with a primary assumption regarding the potential maximum carbon densities of land types. The model begins by tracking a total stock of carbon for each water basin for each land type for two pools (soil and vegetation) in 1700. From 1700 onwards, based on historical and future land use, the model calculates fluxes from this initial state. Sigmoidal growth curves are used to track regrowth of vegetation and exponential decay functions are used to track gain and loss in soil carbon. The model also calculates carbon fluxes based on net land use change (e.g., increase in cropland, reduction in forests) as opposed to gross transitions (e.g., cropland increase from grassland, cropland increase from forest loss).
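To make the role of the potential maximum density in a bookkeeping scheme concrete, the following Python sketch shows one sigmoidal regrowth curve and one exponential soil adjustment. It is purely illustrative and is not GCAM's implementation: the logistic form, the function names, and the parameters `mature_age`, `steepness`, and `timescale` are assumptions chosen for this example, and all densities are hypothetical.

```python
import math


def vegetation_carbon(age, potential_max, mature_age, steepness=10.0):
    """Logistic (sigmoidal) regrowth toward potential_max (MgC/ha).

    The curve is centred on mature_age / 2; this shape is an assumption
    made for illustration, not GCAM's growth function.
    """
    midpoint = mature_age / 2.0
    rate = steepness / mature_age
    return potential_max / (1.0 + math.exp(-rate * (age - midpoint)))


def soil_carbon(years, start, target, timescale):
    """Exponential approach from the start density toward the target density.

    Covers both loss (start > target) and gain (start < target).
    """
    return target + (start - target) * math.exp(-years / timescale)


# Hypothetical example: a Q3-style potential maximum of 120 MgC/ha for forest
# vegetation, and soil moving from 100 to 70 MgC/ha after conversion.
for year in (0, 25, 50, 100):
    veg = vegetation_carbon(year, potential_max=120.0, mature_age=50.0)
    soil = soil_carbon(year, start=100.0, target=70.0, timescale=50.0)
    print(year, round(veg, 1), round(soil, 1))
```

In a scheme of this kind the chosen statistical state sets the asymptote of both curves, which is why the choice between Q3 and the 90th percentile propagates into both the spin up and the future carbon accumulation.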
Table 5 Results from the historical spin up

| Initialization | Carbon pool | Initial value in PgC (in the year 1700) | Contemporary value after spin up in PgC (2015) | Historical emissions in PgC (between 1700 and 2015) | Value in 2100 under SSP1 2p6 | Additional carbon sequestered during afforestation scenario (2100 value − 2015 value) |
|---|---|---|---|---|---|---|
| Houghton | vegetation carbon | 662.0 | 516.1 | 145.9 | 605.3 | 89.2 |
| Moirai (Q3 value) | vegetation carbon | 591.7 | 486.3 | 105.4 | 515.9 | 29.6 |
| Moirai (90th percentile) | vegetation carbon | 965.8 | 793.2 | 172.6 | 847.9 | 54.7 |
| Houghton | soil carbon (topsoil) | 1243.5 | 1181.4 | 62.1 | 1220.0 | 38.6 |
| Moirai (Q3 value) | soil carbon (topsoil) | 1320.5 | 1249.1 | 71.4 | 1274.6 | 25.5 |
| Moirai (90th percentile) | soil carbon (topsoil) | 1753.0 | 1655.2 | 97.8 | 1700.6 | 45.4 |
| Houghton | Total terrestrial carbon | 1905.5 | 1697.5 | 208.0 | 1825.3 | 127.8 |
| Moirai (Q3 value) | Total terrestrial carbon | 1912.3 | 1735.4 | 176.9 | 1790.6 | 55.2 |
| Moirai (90th percentile) | Total terrestrial carbon | 2718.8 | 2448.3 | 270.5 | 2548.4 | 100.1 |

4E (ii) Results from climate forcing scenario and sensitivity analysis

We use one climate forcing scenario with a maximum radiative forcing level of 2.6 watts per square meter by 2100 and shared socioeconomic pathway 1 (SSP1 2p6) to assess how the new carbon data influence land projections in GCAM. Under this scenario land carbon prices are implemented to assign value to terrestrial carbon at the same rate as carbon is valued in the energy system. SSP1 2p6 shows a large afforestation response and, more generally, a large carbon response since it contains a carbon price while also having less stress from socio-economic factors across all IAMs (Popp et al., 2017). GCAM by default uses carbon densities from Houghton et al. (1999), which are described in SI Table 2 (soil) and SI Table 3 (vegetation). Note that the changes in land cover under the climate forcing scenario are driven by relative levels of carbon across land types rather than absolute levels of carbon.
Therefore, even if forest carbon in some tropical regions is lower than other estimates, forests still sequester much more carbon compared to other land types in these regions. We compare the same three cases as for the historical period: Houghton, the Moirai Q3 value, and the Moirai 90th percentile. The global land allocation comparison under the SSP1 2p6 scenario in GCAM (Fig. 15) shows that the afforestation/reforestation response is greatly reduced as a result of the spatially explicit carbon (the increase in forest cover from 2020 to 2100 globally is only 3.2 million km² when using the Moirai Q3 as opposed to 7 million km² with the Houghton carbon). IAMs (including GCAM) generally show a very optimistic afforestation response for this scenario that ranges from 0.5 to 12 million km² of trees planted as part of a nature-based carbon sequestration strategy under SSP1 2p6 (Popp et al., 2017). The high afforestation response in some IAMs has been considered too optimistic by some studies (e.g., Pongratz et al., 2021). For Q3, the reduced forest expansion in GCAM with the new carbon data is largely driven by lower forest vegetation carbon densities in the new data, which reduced the incentive to expand forest. Conversely, the 90th percentile case reduces afforestation further, down to 0.1 million km², even though the carbon densities are higher than for Q3. This is because smaller increases in forest cover are required to meet additional afforestation targets in the 90th percentile case. Despite the low afforestation, it adds another 54 PgC of vegetation carbon through afforestation that would result in an unrealistic value of vegetation carbon in 2100: about 847 PgC, which is higher than undisturbed carbon stocks in 1700.
On the other hand, the Q3 vegetation carbon stock in 2100 is close to 515 PgC (an additional 30 PgC of carbon added through planted forests). Because the Q3 state reduces afforestation but maintains responsive land allocation and reasonable carbon accumulation, it is a better choice than the 90th percentile for initializing GCAM. Global cropland and shrubland dynamics show a more complicated response than forest (Fig. 15). The reduced emphasis on forest expansion reduces the need for cropland abandonment. Cropland also sequesters more soil carbon in some regions (even with the 30% reduction factor), which also reduces abandonment. The shrubland response is also enhanced by higher shrubland vegetation carbon densities in the new spatially-explicit data. Regional responses are dictated by their respective land type distributions. Generally, afforestation is maintained or enhanced in tropical forests and decreased in Boreal regions. For example, in Russia the afforestation strategy is completely replaced with a shrubland and grassland preservation strategy (SI Fig. 5). This is expected since the region has a relatively high amount of boreal forests. In South Asia, however, where non-forest land types dominate, forest expansion persists and is supplemented by shrubland expansion (SI Fig. 6). The implementation of the spatially explicit carbon clearly improves land use responses and also suggests that high-carbon-sequestering shrubs can be preserved as a part of nature-based solutions to mitigate climate change. The robustness of these responses across other radiative forcing scenarios (implemented for more SSPs, for example) and across other models needs to be studied and is a subject worthy of exploration in a future paper.

5. Usage notes

In this paper we present a new dataset of grid-cell level, spatially-explicit carbon data harmonized with Moirai land data types.
Our harmonized dataset presents carbon values for 3 pools (topsoil, aboveground biomass and belowground biomass) for six statistical states for various land use types. Our dataset is available both at a 5 arcmin resolution and aggregated to 699 land regions. This dataset is designed to enable initialization of spatially explicit carbon in IAMs and MSD models, and we provide an example by applying it to GCAM. In the future, this dataset can be extended to include deeper soil (beyond 0–30 cm) so that land use responses in models can account for an additional deep soil carbon pool. We note, however, that if deeper soil carbon layers are to be added, regional and global models must also improve their respective carbon dynamics beyond simple bookkeeping approaches to include detailed accounting of environmental conditions. We noted that there are some limitations with respect to the carbon observations (both for soil and vegetation) for tundra. For example, no data were found for 29% of the 5 arcmin gridcells for this land type. The biome mapping also needed to include several source land types to enable an increase in data coverage for tundra. This issue was likely caused by the different definitions of tundra land cover in different datasets. Recently, there have been efforts dedicated to collecting carbon data specifically for this land type. These data should be integrated in future releases of our data to address the current lack of data coverage. As a part of our analysis, we observed that SoilGrids soil carbon values for cropland do not show a depletion when compared to SoilGrids soil carbon values in unmanaged land. As discussed, this is likely due to the locations of sampling for cropland soil carbon. As a result, we reduced cropland soil carbon by 30% when we applied it to GCAM.
This is in line with similar estimates of loss of soil carbon through agricultural practices and land cover conversion from unmanaged land types to cropland (Cooper et al., 2021; Wei et al., 2014). If improved data on crop soil carbon become available, our data could be updated accordingly. Users should also consider percentage adjustments to crop soil carbon contents based on local cropping intensities. We have also noted that our current estimates of forest vegetation carbon are based on both primary and secondary forests. This is due to the lack of availability of fine-resolution (300 m) land masks that distinguish between primary and secondary forests. As more data become available related to forest cover types, a logical next step would be to break out different forest types in our dataset. Finally, our analysis showed that using the Q3 statistical state was most appropriate for GCAM even though it resulted in an initialization of pre-industrial carbon that was lower than other estimates. Selection of the Q3 results in more accurate historical LUC emissions and the model therefore spins up to a value that is close to other estimates in the literature in 2015. The data are derived for several statistical states (with the Q3 being one of them) for a data-derived set of land types that are not model-specific. The resulting dataset is then applied to GCAM as an example. This example for GCAM shows how we analyzed the data to select the appropriate value for GCAM, which includes analysis of the statistical range of options across the spatial distribution. Specific data uncertainty analyses have already been performed on the source data by their creators. As with any dataset, a user must determine how to use it for their particular application and model, and perform the required processing.
Although we use data from 2010, we emphasize once more that we use contemporary data to extract statistical states from which a potential carbon value can be selected for calibrating the model. More specifically, the Q3 value selected from the 2010 values is used as a potential carbon density that the model builds towards, and it is used to spin up the model as seen in our results. We also note that the year 2010 is selected since the SoilGrids and Spawn et al. data are based on land masks for that year. The use of more contemporary data may not affect this analysis in any significant way (unless the overall distribution of carbon is altered).

Declarations

Code availability statement
As mentioned above, the data can be generated programmatically with scripts that are hosted on GitHub (https://github.com/JGCRI/moirai/tree/master/ancillary/carbon_harmonization). The process is split into two stages, where the computationally intensive stage 1 (approx. 6 hours of processing) is optional, with its outputs made available in the repository. Stage 1 processing is performed using bash scripts that use the GDAL software (Warmerdam, 2008). Stage 2 processing uses an R script and can be completed for all carbon pools in approx. 15 minutes to generate the final 72 rasters and the final tabular output file. We have also made available optional diagnostic functions in the R script which can be used to validate results.

Competing Interests declaration
The authors have declared that none of the authors has any competing interests.

Author contributions
K.B.N. and A.D.V. conceived the concept of this paper. K.B.N., A.D.V. and E.M. produced the data from the raw input files and also made code changes to the moirai land data system. S.S.L. and H.G. produced the vegetation carbon data required by this study and also provided inputs on data interpretation. K.B.N. and A.D.V. wrote the manuscript with input from all authors.

Acknowledgements
This research was supported by the U.S.
Department of Energy, Office of Science, as part of research in Multi Sector Dynamics, Earth and Environmental System Modeling Program. The Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830.

References
Barnes WL, Xiong X, Salomonson VV (2003) Status of Terra MODIS and Aqua MODIS. Adv Space Res 32(11):2099–2106
Batjes NH, Ribeiro E, Van Oostrum A, Leenaars J, Hengl T, Mendes de Jesus J (2017) WoSIS: providing standardised soil profile data for the world. Earth Syst Sci Data 9(1):1–14
Calvin K, Patel P, Clarke L, Asrar G, Bond-Lamberty B, Cui RY, Di Vittorio A, Dorheim K, Edmonds J, Hartin C (2019) GCAM v5.1: representing the linkages between energy, water, land, climate, and economic systems. Geosci Model Dev 12(2):677–698
Cooper H, Sjögersten S, Lark R, Mooney S (2021) To till or not to till in a temperate ecosystem? Implications for climate change mitigation. Environ Res Lett 16(5):054022
Di Vittorio AV, Vernon CR, Shu S (2020) Moirai version 3: a data processing system to generate recent historical land inputs for global modeling applications at various scales. J Open Res Softw 8 (PNNL-SA-142149)
Erb K-H, Kastner T, Plutzar C, Bais ALS, Carvalhais N, Fetzel T, Gingrich S, Haberl H, Lauk C, Niedertscheider M (2018) Unexpectedly large impact of forest management and grazing on global vegetation biomass. Nature 553(7686):73–76
Fang Y, Liu C, Huang M, Li H, Leung LR (2014) Steady state estimation of soil organic carbon using satellite-derived canopy leaf area index. J Adv Model Earth Syst 6(4):1049–1064
Friedlingstein P, Jones MW, O'Sullivan M, Andrew RM, Bakker DC, Hauck J, Le Quéré C, Peters GP, Peters W, Pongratz J (2022) Global carbon budget 2021. Earth Syst Sci Data 14(4):1917–2005
Hengl T, de Jesus JM, MacMillan RA, Batjes NH, Heuvelink GB, Ribeiro E, Samuel-Rosa A, Kempen B, Leenaars JG, Walsh MG (2014) SoilGrids1km—global soil information based on automated mapping. PLoS ONE 9(8):e105992
Houghton R (2005) Aboveground forest biomass and the global carbon balance. Glob Change Biol 11(6):945–958
Houghton RA (1999) The annual net flux of carbon to the atmosphere from changes in land use 1850–1990. Tellus B 51(2):298–313
Hugelius G, Strauss J, Zubrzycki S, Harden JW, Schuur E, Ping C-L, Schirrmeister L, Grosse G, Michaelson GJ, Koven CD (2014) Estimated stocks of circumpolar permafrost carbon with quantified uncertainty ranges and identified data gaps. Biogeosciences 11(23):6573–6593
Jackson RB, Lajtha K, Crow SE, Hugelius G, Kramer MG, Piñeiro G (2017) The ecology of soil carbon: pools, vulnerabilities, and biotic and abiotic controls. Annu Rev Ecol Evol Syst 48(1):419–445
Jungkunst HF, Göpel J, Horvath T, Ott S, Brunn M (2022) Global soil organic carbon–climate interactions: Why scales matter. Wiley Interdisciplinary Reviews: Clim Change, e780
Justice C, Townshend J, Vermote E, Masuoka E, Wolfe R, Saleous N, Roy D, Morisette J (2002) An overview of MODIS Land data processing and product status. Remote Sens Environ 83(1–2):3–15
Klein Goldewijk K, Beusen A, Doelman J, Stehfest E (2017) Anthropogenic land use estimates for the Holocene–HYDE 3.2. Earth Syst Sci Data 9(2):927–953
Li W, MacBean N, Ciais P, Defourny P, Lamarche C, Bontemps S, Houghton RA, Peng S (2018) Gross and net land cover changes in the main plant functional types derived from the annual ESA CCI land cover maps (1992–2015). Earth Syst Sci Data 10(1):219–234
Liu X, Yu L, Si Y, Zhang C, Lu H, Yu C, Gong P (2018) Identifying patterns and hotspots of global land cover transitions using the ESA CCI Land Cover dataset. Remote Sens Lett 9(10):972–981
Meiyappan P, Jain AK (2012) Three distinct global estimates of historical land-cover change and land-use conversions for over 200 years. Front Earth Sci 6(2):122–139
Nachtergaele F, van Velthuizen H, Verelst L, Batjes N, Dijkshoorn K, van Engelen V, Fischer G, Jones A, Montanarella L (2010) The harmonized world soil database. Proceedings of the 19th World Congress of Soil Science, Soil Solutions for a Changing World, Brisbane, Australia, 1–6 August 2010
Narayan KB, Di Vittorio AV, Margiotta E, Spawn SA, Gibbs HK (2023) Spatially explicit re-harmonized terrestrial carbon stocks for calibrating Integrated Multisectoral Models (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7884615
Poggio L, De Sousa LM, Batjes NH, Heuvelink G, Kempen B, Ribeiro E, Rossiter D (2021) SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. Soil 7(1):217–240
Pongratz J, Schwingshackl C, Bultan S, Obermeier W, Havermann F, Guo S (2021) Land use effects on climate: current state, recent progress, and emerging topics. Curr Clim Change Rep, 1–22
Popp A, Calvin K, Fujimori S, Havlik P, Humpenöder F, Stehfest E, Bodirsky BL, Dietrich JP, Doelmann JC, Gusti M (2017) Land-use futures in the shared socio-economic pathways. Glob Environ Change 42:331–345
Ramankutty N, Foley JA (1999) Estimating historical changes in land cover: North American croplands from 1850 to 1992: GCTE/LUCC RESEARCH ARTICLE. Glob Ecol Biogeogr 8(5):381–396
Sanderman J, Hengl T, Fiske GJ (2017) Soil carbon debt of 12,000 years of human land use. Proceedings of the National Academy of Sciences 114(36):9575–9580
Scharlemann JP, Tanner EV, Hiederer R, Kapos V (2014) Global soil carbon: understanding and managing the largest terrestrial carbon pool. Carbon Manag 5(1):81–91
Spawn SA, Sullivan CC, Lark TJ, Gibbs HK (2020) Harmonized global maps of above and belowground biomass carbon density in the year 2010. Sci Data 7(1):1–22
Thomson AM, Calvin KV, Chini LP, Hurtt G, Edmonds JA, Bond-Lamberty B, Frolking S, Wise MA, Janetos AC (2010) Climate mitigation and the future of tropical landscapes. Proceedings of the National Academy of Sciences 107(46):19633–19638
Tifafi M, Guenet B, Hatté C (2018) Large differences in global and regional total soil carbon stock estimates based on SoilGrids, HWSD, and NCSCD: Intercomparison and evaluation based on field data from USA, England, Wales, and France. Glob Biogeochem Cycles 32(1):42–56
van Asselen S, Verburg PH (2012) A Land System representation for global assessments and land-use modeling. Glob Change Biol 18(10):3125–3148
Walker WS, Gorelik SR, Cook-Patton SC, Baccini A, Farina MK, Solvik KK, Ellis PW, Sanderman J, Houghton RA, Leavitt SM (2022) The global potential for increased storage of carbon on land. Proceedings of the National Academy of Sciences 119(23):e2111312119
Warmerdam F (2008) The Geospatial Data Abstraction Library. In: Hall GB, Leahy MG (eds) Open source approaches in spatial data handling. Springer, Berlin, Heidelberg, pp 87–104
Wei X, Shao M, Gale W, Li L (2014) Global pattern of soil carbon losses due to the conversion of forests to agricultural land. Sci Rep 4(1):1–6
Wieder WR, Boehnert J, Bonan GB (2014) Evaluating soil biogeochemistry parameterizations in Earth system models with observations. Glob Biogeochem Cycles 28(3):211–222
Wise M, Calvin K, Thomson A, Clarke L, Bond-Lamberty B, Sands R, Smith SJ, Janetos A, Edmonds J (2009) Implications of limiting CO2 concentrations for land use and energy. Science 324(5931):1183–1186

Tables
Table 2 is available in the Supplementary Files section.

Supplemental Table 1
SI Table 1 is not available with this version.

Additional Declarations
The authors declare no competing interests.
Supplementary Files
Table2.png
Table 2: Prioritization matrix to match ESA land cover with moirai land types
{"props":{"pageProps":{"initialData":{"identity":"rs-6123546","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":421958729,"identity":"e0c55967-2e6a-40fd-8ccd-f9a9fccd00bf","order_by":0,"name":"Kanishka B 
Narayan","email":"","orcid":"","institution":"Pacific Northwest National Lab","correspondingAuthor":true,"prefix":"","firstName":"Kanishka","middleName":"B","lastName":"Narayan","suffix":""},{"id":421958730,"identity":"a6ad5911-b216-49b7-b229-3c93da42ecfd","order_by":1,"name":"Alan V. Di Vittorio","email":"","orcid":"","institution":"Lawrence Berkeley National Lab (LBNL), Berkeley, CA, USA","correspondingAuthor":false,"prefix":"","firstName":"Alan","middleName":"V. Di","lastName":"Vittorio","suffix":""},{"id":421958731,"identity":"1d651083-8553-495f-85f5-47c51b8f9f78","order_by":2,"name":"Evan Margiotta","email":"","orcid":"","institution":"Pacific Northwest National Lab","correspondingAuthor":false,"prefix":"","firstName":"Evan","middleName":"","lastName":"Margiotta","suffix":""},{"id":421958732,"identity":"b038f838-7155-468a-a4ff-dd20b91ff8ed","order_by":3,"name":"Seth Spawn-Lee","email":"","orcid":"","institution":"Department of Geography, University of Wisconsin-Madison, Madison, WI, USA","correspondingAuthor":false,"prefix":"","firstName":"Seth","middleName":"","lastName":"Spawn-Lee","suffix":""},{"id":421958733,"identity":"0b7dd8af-0e98-43ec-ad48-169727eecbc3","order_by":4,"name":"Holly K. 
Gibbs","email":"","orcid":"","institution":"Department of Geography, University of Wisconsin-Madison, Madison, WI, USA","correspondingAuthor":false,"prefix":"","firstName":"Holly","middleName":"K.","lastName":"Gibbs","suffix":""}],"badges":[],"createdAt":"2025-02-27 19:26:44","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-6123546/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6123546/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41597-025-04723-4","type":"published","date":"2025-04-16T00:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":78253354,"identity":"61aa0057-0ab0-4776-93f9-6adf22319d4c","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":254629,"visible":true,"origin":"","legend":"\u003cp\u003eDescription of data processing implementation to generate carbon datasets\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/9e378d9c5c63b917ba7902f6.png"},{"id":78253347,"identity":"4f7e3835-87cc-4636-9e93-a6cb55ec13fb","added_by":"auto","created_at":"2025-03-11 10:17:59","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":357563,"visible":true,"origin":"","legend":"\u003cp\u003eSoil carbon for each 5 arcmin grid cell in MgC/ha. 
Values are shown for two statistical states, namely the Q1 and the Q3.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/fde94ba10615d92ea5671d07.png"},{"id":78253349,"identity":"846b4eae-52e5-41bd-a641-ebbd5653055e","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":394000,"visible":true,"origin":"","legend":"\u003cp\u003eVeg carbon (aboveground) in MgC /ha across 5 arcmin grid cells for aggregate land types for the Q3 state. Values are shown for two statistical states, namely the Q1 and the Q3\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/34136f2646b6c735e6a4a35b.png"},{"id":78256188,"identity":"4b22bbd2-b89b-4316-8528-59ee7176babd","added_by":"auto","created_at":"2025-03-11 10:50:00","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":119276,"visible":true,"origin":"","legend":"\u003cp\u003eWithin basin distributions of soil carbon in MgC/ha for the Amazon basin. Each facet shows a distribution for a land type. The final basin level statistical states are shown as dots with the Q3 state shown as the orange line.\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/ac1dc2b2e23fa192b48c8fea.png"},{"id":78253358,"identity":"74857912-5bd5-4715-9162-13577f9be8d3","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":104685,"visible":true,"origin":"","legend":"\u003cp\u003eWithin basin distributions of aboveground biomass carbon in MgC/ha for the Amazon basin. Each facet shows a distribution for a land type. 
The final basin level statistical states are shown as dots with the Q3 state shown as the orange line.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/6239e778e26d2a60d0328992.png"},{"id":78254767,"identity":"5c736046-438c-4def-81f8-b651673b760d","added_by":"auto","created_at":"2025-03-11 10:34:00","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":466515,"visible":true,"origin":"","legend":"\u003cp\u003eA.) Topsoil (0-30 cms) carbon in MgC/ha for 5 arcmin pixels using moirai 90\u003csup\u003eth\u003c/sup\u003e percentile B.) Top soil (0-30cms) carbon in MgC/ha from Sanderman et al. assuming a no land use condition. C.) Histogram showing percent error between A and B. Dark blue dashed line represents mean error across all pixels which is at -27%.\u0026nbsp; \u0026nbsp;\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/bc4e9a0280f721eb22bf2b72.png"},{"id":78254448,"identity":"75698e33-46bf-437d-a7b7-5a120cfb48f5","added_by":"auto","created_at":"2025-03-11 10:26:00","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":503708,"visible":true,"origin":"","legend":"\u003cp\u003eA.) Vegetation (aboveground) carbon in MgC/ha for 5 arcmin pixels using moirai 90\u003csup\u003eth\u003c/sup\u003e percentile B.) Vegetation (aboveground) carbon in MgC/ha from Walker et al. constrained for initial land use C.) Histogram showing percent error between A and B. 
Dark blue dashed line represents mean error across all pixels which is at -15%.\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/29a08fc9bcdcbc3b2bbda22b.png"},{"id":78254454,"identity":"2d74b455-5790-4bec-8295-906fc209d314","added_by":"auto","created_at":"2025-03-11 10:26:00","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":145515,"visible":true,"origin":"","legend":"\u003cp\u003eThis study’s soil carbon densities by land type across basins vs. other contemporary global estimates.\u003c/p\u003e","description":"","filename":"image11.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/28d83d91411286f20e8560af.png"},{"id":78254455,"identity":"13a44e8f-2f3d-42c5-8b89-acb6033e0513","added_by":"auto","created_at":"2025-03-11 10:26:00","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":176091,"visible":true,"origin":"","legend":"\u003cp\u003eThis study’s vegetation carbon densities by land type across basins vs. other contemporary estimates.\u003c/p\u003e","description":"","filename":"image12.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/c538c8a66dbb5b00d1cf99cb.png"},{"id":78253372,"identity":"684ea642-1cb9-4c47-a3c0-3811084924a5","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":341277,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of unmanaged Q3 carbon stocks and Cropland carbon stocks for A.) soil carbon and B.) aboveground biomass. 
Values here are aggregated to individual countries\u003c/p\u003e","description":"","filename":"image13.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/054fc5a64c0f9bb96aab1925.png"},{"id":78255957,"identity":"43e01fe8-6f2a-4650-9ae4-4614cd7d1bc8","added_by":"auto","created_at":"2025-03-11 10:42:01","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":559260,"visible":true,"origin":"","legend":"\u003cp\u003eQCD values for topsoil carbon across basins and land types\u003c/p\u003e","description":"","filename":"image14.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/495f8285d9590bfec308d1ee.png"},{"id":78253359,"identity":"9af69ef4-4c85-4e5b-9689-749c47a88552","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":541258,"visible":true,"origin":"","legend":"\u003cp\u003eQCD values for aboveground vegetation across basins and land type\u003c/p\u003e","description":"","filename":"image15.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/0e448e1b74c05d60aa9693f1.png"},{"id":78253351,"identity":"80076e7a-9ef4-4c60-b490-42435762a4d0","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":98827,"visible":true,"origin":"","legend":"\u003cp\u003eDescriptions of the results of the spin up process. 
Global vegetation carbon during spin up (1700-2015) and the SSP1 2p6 climate forcing scenario (2016-2100) for our initialization options.\u003c/p\u003e","description":"","filename":"image16.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/95ed3dd702239bc66d6d7a71.png"},{"id":78253361,"identity":"5f5fdf50-9578-44f3-aa15-1b0b44170324","added_by":"auto","created_at":"2025-03-11 10:18:00","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":39951,"visible":true,"origin":"","legend":"\u003cp\u003eAnnual Global LUC emissions from GCP 2021 and our two initialization options\u003c/p\u003e","description":"","filename":"image17.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/29fb937d0f36c071b1d0642f.png"},{"id":78255954,"identity":"acac4142-4ec2-4ced-876d-def11a0d656b","added_by":"auto","created_at":"2025-03-11 10:42:00","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":28335,"visible":true,"origin":"","legend":"\u003cp\u003eGlobal land allocation in GCAM under the SSP1 2p6 scenario by land type\u003c/p\u003e","description":"","filename":"image18.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/2e20dc0741a0050e57b32106.png"},{"id":80912461,"identity":"e7971792-d960-4094-86a6-10a46fe75a96","added_by":"auto","created_at":"2025-04-18 16:38:01","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5080541,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/f051a391-580b-4d15-b9f1-f2f44934a7bd.pdf"},{"id":78253348,"identity":"cdb8bd11-2f05-478c-82b2-e22501ff79b8","added_by":"auto","created_at":"2025-03-11 
10:17:59","extension":"png","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":100052,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eTable 2: Prioritization matrix to match ESA land cover with moirai land types\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Table2.png","url":"https://assets-eu.researchsquare.com/files/rs-6123546/v1/94d452958ef71dad97fd9539.png"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eSpatially explicit terrestrial carbon densities for calibrating the carbon cycle in human-Earth system Models\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"1. Background \u0026 Summary","content":"\u003cp\u003eGlobal and regional models such as Multisector dynamics models (MSDs) and Integrated human-Earth system (IHES) models are routinely used to assess alternative socio-economic, land use and energy transition pathways\u003csup\u003e1\u003c/sup\u003e. MSD and IHES models are in the same family of models but have one key difference. MSD models tend to be more economic in nature and lacking in the representation of biophysical processes (e.g. agriculture land use is well represented but the nitrogen cycle is not). IHES models generally are more representative of these biophysical processes through coupling the MSD model with finer resolution land models. As a simple example, GCAM is an MSD model (Calvin et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2019\u003c/span\u003e, Binsted et al. 2021) while GCAM when coupled with a detailed land model like GLM (under the GCAM-E3SM framework) represents an IHES model (Calvin et al. 2018). These models also examine interactions between natural systems (e.g. land, water systems) and human systems (food and energy demand). 
Soil and vegetation carbon densities play a critical role in these models by influencing the productivity of and profitability from land types (e.g., forest yields, pasture yields, and crop yields) and land use change emissions. Moreover, these densities affect land use change pathways under climate forcing scenarios and low carbon transition scenarios implemented in these models (Thomson et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2010\u003c/span\u003e; Wise et al., \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2009\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIHES, MSD models and economic models generally need to be calibrated with specific carbon densities to initialize the carbon cycle since these models cannot simulate carbon densities through use of spin ups similar to process based earth system models (e.g. Community Land Model or CLM). Note that we define the term \u0026ldquo;densities\u0026rdquo; as the stock of carbon denominated in MgC/ha, which has already been normalized to bulk density. This term is distinct from the bulk density which is a volumetric term. We also note that data on carbon densities are also useful for models other than the above-mentioned IHES and MSD models. As an example, the Global Trade Analysis Project (GTAP), which uses a CGE framework, is calibrated with data from the FAO HWSD compiled by Gibbs, Yui et al. (2014), aggregated to the GTAP AEZ boundaries (Aguiar et al. 2022). Land use focused models such as the Global Biosphere Management Model (GLOBIOM) also make use of emissions factors from the IPCC in addition to fine resolution data on carbon stocks to estimate carbon emissions from land use change (Frank et al. 2024). \u003cb\u003eSI\u003c/b\u003e Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the carbon inputs used by different models for initializing the carbon cycle for both soil and vegetation carbon. 
Another challenge is that these models need to be initialized with densities that represent long term potential maximum carbon values since these values are used to spin up the model in historical years. Thus, carbon densities are used in these models to spin up the historical carbon cycle (e.g. 1700\u0026ndash;2015) and model future land use change emissions (e.g. 2016\u0026ndash;2100). There have been efforts to reconstruct long term potential carbon densities for soil and vegetation which can inform such calibration efforts. These studies have found that the long term potential carbon densities are much higher than the contemporary values due to ongoing land use and cover change (Erb et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Walker et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Moreover, studies have also highlighted difficulties in estimating long term potential carbon densities, since these estimations require long spin up periods themselves (Fang et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2014\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo address the above issues, models currently make use of carbon density data that are differentiated by land type but are not always spatially explicit. These carbon densities are often representative of undisturbed land, and thus represent a long-term potential maximum. For example, models have previously used estimates of carbon values on undisturbed land from Houghton et al. (Houghton, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e1999\u003c/span\u003e) and the IPCC (Jackson et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2017\u003c/span\u003e), among others for initialization of the carbon cycle. 
However, because these data are not spatially explicit or differentiated, they often lead to over- and underestimation of carbon sequestration potentials, especially in future land use scenarios.\u003c/p\u003e \u003cp\u003eRecently, spatially explicit contemporary data on soil and vegetation carbon have become available. For example, soil carbon density data have been made available by the FAO (Nachtergaele et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2010\u003c/span\u003e) at a 1 km resolution and at 250 m resolution by the SoilGrids team at the International Soil Reference and Information Centre (ISRIC) (Batjes et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Hengl et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Use of spatially distinct, fine resolution data such as these has the potential to significantly improve results from global and regional models by better capturing the geographies of soil and vegetation carbon stocks (Jungkunst et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Even though these data represent contemporary carbon, they can be used to derive potential maximum carbon values that are spatially explicit. However, these carbon data need to be transformed significantly to be used in a robust manner by regional and global models. This is because each of these fine resolution datasets utilizes its own assumptions of land use and land cover which may be distinct from the land use and land cover definitions used by the models in question. 
For example, many of these fine resolution data use land cover definitions from the European Space Agency Climate Change Initiative (ESA CCI) dataset (Li et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Liu et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) while models may use land use definitions from the Historical Database of the Global Environment (HYDE) dataset (Klein Goldewijk et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) and/or land cover definitions from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Barnes et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2003\u003c/span\u003e; Justice et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2002\u003c/span\u003e). Resolution mismatch between data and models provides an additional challenge. The new, spatially distinct carbon densities are available at a very fine resolution (250 m / 300 m) while models are often configured to use coarser data that better match their working resolution. For example, consistent land datasets that have frequently been used for climate modelling are available at a resolution of 5 arcmin (i.e. ~10 km at the equator) (van Asselen \u0026amp; Verburg, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2012\u003c/span\u003e), and many regional models operate on land units defined by geopolitical and/or geophysical boundaries (see descriptions above for GTAP and GLOBIOM) (Aguiar et al. 2022, Frank et al. 2024). 
Given the difference in resolutions, and the above-mentioned differences in land classifications, a harmonization method is required to appropriately match the fine resolution carbon data with the appropriate land uses and land cover types within a model (as a simple example, a method is required to assign forest carbon correctly to the forest portion of pixels and grass carbon to the grass portion of pixels, and to ensure that this occurs across the three pools: soil carbon, above ground biomass and below ground biomass). To our knowledge, there is no custom dataset or consistent method that is representative of spatially explicit carbon which can be used to calibrate the carbon cycle in the above mentioned global and regional models.\u003c/p\u003e \u003cp\u003eTo address the above limitations, we prepare and present a harmonized dataset of fine resolution organic carbon densities for soil and vegetation biomass to initialize the carbon cycle in IHES and MSD models. The soil data are based on the 250 m-resolution SoilGrids dataset and represent a depth of 0\u0026ndash;30 cm (topsoil carbon) (Hengl et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). While multiple depths of soil carbon data are available (e.g. 30\u0026ndash;200 cm), we use the depth of 0\u0026ndash;30 cm which refers to topsoil carbon since most regional models only make use of carbon stock data at this depth. We note that our programmatic method can produce soil carbon data at different depths if required. These original soil carbon data from the SoilGrids dataset are denominated in MgC of soil carbon per hectare and are derived from several soil properties including the bulk density, clay content, sand and silt content. The aboveground and below ground carbon data are based on the 300 m-resolution Spawn et al. biomass dataset (Spawn et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
The harmonization process associates spatially explicit carbon densities with specific land types to avoid errors due to mismatches in land type distributions between carbon data and models. For example, the carbon data in a given pixel may be associated with forest while the model treats this pixel as grassland. The harmonized data associate pixel-level carbon with its appropriate land type such that it can be aggregated appropriately to model land types and grids. As a simple example, forest carbon is assigned to the forest portion of pixels, grass carbon is assigned to the grass portion of pixels, and we ensure that this occurs across the three pools: soil carbon, aboveground biomass and belowground biomass.\u003c/p\u003e \u003cp\u003eWe implemented this carbon reharmonization programmatically in the \u003cem\u003eMoirai\u003c/em\u003e land data system (Di Vittorio et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), which can be used to update the data, validate the data (e.g. generating fine resolution and tabular data which can be compared to other sources), define alternate land unit boundaries (e.g., water basins or agro-ecological zones), and harmonize the source carbon data with a generic land type distribution at coarser resolution that is consistent with other land data. The carbon dataset is effectively harmonized with the current \u003cem\u003eMoirai\u003c/em\u003e land data system (Di Vittorio et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) land use and land cover definitions (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) that can be aggregated to model-specific land types. 
\u003cem\u003eMoirai\u003c/em\u003e is a software system that can generate tabular land use and land cover data for any year based on fine resolution datasets (see Section 2 for details).\u003c/p\u003e \u003cp\u003eTo demonstrate the application of the harmonized data, we use them to initialize the Global Change Analysis Model (GCAM). GCAM\u0026rsquo;s carbon cycle needs to be initialized via a spin up from 1700\u0026ndash;2015 during which land change is applied to the pre-industrial state to determine the distribution and carbon contents of land types in 2015. The model uses a bookkeeping approach to track carbon changes during spin up and during future simulations (Calvin et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). In this bookkeeping approach, the model begins in 1700 by tracking a total stock of carbon for each land type in each water basin for two pools (soil and vegetation). From 1700 onwards, the model calculates fluxes from this initial state based on historical and future land use. Fluxes are based on net land use change (e.g. change in cropland or change in forests). The net land use change is calculated from calibrated data before 2015 and is modelled based on economic profitability after 2015. Sigmoidal growth curves are used to track regrowth of vegetation and exponential decay functions are used to track gain and loss in soil carbon. Using the harmonized dataset described above, we derived data-driven, long-term potential carbon values for soil and vegetation for GCAM\u0026rsquo;s land units, which are defined by geopolitical and watershed boundaries. By analyzing the distribution of carbon values within each land type-watershed combination we found two potential options for initializing GCAM\u0026rsquo;s carbon cycle. 
The first is the Q3 state (3rd quartile of all 300 m pixels in a given land type in a given watershed), which represents a low carbon initialization (2144 PgC of global terrestrial carbon in 1700), and the second is the 90th percentile state (90th percentile of all pixels in a given land type in a given watershed), which represents a high carbon initialization (3028 PgC in 1700). We also calculated five additional data-driven statistical states: area-weighted average, minimum, maximum, median, and Q1. Users can calculate any percentile within the distribution of carbon both at a pixel level and at a land region/water-basin level for any land type (for example, the 95th percentile). Note that such a calculation would not be time intensive given that the six summary states are already available at multiple scales. This dataset can be effectively used to characterize uncertainty in carbon estimates in models such as GCAM.\u003c/p\u003e \u003cp\u003eFor validation, we also compared the Q3 and the 90th percentile carbon states in our dataset at a 5 arcmin resolution (which are intended to represent pre-industrial carbon states) with similar estimates of potential pre-industrial topsoil (0\u0026ndash;30 cm) carbon by grid cell from Sanderman et al. (Sanderman et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) and with similar estimates of vegetation carbon from Walker et al. (Walker et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) at the same resolution. We also perform global-level validation of our carbon data, recognizing that there is a high degree of uncertainty in carbon estimates from different datasets (Scharlemann et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Tifafi et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). 
We also implemented these data in GCAM and found that utilizing this new carbon dataset for the spin-up improved several responses in GCAM, especially under forcing scenarios where the value of terrestrial carbon is priced using a carbon tax.\u003c/p\u003e \u003cp\u003eThe harmonized data are available as rasters at 5 arcmin resolution because the \u003cem\u003eMoirai\u003c/em\u003e land data integration is performed at this resolution (Di Vittorio et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). We also present an easy-to-use tabular output summarizing the six carbon density states for each carbon pool for each land type within each of the 235 watersheds intersected with 207 country (ISO) boundaries that are modelled by GCAM.\u003c/p\u003e \u003cp\u003eThe final available dataset includes raster files for the different statistical states for each land use type (Cropland, Urban land, Pasture and Unmanaged land) and each carbon pool, bringing the total to 72 distinct raster files. Unmanaged land here refers to land that is currently not grazed or cropped or used as urban land, and is segregated into 15 different types. A thematic file labels each cell with the dominant biome for Unmanaged land (out of 15, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). We also present the tabulated text file with the six carbon state values for each land type and carbon pool aggregated to 699 land regions (235 water basins intersected with 207 country boundaries). Making the data available at these different resolutions should help facilitate effective multiscale modelling of terrestrial carbon.\u003c/p\u003e"},{"header":"2. 
Methods","content":"\u003cp\u003eOur carbon data processing method can be organized into three stages:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eStage 1- Resampling source datasets based on fine resolution land cover\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eStage 2- Re-mapping the carbon to Moirai land use and land cover\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eStage 3- Aggregating raster carbon data to basin boundaries\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 1 \u0026ndash; resampling source data\u003c/b\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThis stage combines the 250 m resolution organic soil carbon and 300 m vegetation carbon data (MgC/ha) with the 300 m resolution ESA CCI input land cover data corresponding with the carbon data. We resample both carbon datasets to match the ESA CCI 300 m grid before this stage. We use a simple GDAL resampling approach to align the 250m and 300m grids which makes use of a weighted average value for each land type.\u003c/p\u003e \u003cp\u003eWe first generate land cover masks (1\u0026thinsp;=\u0026thinsp;respective land type present, 0\u0026thinsp;=\u0026thinsp;otherwise) for each of 22 aggregated ESA CCI land cover types (\u003cb\u003eSI\u003c/b\u003e Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). We combine the land cover masks with the carbon data to create 66 rasters (22 land types X 3 carbon pools), each representing a carbon data mask for an ESA land type. 
The resulting rasters are calculated as follows:\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Equ1\" class=\"Equation\"\u003e \u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{Carbon\\_LT\\_300m}_{pool,j,LT}={Carbon\\_300m}_{pool,j\\:}*{LT\\_mask\\_300m}_{j}$$\u003c/div\u003e \u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWhere,\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003ej is the index of a 300m grid cell,\u003c/p\u003e \u003cp\u003epool is the carbon pool (soil, aboveground biomass, belowground biomass),\u003c/p\u003e \u003cp\u003eLT is the ESA land type.\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003eNext we use six distinct resampling methods to re-grid these data to a 5 arcmin resolution. Each method is applied to each of the land types and thus we derive 6 statistical states for each land type in each 5 arcmin grid cell. 
These aggregated rasters are calculated as follows:\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:{Carbon\\_LT\\_5arcmin}_{pool,i,state}=state\\left(\\begin{array}{cc}{Carbon\\_LT\\_300m}_{pool,j}\u0026amp;\\:{Carbon\\_LT\\_300m}_{pool,j+2}\\\\\\:{Carbon\\_LT\\_300m}_{pool,j+1}\u0026amp;\\:{Carbon\\_LT\\_300m}_{pool,j+n}\\end{array}\\right)\\:$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eWhere,\u003c/p\u003e \u003cp\u003ei is the index of a 5 arcmin grid cell,\u003c/p\u003e \u003cp\u003epool is the carbon pool (soil, aboveground biomass, belowground biomass),\u003c/p\u003e \u003cp\u003estate is the resampling method (weighted average, median, min, max, q1, q3),\u003c/p\u003e \u003cp\u003ej is the index of each 300 m grid cell within aggregated cell i,\u003c/p\u003e \u003cp\u003en is the total number of 300 m cells that are aggregated into cell i.\u003c/p\u003e \u003cp\u003eThus, we generate 366 (22 land cover types X 3 types of carbon X 6 states) layers of carbon that correspond to the aggregated ESA CCI land cover types. 
This processing is largely conducted through the GDAL software (Warmerdam, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2008\u003c/span\u003e) and implemented using bash scripts.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 2 \u0026ndash; remapping the carbon data to Moirai land use/cover\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eHarmonization of ESA land cover with Moirai land cover at 5 arcmin using a prioritization matrix\u003c/b\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eNext, the 366 layers described above are aligned with the default initial \u003cem\u003eMoirai\u003c/em\u003e land use/cover for 2010 at a 5 arcmin resolution. These initial land use/cover data are based on land use data from the HYDE (Klein Goldewijk et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) database and a one-half degree land cover product (Meiyappan \u0026amp; Jain, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). \u003cem\u003eMoirai\u003c/em\u003e can generate land use and land cover maps for any year based on these datasets combined with a potential vegetation dataset from Ramankutty et al. (1999). The potential vegetation is that which would most likely exist now in the absence of human activities. The Moirai land use and land cover types are listed in Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. It is important to note that carbon values are independently assigned to each of the four Moirai land use types in each cell, and that the unmanaged land use type can be only one of the Moirai land cover types in each cell. Moirai is described in more detail in Di Vittorio et al. (Di Vittorio et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
Ultimately moirai generates land use and land cover data for 18 different land types, which are data-specific but not model specific, that can be aggregated to coarser land types required by models such as GCAM. When these data are implemented in regional models like GCAM, they are aggregated to coarser land types (e.g. 7 aggregate land types in GCAM).\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eland use, land cover types for Moirai/GCAM. Total of 4 land use types, 15 types of land cover tracked for Unmanaged land type\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLand use\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLand cover\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCropland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCropland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePasture\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePasture\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUrbanland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUrbanland\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"14\" rowspan=\"15\"\u003e \u003cp\u003eUnmanaged\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTropicalEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTropicalDeciduousForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTemperateBroadleafEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTemperateNeedleleafEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTemperateDeciduousForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBorealEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBorealDeciduousForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEvergreen/DeciduousMixedForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSavanna\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGrassland/Steppe\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDenseShrubland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOpenShrubland\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eTundra\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDesert\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePolardesert/Rock/Ice\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eThe carbon for each Moirai land type in a cell needs to be selected from the 366 rasters generated in Stage 1 described above. We use a rule-based harmonization approach where we select the appropriate carbon values by matching the Moirai land type with the corresponding ESA land cover type (Table\u0026nbsp;2). We typically assign six possible ESA land cover types to each Moirai land type and rank them according to their similarity with the Moirai land type. This means that carbon values for a particular Moirai land type can come from any of these ESA land cover types, as long as they are present in a given cell. For example, a Tropical Evergreen Forest cell in Moirai may be assigned carbon values from the Evergreen_Combined, Mixed_Forests, Mosaic_Tree, Flood_Tree_Cover, Unknown_Tree_Cover, or Sparse_Treecover ESA land cover types. The similarity ranking both maximizes the number of Moirai land type assignments and ensures that the most appropriate carbon values are selected. The first ESA land cover in the ranked list that is present in each cell provides the carbon values for the corresponding Moirai land type in the same cell (Table\u0026nbsp;2). In the example above, the Evergreen_Combined carbon data would be chosen over all other ESA land covers if it existed in a given cell, and the Sparse_Treecover carbon data would be chosen only if it were the sole ESA land cover from the list that existed in a given cell. 
These prioritization rules are designed such that carbon data from one biome are not assigned to a different biome when reharmonizing and re-gridding the carbon. The ESA land cover selection is done once for each cell and Moirai land type, and then the data from the corresponding carbon pool and state rasters are assigned to the Moirai land type in the target cell. This results in 72 rasters that become input files for Moirai.\u003c/p\u003e \u003cp\u003eWe used expert judgement when developing the matrix to best represent the Moirai land types when selecting from the ESA land types. For certain land types we allow fewer than six choices (Table\u0026nbsp;2). For example, carbon for a Moirai Desert cell can only be chosen from a corresponding desert cell in the ESA masks. On the other hand, Moirai Tundra includes eight ESA land covers because ESA does not have an explicit Tundra class. The increased number of options aims to provide adequate data coverage for Tundra. Tundra data selection prioritizes polar desert/rock/ice pixels. The location of these pixels coincides with the Tundra land cover and they also represent pixels with high values for soil carbon densities. Furthermore, certain biome types that are not represented explicitly in Moirai or not modelled by GCAM receive low priority rankings. For example, flooded land types are never included as a first priority choice for any land type since Moirai does not explicitly include flooded land types. Conversely, the ESA land cover data do not include any explicit representation of pastures or rangeland. Our rules assign pasture carbon values based on proximate grassland or shrubland carbon values. Note that we treat pastures (grazed grassland) and unmanaged grasslands as separate land types. 
However, the carbon for pastures has to be imputed from the carbon for grasslands since both the vegetation and soil carbon data are based on the ESA land cover data, which do not differentiate between unmanaged grassland and pastures. SI Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e shows an illustrative example of how the hierarchical rules are applied for one land type (Tropical Evergreen Forests).\u003c/p\u003e \u003cp\u003e \u003cem\u003eTable\u0026nbsp;2: Prioritization matrix to match ESA land cover with Moirai land types\u003c/em\u003e \u003c/p\u003e \u003c/div\u003e \u003cp\u003e \u003cb\u003eImplementation of nearest neighbor algorithm to increase data coverage\u003c/b\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eAfter implementing the prioritization rules there remain 5 arcmin cells with no carbon data coverage for a given land type and carbon pool. This is expected since the land cover data used to generate the carbon masks (ESA CCI land cover data) may be different from the land cover data used in HYDE and SAGE. We therefore implement a nearest neighbor algorithm to interpolate data to each \u0026lsquo;no data\u0026rsquo; cell based on availability in 40 neighboring cells. This algorithm fills the target cell and land type with the corresponding carbon data of the closest cell with a matching land type. If no matches are found within the prescribed window, the target cell remains without data for that particular carbon pool and land type. Environmental and topographical criteria are not considered at this stage, but the source carbon data have included topographical characteristics in sampling their values.\u003c/p\u003e \u003cp\u003eCarbon data coverage after interpolation is reasonable except for a few land types. 
Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the data coverage by land type after implementation of the nearest neighbor algorithm. All but three land types have over 80% data coverage for soil and vegetation carbon. About 25% (soil carbon) to 29% (vegetation carbon) of Tundra and Polar desert cells remain without carbon data. This is likely a result of differences in the way Tundra land cover is defined by different datasets.\u003c/p\u003e \u003cp\u003eThere have been more recent efforts to collect soil carbon data specifically for the permafrost and Tundra regions such as that by Hugelius et al. (Hugelius et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). This suggests that a future area of work would be to incorporate these more detailed datasets into either the source data or our processing workflow. Along with Tundra and Polar deserts, over 20% of the Urban land cells do not have carbon data. This is once again likely due to the different definitions of Urban land cover in different datasets. Our data coverage suggests that there is more uncertainty in the Tundra, Polar desert, and Urban carbon values purely because of limited data availability. 
Recognizing and quantifying data availability by land type enables users to utilize their own judgement when using the carbon values for these land types.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDetails of NODATA cells after nearest neighbor interpolation\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLand type\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTotal 5arcmin grid cells\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eVegetation carbon Percentage unfound (NO DATA cells after interpolation)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSoil carbon Percentage unfound (NO DATA cells after interpolation)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePasture\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1195396\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2.3\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCropland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e952850\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrassland/Steppe\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e498404\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e14.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOpenShrubland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e274296\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDesert\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e195579\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTropicalEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e190780\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSavanna\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e173776\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBorealEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e148756\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePolardesert/rock/ice\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e132021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e24.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUrban\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e119597\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e22.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTemperateDeciduousForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e86922\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDenseShrubland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e78065\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e9.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTemperateNeedleleafEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e71600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBorealDeciduousForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e65824\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTropicalDeciduousForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e56377\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003e1.4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTundra\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e25000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e24.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTemperateBroadleafEvergreenForest/Woodland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e14395\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 3 \u0026ndash; Aggregating raster carbon data to 699 land regions\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAs a final step, we use the 72 rasters generated in Stage 2 as inputs to the Moirai land data system. Moirai integrates these data with other land data (e.g., protected area, agricultural suitability, and specific crop data) and aggregates all the data from the 5 arcmin grid cell level to 699 land regions. The 699 land regions are the intersection of 235 water basins and 207 countries and are shown as a map in \u003cb\u003eSI\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. GCAM uses water basin definitions from the Community Land Model (CLM) (Tesfa et al. 2014). The definition of the water basins is a user-specified feature in Moirai and can be changed to any desired boundary set. 
For example, an alternative set of boundaries based on agro-ecological zones used by GTAP is included with Moirai (and in the final data product).\u003c/p\u003e \u003cp\u003eThe final carbon state values for each land type are aggregated to each land region for each carbon pool (aboveground biomass, belowground biomass, soil 0\u0026ndash;30 cm). These outputs are available as a tabular text file. The Moirai land data system performs this aggregation using the same land masks for the year 2010 that are used in the Stage 2 processing. The basic aggregation performed by Moirai is summarized in Eq.\u0026nbsp;\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e3\u003c/span\u003e below\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:{Carbon\\_tabular}_{pool,GLU,state,LT}=state\\left({Carbon\\_5arcmin\\_LT}_{pool,j},\\:{Carbon\\_5arcmin\\_LT}_{pool,j+1},\\:{Carbon\\_5arcmin\\_LT}_{pool,j+2},\\:\\dots,\\:{Carbon\\_5arcmin\\_LT}_{pool,j+n}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eWhere,\u003c/p\u003e \u003cp\u003epool is the carbon pool (aboveground biomass, belowground biomass, topsoil (0\u0026ndash;30 cm)),\u003c/p\u003e \u003cp\u003estate is the aggregation method (area-weighted average, median, min, max, Q1, Q3),\u003c/p\u003e \u003cp\u003eGLU represents a land region which is an intersection of 207 country boundaries and 235 watershed boundaries,\u003c/p\u003e \u003cp\u003ej is the grid cell index for each 5 arcmin grid cell in a basin with land type LT,\u003c/p\u003e \u003cp\u003en is the total number of cells in a basin for a given land type,\u003c/p\u003e \u003cp\u003eand LT is the land type.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 4 \u0026ndash; Deriving any other percentile using our six statistical states\u003c/b\u003e \u003cdiv 
class=\"BlockQuote\"\u003e \u003cp\u003eUsing our six summary states, users can calculate any percentile for the carbon value in any pixel for each of our 19 land types and three carbon pools (soil, above ground biomass, below ground biomass). These values can also be calculated directly for a land region/water basin. The percentile values can be calculated assuming that carbon values are lognormally distributed (this is established in our analysis below; see section 4.1). The steps to calculate any percentile are as follows:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eCompute a mean value as the natural log of the median state. Since the distribution of carbon values is lognormal, the natural log of our median is an estimate of the mean of the logged values.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eCompute an estimated standard deviation using the natural log of Q3 and the mean value from step 1. Specifically, we use the formula (LN(Q3) \u0026minus; mean)/0.675, where mean is the LN(median) value from step 1 and 0.675 is the z-score of the 75th percentile of a standard normal distribution. Note that this formula is simple and assumes that the logged carbon values are normally distributed. However, the statistics available can be used to fit any distribution.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eEstimate the percentile value from the mean and standard deviation above. 
Since the logged distribution is normal, users can compute this value using a z table for a normal distribution.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eExponentiate the value from step 3 to return to carbon density units.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eConstrain this value to the max observed value in our dataset.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThis method enables a timely calculation of percentiles and is much faster than re-running the code to derive individual percentiles using re-sampling.\u003c/p\u003e"},{"header":"3. Data records","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eFinal data are available for download here: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://zenodo.org/records/13988220\u003c/span\u003e\u003cspan address=\"https://zenodo.org/records/13988220\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (Narayan et al., 2024). The data repository contains the following:\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e72 rasters (4 land use types X 6 states X 3 carbon pools) at a 5 arcmin resolution representative of carbon in 2010\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e1 thematic raster which tracks 15 vegetation biomes for Unmanaged land use type (from 1. 
above)\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTabular data file showing aggregated carbon stocks for 6 states of carbon for 699 land regions for soil (0\u0026ndash;30 cm), aboveground biomass and belowground biomass.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTabular data file showing aggregated carbon stocks for 6 states of carbon for GTAP AEZ for soil (0\u0026ndash;30 cm), aboveground biomass and belowground biomass.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eAll data files are available as binary raster files stored as .bil files, which can be opened in any GIS software (such as ArcGIS or QGIS) or read using programming languages (such as R, Python, C or C++).\u003c/p\u003e \u003cp\u003eWe do not release the intermediate data described in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, given their size (~\u0026thinsp;300 GB). These can be reproduced programmatically (see section 6 below) and saved if the user requires. However, we recommend that users regenerate these data only selectively, given their size.\u003c/p\u003e"},{"header":"4. Technical Validation","content":"\u003cp\u003eIn this section we present the technical validation of our dataset. We begin by exploring our main data products. In section \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cstrong\u003eA\u003c/strong\u003e, we show how we select different carbon initializations for GCAM from our dataset. In section \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cstrong\u003eB\u003c/strong\u003e, we validate our initialization densities at the pixel level against similar estimates in the literature. 
In section \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cstrong\u003eC\u003c/strong\u003e, we compare the global and regional carbon densities with similar estimates in the literature. In section \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cstrong\u003eD\u003c/strong\u003e, we explore spatial uncertainties in our dataset by comparing carbon values across biomes. Finally, in section \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cstrong\u003eE\u003c/strong\u003e, we use our dataset to explore alternative scenarios in GCAM with low and high carbon initializations.\u003c/p\u003e\n\u003cp\u003eWe first evaluate our main data products, namely the maps of soil and vegetation carbon across gridcells by land type (e.g., Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e and Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e), with the goal of identifying the most appropriate carbon state for GCAM modeling, and then take a closer look at data uncertainty and spatial variability. Note that the authors of the source data on soil (Hengl et al. \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e) and vegetation (Spawn et al.) performed a detailed spatial validation of the data in their respective papers. Our validation focuses on uncertainties that have been introduced through our re-harmonization process. We also compare our Q3 and 90th percentile (determined as described above) estimates with similar estimates from the literature, since these estimates will be used to initialize GCAM.\u003c/p\u003e\n\u003ch3\u003e4A Selecting potential carbon states for initializing GCAM\u003c/h3\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eThis dataset provides several carbon levels to choose from because different models have different data needs. GCAM requires a potential maximum carbon state that represents mature ecosystems that have not been affected by land use. 
This state is used both for the pre-industrial initialization in 1700 and for the asymptotic parameters of the vegetation growth and soil carbon accumulation curves. The pre-industrial carbon state has been estimated to be much higher than the contemporary carbon stored in land (Erb et al., \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e) due to a long history of land use. Various studies have highlighted the difficulties in calculating the long-term potential maximum (Fang et al., \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e), and our statistical aggregation method enables a systematic approach to selecting a data-driven maximum value that we can use to initialize GCAM. The provided statistical states and the opportunity to calculate intermediate ones also enable systematic selection of other carbon levels corresponding to other models.\u003c/p\u003e\n \u003cp\u003eTo select potential pre-industrial equilibrium states, we compared the frequency distributions of carbon by pool within each land region for each land type with the final statistical states calculated. The frequency distributions represent a heterogeneous landscape at different stages of growth and management. The average or median values may be representative of the contemporary landscape, but not of an undisturbed landscape that has been allowed to equilibrate its carbon stocks. The maximum value in a land region may be an extreme outlier and likewise would not be representative of the undisturbed landscape. Our goal then is to find a value in between the contemporary average and the maximum that is representative of a long-term potential maximum value. Fortunately, most distributions of soil carbon follow a log-normal shape with a long tail.\u003c/p\u003e\n \u003cp\u003eWe present the distributions of soil carbon in the Amazon basin (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e) for different land types as an example. 
One option for GCAM initialization is the Q3 statistical state. The soil Q3 values fall between the average and the maximum, as expected. Given the lognormal shape, the observations above the Q3 value are infrequent and can stretch to extremely high values. Most vegetation carbon distributions also follow a log-normal shape within each basin for each land type, but forests have distributions that are more bimodal (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). Nonetheless, the Q3 state provides carbon estimates that are reasonably higher than the contemporary average or median value.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"gridtable\"\u003e\n \u003cdiv align=\"left\" class=\"colspec\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eInitial potential terrestrial carbon stock calculated from different sources.\u003c/p\u003e\n \u003cdiv class=\"Credit\"\u003e\n \u003cp\u003eSources from moirai are calculated using land maps in 1700. Sanderman et al. represents a carbon stock in 1800 given no land use. Walker and Erb et al. 
are based on potential vegetation carbon estimations.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eData source\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTopsoil (0\u0026ndash;30 cms) carbon in PgC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003evegetation (above +\u0026thinsp;below ground biomass) in PgC\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eErb et al 2019\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e916\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003emoirai (Q3)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1553\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e591.7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003emoirai (90th percentile)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2063\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e966\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWalker 2022\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e795\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSanderman et al. 
\u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2119\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHoughton \u003cspan class=\"CitationRef\"\u003e1999\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1462\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e662\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe selected two carbon states, Q3 and 90th percentile, to compare with published pre-industrial carbon estimates to inform a final selection for GCAM initialization. Using the Q3 values sets the initial global carbon stock in the year 1700 to 2144 PgC (1553 PgC of carbon in top soil and 591 PgC of vegetation biomass). This estimate is on the lower end of other similar estimates (Table \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e). We also use the estimated 90th percentile state in order to represent a higher initialization of carbon in 1700. This 90th percentile is estimated from our six summary states using the methodology outlined in section 2.4 and sets an initial carbon stock of 3028 PgC (2063 PgC of carbon from topsoil and 966 PgC of vegetation biomass). Using these two states for initialization helps us understand the sensitivity of the model to the initial value.\u003c/p\u003e\n \u003cp\u003eOne reason why the Q3 vegetation values may be low while the 90th percentile values are high is that we derive carbon values for forests as a whole and do not differentiate between primary forests and secondary forests due to lack of available data. 
This means that our forest carbon distributions include the impact of harvesting, especially in regions with high levels of forest harvests, resulting in lower quartile values yet maintaining relatively high 90th percentile and maximum values. As more fine resolution data on different types of forests become available, a logical next step would be to derive separate carbon densities for primary and secondary forest types.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003e4B. Grid cell comparison of carbon values to other estimates of long-term potential carbon\u003c/h3\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eTo evaluate the spatial distribution of our estimates, we compare the 90th percentile values at the pixel level with other gridded data, because the 90th percentile global values match the reference data better than the Q3 values.\u003c/p\u003e\n \u003cp\u003eSanderman et al. (\u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e) generated a pre-industrial soil carbon map for top soil in the year 1800. This map assumed no land use in that year. Similarly, Walker et al. (\u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e) generated a map for potential carbon in above and below ground vegetation. For a valid comparison we compared only our unmanaged land carbon values with these estimates (Figs. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e and \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eWe found that in the case of soil carbon, even though our maps track well with the maps from Sanderman et al. (\u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e) in terms of the overall spatial distribution, the mean error (moirai 90th percentile \u0026ndash; Sanderman et al. \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e) across gridcells is about \u0026minus;\u0026thinsp;23%. There are some higher latitude pixels from the Sanderman et al. 
(\u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e) dataset that show almost 100% higher values compared to our data.\u003c/p\u003e\n \u003cp\u003eIn the case of aboveground vegetation carbon, the mean percent error (moirai 90th percentile \u0026ndash; Walker et al. \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e) is \u0026minus;17%, which is smaller in magnitude than for soil carbon. The largest errors were observed for forest pixels. This is likely due to the combination of primary and secondary forests into a single forest category in our dataset (as described above), which lowers the carbon values. The highest differences between datasets are observed in forest pixels with high levels of forest harvesting (Central and West Africa and South and East Asia). Note that \u003cstrong\u003eSI\u003c/strong\u003e Fig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e shows pixel level absolute value (MgC/ha) differences between datasets.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003e\u003cspan type=\"BoldUnderline\" class=\"BoldUnderline\" name=\"Emphasis\"\u003e4C Comparison to C values previously used in GCAM by land type and aggregate contemporary estimates\u003c/span\u003e\u003c/p\u003e\n \u003cp\u003eWe compared the distribution of Moirai carbon densities across water basins by land type with global carbon densities from Houghton (\u003cspan class=\"CitationRef\"\u003e1999\u003c/span\u003e) that were previously used for GCAM initialization. The Houghton carbon densities represent contemporary carbon on undisturbed land differentiated by biome. We also compared our statistical states with other contemporary values where available (e.g., Jackson et al. \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e for soil carbon and Vlek et al. 2017 for vegetation carbon). 
These distributions represent all of the statistical states across all basins differentiated by land type.\u003c/p\u003e\n \u003cp\u003eFor soil carbon (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e and SI Table\u0026nbsp;2), we found that our global values (Q3, 90th percentile) are generally higher than the Houghton values for most land types. The values are especially high for shrublands located in Boreal regions, where the difference is approximately 80 MgC/ha. This is likely because the SoilGrids dataset shows high carbon values at high latitudes and includes peat soils in its estimates (e.g., Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e). The high values of soil carbon at high latitudes may also be driven by low levels of predicted bulk density at those locations (Tifafi et al., \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e). A more recent version of SoilGrids has produced lower values in these regions (Poggio et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eOur Q3 soil carbon values are generally higher than the contemporary estimates. For cropland, our Q3 estimates of carbon are as high as forest soil carbon (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e and SI Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e). This is investigated in more detail in the sections below. Similarly, the soil carbon under Urban land cover is extremely high. This is likely due to how the samples were collected for Urban land cover. These samples are collected in parks, where soil carbon is relatively high, as opposed to built-up areas with little exposed soil, where soil carbon is relatively low due to development. As expected, Q3 values are higher than the contemporary values from Jackson et al. (\u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e), especially in the Boreal regions. 
However, the Q1 values are closest to contemporary values for soils.\u003c/p\u003e\n \u003cp\u003eForest vegetation carbon densities are substantially lower across Moirai states when compared to the literature (Fig. \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e; Houghton, \u003cspan class=\"CitationRef\"\u003e1999\u003c/span\u003e; Vlek et al., 2017). This is not surprising because the spatial distribution of forest carbon is unknown for the Houghton data, especially for tropical forests (Houghton, \u003cspan class=\"CitationRef\"\u003e2005\u003c/span\u003e), while in our data there is significant variation in values across basins due to management and environmental conditions. Also, as noted above, our Moirai values for forest carbon densities are a combination of primary and secondary forests and therefore provide lower estimates than those obtained from unmanaged forests.\u003c/p\u003e\n \u003cp\u003eAs expected, the Q3 and 90th percentile grassland and pasture vegetation carbon estimates are higher than the literature values (Fig. \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e). The median values match the contemporary global estimates well. These land types also have a much narrower distribution of values across space.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003e4D Uncertainties in re-harmonized carbon data (spatially and across land types)\u003c/h3\u003e\n\u003cp\u003eHere we explore uncertainty in the available data by further examining spatial distributions, aggregation statistics, and land type considerations.\u003c/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003cstrong\u003ei. 
Do managed land types show a depletion in carbon compared to unmanaged land types?\u003c/strong\u003e\u003cbr\u003e\u003c/span\u003e\u003c/p\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eStudies show that managed land (i.e., Cropland, Pasture, Urbanland) has depleted carbon stocks in relation to undisturbed land (Cooper et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e; Sanderman et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e; Wei et al., \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e). The aim of processing the spatial managed land carbon data and adding them to Moirai is to obtain contemporary estimates for these lands that can be used in modeling, rather than assuming a global value or that managed lands have a fixed fraction of unmanaged land carbon. Carbon data values do not correspond with a long-term potential maximum for these managed land types by definition, as these lands are actively disturbed. However, we still want higher than average carbon values for the parameters that define the limits of carbon accumulation for these land types. We expect that the carbon data reflect the effects of management on these land types and that our desired values would be lower than those for the surrounding unmanaged land types. We checked this expectation by first comparing Q3 carbon values for soil and aboveground biomass for cropland with the corresponding values for unmanaged land cover in each of our land regions (Fig. \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003e). We found that cropland soil carbon values do not show a consistent depletion compared to unmanaged land, whereas cropland vegetation carbon values are low compared to unmanaged land. The reason for these differences among carbon pools is rooted in the source data sampling and processing methodologies. For the SoilGrids dataset, the authors state that cropland soil carbon samples were largely collected in the US. 
For the vegetation carbon dataset from Spawn et al., the vegetation carbon was calculated for each crop type based on yields, which explains the low values on cropland compared to unmanaged land.\u003c/p\u003e\n \u003cp\u003eIn GCAM, crop yields are determined from harvested area and production data, while the carbon data are used for land use emissions and for valuing land carbon in global warming target scenarios. To address the relatively high cropland soil carbon data in our modeling experiments, we reduce these data by 30% before using them in GCAM. Previous studies have found a similar loss of soil carbon through agricultural practices and land cover conversion from unmanaged land types to cropland (Cooper et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e; Wei et al., \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe performed a similar analysis for pasture carbon densities and found that pasture carbon shows depletion or lower values compared to unmanaged land cover for both soil and vegetation. In this case, it is reasonable to use the Q3 soil and vegetation carbon values for pasture in GCAM without adjustment. This is an interesting case because pasture is not one of the source land types and its carbon values were assigned based on the same land types as for grassland. Multiple factors could contribute to this result, including sampling bias, pasture location bias, and uncertainties in data and processing. Nonetheless, these data capture the expected difference between pasture and unmanaged grassland.\u003c/p\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eii. 
Assessing spatial variability in soil and vegetation carbon within and across basins\u003c/strong\u003e\u003c/p\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe have established that carbon distributions within land regions generally follow a lognormal pattern for soil carbon and for vegetation carbon for most land types, except that forest vegetation carbon has a more bimodal distribution. However, there may be more dispersion across values in some basins for some land types compared to others. To assess this systematically, we computed a quartile coefficient of dispersion (QCD) for each basin and land type as:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e$$\\:{QCD}_{GLU,\\:LT,\\:pool}=\\frac{{Q3}_{GLU,\\:LT,\\:pool}-{Q1}_{GLU,\\:LT,\\:pool}}{{Q3}_{GLU,\\:LT,\\:pool}+{Q1}_{GLU,\\:LT,\\:pool}}$$\u003c/div\u003e\n \u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003eWhere,\u003c/p\u003e\n\u003cp\u003epool is the carbon pool (aboveground biomass, belowground biomass, topsoil (0\u0026ndash;30 cm)),\u003c/p\u003e\n\u003cp\u003eGLU represents a land region which is an intersection of basin boundaries and country boundaries,\u003c/p\u003e\n\u003cp\u003eand LT is the land type.\u003c/p\u003e\n\u003cp\u003eThe QCD values range from 0 to 1, where a value towards zero indicates less dispersion within a region-land type-carbon pool combination and a value towards 1 indicates more dispersion.\u003c/p\u003e\n\u003cp\u003eThe QCD values for soil carbon (Fig. \u003cspan class=\"InternalRef\"\u003e11\u003c/span\u003e) are generally similar across most basins and land types. This is expected since the distributions of soil carbon are generally lognormal. However, in some basins the QCD value is consistently high and similar across land types. 
This mainly occurs in individual basins in Russia and Indonesia that have high levels of peat soils. In these basins the dispersion across cells is high because some cells contain peat soils whereas others do not.\u003c/p\u003e\n\u003cp\u003eBased on QCD values across basins and land types for vegetation carbon (Fig. \u003cspan class=\"InternalRef\"\u003e12\u003c/span\u003e), we observe that there is significant variation in the QCD values within and across basins for tundra (with values ranging from 0 to 1). This is likely due to the way tundra pixels are defined in our dataset (they encompass different vegetation types). Similarly, there is significant variation within and across basins for grasslands, savannah and pastures, which is once again likely due to the definitions of what constitutes grasslands in the base land cover dataset. While there are also variations in vegetation carbon values for cropland and urban land, the overall range of vegetation carbon values for these land types is low (Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e). QCD values for forests within and across basins are lower. This may be due to the narrower definitions of what constitutes forests across datasets.\u003c/p\u003e\n\u003ch3\u003e4E Results from implementation of spatially explicit carbon in GCAM\u003c/h3\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eAs a final validation step, we separately implemented two carbon density sets from Moirai (3rd quartile, 90th percentile) and the Houghton densities in GCAM and compared these three cases. 
We make the following assumptions when implementing the carbon densities in GCAM, based on the analyses above:\u003c/p\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cspan\u003ea) Each set of carbon values is used throughout to reflect two potential options for a long-term potential maximum state of carbon in 1700.\u003cbr\u003e\u003c/span\u003e\u003c/p\u003e\n\u003cp\u003eb) Cropland soil carbon is reduced by 30% for all basins to reflect the effects of management. This is because we found that cropland soil carbon values do not show a depletion of carbon when compared with unmanaged soil carbon (see the findings of section \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003eD above).\u003c/p\u003e\u003cspan\u003e\n \u003cp\u003ec) Tundra, urban, desert, and polar desert/rock/ice do not change in GCAM and so the assigned carbon values do not influence model simulations. If a model does include dynamics for these land types, then the associated uncertainties should be addressed.\u003c/p\u003e\n\u003c/span\u003e\n\u003cp\u003e\u003c/p\u003e\n\u003cp\u003eWe emphasize that the results described here are GCAM specific and would be different based on the model selected.\u003c/p\u003e\n\u003ch3\u003e4E (i) Results from historical spin up\u003c/h3\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe initialized GCAM using each of the three cases identified above. This resulted in a pre-spin up carbon stock of 1912 PgC (1320 PgC in soil and 591.7 PgC in vegetation) when using the Q3 state and a carbon stock of 2718 PgC (1753 PgC in top soil and 965 PgC in vegetation) when using the 90th percentile (Fig. \u003cspan class=\"InternalRef\"\u003e13\u003c/span\u003e and Table \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). Note that these initialization values are calculated using the land cover in 1700, which does include some managed area, and the spatially explicit carbon. 
The corresponding initial values from Houghton et al. are 1905 PgC (1243 PgC in soil and 662 PgC in vegetation).\u003c/p\u003e\n \u003cp\u003eDuring spin up this carbon is reduced to 1735 PgC in 2015 when using the Q3 state (1249 PgC of topsoil carbon and 486 PgC of vegetation carbon) as a result of historical land transitions (Fig. \u003cspan class=\"InternalRef\"\u003e13\u003c/span\u003e and Table \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). Similarly, during the spin up, this carbon is reduced to 2448 PgC when using the 90th percentile values (1655 PgC of topsoil carbon and 793 PgC of vegetation carbon). The corresponding values from Houghton et al. are 1697 PgC (1181 PgC in soil and 516 PgC in vegetation).\u003c/p\u003e\n \u003cp\u003eAn important point to note is that while the 90th percentile generates results more in line with independent pre-industrial estimates (Table \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e), the Q3 state results in more realistic contemporary values in 2015 during the GCAM spin up (Figs. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e and \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e). For example, the Q3 state results in a contemporary value of 486 PgC of vegetation carbon in 2015, which is closer to contemporary vegetation carbon stock estimates. The Houghton values produce a contemporary estimate of 516 PgC, which is higher, likely due to high estimates of tropical vegetation carbon, whereas the 90th percentile results in a global vegetation carbon stock of 793 PgC. Using the 90th percentile would effectively result in an unrealistically high initial vegetation carbon stock that is close to equilibrium in 2015.\u003c/p\u003e\n \u003cp\u003eAnother point to note is that the amount of global historical emissions (1700\u0026ndash;2015) produced by the Q3 initialization is 176 PgC, which is much lower than the global historical emissions using the 90th percentile of 270 PgC (Fig. 
\u003cspan class=\"InternalRef\"\u003e14\u003c/span\u003e). For context, the Global Carbon Project (as of 2021) produced an estimate of annual LUC emissions from 1700\u0026ndash;2015 of 196 PgC (Friedlingstein et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e). The 90th percentile produces consistently higher annual LUC emissions than the other estimates, except for the dip in 2005. This dip is due to a shift in land use that accumulates excessive soil carbon rapidly in certain regions because of higher carbon densities in specific land types in the new spatially-explicit data.\u003c/p\u003e\n \u003cp\u003eBased on these spin up results and our validation analyses, we found that the Q3 value from our dataset is appropriate for initialization and use in GCAM when using the model to estimate contemporary C dynamics. While the 90th percentile better resembles independent estimates of pre-settlement stocks, it results in substantial overestimation when used to estimate contemporary C fluxes. This is a result of assumptions and processes within GCAM pertaining to carbon dynamics. Furthermore, the Q3 data provide a much less dramatic shift from the Houghton data previously used than the 90th percentile data.\u003c/p\u003e\n \u003cp\u003eThe appropriate carbon state for other models, which may implement carbon dynamics differently, could be different and would require a similar analysis. GCAM uses a simple bookkeeping approach to modeling carbon dynamics, with a primary assumption regarding the potential maximum carbon densities of land types. The model begins by tracking a total stock of carbon for each water basin for each land type for two pools (soil and vegetation) in 1700. From 1700 onwards, based on historical and future land use, the model calculates fluxes from this initial state. Sigmoidal growth curves are used to track regrowth of vegetation and exponential decay functions are used to track gain and loss in soil carbon. 
The model also calculates carbon fluxes based on net land use change (e.g. increase in cropland, reduction in forests) as opposed to gross transitions (e.g. cropland increase from grassland, cropland increase from forest loss).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eResults from the historical spin up\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eInitialization\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCarbon pool\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eInitial value in PgC (1700)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eContemporary value after spin up in PgC (2015)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHistorical emissions in PgC (between 1700 and 2015)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eValue in 2100 under SSP1 2p6 in PgC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAdditional carbon sequestered during afforestation scenario in PgC (2100 value - 2015 value)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHoughton\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003evegetation carbon\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e662.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e516.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e145.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"char\"\u003e\n \u003cp\u003e605.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e89.2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003emoirai (Q3 value)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003evegetation carbon\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e591.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e486.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e105.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e515.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e29.6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003emoirai (90th percentile)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003evegetation carbon\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e965.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e793.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e172.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e847.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e54.7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHoughton\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003esoil carbon (top-soil)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1243.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1181.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e62.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e1220.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e38.6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003emoirai (Q3 value)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003esoil carbon (top-soil)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1320.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1249.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e71.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1274.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e25.5\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003emoirai (90th percentile)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003esoil carbon (top-soil)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1753.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1655.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e97.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1700.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e45.4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eHoughton\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eTotal terrestrial carbon\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e1905.5\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e1697.5\u003c/strong\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e208.0\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e1825.3\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e127.8\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003emoirai (Q3 value)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eTotal terrestrial carbon\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e1912.3\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e1735.4\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e176.9\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e1790.6\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e55.2\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003emoirai (90th percentile)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eTotal terrestrial carbon\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e2718.8\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e2448.3\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e270.5\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e2548.4\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e100.1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003ch3\u003e4E (ii) Results from climate forcing scenario and sensitivity analysis\u003c/h3\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe use one climate forcing scenario with a maximum radiative forcing level of 2.6 watts per square meter by 2100 and shared socioeconomic pathway 1 (SSP1 2p6) to assess how the new carbon data influence land projections in GCAM. Under this scenario land carbon prices are implemented to assign value to terrestrial carbon at the same rate as carbon is valued in the energy system. SSP1 2p6 shows a large afforestation response and, more generally, a large carbon response since it contains a carbon price while also having less stress from socio-economic factors across all IAMs (Popp et al. \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e). GCAM by default uses carbon densities from Houghton et al. (1999), which are described in SI Table 2 (soil) and SI Table \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e (vegetation). Note that the changes in land cover under the climate forcing scenario are driven by relative levels of carbon across land types rather than absolute levels of carbon. Therefore, even if forest carbon in some tropical regions is lower than other estimates, forests still sequester much more carbon compared to other land types in these regions. We compare the same three cases as for the historical period: Houghton, the Moirai Q3 value, and the Moirai 90th percentile.\u003c/p\u003e\n \u003cp\u003eThe global land allocation comparison under the SSP1 2p6 scenario in GCAM (Fig. 
\u003cspan class=\"InternalRef\"\u003e15\u003c/span\u003e) shows that the afforestation/reforestation response is greatly reduced as a result of the spatially explicit carbon (the increase in forest cover from 2020 to 2100 globally is only 3.2\u0026nbsp;million km\u003csup\u003e2\u003c/sup\u003e when using the Moirai Q3 as opposed to 7 million km\u003csup\u003e2\u003c/sup\u003e with the Houghton carbon). IAMs (including GCAM) generally show a very optimistic afforestation response for this scenario that ranges from 0.5 to 12 million km\u003csup\u003e2\u003c/sup\u003e of trees planted as part of a nature based carbon sequestration strategy under SSP1 2p6 (Popp et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e). The high afforestation response in some IAMs has been considered too optimistic by some studies (e.g., Pongratz et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e). For Q3, the reduced forest expansion in GCAM with the new carbon data is largely driven by lower forest vegetation carbon densities in the new data, which reduce the incentive to expand forest. Conversely, the 90th percentile case reduces afforestation further, down to 0.1\u0026nbsp;million km\u003csup\u003e2\u003c/sup\u003e, even though the carbon densities are higher than for Q3. This is because smaller increases in forest cover are required to meet additional afforestation targets in the 90th percentile case. Despite the low afforestation, it adds another 54 PgC of vegetation carbon through afforestation that would result in an unrealistic value of vegetation carbon in 2100: about 847 PgC, which is higher than undisturbed carbon stocks in 1700. 
On the other hand, the Q3 vegetation carbon stock in 2100 is close to 515 PgC (an additional 30 PgC of carbon added through planted forests). Because the Q3 state reduces afforestation but maintains responsive land allocation and reasonable carbon accumulation, it is a better choice than the 90th percentile for initializing GCAM.\u003c/p\u003e\n \u003cp\u003eGlobal cropland and shrubland dynamics show a more complicated response than forest (Fig. \u003cspan class=\"InternalRef\"\u003e15\u003c/span\u003e). The reduced emphasis on forest expansion reduces the need for cropland abandonment. Cropland also sequesters more soil carbon in some regions (even with the 30% reduction factor), which also reduces abandonment. The shrubland response is also enhanced by higher shrubland vegetation carbon densities in the new spatially-explicit data.\u003c/p\u003e\n \u003cp\u003eRegional responses are dictated by their respective land type distributions. Generally, afforestation is maintained or enhanced in tropical forests and decreased in boreal regions. For example, in Russia the afforestation strategy is completely replaced with a shrubland and grassland preservation strategy (SI Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). This is expected since the region has a relatively high amount of boreal forests. In South Asia, however, where non-forest land types dominate, forest expansion persists and is supplemented by shrubland expansion (SI Fig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cp\u003eThe implementation of the spatially explicit carbon clearly improves land use responses and suggests that high carbon sequestering shrubs can also be preserved as a part of nature based solutions to mitigate climate change. 
The robustness of these responses across other radiative forcing scenarios (implemented for more SSPs, for example) and across other models needs to be studied and is a subject worthy of exploration in a future paper.\u003c/p\u003e"},{"header":"5. Usage notes","content":"\u003cp\u003eIn this paper we present a new dataset of grid-cell level, spatially-explicit carbon data harmonized with Moirai land data types. Our harmonized dataset presents carbon values for 3 pools (topsoil, above ground biomass and below ground biomass) for six statistical states for various land use types. Our dataset is available both at a 5 arcmin resolution and aggregated to 699 land regions. This dataset is designed to enable initialization of spatially explicit carbon in IAMs and MSD models, and we provide an example by applying it to GCAM. In the future, this dataset can be extended to include deeper soil (beyond 0\u0026ndash;30 cm) so that land use responses in models can account for an additional deep soil carbon pool. We note, however, that if deeper soil carbon layers are to be added, regional and global models must also improve their respective carbon dynamics beyond simple bookkeeping approaches to include detailed accounting of environmental conditions.\u003c/p\u003e \u003cp\u003eWe noted that there are some limitations with respect to the carbon observations (both for soil and vegetation) for tundra. For example, no data were found for 29% of the 5 arcmin gridcells for this land type. The biome mapping also needed to include several source land types to enable an increase in data coverage for tundra. This issue was likely caused by the different definitions of tundra land cover in different datasets. Recently, there have been efforts dedicated to collecting carbon data specifically for this land type. 
These data should be integrated in future releases of our data to address the current lack of data coverage.\u003c/p\u003e \u003cp\u003eAs a part of our analysis, we observed that SoilGrids soil carbon values for cropland do not show a depletion when compared to SoilGrids soil carbon values in unmanaged land. As discussed, this is likely due to the locations of sampling for cropland soil carbon. As a result, we reduced cropland soil carbon by 30% when we applied it to GCAM. This is in line with similar estimates of loss of soil carbon through agricultural practices and land cover conversion from unmanaged land types to cropland (Cooper et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Wei et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). If improved data on crop soil carbon become available, our data could be updated accordingly. Users should also consider percentage adjustments to crop soil carbon contents based on local cropping intensities.\u003c/p\u003e \u003cp\u003eWe have also noted that our current estimates of forest vegetation carbon are based on both primary and secondary forests. This is due to the lack of availability of fine resolution (300 m) land masks that distinguish between primary and secondary forests. As more data become available related to forest cover types, a logical next step would be to break out different forest types in our dataset.\u003c/p\u003e \u003cp\u003eFinally, our analysis showed that using the Q3 statistical state was most appropriate for GCAM even though it resulted in an initialization of pre-industrial carbon value that was lower than other estimates. 
Selection of the Q3 state results in more accurate historical LUC emissions and the model therefore spins up to a value that is close to other estimates in the literature in 2015.\u003c/p\u003e \u003cp\u003eThe data are derived for several statistical states (with the Q3 being one of them) for a data-derived set of land types that are not model specific. The resulting dataset is then applied to GCAM as an example. This example for GCAM shows how we analyzed the data to select the appropriate value for GCAM, which includes analysis of the statistical range of options across the spatial distribution. Specific data uncertainty analyses have already been performed on the source data by their creators. Like any dataset, a user must determine how to use it for their particular application and model, and perform the required processing.\u003c/p\u003e \u003cp\u003eWhile we use the data from 2010, we emphasize once more that we utilize contemporary data to extract statistical states to select a potential carbon value that can be used for calibrating the model. More specifically, our Q3 value selected from the 2010 values is used as a potential carbon density that a model builds towards and is used to spin up the model as seen in our results. We also note that the year 2010 is selected since the SoilGrids and Spawn et al. data are based on land masks for that year. 
The usage of more contemporary data may not affect this analysis in any significant way (unless the overall distribution of carbon is altered).\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eCode availability statement\u003c/h2\u003e\u003cp\u003eAs mentioned above, the data can be generated programmatically with scripts that are hosted on GitHub (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/JGCRI/moirai/tree/master/ancillary/carbon_harmonization\u003c/span\u003e\u003cspan address=\"https://github.com/JGCRI/moirai/tree/master/ancillary/carbon_harmonization\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The process has been split into two stages, where the computationally intensive stage 1 (approx. 6 hours of processing) is optional, with its outputs made available in the repository. The stage 1 processing is performed using bash scripts which use the GDAL software (Warmerdam, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). The stage 2 processing uses an R script and can be completed for all carbon pools in approx. 15 minutes to generate the final 72 rasters and the final tabular output file. We have also made available optional diagnostic functions in the R script which can be used to validate results.\u003c/p\u003e \u003ch2\u003eCompeting Interests declaration\u003c/h2\u003e \u003cp\u003e \u003cspan type=\"BoldUnderline\" class=\"BoldUnderline\" name=\"Emphasis\"\u003e\u003c/span\u003eThe authors have declared that none of the authors has any competing interests.\u003c/p\u003e \n\u003ch2\u003eAuthor contributions\u003c/h2\u003e \u003cp\u003eK.B.N. and A.D.V. conceived the concept of this paper. K.B.N., A.D.V. and E.V. produced the data from the raw input files and also made code changes to the moirai land data system. S.S.L. 
and H.G. produced the vegetation carbon data required by this study and also provided inputs on data interpretation. K.B.N. and A.D.V. wrote the manuscript with input from all authors.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis research was supported by the U.S. Department of Energy, Office of Science, as part of research in Multi Sector Dynamics, Earth and Environmental System Modeling Program. The Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBarnes WL, Xiong X, Salomonson VV (2003) Status of Terra MODIS and Aqua MODIS. Adv Space Res 32(11):2099\u0026ndash;2106\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBatjes NH, Ribeiro E, Van Oostrum A, Leenaars J, Hengl T, de Jesus M, J (2017) WoSIS: providing standardised soil profile data for the world. Earth Syst Sci Data 9(1):1\u0026ndash;14\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCalvin K, Patel P, Clarke L, Asrar G, Bond-Lamberty B, Cui RY, Di Vittorio A, Dorheim K, Edmonds J, Hartin C (2019) GCAM v5.1: representing the linkages between energy, water, land, climate, and economic systems. Geosci Model Dev 12(2):677\u0026ndash;698\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCooper H, Sj\u0026ouml;gersten S, Lark R, Mooney S (2021) To till or not to till in a temperate ecosystem? Implications for climate change mitigation. Environ Res Lett 16(5):054022\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDi Vittorio AV, Vernon CR, Shu S (2020) Moirai version 3: a data processing system to generate recent historical land inputs for global modeling applications at various scales. 
J Open Res Softw, \u003cem\u003e8\u003c/em\u003e(PNNL-SA-142149).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErb K-H, Kastner T, Plutzar C, Bais ALS, Carvalhais N, Fetzel T, Gingrich S, Haberl H, Lauk C, Niedertscheider M (2018) Unexpectedly large impact of forest management and grazing on global vegetation biomass. Nature 553(7686):73\u0026ndash;76\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFang Y, Liu C, Huang M, Li H, Leung LR (2014) Steady state estimation of soil organic carbon using satellite-derived canopy leaf area index. J Adv Model Earth Syst 6(4):1049\u0026ndash;1064\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFriedlingstein P, Jones MW, O'Sullivan M, Andrew RM, Bakker DC, Hauck J, Le Qu\u0026eacute;r\u0026eacute; C, Peters GP, Peters W, Pongratz J (2022) Global carbon budget 2021. Earth Syst Sci Data 14(4):1917\u0026ndash;2005\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHengl T, de Jesus JM, MacMillan RA, Batjes NH, Heuvelink GB, Ribeiro E, Samuel-Rosa A, Kempen B, Leenaars JG, Walsh MG (2014) SoilGrids1km\u0026mdash;global soil information based on automated mapping. PLoS ONE 9(8):e105992\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoughton R (2005) Aboveground forest biomass and the global carbon balance. Glob Change Biol 11(6):945\u0026ndash;958\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoughton RA (1999) The annual net flux of carbon to the atmosphere from changes in land use 1850\u0026ndash;1990. Tellus B 51(2):298\u0026ndash;313\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHugelius G, Strauss J, Zubrzycki S, Harden JW, Schuur E, Ping C-L, Schirrmeister L, Grosse G, Michaelson GJ, Koven CD (2014) Estimated stocks of circumpolar permafrost carbon with quantified uncertainty ranges and identified data gaps. 
Biogeosciences 11(23):6573\u0026ndash;6593\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJackson RB, Lajtha K, Crow SE, Hugelius G, Kramer MG, Pi\u0026ntilde;eiro G (2017) The ecology of soil carbon: pools, vulnerabilities, and biotic and abiotic controls. Annu Rev Ecol Evol Syst 48(1):419\u0026ndash;445\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJungkunst HF, G\u0026ouml;pel J, Horvath T, Ott S, Brunn M (2022) Global soil organic carbon\u0026ndash;climate interactions: Why scales matter. Wiley Interdisciplinary Reviews: Clim Change, e780\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJustice C, Townshend J, Vermote E, Masuoka E, Wolfe R, Saleous N, Roy D, Morisette J (2002) An overview of MODIS Land data processing and product status. Remote Sens Environ 83(1\u0026ndash;2):3\u0026ndash;15\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKlein Goldewijk K, Beusen A, Doelman J, Stehfest E (2017) Anthropogenic land use estimates for the Holocene\u0026ndash;HYDE 3.2. Earth Syst Sci Data 9(2):927\u0026ndash;953\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi W, MacBean N, Ciais P, Defourny P, Lamarche C, Bontemps S, Houghton RA, Peng S (2018) Gross and net land cover changes in the main plant functional types derived from the annual ESA CCI land cover maps (1992\u0026ndash;2015). Earth Syst Sci Data 10(1):219\u0026ndash;234\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu X, Yu L, Si Y, Zhang C, Lu H, Yu C, Gong P (2018) Identifying patterns and hotspots of global land cover transitions using the ESA CCI Land Cover dataset. Remote Sens Lett 9(10):972\u0026ndash;981\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeiyappan P, Jain AK (2012) Three distinct global estimates of historical land-cover change and land-use conversions for over 200 years. 
Front Earth Sci 6(2):122\u0026ndash;139\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNachtergaele F, van Velthuizen H, Verelst L, Batjes N, Dijkshoorn K, van Engelen V, Fischer G, Jones A, Montanarella L (2010) The harmonized world soil database. Proceedings of the 19th World Congress of Soil Science, Soil Solutions for a Changing World, Brisbane, Australia, 1\u0026ndash;6 August 2010\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNarayan KB, Di Vittorio AV, Margiotta E, Spawn SA, Gibbs H (2023) Spatially explicit re-harmonized terrestrial carbon stocks for calibrating Integrated Multisectoral Models (1.0.0) [Data set]. Zenodo. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5281/zenodo.7884615\u003c/span\u003e\u003cspan address=\"10.5281/zenodo.7884615\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePoggio L, De Sousa LM, Batjes NH, Heuvelink G, Kempen B, Ribeiro E, Rossiter D (2021) SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. Soil 7(1):217\u0026ndash;240\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePongratz J, Schwingshackl C, Bultan S, Obermeier W, Havermann F, Guo S (2021) Land use effects on climate: current state, recent progress, and emerging topics. Curr Clim Change Rep, 1\u0026ndash;22\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePopp A, Calvin K, Fujimori S, Havlik P, Humpen\u0026ouml;der F, Stehfest E, Bodirsky BL, Dietrich JP, Doelmann JC, Gusti M (2017) Land-use futures in the shared socio-economic pathways. Glob Environ Change 42:331\u0026ndash;345\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRamankutty N, Foley JA (1999) Estimating historical changes in land cover: North American croplands from 1850 to 1992: GCTE/LUCC RESEARCH ARTICLE. 
Glob Ecol Biogeogr 8(5):381\u0026ndash;396\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSanderman J, Hengl T, Fiske GJ (2017) Soil carbon debt of 12,000 years of human land use. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e, \u003cem\u003e114\u003c/em\u003e(36), 9575\u0026ndash;9580\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScharlemann JP, Tanner EV, Hiederer R, Kapos V (2014) Global soil carbon: understanding and managing the largest terrestrial carbon pool. Carbon Manag 5(1):81\u0026ndash;91\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSpawn SA, Sullivan CC, Lark TJ, Gibbs HK (2020) Harmonized global maps of above and belowground biomass carbon density in the year 2010. Sci Data 7(1):1\u0026ndash;22\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThomson AM, Calvin KV, Chini LP, Hurtt G, Edmonds JA, Bond-Lamberty B, Frolking S, Wise MA, Janetos AC (2010) Climate mitigation and the future of tropical landscapes. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e, \u003cem\u003e107\u003c/em\u003e(46), 19633\u0026ndash;19638\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTifafi M, Guenet B, Hatt\u0026eacute; C (2018) Large differences in global and regional total soil carbon stock estimates based on SoilGrids, HWSD, and NCSCD: Intercomparison and evaluation based on field data from USA, England, Wales, and France. Glob Biogeochem Cycles 32(1):42\u0026ndash;56\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Asselen S, Verburg PH (2012) A Land System representation for global assessments and land-use modeling. Glob Change Biol 18(10):3125\u0026ndash;3148\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWalker WS, Gorelik SR, Cook-Patton SC, Baccini A, Farina MK, Solvik KK, Ellis PW, Sanderman J, Houghton RA, Leavitt SM (2022) The global potential for increased storage of carbon on land. 
Proc Natl Acad Sci 119(23):e2111312119\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWarmerdam F (2008) The geospatial data abstraction library. In: Hall GB, Leahy MG (eds) Open source approaches in spatial data handling. Springer, Berlin, Heidelberg, pp 87\u0026ndash;104\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWei X, Shao M, Gale W, Li L (2014) Global pattern of soil carbon losses due to the conversion of forests to agricultural land. Sci Rep 4(1):1\u0026ndash;6\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWieder WR, Boehnert J, Bonan GB (2014) Evaluating soil biogeochemistry parameterizations in Earth system models with observations. Glob Biogeochem Cycles 28(3):211\u0026ndash;222\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWise M, Calvin K, Thomson A, Clarke L, Bond-Lamberty B, Sands R, Smith SJ, Janetos A, Edmonds J (2009) Implications of limiting CO2 concentrations for land use and energy. 
Science 324(5931):1183\u0026ndash;1186\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTable 2 is available in the Supplementary Files section.\u003c/p\u003e"},{"header":"Supplemental Table 1","content":"\u003cp\u003eSI Table 1 is not available with this version.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"Pacific Northwest National Laboratory","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6123546/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6123546/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eSoil and vegetation carbon stocks play a critical role in human-Earth system models. These stocks (denominated as densities in MgC/ha) affect variables such as land use change emissions and also influence land use change pathways under climate forcing scenarios where terrestrial carbon is assigned a carbon price. 
Here we present reharmonized soil and vegetation carbon densities both at the 5-arcmin grid cell level and aggregated to 235 watersheds for 4 land use types (Cropland, Grazed land, Urban land and unmanaged vegetation) and 15 unmanaged land cover types. Moreover, we use the distribution of carbon within and across pixels to define statistical \u0026ldquo;states\u0026rdquo; of carbon, again differentiated by land type. These statistical states define a range of possible carbon values that can be used to set initial conditions for soil and vegetation carbon in human-Earth system models. We implement these data in a state-of-the-art multisector dynamics model, namely the Global Change Analysis Model (GCAM), and show that these new data improve several land use responses, especially when terrestrial carbon is assigned a carbon price.\u003c/p\u003e","manuscriptTitle":"Spatially explicit terrestrial carbon densities for calibrating the carbon cycle in human-Earth system Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-03-11 10:17:55","doi":"10.21203/rs.3.rs-6123546/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"542bd4b3-86d7-4d85-88a8-ab31e8c085ad","owner":[],"postedDate":"March 11th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":44979599,"name":"Forestry"},{"id":44979600,"name":"Geographic Information 
Systems"}],"tags":[],"updatedAt":"2025-04-18T16:37:53+00:00","versionOfRecord":{"articleIdentity":"rs-6123546","link":"https://doi.org/10.1038/s41597-025-04723-4","journal":{"identity":"scientific-data","isVorOnly":false,"title":"Scientific Data"},"publishedOn":"2025-04-16 00:00:00","publishedOnDateReadable":"April 16th, 2025"},"versionCreatedAt":"2025-03-11 10:17:55","video":"","vorDoi":"10.1038/s41597-025-04723-4","vorDoiUrl":"https://doi.org/10.1038/s41597-025-04723-4","workflowStages":[]},"version":"v1","identity":"rs-6123546","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6123546","identity":"rs-6123546","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}


