On Multimodal Freight Databases for Scalable Global-Local Transport Research and Applications
Research Article
Muhammad Haroon, Guoqiang Shen, Jinghua Zhao, Pengyu Zhu
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7288048/v1 This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1.
Abstract
Freight transportation modeling is the key to informed infrastructure planning, economic growth, and policy-making around the world, but the available data is still highly fragmented, differing widely in classification schemes, spatial granularity, temporal coverage, and documentation standards. 
This paper addresses these challenges systematically by building an integrated, structured catalog of freight data for the United States, the European Union, and China, organized around the four-step freight-modeling structure: Trip Generation, Trip Distribution, Mode Choice, and Route Assignment. Nine core data classes (socioeconomic-demographic, commodity/goods, multimodal networks, ports, trade, flow databases, geographical references, regulation/code, and transportation means) were listed, normalized, and described in a single metadata spreadsheet. The methodological contributions are detailed dataset evaluation criteria (spatial and temporal coverage, completeness, accuracy, accessibility, licensing, and metadata quality), standardized commodity classification crosswalks, and reproducible spatial and temporal harmonization workflows. Among the barriers to data access and reuse, the paper identifies inconsistent documentation, the lack of appropriate metadata standards, inconsistent update frequencies, and schema incompatibility. It proposes feasible measures to improve dataset interoperability, such as standard documentation, open-access portals, consistent version control, and stable APIs. Filling the identified data gaps, especially in sub-national granularity, time resolution, commodity and mode specificity, and regulatory standardization, would substantially increase the accuracy and applicability of freight models. This data-driven, systematic framework helps researchers and policymakers develop more transparent, reproducible, and internationally comparable freight-flow models that can support informed infrastructure decisions and effective freight transportation policies.
Keywords: Freight Transportation, Data Catalog, Metadata Standardization, Spatial Harmonization, Data Integration, Multimodal Networks, Reproducible Research
1. 
Introduction
Freight transportation systems that are efficient, robust, and responsive are determinants of the economic vibrancy and resilience of regions across the world (He et al. 2021; Nenavath 2023). Global trade, regional integration, and sustainable economic growth depend on the smooth flow of goods through complex multimodal networks (Liao et al. 2023). Freight research revolves around better understanding and modeling freight movement on highway networks to support local transportation, land use, economic development, and comprehensive planning (Shen et al. 2020). Freight transportation analysis remains heavily constrained by data fragmentation: datasets are classified differently, have incompatible spatial scales, are updated at different frequencies, and are siloed across governmental and organizational bodies (Albert and Schaefer 2013; Anderson et al. 2010; Illemann et al. 2021; Moschovou et al. 2019). This fragmentation greatly limits the capacity of planners, policymakers, and researchers to model freight flows properly, locate infrastructure bottlenecks, and evaluate the effects of transportation policies in an integrated and coherent environment. Today, freight transportation data are distributed across several jurisdictions, including the U.S., the European Union (EU), and China, each with different data collection, management, and distribution strategies. In the United States, key datasets are maintained by various agencies, such as the Bureau of Transportation Statistics (BTS), the U.S. Census Bureau, the Federal Highway Administration (FHWA), and the Energy Information Administration (EIA), among others. In the EU, freight-related information comes mostly from Eurostat and other Directorate-General websites (e.g., DG MOVE), supplemented by commercial and national databases. 
Similarly, freight data sources in China are predominantly maintained by the National Bureau of Statistics (NBS), the Ministry of Transport (MOT), and the General Administration of Customs, each using different spatial definitions, coding schemes, and update frequencies. Although large volumes of data are available in these regions, they are heterogeneous, which makes integration a major problem. The spatial granularity of freight data is extremely diverse, ranging from national and regional levels (e.g., states in the U.S., NUTS regions in the EU, provinces in China) to counties, municipalities, or even traffic analysis zones (TAZ) (Guo and Aultman 2014). Update frequencies also vary widely, from near-real-time tracking of vessels and trucks to annual or quinquennial economic surveys and decennial censuses, which complicates harmonization across datasets (Gorman et al. 2023). Commodity classification also varies widely across jurisdictions and datasets, requiring conversions among the Harmonized System (HS), the Standard Classification of Transported Goods (SCTG), the North American Industry Classification System (NAICS), the European Combined Nomenclature (CN), and Chinese customs codes (Donnelly 1999). The lack of uniform commodity and geographic crosswalks has been a persistent impediment to robust freight-flow modeling. The implications of this global challenge are far-reaching. In the absence of harmonized, integrated, and comprehensive data, freight-flow modeling can be cumbersome, methodologically inconsistent, and difficult to replicate. Researchers often have to struggle with incompatible data formats, inaccessible or insufficiently documented data, and inconsistent spatial or temporal coverage (Kessler 2018; Osman and Qutayan 2023; Titschack et al. 2018). 
In addition, regulatory and policy data, which are essential in modeling freight transport because they affect allowable load limits, routing restrictions, and mode choice, are not readily available in standardized, machine-readable formats, which makes simulations and scenario tests impractical. In response to these ongoing issues, this paper systematically catalogs, classifies, and analyzes significant freight-related data in the U.S., EU, and China. The core of the work is the organization of these datasets within an extended, though deliberately simplified, conceptual framework based on the classical four-step freight modeling process (Trip Generation → Trip Distribution → Mode Choice → Route Assignment). This framework is an organizing tool more than an analytical destination, and it explicitly shows how various data can feed particular decision points in freight-flow analysis. The remainder of this paper is organized as follows. Section 2 explains the study design and methodology used to identify, screen, and evaluate freight-related datasets in the U.S., EU, and China, including the classification schemes and data-quality standards. Section 3 presents the data inventory and classification. Section 4 catalogs and assesses the nine data categories (socioeconomic, commodity, network, ports, trade, flow, geographical, regulatory, and transportation means) available per region, with their main metadata. Section 5 maps these data sources to the four modeling steps, noting overlaps and multiple uses. Section 6 provides best practices and guidelines for harmonizing space, time, codes, and effective zones. Section 7 describes data sharing, accessibility, and reproducibility. Section 8 lists current data gaps and future priorities for data development. Finally, Section 9 summarizes the contributions and proposes further research. 
The metadata tables are presented in the appendices.
2. Methodology
2.1 Conceptual framework
The conceptual model of this work extends the traditional four-stage model of Trip Generation, Trip Distribution, Mode Choice, and Route Assignment and uses it as the organizing framework for a global data catalog. We begin with the basic decision steps on which freight demand modeling and network loading are founded and build on them with four interconnected modules (Data Inventory and Classification; Integration and Harmonization Workflows; the Four-Step Modeling Framework; and Data Sharing, Accessibility and Reproducibility) to produce a solid end-to-end system. Nine categories of data are superimposed on this axis of sequential decisions: socioeconomic-demographic, commodity/goods, multi-modal network, ports, trade, flow, geographical, regulation/code, and transportation means, each providing specific inputs or constraints to one or more stages. The framework uses spatial and classification crosswalks (e.g., NUTS ↔ FIPS ↔ provincial IDs; HS ↔ SCTG ↔ NAICS), enabling the different data architectures of the United States, European Union, and China to be used together. A top-level data-to-decision mapping identifies which categories feed which modeling stage, and feedback loops (Affum et al. 2003; Elghazaly et al. 2023), such as using route-assignment results to refine impedance functions and mode-choice parameters, are identified to allow dynamic model refinement. Finally, sharing practices such as stable APIs, machine-readable metadata, version control, and open licensing ensure that each dataset, transformation, and scenario is transparent, traceable, and easily reusable (see Fig. 1). 
Such a multi-region, standardized design ensures that modelers can trace all their parameters to reported sources, compare methodological choices between jurisdictions, and iteratively revise their models as new data become available.
2.2 Systematic data identification
A three-pronged approach was used to develop a comprehensive inventory of freight-related datasets for the United States, European Union, and China. First, we browsed the official websites and APIs of the major statistical and transportation agencies of each region. In the United States, these were the U.S. Census Bureau (Census API for ACS and Decennial data), Bureau of Economic Analysis, Bureau of Labor Statistics, Bureau of Transportation Statistics (NTAD, CFS, FAF), Federal Highway Administration (HPMS), U.S. Department of Agriculture (QuickStats API), U.S. Army Corps of Engineers (WCSC), Energy Information Administration, FMCSA (Safety Measurement System), PHMSA (NPMS), and FAA (NFDC). For the European Union, we used the Eurostat REST API and GISCO portal (NUTS regions, COMEXT, External Trade), EuroGeographics EuroGlobalMap, and DG MOVE TRANSTOOLS resources. For China, we used the National Bureau of Statistics (Statistical Yearbooks API), the General Administration of Customs (HS trade data), and the GIS and annual reports of the Ministry of Transport. Second, targeted keyword searches of transportation journals and conference proceedings (e.g., “freight data sources”, “Commodity Flow Survey validation”, “Eurostat COMEXT assessment”) were used to find peer-reviewed assessments and additional repositories. Third, each portal's user guide, metadata dictionary, and API documentation were systematically mined for spatial/temporal coverage, classification schemes, update frequencies, access methods, and known limitations. 
This close-to-the-source process produced a standardized list of candidate datasets in nine data categories, which is the foundation of the metadata catalog and of the quality analysis.
2.3 Inclusion/Exclusion Criteria
To ensure a consistent, high-quality inventory across the United States, European Union, and China, we used four objective inclusion/exclusion criteria: (1) Spatial Coverage: only datasets with full national or regional coverage, i.e., covering all 50 U.S. states (plus D.C.), all EU member states at NUTS levels 0–2, or all Chinese provinces, were included; sub-national sources without clear aggregation pathways were catalogued separately but not included in the core framework; (2) Temporal Coverage: sources had to cover the 2019–2024 period or follow a documented update schedule (annual, quarterly, or quinquennial), and any time series missing more than one reporting period was excluded unless an explicit justification could be given and documented; (3) Documentation Quality: only sources with formal metadata dictionaries or user guides (including definitions of variables, units, classification schemes, and methodological notes) were retained, and undocumented or poorly documented data were omitted; and (4) Cost and Accessibility: freely available or institution-licensed data were favored; pay-per-download products were noted but not included unless no free equivalent was available, and subscription-only sites were considered only where academic or governmental access conditions applied.
2.4 Classification & Crosswalk Approach
To enable comparison across jurisdictions, we developed and implemented two parallel crosswalks: one to map commodity classifications and the other to harmonize spatial units. (1) Commodity Code Mapping: The basis was the conventional conversion tables: HS ↔ SCTG and SCTG ↔ NAICS from the Bureau of Transportation Statistics (BTS) and the U.S. Census Bureau, HS6 ↔ CN8 from Eurostat, and the HS equivalents of China Customs. 
These tools enabled us to convert import/export and production data to a common six-digit SCTG code schema. One-to-many mappings were allocated proportionally using published share factors, and any unmapped or deprecated codes were recorded for manual resolution to maintain consistency; and (2) Spatial Unit Harmonization: In the second step, county-level FIPS and OMB MSA codes in the U.S., NUTS 0–2 regions in the European Union, and provincial/county codes in China were reconciled through master lookup tables and GIS spatial joins. To relate zones at various scales, we used the OMB definitions of MSAs, the GISCO boundary files from Eurostat, and China's National Fundamental Geographic Information System. Zone centroids were calculated and, where possible, re-assigned to the corresponding higher-level areas or TAZ polygons, so that all socioeconomic, commodity, network, and flow datasets are referenced to the same spatial units in further modeling.
2.5 Data Quality Assessment
To provide a solid basis for modeling, each dataset in the catalog was assessed against four major dimensions: completeness, timeliness, accuracy, and documented limitations. The user guides and metadata of each source were read thoroughly to determine (1) completeness, e.g., whether suppressed CFS flows or missing HPMS truck-percentage records are tracked; (2) timeliness, measured as the lag between the reference year and publication, e.g., CFS 2022 published in mid-2024; (3) accuracy, e.g., distinguishing between survey-based, modeled, or administrative data and noting reported margins of error (e.g., +/-8 percentage points in All these indicators of quality were compiled in a master metadata spreadsheet (Table 1), in which each dataset has one row, with columns recording spatial/temporal coverage, key variables, access methods, update schedules, cost/license, and a summary quality rating. 
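The proportional allocation of one-to-many commodity mappings described in Section 2.4 can be illustrated with a minimal sketch. The codes (`HS_A`, `SCTG_1`, and so on), the share values, and the function name `allocate` are hypothetical placeholders chosen for illustration; they are not taken from the published BTS, Census, or Eurostat conversion tables, and this is not necessarily the authors' implementation.

```python
from collections import defaultdict

# Hypothetical crosswalk: source code -> list of (target code, share).
# Codes and shares are illustrative placeholders, not published factors.
CROSSWALK = {
    "HS_A": [("SCTG_1", 1.0)],                     # one-to-one mapping
    "HS_B": [("SCTG_1", 0.25), ("SCTG_2", 0.75)],  # one-to-many mapping
}


def allocate(records, crosswalk):
    """Convert (source_code, tons) records into target-code totals.

    Returns (totals, unmapped): totals maps target code -> tons, and
    unmapped lists the records whose source code is absent from the
    crosswalk so that they can be resolved manually.
    """
    totals = defaultdict(float)
    unmapped = []
    for code, tons in records:
        targets = crosswalk.get(code)
        if not targets:
            unmapped.append((code, tons))
            continue
        share_sum = sum(share for _, share in targets)
        for target, share in targets:
            # Normalise by the share sum so each record's tonnage is conserved.
            totals[target] += tons * share / share_sum
    return dict(totals), unmapped


records = [("HS_A", 100.0), ("HS_B", 40.0), ("HS_X", 5.0)]
totals, unmapped = allocate(records, CROSSWALK)
print(totals)
print(unmapped)
```

Normalising by the sum of shares is a design choice made here so that mapped tonnage is conserved even if a share table does not sum exactly to one; unmapped codes are returned rather than dropped, mirroring the manual-resolution step in the text.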
The master metadata spreadsheet is the basis of the catalog introduced in Section 4 and of the mapping of datasets to modeling stages in Section 5, where users can filter and prioritize sources based on their completeness, latency, and reliability needs.

Table 1. Unified metadata fields
Column attribute | Explanation
Region | Data coverage at specific region
Dataset Name | The official name of the data source (e.g., “QCEW”, “Freight Analysis Framework”).
Agency / Source | The organization responsible for producing and maintaining the data (e.g., U.S. Census Bureau, BTS, FHWA).
Spatial Coverage | The geographic granularity (e.g., “World”, “Country”, “State”, “Province”, “City”, “County”, “MSA”, “Census Tract”).
Temporal Coverage | The time coverage and frequency (e.g., “Annual”, “Quarterly”, “Every 5 years”, etc.).
Key Variables | Key fields or metrics provided (e.g., “Population, Employment by NAICS”, “Tonnage, Value, Modes”, “Link AADT, Truck%”).
Access Method | Where and how to obtain the data (e.g., “Census API (JSON/CSV)”, “FHWA FTP”, “USDA QuickStats API”).
Data Format | File or service formats available (e.g., “CSV”, “XLSX”, “Shapefile”, “GeoJSON”, “GeoData”, “PDF”, “TransCAD”, “XML”, etc.).
Cost / License | Any fees or licensing requirements (e.g., “Free”, “Registration required”, “Subscription”, “License”).

3. Data inventory and classifications
3.1 Core data categories
To catalog and assess the vast landscape of freight-related datasets systematically, we developed nine broad and mutually exclusive classes that are important to the analysis and modeling of freight flows: (1) Socioeconomic-Demographic Data, including population distributions, household income, employment data, and economic activity by industry (Kumar and Bisht 2020; Liu and Herold 2007; Sharath et al. 2016; Stávková et al. 2013; Stillwell et al. 
2013 ); (2) Commodity/Goods Data , providing details on commodity-specific volumes, tonnage, values, and classification structures (Abzalov 2016 ; Eisele et al. 2013 ; Voytov 2013 ); (3) Multi-Modal Network Data , describing the geometry, capacity, speed, and attributes of roads, railways, waterways, airways, and pipelines (Baggag et al. 2018 ; Paipuri et al. 2021 ; Pasquale et al. 2021 ; Saeedi et al. 2020 ; Wang et al. 2023 ); (4) Ports Data , detailing port capacities, throughput, infrastructure, and intermodal connectivity (De Langen and Sharypova 2013 ; Dwarakish 2020 ; Tian et al. 2009 ); (5) Trade Data , capturing international import-export transactions, commodity types, and partner-country flows (Wagner 2018 ; Wang and Yu 2013 ); (6) Flow Databases , offering origin-destination matrices and observed traffic volumes to calibrate and validate freight models (Güler 2014 ; Mahmoudabadi and Mahmoudabadi 2015 ; Rezzouqi et al. 2024 ); (7) Geographical Data , outlining administrative boundaries, metropolitan areas, Traffic Analysis Zones (TAZ), and related geographic references (Butenko et al. 2024 ; Pidgrushnyi et al. 2021 ; Xie et al. 2024 ); (8) Regulation/Code Data , including rules, standards, and restrictions affecting freight transport such as vehicle weight limits, hazardous materials regulations, and routing restrictions (Field 2001 ; Moreno and Tutrone 2000 ); and (9) Transportation Means Data , describing the characteristics and capacities of freight transport vehicles across modes (Abate 2014 ; Beuthe et al. 2001 ; Nocera et al. 2021 ; Sun et al. 2024 ; Tjandra et al. 2024 ). 3.2 Alternative Data Grouping Schemes Considering the complexity and heterogeneity of freight data, other groupings were developed to help both researchers and practitioners to identify appropriate datasets within a short period of time based on particular analytical or application requirements. 
Three other classification schemas were formulated: (1) Spatial Granularity : The data were categorized according to their spatial resolution and coverage, i.e., national (e.g., GDP of a country, commodity flows), regional (e.g. states, NUTS regions, Chinese provinces), sub-regional (county and metropolitan statistical areas), and micro (TAZ and city-block). This hierarchical structure of spaces helps the user to choose a dataset applicable in macro-level policy analysis, meso-level infrastructure planning, or micro-level operational modeling (Nocera and Gardoni, 2023 ); (2) Temporal Frequency : Understanding the essential nature of up-to-date data, we have grouped the datasets in terms of how frequently they are updated, and the range is decennial (e.g., Decennial Census), quinquennial (Commodity Flow Survey), annual (Regional Economic Accounts, Freight Analysis Framework), quarterly (Quarterly Census of Employment and Wages), monthly (port and custom trade statistics), and near real-time (AIS vessel tracking, NPMRDS highway performance data). This temporal grouping enables data integration plans and the determination of appropriate temporal inputs to enable dynamic modeling and monitoring situations (Lunetta et al. 2004 ; Wulder et al. 2021 ); and (3) Functional Role : We also categorized the datasets based on their functionalities in freight modeling. Generation data are socioeconomic-demographic and commodity data used to inform trip-end generation; Distribution data cover flow matrices, trade and port data, which are useful in OD assignment; Network data describe transport infrastructure and support mode choice and route assignment; and Validation data are observed flow databases, such as FAF, TRANSEARCH, and traffic counts (HPMS, NPMRDS), which are necessary to calibrate and validate the model (Mahajan et al. 2021 ; Safdar et al. 2024 ). 
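The three grouping schemes above amount to tagging each catalog entry along independent dimensions and filtering on any combination of them. The following is a minimal sketch under stated assumptions: the class `DatasetEntry`, the function `select`, and the tag vocabulary are hypothetical, and the tags assigned to the three example datasets are an illustrative reading of the text rather than values taken from the authors' spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog row tagged along the three alternative grouping schemes."""
    name: str
    spatial: str    # "national", "regional", "sub-regional", or "micro"
    frequency: str  # "decennial", "quinquennial", "annual", "quarterly", ...
    role: str       # "generation", "distribution", "network", or "validation"


# Illustrative tags only; the real catalog may classify these differently.
CATALOG = [
    DatasetEntry("QCEW", "sub-regional", "quarterly", "generation"),
    DatasetEntry("Commodity Flow Survey", "regional", "quinquennial", "generation"),
    DatasetEntry("Freight Analysis Framework", "regional", "annual", "validation"),
]


def select(catalog, **criteria):
    """Return names of entries whose fields equal every given criterion."""
    return [
        entry.name
        for entry in catalog
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


print(select(CATALOG, role="generation"))
print(select(CATALOG, spatial="regional", frequency="annual"))
```

Keeping the three schemes as separate fields, rather than one combined label, lets a user query by spatial scale, update frequency, and functional role independently or jointly.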
These alternative classification structures provide users with flexible, purpose-oriented ways to navigate the expansive data inventory and to quickly identify and prioritize datasets according to their modeling objectives.
3.3 Master Metadata Spreadsheet
To capture, track, and allow rapid querying of the large collection of datasets discovered, we developed a master metadata spreadsheet (Table 1). Each dataset is listed as a separate row, and standardized columns document its characteristics: Region (U.S., EU, China), Dataset Name, Agency/Source responsible for maintaining the dataset, Spatial Coverage (national, regional, county, or TAZ-level), Temporal Coverage and update frequency, Key Variables (e.g., tonnage, employment figures, network attributes), Access Method (direct download, API, web portal), Data Format (CSV, Shapefile, JSON, Excel), Cost and Licensing requirements (free, subscription, institution The spreadsheet, with its detailed and organized format, serves several important purposes: it presents a transparent, centralized reference point that lets researchers easily find appropriate datasets using clear criteria; it supports systematic evaluation and comparison between datasets and jurisdictions; and it provides the basis for future dataset updates, integration efforts, and sharing between researchers and modeling practitioners. Because dataset metadata are standardized and data characteristics are well documented, this inventory increases the ease of use, reproducibility, and comparability of data for global freight transportation modeling and analysis.
4. 
Data Categories & Source Catalog
The following subsections give an overview of the nine fundamental data categories, their application in the four-step model, and exemplary data sources in the United States, European Union, and China. Appendix A (Tables A-1 to A-9) contains full eight-column metadata tables.
4.1 Socioeconomic–Demographic
Trip Generation is based on socioeconomic and demographic data, which provide zone-level measures of freight generation (e.g., industrial employment, GDP) and attraction (e.g., population, household income) (Jahan and Zhou 2023; Ponte et al. 2020). Seven core sources are listed in Table A-1 in Appendix A, covering the United States, European Union, and China. In the U.S., the American Community Survey (ACS) offers five-year rolling estimates of population, income, and NAICS-2 employment at the tract to state levels; the Decennial Census offers a full population and housing count every ten years; the Quarterly Census of Employment and Wages (QCEW) delivers quarterly employment and establishment counts by 6-digit NAICS at the county/MSA level; the BEA Regional Economic Accounts report annual GDP and personal income; and National Transportation Statistics from the BTS provides an annual set of socioeconomic profiles at national to The Eurostat NUTS Regional Statistics provide annual population, NACE-based employment, and GDP for NUTS 0–3 units, whereas Chinese Regional Yearbooks provide population, GDP, and sectoral employment at the province and municipality level.
4.2 Commodity/Goods
Commodity and goods data serve two purposes: they provide production and consumption volumes for Trip Generation and observed flow values for Trip Distribution (Holguín and Thorson 2000; Kawasaki et al. 2023; Zhou et al. 2009). Appendix A, Table A-2 lists six major sources. 
In the United States, the Commodity Flow Survey (CFS) gives quinquennial state- and MSA-level flows by six-digit SCTG code (tons, value, distance band, mode); the Freight Analysis Framework (FAF) gives a complementary five-year snapshot of two-digit SCTG flows (tons, ton-miles, value, mode shares) with spatial products in Excel, CSV, and shapefile formats; the USDA-NASS Crop Production series provides annual county- and ZIP-level estimates of acreage, The COMEXT database in Europe provides monthly and annual import/export tonnages and values for all member countries in CN/HS codes, and China Customs Statistics provides similar national and provincial HS-coded trade figures. In combination, these sources allow the calibrated production of commodity-specific trip ends and origin-destination matrices that reflect actual patterns of freight movement.
4.3 Multi-Modal Network
Multi-modal network databases provide the spatial graph and skim attributes (link geometry, functional class, capacity, and speed) necessary for both mode-split analysis and network loading (Jetlund et al. 2019; Jiang et al. 2022a). In the United States, quinquennial highway and rail centerlines with functional class, lane counts, and centerline length are provided by the National Transportation Atlas Database (NTAD); AADT, truck percentages, and pavement data by the Federal Highway Administration HPMS at the state level; segment capacity and speed limits by the AAR Class I Rail GIS (subscription); channel geometries and lock locations by the NTAD Inland Waterways; real-time and historical vessel paths by the NOAA AIS vessel-tracking (MarineCadastre.gov); 202 The pan-continental infrastructure in Europe is covered by the EuroGlobalMap Transportation layers of EuroGeographics and the TEN-T GIS corridors. The Ministry of Transport Network GIS of China provides yearly schematics, speeds, and capacities of national highway, waterway, and rail links. 
These sources together form the foundation of the impedance functions and capacity constraints within Mode Choice models and the substrate for Route Assignment in U.S., EU, and Chinese freight networks (see Appendix A, Table A-3).
4.4 Ports
Data on port throughput and terminal infrastructure are critical to the allocation of international and intermodal freight flows (Trip Distribution) and to the evaluation of the attractiveness of maritime modes (Mode Choice) (Tongzon 2009; Xinchang 2010). Shen et al. (2020) visualized major seaports with their handling capacities and optimal freight flows in 2D in GIS and in 3D in Google Earth, together with the total and most important goods imported/exported via maritime networks. Figure 3 depicts the ports and their connectivity around the world. Appendix A, Table A-4 lists four major sources in the U.S., EU, and China. In the United States, the USACE Waterborne Commerce Statistics Center (WCSC) publishes annual tonnages by commodity group and domestic vs. foreign splits as CSV, PDF, and Shapefile files; S&P Global PIERS (Customs Import/Export) reports monthly and annual container TEUs, weights, HS codes, and partner country information (subscription required); BTS Port Performance Freight Statistics tracks monthly and annual dry-bulk tonnages and vessel calls for the top 150 U.S. ports; and MARAD National Port Plans reports berth lengths, maximum draft, The Eurostat Port Throughput dataset for Europe includes quarterly and annual TEUs, bulk tonnages, and vessel calls in major EU ports, and the Sea Ports compilation for China (sourced through UNCTAD, HDX/OSM, SeaRates, and WPI) includes annual container throughput and total tonnage.
4.5 Trade
Trade databases provide the cross-border flow statistics required to allocate freight volumes in Trip Distribution, recording the monetary and mass flows between origins and destinations (Gingerich et al. 2016; Kirby 1970). 
USA Trade Online (USTO) provides monthly and annual values and weights of imports and exports at the 6-digit HS level by port and trading partner (subscription portal with free summaries), and TradeStats Express (ITA) provides annual trade values by NAICS or commodity group at national, state, and metropolitan levels. UN Comtrade provides annual data on the value and quantity of trade between countries, broken down by HS code and freely available through an API (API key required); it can also be used as a cross-validation benchmark. The European Union relies on Eurostat External Trade (COMEXT) for monthly and annual HS-coded import/export tonnages and values. The General Administration of Customs in China makes national and provincial HS 6-digit trade statistics available on its web portal. In combination, these sources make it possible to calibrate and validate the OD matrices of international and intermodal freight distributions (see Appendix A, Table A-5).
4.6 Flow Databases
Flow databases deliver the observed origin-destination matrices and link-level performance measures required to calibrate Trip Distribution models and to verify Route Assignment results (Furlonge 2019; Wei et al. 2025). Shen (2017) examines U.S. trade flows at the global and local levels; important U.S. international trade patterns for top regions are shown in Fig. 4. In the United States, the Freight Analysis Framework (FAF 5.6.1) provides annual state-to-state and MSA-to-MSA OD tonnage, ton-miles, value, and mode shares for 2007–2022; TRANSEARCH (IHS Markit) supplies subscription-based annual county-to-county OD flows by commodity group and mode; and the FHWA NPMRDS provides monthly/daily travel times, speeds, measures of reliability, and implied truck volumes on highway segments. The Freight Transport Statistics provided by Eurostat give annual OD tonnage and ton-kilometer flows by mode within EU member states. 
The China Freight Transport Yearbook reports provincial OD flows by commodity annually. The combination of these datasets allows stringent calibration of distribution parameters and empirical verification of route assignments across scales and regions (see Appendix A, Table A-6).

Source: Bureau of Transportation Statistics - FAF 5.6.1

4.7 Geographical

Geographical datasets specify the analysis areas and support all modeling steps, including the creation of trip origins and destinations and the assignment of flows to networks, by supplying consistent definitions of boundaries and network nodes (Curtin 2007; Li 2006). Appendix A, Table A-7 lists five primary sources: U.S. Census TIGER/Line shapefiles (annual boundary files of states, counties, tracts, roads, and TAZs); BTS NTAD geofiles (continuous updates of multimodal network lines and intermodal points); MPO-provided Traffic Analysis Zone boundaries (variable, roughly five-year updates of local TAZ definitions); Eurostat GISCO (annual NUTS boundaries and European network geofiles); and China National Fundamental GIS (annual administrative boundaries for provinces, cities, and counties).

4.8 Regulation/Code

Regulatory and code databases define the legal and operational constraints on modal feasibility and routing choices (Baldacci et al. 2009; Browne et al. 2022; Dopilka and Balobanov 2024; Fialkoff et al. 2017). Appendix A, Table A-8 lists six primary sources in the United States, European Union, and China, covering weight and dimensional restrictions, hazmat classifications, carrier safety ratings, and infrastructure load ratings. The U.S.
has 49 CFR (Title 49 of the Code of Federal Regulations, accessed through eCFR) for national HAZMAT classes, weight/dimension limits, and routing requirements; the Federal Motor Carrier Safety Administration (FMCSA) Safety Measurement System (SMS) for monthly carrier-level safety and inspection scores; the PHMSA National Pipeline Mapping System (NPMS) for pipeline alignments and high-consequence area designations; and the National Bridge Inventory (NBI) for annual bridge load ratings and posted weight restrictions. In Europe, the ADR agreement on the international carriage of dangerous goods by road establishes continent-wide HAZMAT regulations, whereas in China the GB Road Vehicle Standards establish national size and weight requirements. Modelers use these datasets to constrain mode-choice utility functions and to impose link-level restrictions in the route assignment process.

4.9 Transportation Means

Databases of transportation means describe the fleet composition and equipment capacities that are essential to estimating modal costs and imposing capacity constraints in Mode Choice and Route Assignment (Jiang et al. 2022b; Xu and Chow 2022; Zhu et al. 2020). Table A-9 in Appendix A lists five important sources. In the United States, the BTS reports counts of trucks, railcars, vessels, and aircraft by class, with payload capacities, in the National Transportation Statistics (NTS); the USACE Waterborne Commerce Vessel Characteristics (WCUS) Parts 1–4 report vessel types, deadweight tonnages, and trip counts on U.S. waterways; and the STB Public Use Waybill Sample (PUWS) provides annual origin-destination flows between BEA areas, car counts, and tonnages for the national rail network. Eurostat's Transport Equipment Statistics provide annual EU fleet figures and payload capacities by mode, and the Transport Equipment tables in the Chinese Statistical Yearbook provide national truck fleets, railcars, vessel inventories, and aircraft counts.
These datasets keep mode choice models consistent with capacity constraints and ensure that route assignments distribute flows within actual equipment capacities. Spatial granularity and update frequency vary by region; for complete attribute details, refer to the individual tables in Appendix A. Fig. 5 shows each region's relative spatial granularity, update frequency, and metadata transparency for the nine data categories, guiding practitioners toward the regions and datasets with the strongest foundation for four-step freight modeling.

5. Mapping Databases to the Four-Step Framework

To make the four-step model practical, every data category (Section 4) is explicitly associated with the modeling stage(s) in which it is used. These mappings are summarized in Table 2, and the following subsections detail the role of each category, with reference to the full metadata in Appendix A (Tables A.1-A.9).

5.1 Trip Generation Inputs

Trip Generation needs zone-based measures of freight production and attraction. Socioeconomic-Demographic statistics (Section 4.1; see Appendix A, Table A.1) provide the population, income, employment, and GDP values associated with production and consumption. Commodity/Goods sources (Section 4.2; see Table A.2) give physical volumes in tons, values, and production statistics, which allow commodity-specific generation. Geographical boundaries (Section 4.7; Appendix A, Table A.7) define the analysis zones (counties, NUTS regions, provinces, TAZs) and give all generation inputs a common spatial framework for allocation.

5.2 Trip Distribution Inputs

Trip Distribution allocates the generated trip ends to origin-destination matrices. Commodity/Goods data (Table A.2) provide commodity-specific origin and destination totals. Ports (Section 4.4; Table A.4) and Trade datasets (Section 4.5; Table A.5) supply the international and intermodal flow details used in cross-border and gateway allocations.
Flow Databases (Section 4.6; Table A.6) provide observed OD matrices and link volumes that allow calibration and validation of the model. Multi-Modal Network attributes (Section 4.3; Table A.3) supply network impedance values (distances, speeds, connectivity) to a gravity or opportunity model. Geographical zone definitions (Table A.7) keep origin and destination indexing consistent and facilitate spatial aggregation and disaggregation.

5.3 Mode Choice Inputs

Mode Choice determines modal splits for each OD pair from cost and constraint information. The Multi-Modal Network layers (Section 4.3; Table A.3) provide travel distance, speed, and capacity for the highway, rail, water, air, and pipeline modes. Regulation/Code databases (Section 4.8; Table A.8) contain the weight/dimension limits, HAZMAT restrictions, and routing constraints that govern the feasible mode set. Transportation Means statistics (Section 4.9; Table A.9) give fleet composition and payload capacities (vessel deadweight, railcar and truck capacities, aircraft types), which serve as inputs to the generalized cost and utility functions of discrete choice formulations.

5.4 Route Assignment Inputs

Route Assignment allocates mode-specific OD flows to the links of a network subject to capacity and regulatory constraints. Multi-Modal Network geometries (Table A.3) specify the link graph used by the assignment algorithm. Flow Databases (Table A.6) provide empirical link-level observations (e.g., AADT, truck volumes, travel times) for validation. Regulation/Code data (Table A.8) identify prohibited links (weight-restricted roads, HAZMAT bans), and Transportation Means capacities (Table A.9) set the maximum throughput by mode, so that the assignment does not violate physical or legal constraints.
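As a minimal sketch of how the Regulation/Code constraints described in Section 5.4 could be applied before assignment, the Python fragment below drops links that a shipment may not legally use and then searches for a least-cost path on what remains. The link attributes (`max_weight_t`, `hazmat_ok`), the four-link toy network, and the function names are hypothetical illustrations rather than fields of any dataset in Appendix A, and a plain Dijkstra search stands in for whatever assignment algorithm a model actually uses.

```python
import heapq


def feasible_links(links, gross_weight_t, is_hazmat):
    """Keep only links the shipment may use (weight limit and HAZMAT ban)."""
    return [
        link for link in links
        if gross_weight_t <= link["max_weight_t"]
        and (link["hazmat_ok"] or not is_hazmat)
    ]


def shortest_path(links, origin, destination):
    """Dijkstra on directed links; returns (cost, node list) or None."""
    graph = {}
    for link in links:
        graph.setdefault(link["from"], []).append((link["to"], link["cost"]))
    best = {origin: 0.0}
    queue = [(0.0, origin, [origin])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None  # no legally feasible route


# Hypothetical toy network: A->B crosses a bridge posted at 20 t,
# C->D is closed to hazardous goods.
links = [
    {"from": "A", "to": "B", "cost": 10.0, "max_weight_t": 20.0, "hazmat_ok": True},
    {"from": "B", "to": "D", "cost": 10.0, "max_weight_t": 40.0, "hazmat_ok": True},
    {"from": "A", "to": "C", "cost": 15.0, "max_weight_t": 40.0, "hazmat_ok": True},
    {"from": "C", "to": "D", "cost": 15.0, "max_weight_t": 40.0, "hazmat_ok": False},
]

light = shortest_path(feasible_links(links, 15.0, False), "A", "D")
heavy = shortest_path(feasible_links(links, 36.0, False), "A", "D")
heavy_hazmat = shortest_path(feasible_links(links, 36.0, True), "A", "D")
```

In this toy case the light truck takes the short route over the posted bridge, the heavy truck is diverted to the longer route, and the heavy HAZMAT shipment has no feasible route at all, which is the kind of outcome a modeler would want flagged rather than silently assigned.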
Table 2 provides a brief summary of how each data category contributes to the four modelling stages: a ‘✔’ marks a key input to that step and a ‘-’ marks no direct role, allowing the reader to quickly determine which datasets to use when setting up Trip Generation, Trip Distribution, Mode Choice, and Route Assignment.

Table 2 Summary Matrix

Data Category               Trip Generation   Trip Distribution   Mode Choice   Route Assignment
Socioeconomic–Demographic   ✔                 -                   -             -
Commodity/Goods             ✔                 ✔                   -             -
Multi-Modal Network         -                 ✔                   ✔             ✔
Ports                       -                 ✔                   -             -
Trade                       -                 ✔                   -             -
Flow Databases              -                 ✔                   -             ✔
Geographical                ✔                 ✔                   ✔             ✔
Regulation/Code             -                 -                   ✔             ✔
Transportation Means        -                 -                   ✔             ✔

5.5 Narrative on Overlaps

Some data categories are used in two or more modeling stages. For example, the commodity/goods datasets (4.2) support both Trip Generation and Trip Distribution, since they yield zone-level tonnages and flow patterns. Geographical boundaries (4.7) are universal: they provide the spatial frame for every step from trip-end estimation to assignment. Multi-Modal Network data (4.3) feed Trip Distribution with impedance data and Mode Choice with cost data, and form the assignment graph. Flow Databases (4.6) validate Trip Distribution and Route Assignment, which completes the calibration loop. Regulation/Code (4.8) and Transportation Means (4.9) jointly constrain Mode Choice and Route Assignment, yielding legally feasible and capacity-constrained routing. Awareness of these overlaps makes updates efficient: a new flow dataset only has to be propagated to the stages where it is relevant, and any upgrade to the network or regulatory data automatically benefits every stage that uses it.

6. Integration algorithms

Successful multimodal freight-flow modeling requires not only an appropriate choice of datasets but also a meaningful and reproducible method for integrating them. The guidelines and best practices below address common issues of spatial, temporal, and categorical harmonization and provide approaches for disaggregation and aggregation.
6.1 Spatial harmonization

All geographic layers (Section 4.7) must share the same projection and zone definitions before analysis. First, record the native CRS of each dataset in the metadata (e.g., EPSG:4269, 4326, 3857). Second, reproject all layers to one model CRS (e.g., NAD83 / Conus Albers, EPSG:5070). Third, repair topology errors (self-intersections, gaps) through zero-width buffering. Finally, calculate the centroid of each zone (keyed by GEOID, NUTS_ID, or province code) and snap it to the nearest multimodal network node within a small tolerance (≤ 10 m), so that the zones are connected to the network used to build skim matrices.

Algorithm 1: Spatial Harmonization
Inputs:
  - layers ← list of (name, path, native_crs)
  - network_nodes (shapefile)
  - model_crs ← EPSG:5070
Outputs:
  - harmonized_layers
  - snapped_zone_centroids

for each layer in layers do
    data ← ReadShapefile(layer.path)
    AssignCRS(data, layer.native_crs)
    data ← Reproject(data, model_crs)
    data.geometry ← RepairGeometry(data.geometry)
    WriteShapefile(data, layer.name + "_5070.shp")
end for
zones ← ReadAllZoneLayers().to_crs(model_crs)
centroids ← ComputeCentroids(zones)
nodes ← ReadShapefile(network_nodes).to_crs(model_crs)
for each c in centroids do
    n ← FindNearest(nodes, c)
    if Distance(c, n) ≤ 10 m then c ← n
end for
WriteShapefile(centroids, "snapped_centroids.shp")

6.2 Temporal Harmonization

Datasets (Sections 4.1–4.6) must be aligned to a common reference year (e.g., 2022). Annual series require no resampling; quarterly data (e.g., QCEW) are aggregated to annual totals; five-year surveys (e.g., CFS) are held constant, or interpolated only if the interpolation is documented. Record publication lags (e.g., CFS 2022 → mid-2024) and fill single-year gaps via linear interpolation, clearly annotating all assumptions.
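The temporal alignment rules above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the dictionary layouts, and the sample values are hypothetical, the publication lag is measured from the end of the reference year, and June is used only as a stand-in for the "mid-2024" release mentioned in the text.

```python
def annualize_quarterly(quarterly, base_year):
    """Sum the four quarters of base_year; a missing quarter raises KeyError."""
    return sum(quarterly[(base_year, q)] for q in (1, 2, 3, 4))


def fill_single_year_gap(annual, year):
    """Return (value, flag); interpolate linearly only across a one-year gap."""
    if year in annual:
        return annual[year], "reported"
    if (year - 1) in annual and (year + 1) in annual:
        midpoint = (annual[year - 1] + annual[year + 1]) / 2.0
        return midpoint, "interpolated"  # keep the flag so the assumption is visible
    raise KeyError(f"gap around {year} is wider than one year")


def publication_lag_months(reference_year, release_year, release_month):
    """Months between the end of the reference year and the release month."""
    return (release_year - reference_year - 1) * 12 + release_month


# Hypothetical sample values, for illustration only.
quarterly_series = {(2022, 1): 10.0, (2022, 2): 12.0, (2022, 3): 11.0, (2022, 4): 13.0}
annual_series = {2021: 100.0, 2023: 120.0}

annual_total = annualize_quarterly(quarterly_series, 2022)
value_2022, flag_2022 = fill_single_year_gap(annual_series, 2022)
lag = publication_lag_months(2022, 2024, 6)  # assumed June release
```

Returning a flag alongside each value is one simple way to keep the "clearly annotating all assumptions" requirement attached to the data rather than to a separate note.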
Algorithm 2: Temporal Harmonization
Inputs:
  - datasets ← list of (frequency, values, reference_year, release_date)
  - base_year ← 2022
Outputs:
  - aligned_series
  - lag_table

for each ds in datasets do
    switch ds.frequency:
        case "quarterly": ds.annual ← Sum(ds.values[quarters in base_year])
        case "5-year":    ds.annual ← ds.values[base_year]
        case "annual":    ds.annual ← ds.values[base_year]
    end switch
    ds.lag_months ← MonthsBetween(ds.reference_year, ds.release_date)
    if Missing(ds.values[base_year]) then
        ds.annual ← LinearInterpolate(ds.values[base_year − 1], ds.values[base_year + 1])
    end if
end for

Figure 6 plots each dataset's update cadence by region across the nine data categories. The figure underscores regional differences in data timeliness and supports targeted harmonization strategies.

6.3 Commodity Code Crosswalks

To harmonize trade and production statistics, link HS-coded data to SCTG (and optionally to NAICS) through the official crosswalk tables (BTS, Eurostat, and China Customs). Split one-to-many mappings proportionally using share fields, convert agricultural units (bushels to tons), and log unmapped codes for manual review.

Algorithm 3: Commodity Code Crosswalk
Inputs:
  - trade_data(hs_code, commodity, qty, value)
  - cw_hs_sctg(hs_code, sctg_code, share)
  - ag_factors(commodity, lb_per_bushel)
Outputs:
  - sctg_flows(sctg_code, tonnes, value)

merged ← Join(trade_data, cw_hs_sctg on hs_code)
for each rec in merged do
    rec.tonnes ← rec.qty * rec.share
    rec.value_sctg ← rec.value * rec.share
    if rec.commodity in ag_factors then
        factor ← ag_factors[rec.commodity] / 2000      // short tons per bushel (2000 lb per short ton)
        rec.tonnes ← rec.qty * factor * rec.share      // keep the crosswalk share after unit conversion
    end if
end for
sctg_flows ← Aggregate(merged by sctg_code sum(tonnes, value_sctg))

6.4 Zone-to-Zone Disaggregation & Aggregation

Where flows are reported in coarse units, downscale them using proxy shares (employment for manufactured goods, production for crops), or aggregate sub-zone values to model zones.
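The proxy-share downscaling described in the preceding sentence can be sketched as follows. The function name, the tuple layout, and the state and county labels are hypothetical, and the employment figures are made-up illustration values; the only property the sketch is meant to show is that each parent total is split in proportion to the proxy and is fully preserved.

```python
from collections import defaultdict


def disaggregate(parent_totals, proxy):
    """Split each parent-zone total across child zones in proportion to a proxy.

    parent_totals: {parent_zone: tons}
    proxy: list of (zone, parent_zone, proxy_value), e.g. employment by county
    """
    proxy_sum = defaultdict(float)
    for _zone, parent, value in proxy:
        proxy_sum[parent] += value

    allocated = {}
    for zone, parent, value in proxy:
        if proxy_sum[parent] == 0:
            raise ValueError(f"no proxy mass in parent zone {parent!r}")
        allocated[zone] = parent_totals[parent] * value / proxy_sum[parent]
    return allocated


# Hypothetical example: one state total split over two counties by employment.
parent_totals = {"state_X": 1000.0}
proxy = [("county_1", "state_X", 300.0), ("county_2", "state_X", 100.0)]

county_tons = disaggregate(parent_totals, proxy)
```

Raising an error when a parent zone has no proxy mass, rather than dividing by zero or silently dropping the flow, keeps such cases visible so they can be routed to a fallback proxy.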
Parent zones are mapped to target zones using a lookup table, and marginal totals are balanced using iterative proportional fitting (IPF) where required.

Algorithm 4: Zone-to-Zone Disaggregation & Aggregation
Inputs:
  - parent_flows(parent_zone, total_tons)
  - proxy(zone, parent_zone, proxy_value)
  - lookup(zone → model_zone)
Outputs:
  - model_zone_flows(model_zone, allocated_tons)

for each p in proxy do
    p.share ← p.proxy_value / Sum(proxy.proxy_value where parent_zone = p.parent_zone)
end for
alloc ← Join(parent_flows, proxy on parent_zone)
for each a in alloc do
    a.allocation ← a.total_tons * a.share
    a.model_zone ← lookup[a.zone]
end for
model_zone_flows ← Aggregate(alloc by model_zone sum(allocation))

6.5 Handling Missing or Sparse Data

Systematically impute suppressed or missing values by allocating parent-zone totals through proxy distributions, and flag the imputed records. For completely missing zones, use regional averages or fallback proxies, and record all rules in an imputation log.

Algorithm 5: Handling Missing or Sparse Data
Inputs:
  - dataset(zone, parent_zone, value)
  - proxy(zone, proxy_value)
  - parent_totals(parent_zone, total)
  - threshold
Outputs:
  - dataset_imputed(zone, value, source_flag)

for each rec in dataset do
    if rec.value is missing OR rec.value < threshold then
        siblings ← dataset where parent_zone = rec.parent_zone
        share ← proxy[rec.zone] / Sum(proxy[siblings.zone])
        rec.value ← parent_totals[rec.parent_zone] * share
        rec.source_flag ← "imputed"
    else
        rec.source_flag ← "reported"
    end if
end for

Using these guidelines and algorithms, which build on the metadata of Section 4, modelers can build, validate, and update four-step freight-flow models in a transparent and reproducible manner.

7. Data Sharing, Accessibility & Reproducibility

Data sharing, ease of access, and reproducibility are essential to freight transportation modeling, analysis, and effective policy formulation.
Although large volumes of data are available in the United States, European Union, and China, their practical use is frequently hindered by accessibility problems, restrictive licensing, poor documentation, and incompatible data formats. This section addresses these concerns systematically and highlights the major barriers and opportunities that affect data reuse by researchers, policymakers, and practitioners.

7.1 Access mechanism

The three most common ways of accessing freight-related data are direct downloads from official websites or FTP servers, Application Programming Interfaces (APIs), and data subscriptions. The U.S. has well-developed open-access systems (e.g., Census APIs, FHWA portals, the BTS National Transportation Atlas Database (NTAD)); however, certain key sources such as TRANSEARCH or detailed PIERS customs data are subscription-only, which restricts their usability. Similarly, the EU's Eurostat COMEXT and GISCO portals are largely open-access, while a limited number of more detailed trade and multimodal datasets can be accessed only after registration or with institutional licenses. In China, the National Bureau of Statistics (NBS) and the General Administration of Customs provide data via publicly available portals, but these data are frequently published in formats that are not machine-readable (e.g., PDFs), which limits efficient data integration and reproducibility. Inconsistent and restrictive access procedures therefore pose major challenges for large-scale comparative analysis and modeling across international boundaries.

7.2 Metadata standards

Effective data reuse requires standardized, complete, and clear metadata that explain the structure of datasets, their methodologies, limitations, and classification schemes.
The quality of metadata currently differs significantly: U.S. federal datasets (e.g., ACS, FAF, USDA QuickStats) usually include detailed machine-readable documentation, definitions, and user guides, whereas many EU and Chinese datasets (e.g., China Customs, MOT) offer little metadata, and only in human-readable formats (PDF, HTML) that need substantial manual interpretation. This absence of consistently structured, machine-readable metadata makes it much harder to automate the integration, cross-referencing, and reproducible use of datasets. Improved metadata standardization (via international guidelines or widely used schemas, e.g., ISO 19115 for geospatial datasets or the DataCite schema for general datasets) would go a long way towards making data easier to discover and integrate.

7.3 Version control

Accurate freight modeling needs well-documented dataset versions, update frequencies, and provenance. Update frequencies, however, vary widely: some sources are updated in real time (AIS vessel tracking, NPMRDS traffic data), while others are updated annually or quinquennially (CFS, FAF), which can cause mismatches in integration workflows. In addition, datasets often lack clear version histories or change logs, which makes it hard to rerun the same analyses at different times. There is also a gap between data collection and release (e.g., the CFS 2022 is released in the middle of 2024), which requires special treatment when such data are combined with more up-to-date sources. Standardized version control (clear versioning, regular releases, clear change logs, and Digital Object Identifiers (DOIs)) would increase transparency, enable reproducibility, and make reliable time-series analysis possible.
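Where a source publishes no version history, a modeler can still keep a local provenance record. The sketch below is one hedged way to do that with a content hash; the record fields, the dataset name `example_flows`, the version labels, the dates, and the sample bytes are all hypothetical and do not correspond to any agency's release scheme.

```python
import hashlib


def manifest_entry(dataset, version, retrieved_on, content):
    """Describe one downloaded file so a later run can tell if it changed."""
    return {
        "dataset": dataset,
        "version": version,
        "retrieved_on": retrieved_on,
        "n_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def content_changed(previous, current):
    """True when the two downloads differ, even if the version label did not."""
    return previous["sha256"] != current["sha256"]


# Hypothetical downloads of the same nominal release at two dates.
first = manifest_entry("example_flows", "v1", "2025-01-15", b"zone,tons\n1,100\n")
second = manifest_entry("example_flows", "v1", "2025-06-01", b"zone,tons\n1,105\n")
silent_revision = content_changed(first, second)
```

Storing such entries next to the model inputs makes silent revisions detectable, which is the practical problem the missing change logs create.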
7.4 Barrier to reuse

The practical reuse of freight-related data is often complicated by long-standing operational impediments: broken or stale API endpoints and data connections, poorly documented fields and variables, inconsistent data schemas, and format changes made with insufficient warning. Such problems force researchers and practitioners to spend a lot of time cleaning, transforming, and reconciling data (Wang et al. 2015). Schema drift, the gradual and unannounced change of data structures or variable definitions, is a particular problem (González et al. 2024). Such discrepancies greatly reduce efficiency and make it hard to reproduce previously published research findings.

7.5 Public availability of the Master Metadata Spreadsheet

To increase the reproducibility and generalizability of this research, the master metadata spreadsheet, which contains standardized information on more than 50 freight-related datasets in the U.S., EU, and China, is publicly available. The resource is deposited on GitHub, which is an open-access repository that grants persistent identifiers and version control to allow long-term access. The spreadsheet can be found at: https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China . Users can download it in .xlsx format. Subsequent releases, e.g., with new datasets or harmonization workflows, will be versioned and announced through the repository. This strategy addresses the usual obstacles to data reuse, including inconsistent access, and complies with open-science best practices in transportation research.

These challenges need to be handled holistically, and a concerted effort is therefore needed. Agencies are encouraged to adopt standardized, open data formats (e.g., CSV, JSON, GeoJSON), to keep API endpoints stable and backward compatible, and to provide well-documented machine-readable metadata.
Moreover, datasets should be hosted on federated repositories or portals with stable interfaces, well-defined data schemas, and well-documented, transparent licensing conditions. These steps will facilitate greater reuse, improve reproducibility, and greatly reduce the obstacles to robust, cross-jurisdiction freight modeling.

8. Data gaps and future needs

Although freight data inventories are comprehensive in the United States, the European Union, and China, significant gaps continue to limit the potential for truly comprehensive, cross-regional analysis. Spatially, numerous sub-national and cross-border corridors are unevenly covered: Traffic Analysis Zone definitions exist in only a few U.S. metros, the NUTS hierarchy in Europe does not include comparable urban zones, and flow data in China at the prefecture level are inconsistently reported, leaving essential sub-provincial movements obscure. Temporally, infrequent major surveys (e.g., the five-year Commodity Flow Survey and Freight Analysis Framework) and publication lags of 12–18 months make it difficult to respond to changing market conditions in a timely manner, and high-frequency series cannot provide the sectoral breakdown that is essential to detailed modeling. Modal and commodity taxonomies are still coarse: inland waterway and coastal barge traffic is under-represented in non-dedicated AIS streams, intermodal terminal activity and last-mile urban freight are largely unobserved, and even two-digit SCTG or broad NACE categories conceal intra-category heterogeneity that is critical to accurate distribution and mode-choice analysis. Regulatory and policy restrictions are scattered across various repositories (weight limits, hazardous-goods regulations, and carrier safety statistics are all managed by different jurisdictions and in different formats), leading to a heterogeneous legal environment that is not standardized and has no machine-readable interface.
Future data initiatives must adopt a multi-faceted approach to address these weaknesses. A system of biennial, sub-national commodity-flow surveys at the county or prefecture level would provide the fine-scale temporal and geographic resolution that quinquennial reports lack. Real-time link-level freight volumes and speeds, drawn from IoT sensors, telematics, and mobile-network data, could provide continuous, low-latency information for operational decisions. Extending commodity taxonomies to finer coding levels (e.g., HS 8-digit or SCTG 7-digit) with open-source crosswalks would capture important product differences and support more refined distribution and mode-choice modeling. A centralized, version-controlled global API of multimodal regulatory restrictions, including weight/dimension restrictions, HAZMAT categories, and emission regulations, would standardize legal requirements across borders and modes. Finally, standardized datasets of international freight corridors, with common zone definitions and intermodal terminal inventories, would simplify the modeling of cross-jurisdictional supply chains and assist integrated infrastructure planning.

Figure 7 overlays the five key gap dimensions for the U.S. (blue), EU (green), and China (red). It shows that the U.S. has high coverage in modal and commodity detail, spatial coverage, and temporal coverage but moderate gaps in regulatory and policy data; the EU has mid-range coverage but severe gaps in policy-data coverage; and China has the lowest coverage in commodity and temporal granularity but better coverage in regulatory mapping. This comparative radar plot concisely identifies the priority areas for new data initiatives in each region.

9.
Conclusion

The paper has developed a harmonized, data-driven methodology for multimodal freight-flow modeling, covering Trip Generation, Trip Distribution, Mode Choice, and Route Assignment, based on a detailed, cross-regional inventory of nine fundamental data types. By systematically identifying and evaluating socioeconomic-demographic, commodity, network, port, trade, flow, geographical, regulatory, and fleet datasets across the United States, European Union, and China, we have built a transparent master metadata spreadsheet and given clear guidance on harmonization, crosswalking, spatial allocation, and imputation workflows. The master metadata spreadsheet is publicly available via GitHub at https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China , ensuring transparency and enabling broader reuse by the research community. The four-step alignment (Section 5) states clearly which data drive each model step, and the algorithmic recipes (Section 6) give reproducible methods for processing heterogeneous inputs. The resulting mapping of data to decisions gives practitioners an effective guide for compiling, testing, and updating freight models in a variety of settings without the complications of scattered or misplaced data and undocumented transformations.

The main contributions of this work are a global-regional metadata catalog, which standardizes more than 50 datasets across three large economies into common fields of spatial/temporal coverage, variables, access methods, formats, and cost, and a refined assessment of completeness, timeliness, and accuracy. The clear correspondence between the data categories and the four stages of the modeling process reduces methodological uncertainty and helps avoid misallocation of data, so that each dataset is used where its informational value is greatest.
The best-practice workflows we have developed for common integration challenges, namely reprojection and topology repair (Algorithm 1), temporal alignment and lag documentation (Algorithm 2), commodity-code crosswalking (Algorithm 3), zone-to-zone allocation (Algorithm 4), and principled imputation of missing data (Algorithm 5), can be implemented in open-source GIS and statistical environments and provide a reproducible blueprint.

Although progress has been made, major challenges and data gaps remain, which indicates that further effort should go into improving data collection, documentation, and distribution practices. Future research must focus on creating universally applicable international metadata standards, improved machine-readable documentation, consistent versioning, and stable API-based data distribution. Further work is also encouraged on the most severe gaps in spatial granularity, temporal frequency, and modal and commodity specificity, and on the challenges of integrating regulatory and policy datasets. Global data interoperability and analytical robustness would improve further through open, federated freight-data portals and international cooperation to harmonise classification schemes.

Finally, by rigorously documenting available datasets, explicitly defining integration and accessibility issues, and outlining practical directions towards a less fragmented and more transparent global data infrastructure, the paper contributes to the joint effort of producing reproducible, comparable, and policy-relevant freight transportation models. A continued commitment to open data, solid documentation, and cross-border collaboration is key to more sustainable, efficient, and resilient international freight systems.

Declarations

Author Contribution

Conceptualization, Writing - Original draft: M. Haroon, G.
Shen; Data curation, formal analysis, methodology, and software: M. Haroon; Validation, supervision: G. Shen; Writing - review & editing: G. Shen, J. Zhao, P. Zhu.

Data Availability

The master metadata spreadsheet supporting this study is openly available on GitHub at https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China . All other datasets analyzed are publicly available from the sources cited in the manuscript.

References

Abate, M.: Determinants of capacity utilisation in road freight transportation. J. Transp. Econ. Policy (JTEP). 48, 137–152 (2014)
Abzalov, M.: Methodology of the mineral resource classification. Appl. Min. Geol., 355–363 (2016)
Affum, J., Brown, A., Chan, Y.: Integrating air pollution modelling with scenario testing in road transport planning: The TRAEMS approach. Sci. Total Environ. 312, 1–14 (2003)
Albert, A., Schaefer, A.: Demand for freight transportation in the US: A high-level view (2013)
Anderson, M.D., Harris, G.A., Harrison, K.: Using aggregated federal data to model freight in a medium-sized community. Transp. Res. Rec. 2174, 39–43 (2010)
Baggag, A., Abbar, S., Zanouda, T., Srivastava, J.: Resilience analytics: Coverage and robustness in multi-modal transportation networks. EPJ Data Sci. 7 (2018)
Baldacci, R., Toth, P., Vigo, D.: Exact algorithms for routing problems under vehicle capacity constraints. Ann. Oper. Res. 175, 213–245 (2009). https://doi.org/10.1007/s10479-009-0650-0
Beuthe, M., Jourquin, B., Geerts, J.-F., Ndjang, C.K.: Freight transportation demand elasticities: A geographic multimodal transportation network analysis. Transp. Res. E. 37, 253–266 (2001)
Browne, T., Tran, T.T., Veitch, B., Smith, D., Khan, F., Taylor, R.: A method for evaluating operational implications of regulatory constraints on Arctic shipping. Mar. Policy.
135 (2022). https://doi.org/10.1016/j.marpol.2021.104839
Butenko, Y., Tarnopolskyi, Y., Saliuta, V., Melnyk, D., Olishevskyi, V.: Application of GIS in Establishing (Changing) the Boundaries of Administrative-Territorial Units on the Example of Rural Settlements, 1–5 (2024)
Curtin, K.M.: Network Analysis in Geographic Information Science: Review, Assessment, and Projections. Cartography Geographic Inform. Sci. 34, 103–111 (2007). https://doi.org/10.1559/152304007781002163
De Langen, P.W., Sharypova, K.: Intermodal connectivity as a port performance indicator. Res. Transp. Bus. Manage. 8, 97–102 (2013)
Donnelly, W.A.: International and domestic product classifications. USITC Office of Economics Working Paper, 99–03 (1999)
Dopilka, V.O., Balobanov, O.O.: Peculiarities of legal regulation of transportation of dangerous goods. Uzhhorod Natl. Univ. Herald Series: Law. 4, 202–208 (2024). https://doi.org/10.24144/2307-3322.2024.85.4.29
Dwarakish, G.: Measuring port performance and productivity. ISH J. Hydraulic Eng. 26 (2020)
Eisele, W.L., Schrank, D.L., Bittner, J., Larson, G.: Incorporating Urban-Area Truck Freight Value into the Urban Mobility Report. Transp. Res. Rec. 2378, 54–64 (2013)
Elghazaly, G., Frank, R., Harvey, S., Safko, S.: High-definition maps: Comprehensive survey, challenges, and future perspectives. IEEE Open. J. Intell. Transp. Syst. 4, 527–550 (2023)
Fialkoff, M.R., Omitaomu, O.A., Peterson, S.K., Tuttle, M.A.: Using geographic information science to evaluate legal restrictions on freight transportation routing in disruptive scenarios. International J. Crit. Infrastructure Prot. 17, 60–74 (2017). https://doi.org/10.1016/j.ijcip.2016.12.001
Field, M.A.: Certification of performance-based standards for truck size and weight limits: Implementation considerations and enforcement issues. Transp. Res. Rec. 1763, 73–79 (2001)
Furlonge, R.J.: A Probability Bias Model of Trip Distribution.
International Conference on Transportation and Development, 258–268 (2019). https://doi.org/10.1061/9780784482575.025 Gingerich, K., Maoh, H., Anderson, W.: Characterization of International Origin–Destination Truck Movements Across Two Major U.S.–Canadian Border Crossings. Transp. Res. Record: J. Transp. Res. Board. 2547 , 1–10 (2016). https://doi.org/10.3141/2547-01 González, A., Bradford, M., Chis, A.E., González–Vélez, H.: Standardised Versioning of Datasets: A FAIR–compliant Proposal. Sci. Data. 11 (2024). https://doi.org/10.1038/s41597-024-03153-y Gorman, M.F., Clarke, J.-P., de Koster, R., Hewitt, M., Roy, D., Zhang, M.: Emerging practices and research issues for big data analytics in freight transportation. Maritime Econ. Logistics. 25 , 28–60 (2023) Güler, H.: An empirical modelling framework for forecasting freight transportation. Transport. 29 , 185–194 (2014) Guo, F., Aultman, L.: A zone design methodology for national freight origin–destination data and transportation modeling. Transp. Plann. Technol. 37 , 738–756 (2014) He, Z., Navneet, K., van Dam, W., Van Mieghem, P.: Robustness assessment of multimodal freight transport networks. Reliab. Eng. Syst. Saf. 207 , (2021) Holguín, J., Thorson, E.: Trip Length Distributions in Commodity-Based and Trip-Based Freight Demand Modeling: Investigation of Relationships. Transp. Res. Record: J. Transp. Res. Board. 1707 , 37–48 (2000). https://doi.org/10.3141/1707-05 Illemann, T., Karam, A., Hegner Reinau, K.: Towards sharing data of private freight companies with public policy makers: A proposed framework for identifying uses of the shared data. 132–136 (2021) Jahan, N., Zhou, Y.: Covid-19 and digital inclusion: Impact on employment. J. Digit. Econ. 2 , 190–203 (2023). https://doi.org/10.1016/j.jdec.2024.01.003 Jetlund, K., Onstein, E., Huang, L.: Information Exchange between GIS and Geospatial ITS Databases Based on a Generic Model. ISPRS Int. J. Geo-Information. 8 (2019). 
https://doi.org/10.3390/ijgi8030141 Jiang, X., Shan, X., Du, M.: Modeling Network Capacity for Urban Multimodal Transportation Applications. J. Adv. Transp. 1–22 (2022). https://doi.org/10.1155/2022/6034369 Kawasaki, T., Namba, Y., Oka, H., Dulebenets, M.A.: Freight trip distribution using spatiotemporal aggregate data: A modified collective flow diffusion model-based approach. Transp. Res. Interdisciplinary Perspect. 21 (2023). https://doi.org/10.1016/j.trip.2023.100904 Kessler, R.: Whitepaper: Practical challenges for researchers in data sharing. Learn. Publish. 31 , (2018) Kirby, R.F.: A Preferencing Model for Trip Distribution. Transport. Sci. 4 , 1–35 (1970). https://doi.org/10.1287/trsc.4.1.1 Kumar, D., Bisht, N.: Does employment status determine household consumption pattern in India: An analysis through dependency approach. Indian J. Econ. Dev. 16 , 547–558 (2020) Li, Y.: Creating street network datasets in geographical coordinates by using map images and digital map data. Proceedings of the Japan Society of Civil Engineers D 62, 121–130 (2006). https://doi.org/10.2208/jscejd.62.121 Liao, R., Liu, W., Yuan, Y.: Resilience Improvement and Risk Management of Multimodal Transport Logistics in the Post–COVID-19 Era: The Case of TIR-Based Sea–Road Multimodal Transport Logistics. Sustainability 15 , (2023) Liu, X., Herold, M.: Population estimation and interpolation using remote sensing. Urban Remote Sens., 269–290 (2007) Lunetta, R., Johnson, D.M., Lyon, J.G., Crotwell, J.: Impacts of imagery temporal frequency on land-cover change detection monitoring. 89 , (2004) Mahajan, V., Kuehnel, N., Intzevidou, A., Cantelmo, G., Moeckel, R., Antoniou, C.: Data to the people: A review of public and proprietary data for transport models. Transp. Reviews. 42 , 415–440 (2021). https://doi.org/10.1080/01441647.2021.1977414 Mahmoudabadi, A., Mahmoudabadi, M.: Developing a two-stage procedure for Estimating Origin-Destination matrix based on routes and traffic volumes.
1–6 (2015) Moreno, J.O., Tutrone, J.D.: Hazardous Materials Transportation, pp. 1–29. Kirk-Othmer Encyclopedia of Chemical Technology (2000) Moschovou, T., Vlahogianni, E., Rentziou, A.: Challenges for data sharing in freight transport. Adv. Transp. Stud. 48 , 141–152 (2019) Nenavath, S.: Does transportation infrastructure impact economic growth in India? J. Facilities Manage. 21 , 1–15 (2023) Nocera, F., Gardoni, P.: Digital Twins or Equivalent Infrastructure Models? The Role of Modeling Granularity in Regional Risk Analysis of Infrastructure. 3–7 (2023) Nocera, S., Pungillo, G., Bruzzone, F.: How to evaluate and plan the freight-passengers first-last mile. Transp. Policy. 113 , 56–66 (2021) Osman, R., Qutayan, S.M.S.B.: Overcoming Data Fabrication in Scientific Research. J. Sci. Technol. Innov. Policy. 9 , 26–31 (2023) Paipuri, M., Barmpounakis, E., Geroliminis, N., Leclercq, L.: Empirical observations of multi-modal network-level models: Insights from the pNEUMA experiment. Transp. Res. Part. C: Emerg. Technol. 131 , (2021) Pasquale, C., Siri, E., Sacone, S., Siri, S.: A modeling framework for passengers and freight in large-scale multi-modal transport networks. 29th Mediterranean Conference on Control and Automation (MED), 681–686 (2021) Pidgrushnyi, G., Marushchynets, A., Ishchenko, Y.: Kyiv metropolitan area: The problems of formation, composition and boundaries. Український Географічний Журнал. 4 , 47–56 (2021) Ponte, B., Puche, J., Rosillo, R., de la Fuente, D.: The effects of quantity discounts on supply chain performance: Looking through the Bullwhip lens. Transportation Res. Part. E: Logistics Transp. Rev. 143 (2020). https://doi.org/10.1016/j.tre.2020.102094 Rezzouqi, H., Naja, A., Sbihi, N., Ghogho, M.: Traffic Counts-based Origin-Destination Matrix Estimation using a Traffic Simulator and Machine Learning. Int. Wirel. Commun. Mob. 
Comput., 729–734 (2024) Saeedi, R., Sankaranarayanasamy, M., Vishwakarma, R., Singh, P., Vennelakanti, R.: Towards modular modeling and analytic for multi-modal transportation networks. IEEE International Conference on Big Data, 2426–2432 (2020) Safdar, M., Zhong, M., Ren, Z., Hunt, J.D.: An Integrated Framework for Estimating Origins and Destinations of Multimodal Multi-Commodity Import and Export Flows Using Multisource Data. Systems. 12 , 406 (2024). https://doi.org/10.3390/systems12100406 Sharath, R., Nirupam, K., Sowmya, B.: KG, S.: Data analytics to predict the income and economic hierarchy on census data. International Conference on Computation System and Information Technology for Sustainable Solutions, 249–254 (2016) Shen, G.: GIS-based analysis of US international seaborne trade flows. In: Ducruet, C. (ed.) 1) Advances in shipping data analysis and modeling tracking and mapping maritime flows in the age of big data, pp. 147–172. Routledge (2017) Shen, G., Yan, X., Zhou, L., Wang, Z.: Visualizing the USA’s Maritime Freight Flows Using DM, LP, and AON in GIS. Int. J. Geo-Information. 9 , 286 (2020). https://doi.org/10.3390/ijgi9050286 Shen, G., Zhou, L., Aydin, S.G.: A multi-level spatial-temporal model for freight movement: The case of manufactured goods flows on the U.S. highway networks. J. Transp. Geogr. 88 , 102868 (2020). https://doi.org/10.1016/j.jtrangeo.2020.102868 Stávková, J., Souček, M., Birčiaková, N.: Income situation of households as a social status indicator. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis. 61 , 124 (2013) Stillwell, J., Hayes, J., Dymond, R., Reid, J., Duke, O., Dennett, A., Wathan, J.: Access to UK census data for spatial analysis: Towards an integrated census support service. 
In Planning Support Systems for Sustainable Urban Development, 329–348 (2013) Sun, S., Gu, M., Ou, J., Li, Z., Luan, S.: Regional truck travel characteristics analysis and freight volume estimation: Support for the sustainable development of freight. Sustainability 16 , (2024) Tian, D., Huang, L., Huang, C.: The Impact of Port Infrastructure on Port Handling Capacity in China. 1–4 (2009) Titschack, J., Baum, D., Matsuyama, K., Boos, K., Färber, C., Kahl, W.-A., Ehrig, K., Meinel, D., Soriano, C., Stock, S.R.: Ambient occlusion–A powerful algorithm to segment shell and skeletal intrapores in computed tomography data. Comput. Geosci. 115 , 75–87 (2018) Tjandra, S., Kraus, S., Ishmam, S., Grube, T., Linßen, J., May, J., Stolten, D.: Model-based analysis of future global transport demand. Transp. Res. Interdisciplinary Perspect. 23 , 101016 (2024) Tongzon, J.L.: Port choice and freight forwarders. Transp. Res. E. 45 , 186–195 (2009). https://doi.org/10.1016/j.tre.2008.02.004 Voytov, S.: Commodities clasification as an instrument of customs-tariff regulation: The aspect of definition and control. Актуальні Проблеми Економіки. 147 , 42–48 (2013) Wagner, J.: Germany’s trade in goods: A survey of the evidence from transaction data. AStA Wirtschafts-Und Sozialstatistisches Archiv. 12 , 69–82 (2018) Wang, C., Xia, Y., Shen, H.-L.: Routing and congestion in multi-modal transportation networks. Int. J. Mod. Phys. C 34 , (2023) Wang, Y.-H., Zhang, H.-B., Xu, J.: A Survey of Applications and Researches on Schema Matching between GIS Spatial Data. Int. Archives Photogrammetry Remote Sens. Spat. Inform. Sci. 175–179 (2015). https://doi.org/10.5194/isprsarchives-xl-7-w4-175-2015 Wang, Z., Yu, Z.: Trading Partners, Traded Products and Firm Performances of China’s ExporterImporters, pp. 165–193. Does Processing Trade Make a Difference?
The World Economy (2013) Wei, G., Gundlegård, D., Rydergren, C.: Consistent origin-destination and link flow estimation based on data-driven network assignment. Transp. Res. Procedia. 86 , 668–675 (2025). https://doi.org/10.1016/j.trpro.2025.04.083 Wulder, M.A., Hermosilla, T., White, J.C., Hobart, G., Masek, J.G.: Augmenting Landsat time series with Harmonized Landsat Sentinel-2 data products: Assessment of spectral correspondence. 4 (2021) Xie, X., Xu, Y., Feng, B., Wu, W.: Multiscale urban functional zone recognition based on landmark semantic constraints. ISPRS Int. J. Geo-Information. 13 , 95 (2024) XINCHANG, W.: Port Hinterland Estimation and Optimization for Intermodal Freight Transportation Networks. (2010) Xu, S.J., Chow, J.Y.J.: Online Route Choice Modeling for Mobility-as-a-Service Networks With Non-Separable, Congestible Link Capacity Effects. IEEE Trans. Intell. Transp. Syst. 23 , 11518–11527 (2022). https://doi.org/10.1109/tits.2021.3105230 Zhou, Z., Chen, A., Wong, S.C.: Alternative formulations of a combined trip generation, trip distribution, modal split, and trip assignment model. Eur. J. Oper. Res. 198 , 129–138 (2009). https://doi.org/10.1016/j.ejor.2008.07.041 Zhu, J.-X., Luo, Q.-Y., Guan, X.-Y., Yang, J.-L., Bing, X.: A Traffic Assignment Approach for Multi-Modal Transportation Networks Considering Capacity Constraints and Route Correlations. IEEE Access. 8 , 158862–158874 (2020). https://doi.org/10.1109/access.2020.3019301 Additional Declarations No competing interests reported. 
Supplementary Files AppendixA.docx Status: Under Review Version 1 posted Editorial decision: Revision requested 12 Aug, 2025 Editor assigned by journal 12 Aug, 2025 Submission checks completed at journal 05 Aug, 2025 First submitted to journal 04 Aug, 2025 © Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-7288048","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":499479809,"identity":"eb026461-9e3f-44a7-b4cb-1221190e6c2f","order_by":0,"name":"Muhammad Haroon","email":"","orcid":"","institution":"Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Muhammad","middleName":"","lastName":"Haroon","suffix":""},{"id":499479812,"identity":"365724c9-b5fa-4d1a-b560-3e3074b7ae69","order_by":1,"name":"Guoqiang 
Shen","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA40lEQVRIie2RMQrCQBBFRxaSZiTtSBCvsLIQAx7CK2wIaBPEMoXogpA2rffwAhsEqxVbe8FasBTEmMLOGDvBfTDwB+bxiwGwWH4SLCcl5gHIMrGmigndjvpKaWWpx3W1NVB6o31xmjnki4M+E6TDSLl7Xav0j9NYrJFEoPWYwEwihVNZr6wx8JEoDgo1pla2jRQhr1dyM7ghp+VmBaVyb6D0IAkYSmLceSqqgcIpET5qYmQgDuVuIjJMPrTkpn9t3xfMy010vMyH3dw1H1r0K6KsnunU3j9b1Cu6+u2VxWKx/DcPmVE9TpHPMKkAAAAASUVORK5CYII=","orcid":"","institution":"Zhejiang University","correspondingAuthor":true,"prefix":"","firstName":"Guoqiang","middleName":"","lastName":"Shen","suffix":""},{"id":499479813,"identity":"4642420c-20b0-49f1-b6be-63a0b75616b6","order_by":2,"name":"Jinghua Zhao","email":"","orcid":"","institution":"Massachusetts Institute of Technology","correspondingAuthor":false,"prefix":"","firstName":"Jinghua","middleName":"","lastName":"Zhao","suffix":""},{"id":499479815,"identity":"0fb9b73e-5fee-4c06-a71e-ddbc86d9c1da","order_by":3,"name":"Pengyu Zhu","email":"","orcid":"","institution":"Hong Kong University of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Pengyu","middleName":"","lastName":"Zhu","suffix":""}],"badges":[],"createdAt":"2025-08-04 07:23:23","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7288048/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7288048/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":95863944,"identity":"cd4a268b-ec9b-4193-9c3e-56f944ceb496","added_by":"auto","created_at":"2025-11-13 18:44:08","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":223222,"visible":true,"origin":"","legend":"\u003cp\u003eConceptual framework of the 
study\u003c/p\u003e","description":"","filename":"Picture1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/af4a50df7f94b363e673ae16.jpg"},{"id":95863951,"identity":"2019c1bf-2f0f-46a8-9da8-3dc414c38efd","added_by":"auto","created_at":"2025-11-13 18:44:10","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":404171,"visible":true,"origin":"","legend":"\u003cp\u003eRoute network global to local\u003c/p\u003e","description":"","filename":"Picture2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/24f008f2fc0ef3caf9805dee.jpg"},{"id":95863948,"identity":"8135577c-f99c-40fc-999d-1e6336957dce","added_by":"auto","created_at":"2025-11-13 18:44:08","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":318609,"visible":true,"origin":"","legend":"\u003cp\u003eWorld Sea ports and Air ports\u003c/p\u003e","description":"","filename":"Picture3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/575ef6f9da6c75d7ed23bcdd.jpg"},{"id":96240659,"identity":"ac0d575c-4bb7-43a9-9b49-6a13fa203c1d","added_by":"auto","created_at":"2025-11-19 07:09:19","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":46665,"visible":true,"origin":"","legend":"\u003cp\u003eGlobal to local freight flow 2018-2023\u003c/p\u003e\n\u003cp\u003eSource: Bureau of Transportation Statistics - FAF 5.6.1\u003c/p\u003e","description":"","filename":"Picture4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/dee25265643e135064913206.jpg"},{"id":96241764,"identity":"f18b0de6-10a8-45eb-a8e2-ef3466ca9f14","added_by":"auto","created_at":"2025-11-19 07:11:19","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":139498,"visible":true,"origin":"","legend":"\u003cp\u003eComparative dataset coverage across 
regions\u003c/p\u003e","description":"","filename":"Picture5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/676da488c1ddf0fd6ded98a2.jpg"},{"id":95863947,"identity":"3e5954d5-2bff-4855-b12b-354caf91c2a6","added_by":"auto","created_at":"2025-11-13 18:44:08","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":30583,"visible":true,"origin":"","legend":"\u003cp\u003eUpdate frequency by region and data category\u003c/p\u003e","description":"","filename":"Picture6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/b533000b9d33ef83569208f4.jpg"},{"id":95863949,"identity":"a38eb856-dde3-41c8-a1c7-192a93247e98","added_by":"auto","created_at":"2025-11-13 18:44:08","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":125028,"visible":true,"origin":"","legend":"\u003cp\u003eData gaps \u0026amp; future needs by region\u003c/p\u003e","description":"","filename":"Picture7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/4dbc7748a344c379b37a0ef3.jpg"},{"id":96255038,"identity":"a37034d2-8f22-4e6b-8d6b-3ff9c6296236","added_by":"auto","created_at":"2025-11-19 07:47:28","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3074591,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/080cc6a0-eb7e-473a-ba9d-d9985e0d5c2c.pdf"},{"id":95863945,"identity":"85ef589a-71d8-48be-acb2-c693f5024ec4","added_by":"auto","created_at":"2025-11-13 18:44:08","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":38794,"visible":true,"origin":"","legend":"","description":"","filename":"AppendixA.docx","url":"https://assets-eu.researchsquare.com/files/rs-7288048/v1/84c770811c89e6f4f75a52fb.docx"}],"financialInterests":"No competing interests 
reported.","formattedTitle":"On Multimodal Freight Databases for Scalable Global-Local Transport Research and Applications","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eFreight transportation systems that are efficient, robust, and responsive are determinants of economic vibrancy and resiliency of regions across the world (He et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Nenavath \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Global trade, regional integration and sustainable economic growth depend on the smooth flow of goods through complex multimodal networks (Liao et al. \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Freight research revolves around better understanding and modeling the freight movement on highway networks to support local transportation, land use, economic development, and comprehensive planning (Shen et al. \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Freight transportation remains heavily constrained by the issue of data fragmentation; whereby different datasets are classified differently, have incompatible spatial scales, are not updated on the same frequency and are isolated to different governmental and organizational bodies (Albert and Schaefer \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Anderson et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2010\u003c/span\u003e; Illemann et al. \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Moschovou et al. \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). 
This fragmentation greatly limits the capacity of planners, policymakers and researchers to properly model freight flows, locate infrastructure bottlenecks, and evaluate the effects of transportation policies in an integrated and coherent environment.\u003c/p\u003e\u003cp\u003eToday, the data of interest to freight transportation are distributed among several administrative jurisdictions, including the U.S., the European Union (EU), and China, which have different data collection, management, and distribution strategies. In the United States, key datasets are maintained by various agencies, including the Bureau of Transportation Statistics (BTS), U.S. Census Bureau, Federal Highway Administration (FHWA), and Energy Information Administration (EIA), among others. EU freight-related information comes mostly from Eurostat and other Directorate-General websites (e.g., DG MOVE), with additional commercial and national databases. Similarly, freight data sources in China are predominantly maintained by the National Bureau of Statistics (NBS), Ministry of Transport (MOT), and the General Administration of Customs, each using different spatial definitions, coding schemes, and update frequencies. Although large volumes of data are available in these regions, they are heterogeneous, which makes integration a major problem. The spatial granularity of freight data is extremely diverse, ranging from national and regional levels (e.g., states in the U.S., NUTS regions in the EU, provinces in China) to counties, municipalities, or even traffic analysis zones (TAZ) (Guo and Aultman \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Data update frequencies also vary widely, ranging from near-real-time tracking of vessels and trucks to decennial censuses and annual or quinquennial economic surveys, which complicates the harmonization of data across datasets (Gorman et al. 
\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Commodity classification also varies widely across jurisdictions and data sets, including between conversions to and from Harmonized System (HS), Standard Classification of Transported Goods (SCTG), North American Industry Classification System (NAICS) and European Combined Nomenclature (CN) and Chinese customs codes (Donnelly \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e1999\u003c/span\u003e). Lack of uniformity in commodity and geographic crosswalks has proved to be a continuous impediment to the strong freight-flow modeling.\u003c/p\u003e\u003cp\u003eThe implication of such a global challenge is very wide. In the absence of harmonized, integrated and comprehensive data, the process of freight-flow modeling can be cumbersome, methodologically inconsistent and difficult to replicate. Scientists and researchers often have to struggle with incompatible data formats, inaccessible or insufficiently documented data, and inconsistent coverage of the spatial or temporal extent of data (Kessler \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Osman and Qutayan \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Titschack et al. \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). In addition, regulatory and policy data, which are essential in modeling freight transport because they affect allowable load limits and routing restrictions, and mode choice, are not readily available in standardized, machine-readable formats, and therefore, simulations and scenario tests are not practical.\u003c/p\u003e\u003cp\u003eWith these ongoing issues, this paper presents an in-depth answer to the problem by systematically cataloguing, classifying, and analyzing significant freight-related data in the U.S., EU, and China. 
The core of the work is the combination of these datasets into a broader, though deliberately simplified, conceptual framework based on the classical four-step freight modeling process (Trip Generation, Trip Distribution, Mode Choice, Route Assignment). This framework is an organizing tool more than an analytical destination, and it explicitly shows how various data can be used to feed particular decision points in freight-flow analysis.\u003c/p\u003e\u003cp\u003eThe remainder of this paper is organized as follows. Section 2 explains the study design and the methodology used to identify, screen, and evaluate freight-related datasets in the U.S., EU, and China, including the classification schemes and data-quality criteria. Section 3 presents the data inventory and classification. Section 4 catalogs and assesses the nine data categories (socioeconomic, commodity, network, ports, trade, flow, geographical, regulatory, and transportation means) available per region, together with their main metadata. Section 5 maps these data sources to the four modeling steps, noting overlaps and multiple uses. Section 6 provides best practices and guidelines for harmonizing space, time, codes, and effective zones. Section 7 discusses data sharing, accessibility, and reproducibility. Section 8 lists the current data gaps and future priorities for data development. Finally, Section 9 summarizes the contributions and proposes further research. The metadata tables are presented in the appendices.\u003c/p\u003e"},{"header":"2. Methodology","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Conceptual framework\u003c/h2\u003e\u003cp\u003eThe conceptual model of the current work extends the traditional four-stage model of Trip Generation, Trip Distribution, Mode Choice, and Route Assignment, using it as the organizing framework for a global data catalog. 
We begin with the basic decision steps on which freight demand modeling and network loading are founded and build on them incrementally with four interconnected modules (Data Inventory and Classification; Integration and Harmonization Workflows; the Four-Step Modeling Framework; and Data Sharing, Accessibility and Reproducibility) to produce a solid end-to-end system. Nine categories of data are overlaid on this axis of sequential decisions: socioeconomic-demographic, commodity/goods, multi‐modal network, ports, trade, flow, geographical, regulation/code, and transportation means, each providing specific inputs or constraints to one or more stages. The framework uses spatial and classification crosswalks (e.g., NUTS \u0026harr; FIPS \u0026harr; provincial IDs; HS \u0026harr; SCTG \u0026harr; NAICS), enabling the different data architectures of the United States, European Union and China to be used together. A top-level data-to-decision mapping identifies which categories feed which modeling stage, and feedback loops (Affum et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2003\u003c/span\u003e; Elghazaly et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), such as using route-assignment results to refine impedance functions and mode-choice parameters, are identified to allow dynamic model refinement. Finally, sharing practices such as stable APIs, machine‐readable metadata, version control, and open licensing ensure that each dataset, transformation, and scenario is transparent, traceable, and easily reusable (see Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Such a multi-region, standardized design ensures that modelers can trace all their parameters to reported sources, compare methodological choices between jurisdictions, and iteratively revise their models as new data become available.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Systematic data identification\u003c/h2\u003e\u003cp\u003eA three-step approach was used to build a comprehensive inventory of freight-related datasets covering the United States, European Union, and China. First, we browsed the official websites and APIs of the major statistical and transportation agencies in each region. In the United States, these were the U.S. Census Bureau (Census API for ACS and Decennial data), Bureau of Economic Analysis, Bureau of Labor Statistics, Bureau of Transportation Statistics (NTAD, CFS, FAF), Federal Highway Administration (HPMS), U.S. Department of Agriculture (QuickStats API), U.S. Army Corps of Engineers (WCSC), Energy Information Administration, FMCSA (Safety Measurement System), PHMSA (NPMS), and FAA (NFDC). In the European Union, we used the Eurostat REST API and GISCO portal (NUTS regions, COMEXT, External Trade), EuroGeographics EuroGlobalMap, and DG MOVE TRANSTOOLS resources. In China, we used the National Bureau of Statistics (Statistical Yearbooks API), the General Administration of Customs (HS trade data), and the GIS and annual reports of the Ministry of Transport.\u003c/p\u003e\u003cp\u003eSecond, targeted keyword searches of transportation journals and conference proceedings (e.g., \u0026ldquo;freight data sources\u0026rdquo;, \u0026ldquo;Commodity Flow Survey validation\u0026rdquo;, \u0026ldquo;Eurostat COMEXT assessment\u0026rdquo;) were used to find peer-reviewed assessments and other repositories. 
Third, each portal's user guide, metadata dictionary and API documentation was systematically mined for spatial/temporal coverage, classification schemes, update frequencies, access techniques and known limitations. This close-to-the-source process produced a standardized list of candidates in nine categories of data, which forms the foundation of the metadata catalog and of its quality analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Inclusion/Exclusion Criteria\u003c/h2\u003e\u003cp\u003eTo ensure a consistent and high-quality data inventory across the United States, European Union, and China, we used four objective inclusion/exclusion criteria: (1) Spatial Coverage: only datasets with full national or regional coverage, i.e. covering all 50 U.S. states (plus D.C.), all EU member states at NUTS levels 0\u0026ndash;2, or all Chinese provinces, were included; sub-national sources without clear aggregation pathways were catalogued separately but not included in the core framework; (2) Temporal Coverage: sources had to cover the 2019\u0026ndash;2024 period or follow a documented update schedule (annual, quarterly or quinquennial), with any time series missing more than one reporting period excluded unless an explicit justification could be given and documented; (3) Documentation Quality: only sources with formal metadata dictionaries or user guides (including definitions of variables, units, classification schemes and methodological notes) were retained, and undocumented or poorly documented data were omitted; and (4) Cost and Accessibility: freely available or institution-licensed data were favored; pay-per-download products were mentioned but not considered unless no free equivalent was available, and subscription-only sites were considered only where academic or governmental access conditions applied.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Classification 
\u0026amp; Crosswalk Approach\u003c/h2\u003e\u003cp\u003eTo enable comparison across jurisdictions, we developed and implemented two parallel crosswalks: one to map commodity classifications and the other to harmonize spatial units. (1) Commodity Code Mapping: The basis was the standard conversion tables: HS \u0026harr; SCTG and SCTG \u0026harr; NAICS from the Bureau of Transportation Statistics (BTS) and the U.S. Census Bureau, HS6 \u0026harr; CN8 from Eurostat, and the HS equivalents from China Customs. These tools enabled us to convert import/export and production data to a common six-digit SCTG code schema. One-to-many mappings were allocated proportionally using published share factors, and any unmapped or deprecated codes were logged for manual resolution to maintain consistency; and (2) Spatial Unit Harmonization: FIPS and OMB MSA codes at the county level in the U.S., NUTS 0\u0026ndash;2 regions in the European Union, and provincial/county codes in China were reconciled through master lookup tables and GIS spatial joins. To relate zones at various scales, we used the MSA definitions from OMB, the GISCO boundary files from Eurostat, and the National Fundamental Geographic Information System of China. Zone centroids were calculated and, where possible, re-assigned to corresponding higher-level areas or TAZ polygons, so that all socioeconomic, commodity, network and flow datasets are referenced to the same spatial units in further modeling.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Data Quality Assessment\u003c/h2\u003e\u003cp\u003eTo provide a solid basis for modeling, each dataset in the catalog was assessed against four major dimensions, namely completeness, timeliness, accuracy, and documented limitations. 
The user guides and metadata of each source were read thoroughly to determine (1) completeness, e.g., whether suppressed CFS flows or missing HPMS truck-percentage records are tracked; (2) timeliness, measured as the lag between the reference year and publication, e.g., CFS 2022 published in mid-2024; (3) accuracy, e.g., distinguishing between survey-based, modeled, or administrative data and noting reported margins of error (e.g., +/-8 percentage points in All these indicators of quality were compiled in a master metadata spreadsheet (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), in which each dataset has one row, with columns recording spatial/temporal coverage, key variables, access methods, update schedules, cost/license, and a summary quality rating. This table is the basis of the catalog introduced in Section 4 and of the mapping of datasets to modeling stages in Section 5, where users can filter and prioritize sources based on their completeness, latency, and reliability needs.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eUnified metadata fields\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eColumn Attributes\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExplanation\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eRegion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eThe region covered by the data (U.S., EU, or China).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDataset Name\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eThe official name of the data source (e.g., \u0026ldquo;QCEW\u0026rdquo;, \u0026ldquo;Freight Analysis Framework\u0026rdquo;).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAgency / Source\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eThe organization responsible for producing and maintaining the data (e.g., U.S. Census Bureau, BTS, FHWA).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSpatial Coverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eThe geographic granularity (e.g., \u0026ldquo;World\u0026rdquo;, \u0026ldquo;Country\u0026rdquo;, \u0026ldquo;State\u0026rdquo;, \u0026ldquo;Province\u0026rdquo;, \u0026ldquo;City\u0026rdquo;, \u0026ldquo;County\u0026rdquo;, \u0026ldquo;MSA\u0026rdquo;, \u0026ldquo;Census Tract\u0026rdquo;).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTemporal Coverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eThe time coverage and update frequency (e.g., \u0026ldquo;Annual\u0026rdquo;, \u0026ldquo;Quarterly\u0026rdquo;, \u0026ldquo;Every 5 years\u0026rdquo;).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eKey Variables\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eKey fields or metrics provided (e.g., \u0026ldquo;Population, Employment by 
NAICS\u0026rdquo;, \u0026ldquo;Tonnage, Value, Modes\u0026rdquo;, \u0026ldquo;Link AADT, Truck%\u0026rdquo;).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAccess Method\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWhere and how to obtain the data (e.g., \u0026ldquo;Census API (JSON/CSV)\u0026rdquo;, \u0026ldquo;FHWA FTP\u0026rdquo;, \u0026ldquo;USDA QuickStats API\u0026rdquo;).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eData Format\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eFile or service formats available (e.g., \u0026ldquo;CSV\u0026rdquo;, \u0026ldquo;XLSX\u0026rdquo;, \u0026ldquo;Shapefile\u0026rdquo;, \u0026ldquo;GeoJSON\u0026rdquo;, \u0026ldquo;GeoData\u0026rdquo;, \u0026ldquo;PDF\u0026rdquo;, \u0026ldquo;TransCAD\u0026rdquo;, \u0026ldquo;XML\u0026rdquo;, etc.).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCost / License\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAny fees or licensing requirements (e.g., \u0026ldquo;Free\u0026rdquo;, \u0026ldquo;Registration required\u0026rdquo;, \u0026ldquo;Subscription\u0026rdquo;, \u0026ldquo;License\u0026rdquo;).\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"3. 
Data inventory and classifications","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Core data categories\u003c/h2\u003e\u003cp\u003eIn order to catalog and assess the vast landscape of freight related datasets in a systematic way we have developed nine broad and mutually exclusive classes that are important to the analysis and modeling of freight flows: (1) \u003cb\u003eSocioeconomic-Demographic Data\u003c/b\u003e, including population distributions, household income, employment data, and economic activity by industry (Kumar and Bisht \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Liu and Herold \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; Sharath et al. \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; St\u0026aacute;vkov\u0026aacute; et al. \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Stillwell et al. \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2013\u003c/span\u003e); (2) \u003cb\u003eCommodity/Goods Data\u003c/b\u003e, providing details on commodity-specific volumes, tonnage, values, and classification structures (Abzalov \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Eisele et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Voytov \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e2013\u003c/span\u003e); (3) \u003cb\u003eMulti-Modal Network Data\u003c/b\u003e, describing the geometry, capacity, speed, and attributes of roads, railways, waterways, airways, and pipelines (Baggag et al. \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Paipuri et al. \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Pasquale et al. \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Saeedi et al. 
\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e2023\u003c/span\u003e); (4) \u003cb\u003ePorts Data\u003c/b\u003e, detailing port capacities, throughput, infrastructure, and intermodal connectivity (De Langen and Sharypova \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Dwarakish \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Tian et al. \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2009\u003c/span\u003e); (5) \u003cb\u003eTrade Data\u003c/b\u003e, capturing international import-export transactions, commodity types, and partner-country flows (Wagner \u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Wang and Yu \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2013\u003c/span\u003e); (6) \u003cb\u003eFlow Databases\u003c/b\u003e, offering origin-destination matrices and observed traffic volumes to calibrate and validate freight models (G\u0026uuml;ler \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Mahmoudabadi and Mahmoudabadi \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Rezzouqi et al. \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2024\u003c/span\u003e); (7) \u003cb\u003eGeographical Data\u003c/b\u003e, outlining administrative boundaries, metropolitan areas, Traffic Analysis Zones (TAZ), and related geographic references (Butenko et al. \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Pidgrushnyi et al. \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Xie et al. 
\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2024\u003c/span\u003e); (8) \u003cb\u003eRegulation/Code Data\u003c/b\u003e, including rules, standards, and restrictions affecting freight transport such as vehicle weight limits, hazardous materials regulations, and routing restrictions (Field \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Moreno and Tutrone \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2000\u003c/span\u003e); and (9) \u003cb\u003eTransportation Means Data\u003c/b\u003e, describing the characteristics and capacities of freight transport vehicles across modes (Abate \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Beuthe et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Nocera et al. \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Sun et al. \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Tjandra et al. \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e3.2 Alternative Data Grouping Schemes\u003c/h2\u003e\u003cp\u003eConsidering the complexity and heterogeneity of freight data, other groupings were developed to help both researchers and practitioners to identify appropriate datasets within a short period of time based on particular analytical or application requirements. Three other classification schemas were formulated: (1) \u003cb\u003eSpatial Granularity\u003c/b\u003e: The data were categorized according to their spatial resolution and coverage, i.e., national (e.g., GDP of a country, commodity flows), regional (e.g. states, NUTS regions, Chinese provinces), sub-regional (county and metropolitan statistical areas), and micro (TAZ and city-block). 
This hierarchical spatial structure helps the user choose datasets suited to macro-level policy analysis, meso-level infrastructure planning, or micro-level operational modeling (Nocera and Gardoni, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2023\u003c/span\u003e); (2) \u003cb\u003eTemporal Frequency\u003c/b\u003e: Because up-to-date data are essential, we grouped the datasets by update frequency, ranging from decennial (e.g., Decennial Census), quinquennial (Commodity Flow Survey), annual (Regional Economic Accounts, Freight Analysis Framework), quarterly (Quarterly Census of Employment and Wages), and monthly (port and customs trade statistics) to near real-time (AIS vessel tracking, NPMRDS highway performance data). This temporal grouping supports data integration planning and the selection of appropriate temporal inputs for dynamic modeling and monitoring (Lunetta et al. \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Wulder et al. \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2021\u003c/span\u003e); and (3) \u003cb\u003eFunctional Role\u003c/b\u003e: We also categorized the datasets by their function in freight modeling. Generation data are socioeconomic-demographic and commodity data used to inform trip-end generation; Distribution data cover flow matrices, trade and port data, which are useful in OD assignment; Network data describe transport infrastructure and support mode choice and route assignment; and Validation data are observed flow databases, such as FAF, TRANSEARCH, and traffic counts (HPMS, NPMRDS), which are necessary to calibrate and validate the model (Mahajan et al. \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Safdar et al. 
\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThese alternative classification structures give users flexible, purpose-oriented ways to navigate the large data inventory and to identify and prioritize datasets quickly according to their modeling objectives.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Master Metadata Spreadsheet\u003c/h2\u003e\u003cp\u003eTo efficiently capture, track and query the large collection of datasets discovered, we developed a master metadata spreadsheet (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Each dataset is listed as a separate row, and standardized columns document its characteristics: Region (U.S., EU, China), Dataset Name, Agency/Source responsible for maintaining the dataset, Spatial Coverage (national, regional, county, or TAZ-level), Temporal Coverage and frequency of update, Key Variables (e.g., tonnage, employment figures, network attributes), Access Method (direct download, API, web portal), Data Format (CSV, Shapefile, JSON, Excel), Cost and Licensing requirements (free, subscription, institution\u003c/p\u003e\u003cp\u003eThe spreadsheet, with its detailed and organized format, serves several important purposes: it presents a transparent, centralized reference point that allows researchers to easily find appropriate datasets given clear criteria; it supports systematic evaluation and comparison between datasets and jurisdictions; and it provides the basis for future dataset updates, integration efforts, and sharing between researchers and modeling practitioners. 
Because dataset metadata are standardized and data characteristics are well documented, this inventory improves the ease of use, reproducibility, and comparability of data for global freight transportation modeling and analysis.\u003c/p\u003e\u003c/div\u003e"},{"header":"4. Data Categories \u0026 Source Catalog","content":"\u003cp\u003eBelow we provide an overview of the nine fundamental data categories, their application in the four-step model, and exemplary data sources in the United States, European Union and China. Appendix A (Tables \u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003eA.1\u003c/span\u003e to \u003cspan refid=\"Tab11\" class=\"InternalRef\"\u003eA.9\u003c/span\u003e) contains the full eight-column metadata tables.\u003c/p\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Socioeconomic\u0026ndash;Demographic\u003c/h2\u003e\u003cp\u003eTrip Generation relies on socioeconomic and demographic data because they provide zone-level measures of freight generation (e.g. industrial employment, GDP) and attraction (e.g. population, household income) (Jahan and Zhou \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Ponte et al. \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Seven core sources are listed in Table A-\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e in Appendix A, covering the United States, European Union and China. The American Community Survey (ACS) in the U.S. 
offers five-year rolling estimates of population, income, and NAICS-2 employment at the tract to state levels; the Decennial Census offers a full population and housing count every ten years; the Quarterly Census of Employment and Wages (QCEW) delivers quarterly employment and establishment counts by 6-digit NAICS at the county/MSA level; BEA Regional Economic Accounts report annual GDP and personal income; and National Transportation Statistics at the BTS provides an annual set of socioeconomic profiles at national to The Eurostat NUTS Regional Statistics provide annual population, NACE-based employment and GDP of NUTS 0\u0026ndash;3 units, whereas Chinese Regional Yearbooks provide population, GDP and sectoral employment at the province and municipality level.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Commodity/Goods\u003c/h2\u003e\u003cp\u003eThe commodity and goods data have two purposes: they provide volumes of production and consumption to the Trip Generation and provide observed flow values to the Trip Distribution (Holgu\u0026iacute;n and Thorson \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2000\u003c/span\u003e; Kawasaki et al. \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Zhou et al. \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Appendix A Table A-\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e lists six major sources. 
In the United States, the Commodity Flow Survey (CFS) gives quinquennial state- and MSA-level flows by six-digit SCTG code (tons, value, distance band, mode); the Freight Analysis Framework (FAF) gives a complementary five-year snapshot of two-digit SCTG flows (tons, ton-miles, value, mode shares) with spatial products in Excel, CSV, and shapefile formats; the USDA-NASS Crop Production series provides annual county- and ZIP-level estimates of acreage, The COMEXT database of Europe provides monthly and annual import/export tonnages and values for all member countries in CN/HS codes, and China Customs Statistics provides similar national and provincial HS-coded trade figures. In combination, these sources allow calibrated production of commodity-specific trip ends and origin-destination matrices that mirror actual freight movement patterns.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Multi-Modal Network\u003c/h2\u003e\u003cp\u003eMulti-modal network databases provide the spatial graph and the skim attributes (link geometry, functional class, capacity, and speed) needed for both mode-split analysis and network loading (Jetlund et al. \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Jiang et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2022\u003c/span\u003ea). 
In the United States, quinquennial highway and rail centerlines with functional class, lane counts, and centerline length are provided by the National Transportation Atlas Database (NTAD); AADT, truck percentages, and pavement data by the Federal Highway Administration HPMS at a state level; segment capacity and speed limits by the AAR Class I Rail GIS (subscription); channel geometries and lock locations by the NTAD Inland Waterways; real-time and historical vessel paths by the NOAA AIS vessel-tracking (MarineCadastre.gov); 202 The pan-continental infrastructure in Europe is made up by the EuroGlobalMap Transportation layers of EuroGeographics and TEN-T GIS corridors. The Ministry of Transport Network GIS of China provides the yearly national highway, waterway, and rail links schematics, speeds, and capacities. These sources together form the foundation of the impedance functions and capacity constraints within Mode Choice models and also form the substrate of Route Assignment within U.S., EU, and Chinese freight networks (see Appendix A, Table A-3).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e4.4 Ports\u003c/h2\u003e\u003cp\u003eThe data on port throughput and terminals infrastructures are critical to the allocation of international and intermodal freight flows (Trip Distribution) and the evaluation of the appeal of maritime modes (Mode Choice) (Tongzon \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Xinchang \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). Shen et al. (\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) visualized major sea ports with their handling capacities and optimal freight flow in 2D in GIS and in 3D in Google Earth with total and most important goods imported/exported via the maritime networks. 
Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e depicts the ports and their connectivity around the world. Appendix A, Table A-4 lists four major sources in the U.S., EU, and China. The USACE Waterborne Commerce Statistics Center (WCSC) in the United States publishes annual tonnages by commodity group and domestic vs. foreign splits as CSV, PDF and Shapefile files; S\u0026amp;P Global PIERS (Customs Import/Export) reports monthly and annual container TEUs, weights, HS codes and partner country information (subscription required); BTS Port Performance Freight Statistics tracks monthly and annual dry-bulk tonnages and vessel calls for the top 150 U.S. ports; and MARAD National Port Plans reports berth lengths, maximum draft, The Eurostat Port Throughput dataset for Europe includes quarterly and annual TEUs, bulk tonnages and vessel calls in major EU ports, and the Sea Ports compilation for China (sourced through UNCTAD, HDX/OSM, SeaRates and WPI) includes annual container throughput and total tonnage.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e4.5 Trade\u003c/h2\u003e\u003cp\u003eTrade databases supply the cross-border flow statistics required to allocate freight volumes in Trip Distribution; they record the monetary and mass flows between origins and destinations (Gingerich et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Kirby \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e1970\u003c/span\u003e). USA Trade Online (USTO) provides monthly and annual values and weights of imports and exports by six-digit HS code, port and trading partner (subscription portal with free summaries), and TradeStats Express (ITA) provides annual trade values by NAICS or commodity group at national, state, and metropolitan levels. 
UN Comtrade provides annual data on the value and quantity of trade between countries, broken down by HS code and freely available through an API (API key required); it can also serve as a cross-validation benchmark. The European Union relies on Eurostat External Trade (COMEXT) for monthly and annual HS-coded import/export tonnages and values. The General Administration of Customs in China makes national and provincial HS 6-digit trade statistics available on its web portal. In combination, these sources allow calibration and validation of the OD matrices of international and intermodal freight distributions (see Appendix A, Table A-5).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003e4.6 Flow Databases\u003c/h2\u003e\u003cp\u003eFlow databases deliver the observed origin-destination matrices and link-level performance measures required to calibrate Trip Distribution models and to verify Route Assignment results (Furlonge \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Wei et al. \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Shen (\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) highlights U.S. trade flows at the global and local levels; important U.S. international trade patterns for the top regions are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. The Freight Analysis Framework (FAF 5.6.1) in the United States provides annual state-to-state and MSA-to-MSA OD tonnage, ton-miles, value, and mode shares for 2007\u0026ndash;2022; TRANSEARCH (IHS Markit) supplies subscription-based annual county-to-county OD flows by commodity group and mode; and the NPMRDS of FHWA provides monthly/daily travel times, speeds, measures of reliability, and implied truck volumes on highway segments. 
The Freight Transport Statistics provided by Eurostat give annual OD tonnage and ton-kilometer flows by mode within EU member states. The China Freight Transport Yearbook reports provincial OD flows by commodity annually. The combination of these datasets allows rigorous calibration of distribution parameters and empirical verification of route assignments across scales and regions (see Appendix A, Table A-6).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eSource: Bureau of Transportation Statistics - FAF 5.6.1\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003e4.7 Geographical\u003c/h2\u003e\u003cp\u003eGeographical datasets define the analysis zones and support all modeling steps, from the creation of trip origins and destinations to the assignment of flows on networks, by supplying consistent definitions of boundaries and network nodes (Curtin \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; LI \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). Appendix A, Table A-7 lists five primary sources: U.S. Census TIGER/Line shapefiles (annual boundary files of states, counties, tracts, roads and TAZs); BTS NTAD geofiles (continuous updates of multimodal network lines and intermodal points); MPO-provided Traffic Analysis Zone boundaries (local TAZ definitions, updated at variable intervals of about five years); Eurostat GISCO (annual NUTS boundaries and European network geofiles); and China National Fundamental GIS (annual administrative boundaries for provinces, cities, and counties).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003e4.8 Regulation/Code\u003c/h2\u003e\u003cp\u003eRegulatory and code databases define the legal and operational constraints on modal feasibility and routing choices (Baldacci et al. 
\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Browne et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Dopilka and Balobanov \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Fialkoff et al. \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Appendix A Table A-8 lists six primary sources in the United States, European Union, and China, covering weight and dimensional restrictions, hazmat classifications, carrier safety ratings, and infrastructure load ratings. In the U.S., Title 49 of the Code of Federal Regulations (49 CFR, accessed through eCFR) sets national HAZMAT classes, weight/dimension limits, and routing requirements; the Federal Motor Carrier Safety Administration (FMCSA) Safety Measurement System (SMS) publishes monthly carrier-level safety and inspection scores; the PHMSA National Pipeline Mapping System (NPMS) provides pipeline alignments and high-consequence area designations; and the National Bridge Inventory (NBI) reports annual bridge load ratings and posted weight restrictions. The ADR Dangerous Goods by Road agreement establishes continent-wide HAZMAT regulations in Europe, whereas the GB Road Vehicle Standards establish national size and weight requirements in China. Modelers use these datasets to constrain mode-choice utility functions and to impose link-level constraints in the route assignment process.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003e4.9 Transportation Means\u003c/h2\u003e\u003cp\u003eDatabases of transportation means provide the fleet composition and equipment capacities that are essential for estimating modal costs and imposing capacity constraints in Mode Choice and Route Assignment (Jiang et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2022\u003c/span\u003eb; Xu and Chow, \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Zhu et al. 
\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Table A-9 in Appendix A lists five important sources: in the United States, the BTS reports counts of trucks, railcars, vessels, and aircraft by class, with payload capacities, in the National Transportation Statistics (NTS); the USACE Waterborne Commerce Vessel Characteristics (WCUS) Parts 1\u0026ndash;4 report vessel types, deadweight tonnages and trip counts on U.S. waterways; the STB Public Use Waybill Sample (PUWS) provides annual origin-destination BEA flows, car counts, and tonnages for the national rail network; the Transport Equipment Statistics in Eurostat provide annual EU fleet figures and payload capacities by mode; and the Transport Equipment tables in the Chinese Statistical Yearbook provide national truck fleets, railcars, vessel inventories and aircraft counts. These datasets keep mode choice models consistent with capacity constraints and ensure that route assignment distributes flows within actual equipment capacities.\u003c/p\u003e\u003cp\u003eSpatial granularity and update frequency vary by region. For complete attribute details, refer to the individual tables in Appendix A. Each region\u0026rsquo;s relative spatial granularity, update frequency, and metadata transparency for the nine data categories are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, guiding practitioners toward regions and datasets with the strongest foundation for four-step freight modeling.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"5. Mapping Databases to the Four-Step Framework","content":"\u003cp\u003eTo make the four-step model practical, every category of data (Section 4) is explicitly associated with the modeling stage(s) in which it is used. 
These mappings are summarized in Table 2 (below), and the following subsections detail the role of each category, with reference to the full metadata in Appendix A (Tables A.1-A.9).\u003c/p\u003e\n\u003cdiv id=\"Sec23\"\u003e\n \u003ch2\u003e5.1 Trip Generation Inputs\u003c/h2\u003e\n \u003cp\u003eTrip Generation needs zone-based measures of freight production and attraction. Socioeconomic-Demographic statistics (Section 4.1; see Appendix A, Table A.1) provide population, income, employment, and GDP values that are associated with production and consumption. Commodity/Goods sources (Section 4.2; see Table A.2) give physical volumes in tons, values, and production statistics, which allow commodity-specific generation. Geographical boundaries (Section 4.7; Appendix A, Table A.7) define the analysis zones (counties, NUTS, provinces, TAZ), so that all generation inputs are allocated within a common spatial framework.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec24\"\u003e\n \u003ch2\u003e5.2 Trip Distribution Inputs\u003c/h2\u003e\n \u003cp\u003eTrip Distribution allocates the generated trip ends to origin-destination matrices. Commodity/Goods data (Table A.2) provide commodity-specific origin and destination totals. Ports (Section 4.4; Table A.4) and Trade datasets (Section 4.5; Table A.5) supply the international and intermodal flow details used in cross-border and gateway allocations. Flow Databases (Section 4.6; Table A.6) provide observed OD matrices and link volumes that allow calibration and validation of the model. Multi-Modal Network attributes (Section 4.3; Table A.3) supply network impedance values (distances, speeds, connectivity) to a gravity or opportunity model. 
Geographical zone definitions (Table A.7) keep origin and destination indexing consistent and facilitate spatial aggregation/disaggregation.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec25\"\u003e\n \u003ch2\u003e5.3 Mode Choice Inputs\u003c/h2\u003e\n \u003cp\u003eMode Choice determines modal splits for each OD pair from cost and constraint information. The Multi-Modal Network layers (Section 4.3; Table A.3) provide travel distance, speed, and capacity for the highway, rail, water, air and pipeline modes. Regulation/Code databases (Section 4.8; Table A.8) contain weight/dimension limits, HAZMAT restrictions, and routing constraints that determine the feasible mode set. The statistics on Transportation Means (Section 4.9; Table A.9) give fleet composition and payload capacities (vessel deadweight, railcar and truck capacities, aircraft types), which serve as inputs to the generalized cost and utility functions of discrete choice formulations.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec26\"\u003e\n \u003ch2\u003e5.4 Route Assignment Inputs\u003c/h2\u003e\n \u003cp\u003eRoute Assignment allocates mode-specific OD flows to the links of a network subject to capacity and regulatory constraints. Multi-Modal Network geometries (Table A.3) define the link graph for the assignment algorithm. Flow Databases (Table A.6) provide empirical link-level observations (e.g. AADT, truck volumes, travel times) for validation. 
Regulation/Code data (Table A.8) identify prohibited links (weight-restricted roads, HAZMAT bans), and Transportation Means capacities (Table A.9) set the maximum throughput by mode, so that assignment does not violate physical and legal constraints.\u003c/p\u003e\n \u003cp\u003eTable\u0026nbsp;2 provides a brief summary of how each data category contributes to the four modeling stages: a ‘✔’ signifies a key input to that step, allowing the reader to quickly determine which data sets to use when setting up Trip Generation, Trip Distribution, Mode Choice and Route Assignment.\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 2\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eSummary Matrix of Data Categories by Four-Step Modeling Stage\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eData Category\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTrip Generation\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTrip Distribution\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMode Choice\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRoute Assignment\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSocioeconomic–Demographic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCommodity/Goods\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMulti-Modal Network\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePorts\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTrade\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFlow Databases\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGeographical\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRegulation/Code\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTransportation Means\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e✔\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec27\"\u003e\n \u003ch2\u003e5.5 Narrative on Overlaps\u003c/h2\u003e\n \u003cp\u003eSeveral data categories are used in two or more modeling stages. For example, commodity/goods datasets (4.2) support both Trip Generation and Trip Distribution, since they yield zone-level tonnages as well as flow patterns. Geographical boundaries (4.7) are universal: they provide the spatial frame for every step, from trip-end estimation to assignment. Multi-Modal Network data (4.3) feed Trip Distribution with impedance data and Mode Choice with cost data, and form the assignment graph. Flow Databases (4.6) validate both Trip Distribution and Route Assignment, which completes the calibration loop. 
Mode Choice and Route Assignment are both constrained by Regulation/Code (4.8) and Transportation Means (4.9), which ensures legal and capacity-constrained routing. Awareness of these overlaps makes updates more efficient: a new flow dataset only has to be propagated to the stages where it is relevant, and any upgrade to the network or regulatory data automatically benefits every multimodal component that uses it.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"6. Integration Algorithms","content":"\u003cp\u003eSuccessful multimodal freight-flow modeling requires not only an appropriate choice of datasets but also a meaningful and reproducible method for integrating them. The guidelines and best practices below address common issues of spatial, temporal, and categorical harmonization and provide approaches for disaggregation and aggregation.\u003c/p\u003e\u003cdiv id=\"Sec29\" class=\"Section2\"\u003e\u003ch2\u003e6.1 Spatial Harmonization\u003c/h2\u003e\u003cp\u003eAnalysis requires all geographic layers (Section \u003cspan refid=\"Sec19\" class=\"InternalRef\"\u003e4.7\u003c/span\u003e) to share the same projection and zone definitions. First, record the native CRS of each dataset in the metadata (e.g. EPSG:4269, 4326, 3857). Second, reproject all layers to one model CRS (e.g. NAD 83 Albers, EPSG:5070). Third, repair topology errors (self-intersections, gaps) through zero-width buffering. 
Finally, calculate zone centroids (keyed by GEOID, NUTS_ID, or province code) and snap them to the nearest multimodal network node within a small tolerance (\u0026le;\u0026thinsp;10 m) to ensure network connectivity for skim-matrix construction.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c3\" namest=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eAlgorithm 1: Spatial Harmonization\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eInputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- layers \u0026larr; list of (name, path, native_crs)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- network_nodes (shapefile)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- model_crs \u0026larr; EPSG:5070\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eOutputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- 
harmonized_layers\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- snapped_zone_centroids\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e1\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each layer in layers do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e2\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003edata \u0026larr; ReadShapefile(layer.path)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e3\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eAssignCRS(data, layer.native_crs)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e4\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003edata \u0026larr; Reproject(data, model_crs)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e5\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003edata.geometry \u0026larr; RepairGeometry(data.geometry)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e6\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eWriteShapefile(data, layer.name+\"_5070.shp\")\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e7\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e8\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ezones \u0026larr; ReadAllZoneLayers().to_crs(model_crs)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e9\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ecentroids \u0026larr; ComputeCentroids(zones)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e10\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003enodes \u0026larr; ReadShapefile(network_nodes).to_crs(model_crs)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e11\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each c in centroids do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e12\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003en \u0026larr; FindNearest(nodes, c)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e13\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eif Distance(c, n)\u0026thinsp;\u0026le;\u0026thinsp;10 m then c \u0026larr; n\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e14\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e15\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eWriteShapefile(centroids, \"snapped_centroids.shp\")\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec30\" class=\"Section2\"\u003e\u003ch2\u003e6.2 Temporal Harmonization\u003c/h2\u003e\u003cp\u003eDatasets (Sections \u003cspan refid=\"Sec13\" 
class=\"InternalRef\"\u003e4.1\u003c/span\u003e\u0026ndash;\u003cspan refid=\"Sec18\" class=\"InternalRef\"\u003e4.6\u003c/span\u003e) must align to a common reference year (e.g., 2022). Annual series require no resampling; quarterly data (QCEW) are aggregated to annual totals; five-year surveys (CFS) are held constant or interpolated only if documented. Record publication lags (e.g., CFS 2022 \u0026rarr; mid-2024) and fill single-year gaps via linear interpolation, clearly annotating all assumptions.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabb\" border=\"1\"\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"4\" nameend=\"c4\" namest=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eAlgorithm 2: Temporal Harmonization\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eInputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- datasets \u0026larr; list of (frequency, values, reference_year, release_date)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- base_year \u0026larr; 
2022\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eOutputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- aligned_series\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- lag_table\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e1\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each ds in datasets do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e2\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eswitch ds.frequency\u003c/span\u003e:\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e3\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ecase \"quarterly\": ds.annual \u0026larr; Sum(ds.values[quarters in base_year])\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e4\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ecase \"5-year\": ds.annual \u0026larr; ds.values[base_year]\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e5\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ecase \"annual\": ds.annual \u0026larr; ds.values[base_year]\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e6\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend 
switch\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e7\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eds.lag_months \u0026larr; MonthsBetween(ds.reference_year, ds.release_date)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e8\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eif Missing(ds.values[base_year]) then\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e9\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eds.annual \u0026larr; LinearInterpolate(ds.values[base_year\u0026thinsp;\u0026minus;\u0026thinsp;1], ds.values[base_year\u0026thinsp;+\u0026thinsp;1])\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" 
name=\"Emphasis\"\u003e10\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend if\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e11\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e plots each dataset\u0026rsquo;s update cadence by region across the nine data categories. This figure underscores regional differences in data timeliness and supports targeted harmonization strategies.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec31\" class=\"Section2\"\u003e\u003ch2\u003e6.3 Commodity Code Crosswalks\u003c/h2\u003e\u003cp\u003eTo harmonize trade and production statistics, map HS-coded data to SCTG (and optionally to NAICS) using the official crosswalk tables (BTS, Eurostat and China Customs). 
Allocate one-to-many mappings proportionally using share fields, convert agricultural units (bushels to tons), and log unmapped codes for manual review.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabc\" border=\"1\"\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"4\" nameend=\"c4\" namest=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eAlgorithm 3: Commodity Code Crosswalk\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eInputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- trade_data(hs_code, commodity, qty, value)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- cw_hs_sctg(hs_code, sctg_code, share)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- ag_factors(commodity, lb_per_bushel)\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" 
class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eOutputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- sctg_flows(sctg_code, tonnes, value)\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e1\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003emerged \u0026larr; Join(trade_data, cw_hs_sctg on hs_code)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e2\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each rec in merged do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e3\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003erec.tonnes \u0026larr; rec.qty * rec.share\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" 
name=\"Emphasis\"\u003e4\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003erec.value_sctg \u0026larr; rec.value * rec.share\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e5\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eif rec.commodity in ag_factors then\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e6\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efactor \u0026larr; ag_factors[rec.commodity] / 2000\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e7\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" 
class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003erec.tonnes \u0026larr; rec.qty * rec.share * factor\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e8\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend if\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e9\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e10\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003esctg_flows \u0026larr; Aggregate(merged by sctg_code sum(tonnes, value_sctg))\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec32\" class=\"Section2\"\u003e\u003ch2\u003e6.4 Zone-to-Zone Disaggregation \u0026amp; Aggregation\u003c/h2\u003e\u003cp\u003eWhere flows are reported for coarse zones, downscale them using proxy shares (employment for manufactured goods, production for crops) or aggregate 
sub-zone values to model zones. Parent zones are mapped to target zones using a lookup table, and marginal totals are balanced using iterative proportional fitting (IPF) where required.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabd\" border=\"1\"\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c3\" namest=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eAlgorithm 4: Zone-to-Zone Disaggregation \u0026amp; Aggregation\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eInputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- parent_flows(parent_zone, total_tons)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- proxy(zone, parent_zone, proxy_value)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- lookup(zone \u0026rarr; model_zone)\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" 
name=\"Emphasis\"\u003eOutputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- model_zone_flows(model_zone, allocated_tons)\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e1\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each p in proxy do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e2\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ep.share \u0026larr; p.proxy_value / Sum(proxy.proxy_value where parent_zone\u0026thinsp;=\u0026thinsp;p.parent_zone)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e3\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e4\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ealloc \u0026larr; Join(parent_flows, proxy on parent_zone)\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e5\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each a in alloc do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e6\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ea.allocation \u0026larr; a.total_tons * a.share\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e7\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003ea.model_zone \u0026larr; lookup[a.zone]\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e8\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" 
nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e9\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003emodel_zone_flows \u0026larr; Aggregate(alloc by model_zone sum(allocation))\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec33\" class=\"Section2\"\u003e\u003ch2\u003e6.5 Handling Missing or Sparse Data\u003c/h2\u003e\u003cp\u003eSystematically impute suppressed or missing values by allocating parent-zone totals through proxy distributions, and flag imputed records. 
In the case of completely missing zones, use regional averages or fallback proxies, and log all the rules in an imputation log.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabe\" border=\"1\"\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"4\" nameend=\"c4\" namest=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"SmallCaps\" class=\"SmallCaps\" name=\"Emphasis\"\u003eAlgorithm 5: Handling Missing or Sparse Data\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eInputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- dataset(zone, value)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- proxy(zone, proxy_value)\u003c/span\u003e\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- threshold\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" 
name=\"Emphasis\"\u003eOutputs\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003e- dataset_imputed(zone, value, source_flag)\u003c/span\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e1\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003efor each rec in dataset do\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e2\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eif rec.value is missing OR rec.value\u0026thinsp;\u0026lt;\u0026thinsp;threshold then\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e3\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003esiblings \u0026larr; dataset where same parent_zone\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e4\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eshare \u0026larr; proxy[rec.zone] / Sum(proxy[siblings.zone])\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e5\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003erec.value \u0026larr; parent_total * share\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e6\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003erec.source_flag \u0026larr; \"imputed\"\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e7\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eElse\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e8\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003erec.source_flag \u0026larr; \"reported\"\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e9\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend if\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cspan type=\"BoldSmallCaps\" class=\"BoldSmallCaps\" name=\"Emphasis\"\u003e10\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e\u003cspan type=\"ItalicSmallCaps\" class=\"ItalicSmallCaps\" name=\"Emphasis\"\u003eend for\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eUsing these guidelines and algorithms, which are based on the metadata of Section 4, 
modelers can build, validate, and update four-step freight‐flow models in a transparent and reproducible manner.\u003c/p\u003e\u003c/div\u003e"},{"header":"7. Data Sharing, Accessibility \u0026 Reproducibility","content":"\u003cp\u003eData sharing, ease of access, and reproducibility are essential to freight transportation modeling, analysis, and effective policy formulation. Although large volumes of data are available in the United States, European Union, and China, their practical use is frequently hindered by accessibility problems, restrictive licensing, poor documentation, and incompatible data formats. This section addresses these concerns systematically and highlights the major barriers and opportunities that affect data reuse by researchers, policymakers, and practitioners.\u003c/p\u003e\u003cdiv id=\"Sec35\" class=\"Section2\"\u003e\u003ch2\u003e7.1 Access mechanism\u003c/h2\u003e\u003cp\u003eFreight-related data are most commonly accessed in three ways: direct downloads from official websites or FTP servers, Application Programming Interfaces (APIs), and data subscriptions. The U.S. has well-developed open-access systems (e.g., Census APIs, FHWA portals, and the BTS National Transportation Atlas Database (NTAD)); however, certain key sources, such as TRANSEARCH and detailed PIERS customs data, are subscription-only, which restricts their usability. Similarly, the EU's Eurostat COMEXT and GISCO portals are largely open-access, although a limited number of more detailed trade and multimodal datasets can be accessed only on registration or with institutional licenses. China's National Bureau of Statistics (NBS) and General Administration of Customs provide data via publicly available portals, but these data are frequently in formats (e.g., PDFs and other non-machine-readable formats) that hinder efficient data integration and reproducibility. 
Inconsistent and restrictive access procedures therefore pose major challenges for large-scale comparative analysis and modeling across international boundaries.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec36\" class=\"Section2\"\u003e\u003ch2\u003e7.2 Metadata standards\u003c/h2\u003e\u003cp\u003eEffective data reuse requires standardized, complete, and clear metadata that explain the structure of datasets, methodologies, limitations, and classification schemes. Metadata quality currently differs significantly: U.S. federal datasets (e.g., ACS, FAF, USDA QuickStats) usually include detailed machine-readable documentation, definitions, and user guides, whereas many EU and Chinese datasets (e.g., China Customs, MOT) offer only limited metadata in human-readable formats (PDF, HTML) and need substantial manual interpretation. This absence of consistently structured, machine-readable metadata makes it much harder to automate the integration, cross-referencing, and reproducibility of datasets. Improved metadata standardization (via international guidelines or widely used schemas, e.g., ISO 19115 for geospatial datasets or DataCite standards for general datasets) would make data much easier to discover and integrate.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec37\" class=\"Section2\"\u003e\u003ch2\u003e7.3 Version control\u003c/h2\u003e\u003cp\u003eAccurate freight modeling requires well-documented dataset versioning, update frequency, and provenance. Update frequency, however, varies widely: some sources are updated in real time (AIS vessel tracking, NPMRDS traffic data), whereas others are updated annually or quinquennially (CFS, FAF), which can cause mismatches in integration workflows. In addition, datasets often lack clear version histories or change logs, which makes it hard to rerun the same analyses at different times. 
For example, the gap between data collection and release (e.g., the CFS 2022, released in mid-2024) requires special treatment when such data are combined with more up-to-date sources. Implementing standardized version-control practices (such as explicit versioning, regular releases, clear change logs, and Digital Object Identifiers (DOIs)) would increase transparency, enable reproducibility, and support reliable time-series analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec38\" class=\"Section2\"\u003e\u003ch2\u003e7.4 Barrier to reuse\u003c/h2\u003e\u003cp\u003eThe reuse of freight-related data is often complicated by long-standing operational impediments: broken or stale API endpoints and data connections, poorly documented fields and variables, inconsistent data schemas, and format changes made with insufficient warning. Such problems force researchers and practitioners to spend considerable time cleaning, transforming, and reconciling data (Wang et al. \u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). Schema drift, the gradual and unannounced change of data structures or variable definitions, is a particular problem (Gonz\u0026aacute;lez et al. \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Such discrepancies greatly reduce efficiency and make it hard to reproduce previously published research findings.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec39\" class=\"Section2\"\u003e\u003ch2\u003e7.5 Public availability of the Master Metadata Spreadsheet\u003c/h2\u003e\u003cp\u003eTo increase the reproducibility and generalizability of this research, the master metadata spreadsheet, which contains standardized information on more than 50 freight-related datasets in the U.S., EU, and China, is publicly available. 
This resource is deposited on GitHub, an openly accessible platform that provides version control and a stable repository URL to support long-term access. The spreadsheet can be found at: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China\u003c/span\u003e\u003cspan address=\"https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Users can download it in .xlsx format. Subsequent releases, e.g., new datasets or harmonization workflows, will be versioned and announced through the repository. This strategy addresses common obstacles to data reuse, including inconsistent access, and follows open-science best practices in transportation research.\u003c/p\u003e\u003cp\u003eThese challenges need to be addressed holistically, which requires a concerted effort. Agencies are encouraged to adopt standardized, open data formats (e.g., CSV, JSON, GeoJSON), to keep API endpoints stable and backward compatible, and to provide well-documented machine-readable metadata. Moreover, datasets should be hosted on federated repositories or portals with stable interfaces, well-defined data schemas, and well-documented, transparent licensing conditions. These steps will facilitate greater reuse, improve reproducibility, and greatly reduce obstacles to robust, cross-jurisdictional freight modeling.\u003c/p\u003e\u003c/div\u003e"},{"header":"8. Data gaps and future needs","content":"\u003cp\u003eAlthough freight data inventories are comprehensive in the United States, the European Union, and China, significant gaps continue to limit the potential for truly comprehensive, cross-regional analysis. 
Spatially, numerous sub-national and cross-border corridors are unevenly covered: Traffic Analysis Zone definitions exist in only a few U.S. metros, the NUTS hierarchy in Europe excludes similar urban areas, and flow data in China at the prefecture level are inconsistently reported, leaving essential sub-provincial movements obscure. Temporally, infrequent major surveys (e.g., the five-year Commodity Flow Survey and Freight Analysis Framework) and publication lags of 12\u0026ndash;18 months make it difficult to address changing market conditions in a timely manner, and high-frequency series cannot provide the sectoral breakdown that is essential to detailed modeling. Taxonomies of modal and commodity transport are still coarse: inland waterway and coastal barge traffic is under-represented in non-dedicated AIS streams, intermodal terminal activity and last-mile urban freight are largely unobserved, and even two-digit SCTG or broad NACE categories conceal intra-category heterogeneity critical to accurate distribution and mode-choice analysis. Regulatory and policy restrictions are scattered across various repositories (weight limits, hazardous-goods regulations, and carrier safety statistics are all maintained by different jurisdictions in different formats), producing a heterogeneous legal environment that is not standardized and lacks a machine‐readable interface.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eFuture data initiatives must adopt a multi-faceted approach to address these weaknesses. Biennial, sub-national commodity-flow surveys at the county or prefecture level would provide the fine-scale temporal and geographic resolution that quinquennial reports lack. Real-time link-level freight volumes and speeds, which would require IoT sensors, telematics, and mobile‐network data, could provide continuous operational decision information with lower latency. 
Extending commodity taxonomies to finer coding levels (e.g., HS 8-digit or SCTG 7-digit) with open‐source crosswalks would capture important product differences and support more refined distribution and mode‐choice modeling. A centralized, version‐controlled global API of multimodal regulatory restrictions, including weight/dimension restrictions, HAZMAT categories, and emission regulations, would standardize legal requirements across boundaries and modes. Finally, standardized datasets of international freight corridors, with common zone definitions and intermodal terminal inventories, would simplify the modeling of cross-jurisdictional supply chains and support integrated infrastructure planning.\u003c/p\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e superimposes the five gap dimensions for the U.S. (blue), EU (green), and China (red). It shows that the U.S. has high coverage of modal \u0026amp; commodity detail, regional detail, and temporal detail but moderate gaps in regulatory information and policy data; the EU has mid-range coverage but severe gaps in policy data; and China has the lowest coverage of commodity and temporal granularity but better coverage of regulatory mapping. This comparative radar plot concisely identifies the priority areas for new data initiatives in each region.\u003c/p\u003e"},{"header":"9. Conclusion","content":"\u003cp\u003eThis paper has developed a harmonized, data-driven methodology for multimodal freight-flow modeling, covering Trip Generation, Trip Distribution, Mode Choice, and Route Assignment, based on a detailed, cross-regional inventory of nine fundamental data types. 
By systematically identifying and evaluating socioeconomic-demographic, commodity, network, port, trade, flow, geographical, regulatory, and fleet datasets across the United States, European Union, and China, we have built a transparent master metadata spreadsheet and given clear guidance on harmonization, crosswalking, spatial allocation, and imputation workflows. The master metadata spreadsheet is publicly available on GitHub at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China\u003c/span\u003e\u003cspan address=\"https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, ensuring transparency and enabling broader reuse by the research community. The four-step alignment (Section 5) states clearly which data drive each model step, and the algorithmic recipes (Section 6) give reproducible methods for processing heterogeneous inputs. The resulting mapping of data to decisions gives practitioners an effective guide to compiling, testing, and updating freight models in a variety of settings without the complication of scattered or misplaced data or undocumented transformations.\u003c/p\u003e\u003cp\u003eThe main contributions of this work are a global-regional metadata catalog, which standardizes more than 50 datasets across three large economies into common fields of spatial/temporal coverage, variables, access methods, formats, and cost, and a refined assessment of completeness, timeliness, and accuracy. 
The explicit correspondence between the data categories and the four stages of the modeling process eliminates methodological uncertainty and avoids misallocation of data, ensuring that each dataset is used where its informational value is greatest. The best‐practice workflows we have developed address common integration challenges: reprojection and topology repair (Algorithm 1), temporal alignment and lag documentation (Algorithm 2), commodity-code crosswalking (Algorithm 3), zone‐to‐zone allocation (Algorithm 4), and principled imputation of missing data (Algorithm 5). They can be implemented in open‐source GIS and statistical environments and provide a reproducible blueprint. Although progress has been made, major challenges and data gaps remain, which indicates that further effort should go into improving data collection, documentation, and distribution practices. Future research should focus on universally applicable international metadata standards, improved machine-readable documentation, consistent versioning, and stable API-based data distribution. Work is also encouraged on the most severe gaps in spatial granularity, temporal frequency, and modal and commodity specificity, and on the challenges of integrating regulatory and policy datasets. 
Establishing open, federated freight-data portals and cooperating internationally to harmonize classification schemes would further improve global data interoperability and analytical robustness.\u003c/p\u003e\u003cp\u003eFinally, by rigorously documenting available datasets, explicitly defining integration and accessibility issues, and outlining practical directions toward a less fragmented and more transparent global data infrastructure, the paper contributes to the joint effort to produce reproducible, comparable, and policy-relevant freight transportation models. Continued commitment to open data, solid documentation, and cross-border collaboration is key to more sustainable, efficient, and resilient international freight systems.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eConceptualization, writing - original draft: M. Haroon, G. Shen; data curation, formal analysis, methodology, and software: M. Haroon; validation, supervision: G. Shen; writing - review \u0026amp; editing: G. Shen, J. Zhao, P. Zhu.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe master metadata spreadsheet supporting this study is openly available on GitHub at [https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China](https://github.com/haroonbaloch770/Master-Metadata-Spreadsheet-for-Multimodal-Freight-Databases-in-the-US-EU-and-China). All other datasets analyzed are publicly available from the sources cited in the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAbate, M.: Determinants of capacity utilisation in road freight transportation. J. Transp. Econ. Policy (JTEP). 
\u003cb\u003e48\u003c/b\u003e, 137\u0026ndash;152 (2014)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAbzalov, M.: Methodology of the mineral resource classification. Appl. Min. Geol., 355\u0026ndash;363 (2016)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAffum, J., Brown, A., Chan, Y.: Integrating air pollution modelling with scenario testing in road transport planning: The TRAEMS approach. Sci. Total Environ. \u003cb\u003e312\u003c/b\u003e, 1\u0026ndash;14 (2003)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAlbert, A., Schaefer, A.: Demand for freight transportation in the US: A high-level view, (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAnderson, M.D., Harris, G.A., Harrison, K.: Using aggregated federal data to model freight in a medium-sized community. Transp. Res. Rec. \u003cb\u003e2174\u003c/b\u003e, 39\u0026ndash;43 (2010)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBaggag, A., Abbar, S., Zanouda, T., Srivastava, J.: Resilience analytics: Coverage and robustness in multi-modal transportation networks. EPJ Data Sci. \u003cb\u003e7\u003c/b\u003e, (2018)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBaldacci, R., Toth, P., Vigo, D.: Exact algorithms for routing problems under vehicle capacity constraints. Ann. Oper. Res. \u003cb\u003e175\u003c/b\u003e, 213\u0026ndash;245 (2009). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10479-009-0650-0\u003c/span\u003e\u003cspan address=\"10.1007/s10479-009-0650-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBeuthe, M., Jourquin, B., Geerts, J.-F., Ndjang, C.K.: Freight transportation demand elasticities: A geographic multimodal transportation network analysis. Transp. Res. E. 
\u003cb\u003e37\u003c/b\u003e, 253\u0026ndash;266 (2001)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrowne, T., Tran, T.T., Veitch, B., Smith, D., Khan, F., Taylor, R.: A method for evaluating operational implications of regulatory constraints on Arctic shipping. Mar. Policy. \u003cb\u003e135\u003c/b\u003e (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.marpol.2021.104839\u003c/span\u003e\u003cspan address=\"10.1016/j.marpol.2021.104839\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eButenko, Y., Tarnopolskyi, Y., Saliuta, V., Melnyk, D., Olishevskyi, V.: Application of GIS in Establishing (Changing) the Boundaries of Administrative-Territorial Units on the Example of Rural Settlements, 1\u0026ndash;5 (2024)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCurtin, K.M.: Network Analysis in Geographic Information Science: Review, Assessment, and Projections. Cartography Geographic Inform. Sci. \u003cb\u003e34\u003c/b\u003e, 103\u0026ndash;111 (2007). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1559/152304007781002163\u003c/span\u003e\u003cspan address=\"10.1559/152304007781002163\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Langen, P.W., Sharypova, K.: Intermodal connectivity as a port performance indicator. Res. Transp. Bus. Manage. \u003cb\u003e8\u003c/b\u003e, 97\u0026ndash;102 (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDonnelly, W.A.: International and domestic product classifications. 
USITC Office of Economics Working Paper, 99\u0026ndash;03 (1999)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDopilka, V.O., Balobanov, O.O.: Peculiarities of legal regulation of transportation of dangerous goods. Uzhhorod Natl. Univ. Herald Series: Law. \u003cb\u003e4\u003c/b\u003e, 202\u0026ndash;208 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.24144/2307-3322.2024.85.4.29\u003c/span\u003e\u003cspan address=\"10.24144/2307-3322.2024.85.4.29\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDwarakish, G.: Measuring port performance and productivity. ISH J. Hydraulic Eng. \u003cb\u003e26\u003c/b\u003e, (2020)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEisele, W.L., Schrank, D.L., Bittner, J., Larson, G.: Incorporating Urban-Area Truck Freight Value into the Urban Mobility Report. Transp. Res. Rec. \u003cb\u003e2378\u003c/b\u003e, 54\u0026ndash;64 (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eElghazaly, G., Frank, R., Harvey, S., Safko, S.: High-definition maps: Comprehensive survey, challenges, and future perspectives. IEEE Open. J. Intell. Transp. Syst. \u003cb\u003e4\u003c/b\u003e, 527\u0026ndash;550 (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFialkoff, M.R., Omitaomu, O.A., Peterson, S.K., Tuttle, M.A.: Using geographic information science to evaluate legal restrictions on freight transportation routing in disruptive scenarios. International J. Crit. Infrastructure Prot. \u003cb\u003e17\u003c/b\u003e, 60\u0026ndash;74 (2017). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijcip.2016.12.001\u003c/span\u003e\u003cspan address=\"10.1016/j.ijcip.2016.12.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eField, M.A.: Certification of performance-based standards for truck size and weight limits: Implementation considerations and enforcement issues. Transp. Res. Rec. \u003cb\u003e1763\u003c/b\u003e, 73\u0026ndash;79 (2001)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFurlonge, R.J.: A Probability Bias Model of Trip Distribution. International Conference on Transportation and Development, 258\u0026ndash;268 (2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1061/9780784482575.025\u003c/span\u003e\u003cspan address=\"10.1061/9780784482575.025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGingerich, K., Maoh, H., Anderson, W.: Characterization of International Origin\u0026ndash;Destination Truck Movements Across Two Major U.S.\u0026ndash;Canadian Border Crossings. Transp. Res. Record: J. Transp. Res. Board. \u003cb\u003e2547\u003c/b\u003e, 1\u0026ndash;10 (2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3141/2547-01\u003c/span\u003e\u003cspan address=\"10.3141/2547-01\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGonz\u0026aacute;lez, A., Bradford, M., Chis, A.E., Gonz\u0026aacute;lez\u0026ndash;V\u0026eacute;lez, H.: Standardised Versioning of Datasets: A FAIR\u0026ndash;compliant Proposal. Sci. Data. \u003cb\u003e11\u003c/b\u003e (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41597-024-03153-y\u003c/span\u003e\u003cspan address=\"10.1038/s41597-024-03153-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGorman, M.F., Clarke, J.-P., de Koster, R., Hewitt, M., Roy, D., Zhang, M.: Emerging practices and research issues for big data analytics in freight transportation. Maritime Econ. Logistics. \u003cb\u003e25\u003c/b\u003e, 28\u0026ndash;60 (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eG\u0026uuml;ler, H.: An empirical modelling framework for forecasting freight transportation. Transport. \u003cb\u003e29\u003c/b\u003e, 185\u0026ndash;194 (2014)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGuo, F., Aultman, L.: A zone design methodology for national freight origin\u0026ndash;destination data and transportation modeling. Transp. Plann. Technol. \u003cb\u003e37\u003c/b\u003e, 738\u0026ndash;756 (2014)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHe, Z., Navneet, K., van Dam, W., Van Mieghem, P.: Robustness assessment of multimodal freight transport networks. Reliab. Eng. Syst. Saf. \u003cb\u003e207\u003c/b\u003e, (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHolgu\u0026iacute;n, J., Thorson, E.: Trip Length Distributions in Commodity-Based and Trip-Based Freight Demand Modeling: Investigation of Relationships. Transp. Res. Record: J. Transp. Res. Board. \u003cb\u003e1707\u003c/b\u003e, 37\u0026ndash;48 (2000). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3141/1707-05\u003c/span\u003e\u003cspan address=\"10.3141/1707-05\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eIllemann, T., Karam, A., Hegner Reinau, K.: Towards sharing data of private freight companies with public policy makers: A proposed framework for identifying uses of the shared data. 132\u0026ndash;136 (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJahan, N., Zhou, Y.: Covid-19 and digital inclusion: Impact on employment. J. Digit. Econ. \u003cb\u003e2\u003c/b\u003e, 190\u0026ndash;203 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jdec.2024.01.003\u003c/span\u003e\u003cspan address=\"10.1016/j.jdec.2024.01.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJetlund, K., Onstein, E., Huang, L.: Information Exchange between GIS and Geospatial ITS Databases Based on a Generic Model. ISPRS Int. J. Geo-Information. \u003cb\u003e8\u003c/b\u003e (2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijgi8030141\u003c/span\u003e\u003cspan address=\"10.3390/ijgi8030141\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJiang, X., Shan, X., Du, M.: Modeling Network Capacity for Urban Multimodal Transportation Applications. J. Adv. Transp. 1\u0026ndash;22 (2022). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1155/2022/6034369\u003c/span\u003e\u003cspan address=\"10.1155/2022/6034369\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKawasaki, T., Namba, Y., Oka, H., Dulebenets, M.A.: Freight trip distribution using spatiotemporal aggregate data: A modified collective flow diffusion model-based approach. Transp. Res. Interdisciplinary Perspect. \u003cb\u003e21\u003c/b\u003e (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.trip.2023.100904\u003c/span\u003e\u003cspan address=\"10.1016/j.trip.2023.100904\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKessler, R.: Whitepaper: Practical challenges for researchers in data sharing. Learn. Publish. \u003cb\u003e31\u003c/b\u003e, (2018)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKirby, R.F.: A Preferencing Model for Trip Distribution. Transport. Sci. \u003cb\u003e4\u003c/b\u003e, 1\u0026ndash;35 (1970). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1287/trsc.4.1.1\u003c/span\u003e\u003cspan address=\"10.1287/trsc.4.1.1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKumar, D., Bisht, N.: Does employment status determine household consumption pattern in India: An analysis through dependency approach. Indian J. Econ. Dev. \u003cb\u003e16\u003c/b\u003e, 547\u0026ndash;558 (2020)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi, Y.: Creating street network datasets in geographical coordinates by using map images and digital map data. Proceedings of the Japan Society of Civil Engineers D \u003cb\u003e62\u003c/b\u003e, 121\u0026ndash;130 (2006). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2208/jscejd.62.121\u003c/span\u003e\u003cspan address=\"10.2208/jscejd.62.121\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiao, R., Liu, W., Yuan, Y.: Resilience Improvement and Risk Management of Multimodal Transport Logistics in the Post\u0026ndash;COVID-19 Era: The Case of TIR-Based Sea\u0026ndash;Road Multimodal Transport Logistics. Sustainability \u003cb\u003e15\u003c/b\u003e, (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu, X., Herold, M.: Population estimation and interpolation using remote sensing. Urban Remote Sens., 269\u0026ndash;290 (2007)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLunetta, R., Johnson, D.M., Lyon, J.G., Crotwell, J.: Impacts of imagery temporal frequency on land-cover change detection monitoring. \u003cb\u003e89\u003c/b\u003e, (2004)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMahajan, V., Kuehnel, N., Intzevidou, A., Cantelmo, G., Moeckel, R., Antoniou, C.: Data to the people: A review of public and proprietary data for transport models. Transp. Reviews. \u003cb\u003e42\u003c/b\u003e, 415\u0026ndash;440 (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/01441647.2021.1977414\u003c/span\u003e\u003cspan address=\"10.1080/01441647.2021.1977414\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMahmoudabadi, A., Mahmoudabadi, M.: Developing a two-stage procedure for Estimating Origin-Destination matrix based on routes and traffic volumes. 1\u0026ndash;6 (2015)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMoreno, J.O., Tutrone, J.D.: Hazardous Materials Transportation, pp. 1\u0026ndash;29. 
Kirk-Othmer Encyclopedia of Chemical Technology (2000)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMoschovou, T., Vlahogianni, E., Rentziou, A.: Challenges for data sharing in freight transport. Adv. Transp. Stud. \u003cb\u003e48\u003c/b\u003e, 141\u0026ndash;152 (2019)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNenavath, S.: Does transportation infrastructure impact economic growth in India? J. Facilities Manage. \u003cb\u003e21\u003c/b\u003e, 1\u0026ndash;15 (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNocera, F., Gardoni, P.: \u003cem\u003eDigital Twins or Equivalent Infrastructure Models?\u003c/em\u003e The Role of Modeling Granularity in Regional Risk Analysis of Infrastructure. 3\u0026ndash;7 (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNocera, S., Pungillo, G., Bruzzone, F.: How to evaluate and plan the freight-passengers first-last mile. Transp. Policy. \u003cb\u003e113\u003c/b\u003e, 56\u0026ndash;66 (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOsman, R., Qutayan, S.M.S.B.: Overcoming Data Fabrication in Scientific Research. J. Sci. Technol. Innov. Policy. \u003cb\u003e9\u003c/b\u003e, 26\u0026ndash;31 (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePaipuri, M., Barmpounakis, E., Geroliminis, N., Leclercq, L.: Empirical observations of multi-modal network-level models: Insights from the pNEUMA experiment. Transp. Res. Part. C: Emerg. Technol. \u003cb\u003e131\u003c/b\u003e, (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePasquale, C., Siri, E., Sacone, S., Siri, S.: A modeling framework for passengers and freight in large-scale multi-modal transport networks. 
29th Mediterranean Conference on Control and Automation (MED), 681\u0026ndash;686 (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePidgrushnyi, G., Marushchynets, A., Ishchenko, Y.: Kyiv metropolitan area: The problems of formation, composition and boundaries. Український Географічний Журнал. \u003cb\u003e4\u003c/b\u003e, 47\u0026ndash;56 (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePonte, B., Puche, J., Rosillo, R., de la Fuente, D.: The effects of quantity discounts on supply chain performance: Looking through the Bullwhip lens. Transportation Res. Part. E: Logistics Transp. Rev. \u003cb\u003e143\u003c/b\u003e (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.tre.2020.102094\u003c/span\u003e\u003cspan address=\"10.1016/j.tre.2020.102094\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRezzouqi, H., Naja, A., Sbihi, N., Ghogho, M.: Traffic Counts-based Origin-Destination Matrix Estimation using a Traffic Simulator and Machine Learning. Int. Wirel. Commun. Mob. Comput., 729\u0026ndash;734 (2024)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSaeedi, R., Sankaranarayanasamy, M., Vishwakarma, R., Singh, P., Vennelakanti, R.: Towards modular modeling and analytic for multi-modal transportation networks. IEEE International Conference on Big Data, 2426\u0026ndash;2432 (2020)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSafdar, M., Zhong, M., Ren, Z., Hunt, J.D.: An Integrated Framework for Estimating Origins and Destinations of Multimodal Multi-Commodity Import and Export Flows Using Multisource Data. Systems. \u003cb\u003e12\u003c/b\u003e, 406 (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/systems12100406\u003c/span\u003e\u003cspan address=\"10.3390/systems12100406\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSharath, R., Nirupam, K., Sowmya, B., KG, S.: Data analytics to predict the income and economic hierarchy on census data. International Conference on Computation System and Information Technology for Sustainable Solutions, 249\u0026ndash;254 (2016)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen, G.: GIS-based analysis of US international seaborne trade flows. In: Ducruet, C. (ed.) Advances in Shipping Data Analysis and Modeling: Tracking and Mapping Maritime Flows in the Age of Big Data, pp. 147\u0026ndash;172. Routledge (2017)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen, G., Yan, X., Zhou, L., Wang, Z.: Visualizing the USA\u0026rsquo;s Maritime Freight Flows Using DM, LP, and AON in GIS. ISPRS Int. J. Geo-Information. \u003cb\u003e9\u003c/b\u003e, 286 (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ijgi9050286\u003c/span\u003e\u003cspan address=\"10.3390/ijgi9050286\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen, G., Zhou, L., Aydin, S.G.: A multi-level spatial-temporal model for freight movement: The case of manufactured goods flows on the U.S. highway networks. J. Transp. Geogr. \u003cb\u003e88\u003c/b\u003e, 102868 (2020). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jtrangeo.2020.102868\u003c/span\u003e\u003cspan address=\"10.1016/j.jtrangeo.2020.102868\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSt\u0026aacute;vkov\u0026aacute;, J., Souček, M., Birčiakov\u0026aacute;, N.: Income situation of households as a social status indicator. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis. \u003cb\u003e61\u003c/b\u003e, 124 (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStillwell, J., Hayes, J., Dymond, R., Reid, J., Duke, O., Dennett, A., Wathan, J.: Access to UK census data for spatial analysis: Towards an integrated census support service. In Planning Support Systems for Sustainable Urban Development, 329\u0026ndash;348 (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSun, S., Gu, M., Ou, J., Li, Z., Luan, S.: Regional truck travel characteristics analysis and freight volume estimation: Support for the sustainable development of freight. Sustainability \u003cb\u003e16\u003c/b\u003e, (2024)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTian, D., Huang, L., Huang, C.: The Impact of Port Infrastructure on Port Handling Capacity in China. 1\u0026ndash;4 (2009)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTitschack, J., Baum, D., Matsuyama, K., Boos, K., F\u0026auml;rber, C., Kahl, W.-A., Ehrig, K., Meinel, D., Soriano, C., Stock, S.R.: Ambient occlusion\u0026ndash;A powerful algorithm to segment shell and skeletal intrapores in computed tomography data. Comput. Geosci. \u003cb\u003e115\u003c/b\u003e, 75\u0026ndash;87 (2018)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTjandra, S., Kraus, S., Ishmam, S., Grube, T., Lin\u0026szlig;en, J., May, J., Stolten, D.: Model-based analysis of future global transport demand. Transp. Res. 
Interdisciplinary Perspect. \u003cb\u003e23\u003c/b\u003e, 101016 (2024)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTongzon, J.L.: Port choice and freight forwarders. Transp. Res. E. \u003cb\u003e45\u003c/b\u003e, 186\u0026ndash;195 (2009). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.tre.2008.02.004\u003c/span\u003e\u003cspan address=\"10.1016/j.tre.2008.02.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVoytov, S.: Commodities clasification as an instrument of customs-tariff regulation: The aspect of definition and control. Актуальні Проблеми Економіки. \u003cb\u003e147\u003c/b\u003e, 42\u0026ndash;48 (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWagner, J.: Germany\u0026rsquo;s trade in goods: A survey of the evidence from transaction data. AStA Wirtschafts-Und Sozialstatistisches Archiv. \u003cb\u003e12\u003c/b\u003e, 69\u0026ndash;82 (2018)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang, C., Xia, Y., Shen, H.-L.: Routing and congestion in multi-modal transportation networks. Int. J. Mod. Phys. C \u003cb\u003e34\u003c/b\u003e, (2023)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang, Y.-H., Zhang, H.-B., Xu, J.: Int. Archives Photogrammetry Remote Sens. Spat. Inform. Sci. 175\u0026ndash;179 (2015). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5194/isprsarchives-xl-7-w4-175-2015\u003c/span\u003e\u003cspan address=\"10.5194/isprsarchives-xl-7-w4-175-2015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e A Survey of Appliactions and Researches on Schema Matching between GIS Spatial Data\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang, Z., Yu, Z.: Trading Partners, Traded Products and Firm Performances of China\u0026rsquo;s ExporterImporters, pp. 
165\u0026ndash;193. Does Processing Trade Make a Difference? The World Economy (2013)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWei, G., Gundleg\u0026aring;rd, D., Rydergren, C.: Consistent origin-destination and link flow estimation based on data-driven network assignment. Transp. Res. Procedia. \u003cb\u003e86\u003c/b\u003e, 668\u0026ndash;675 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.trpro.2025.04.083\u003c/span\u003e\u003cspan address=\"10.1016/j.trpro.2025.04.083\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWulder, M.A., Hermosilla, T., White, J.C., Hobart, G., Masek, J.G.: Augmenting Landsat time series with Harmonized Landsat Sentinel-2 data products: Assessment of spectral correspondence. \u003cb\u003e4\u003c/b\u003e (2021)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXie, X., Xu, Y., Feng, B., Wu, W.: Multiscale urban functional zone recognition based on landmark semantic constraints. ISPRS Int. J. Geo-Information. \u003cb\u003e13\u003c/b\u003e, 95 (2024)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXINCHANG, W.: Port Hinterland Estimation and Optimization for Intermodal Freight Transportation Networks. (2010)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXu, S.J., Chow, J.Y.J.: Online Route Choice Modeling for Mobility-as-a-Service Networks With Non-Separable, Congestible Link Capacity Effects. IEEE Trans. Intell. Transp. Syst. \u003cb\u003e23\u003c/b\u003e, 11518\u0026ndash;11527 (2022). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/tits.2021.3105230\u003c/span\u003e\u003cspan address=\"10.1109/tits.2021.3105230\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhou, Z., Chen, A., Wong, S.C.: Alternative formulations of a combined trip generation, trip distribution, modal split, and trip assignment model. Eur. J. Oper. Res. \u003cb\u003e198\u003c/b\u003e, 129\u0026ndash;138 (2009). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ejor.2008.07.041\u003c/span\u003e\u003cspan address=\"10.1016/j.ejor.2008.07.041\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhu, J.-X., Luo, Q.-Y., Guan, X.-Y., Yang, J.-L., Bing, X.: A Traffic Assignment Approach for Multi-Modal Transportation Networks Considering Capacity Constraints and Route Correlations. IEEE Access. \u003cb\u003e8\u003c/b\u003e, 158862\u0026ndash;158874 (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/access.2020.3019301\u003c/span\u003e\u003cspan address=\"10.1109/access.2020.3019301\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"transportation","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"port","sideBox":"Learn more about [Transportation](http://link.springer.com/journal/11116)","snPcode":"11116","submissionUrl":"https://submission.nature.com/new-submission/11116/3","title":"Transportation","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Freight Transportation, Data Catalog, Metadata Standardization, Spatial Harmonization, Data Integration, Multimodal Networks, Reproducible Research","lastPublishedDoi":"10.21203/rs.3.rs-7288048/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7288048/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eFreight transportation modeling is the key to informed infrastructure planning, economic growth, and policy-making around the world, but the available data is still highly fragmented, differing widely in classification schemes, spatial granularity, temporal coverage, and documentation standards. The paper will discuss these challenges in a systematic way by creating an integrated and structured catalog of freight data in the United States, European Union, and China, arranged in a four-step freight-modeling structure, namely: Trip Generation, Trip Distribution, Mode Choice, and Route Assignment. Nine core data classes (socioeconomic-demographic, commodity/goods, multimodal networks, ports, trade, flow databases, geographical references, regulation/code and transportation means) were thoroughly listed, normalized and described in a single metadata spreadsheet. 
Methodological advances are elaborate criteria of dataset evaluation (spatial and temporal coverage, completeness, accuracy, accessibility, licensing and metadata quality), standardized commodity classification crosswalks and spatial and temporal harmonization workflows that can be reproduced. Among the barriers to data access and reuse, the paper mentions inconsistent documentation, the lack of appropriate metadata standards, inconsistent frequencies of update, and schema incompatibility. The paper suggests feasible solutions that may be used to improve the interoperability of datasets such as standard documentation, open-access portals, consistent version control, and stable APIs. Filling in the identified data gaps, especially at sub-national granularity, time resolution, commodity and mode specificity, and regulatory standardization will go a long way to increasing the accuracy and applicability of freight models. This data-driven systematic framework assists researchers and policymakers to develop more transparent, reproducible and internationally comparable freight-flow models that can be used to make informed infrastructure decisions and effective freight transportation policies.\u003c/p\u003e","manuscriptTitle":"On Multimodal Freight Databases for Scalable Global-Local Transport Research and Applications","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-13 18:44:03","doi":"10.21203/rs.3.rs-7288048/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-08-12T15:01:01+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-08-12T14:59:15+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-08-05T12:35:27+00:00","index":"","fulltext":""},{"type":"submitted","content":"Transportation","date":"2025-08-04T07:15:19+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"transportation","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"port","sideBox":"Learn more about [Transportation](http://link.springer.com/journal/11116)","snPcode":"11116","submissionUrl":"https://submission.nature.com/new-submission/11116/3","title":"Transportation","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"f6fe02a1-11a1-419a-90ce-48aab8e2fcad","owner":[],"postedDate":"November 13th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2025-11-13T18:44:03+00:00","versionOfRecord":[],"versionCreatedAt":"2025-11-13 18:44:03","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7288048","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7288048","identity":"rs-7288048","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}