Benchmarking Transformer-Based NLP Models for Multi-Platform Public Engagement Analysis in Transportation Projects

doi:10.21203/rs.3.rs-9351360/v1

2026 · doi:10.21203/rs.3.rs-9351360/v1

preprint OA: closed CC-BY-4.0

Research Article

Alireza Shamshiri, Mahdi Jaberizadeh, Shah Salah Uddin Chowdhury, and 3 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9351360/v1. This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1.

Abstract

Meaningful public engagement is central to transportation planning, yet agencies face challenges in synthesizing large volumes of unstructured comments from hearings, news media, and social platforms. Although natural language processing methods are increasingly used for this purpose, clear guidance is lacking on which models are most suitable for different data characteristics and analytical goals.
This study compares transformer-based and classical approaches for sentiment analysis and topic modeling in transportation contexts. A curated multi-source corpus from the North Houston Highway Improvement Project was developed, including Facebook posts, news articles, and public hearing documents. Sentiment classification using Bidirectional Encoder Representations from Transformers (BERT) models, specifically DistilBERT and RoBERTa, was benchmarked against lexicon-based approaches, while topic discovery using BERTopic was compared with probabilistic and matrix factorization models. Model performance was evaluated using classification accuracy and F1-Micro scores, topic coherence and interpretability, and cross-platform consistency. Transformer-based methods outperformed classical approaches, particularly in informal and context-rich settings where lexicon-based tools struggled with nuanced language and mixed sentiment. In addition, BERTopic produced more coherent and transferable topic structures across heterogeneous datasets, while lexicon-based methods remained useful for rapid screening. These findings show that model selection should be guided by data characteristics and analytical objectives rather than reliance on a single technique. The study introduces a method selection framework that provides practical guidance for transportation agencies.

Keywords: Public participation; Transportation planning; Text mining; Sentiment analysis; Topic modeling; Decision support

1. Introduction

Large-scale transportation and urban infrastructure projects influence mobility networks, environmental outcomes, and neighborhood wellbeing for decades, shaping access, exposure, and displacement pressures and making meaningful public engagement an essential component of equitable and legitimate planning practice.
Extensive research emphasizes that public participation supports transparency, fairness, and informed and accountable decision making, particularly when communities are affected by impactful or controversial infrastructure proposals (Geekiyanage et al., 2021; Rowe and Frewer, 2005). Stakeholder involvement is essential for recognizing concerns that shape long-term project outcomes, since limited engagement often leads to mistrust, conflict, and project delays (Fitzpatrick and Sinclair, 2003; Li et al., 2013). Public participation is frequently described as a progression from informing and consulting to involving, collaborating, and empowering communities, and these stages are widely viewed as foundational for building trust and long-term project support (Babelon et al., 2021).

Conventional public engagement practices, such as public hearings, community workshops, surveys, and written comment submissions, have historically served as primary mechanisms for collecting public input in transportation and urban infrastructure projects (Rowe and Frewer, 2005). While these approaches provide structured avenues for participation, they often capture only a limited subset of perspectives because they rely on attendance, scheduled meetings, or formal submission processes, which can constrain inclusiveness and representation (Babelon et al., 2021; Geekiyanage et al., 2021). As a result, valuable concerns from dispersed, marginalized, or less vocal groups may remain underrepresented, which contributes to incomplete understanding of public concerns and reduces perceived legitimacy of project decisions (Fitzpatrick and Sinclair, 2003). Despite this recognition, transportation and urban development projects often struggle to capture the full range of public perspectives during early phases, which can reduce legitimacy and amplify public resistance (Chow and Leiringer, 2020; Xiao and Hao, 2021).
These limitations are especially consequential in transportation megaprojects, where governance structures are multi-level, timelines are long, and impacts are spatially concentrated. In such settings, early decisions can become path dependent and difficult to reverse once alternatives, delivery strategies, and right-of-way commitments are institutionalized (Flyvbjerg et al., 2003).

In recent years, digital communication has expanded the landscape of public engagement. Social media platforms in particular generate large volumes of unstructured textual data that have been increasingly leveraged for transportation analysis, including the extraction of features from user-generated posts to support traffic prediction and system monitoring, and sentiment-based assessment of transportation policies and mobility trends using social media data (Acosta-Sequeda et al., 2024; Sun et al., 2023; Tori et al., 2024; Yao and Qian, 2021). Communities now express their views increasingly through online participation platforms, news outlets, and social media, which collectively generate diverse and unstructured textual data (Hou and Lampe, 2015; Reddick et al., 2017). These inputs vary widely in formality, tone, length, and linguistic style, creating a heterogeneous environment that challenges traditional analytic approaches. This digital expansion complements but also complicates conventional comment processes by introducing significantly larger comment volumes and more distributed channels of communication. Consequently, the need for scalable analytical methods capable of summarizing concerns across multiple platforms has become increasingly apparent (Fu, 2024; Ogryzek et al., 2021; Ram and Titarenko, 2022).

At the same time, digital channels are not inherently more representative. Participation can reflect unequal access, language barriers, differences in time availability, and mobilization dynamics that amplify some voices while muting others.
For this reason, multi-source text analytics is best interpreted as complementary evidence rather than a substitute for deliberative engagement, with explicit attention to representativeness and distributional impacts (Blank and Lutz, 2017; Mellon and Prosser, 2017).

To improve scalability, early computational approaches introduced text mining and classical natural language processing techniques into public sector and infrastructure contexts. Topic modeling methods such as latent Dirichlet allocation and non-negative matrix factorization have been used to identify themes within large civic and infrastructure datasets (Jelodar et al., 2019; Maier et al., 2018; San Juan et al., 2017). These approaches have contributed to understanding public concerns in transportation megaprojects, environmental schemes, and major water diversion programs (Xue et al., 2020). However, classical models rely on bag-of-words representations that treat text as unordered tokens, limiting their ability to capture contextual meaning or semantic nuance (Chauhan and Shah, 2022; Hagen, 2018). These limitations become particularly evident when analyzing short or noisy messages typical of online participation platforms (Chowdhury and Alzarrad, 2023; Yin and Wang, 2014; Zhang and Feick, 2016).

Sentiment analysis provides another perspective for assessing public attitudes. Prior studies have applied semantic and ontology-based sentiment analysis to transportation and city feature reviews in order to extract user perceptions related to safety and travel experience (Ali et al., 2017). Foundational reviews categorize sentiment techniques into lexicon-based, machine learning-based, and hybrid methods (Medhat et al., 2014; Taboada et al., 2011). Lexicon-based approaches are widely used because they require no labeled training data and can be applied quickly across public engagement datasets (Wan et al., 2022; Ying Wang et al., 2019).
However, they often struggle with domain-specific terminology, conflicting sentiment polarity across different aspects of the same issue, and subtle evaluative expressions, which can lead to incomplete or misleading interpretations of public concerns in complex infrastructure debates (Geetha and Karthika Renuka, 2021; Raghunathan and Saravanakumar, 2023). This limitation matters for engagement practice because expressions of negative sentiment can reflect substantively different concerns, such as displacement risk, construction disruption, safety, procedural distrust, or perceived inequity, each of which implies different response and mitigation strategies. Accordingly, analytic outputs are most useful when they support traceable issue categorization and institutional responsiveness, rather than only aggregate measures of positivity or negativity (Bryson et al., 2013; Nabatchi and Amsler, 2014).

With advances in transformer models, natural language processing has gained significantly improved contextual understanding and semantic representation. More broadly, deep learning approaches have become central to transportation analytics, enabling the modeling of complex, high-dimensional relationships across diverse transportation systems (Yuan Wang et al., 2019). Models in the BERT family learn bidirectional context and consistently outperform earlier techniques across many language tasks (Gu et al., 2022). Embedding-based topic models such as BERTopic combine dense vector representations with clustering techniques to generate coherent topics even for short and informal text (Grootendorst, 2022; Zhao et al., 2021). These methods have enabled new applications in public sector analytics, including analysis of government inquiry platforms, ecosystem service discussions, and cross-model comparisons using real-world social media data (Egger and Yu, 2022; Lu et al., 2025; Tang et al., 2024).
Reviews of natural language processing in construction and infrastructure research similarly show growing interest in advanced models due to their adaptability and ability to integrate information from multiple sources (Chung et al., 2023; Ding et al., 2022). Broader computational work highlights the relevance of graph attention, deep set architectures, and contextual learning for representing complex relationships in text (Do et al., 2018; Zaheer et al., 2018).

Despite these developments, systematic benchmarking of classical and transformer-based models for transportation public engagement remains limited. Existing studies demonstrate the potential of text analytics for understanding public reactions to major transportation events and safety concerns (Zha et al., 2023), but most rely on a single data source or a single analytical method. This restricts insight into how model performance varies across informal social media posts, semi-formal news commentary, and long-form public comments (Chowdhury and Alzarrad, 2023; Rizun et al., 2025; Zhou and El-Gohary, 2016). Reviews consistently emphasize that the lack of cross-platform, multi-method evaluation limits generalizability and constrains agencies’ ability to select suitable analytical tools (Maier et al., 2018; Zeng et al., 2023; Zoghbi et al., 2016).

The lack of comparative evidence creates a clear methodological need. Classical and transformer-based models have rarely been evaluated side by side on matched public engagement datasets, and little is known about how model behavior shifts across platforms, writing styles, and document lengths. Without systematic benchmarking, agencies lack evidence-based guidance for choosing methods that balance accuracy, interpretability, robustness, and computational efficiency (Torres and De Picado-Santos, 2025; Zoghbi et al., 2016).
Recent work in infrastructure text analytics and document classification further underscores the value of method selection frameworks that align analytical techniques with data characteristics and project needs (D’Orazio et al., 2022). In public decision contexts, interpretability and auditability are critical considerations for engagement analysis, particularly when results are used to inform planning decisions and public communication. Analytical methods should enable transparent explanation of how themes emerge, how patterns vary across platforms or over time, and how recurring concerns can be systematically tracked and addressed within agency workflows (Kroll et al., 2017).

The present study addresses this need by benchmarking sentiment analysis and topic modeling approaches using multi-source public engagement data from a major transportation project. We compare transformer-based models, including DistilBERT and RoBERTa for sentiment classification and BERTopic for topic discovery, with classical baselines such as lexicon-based sentiment tools and probabilistic topic models. Our evaluation examines predictive performance, interpretability, cross-platform stability, and computational efficiency. Building on recent advances in infrastructure-focused text analytics (Shamshiri et al., 2024a, 2024b, 2022), we propose a method selection framework that links data characteristics and analytical objectives to suitable natural language processing methods. The remainder of this paper describes the methodological design, presents comparative findings, and discusses implications for improving public engagement analysis in transportation planning. This study contributes to transportation data science by providing a systematic benchmarking framework for analyzing large-scale public engagement data using advanced natural language processing techniques.

2. Methodology

This study evaluates the performance of classical and transformer-based natural language processing models for the analysis of public engagement data collected from a large-scale transportation infrastructure project. The analytical framework incorporates two complementary tasks, sentiment classification and topic modeling, applied to three sources of public input: social media posts, online news articles, and formal hearing transcripts. These sources represent a wide spectrum of writing styles, communication norms, and levels of formality that are common in transportation public engagement. Their diversity allows for a thorough assessment of how different modeling approaches perform when applied to varied and complex text environments.

The methodological design follows a clear sequence of stages. The first stage structures the text for analysis and compiles descriptive statistics to characterize linguistic and structural properties of each dataset. The second stage applies the analytical tasks using separate families of models. Transformer-based models, including DistilBERT and RoBERTa for sentiment classification and BERTopic for topic discovery, form the central focus because of their ability to capture context and represent semantic meaning with greater nuance. Classical approaches, including lexicon-based sentiment tools and probabilistic or non-negative matrix factorization topic models, are incorporated as comparative baselines.

Model performance is assessed using quantitative and descriptive measures that capture predictive accuracy, topic quality, interpretability, cross-platform stability, and computational efficiency. Sentiment classifiers are evaluated using accuracy and F1-Micro score. Topic models are examined using coherence, diversity, and human interpretability assessments that reflect the clarity and usefulness of the extracted themes.
Efficiency measures, including runtime and memory usage, extend the evaluation to practical considerations that are important for transportation agencies with limited analytical capacity. Cross-platform tests are conducted to examine how model behavior changes when applied to short informal social media posts, semi-formal news articles, and long structured public comments. The following sections present each part of the analytical framework in detail.

2.1. Data Sources

This study develops its analytical basis from public engagement data extracted and curated for the North Houston Highway Improvement Project, a large transportation initiative that generated substantial public input across online and formal channels, as also depicted in Fig. 1. Three major corpora were constructed through these efforts: Facebook posts and comments, online news articles, and public hearing related documents. Together, these sources provide a diverse representation of informal conversation, semi-formal reporting, and detailed community feedback.

The online datasets consist of Facebook content and news articles. Facebook records were collected from public pages between 2012 and 2023 through web scraping and search-based retrieval. The initial dataset contained 10,698 posts and comments. After filtering for project relevance using keywords such as “I 45 Project” and “North Houston Highway Improvement Project,” the final analyzable corpus includes 1,429 entries. News articles were collected from regional and national outlets between 2003 and 2023. The raw collection contained 8,150 articles, which were filtered to 1,170 project-specific records. These online sources represent the informal and semi-formal components of public engagement.

In addition to online sources, formal public engagement documents were obtained from the Texas Department of Transportation.
These materials include scoping meeting notes, public meeting documentation, hearing transcripts, environmental impact statements, and technical assessment reports. The raw corpus comprised 6,329 documents in various formats, including handwritten letters, scanned pages, typed comments, and structured reports. These documents were processed using optical character recognition, noise removal, and content reconstruction methods. This transformation produced 1,287 structured files. After removing documents that contained no usable text or incomplete conversions, the final analyzable hearing corpus includes 1,285 records. These long-form submissions capture detailed public concerns related to displacement, environmental justice, access, safety, and project design.

Together, Facebook, news, and public hearing sub-datasets form a heterogeneous collection of text that reflects the broad range of communication styles present in contemporary public engagement. Facebook records contain short and informal expressions, news articles provide medium-length journalistic narratives, and public hearing documents include structured submissions with varying levels of technical and policy detail. Table 1 summarizes the public engagement data sources used in this study.

Table 1. Summary of public engagement data sources used in this study.
| Data Source | Collection Period | Number of Records | Average Text Length* | Key Characteristics |
|---|---|---|---|---|
| Facebook Posts (FB) | 2012–2023 | 1,429 | Short (< 40 words) | Informal posts and comments; conversational tone; emojis and abbreviations; platform-specific expressions |
| News Articles (NS) | 2003–2023 | 1,170 | Medium (200–800 words) | Semi-formal reporting; narrative structure; event-based coverage |
| Scoping Meeting #2 Comments (SM2C) | 2012 | 468 | Medium | Early-stage public feedback; short written comments; general transportation and access concerns |
| Documentation of Public Hearing Comments (DPH) | 2020 | 316 | Medium to long | Formal written submissions; project design, traffic, and accessibility concerns |
| Draft Community Impacts and Cumulative Impacts Assessment Comments (DCIAC) | 2020 | 124 | Long | Technical and policy-focused responses; environmental and social impact discussions |
| Final Environmental Impact Statement Comments (FIESC) | 2020 | 377 | Long | Structured formal submissions; detailed legal, environmental, and equity-related concerns |

* Average text length refers to the approximate word count after preprocessing, with text length categories defined as short ( 1,000 words).

2.2. Data Preprocessing

The datasets described in Section 2.1 were prepared using a unified data processing strategy to support sentiment classification and topic modeling. Although the sources differ substantially in length, structure, and formality, the processing steps were standardized to ensure consistency across datasets and to allow fair comparison between classical and transformer-based approaches. Differences in dataset usage for sentiment analysis and topic modeling were handled at the modeling stage rather than through separate preprocessing pipelines.

The online datasets were already available in machine-readable format but contained noise such as repeated entries and incomplete records.
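To make the keyword-based relevance filtering from Section 2.1 and the removal of repeated entries more concrete, the following Python sketch shows one way such a step could look. It is illustrative only: the regular expression, the normalization rule, and the function names are assumptions introduced here, not the study's actual pipeline.

```python
import re

# Hypothetical relevance pattern built from the two keywords quoted in Section 2.1.
# Allowing an optional space or hyphen in "I 45" is an assumption about spelling variants.
PROJECT_PATTERN = re.compile(
    r"\bI[\s-]?45\s+project\b"
    r"|\bnorth\s+houston\s+highway\s+improvement\s+project\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_and_deduplicate(records: list[str]) -> list[str]:
    """Keep non-empty, project-relevant records and drop exact repeats."""
    seen: set[str] = set()
    kept: list[str] = []
    for record in records:
        key = normalize(record)
        if not key or key in seen:
            continue  # skip empty records and repeated entries
        if PROJECT_PATTERN.search(record):
            seen.add(key)
            kept.append(record)
    return kept


sample = [
    "The I-45 project will displace families.",
    "the  I-45 project will displace families. ",  # repeat after normalization
    "Traffic was bad today.",                      # no project keyword
    "",                                            # incomplete record
    "North Houston Highway Improvement Project hearing tonight",
]
print(filter_and_deduplicate(sample))
```

Exact matching after normalization only catches verbatim repeats; near-duplicate detection, if needed, would require a separate similarity step that is not shown here.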
Public hearing documents required additional transformation because a large portion originated from scanned pages, handwritten notes, or mixed-format submissions. These items were converted to digital text using optical character recognition, and the resulting text files were screened to remove pages or documents that contained no usable content, producing the final analyzable corpus for public hearing materials.

All remaining text across the datasets underwent standard cleaning and normalization sufficient to support reliable evaluation of classical baselines. Duplicate records were removed to avoid bias from repeated articles or shared posts, and non-English entries were excluded using automated language detection. Routine normalization was applied to reduce formatting-related variation across platforms. Although transformer-based models do not require extensive preprocessing, these steps were retained to enable accurate and fair comparison with lexicon-based sentiment tools and probabilistic topic models. The analysis therefore focuses on how intrinsic dataset characteristics shape model behavior across platforms.

Table 2 summarizes the key characteristics of each dataset that are relevant to sentiment classification and topic modeling, providing context for interpreting cross-platform performance differences discussed in Section 3.

Table 2. Dataset characteristics relevant to sentiment classification and topic modeling.
| Dataset | Analytical Usage | Linguistic and Structural Characteristics | Dominant Sources of Variability | Expected Modeling Sensitivity |
|---|---|---|---|---|
| DCIAC | Topic modeling | Long, highly structured submissions with dense technical and policy-oriented content | Multi-issue narratives and overlapping thematic discussion | Probabilistic topic models sensitive to topic overlap; contextual embeddings improve separation of related themes |
| FIESC | Topic modeling | Formal institutional documents with consistent terminology and structured argumentation | Low linguistic noise and stable vocabulary | Classical topic models perform reliably; transformer-based models capture subtle thematic nuance |
| SM2C | Topic modeling and sentiment analysis | Highly informal conversational text with frequent sentiment shifts and nonstandard expressions | Colloquial language, abbreviations, platform-specific phrasing, sentiment wavering within single comments | Lexicon-based sentiment tools sensitive to mixed sentiment; transformer models benefit from contextual representation |
| DPH | Topic modeling and sentiment analysis | Edited narrative text with structured reporting and evaluative framing | Implicit stance and subtle tonal variation | Classical models perform consistently; transformer-based models capture nuanced sentiment and stance |
| FB | Topic modeling and sentiment analysis | Informal posts combining short statements and fragmented discussion | High topical diversity and inconsistent structure | Topic fragmentation risk for LDA and NMF; embedding-based models produce more coherent topics |
| NS | Topic modeling and sentiment analysis | Semi-formal articles with narrative structure and moderate lexical consistency | Event-focused framing with limited slang | Stable topic formation across models; transformers improve cross-platform generalization |

2.3. Sentiment Analysis Models

Sentiment analysis was conducted using transformer-based models as the primary analytical approach, supported by concise classical baselines for contextual comparison.
The goal of this component was to evaluate the ability of modern natural language processing methods to classify public attitudes expressed across Facebook posts, news articles, and hearing transcripts.

Transformer-based modeling relied on DistilBERT and RoBERTa, both of which provide contextual word representations that support fine-grained sentiment classification. Fine-tuning was performed using pretrained checkpoints with the output layer adapted for a three-class sentiment scheme representing positive, neutral, and negative attitudes. Training used the Adam optimizer with commonly adopted learning rate and batch size settings. Early stopping based on validation loss was applied to prevent overfitting and to ensure stable convergence across multiple training runs. The dataset was divided into separate subsets using an eighty percent training split, a ten percent validation split, and a ten percent test split to support consistent evaluation across all models. Input text was tokenized using the respective tokenizer of each model and truncated to a maximum length of 512 tokens to accommodate variation in document size across platforms. Model outputs were mapped to the final sentiment categories and later aggregated for evaluation using accuracy and F1-Micro as described in Section 2.6. Transformer-based methods were applied using a consistent training and evaluation setup across datasets.

Classical sentiment methods, including Valence Aware Dictionary and Sentiment Reasoner (VADER), TextBlob, and AFINN, were incorporated as minimal baselines to contextualize the performance of transformer-based models. AFINN is a lexicon-based sentiment analysis approach that assigns predefined sentiment scores to words and aggregates them to determine overall polarity. These lexicon-based approaches have been commonly employed in earlier public engagement and infrastructure research because they require no training data and can be applied directly to raw text.
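The word-level scoring that AFINN performs can be illustrated with a minimal Python sketch. The lexicon below is a toy stand-in with made-up scores, not the actual AFINN word list, and the zero threshold used for the neutral class is an assumption; VADER and TextBlob apply additional rules that this sketch does not reproduce.

```python
import re

# Toy lexicon for illustration only: these are NOT the real AFINN entries or scores.
TOY_LEXICON = {
    "good": 3, "great": 3, "support": 2, "safe": 1,
    "bad": -3, "terrible": -3, "oppose": -2, "unsafe": -2,
}


def lexicon_score(text: str, lexicon: dict[str, int] = TOY_LEXICON) -> int:
    """Sum the predefined scores of all tokens; unknown words contribute 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def lexicon_label(text: str, neutral_band: int = 0) -> str:
    """Map the aggregate score to one of three classes (threshold is an assumption)."""
    score = lexicon_score(text)
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for comment in (
    "Great plan, I support it",
    "This is not good for our neighborhood",
    "Great for commuters but terrible for residents",
):
    print(comment, "->", lexicon_score(comment), lexicon_label(comment))
```

In this sketch the negated phrase "not good" is still scored as positive, and the comment with offsetting terms collapses to neutral, because a plain word-sum has no notion of context.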
However, their reliance on predefined sentiment dictionaries limits their ability to interpret context-dependent expressions, domain-specific vocabulary, or mixed sentiment. Their inclusion therefore serves primarily to illustrate the performance gap between traditional sentiment tools and modern contextual models rather than competing with transformer-based methods. Model capacity assumptions and configuration settings are summarized in Table 3.

Table 3. Summary of model capacities and assumptions for sentiment classification and topic modeling.

| Model | Maximum Input Handling | Output Representation | Processing Logic | Pretrained Basis | Reference |
|---|---|---|---|---|---|
| DistilBERT | Supports sequences up to 512 tokens | Three sentiment classes | Contextual representation and fine-tuning | DistilBERT base checkpoint | (Sanh et al., 2020) |
| RoBERTa | Supports sequences up to 512 tokens | Three sentiment classes | Contextual representation and fine-tuning | RoBERTa base checkpoint | (Liu et al., 2019) |
| BERTopic | Variable document lengths depending on embedding size | Topic clusters | Unsupervised contextual topic discovery | Sentence transformer embeddings | (Grootendorst, 2022) |
| VADER | No input length restriction | Polarity score | Lexicon rule-based scoring | Built-in sentiment lexicon | (Hutto and Gilbert, 2014) |
| TextBlob | No input length restriction | Polarity classification | Lexicon rule-based scoring | Built-in sentiment lexicon | (DeSmedt and Daelemans, 2012) |

2.4. Topic Modeling

Topic modeling was applied to identify the main themes expressed across Facebook posts, news articles, and public hearing submissions. This section describes the transformer-based and classical topic modeling approaches used in the study, together with the procedures for topic extraction, coherence evaluation, and interpretability assessment.

Transformer-based modeling relied on BERTopic, which combines contextual embeddings with density-based clustering to generate coherent and interpretable topics.
BERTopic first transforms each document into a dense semantic vector using the all-MiniLM-L6-v2 embedding model. Dimensionality is then reduced with UMAP, and clusters are formed using HDBSCAN, which allows topics to emerge naturally without requiring a predefined number of clusters. The final topic representations are constructed using class-based term frequency inverse document frequency weighting, producing human-interpretable topic descriptions supported by semantically meaningful document groups. These capabilities make BERTopic well suited for mixed-length text and informal public engagement data.

For comparison, two classical baselines were included: latent Dirichlet allocation and non-negative matrix factorization. Both methods rely on bag-of-words or TF-IDF representations of text and require the number of topics to be predefined. Topic quality was evaluated through coherence and interpretability ratings, and topic counts were selected based on the combination of coherence stability and manual inspection of semantic clarity. Including these classical models provides a minimal but necessary point of reference, allowing the study to isolate the contribution of contextual embeddings within the same engagement datasets. The configuration settings and key modeling parameters for BERTopic, LDA, and NMF are summarized in Table 4.

Table 4. Configuration settings for BERTopic, LDA, and NMF topic models.
| Parameter | BERTopic | LDA (baseline) | NMF (baseline) |
|---|---|---|---|
| Embedding model | all-MiniLM-L6-v2 | Bag of words | TF-IDF |
| Dimensionality reduction | UMAP | Not applicable | Not applicable |
| Clustering algorithm | HDBSCAN | Not applicable | Not applicable |
| Minimum cluster size | Determined empirically | Not applicable | Not applicable |
| Topic representation | Class-based TF-IDF | Top word probabilities | Top TF-IDF weights |
| Vectorizer | None (uses embeddings) | Count vectorizer | TF-IDF vectorizer |
| Number of topics | Determined by iterating over parameters to maximize coherence and interpretability | Determined by iterating over parameters to maximize coherence and interpretability | Determined by iterating over parameters to maximize coherence and interpretability |
| Coherence evaluation | c_v | c_v | c_v |

For BERTopic, clustering parameters were selected empirically based on topic coherence and interpretability to ensure consistency of observed topic structures across datasets. Once topics were generated, the study applied a structured evaluation procedure to compare coherence, diversity, interpretability, and stability across the three datasets. Transformer-based models were expected to perform especially well on short and informal text, while classical models were anticipated to produce more stable topics in long-form hearing transcripts. These assumptions were evaluated in Section 3 through quantitative and qualitative analysis.

2.5. Hybrid Workflows

Hybrid workflows were explored as an extension of the primary sentiment and topic modeling approaches to determine whether simple combinations of classical and transformer-based methods could provide additional stability or interpretability without significantly increasing computational demand. These workflows are motivated by practical considerations observed in public engagement analysis, where agencies often operate under resource constraints and may benefit from strategies that leverage the strengths of multiple analytical techniques.
For sentiment analysis, a two-stage workflow was tested in which a lexicon-based model provided an initial classification that served as a screening layer before applying transformer-based refinement. In this structure, lexicon scores identified comments with clear evaluative polarity, while borderline or ambiguous cases were reclassified using DistilBERT or RoBERTa. This approach allowed the transformer models to focus computational effort on comments where contextual interpretation was most needed. The objective of this workflow was not to outperform transformer-based models alone but to evaluate whether lexicon-assisted triage could improve efficiency while maintaining classification quality.

For topic modeling, a complementary strategy was applied in which classical models provided an initial thematic structure that informed transformer-based topic extraction. Latent Dirichlet allocation and non-negative matrix factorization were used to generate preliminary topic word distributions. These topic terms were then used to guide BERTopic initialization by seeding cluster centroids with anchor terms identified from the classical models. This procedure aimed to improve topic stability and interpretability, particularly for long and formally written hearing transcripts where classical models often capture high-level themes effectively.

Hybrid workflows were evaluated using the same performance metrics applied to the standalone models, including accuracy and F1-Micro for sentiment, coherence and diversity for topics, and runtime and memory consumption for efficiency. The goal was not to replace transformer-based methods but to assess whether these simple integration strategies could offer marginal gains in interpretability or computational performance. The results of these evaluations are presented in Section 3 and are used to refine the method selection framework presented in Section 3.5.

2.6. Evaluation Framework

The evaluation framework was designed to compare the performance of classical and transformer-based models across sentiment analysis and topic modeling tasks and to assess how consistently these methods operate across the three engagement platforms. The framework integrates quantitative accuracy-based metrics, qualitative interpretability assessments, and computational efficiency measurements, allowing for a comprehensive comparison of model behavior under realistic conditions.

For sentiment analysis, model outputs were evaluated using accuracy and F1-Micro. F1-Micro was treated as the primary indicator of performance because it provides a balanced measure across sentiment classes, including minority categories, which is essential for public engagement datasets where negative comments may be less frequent but highly important. These metrics were used to examine whether models produced balanced predictions across the full range of public responses, and they were computed for each dataset individually and for combined evaluations to assess cross-platform robustness.

For topic modeling, both coherence and diversity were used to quantify the quality of generated topics. Coherence was measured using the c_v metric, which captures semantic similarity among the most representative terms within each topic.
The c_v coherence score is computed as the average normalized pointwise mutual information (NPMI) between pairs of top-ranked topic words, as defined in Equations (1) and (2):

$$c_{v}(T)=\frac{1}{|P|}\sum_{(w_{i},w_{j})\in P}\mathrm{NPMI}(w_{i},w_{j}) \tag{1}$$

$$\mathrm{NPMI}(w_{i},w_{j})=\frac{\log\left(\frac{P(w_{i},w_{j})}{P(w_{i})P(w_{j})}\right)}{-\log P(w_{i},w_{j})} \tag{2}$$

where the set P in Equation (1) contains the pairs of top-ranked words of topic T, and P(w_i) and P(w_i, w_j) denote the marginal and joint probabilities of word occurrences estimated from a reference corpus. Topic diversity was assessed through the proportion of unique terms across the full set of topics, providing insight into whether models generated distinct themes or produced repetitive or overlapping topics.

In addition to quantitative scores, topic quality was assessed through manual interpretability ratings. Two independent reviewers examined the semantic clarity and internal consistency of topics, assigning interpretability scores based on predefined criteria. This combined approach ensured that topics were evaluated not only through automatic measures but also through human-level understanding, which is essential for practical use in engagement analysis.

Computational performance was evaluated through runtime and memory consumption. Runtime was measured for both training and inference stages to account for differences in workload across models. Memory usage was monitored to compare the resource demands of embedding-based models with classical baselines. These measurements were performed under identical computational conditions to ensure comparability. Assessing efficiency is important because transportation agencies often operate under constrained computing environments and may require methods that balance performance with practical limitations.
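As a rough illustration of Equations (1) and (2) and of the topic diversity measure, the sketch below computes both on a toy corpus. It assumes that probabilities are estimated from document-level co-occurrence, with each document represented as a set of tokens; the corpus, the function names, and the boundary conventions for words that never or always co-occur are illustrative assumptions. Library implementations of c_v typically add steps, such as sliding-window counting, that are omitted here.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def npmi(w_i: str, w_j: str, docs: Sequence[Set[str]]) -> float:
    """NPMI of two words, Equation (2), from document-level co-occurrence."""
    n = len(docs)
    p_i = sum(w_i in d for d in docs) / n
    p_j = sum(w_j in d for d in docs) / n
    p_ij = sum(w_i in d and w_j in d for d in docs) / n
    if p_ij == 0:
        return -1.0  # assumed convention: words that never co-occur
    if p_ij == 1:
        return 1.0   # assumed convention: avoids dividing by -log(1) = 0
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)


def topic_coherence(top_words: List[str], docs: Sequence[Set[str]]) -> float:
    """Average NPMI over all pairs of a topic's top words, Equation (1)."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


def topic_diversity(topics: List[List[str]]) -> float:
    """Proportion of unique terms across the top words of all topics."""
    all_terms = [w for topic in topics for w in topic]
    return len(set(all_terms)) / len(all_terms)


# Toy reference corpus (hypothetical), one token set per document.
docs = [
    {"traffic", "congestion", "highway"},
    {"traffic", "congestion", "noise"},
    {"displacement", "housing", "community"},
    {"displacement", "housing", "highway"},
]

coherent = topic_coherence(["traffic", "congestion"], docs)
incoherent = topic_coherence(["traffic", "housing"], docs)
diversity = topic_diversity([["traffic", "congestion"], ["traffic", "housing"]])
```

In the toy corpus, "traffic" and "congestion" always appear together, so that topic reaches the maximum NPMI, whereas "traffic" and "housing" never co-occur. The two example topics share one of their four listed terms, which lowers the diversity score.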
Finally, cross-platform evaluation was conducted to determine how model behavior varies across Facebook posts, news articles, and hearing transcripts. For this analysis, models were trained and tested on each platform separately and then applied across platforms to observe the effects of domain shift. This approach provides insight into the extent to which models trained on one type of engagement data can generalize to other forms, a consideration that is particularly important for agencies analyzing input collected through multiple channels.

These evaluation components allow for a detailed comparison of the strengths and limitations of classical and transformer-based methods. The results of this framework are reported in Section 3 and are used to support the development of the method selection framework presented in Section 3.5.

3. Results and Discussion

3.1. Overview of Results

This section presents the comparative results of the sentiment analysis and topic modeling methods applied to Facebook posts, news articles, and public hearing transcripts. Across all platforms, transformer-based models demonstrated stronger performance than classical baselines, with particularly large gains observed for short and informal text. DistilBERT and RoBERTa produced higher accuracy and F1-Micro scores on sentiment classification, while BERTopic generated more coherent and diverse topics than latent Dirichlet allocation or non-negative matrix factorization. These patterns were consistent across the three datasets, although differences in text length and formality influenced the magnitude of improvement.

In addition to performance gains, transformer-based methods showed greater robustness to cross-platform variation. Models trained on one dataset transferred more effectively to others when using contextual embeddings, while classical models exhibited substantial declines in performance under domain shift.
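The cross-platform protocol described in Section 2.6, in which a model is trained on one platform and scored on every platform, can be sketched as a source-by-target matrix. The example below is a schematic under stated assumptions: the `train` and `accuracy` callables are placeholders for any sentiment model, the majority-label "model" exists only to make the sketch runnable, and the toy data are invented.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]                        # (text, label)
Splits = Tuple[List[Example], List[Example]]     # (train split, test split)


def cross_platform_matrix(
    datasets: Dict[str, Splits],
    train: Callable[[List[Example]], object],
    accuracy: Callable[[object, List[Example]], float],
) -> Dict[str, Dict[str, float]]:
    """Train on each source platform, then score on every target platform."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, (train_split, _) in datasets.items():
        model = train(train_split)
        matrix[source] = {
            target: accuracy(model, test_split)
            for target, (_, test_split) in datasets.items()
        }
    return matrix


def train_majority(examples: List[Example]) -> str:
    """Toy stand-in for model training: memorize the most frequent label."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


def accuracy_majority(model: object, examples: List[Example]) -> float:
    """Share of test examples whose label equals the memorized label."""
    return sum(label == model for _, label in examples) / len(examples)


# Hypothetical toy data; texts are placeholders.
toy = {
    "facebook": (
        [("c1", "negative"), ("c2", "negative"), ("c3", "positive")],
        [("c4", "negative"), ("c5", "negative")],
    ),
    "news": (
        [("a1", "neutral"), ("a2", "neutral")],
        [("a3", "neutral"), ("a4", "negative")],
    ),
}
matrix = cross_platform_matrix(toy, train_majority, accuracy_majority)
```

Diagonal entries such as `matrix["facebook"]["facebook"]` give in-platform performance, and off-diagonal entries give transfer performance. The gap between the two is one simple way to quantify sensitivity to domain shift.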
Hybrid workflows provided modest improvements in efficiency and interpretability for specific cases but did not exceed the standalone transformer-based models in overall performance. The findings from these analyses support the development of a practical method selection framework, which is presented in Section 3.5. The following subsections detail the results for sentiment analysis and topic modeling, examine cross-platform behavior, and discuss methodological tradeoffs and practical implications.

3.2. Sentiment Analysis Results

Sentiment classification performance is examined across the engagement datasets summarized in Section 2.1. Across all platforms, transformer-based models showed consistently stronger sentiment classification performance than classical baselines. Table 5 summarizes the quantitative evaluation results, including accuracy and F1-Micro for RoBERTa, DistilBERT, VADER, AFINN, and TextBlob on Facebook posts, news articles, and public hearing transcripts. Across all datasets, RoBERTa achieved the highest performance. DistilBERT ranked second in every case except F1-Micro on the NS dataset, where VADER scored slightly higher (47.75 versus 45.71). The largest performance gap appeared in the Facebook dataset, where comments are short, informal, and often contain expressions characteristic of informal social media discourse, including abbreviated phrasing and mixed evaluative cues, which challenge lexicon-based tools. These performance patterns reflect systematic differences in text length, structure, and expressive style across engagement sources and are consistent with the topic modeling results discussed in the following section.

Table 5 Sentiment classification performance across platforms.
| Metric | Dataset | TextBlob | AFINN | VADER | DistilBERT | RoBERTa |
|---|---|---|---|---|---|---|
| F1-Micro | FB | 15.35 | 32.89 | 26.31 | 46.05 | 58.69 |
| F1-Micro | NS | 40.00 | 44.71 | 47.75 | 45.71 | 70.32 |
| F1-Micro | SM2C | 31.73 | 39.56 | 36.08 | 53.91 | 70.43 |
| F1-Micro | DPH | 30.00 | 37.82 | 33.04 | 53.91 | 66.08 |
| Accuracy | FB | 37.31 | 50.00 | 44.24 | 72.34 | 87.50 |
| Accuracy | NS | 29.07 | 61.90 | 63.63 | 71.30 | 83.33 |
| Accuracy | SM2C | 30.30 | 42.10 | 38.63 | 72.22 | 86.69 |
| Accuracy | DPH | 37.50 | 45.97 | 31.62 | 83.72 | 86.00 |

Transformer-based models provided a substantial advantage when sentiment depended on contextual cues, mixed expressions, sarcasm, or platform-specific phrasing. In the Facebook dataset, both RoBERTa and DistilBERT successfully detected negative sentiment in posts containing blended emotional and informational content. Classical models, particularly TextBlob and VADER, frequently misclassified these posts as neutral due to averaging effects and limited contextual understanding.

Performance differences were smaller for news articles, reflecting the more formal and consistent writing style. Even so, transformer models demonstrated higher overall classification performance in capturing evaluative language, quotations, or editorials expressing concerns about safety, displacement, or environmental impacts.

Public hearing transcripts exhibited patterns similar to those of Facebook comments, though with distinctive challenges. Speakers often shift rapidly between technical descriptions, procedural information, and personal narratives. Transformer models were able to capture these transitions more effectively, while classical models struggled with long, multi-layered discourse.

To visually illustrate these differences, Fig. 2 presents a bar chart of accuracy across the four models and three datasets. The figure highlights the consistent superiority of transformer-based methods, with the largest improvements observed for Facebook posts and hearing transcripts.

Although classical sentiment models underperformed relative to transformer-based approaches, their behavior varied systematically across datasets with different linguistic characteristics.
Lexicon-based tools showed reasonable stability when applied to highly neutral or descriptive text, such as short news updates or administrative statements, where sentiment cues are explicit and linguistic variability is limited. In contrast, their performance degraded substantially for informal and conversational datasets characterized by shifting sentiment, implicit evaluation, or mixed expressions within a single comment, such as social media discussions and public feedback submissions.

Transformer-based models demonstrated greater robustness across these heterogeneous conditions by adapting to differences in text length, structure, and expressive style. Their advantage was most pronounced for datasets exhibiting nonstandard language use, overlapping sentiments, or contextual dependencies, where fixed sentiment dictionaries failed to capture evaluative meaning.

These results indicate that model selection for sentiment analysis in transportation engagement should be guided by dataset characteristics rather than defaulting to a single approach. In practice, lexicon-based models may remain suitable for rapid screening of formal or neutral text, while transformer-based models are better suited for analyzing complex, informal, and context-rich public discourse. This dataset-specific performance distinction provides the foundation for the topic modeling analysis presented in Section 3.3, where similar relationships between text characteristics and model behavior are examined.

3.3. Topic Modeling Results

Topic modeling performance was evaluated across the three public engagement datasets to compare classical frequency-based approaches, including latent Dirichlet allocation and non-negative matrix factorization, with an embedding-based topic modeling framework, BERTopic. Table 6 summarizes model behavior using topic coherence, diversity, interpretability, and sensitivity to platform characteristics.
Across datasets, topic quality varied systematically with text length, linguistic structure, and thematic complexity, highlighting the importance of aligning topic modeling approaches with dataset characteristics rather than relying on a single method.

Facebook posts represent a challenging environment for topic modeling due to short message length, informal language, inconsistent grammar, and frequent topic blending within individual posts. Under these conditions, classical frequency-based models exhibited notable limitations. As reflected in Table 6, latent Dirichlet allocation often produced broad or indistinct topics that merged unrelated concepts, particularly when posts combined project references with personal commentary, safety concerns, or emotional reactions. Non-negative matrix factorization improved topic separation relative to LDA by emphasizing additive word components, but it remained sensitive to fragmented phrasing and inconsistent vocabulary.

BERTopic demonstrated stronger performance on Facebook data by forming topics based on semantic similarity rather than raw token co-occurrence. By clustering contextual document embeddings, BERTopic produced topics with clearer internal consistency and reduced overlap, even when users expressed similar concerns using different linguistic expressions. This resulted in more interpretable themes that better reflected recurring discussion patterns within informal social media discourse.

Table 6 Topic modeling behavior across engagement platforms.
| Model | Dataset | Topic Structure | Topic Distinctiveness | Interpretability | Cross-Platform Behavior | Key Observations |
|---|---|---|---|---|---|---|
| LDA | Facebook | Broad, merged clusters | Overlapping themes | Difficult to interpret | Sensitive to informal text | Dominance of frequent but unrelated terms |
| LDA | News | Stable clusters | Partially overlapping | Moderately interpretable | Stable | Topics align with reporting categories |
| LDA | Public hearings | Overly broad clusters | Low separation | Low interpretability | Sensitive to document length | Multi-issue statements merged |
| NMF | Facebook | Sharper partitions | Moderate separation | Moderately interpretable | Sensitive | Improved boundaries but phrasing-dependent |
| NMF | News | Structured clusters | Moderate separation | Moderately interpretable | Stable | Additive themes with some redundancy |
| NMF | Public hearings | Fragmented clusters | Moderate separation | Moderately interpretable | Moderately sensitive | Over-splitting layered concerns |
| BERTopic | Facebook | Semantically cohesive | Distinct themes | Highly interpretable | Stable | Captures informal and varied expressions |
| BERTopic | News | Compact clusters | Distinct themes | Highly interpretable | Stable | Differentiates framing and emphasis |
| BERTopic | Public hearings | Well-separated themes | Distinct themes | Highly interpretable | Stable | Captures multi-issue narratives |

News articles exhibited a more structured writing style, longer documents, and more stable vocabulary, creating a more favorable environment for classical topic models. Both LDA and NMF generated moderately coherent topics aligned with common reporting themes such as environmental impacts, traffic conditions, and project financing. However, Table 6 indicates that classical models still exhibited partial redundancy, with closely related topics split across multiple clusters or overlapping in content. BERTopic reduced this redundancy by forming tighter semantic groupings that distinguished differences in framing within news coverage, such as procedural project updates versus articles emphasizing community impacts or policy implications.
Although the relative advantage of embedding-based modeling was less pronounced for news articles than for Facebook posts, BERTopic provided more compact and distinct topic structures, improving interpretability without relying solely on token frequency.

Public hearing transcripts produced the strongest contrast among topic modeling approaches. These documents often contain long, multi-issue statements in which speakers shift between technical critique, personal testimony, and procedural commentary within a single submission. In this setting, LDA frequently generated overly broad topics that merged distinct concerns or produced clusters driven by high-frequency connective terms. NMF yielded somewhat clearer partitions but tended to over-split topics when multiple issues were layered within the same statement.

BERTopic demonstrated the most robust performance for hearing transcripts, consistently producing well-structured topics with clear semantic boundaries. As summarized in Table 6, embedding-based clustering enabled BERTopic to capture recurring engagement themes such as displacement narratives, environmental justice concerns, safety critiques, and procedural issues, even when vocabulary varied across speakers and documents. This stability reflects the advantage of semantic representations for handling long, heterogeneous text that contains multiple co-occurring themes.

These results indicate that topic modeling performance depends strongly on dataset characteristics. Classical frequency-based models retain value for structured corpora with consistent writing style and stable vocabulary, such as edited news articles, where dominant themes are well represented through token co-occurrence. However, for datasets characterized by informal language, fragmented structure, or multi-issue narratives, embedding-based topic modeling provides more coherent topics and stronger interpretability.
These findings underscore the importance of selecting topic modeling approaches based on the linguistic and structural properties of engagement data rather than defaulting to a single method. The performance patterns of each dataset identified here form the basis for the cross-platform robustness analysis presented in Section 3.4 and inform the practice-oriented model selection framework proposed in Section 3.5.

3.4. Cross-Platform Robustness and Trade-offs

A core objective of this study was to evaluate how classical and transformer-based models perform across distinct public engagement datasets and to assess how dataset characteristics influence model behavior. The three datasets analyzed represent markedly different linguistic and structural environments. Facebook posts are short, informal, and conversational, often combining emotional tone with fragmented information. News articles are formal, edited, and information-centered, with relatively stable vocabulary and grammatical structure. Public hearing sub-datasets contain longer, multi-issue statements that integrate technical discussion, personal experience, and evaluative language.

On Facebook data, transformer-based sentiment classifiers demonstrated clear advantages over lexicon-based approaches. DistilBERT and RoBERTa were better able to interpret abbreviated expressions, platform-specific language, and mixed evaluative cues, whereas lexicon-based tools frequently misclassified sentiment due to limited context and vocabulary mismatch. Topic modeling results showed a similar pattern. BERTopic generated coherent and interpretable themes despite short document length, while LDA and non-negative matrix factorization often produced fragmented or redundant topics.

For news articles, performance differences between modeling approaches were less pronounced.
Lexicon-based sentiment tools performed reasonably well due to the formal writing style and consistent vocabulary, while transformer-based models provided modest improvements in capturing nuanced evaluative language. In topic modeling, both BERTopic and classical probabilistic models produced coherent topics, although BERTopic showed greater stability when applied across different news subsets.

Public hearing sub-datasets posed the greatest analytical challenge due to document length, multi-issue structure, and technical content. Transformer-based models consistently outperformed classical approaches in this setting. Sentiment classifiers were able to capture subtle evaluative shifts within long submissions, while lexicon-based tools struggled with polarity dilution and conflicting sentiment cues. BERTopic produced structured and semantically meaningful topics aligned with policy, environmental, and community concerns, whereas LDA and matrix factorization methods exhibited sensitivity to document heterogeneity and often merged unrelated themes.

These dataset-specific comparisons highlight important methodological trade-offs. Transformer-based models offer robust performance across heterogeneous text environments but require greater computational resources. Classical models retain practical value in constrained settings, particularly for rapid screening of formal text such as news articles. Effective model selection therefore depends on dataset characteristics, analytical objectives, and resource availability rather than reliance on a single universal approach. The next section builds on these findings by presenting a method selection framework to guide transportation agencies in choosing appropriate analytical tools for diverse public engagement data.

3.5. Practical Implications, Time-Dependent Evaluation, and Application Framework

The results of this study offer several practical implications for transportation agencies that rely on public participation to inform planning and decision making. The comparative evaluation demonstrates that transformer-based models provide the most reliable performance across diverse engagement platforms, especially when public comments contain blended emotional and factual expressions. Classical approaches remain useful for rapid screening or highly neutral content, but their limited contextual sensitivity restricts their utility for comprehensive engagement analysis. These findings highlight the importance of selecting analytical techniques that match the linguistic complexity and communication style of the underlying data.

Building on these results, the study proposes a practical framework to guide analysts in choosing appropriate sentiment and topic modeling methods for different scenarios encountered in transportation project engagement. The framework is designed to help practitioners balance accuracy, interpretability, and computational demands while also accounting for the temporal dynamics of public opinion. The approach is grounded in the structure of the datasets examined in this study and reflects patterns observed across the three platforms.

A central element of the framework involves examining how public sentiment and topic distributions shift in response to key project events. Major project milestones often trigger changes in tone and thematic emphasis, and identifying these shifts can help agencies understand when concerns escalate, when confusion declines, and when communication gaps remain unresolved. To support this analysis, sentiment and topic outputs can be evaluated across time windows associated with specific project activities.
This enables agencies to monitor not only overall sentiment but also the emergence or decline of themes such as traffic, displacement, environmental justice, or drainage concerns.

The analytical structure evaluates changes in public responses by linking model outputs to project events that historically generated shifts in engagement for the NHHIP case. These events include the release of environmental impact documentation, redesign announcements for specific segments, temporary project pauses, and periods of intensified media coverage. For each event, a baseline interval is defined using comments, posts, and articles collected in the weeks or months prior to the announcement. Sentiment and dominant topics are then extracted for each platform using RoBERTa, DistilBERT, and BERTopic, allowing the models to characterize the tone and thematic structure of public opinion under stable conditions. The same analytical steps are applied to the period after each event, producing matched sentiment scores and topic distributions that reflect the immediate reaction.

Comparing these pre-event and post-event results reveals how public concerns evolve as project conditions change. Facebook comments frequently show increases in negative sentiment following announcements related to property acquisition or relocation, with topic models identifying stronger emphasis on displacement, neighborhood fragmentation, and community cohesion. News articles present more balanced shifts, often highlighting policy debates, technical explanations, and agency responses. Hearing transcripts capture longer-form, emotionally rich statements where concerns about air quality, drainage, and community health intensify after major environmental disclosures. These cross-platform differences demonstrate that sentiment shifts are not uniform but depend on audience characteristics, message framing, and the communication channel through which information circulates.
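The pre-event and post-event comparison described above can be sketched as two matched windows around an event date. The example below is illustrative only: the record fields, the 30-day window, the event date, and the toy records are hypothetical, and the sentiment and topic labels are assumed to have been produced beforehand by models such as RoBERTa and BERTopic.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List


def window_summary(records: List[dict], start: date, end: date) -> dict:
    """Negative-sentiment share and topic shares for records in [start, end)."""
    in_window = [r for r in records if start <= r["date"] < end]
    n = len(in_window)
    if n == 0:
        return {"n": 0, "negative_share": None, "topic_share": {}}
    negative = sum(r["sentiment"] == "negative" for r in in_window) / n
    topics = Counter(r["topic"] for r in in_window)
    return {
        "n": n,
        "negative_share": negative,
        "topic_share": {t: c / n for t, c in topics.items()},
    }


def event_shift(records: List[dict], event_date: date, window_days: int = 30) -> dict:
    """Compare a baseline window before an event with a matched window after it."""
    span = timedelta(days=window_days)
    pre = window_summary(records, event_date - span, event_date)
    post = window_summary(records, event_date, event_date + span)
    topics = set(pre["topic_share"]) | set(post["topic_share"])
    topic_delta: Dict[str, float] = {
        t: post["topic_share"].get(t, 0.0) - pre["topic_share"].get(t, 0.0)
        for t in topics
    }
    negative_delta = None
    if pre["n"] and post["n"]:
        negative_delta = post["negative_share"] - pre["negative_share"]
    return {"pre": pre, "post": post,
            "negative_delta": negative_delta, "topic_delta": topic_delta}


# Hypothetical labeled records around an invented event date.
event = date(2021, 3, 1)
records = [
    {"date": date(2021, 2, 10), "sentiment": "neutral", "topic": "traffic"},
    {"date": date(2021, 2, 15), "sentiment": "negative", "topic": "traffic"},
    {"date": date(2021, 2, 20), "sentiment": "positive", "topic": "drainage"},
    {"date": date(2021, 2, 25), "sentiment": "neutral", "topic": "displacement"},
    {"date": date(2021, 3, 2), "sentiment": "negative", "topic": "displacement"},
    {"date": date(2021, 3, 5), "sentiment": "negative", "topic": "displacement"},
    {"date": date(2021, 3, 9), "sentiment": "negative", "topic": "traffic"},
    {"date": date(2021, 3, 20), "sentiment": "neutral", "topic": "drainage"},
]
shift = event_shift(records, event)
```

Running the same comparison separately for each platform would show whether a shift is confined to one channel or appears across several. Windows with no records yield `None` rather than a misleading zero.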
This structured comparison does not aim to predict future sentiment but provides a diagnostic tool that identifies which factors drive changes in public opinion. By examining how topic prevalence and sentiment scores shift within specific themes, agencies can determine whether a project milestone amplified existing concerns, introduced new issues, or improved clarity. The method also helps distinguish temporary sentiment spikes driven by social media amplification from sustained concerns that appear across multiple platforms and persist over time. This evidence-based approach supports more informed decision making by allowing agencies to evaluate whether outreach strategies or project modifications effectively address public needs.

Taken together, the findings and the analytical framework offer transportation planners a structured way to choose modeling methods and to interpret temporal shifts in public engagement. Transformer-based models are recommended for comprehensive analysis of sentiment and topic patterns, while classical tools can complement them in settings where efficiency and rapid inspection are priorities. The framework also emphasizes the value of integrating time series insights to understand how public concerns evolve, providing agencies with a practical and scientifically grounded guide for analyzing and responding to public input in large infrastructure projects.

4. Conclusions

This study demonstrates that transformer-based natural language processing models provide a more reliable and robust foundation for interpreting public engagement in transportation planning than traditional lexicon-based or probabilistic approaches. Benchmarking sentiment analysis and topic modeling methods across Facebook posts, news articles, and public hearing records shows consistent improvements in classification performance, topic coherence, interpretability, and cross-platform robustness for models such as RoBERTa, DistilBERT, and BERTopic.
These advantages are most evident in informal and context-rich settings, where public comments combine emotional expression, factual content, and multiple concerns within a single statement, challenging dictionary-based and bag-of-words methods. At the same time, classical models retain value for rapid screening and in contexts where transparency and computational simplicity are required. The results highlight that no single method is universally optimal; instead, effective analysis depends on aligning modeling approaches with data characteristics and analytical objectives.

By providing a systematic and cross-platform comparison within a real transportation project, this study establishes an evidence-based foundation for selecting appropriate text analytics methods in public sector applications. The proposed framework offers practical guidance for transportation agencies to integrate advanced language models into engagement workflows, improving the identification and tracking of public concerns while supporting more transparent and data-driven decision making. Although the analysis is based on a single case study, the findings are transferable to other infrastructure contexts that rely on large-scale and heterogeneous public input. Future research can extend this framework by incorporating multimodal data sources, such as images or geospatial information, and by exploring real-time monitoring systems to support adaptive and responsive engagement strategies.

Declarations

CRediT Authorship Contribution Statement

Alireza Shamshiri: Data curation, Methodology, Validation, Writing – original draft, Writing – review & editing. Mahdi Jaberizadeh: Data curation, Methodology, Validation, Writing – original draft, Writing – review & editing. Shah Salah Uddin Chowdhury: Writing – review & editing. Mahdis Hamisi: Writing – review & editing. Kyeong Rok Ryu: Conceptualization, Methodology, Resources, Supervision, Validation, Visualization, Writing – review & editing.
Jiseul Kim: Conceptualization, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

The authors declare that no specific funding was received for this study.

Data Availability

The data used in this study are derived from publicly available sources, including social media content, news articles, and publicly accessible transportation project documents. Processed datasets may be made available by the authors upon reasonable request.

Code Availability

The code used for analysis in this study is available from the corresponding author upon reasonable request.
https://doi.org/10.1177/2053168017720008 Nabatchi, T., Amsler, L.B., 2014. Direct Public Engagement in Local Government. The American Review of Public Administration 44, 63S-88S. https://doi.org/10.1177/0275074013519702 Ogryzek, M., Krupowicz, W., Sajnóg, N., 2021. Public Participation as a Tool for Solving Socio-Spatial Conflicts of Smart Cities and Smart Villages in the Sustainable Transport System. Remote Sensing 13, 4821. https://doi.org/10.3390/rs13234821 Raghunathan, N., Saravanakumar, K., 2023. Challenges and Issues in Sentiment Analysis: A Comprehensive Survey. IEEE Access 11, 69626–69642. https://doi.org/10.1109/ACCESS.2023.3293041 Ram, J., Titarenko, R., 2022. Using Social Media in Project Management: Behavioral, Cognitive, and Environmental Challenges. Project Management Journal 53, 236–256. https://doi.org/10.1177/87569728221079427 Reddick, C.G., Chatfield, A.T., Ojo, A., 2017. A social media text analytics framework for double-loop learning for citizen-centric public services: A case study of a local government Facebook use. Government Information Quarterly 34, 110–125. https://doi.org/10.1016/j.giq.2016.11.001 Rizun, N., Revina, A., Edelmann, N., 2025. Text analytics for co-creation in public sector organizations: a literature review-based research framework. Artif Intell Rev 58, 125. https://doi.org/10.1007/s10462-025-11112-1 Rowe, G., Frewer, L.J., 2005. A Typology of Public Engagement Mechanisms. Science, Technology, & Human Values 30, 251–290. https://doi.org/10.1177/0162243904271724 San Juan, P., Vidal, A.M., Garcia-Molla, V.M., 2017. Updating/downdating the NonNegative Matrix Factorization. Journal of Computational and Applied Mathematics 318, 59–68. https://doi.org/10.1016/j.cam.2016.11.048 Sanh, V., Debut, L., Chaumond, J., Wolf, T., 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. https://doi.org/10.48550/arXiv.1910.01108 Shamshiri, A., Ryu, K.R., McCullough, S., Park, J.Y., 2022. 
ConStory: Automatic story investigator of public perception on the mega urban infrastructure project: poster abstract, in: Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation. Presented at the BuildSys ’22: The 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, ACM, Boston Massachusetts, pp. 293–294. https://doi.org/10.1145/3563357.3567751 Shamshiri, A., Ryu, K.R., Park, J.Y., 2024a. Text mining and natural language processing in construction. Automation in Construction 158, 105200. https://doi.org/10.1016/j.autcon.2023.105200 Shamshiri, A., Ryu, K.R., Park, J.Y., 2024b. In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions. https://doi.org/10.48550/arXiv.2410.11265 Sun, W., Kobayashi, H., Nakao, S., Schmöcker, J.-D., 2023. On the Relationship Between Crowdsourced Sentiments and Mobility Trends During COVID-19: A Case Study of Kyoto. Data Sci. Transp. 5, 17. https://doi.org/10.1007/s42421-023-00080-z Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M., 2011. Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37, 267–307. https://doi.org/10.1162/COLI_a_00049 Tang, Z., Pan, X., Gu, Z., 2024. Analyzing public demands on China’s online government inquiry platform: A BERTopic-Based topic modeling study. PLoS ONE 19, e0296855. https://doi.org/10.1371/journal.pone.0296855 Tori, F., Tori, S., Keseru, I., Ginis, V., 2024. Performing Sentiment Analysis Using Natural Language Models for Urban Policymaking: An analysis of Twitter Data in Brussels. Data Sci. Transp. 6, 5. https://doi.org/10.1007/s42421-024-00090-5 Torres, E.C.M., De Picado-Santos, L.G., 2025. Sentiment Analysis and Topic Modeling in Transportation: A Literature Review. Applied Sciences 15, 6576. https://doi.org/10.3390/app15126576 Wan, X., Wang, R., Wang, M., Deng, J., Zhou, Z., Yi, X., Pan, J., Du, Y., 2022. 
Online Public Opinion Mining for Large Cross-Regional Projects: Case Study of the South-to-North Water Diversion Project in China. J. Manage. Eng. 38, 05021011. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000970 Wang, Ying, Li, H., Wu, Z., 2019. Attitude of the Chinese public toward off-site construction: A text mining study. Journal of Cleaner Production 238, 117926. https://doi.org/10.1016/j.jclepro.2019.117926 Wang, Yuan, Zhang, D., Liu, Y., Dai, B., Lee, L.H., 2019. Enhancing transportation systems via deep learning: A survey. Transportation Research Part C: Emerging Technologies 99, 144–163. https://doi.org/10.1016/j.trc.2018.12.004 Xiao, H., Hao, S., 2021. Public participation in infrastructure projects: an integrative review and prospects for the future research. Engineering, Construction and Architectural Management ahead-of-print. https://doi.org/10.1108/ECAM-06-2021-0495 Xue, Y., Temeljotov-Salaj, A., Engebø, A., Lohne, J., 2020. Multi-sector partnerships in the urban development context: A scoping review. Journal of Cleaner Production 268, 122291. https://doi.org/10.1016/j.jclepro.2020.122291 Yao, W., Qian, S., 2021. From Twitter to traffic predictor: Next-day morning traffic prediction using social media data. Transportation Research Part C: Emerging Technologies 124, 102938. https://doi.org/10.1016/j.trc.2020.102938 Yin, J., Wang, J., 2014. A dirichlet multinomial mixture model-based approach for short text clustering, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York New York USA, pp. 233–242. https://doi.org/10.1145/2623330.2623715 Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R., Smola, A., 2018. Deep Sets. https://doi.org/10.48550/arXiv.1703.06114 Zeng, L., Li, R.Y.M., Yigitcanlar, T., Zeng, H., 2023. Public Opinion Mining on Construction Health and Safety: Latent Dirichlet Allocation Approach. Buildings 13, 927. 
https://doi.org/10.3390/buildings13040927 Zha, W., Ye, Q., Li, J., Ozbay, K., 2023. A social media Data-Driven analysis for transport policy response to the COVID-19 pandemic outbreak in Wuhan, China. Transportation Research Part A: Policy and Practice 172, 103669. https://doi.org/10.1016/j.tra.2023.103669 Zhang, S., Feick, R., 2016. Understanding Public Opinions from Geosocial Media. International Journal of Geo-Information (IJGI) 5, 74. https://doi.org/10.3390/ijgi5060074 Zhao, H., Phung, D., Huynh, V., Jin, Y., Du, L., Buntine, W., 2021. Topic Modelling Meets Deep Neural Networks: A Survey, in: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, Montreal, Canada, pp. 4713–4720. https://doi.org/10.24963/ijcai.2021/638 Zhou, P., El-Gohary, N., 2016. Domain-Specific Hierarchical Text Classification for Supporting Automated Environmental Compliance Checking. J. Comput. Civ. Eng. 30, 04015057. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000513 Zoghbi, S., Vulić, I., Moens, M.-F., 2016. Latent Dirichlet allocation for linking user-generated content and e-commerce data. Information Sciences 367–368, 573–599. https://doi.org/10.1016/j.ins.2016.05.047 Additional Declarations No competing interests reported. Cite Share Download PDF Status: Under Review Version 1 posted Reviewers agreed at journal 22 Apr, 2026 Reviewers invited by journal 10 Apr, 2026 Editor assigned by journal 10 Apr, 2026 Submission checks completed at journal 10 Apr, 2026 First submitted to journal 08 Apr, 2026 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. 
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9351360","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":621975947,"identity":"1e97c2f7-e330-4cb4-a4a6-9178f8c2d7b9","order_by":0,"name":"Alireza Shamshiri","email":"","orcid":"","institution":"The University of Texas at Arlington","correspondingAuthor":false,"prefix":"","firstName":"Alireza","middleName":"","lastName":"Shamshiri","suffix":""},{"id":621975948,"identity":"9352791e-63dd-428a-a71d-cd269136d2bc","order_by":1,"name":"Mahdi Jaberizadeh","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA7klEQVRIiWNgGAWjYHCCxAcfeGzkkEUMCGlJNpwhk2aMrJqgFjZpHpvDiQ1Ea9GdkfBMgifncPra9jPGHxjb/sgzsDdvk8CnxezMgWQLiTPpudvO5JhJMLYZGDbwHCvDr+V4Q+INwx7r3G0H0tIYgFoYGySAevFqOcyQIJH4jznd7PyzZKDDDOwb5N8Q0HK8IUniAI9zgtmN5AMghyU2SPAQ0AL0C9D5aYbbbjw+JpFwzji5jSet2AKvlhs5iY//8NjIm51PbP7woUzOtp/98MYb+LQwMPAkINggJht+5SDAfoCwmlEwCkbBKBjZAAAEwks9l74lcgAAAABJRU5ErkJggg==","orcid":"","institution":"The University of Texas at Arlington","correspondingAuthor":true,"prefix":"","firstName":"Mahdi","middleName":"","lastName":"Jaberizadeh","suffix":""},{"id":621975949,"identity":"9de73f37-1865-4443-aa61-a85e620d01f3","order_by":2,"name":"Shah Salah Uddin Chowdhury","email":"","orcid":"","institution":"The 
University of Texas at Arlington","correspondingAuthor":false,"prefix":"","firstName":"Shah","middleName":"Salah Uddin","lastName":"Chowdhury","suffix":""},{"id":621975950,"identity":"a9489dca-2e3f-41fc-a9d1-fd67c83091a9","order_by":3,"name":"Mahdis Hamisi","email":"","orcid":"","institution":"The University of Texas at Arlington","correspondingAuthor":false,"prefix":"","firstName":"Mahdis","middleName":"","lastName":"Hamisi","suffix":""},{"id":621975951,"identity":"2d7640dd-6f15-48ee-9996-4c2b131e21b4","order_by":4,"name":"Kyeong Rok Ryu","email":"","orcid":"","institution":"The University of Texas at Arlington","correspondingAuthor":false,"prefix":"","firstName":"Kyeong","middleName":"Rok","lastName":"Ryu","suffix":""},{"id":621975952,"identity":"96c017c9-fca5-4d29-8021-86c3c99a2440","order_by":5,"name":"Jiseul Kim","email":"","orcid":"","institution":"The University of Texas at Arlington","correspondingAuthor":false,"prefix":"","firstName":"Jiseul","middleName":"","lastName":"Kim","suffix":""}],"badges":[],"createdAt":"2026-04-08 04:23:36","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9351360/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9351360/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":107338928,"identity":"1436fab5-e5ae-4118-93f3-aee2ec2c8a04","added_by":"auto","created_at":"2026-04-20 14:05:54","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":86146,"visible":true,"origin":"","legend":"\u003cp\u003eOverview of data collection and transformation workflow.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-9351360/v1/6a4c1f1ef3a024524bb59e9d.png"},{"id":107338929,"identity":"4ea45abf-427e-431c-9be7-87a3c265e27a","added_by":"auto","created_at":"2026-04-20 14:05:54","extension":"png","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":42152,"visible":true,"origin":"","legend":"\u003cp\u003eAccuracy of sentiment models across platforms.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-9351360/v1/b2b3e7ca3bcafd17974b69cd.png"},{"id":107486579,"identity":"dc85e17e-502d-4903-8536-85cc8aa807d7","added_by":"auto","created_at":"2026-04-22 02:38:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":797171,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9351360/v1/4d70140f-d148-40f9-8acf-0420da43beeb.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Benchmarking Transformer-Based NLP Models for Multi-Platform Public Engagement Analysis in Transportation Projects","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eLarge scale transportation and urban infrastructure projects influence mobility networks, environmental outcomes, and neighborhood wellbeing for decades, shaping access, exposure, and displacement pressures and making meaningful public engagement an essential component of equitable and legitimate planning practice. Extensive research emphasizes that public participation supports transparency, fairness, and informed and accountable decision making, particularly when communities are affected by impactful or controversial infrastructure proposals (Geekiyanage et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Rowe and Frewer, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2005\u003c/span\u003e). 
Stakeholder involvement is essential for recognizing concerns that shape long term project outcomes, since limited engagement often leads to mistrust, conflict, and project delays (Fitzpatrick and Sinclair, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2003\u003c/span\u003e; Li et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). Public participation is frequently described as a progression from informing and consulting to involving, collaborating, and empowering communities, and these stages are widely viewed as foundational for building trust and long-term project support (Babelon et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eConventional public engagement practices, such as public hearings, community workshops, surveys, and written comment submissions, have historically served as primary mechanisms for collecting public input in transportation and urban infrastructure projects (Rowe and Frewer, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2005\u003c/span\u003e). While these approaches provide structured avenues for participation, they often capture only a limited subset of perspectives because they rely on attendance, scheduled meetings, or formal submission processes, which can constrain inclusiveness and representation (Babelon et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Geekiyanage et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). As a result, valuable concerns from dispersed, marginalized, or less vocal groups may remain underrepresented, which contributes to incomplete understanding of public concerns and reduces perceived legitimacy of project decisions (Fitzpatrick and Sinclair, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). 
Despite this recognition, transportation and urban development projects often struggle to capture the full range of public perspectives during early phases, which can reduce legitimacy and amplify public resistance (Chow and Leiringer, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Xiao and Hao, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). These limitations are especially consequential in transportation megaprojects, where governance structures are multi-level, timelines are long, and impacts are spatially concentrated. In such settings, early decisions can become path dependent and difficult to reverse once alternatives, delivery strategies, and right of way commitments are institutionalized (Flyvbjerg et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2003\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn recent years, digital communication has expanded the landscape of public engagement. Social media platforms in particular generate large volumes of unstructured textual data that have been increasingly leveraged for transportation analysis, including the extraction of features from user-generated posts to support traffic prediction and system monitoring, and sentiment-based assessment of transportation policies and mobility trends using social media data (Acosta-Sequeda et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Sun et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Tori et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yao and Qian, \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
Communities now express their views increasingly through online participation platforms, news outlets, and social media, which collectively generate diverse and unstructured textual data (Hou and Lampe, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Reddick et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). These inputs vary widely in formality, tone, length, and linguistic style, creating a heterogeneous environment that challenges traditional analytic approaches. This digital expansion complements but also complicates conventional comment processes by introducing significantly larger comment volumes and more distributed channels of communication. Consequently, the need for scalable analytical methods capable of summarizing concerns across multiple platforms has become increasingly apparent (Fu, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Ogryzek et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Ram and Titarenko, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). At the same time, digital channels are not inherently more representative. Participation can reflect unequal access, language barriers, differences in time availability, and mobilization dynamics that amplify some voices while muting others. 
For this reason, multi-source text analytics is best interpreted as complementary evidence rather than a substitute for deliberative engagement, with explicit attention to representativeness and distributional impacts (Blank and Lutz, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Mellon and Prosser, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo improve scalability, early computational approaches introduced text mining and classical natural language processing techniques into public sector and infrastructure contexts. Topic modeling methods such as latent Dirichlet allocation and non-negative matrix factorization have been used to identify themes within large civic and infrastructure datasets (Jelodar et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Maier et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; San Juan et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). These approaches have contributed to understanding public concerns in transportation megaprojects, environmental schemes, and major water diversion programs (Xue et al., \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). However, classical models rely on bag of words representations that treat text as unordered tokens, limiting their ability to capture contextual meaning or semantic nuance (Chauhan and Shah, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Hagen, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). 
These limitations become particularly evident when analyzing short or noisy messages typical of online participation platforms (Chowdhury and Alzarrad, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Yin and Wang, \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Zhang and Feick, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Sentiment analysis provides another perspective for assessing public attitudes. Prior studies have applied semantic and ontology-based sentiment analysis to transportation and city feature reviews in order to extract user perceptions related to safety and travel experience (Ali et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Foundational reviews categorize sentiment techniques into lexicon based, machine learning based, and hybrid methods (Medhat et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Taboada et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). Lexicon based approaches are widely used because they require no labeled training data and can be applied quickly across public engagement datasets (Wan et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ying Wang et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). However, they often struggle with domain specific terminology, conflicting sentiment polarity across different aspects of the same issue, and subtle evaluative expressions, which can lead to incomplete or misleading interpretations of public concerns in complex infrastructure debates (Geetha and Karthika Renuka, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Raghunathan and Saravanakumar, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
This limitation matters for engagement practice because expressions of negative sentiment can reflect substantively different concerns, such as displacement risk, construction disruption, safety, procedural distrust, or perceived inequity, each of which implies different response and mitigation strategies. Accordingly, analytic outputs are most useful when they support traceable issue categorization and institutional responsiveness, rather than only aggregate measures of positivity or negativity (Bryson et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Nabatchi and Amsler, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2014\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWith advances in transformer models, natural language processing has gained significantly improved contextual understanding and semantic representation. More broadly, deep learning approaches have become central to transportation analytics, enabling the modeling of complex, high-dimensional relationships across diverse transportation systems (Yuan Wang et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Models in the BERT family learn bidirectional context and consistently outperform earlier techniques across many language tasks (Gu et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Embedding based topic models such as BERTopic combine dense vector representations with clustering techniques to generate coherent topics even for short and informal text (Grootendorst, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Zhao et al., \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
These methods have enabled new applications in public sector analytics, including analysis of government inquiry platforms, ecosystem service discussions, and cross model comparisons using real world social media data (Egger and Yu, 2022; Lu et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Tang et al., \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Reviews of natural language processing in construction and infrastructure research similarly show growing interest in advanced models due to their adaptability and ability to integrate information from multiple sources (Chung et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Ding et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Broader computational work highlights the relevance of graph attention, deep set architectures, and contextual learning for representing complex relationships in text (Do et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zaheer et al., \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Despite these developments, systematic benchmarking of classical and transformer-based models for transportation public engagement remains limited. Existing studies demonstrate the potential of text analytics for understanding public reactions to major transportation events and safety concerns (Zha et al., \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), but most rely on a single data source or a single analytical method. 
This restricts insight into how model performance varies across informal social media posts, semi formal news commentary, and long form public comments (Chowdhury and Alzarrad, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Rizun et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Zhou and El-Gohary, \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Reviews consistently emphasize that the lack of cross-platform, multi method evaluation limits generalizability and constrains agencies\u0026rsquo; ability to select suitable analytical tools (Maier et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zeng et al., \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Zoghbi et al., \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). The lack of comparative evidence creates a clear methodological need. Classical and transformer-based models have rarely been evaluated side by side on matched public engagement datasets, and little is known about how model behavior shifts across platforms, writing styles, and document lengths. Without systematic benchmarking, agencies lack evidence-based guidance for choosing methods that balance accuracy, interpretability, robustness, and computational efficiency (Torres and De Picado-Santos, \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Zoghbi et al., \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Recent work in infrastructure text analytics and document classification further underscores the value of method selection frameworks that align analytical techniques with data characteristics and project needs (D\u0026rsquo;Orazio et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
In public decision contexts, interpretability and auditability are critical considerations for engagement analysis, particularly when results are used to inform planning decisions and public communication. Analytical methods should enable transparent explanation of how themes emerge, how patterns vary across platforms or over time, and how recurring concerns can be systematically tracked and addressed within agency workflows (Kroll et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe present study addresses this need by benchmarking sentiment analysis and topic modeling approaches using multi-source public engagement data from a major transportation project. We compare transformer-based models, including DistilBERT and RoBERTa for sentiment classification and BERTopic for topic discovery, with classical baselines such as lexicon-based sentiment tools and probabilistic topic models. Our evaluation examines predictive performance, interpretability, cross-platform stability, and computational efficiency. Building on recent advances in infrastructure focused text analytics (Shamshiri et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2024a\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2024b\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), we propose a method selection framework that links data characteristics and analytical objectives to suitable natural language processing methods. The remainder of this paper describes the methodological design, presents comparative findings, and discusses implications for improving public engagement analysis in transportation planning. This study contributes to transportation data science by providing a systematic benchmarking framework for analyzing large-scale public engagement data using advanced natural language processing techniques.\u003c/p\u003e"},{"header":"2. 
Methodology","content":"\u003cp\u003eThis study evaluates the performance of classical and transformer-based natural language processing models for the analysis of public engagement data collected from a large-scale transportation infrastructure project. The analytical framework incorporates two complementary tasks, sentiment classification and topic modeling, applied to three sources of public input: social media posts, online news articles, and formal hearing transcripts. These sources represent a wide spectrum of writing styles, communication norms, and levels of formality that are common in transportation public engagement. Their diversity allows for a thorough assessment of how different modeling approaches perform when applied to varied and complex text environments.\u003c/p\u003e \u003cp\u003eThe methodological design follows a clear sequence of stages. The first stage structures the text for analysis and compiles descriptive statistics to characterize linguistic and structural properties of each dataset. The second stage applies the analytical tasks using separate families of models. Transformer-based models, including DistilBERT and RoBERTa for sentiment classification and BERTopic for topic discovery, form the central focus because of their ability to capture context and represent semantic meaning with greater nuance. Classical approaches, including lexicon-based sentiment tools and probabilistic or non-negative matrix factorization topic models, are incorporated as comparative baselines. Model performance is assessed using quantitative and descriptive measures that capture predictive accuracy, topic quality, interpretability, cross-platform stability, and computational efficiency. Sentiment classifiers are evaluated using accuracy and the F1-Micro score. Topic models are examined using coherence, diversity, and human interpretability assessments that reflect the clarity and usefulness of the extracted themes. 
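\u003c/p\u003e \u003cp\u003eThe two sentiment metrics named above can be illustrated with a minimal Python sketch. The function names and the five example labels are hypothetical and are not taken from the study; the code only shows how accuracy and micro-averaged F1 are computed from reference and predicted labels.\u003c/p\u003e

```python
def accuracy(y_true, y_pred):
    # Share of documents whose predicted label equals the reference label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    # Pool true positives, false positives, and false negatives over all classes.
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five comments (illustrative only).
y_true = ['negative', 'neutral', 'positive', 'negative', 'positive']
y_pred = ['negative', 'positive', 'positive', 'neutral', 'positive']
print(accuracy(y_true, y_pred), micro_f1(y_true, y_pred))
```

\u003cp\u003eIn a single-label, multi-class setting, each misclassified document counts once as a false positive and once as a false negative, so the pooled precision and recall coincide and micro-averaged F1 equals accuracy.\u003c/p\u003e \u003cp\u003e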
Efficiency measures, including runtime and memory usage, extend the evaluation to practical considerations that are important for transportation agencies with limited analytical capacity. Cross-platform tests are conducted to examine how model behavior changes when applied to short informal social media posts, semi-formal news articles, and long structured public comments. The following sections present each part of the analytical framework in detail.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Data Sources\u003c/h2\u003e \u003cp\u003eThis study develops its analytical basis from public engagement data extracted and curated for the North Houston Highway Improvement Project, a large transportation initiative that generated substantial public input across online and formal channels, as depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Three major corpora were constructed through these efforts: Facebook posts and comments, online news articles, and public hearing related documents. Together, these sources provide a diverse representation of informal conversation, semi-formal reporting, and detailed community feedback.\u003c/p\u003e \u003cp\u003eThe online datasets consist of Facebook content and news articles. Facebook records were collected from public pages between 2012 and 2023 through web scraping and search-based retrieval. The initial dataset contained 10,698 posts and comments. After filtering for project relevance using keywords such as \u0026ldquo;I 45 Project\u0026rdquo; and \u0026ldquo;North Houston Highway Improvement Project,\u0026rdquo; the final analyzable corpus includes 1,429 entries. News articles were collected from regional and national outlets between 2003 and 2023. The raw collection contained 8,150 articles, which were filtered to 1,170 project-specific records. 
These online sources represent the informal and semi-formal components of public engagement.\u003c/p\u003e \u003cp\u003eIn addition to online sources, formal public engagement documents were obtained from the Texas Department of Transportation. These materials include scoping meeting notes, public meeting documentation, hearing transcripts, environmental impact statements, and technical assessment reports. The raw corpus comprised 6,329 documents in various formats, including handwritten letters, scanned pages, typed comments, and structured reports. These documents were processed using optical character recognition, noise removal, and content reconstruction methods. This transformation produced 1,287 structured files. After removing documents that contained no usable text or incomplete conversions, the final analyzable hearing corpus includes 1,285 records. These long-form submissions capture detailed public concerns related to displacement, environmental justice, access, safety, and project design.\u003c/p\u003e \u003cp\u003eTogether, the Facebook, news, and public hearing sub-datasets form a heterogeneous collection of text that reflects the broad range of communication styles present in contemporary public engagement. Facebook records contain short and informal expressions, news articles provide medium-length journalistic narratives, and public hearing documents include structured submissions with varying levels of technical and policy detail. 
Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarizes the public engagement data sources used in this study.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of public engagement data sources used in this study.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eData Source\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCollection Period\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNumber of Records\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAverage Text Length\u003csup\u003e*\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eKey Characteristics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFacebook Posts (FB)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2012\u0026ndash;2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e1,429\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eShort (\u0026lt;\u0026thinsp;40 words)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eInformal posts and comments; conversational tone; emojis and abbreviations; platform specific expressions\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNews Articles (NS)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2003\u0026ndash;2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1,170\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMedium (200\u0026ndash;800 words)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSemi formal reporting; narrative structure; event-based coverage\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScoping Meeting #2 Comments (SM2C)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e468\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMedium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEarly-stage public feedback; short written comments; general transportation and access concerns\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDocumentation of Public Hearing Comments (DPH)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e316\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMedium to long\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFormal written submissions; project design, traffic, and accessibility concerns\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDraft Community Impacts and Cumulative Impacts Assessment Comments (DCIAC)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e124\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTechnical and policy focused responses; environmental and social impact discussions\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFinal Environmental Impact Statement Comments (FIESC)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e377\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eStructured formal submissions; detailed legal, environmental, and equity related concerns\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e* Average text length refers to the approximate word count after preprocessing, with text length categories defined as short (\u0026lt;\u0026thinsp;50 words), medium (200\u0026ndash;800 words), and long (\u0026gt;\u0026thinsp;1,000 words).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e 
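\u003cp\u003eThe keyword-based relevance filtering described in this section for the Facebook and news corpora can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: the two keywords are the ones quoted in the text, while the function names, the hyphen and whitespace normalization, and the three example records are hypothetical.\u003c/p\u003e

```python
# Keywords quoted in the text; a real filter would likely use a longer list.
KEYWORDS = ['I 45 Project', 'North Houston Highway Improvement Project']


def normalize(text):
    # Assumption: lowercase, treat hyphens as spaces, and collapse whitespace,
    # so that spelling variants of the same keyword are matched alike.
    return ' '.join(text.lower().replace('-', ' ').split())


def is_relevant(text, keywords=KEYWORDS):
    # A record is kept if any normalized keyword occurs in the normalized text.
    norm = normalize(text)
    return any(normalize(k) in norm for k in keywords)


def filter_records(records):
    # Return only the project-relevant records, preserving their order.
    return [r for r in records if is_relevant(r)]


# Hypothetical example records (not from the study corpus).
examples = [
    'The I-45 project will displace families.',
    'Great tacos downtown today!',
    'Comments on the North Houston Highway Improvement Project are due.',
]
relevant = filter_records(examples)
print(len(relevant), 'of', len(examples), 'records kept')
```

\u003cp\u003eSubstring matching of this kind is simple and transparent, which suits an auditable workflow, but it can miss paraphrased references to the project and would normally be paired with manual spot checks.\u003c/p\u003e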
\u003ch2\u003e2.2. Data Preprocessing\u003c/h2\u003e \u003cp\u003eThe datasets described in Section 2.1 were prepared using a unified data processing strategy to support sentiment classification and topic modeling. Although the sources differ substantially in length, structure, and formality, the processing steps were standardized to ensure consistency across datasets and to allow fair comparison between classical and transformer-based approaches. Differences in dataset usage for sentiment analysis and topic modeling were handled at the modeling stage rather than through separate preprocessing pipelines. The online datasets were already available in machine-readable format but contained noise such as repeated entries and incomplete records. Public hearing documents required additional transformation because a large portion originated from scanned pages, handwritten notes, or mixed-format submissions. These items were converted to digital text using optical character recognition, and the resulting text files were screened to remove pages or documents that contained no usable content, producing the final analyzable corpus for public hearing materials. All remaining text across the datasets underwent standard cleaning and normalization sufficient to support reliable evaluation of classical baselines. Duplicate records were removed to avoid bias from repeated articles or shared posts, and non-English entries were excluded using automated language detection. Routine normalization was applied to reduce formatting-related variation across platforms. Although transformer-based models do not require extensive preprocessing, these steps were retained to enable accurate and fair comparison with lexicon-based sentiment tools and probabilistic topic models. The analysis therefore focuses on how intrinsic dataset characteristics shape model behavior across platforms. 
Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarizes the key characteristics of each dataset that are relevant to sentiment classification and topic modeling, providing context for interpreting cross-platform performance differences discussed in Section 3.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDataset characteristics relevant to sentiment classification and topic modeling.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDataset\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAnalytical Usage\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLinguistic and Structural Characteristics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDominant Sources of Variability\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eExpected Modeling Sensitivity\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDCIAC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003eTopic modeling\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLong, highly structured submissions with dense technical and policy-oriented content\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMulti-issue narratives and overlapping thematic discussion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eProbabilistic topic models sensitive to topic overlap; contextual embeddings improve separation of related themes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFIESC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTopic modeling\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFormal institutional documents with consistent terminology and structured argumentation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLow linguistic noise and stable vocabulary\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eClassical topic models perform reliably; transformer-based models capture subtle thematic nuance\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSM2C\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTopic modeling and sentiment analysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHighly informal conversational text with frequent sentiment shifts and nonstandard expressions\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eColloquial language, abbreviations, platform-specific phrasing, sentiment wavering within single comments\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eLexicon-based sentiment tools sensitive to mixed sentiment; transformer models benefit from contextual representation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDPH\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTopic modeling and sentiment analysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEdited narrative text with structured reporting and evaluative framing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eImplicit stance and subtle tonal variation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eClassical models perform consistently; transformer-based models capture nuanced sentiment and stance\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTopic modeling and sentiment analysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInformal posts combining short statements and fragmented discussion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eHigh topical diversity and inconsistent structure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTopic fragmentation risk for LDA and NMF; embedding-based models produce more coherent topics\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTopic modeling and sentiment analysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSemi-formal articles with narrative structure and moderate lexical 
consistency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEvent-focused framing with limited slang\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eStable topic formation across models; transformers improve cross-platform generalization\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Sentiment Analysis Models\u003c/h2\u003e \u003cp\u003eSentiment analysis was conducted using transformer-based models as the primary analytical approach, supported by concise classical baselines for contextual comparison. The goal of this component was to evaluate the ability of modern natural language processing methods to classify public attitudes expressed across Facebook posts, news articles, and hearing transcripts.\u003c/p\u003e \u003cp\u003eTransformer-based modeling relied on DistilBERT and RoBERTa, both of which provide contextual word representations that support fine-grained sentiment classification. Fine-tuning was performed using pretrained checkpoints with the output layer adapted for a three-class sentiment scheme representing positive, neutral, and negative attitudes. Training used the Adam optimizer with commonly adopted learning rate and batch size settings. Early stopping based on validation loss was applied to prevent overfitting and to ensure stable convergence across multiple training runs. The dataset was divided into an 80% training split, a 10% validation split, and a 10% test split to support consistent evaluation across all models. Input text was tokenized using the respective tokenizer of each model and truncated to a maximum length of 512 tokens to accommodate variation in document size across platforms. 
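\u003c/p\u003e \u003cp\u003eThe 80/10/10 partition described above can be sketched in Python as follows. The text does not state whether the split was stratified or which random seed was used, so the unstratified shuffle, the seed value, and the function name are assumptions made only for illustration.\u003c/p\u003e

```python
import random


def split_dataset(records, seed=42):
    # Shuffle a copy so the original order of the input is left untouched.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the split boundaries.
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, roughly ten percent
    return train, val, test


# Hypothetical record identifiers standing in for labeled documents.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

\u003cp\u003eFixing the seed keeps the three subsets identical across models, which is what allows their test scores to be compared on the same documents.\u003c/p\u003e \u003cp\u003e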
Model outputs were mapped to the final sentiment categories and later aggregated for evaluation using accuracy and F1-Micro as described in Section 2.6. Transformer-based methods were applied using a consistent training and evaluation setup across datasets. Classical sentiment methods, including Valence Aware Dictionary and Sentiment Reasoner (VADER), TextBlob, and AFINN, were incorporated as minimal baselines to contextualize the performance of transformer-based models. AFINN is a lexicon-based sentiment analysis approach that assigns predefined sentiment scores to words and aggregates them to determine overall polarity. These lexicon-based approaches have been commonly employed in earlier public engagement and infrastructure research because they require no training data and can be applied directly to raw text. However, their reliance on predefined sentiment dictionaries limits their ability to interpret context-dependent expressions, domain-specific vocabulary, or mixed sentiment. Their inclusion therefore serves primarily to illustrate the performance gap between traditional sentiment tools and modern contextual models rather than to compete with transformer-based methods. 
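\u003c/p\u003e \u003cp\u003eThe word-score aggregation that lexicon-based tools rely on, and the context limitation noted above, can be illustrated with a short Python sketch. The lexicon below is a hypothetical toy dictionary, not the actual AFINN word list, and the function names and the zero threshold are assumptions made for illustration.\u003c/p\u003e

```python
# Hypothetical toy lexicon for illustration; not the actual AFINN word list.
TOY_LEXICON = {'good': 3, 'support': 2, 'improve': 2,
               'bad': -3, 'destroy': -3, 'noise': -1}


def lexicon_score(text, lexicon=TOY_LEXICON):
    # Sum the predefined scores of every word that appears in the lexicon.
    tokens = [w.strip('.,!?;:') for w in text.lower().split()]
    return sum(lexicon.get(tok, 0) for tok in tokens)


def polarity_label(score):
    # Map the aggregate score to a positive, neutral, or negative label.
    if score > 0:
        return 'positive'
    if score < 0:
        return 'negative'
    return 'neutral'


for comment in ['The expansion will destroy homes.',
                'The hearing is on Tuesday.',
                'This project is not good for our neighborhood.']:
    print(comment, polarity_label(lexicon_score(comment)))
```

\u003cp\u003eThe last comment is labeled positive because a plain word sum sees the word good and ignores the negation before it, which is the kind of context-dependent expression that contextual models are expected to handle better.\u003c/p\u003e \u003cp\u003e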
Model capacity assumptions and configuration settings are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of model capacities and assumptions for sentiment classification and topic modeling.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMaximum Input Handling\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eOutput Representation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eProcessing Logic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePretrained Basis\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eReference\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDistilBERT\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSupports sequences up to 512 tokens\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eThree sentiment classes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eContextual representation and fine tuning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDistilBERT base checkpoint\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Sanh et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2020\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRoBERTa\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSupports sequences up to 512 tokens\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eThree sentiment classes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eContextual representation and fine tuning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRoBERTa base checkpoint\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Liu et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2019\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBERTopic\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVariable document lengths depending on embedding size\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTopic clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eUnsupervised contextual topic discovery\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eSentence transformer embeddings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Grootendorst, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2022\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVADER\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo input length restriction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePolarity score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLexicon rule-based scoring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBuilt in sentiment lexicon\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Hutto and Gilbert, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2014\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTextBlob\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo input length restriction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePolarity classification\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLexicon rule-based scoring\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBuilt in sentiment lexicon\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(DeSmedt and Daelemans, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2012\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4. 
Topic Modeling\u003c/h2\u003e \u003cp\u003eTopic modeling was applied to identify the main themes expressed across Facebook posts, news articles, and public hearing submissions. This section describes the transformer-based and classical topic modeling approaches used in the study, together with the procedures for topic extraction, coherence evaluation, and interpretability assessment. Transformer-based modeling relied on BERTopic, which combines contextual embeddings with density-based clustering to generate coherent and interpretable topics. BERTopic first transforms each document into a dense semantic vector using the all-MiniLM-L6-v2 embedding model. Dimensionality is then reduced with UMAP, and clusters are formed using HDBSCAN, which allows topics to emerge naturally without requiring a predefined number of clusters. The final topic representations are constructed using class-based term frequency-inverse document frequency (c-TF-IDF) weighting, producing human-interpretable topic descriptions supported by semantically meaningful document groups. These capabilities make BERTopic well suited for mixed-length text and informal public engagement data.\u003c/p\u003e \u003cp\u003eFor comparison, two classical baselines were included: latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF). Both methods rely on bag-of-words or TF-IDF representations of text and require the number of topics to be predefined. Topic quality was evaluated through coherence and interpretability ratings, and topic counts were selected based on the combination of coherence stability and manual inspection of semantic clarity. Including these classical models provides a minimal but necessary point of reference, allowing the study to isolate the contribution of contextual embeddings within the same engagement datasets. 
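\u003c/p\u003e \u003cp\u003eThe class-based TF-IDF weighting used for the BERTopic topic representations can be sketched in pure Python. The sketch follows the formulation in Grootendorst (2022) as we understand it, in which the weight of term t in topic c is its frequency in the topic multiplied by log(1 + A / f_t), where A is the average number of words per topic and f_t is the frequency of t across all topics. The function names and the toy token lists are hypothetical, and the BERTopic library itself differs in implementation details such as normalization.\u003c/p\u003e

```python
import math
from collections import Counter


def c_tf_idf(class_docs):
    # class_docs maps a topic id to the tokens of all documents in that topic.
    term_counts = {c: Counter(tokens) for c, tokens in class_docs.items()}
    total_freq = Counter()
    for counts in term_counts.values():
        total_freq.update(counts)
    # A: average number of words per topic.
    avg_words = sum(len(t) for t in class_docs.values()) / len(class_docs)
    return {
        c: {term: tf * math.log(1 + avg_words / total_freq[term])
            for term, tf in counts.items()}
        for c, counts in term_counts.items()
    }


def top_terms(weights, c, k=3):
    # Highest-weighted terms for topic c, ties broken alphabetically.
    ranked = sorted(weights[c].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy topics (illustrative tokens only).
toy = {0: ['highway', 'noise', 'highway', 'homes'],
       1: ['bike', 'lane', 'highway', 'bike']}
weights = c_tf_idf(toy)
print(top_terms(weights, 0, 2), top_terms(weights, 1, 2))
```

\u003cp\u003eIn the second toy topic the shared word highway receives a lower weight than the topic-specific word lane, even though both occur once there, which is how the weighting pushes distinctive vocabulary to the top of each topic description.\u003c/p\u003e \u003cp\u003e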
The configuration settings and key modeling parameters for BERTopic, LDA, and NMF are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eConfiguration settings for BERTopic, LDA, and NMF topic models.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParameter\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBERTopic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLDA (baseline)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNMF (baseline)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEmbedding model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eall-MiniLM-L6-v2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBag-of-words\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTF-IDF\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDimensionality 
reduction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUMAP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClustering algorithm\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHDBSCAN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMinimum cluster size\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDetermined empirically\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTopic representation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClass based TF IDF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTop word probabilities\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTop TF IDF weights\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVectorizer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNone (uses embeddings)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eCount vectorizer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTF IDF vectorizer\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNumber of topics\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDetermined by iterating over parameters to maximize coherence and interpretability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDetermined by iterating over parameters to maximize coherence and interpretability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDetermined by iterating over parameters to maximize coherence and interpretability\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCoherence evaluation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003ec\u003c/em\u003e\u003csub\u003e\u003cem\u003ev\u003c/em\u003e\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003ec\u003c/em\u003e\u003csub\u003e\u003cem\u003ev\u003c/em\u003e\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003ec\u003c/em\u003e\u003csub\u003e\u003cem\u003ev\u003c/em\u003e\u003c/sub\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFor BERTopic, clustering parameters were selected empirically based on topic coherence and interpretability to ensure consistency of observed topic structures across datasets.\u003c/p\u003e \u003cp\u003eOnce topics were generated, the study applied a structured evaluation procedure to compare coherence, diversity, interpretability, and stability across the three datasets. 
Transformer-based models were expected to perform especially well on short and informal text, while classical models were anticipated to produce more stable topics in long-form hearing transcripts. These assumptions were evaluated in Section 3 through quantitative and qualitative analysis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5. Hybrid Workflows\u003c/h2\u003e \u003cp\u003eHybrid workflows were explored as an extension of the primary sentiment and topic modeling approaches to determine whether simple combinations of classical and transformer-based methods could provide additional stability or interpretability without significantly increasing computational demand. These workflows are motivated by practical considerations observed in public engagement analysis, where agencies often operate under resource constraints and may benefit from strategies that leverage the strengths of multiple analytical techniques.\u003c/p\u003e \u003cp\u003eFor sentiment analysis, a two-stage workflow was tested in which a lexicon-based model provided an initial classification that served as a screening layer before applying transformer-based refinement. In this structure, lexicon scores identified comments with clear evaluative polarity, while borderline or ambiguous cases were reclassified using DistilBERT or RoBERTa. This approach allowed the transformer models to focus computational effort on comments where contextual interpretation was most needed. The objective of this workflow was not to outperform transformer-based models alone but to evaluate whether lexicon-assisted triage could improve efficiency while maintaining classification quality. For topic modeling, a complementary strategy was applied in which classical models provided an initial thematic structure that informed transformer-based topic extraction. 
Latent Dirichlet allocation and non-negative matrix factorization were used to generate preliminary topic-word distributions. These topic terms were then used to guide BERTopic initialization by seeding cluster centroids with anchor terms identified from the classical models. This procedure aimed to improve topic stability and interpretability, particularly for long and formally written hearing transcripts where classical models often capture high-level themes effectively.\u003c/p\u003e \u003cp\u003eHybrid workflows were evaluated using the same performance metrics applied to the standalone models, including accuracy and F1-Micro for sentiment, coherence and diversity for topics, and runtime and memory consumption for efficiency. The goal was not to replace transformer-based methods but to assess whether these simple integration strategies could offer marginal gains in interpretability or computational performance. The results of these evaluations are presented in Section 3 and are used to refine the method selection framework presented in Section 3.5.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.6. Evaluation Framework\u003c/h2\u003e \u003cp\u003eThe evaluation framework was designed to compare the performance of classical and transformer-based models across sentiment analysis and topic modeling tasks and to assess how consistently these methods operate across the three engagement platforms. The framework integrates quantitative accuracy-based metrics, qualitative interpretability assessments, and computational efficiency measurements, allowing for a comprehensive comparison of model behavior under realistic conditions. For sentiment analysis, model outputs were evaluated using accuracy and F1-Micro. 
F1-Micro was treated as the primary indicator of performance because it provides a balanced measure across sentiment classes, including minority categories, which is essential for public engagement datasets where negative comments may be less frequent but highly important. These metrics were used to examine whether models produced balanced predictions across the full range of public responses. They were computed for each dataset individually and for combined evaluations to assess cross-platform robustness. For topic modeling, both coherence and diversity were used to quantify the quality of generated topics. Coherence was measured using the \u003cem\u003ec\u003c/em\u003e\u003csub\u003e\u003cem\u003ev\u003c/em\u003e\u003c/sub\u003e metric, which captures semantic similarity among the most representative terms within each topic.\u003c/p\u003e \u003cp\u003eThe \u003cem\u003ec\u003c/em\u003e\u003csub\u003e\u003cem\u003ev\u003c/em\u003e\u003c/sub\u003e coherence score is computed as the average normalized pointwise mutual information (NPMI) between pairs of top-ranked topic words, as defined in Equations (\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) and (\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e):\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$c_{v}(T)=\\frac{1}{\\left|P\\right|}\\sum_{(w_{i},w_{j})\\in P}\\text{NPMI}(w_{i},w_{j})$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" 
name=\"EquationSource\"\u003e\n$$\\text{NPMI}(w_{i},w_{j})=\\frac{\\log\\left(\\frac{P(w_{i},w_{j})}{P(w_{i})P(w_{j})}\\right)}{-\\log P(w_{i},w_{j})}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, \u003cem\u003eP\u003c/em\u003e in Equation (1) is the set of word pairs drawn from the top-ranked words of topic \u003cem\u003eT\u003c/em\u003e, while \u003cem\u003eP(w\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e)\u003c/em\u003e and \u003cem\u003eP(w\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003ew\u003c/em\u003e\u003csub\u003e\u003cem\u003ej\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e)\u003c/em\u003e denote the marginal and joint probabilities of word occurrences estimated from a reference corpus. Topic diversity was assessed through the proportion of unique terms across the full set of topics, providing insight into whether models generated distinct themes or produced repetitive or overlapping topics. In addition to quantitative scores, topic quality was assessed through manual interpretability ratings. Two independent reviewers examined the semantic clarity and internal consistency of topics, assigning interpretability scores based on predefined criteria. This combined approach ensured that topics were evaluated not only through automatic measures but also through human-level understanding, which is essential for practical use in engagement analysis. Computational performance was evaluated through runtime and memory consumption. Runtime was measured for both training and inference stages to account for differences in workload across models. Memory usage was monitored to compare the resource demands of embedding-based models with classical baselines. These measurements were performed under identical computational conditions to ensure comparability. 
Assessing efficiency is important because transportation agencies often operate under constrained computing environments and may require methods that balance performance with practical limitations. Finally, cross-platform evaluation was conducted to determine how model behavior varies across Facebook posts, news articles, and hearing transcripts. For this analysis, models were trained and tested on each platform separately and then applied across platforms to observe the effects of domain shift. This approach provides insight into the extent to which models trained on one type of engagement data can generalize to other forms, a consideration that is particularly important for agencies analyzing input collected through multiple channels. These evaluation components allow for a detailed comparison of the strengths and limitations of classical and transformer-based methods. The results of this framework are reported in Section 3 and are used to support the development of the method selection framework presented in Section 3.5.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results and Discussion","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Overview of Results\u003c/h2\u003e \u003cp\u003eThis section presents the comparative results of the sentiment analysis and topic modeling methods applied to Facebook posts, news articles, and public hearing transcripts. Across all platforms, transformer-based models demonstrated stronger performance than classical baselines, with particularly large gains observed for short and informal text. DistilBERT and RoBERTa produced higher accuracy and F1-Micro scores on sentiment classification, while BERTopic generated more coherent and diverse topics than latent Dirichlet allocation or non-negative matrix factorization. These patterns were consistent across the three datasets, although differences in text length and formality influenced the magnitude of improvement. 
In addition to performance gains, transformer-based methods showed greater robustness to cross-platform variation. Models trained on one dataset transferred more effectively to others when using contextual embeddings, while classical models exhibited substantial declines in performance under domain shift. Hybrid workflows provided modest improvements in efficiency and interpretability for specific cases but did not exceed the standalone transformer-based models in overall performance. The findings from these analyses support the development of a practical method selection framework, which is presented in Section 3.5. The following subsections detail the results for sentiment analysis and topic modeling, examine cross-platform behavior, and discuss methodological tradeoffs and practical implications.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Sentiment Analysis Results\u003c/h2\u003e \u003cp\u003eSentiment classification performance is examined across the engagement datasets summarized in Section 2.1. Across all platforms, transformer-based models showed consistently stronger sentiment classification performance than classical baselines. Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e summarizes the quantitative evaluation results, including accuracy and F1-Micro for RoBERTa, DistilBERT, VADER, AFINN, and TextBlob on Facebook posts, news articles, and public hearing transcripts. Across all datasets, RoBERTa achieved the highest performance, followed by DistilBERT in every case except F1-Micro on the NS dataset, where VADER (47.75) scored slightly above DistilBERT (45.71). The largest performance gap appeared in the Facebook dataset, where comments are short and informal and often contain abbreviated phrasing and mixed evaluative cues that challenge lexicon-based tools. 
These performance patterns reflect systematic differences in text length, structure, and expressive style across engagement sources and are consistent with the topic modeling results discussed in the following section.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSentiment classification performance across platforms.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDataset\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTextBlob\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAFINN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eVADER\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eDistilBERT\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003eRoBERTa\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eF1-Micro\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e32.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e26.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e46.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e58.69\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e40.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e44.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e47.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e45.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e70.32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSM2C\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e31.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e39.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e36.08\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c6\"\u003e \u003cp\u003e53.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e70.43\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDPH\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e30.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e37.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e33.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e53.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e66.08\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e37.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e50.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e44.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e72.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e87.50\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e29.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e61.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e63.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e71.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e83.33\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSM2C\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e30.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e42.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e38.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e72.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e86.69\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDPH\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e37.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e45.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e31.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e83.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e86.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTransformer-based models provided a substantial advantage when sentiment depended on contextual cues, mixed expressions, sarcasm, or platform specific phrasing. In the Facebook dataset, both RoBERTa and DistilBERT successfully detected negative sentiment in posts containing blended emotional and informational content. 
Classical models, particularly TextBlob and VADER, frequently misclassified these posts as neutral due to averaging effects and limited contextual understanding. Performance differences were smaller for news articles, reflecting the more formal and consistent writing style. Even so, transformer models demonstrated higher overall classification performance in capturing evaluative language, quotations, or editorials expressing concerns about safety, displacement, or environmental impacts. Public hearing transcripts exhibited patterns similar to those of Facebook comments, though with distinctive challenges. Speakers often shift rapidly between technical descriptions, procedural information, and personal narratives. Transformer models were able to capture these transitions more effectively, while classical models struggled with long, multi-layered discourse. To visually illustrate these differences, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents a bar chart of accuracy across the four models and three datasets. The figure highlights the consistent superiority of transformer-based methods, with the largest improvements observed for Facebook posts and hearing transcripts.\u003c/p\u003e \u003cp\u003eAlthough classical sentiment models underperformed relative to transformer-based approaches, their behavior varied systematically across datasets with different linguistic characteristics. Lexicon-based tools showed reasonable stability when applied to highly neutral or descriptive text, such as short news updates or administrative statements, where sentiment cues are explicit and linguistic variability is limited. In contrast, their performance degraded substantially for informal and conversational datasets characterized by shifting sentiment, implicit evaluation, or mixed expressions within a single comment, such as social media discussions and public feedback submissions. 
Transformer-based models demonstrated greater robustness across these heterogeneous conditions by adapting to differences in text length, structure, and expressive style. Their advantage was most pronounced for datasets exhibiting nonstandard language use, overlapping sentiments, or contextual dependencies, where fixed sentiment dictionaries failed to capture evaluative meaning. These results indicate that model selection for sentiment analysis in transportation engagement should be guided by dataset characteristics rather than defaulting to a single approach. In practice, lexicon-based models may remain suitable for rapid screening of formal or neutral text, while transformer-based models are better suited for analyzing complex, informal, and context-rich public discourse. This dataset-specific performance distinction provides the foundation for the topic modeling analysis presented in Section 3.3, where similar relationships between text characteristics and model behavior are examined.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Topic Modeling Results\u003c/h2\u003e \u003cp\u003eTopic modeling performance was evaluated across the three public engagement datasets to compare classical frequency-based approaches, including latent Dirichlet allocation and non-negative matrix factorization, with an embedding-based topic modeling framework, BERTopic. Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e summarizes model behavior using topic coherence, diversity, interpretability, and sensitivity to platform characteristics. 
Across datasets, topic quality varied systematically with text length, linguistic structure, and thematic complexity, highlighting the importance of aligning topic modeling approaches with dataset characteristics rather than relying on a single method.\u003c/p\u003e \u003cp\u003eFacebook posts represent a challenging environment for topic modeling due to short message length, informal language, inconsistent grammar, and frequent topic blending within individual posts. Under these conditions, classical frequency-based models exhibited notable limitations. As reflected in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, latent Dirichlet allocation often produced broad or indistinct topics that merged unrelated concepts, particularly when posts combined project references with personal commentary, safety concerns, or emotional reactions. Non-negative matrix factorization improved topic separation relative to LDA by emphasizing additive word components, but it remained sensitive to fragmented phrasing and inconsistent vocabulary. BERTopic demonstrated stronger performance on Facebook data by forming topics based on semantic similarity rather than raw token co-occurrence. By clustering contextual document embeddings, BERTopic produced topics with clearer internal consistency and reduced overlap, even when users expressed similar concerns using different linguistic expressions. 
This resulted in more interpretable themes that better reflected recurring discussion patterns within informal social media discourse.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTopic modeling behavior across engagement platforms.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDataset\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTopic Structure\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTopic Distinctiveness\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eInterpretability\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eCross-Platform Behavior\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eKey Observations\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e 
\u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eLDA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFacebook\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBroad, merged clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eOverlapping themes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDifficult to interpret\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eSensitive to informal text\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eDominance of frequent but unrelated terms\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNews\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStable clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePartially overlapping\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModerately interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eTopics align with reporting categories\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePublic hearings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eOverly broad clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLow separation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLow interpretability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003eSensitive to document length\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMulti-issue statements merged\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eNMF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFacebook\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSharper partitions\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eModerate separation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModerately interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eSensitive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eImproved boundaries but phrasing-dependent\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNews\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStructured clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eModerate separation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModerately interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eAdditive themes with some redundancy\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePublic hearings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFragmented clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eModerate 
separation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModerately interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eModerately sensitive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eOver-splitting layered concerns\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eBERTopic\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFacebook\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSemantically cohesive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDistinct themes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHighly interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCaptures informal and varied expressions\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNews\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCompact clusters\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDistinct themes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHighly interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eDifferentiates framing and emphasis\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePublic hearings\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eWell-separated themes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDistinct themes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHighly interpretable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eCaptures multi-issue narratives\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eNews articles exhibited a more structured writing style, longer documents, and a more stable vocabulary, creating a more favorable environment for classical topic models. Both LDA and NMF generated moderately coherent topics aligned with common reporting themes such as environmental impacts, traffic conditions, and project financing. However, Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e indicates that classical models still exhibited partial redundancy, with closely related topics split across multiple clusters or overlapping in content. BERTopic reduced this redundancy by forming tighter semantic groupings that distinguished differences in framing within news coverage, such as procedural project updates versus articles emphasizing community impacts or policy implications. Although the relative advantage of embedding-based modeling was less pronounced for news articles than for Facebook posts, BERTopic provided more compact and distinct topic structures, improving interpretability without relying solely on token frequency.\u003c/p\u003e \u003cp\u003ePublic hearing transcripts produced the strongest contrast among topic modeling approaches. 
These documents often contain long, multi-issue statements in which speakers shift between technical critique, personal testimony, and procedural commentary within a single submission. In this setting, LDA frequently generated overly broad topics that merged distinct concerns or produced clusters driven by high-frequency connective terms. NMF yielded somewhat clearer partitions but tended to over-split topics when multiple issues were layered within the same statement. BERTopic demonstrated the most robust performance for hearing transcripts, consistently producing well-structured topics with clear semantic boundaries. As summarized in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, embedding-based clustering enabled BERTopic to capture recurring engagement themes such as displacement narratives, environmental justice concerns, safety critiques, and procedural issues, even when vocabulary varied across speakers and documents. This stability reflects the advantage of semantic representations for handling long, heterogeneous text that contains multiple co-occurring themes.\u003c/p\u003e \u003cp\u003eThese results indicate that topic modeling performance depends strongly on dataset characteristics. Classical frequency-based models retain value for structured corpora with consistent writing style and stable vocabulary, such as edited news articles, where dominant themes are well represented through token co-occurrence. However, for datasets characterized by informal language, fragmented structure, or multi-issue narratives, embedding-based topic modeling provides more coherent topics and stronger interpretability. These findings underscore the importance of selecting topic modeling approaches based on the linguistic and structural properties of engagement data rather than defaulting to a single method. 
The dataset-specific performance patterns identified here form the basis for the cross-platform robustness analysis presented in Section 3.4 and inform the practice-oriented model selection framework proposed in Section 3.5.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Cross-Platform Robustness and Trade-offs\u003c/h2\u003e \u003cp\u003eA core objective of this study was to evaluate how classical and transformer-based models perform across distinct public engagement datasets and to assess how dataset characteristics influence model behavior. The three datasets analyzed represent markedly different linguistic and structural environments. Facebook posts are short, informal, and conversational, often combining emotional tone with fragmented information. News articles are formal, edited, and information-centered, with relatively stable vocabulary and grammatical structure. Public hearing sub-datasets contain longer, multi-issue statements that integrate technical discussion, personal experience, and evaluative language. On Facebook data, transformer-based sentiment classifiers demonstrated clear advantages over lexicon-based approaches. DistilBERT and RoBERTa were better able to interpret abbreviated expressions, platform-specific language, and mixed evaluative cues, whereas lexicon-based tools frequently misclassified sentiment due to limited context and vocabulary mismatch. Topic modeling results showed a similar pattern. BERTopic generated coherent and interpretable themes despite short document length, while LDA and NMF often produced fragmented or redundant topics. For news articles, performance differences between modeling approaches were less pronounced. Lexicon-based sentiment tools performed reasonably well due to the formal writing style and consistent vocabulary, while transformer-based models provided modest improvements in capturing nuanced evaluative language. 
In topic modeling, both BERTopic and classical probabilistic models produced coherent topics, although BERTopic showed greater stability when applied across different news subsets.\u003c/p\u003e \u003cp\u003ePublic hearing sub-datasets posed the greatest analytical challenge due to document length, multi-issue structure, and technical content. Transformer-based models consistently outperformed classical approaches in this setting. Sentiment classifiers were able to capture subtle evaluative shifts within long submissions, while lexicon-based tools struggled with polarity dilution and conflicting sentiment cues. BERTopic produced structured and semantically meaningful topics aligned with policy, environmental, and community concerns, whereas LDA and matrix factorization methods exhibited sensitivity to document heterogeneity and often merged unrelated themes. These dataset-specific comparisons highlight important methodological trade-offs. Transformer-based models offer robust performance across heterogeneous text environments but require greater computational resources. Classical models retain practical value in constrained settings, particularly for rapid screening of formal text such as news articles. Effective model selection therefore depends on dataset characteristics, analytical objectives, and resource availability rather than reliance on a single universal approach. The next section builds on these findings by presenting a method selection framework to guide transportation agencies in choosing appropriate analytical tools for diverse public engagement data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.5. Practical Implications, Time-Dependent Evaluation, and Application Framework\u003c/h2\u003e \u003cp\u003eThe results of this study offer several practical implications for transportation agencies that rely on public participation to inform planning and decision making. 
The comparative evaluation demonstrates that transformer-based models provide the most reliable performance across diverse engagement platforms, especially when public comments contain blended emotional and factual expressions. Classical approaches remain useful for rapid screening or highly neutral content, but their limited contextual sensitivity restricts their utility for comprehensive engagement analysis. These findings highlight the importance of selecting analytical techniques that match the linguistic complexity and communication style of the underlying data. Building on these results, the study proposes a practical framework to guide analysts in choosing appropriate sentiment and topic modeling methods for different scenarios encountered in transportation project engagement. The framework is designed to help practitioners balance accuracy, interpretability, and computational demands while also accounting for the temporal dynamics of public opinion. The approach is grounded in the structure of the datasets examined in this study and reflects patterns observed across the three platforms. A central element of the framework involves examining how public sentiment and topic distributions shift in response to key project events. Major project milestones often trigger changes in tone and thematic emphasis, and identifying these shifts can help agencies understand when concerns escalate, when confusion declines, and when communication gaps remain unresolved. To support this analysis, sentiment and topic outputs can be evaluated across time windows associated with specific project activities. This enables agencies to monitor not only overall sentiment but also the emergence or decline of themes such as traffic, displacement, environmental justice, or drainage concerns. The analytical structure evaluates changes in public responses by linking model outputs to project events that historically generated shifts in engagement for the NHHIP case. 
These events include the release of environmental impact documentation, redesign announcements for specific segments, temporary project pauses, and periods of intensified media coverage. For each event, a baseline interval is defined using comments, posts, and articles collected in the weeks or months prior to the announcement. Sentiment and dominant topics are then extracted for each platform using RoBERTa, DistilBERT, and BERTopic, allowing the models to characterize the tone and thematic structure of public opinion under stable conditions.\u003c/p\u003e \u003cp\u003eThe same analytical steps are applied to the period after each event, producing matched sentiment scores and topic distributions that reflect the immediate reaction. Comparing these pre-event and post-event results reveals how public concerns evolve as project conditions change. Facebook comments frequently show increases in negative sentiment following announcements related to property acquisition or relocation, with topic models identifying stronger emphasis on displacement, neighborhood fragmentation, and community cohesion. News articles present more balanced shifts, often highlighting policy debates, technical explanations, and agency responses. Hearing transcripts capture longer-form, emotionally rich statements where concerns about air quality, drainage, and community health intensify after major environmental disclosures. These cross-platform differences demonstrate that sentiment shifts are not uniform but depend on audience characteristics, message framing, and the communication channel through which information circulates. This structured comparison does not aim to predict future sentiment but provides a diagnostic tool that identifies which factors drive changes in public opinion. By examining how topic prevalence and sentiment scores shift within specific themes, agencies can determine whether a project milestone amplified existing concerns, introduced new issues, or improved clarity. 
The method also helps distinguish temporary sentiment spikes driven by social media amplification from sustained concerns that appear across multiple platforms and persist over time. This evidence-based approach supports more informed decision making by allowing agencies to evaluate whether outreach strategies or project modifications effectively address public needs. Taken together, the findings and the analytical framework offer transportation planners a structured way to choose modeling methods and to interpret temporal shifts in public engagement. Transformer-based models are recommended for comprehensive analysis of sentiment and topic patterns, while classical tools can complement them in settings where efficiency and rapid inspection are priorities. The framework also emphasizes the value of integrating time-series insights to understand how public concerns evolve, providing agencies with a practical and scientifically grounded guide for analyzing and responding to public input in large infrastructure projects.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusions","content":"\u003cp\u003eThis study demonstrates that transformer-based natural language processing models provide a more reliable and robust foundation for interpreting public engagement in transportation planning than traditional lexicon-based or probabilistic approaches. Benchmarking sentiment analysis and topic modeling methods across Facebook posts, news articles, and public hearing records shows consistent improvements in classification performance, topic coherence, interpretability, and cross-platform robustness for models such as RoBERTa, DistilBERT, and BERTopic. These advantages are most evident in informal and context-rich settings, where public comments combine emotional expression, factual content, and multiple concerns within a single statement, challenging dictionary-based and bag-of-words methods. 
At the same time, classical models retain value for rapid screening and in contexts where transparency and computational simplicity are required. The results highlight that no single method is universally optimal; instead, effective analysis depends on aligning modeling approaches with data characteristics and analytical objectives. By providing a systematic and cross-platform comparison within a real transportation project, this study establishes an evidence-based foundation for selecting appropriate text analytics methods in public sector applications. The proposed framework offers practical guidance for transportation agencies to integrate advanced language models into engagement workflows, improving the identification and tracking of public concerns while supporting more transparent and data-driven decision making. Although the analysis is based on a single case study, the findings are transferable to other infrastructure contexts that rely on large-scale and heterogeneous public input. 
Future research can extend this framework by incorporating multimodal data sources, such as images or geospatial information, and by exploring real-time monitoring systems to support adaptive and responsive engagement strategies.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCRediT Authorship Contribution Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAlireza Shamshiri:\u0026nbsp;\u003c/strong\u003eData curation, Methodology, Validation, Writing – original draft, Writing – review \u0026amp; editing.\u003cstrong\u003e\u0026nbsp;Mahdi Jaberizadeh:\u0026nbsp;\u003c/strong\u003eData curation, Methodology, Validation, Writing – original draft, Writing – review \u0026amp; editing.\u003cstrong\u003e\u0026nbsp;Shah Salah Uddin Chowdhury:\u0026nbsp;\u003c/strong\u003eWriting \u003cstrong\u003e–\u0026nbsp;\u003c/strong\u003ereview \u0026amp; editing.\u003cstrong\u003e\u0026nbsp;Mahdis Hamisi:\u0026nbsp;\u003c/strong\u003eWriting \u003cstrong\u003e–\u0026nbsp;\u003c/strong\u003ereview \u0026amp; editing.\u003cstrong\u003e\u0026nbsp;Kyeong Rok Ryu:\u003c/strong\u003e Conceptualization, Methodology, Resources, Supervision, Validation, Visualization, Writing – review and editing. 
\u003cstrong\u003eJiseul Kim:\u003c/strong\u003e Conceptualization, Supervision, Writing – review and editing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of Competing Interest\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that no specific funding was received for this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data used in this study are derived from publicly available sources, including social media content, news articles, and publicly accessible transportation project documents. Processed datasets may be made available by the authors upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe code used for analysis in this study is available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAcosta-Sequeda, J., Mohammadi, M., Patipati, S., Mohammadian, A., Derrible, S., 2024. Estimating Telecommuting Rates in the USA Using Twitter Sentiment Analysis. Data Sci. Transp. 6, 28. https://doi.org/10.1007/s42421-024-00114-0\u003c/li\u003e\n \u003cli\u003eAli, F., Kwak, D., Khan, P., Islam, S.M.R., Kim, K.H., Kwak, K.S., 2017. Fuzzy ontology-based sentiment analysis of transportation and city feature reviews for safe traveling. Transportation Research Part C: Emerging Technologies 77, 33\u0026ndash;48. https://doi.org/10.1016/j.trc.2017.01.014\u003c/li\u003e\n \u003cli\u003eBabelon, I., P\u0026aacute;nek, J., Falco, E., Kleinhans, R., Charlton, J., 2021. 
Between Consultation and Collaboration: Self-Reported Objectives for 25 Web-Based Geoparticipation Projects in Urban Planning. International Journal of Geo-Information (IJGI) 10, 783. https://doi.org/10.3390/ijgi10110783\u003c/li\u003e\n \u003cli\u003eBlank, G., Lutz, C., 2017. Representativeness of Social Media in Great Britain: Investigating Facebook, LinkedIn, Twitter, Pinterest, Google+, and Instagram. American Behavioral Scientist 61, 741\u0026ndash;756. https://doi.org/10.1177/0002764217717559\u003c/li\u003e\n \u003cli\u003eBryson, J.M., Quick, K.S., Slotterback, C.S., Crosby, B.C., 2013. Designing Public Participation Processes. Public Administration Review 73, 23\u0026ndash;34. https://doi.org/10.1111/j.1540-6210.2012.02678.x\u003c/li\u003e\n \u003cli\u003eChauhan, U., Shah, A., 2022. Topic Modeling Using Latent Dirichlet allocation: A Survey. ACM Computing Surveys 54, 1\u0026ndash;35. https://doi.org/10.1145/3462478\u003c/li\u003e\n \u003cli\u003eChow, V., Leiringer, R., 2020. The Practice of Public Engagement on Projects: From Managing External Stakeholders to Facilitating Active Contributors. Project Management Journal 51, 24\u0026ndash;37. https://doi.org/10.1177/8756972819878346\u003c/li\u003e\n \u003cli\u003eChowdhury, S., Alzarrad, A., 2023. Applications of Text Mining in the Transportation Infrastructure Sector: A Review. Information 14, 201. https://doi.org/10.3390/info14040201\u003c/li\u003e\n \u003cli\u003eChung, S., Moon, S., Kim, Junghoon, Kim, Jungyeon, Lim, S., Chi, S., 2023. Comparing natural language processing (NLP) applications in construction and computer science using preferred reporting items for systematic reviews (PRISMA). Automation in Construction 154, 105020. https://doi.org/10.1016/j.autcon.2023.105020\u003c/li\u003e\n \u003cli\u003eDe Smedt, T., Daelemans, W., 2012. Pattern for Python. Journal of Machine Learning Research 13, 2063\u0026ndash;2067.\u003c/li\u003e\n \u003cli\u003eDing, Y., Ma, J., Luo, X., 2022. Applications of natural language processing in construction. 
Automation in Construction 136, 104169. https://doi.org/10.1016/j.autcon.2022.104169\u003c/li\u003e\n \u003cli\u003eDo, K., Tran, T., Nguyen, T., Venkatesh, S., 2018. Attentional Multilabel Learning over Graphs: A Message Passing Approach. https://doi.org/10.48550/ARXIV.1804.00293\u003c/li\u003e\n \u003cli\u003eD\u0026rsquo;Orazio, M., Di Giuseppe, E., Bernardini, G., 2022. Automatic detection of maintenance requests: Comparison of Human Manual Annotation and Sentiment Analysis techniques. Automation in Construction 134, 104068. https://doi.org/10.1016/j.autcon.2021.104068\u003c/li\u003e\n \u003cli\u003eEgger, R., Yu, J., 2022. A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. Frontiers in Sociology 7, 886498. https://doi.org/10.3389/fsoc.2022.886498\u003c/li\u003e\n \u003cli\u003eFitzpatrick, P., Sinclair, A.J., 2003. Learning through public involvement in environmental assessment hearings. Journal of Environmental Management 67, 161\u0026ndash;174. https://doi.org/10.1016/S0301-4797(02)00204-9\u003c/li\u003e\n \u003cli\u003eFlyvbjerg, B., Bruzelius, N., Rothengatter, W., 2003. Megaprojects and Risk: An Anatomy of Ambition. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781107050891\u003c/li\u003e\n \u003cli\u003eFu, X., 2024. Natural Language Processing in Urban Planning: A Research Agenda. Journal of Planning Literature 39, 395\u0026ndash;407. https://doi.org/10.1177/08854122241229571\u003c/li\u003e\n \u003cli\u003eGeekiyanage, D., Fernando, T., Keraminiyage, K., 2021. Mapping Participatory Methods in the Urban Development Process: A Systematic Review and Case-Based Evidence Analysis. Sustainability 13, 8992. https://doi.org/10.3390/su13168992\u003c/li\u003e\n \u003cli\u003eGeetha, M.P., Karthika Renuka, D., 2021. Improving the performance of aspect based sentiment analysis using fine-tuned Bert Base Uncased model. International Journal of Intelligent Networks 2, 64\u0026ndash;69. 
https://doi.org/10.1016/j.ijin.2021.06.005\u003c/li\u003e\n \u003cli\u003eGrootendorst, M., 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. https://doi.org/10.48550/arXiv.2203.05794\u003c/li\u003e\n \u003cli\u003eGu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., Naumann, T., Gao, J., Poon, H., 2022. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans. Comput. Healthcare 3, 1\u0026ndash;23. https://doi.org/10.1145/3458754\u003c/li\u003e\n \u003cli\u003eHagen, L., 2018. Content analysis of e-petitions with topic modeling: How to train and evaluate LDA models? Information Processing \u0026amp; Management 54, 1292\u0026ndash;1307. https://doi.org/10.1016/j.ipm.2018.05.006\u003c/li\u003e\n \u003cli\u003eHou, Y., Lampe, C., 2015. Social Media Effectiveness for Public Engagement: Example of Small Nonprofits, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI \u0026rsquo;15. Association for Computing Machinery, New York, NY, USA, pp. 3107\u0026ndash;3116. https://doi.org/10.1145/2702123.2702557\u003c/li\u003e\n \u003cli\u003eHutto, C., Gilbert, E., 2014. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. ICWSM 8, 216\u0026ndash;225. https://doi.org/10.1609/icwsm.v8i1.14550\u003c/li\u003e\n \u003cli\u003eJelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., Zhao, L., 2019. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools and Applications 78, 15169\u0026ndash;15211. https://doi.org/10.1007/s11042-018-6894-4\u003c/li\u003e\n \u003cli\u003eKroll, J., Huey, J., Barocas, S., Felten, E., Reidenberg, J., Robinson, D., Yu, H., 2017. Accountable Algorithms. University of Pennsylvania Law Review 165, 633.\u003c/li\u003e\n \u003cli\u003eLi, T.H.Y., Ng, S.T., Skitmore, M., 2013. 
Evaluating stakeholder satisfaction during public participation in major infrastructure and construction projects: A fuzzy approach. Automation in Construction 29, 123\u0026ndash;135. https://doi.org/10.1016/j.autcon.2012.09.007\u003c/li\u003e\n \u003cli\u003eLiu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V., 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/ARXIV.1907.11692\u003c/li\u003e\n \u003cli\u003eLu, J., Zhang, H., Zhang, X., 2025. Cultural ecosystem services in China\u0026rsquo;s national parks and their impact on public online engagement \u0026minus; Analysis of Douyin short videos data based on BERTopic modeling. Journal for Nature Conservation 87, 126969. https://doi.org/10.1016/j.jnc.2025.126969\u003c/li\u003e\n \u003cli\u003eMaier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., H\u0026auml;ussler, T., Schmid-Petri, H., Adam, S., 2018. Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures 12, 93\u0026ndash;118. https://doi.org/10.1080/19312458.2018.1430754\u003c/li\u003e\n \u003cli\u003eMedhat, W., Hassan, A., Korashy, H., 2014. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal 5, 1093\u0026ndash;1113. https://doi.org/10.1016/j.asej.2014.04.011\u003c/li\u003e\n \u003cli\u003eMellon, J., Prosser, C., 2017. Twitter and Facebook are not representative of the general population: Political attitudes and demographics of British social media users. Research \u0026amp; Politics 4, 2053168017720008. https://doi.org/10.1177/2053168017720008\u003c/li\u003e\n \u003cli\u003eNabatchi, T., Amsler, L.B., 2014. Direct Public Engagement in Local Government. The American Review of Public Administration 44, 63S-88S. 
https://doi.org/10.1177/0275074013519702\u003c/li\u003e\n \u003cli\u003eOgryzek, M., Krupowicz, W., Sajn\u0026oacute;g, N., 2021. Public Participation as a Tool for Solving Socio-Spatial Conflicts of Smart Cities and Smart Villages in the Sustainable Transport System. Remote Sensing 13, 4821. https://doi.org/10.3390/rs13234821\u003c/li\u003e\n \u003cli\u003eRaghunathan, N., Saravanakumar, K., 2023. Challenges and Issues in Sentiment Analysis: A Comprehensive Survey. IEEE Access 11, 69626\u0026ndash;69642. https://doi.org/10.1109/ACCESS.2023.3293041\u003c/li\u003e\n \u003cli\u003eRam, J., Titarenko, R., 2022. Using Social Media in Project Management: Behavioral, Cognitive, and Environmental Challenges. Project Management Journal 53, 236\u0026ndash;256. https://doi.org/10.1177/87569728221079427\u003c/li\u003e\n \u003cli\u003eReddick, C.G., Chatfield, A.T., Ojo, A., 2017. A social media text analytics framework for double-loop learning for citizen-centric public services: A case study of a local government Facebook use. Government Information Quarterly 34, 110\u0026ndash;125. https://doi.org/10.1016/j.giq.2016.11.001\u003c/li\u003e\n \u003cli\u003eRizun, N., Revina, A., Edelmann, N., 2025. Text analytics for co-creation in public sector organizations: a literature review-based research framework. Artif Intell Rev 58, 125. https://doi.org/10.1007/s10462-025-11112-1\u003c/li\u003e\n \u003cli\u003eRowe, G., Frewer, L.J., 2005. A Typology of Public Engagement Mechanisms. Science, Technology, \u0026amp; Human Values 30, 251\u0026ndash;290. https://doi.org/10.1177/0162243904271724\u003c/li\u003e\n \u003cli\u003eSan Juan, P., Vidal, A.M., Garcia-Molla, V.M., 2017. Updating/downdating the NonNegative Matrix Factorization. Journal of Computational and Applied Mathematics 318, 59\u0026ndash;68. https://doi.org/10.1016/j.cam.2016.11.048\u003c/li\u003e\n \u003cli\u003eSanh, V., Debut, L., Chaumond, J., Wolf, T., 2020. 
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. https://doi.org/10.48550/arXiv.1910.01108\u003c/li\u003e\n \u003cli\u003eShamshiri, A., Ryu, K.R., McCullough, S., Park, J.Y., 2022. ConStory: Automatic story investigator of public perception on the mega urban infrastructure project: poster abstract, in: Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation. Presented at the BuildSys \u0026rsquo;22: The 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, ACM, Boston, Massachusetts, pp. 293\u0026ndash;294. https://doi.org/10.1145/3563357.3567751\u003c/li\u003e\n \u003cli\u003eShamshiri, A., Ryu, K.R., Park, J.Y., 2024a. Text mining and natural language processing in construction. Automation in Construction 158, 105200. https://doi.org/10.1016/j.autcon.2023.105200\u003c/li\u003e\n \u003cli\u003eShamshiri, A., Ryu, K.R., Park, J.Y., 2024b. In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions. https://doi.org/10.48550/arXiv.2410.11265\u003c/li\u003e\n \u003cli\u003eSun, W., Kobayashi, H., Nakao, S., Schm\u0026ouml;cker, J.-D., 2023. On the Relationship Between Crowdsourced Sentiments and Mobility Trends During COVID-19: A Case Study of Kyoto. Data Sci. Transp. 5, 17. https://doi.org/10.1007/s42421-023-00080-z\u003c/li\u003e\n \u003cli\u003eTaboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M., 2011. Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics 37, 267\u0026ndash;307. https://doi.org/10.1162/COLI_a_00049\u003c/li\u003e\n \u003cli\u003eTang, Z., Pan, X., Gu, Z., 2024. Analyzing public demands on China\u0026rsquo;s online government inquiry platform: A BERTopic-Based topic modeling study. PLoS ONE 19, e0296855. https://doi.org/10.1371/journal.pone.0296855\u003c/li\u003e\n \u003cli\u003eTori, F., Tori, S., Keseru, I., Ginis, V., 2024. 
Performing Sentiment Analysis Using Natural Language Models for Urban Policymaking: An analysis of Twitter Data in Brussels. Data Sci. Transp. 6, 5. https://doi.org/10.1007/s42421-024-00090-5\u003c/li\u003e\n \u003cli\u003eTorres, E.C.M., De Picado-Santos, L.G., 2025. Sentiment Analysis and Topic Modeling in Transportation: A Literature Review. Applied Sciences 15, 6576. https://doi.org/10.3390/app15126576\u003c/li\u003e\n \u003cli\u003eWan, X., Wang, R., Wang, M., Deng, J., Zhou, Z., Yi, X., Pan, J., Du, Y., 2022. Online Public Opinion Mining for Large Cross-Regional Projects: Case Study of the South-to-North Water Diversion Project in China. J. Manage. Eng. 38, 05021011. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000970\u003c/li\u003e\n \u003cli\u003eWang, Ying, Li, H., Wu, Z., 2019. Attitude of the Chinese public toward off-site construction: A text mining study. Journal of Cleaner Production 238, 117926. https://doi.org/10.1016/j.jclepro.2019.117926\u003c/li\u003e\n \u003cli\u003eWang, Yuan, Zhang, D., Liu, Y., Dai, B., Lee, L.H., 2019. Enhancing transportation systems via deep learning: A survey. Transportation Research Part C: Emerging Technologies 99, 144\u0026ndash;163. https://doi.org/10.1016/j.trc.2018.12.004\u003c/li\u003e\n \u003cli\u003eXiao, H., Hao, S., 2021. Public participation in infrastructure projects: an integrative review and prospects for the future research. Engineering, Construction and Architectural Management ahead-of-print. https://doi.org/10.1108/ECAM-06-2021-0495\u003c/li\u003e\n \u003cli\u003eXue, Y., Temeljotov-Salaj, A., Engeb\u0026oslash;, A., Lohne, J., 2020. Multi-sector partnerships in the urban development context: A scoping review. Journal of Cleaner Production 268, 122291. https://doi.org/10.1016/j.jclepro.2020.122291\u003c/li\u003e\n \u003cli\u003eYao, W., Qian, S., 2021. From Twitter to traffic predictor: Next-day morning traffic prediction using social media data. 
Transportation Research Part C: Emerging Technologies 124, 102938. https://doi.org/10.1016/j.trc.2020.102938\u003c/li\u003e\n \u003cli\u003eYin, J., Wang, J., 2014. A dirichlet multinomial mixture model-based approach for short text clustering, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York New York USA, pp. 233\u0026ndash;242. https://doi.org/10.1145/2623330.2623715\u003c/li\u003e\n \u003cli\u003eZaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R., Smola, A., 2018. Deep Sets. https://doi.org/10.48550/arXiv.1703.06114\u003c/li\u003e\n \u003cli\u003eZeng, L., Li, R.Y.M., Yigitcanlar, T., Zeng, H., 2023. Public Opinion Mining on Construction Health and Safety: Latent Dirichlet Allocation Approach. Buildings 13, 927. https://doi.org/10.3390/buildings13040927\u003c/li\u003e\n \u003cli\u003eZha, W., Ye, Q., Li, J., Ozbay, K., 2023. A social media Data-Driven analysis for transport policy response to the COVID-19 pandemic outbreak in Wuhan, China. Transportation Research Part A: Policy and Practice 172, 103669. https://doi.org/10.1016/j.tra.2023.103669\u003c/li\u003e\n \u003cli\u003eZhang, S., Feick, R., 2016. Understanding Public Opinions from Geosocial Media. International Journal of Geo-Information (IJGI) 5, 74. https://doi.org/10.3390/ijgi5060074\u003c/li\u003e\n \u003cli\u003eZhao, H., Phung, D., Huynh, V., Jin, Y., Du, L., Buntine, W., 2021. Topic Modelling Meets Deep Neural Networks: A Survey, in: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, Montreal, Canada, pp. 4713\u0026ndash;4720. https://doi.org/10.24963/ijcai.2021/638\u003c/li\u003e\n \u003cli\u003eZhou, P., El-Gohary, N., 2016. Domain-Specific Hierarchical Text Classification for Supporting Automated Environmental Compliance Checking. J. Comput. Civ. Eng. 30, 04015057. 
https://doi.org/10.1061/(ASCE)CP.1943-5487.0000513\u003c/li\u003e\n \u003cli\u003eZoghbi, S., Vulić, I., Moens, M.-F., 2016. Latent Dirichlet allocation for linking user-generated content and e-commerce data. Information Sciences 367\u0026ndash;368, 573\u0026ndash;599. https://doi.org/10.1016/j.ins.2016.05.047\u003cstrong\u003e\u003c/strong\u003e\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"data-science-for-transportation","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Data Science for Transportation](https://www.springer.com/journal/42421)","snPcode":"42421","submissionUrl":"https://submission.nature.com/new-submission/42421/3","title":"Data Science for Transportation","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"snapp","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Public participation, Transportation planning, Text mining, Sentiment analysis, Topic modeling, Decision support","lastPublishedDoi":"10.21203/rs.3.rs-9351360/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9351360/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eMeaningful public engagement is central to transportation planning, yet agencies face challenges in synthesizing large volumes of unstructured comments from hearings, news media, and social platforms. 
Although natural language processing methods are increasingly used for this purpose, clear guidance is lacking on which models are most suitable for different data characteristics and analytical goals. This study compares transformer-based and classical approaches for sentiment analysis and topic modeling in transportation contexts. A curated multi-source corpus from the North Houston Highway Improvement Project was developed, including Facebook posts, news articles, and public hearing documents. Sentiment classification using Bidirectional Encoder Representations from Transformers (BERT) models, specifically DistilBERT and RoBERTa, was benchmarked against lexicon-based approaches, while topic discovery using BERTopic was compared with probabilistic and matrix factorization models. Model performance was evaluated using classification accuracy and F1-Micro scores, topic coherence and interpretability, and cross-platform consistency. Transformer-based methods outperformed classical approaches, particularly in informal and context-rich settings where lexicon-based tools struggled with nuanced language and mixed sentiment. In addition, BERTopic produced more coherent and transferable topic structures across heterogeneous datasets, while lexicon-based methods remained useful for rapid screening. These findings show that model selection should be guided by data characteristics and analytical objectives rather than reliance on a single technique. 
The study introduces a method selection framework that provides practical guidance for transportation agencies.\u003c/p\u003e","manuscriptTitle":"Benchmarking Transformer-Based NLP Models for Multi-Platform Public Engagement Analysis in Transportation Projects","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-20 14:05:50","doi":"10.21203/rs.3.rs-9351360/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"133685600113196642164482843572509224371","date":"2026-04-22T05:56:59+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-10T12:47:25+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-10T05:17:34+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-10T05:17:29+00:00","index":"","fulltext":""},{"type":"submitted","content":"Data Science for Transportation","date":"2026-04-08T04:20:25+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"data-science-for-transportation","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Data Science for Transportation](https://www.springer.com/journal/42421)","snPcode":"42421","submissionUrl":"https://submission.nature.com/new-submission/42421/3","title":"Data Science for Transportation","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"snapp","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"971b5e96-2162-4732-8299-e63ff892c4e7","owner":[],"postedDate":"April 20th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-04-20T14:05:50+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-20 
14:05:50","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9351360","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9351360","identity":"rs-9351360","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}



License: CC-BY-4.0