Zero-Shot Traffic Flow Prediction with Large Language Models: A Comparison with Deep Learning Approaches

Yue Li, Qunshan Zhao, Mingshu Wang

Research Square
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6572761/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 14

Abstract

Traffic flow prediction plays an important role in managing urban transportation systems, helping to reduce congestion and improve road safety. Although existing deep learning models improve their prediction accuracy with complex structures, they always require large datasets for task-specific training. Recently, the rapidly developed pre-trained large language models (LLMs) have shown outstanding performance in time series prediction.
Motivated by this development, we apply two foundation models, Lag-Llama and Chronos, for zero-shot traffic flow prediction and compare their accuracy against traditional deep learning models. Our results show that LLMs outperform deep learning models in traffic flow prediction under both normal conditions and disruptive events. Unlike deep learning models, which require large-scale historical data and extensive training time for each task, pre-trained LLMs can be directly applied to datasets with different data sizes, traffic dynamics, and context lengths. We also find that LLMs with longer context lengths and larger model sizes achieve higher prediction accuracy but require longer inference times. Selecting an appropriate LLM is also crucial: models trained on a comprehensive dataset are more likely to achieve superior zero-shot performance, making them a practical and efficient choice for real-world traffic prediction applications.

Subject areas: Physical sciences/Engineering; Physical sciences/Mathematics and computing
Keywords: Traffic flows; Time-series prediction; Deep learning; Large language models (LLMs)

Introduction

Traffic flow prediction is a critical task of intelligent transportation systems (ITS) 1,2, focusing on predicting future traffic flow conditions based on historical and real-time data 3–6. Accurate traffic prediction plays a crucial role in urban planning, infrastructure management, and traffic control, which helps to reduce traffic congestion, enhance road safety, and lower environmental impacts 6–9. Due to the rapid growth of urban populations and vehicle ownership worldwide, cities face increasing challenges to maintain smooth traffic flow 10.
Effective prediction of traffic flows provides transportation authorities, city planners, and travellers with timely information that enables better decision-making, more efficient resource allocation, and an improved quality of life 2,11,12.

Traffic flow prediction models have evolved from traditional statistical approaches to deep learning models in recent years 5. Initially, statistical models such as Autoregressive Integrated Moving Average (ARIMA) dominated the field due to their simplicity, interpretability 13, and effectiveness in modelling linear 14 and stationary time-series data. However, these models often struggled to capture complex, non-linear relationships in real-world traffic conditions 15. To overcome these limitations, deep learning models have been applied broadly, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks 2,6,15–17. CNNs are good at capturing local temporal patterns in traffic data by applying convolution operations along the time dimension 18, while LSTMs are effective in modelling long-term temporal dependencies because of their gated mechanisms 19–21. Despite their improved accuracy and adaptability to complex scenarios, deep learning models typically require extensive computational resources and large datasets for training 11, which presents challenges for practical implementation and real-time applications. In addition, these models are often developed for specific tasks 22, which may lead to overfitting and reduce their ability to generalise effectively to new or unseen data.

In the context of government mobility interventions and large-scale disruptive events, such as COVID-19, traffic flow prediction becomes even more challenging. The imposed restrictions, such as lockdowns and social distancing measures, caused rapid changes in travel behavior, leading to increased irregularities and deviations from historical traffic trends 23–29.
Most existing traffic prediction models are trained only on pre-pandemic data 12,13,21,30, which limits their ability to make accurate predictions on traffic flows during periods of significant disruption. A limited number of models have attempted to address this challenge by considering external factors, such as the effects of imposed response measures 31 and the evolving status of COVID-19 32. Another study decomposed irregular traffic flows into distinct attributes and predicted them separately to improve model performance during the pandemic 33. However, these models often rely on the availability of extensive labeled datasets that capture the effects of mobility restrictions and event status over time, making them difficult to implement in real time and across various regions with differing policies.

Recently, the development of Large Language Models (LLMs) has introduced new opportunities for traffic flow prediction 5. Initially designed for natural language processing tasks, LLMs learn extensive general knowledge by pre-training on large amounts of textual data 34–36. A distinctive strength of pre-trained LLMs is their capability for zero-shot prediction, allowing them to perform various tasks without requiring task-specific training examples 37–39. Models such as GPT have demonstrated impressive zero-shot performance across numerous language understanding and generation tasks from different domains 34,40–42. Motivated by these advances, researchers have started exploring pre-trained LLMs for time series prediction 38,39,43. By specifically pre-training existing transformer-based language model architectures on large-scale time series datasets, these models learn to capture the temporal patterns and dynamics in sequential data effectively 22,44.
Inspired by this capability, we apply pre-trained time series LLMs to zero-shot traffic flow prediction tasks, enabling accurate forecasting of traffic conditions without the need for large amounts of task-specific data.

The overarching goal of this research is to present a comprehensive analysis comparing the performance of traditional deep learning models and pre-trained time series LLMs for traffic flow prediction under normal conditions and disruptive events. The contribution of this paper is threefold. First, it bridges the research gap in applying pre-trained LLMs to traffic flow prediction by comparing the performance of two groups of time-series foundation models and traditional deep learning models on the SCOOT dataset 45. Second, it evaluates prediction accuracy on heterogeneous and unusual traffic patterns, an area that has been sparsely explored in previous research. It utilises a long-term traffic flow dataset that includes a unique global pandemic period, which allows models to capture long-term traffic trends, seasonal fluctuations, and emergency-related variations, contributing to more robust predictive performance. Third, it highlights the performance gap between pre-trained LLMs that differ in model size and in the diversity and temporal coverage of their training data. A well-trained LLM with comprehensive datasets is more likely to achieve superior zero-shot performance, making it a practical and efficient choice for real-world traffic flow prediction applications.

The research outputs will support city planners in integrating pre-trained time-series LLMs into intelligent traffic control systems, enhancing their ability to respond effectively to both routine traffic conditions and unexpected disruptions. They are also valuable in helping transportation authorities and urban policymakers make informed, data-driven decisions in traffic management for future large-scale emergencies.
Background

Traffic Flow Prediction

Traffic prediction aims to forecast key factors such as vehicle flow, speed, and congestion levels 46–52. Traffic flow prediction is one of the most fundamental and widely studied tasks in ITS. Traditional approaches typically rely on statistical models, such as ARIMA 53 or the Kalman filter 54, to capture traffic patterns and seasonalities, often serving as a strong baseline when data exhibit relatively stable trends 14. While these models are relatively straightforward and interpretable 13, they may struggle with irregular fluctuations in large-scale transportation networks 15. To address these complexities, researchers introduced machine learning models such as Random Forests 55 and Support Vector Machines 56. By integrating a richer set of input features, these methods can account for additional factors like weather conditions, special events, or road incidents 57. Although more flexible than purely statistical techniques, they often struggle to achieve consistently robust performance across diverse traffic scenarios.

In recent years, deep learning approaches have shown significant promise due to their capability for automatic feature extraction and handling complex dependencies. Convolutional Neural Networks (CNNs), traditionally used for image data, have been adapted for time series prediction by applying convolutional filters along the temporal dimension 18. This allows CNNs to extract local features, detect short-term patterns, and reduce noise in traffic data. Recurrent Neural Networks (RNNs) further enhance sequence modelling by processing time series data step by step, capturing the dynamic behaviour of traffic flow 58–60.
To overcome limitations like vanishing gradients in standard RNNs, Long Short-Term Memory (LSTM) networks, an advanced type of RNN, employ gating mechanisms to maintain long-term dependencies 61–65, making them especially effective for predicting complex temporal patterns such as rush-hour peaks or irregular traffic flows.

Large Language Models

Recent progress in computer hardware and the availability of large text datasets have led to the development of transformer-based LLMs that demonstrate impressive performance on various natural language processing tasks 11,35,66,67. Language models are designed to predict the next token in a sequence by estimating the probability of each token based on those that have already appeared 22. Tokens may be characters, subwords 68, or words from a vocabulary. The transformer architecture 69 was initially developed as an encoder-decoder system for machine translation 22 and is currently applied in many popular models, such as BART 70 and T5 71. In these models, the input text is first converted into a continuous representation using an encoder, after which the decoder generates output tokens sequentially based on the representation and previous tokens. Alternatively, a decoder-only architecture, used in models like GPT-3 34 and Llama 2 66, only considers tokens before the current token when making predictions. This architecture simplifies the model's design while still achieving robust performance.

LLMs are trained on extensive collections of text and can have millions to hundreds of billions of parameters 71,72. Researchers have found that increasing the number of parameters in these models leads to better performance 34. When the number of parameters becomes large enough, LLMs perform traditional language tasks more accurately and show new abilities that smaller models lack 11. Zero-shot generalisation is one such ability where the model makes predictions on tasks it was not explicitly trained for 22.
For example, Brown et al. (2020) demonstrated that as the number of parameters grows, LLMs acquire the skill to handle new tasks without additional, task-specific training. This connection between model parameters and zero-shot generalisation highlights that LLMs not only improve their flexibility and power in language understanding and generation but also become capable of tackling challenges such as forecasting time series data.

Large Language Models for Time-series Prediction

LLMs have recently emerged as useful tools for time series forecasting thanks to their powerful sequence modelling and pattern recognition capabilities 37. PromptCast 43 first treats time series forecasting as a natural language generation task, converting numerical inputs and outputs into textual prompts, thus allowing general-purpose language models to serve as core forecasting engines. However, PromptCast often requires carefully designed prompts, which can be time-consuming in complex or domain-specific scenarios. LLMTime 38 addresses these limitations by directly tokenising time series data and treating forecasting as next-token prediction. This tokenisation strategy not only avoids extensive prompt engineering but also allows pre-trained LLMs, like GPT-3 and LLaMA, to produce robust zero-shot forecasts across a variety of benchmark datasets 22,38. However, LLMTime can be computationally and memory-intensive due to the large size of the models, and it requires careful rescaling of data to handle varying magnitudes or precision.

Unlike PromptCast and LLMTime, which repurpose large pre-trained LLMs with textual or digit-based prompts, more recent work has developed time series-specific LLMs by training foundation models on large, diverse time series datasets. Rasul et al. (2024) propose a foundation model (Lag-Llama) designed explicitly for univariate time series forecasting.
Built on a decoder-only transformer architecture that uses lag features as covariates, Lag-Llama is pre-trained on a broad collection of real-world time series across multiple domains including energy, transportation, economics, environmental science, air quality and cloud operations. This large-scale pre-training process allows it to capture a wide range of time series patterns, enabling strong performance in zero-shot generalisation. Recent concurrent work, Chronos 22, offers a similarly broad framework for pre-trained time series forecasting but adapts a standard language model architecture, T5 71, to treat real-valued time series as discrete tokens. Using scaling and uniform binning, Chronos converts continuous sequences into tokens from a fixed vocabulary. Once the data are tokenised, it trains a language model on an extensive collection of public and synthetic time series datasets, thus learning to model a wide range of temporal patterns. Chronos demonstrates superior performance across 42 benchmark datasets in both in-domain and zero-shot scenarios.

Results

Model Performance Comparison

This study compares the performance of deep learning models and LLMs for traffic flow prediction across the entire dataset and the post-COVID-19 dataset at different context lengths (input lengths) (Figs. 2–4). The evaluation results clearly distinguish between model performance on the post-COVID-19 dataset and on the entire dataset. Across all models, evaluation metrics (MAE, MAPE and RMSE) are consistently lower for the post-COVID-19 dataset. This suggests that deep learning models and LLMs perform better with stable traffic patterns. Specifically, the improvements of deep learning models are moderate, with a slight decrease in RMSE and MAPE when predicting on the post-COVID-19 dataset, suggesting that traditional deep learning models may be less sensitive to different data patterns. In contrast, LLMs demonstrate a noticeable performance gap between different datasets.
When predicting on post-COVID-19 data, the reduction in MAE, MAPE and RMSE is more evident for LLMs than for deep learning models, particularly for Lag-Llama, indicating improved model adaptability to stable traffic dynamics.

Regarding context length, increasing the context length leads to improved prediction performance across LLMs but limited improvement in deep learning models. CNN and LSTM show relatively worse performance with increased context length, especially for the entire dataset. While LSTM maintains a relatively stable trend, CNN exhibits more fluctuations, particularly for longer context lengths, suggesting potential overfitting or inefficiencies in capturing long-term dependencies. LLMs consistently reduce MAE, MAPE and RMSE as context length increases, although with slight fluctuations. Lag-Llama, in particular, demonstrates the most significant improvement, reinforcing its ability to apply long historical sequences effectively. These findings highlight the superior capacity of LLMs to process and utilize long-term dependencies in time series prediction.

Although LLMs generally benefit from longer context lengths, their performance declines when the context is short, often falling behind traditional deep learning models. In particular, Lag-Llama consistently yields higher MAE and RMSE values than both CNN and LSTM across all context lengths when evaluated on the full dataset. This can be attributed to the zero-shot nature of these pre-trained models, which rely on broadly learned universal patterns from large-scale, high-quality data rather than task-specific training 11. The entire dataset, which includes more heterogeneous and unusual traffic patterns, makes zero-shot prediction more demanding for these models. In contrast, the post-COVID-19 dataset exhibits more stable and universal traffic flow dynamics, enabling the LLMs to utilise their extensive pre-training more effectively and outperform the traditional deep learning models.
In addition, the consistent performance improvements observed with longer context lengths demonstrate the importance of providing LLMs with sufficient historical information to enhance their zero-shot time series predictions.

Training Time and Inference Time Analysis

Figure 5 illustrates the trade-off between training time and Mean Absolute Error (MAE) for CNN and LSTM models on the post-COVID-19 and entire datasets. The CNN consistently achieves lower MAE and demonstrates superior computational efficiency on both datasets, outperforming the LSTM by all measures. Specifically, the CNN completes training epochs in considerably less time, suggesting that its convolution-based structure may capture features of time series data faster and with fewer parameters. Meanwhile, the LSTM exhibits an evident increase in training time, requiring more than twice that of the CNN, and tends to produce higher MAE values, especially on the entire dataset. This difference is likely due to the larger data size and the sequential nature of LSTM, which requires more complex computations per timestep. However, the increased training time and larger dataset help bridge the performance gap between LSTM and CNN. This suggests that LSTM may achieve better accuracy by capturing complex temporal dependencies when provided with sufficient data and training time.

We also compare the inference time for all the models, including deep learning models and LLMs. It is important to note that Lag-Llama is intentionally omitted from this figure due to its exceptionally high inference times 5. It requires approximately 132 seconds per epoch on the post-COVID-19 dataset and 875 seconds per epoch on the entire dataset, substantially longer than the inference times observed for the other models.
For both the post-COVID-19 and entire datasets, CNN and LSTM exhibit relatively short inference times but moderately higher MAEs, while the Chronos models show a broader range of inference times, generally increasing with model size. Similar to the training time comparison, LSTM consistently exhibits longer inference times and higher MAEs than CNN. This highlights that the sequential structure of LSTM requires significantly more computational resources, which may limit its applicability in time-sensitive and resource-constrained scenarios compared to CNN. For Chronos models, the larger configurations tend to achieve lower MAEs at the cost of longer inference durations. This is because Chronos models are probabilistic time series models that rely on autoregressively sampling from the predicted distribution 22, which leads to longer inference times than deep learning models producing point predictions. This effect is particularly significant for larger Chronos models with more parameters, which require more computational resources. A key observation is that Chronos (Small) achieves shorter inference times and better performance than Chronos (Mini). This indicates that specific Chronos configurations may effectively balance computational effort and predictive accuracy. Beyond this point, increasing model size leads to significantly longer inference times but only slightly better performance.

Model Size Analysis

Figure 7 compares performance, averaged over the different context lengths, against model size, which is measured by the number of parameters. Specifically, the number of parameters for CNN and LSTM depends on their architecture, hyperparameters, and context length, while pre-trained LLMs maintain a fixed number of parameters. From Fig. 7, it is clear that deep learning models and LLMs demonstrate different performance trends as their sizes change.
For LLMs, increasing the number of parameters generally leads to improved prediction accuracy on both the post-COVID-19 dataset and the entire dataset, suggesting that additional parameters help capture complex traffic patterns. Meanwhile, CNN slightly outperforms LSTM on the post-COVID-19 dataset, which might be because CNN is more effective at capturing local temporal features on smaller data sizes. LSTM remains stable on the entire dataset, possibly due to its ability to exploit longer sequence dependencies for complex traffic flow prediction. In addition, Lag-Llama's performance decreases noticeably on the entire dataset, while Chronos models demonstrate robust performance across different sizes, indicating that they effectively balance complexity and accuracy.

Discussion

In this study, we compare the performance of traditional deep learning models and cutting-edge LLMs on traffic flow prediction. The deep learning models, CNN and LSTM, are trained by us on the SCOOT dataset, while Lag-Llama and Chronos are pre-trained time series LLMs, which can be applied for zero-shot prediction. We have found that LLMs with longer context lengths and larger model sizes tend to achieve higher prediction accuracy, while deep learning models show limited improvement. This suggests that deep learning models may suffer from overfitting or inefficiencies in capturing long-term dependencies. In contrast, LLMs demonstrate a superior ability to process and utilise long-term dependencies in time series prediction. However, there is a trade-off between model performance and inference time, as increasing context length and model size requires more computational resources. Moreover, although LLMs are more sensitive to traffic patterns, they outperform traditional deep learning models in both usual and unusual traffic conditions, with Chronos demonstrating particularly strong performance.
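The accuracy comparisons above are reported in terms of MAE, MAPE and RMSE. As a minimal sketch, assuming the standard textbook definitions of these metrics (this excerpt does not state the exact formulas used), they could be computed as follows. The function names, the toy values, and the choice to skip zero-valued observations in MAPE are illustrative assumptions, not details taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumption: observations equal to zero are skipped to avoid
    division by zero; other conventions exist.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical hourly traffic counts and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Because MAPE is relative to the observed value, it can behave quite differently from MAE and RMSE on low-flow periods, which is one reason to report all three together.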
In our experiments, we evaluate training and inference times separately to provide a clear understanding of the computational demands of each stage. However, for a comprehensive comparison between deep learning models and LLMs, we calculate the total running time, combining training and inference durations for the deep learning models. Specifically, we calculate the cumulative running time over 100 training epochs for deep learning models and one prediction epoch for all models. As shown in Fig. 8, the running times of Chronos are significantly lower than those of other models, particularly for the small dataset. While the running time of CNN is shorter than that of Lag-Llama in this case, this comparison overlooks hyperparameter tuning, which is essential in the deep learning training process for each prediction task. The tuning process involves testing dozens of hyperparameter combinations 101,102, with each test requiring an amount of time equivalent to the running time observed here (since the inference times of CNN and LSTM are too short to be considered). This leads to a practical running time several times greater than the running time reported here. As a result, the running time for LLMs is shorter than that of deep learning models. In addition, deep learning models require separate training for each prediction task. In contrast, pre-trained LLMs can be directly applied to different prediction tasks across various datasets with varying context lengths. This significantly simplifies model deployment and streamlines forecasting pipelines, eliminating the need for task-specific training.

A key limitation of LLMs is the performance gap between different models. Our findings indicate that Lag-Llama shows only limited improvements in prediction accuracy compared to deep learning models, while Chronos consistently demonstrates strong performance across various context lengths.
To further illustrate this performance gap, we used another publicly available and widely used traffic flow dataset collected by the Caltrans Performance Measurement System (PeMS) in California, USA from January 1 to December 31, 2018. As shown in Fig. 9, the results are consistent with those from the SCOOT dataset in our research, with Chronos significantly outperforming Lag-Llama across all context lengths. Since both Chronos and Lag-Llama are pre-trained on a diverse set of publicly available datasets, the observed performance difference may stem from their training data. Comparing their datasets, we find that Chronos is trained on seven different datasets, while Lag-Llama uses only three. In addition, Chronos incorporates synthetic data generated using Gaussian processes to enhance its training process. The training data for Chronos ranges from 2009 to 2022, while Lag-Llama's training data is limited to 2009 and 2014–2016. This suggests that training LLMs on a more extensive corpus of time series data improves zero-shot performance. Moreover, model size plays a crucial role in performance. Chronos offers models ranging from 8M (Tiny) to 710M (Large) parameters, which are significantly larger than Lag-Llama's 2.45M parameters, and this likely contributes to Chronos's superior predictive accuracy.

In summary, this research highlights that LLMs can achieve excellent zero-shot performance in traffic flow prediction under both normal conditions and disruptive events. Unlike traditional deep learning models, which require extensive task-specific training and domain expertise, pre-trained LLMs can be directly applied to datasets with different data sizes, traffic dynamics, and context lengths. In addition, while deep learning models require large-scale historical data for training and validation, LLMs can make accurate zero-shot predictions with only a small subset of contextual data.
These advantages can address critical limitations of traditional deep learning methods, such as time-consuming training processes, overfitting due to task-specific model training, and limited generalisation capabilities. Additionally, they can contribute to the practical deployment of traffic prediction models across diverse and dynamically changing urban scenarios. However, choosing an appropriate LLM is crucial, as performance depends on factors such as training data diversity, time coverage, and model size. A well-trained LLM with a comprehensive dataset is more likely to achieve superior zero-shot performance, making it a practical and efficient choice for real-world traffic prediction applications.

Using LLMs for traffic flow prediction and other time series analysis tasks in urban settings presents both opportunities and challenges. The development of LLMs has been one of the fastest-growing areas over the past two and a half years, since OpenAI released the first version of ChatGPT 3.5 in November 2022. As foundation models continue to evolve in both capability and efficiency, the prediction accuracy of LLM-based time series models is expected to improve correspondingly. Although LLMs exhibit superior performance in this research, with faster inference and higher accuracy, they still have several limitations. Firstly, the training process of LLMs is typically very expensive and time-consuming, making it less accessible for academic institutions or small research groups to extend or revise pre-trained models. Furthermore, the efficiency of LLMs heavily relies on the underlying foundation models, and the most advanced foundation models are often closed-source and developed by leading companies in generative AI. As a result, most researchers can only rely on "less advanced" or "older generation" publicly available foundation models to design fine-tuned models for time series prediction.
In the future, closer collaboration between AI companies and academia is necessary to enable further customised model development. Lastly, the limited scope and diversity of training data 22 for time-series LLMs lead to performance disparities and prediction biases across foundation models. Future work can focus on building and maintaining large-scale, diverse traffic datasets to improve model training and predictive accuracy across various scenarios. The models can also be further evaluated in regions with limited data, such as those in the Global South, to assess the effectiveness of using foundation models trained on datasets from Global North countries in different geographical contexts. Governments can encourage collaborative data-sharing initiatives between public and private sectors to expand the availability of high-quality traffic data for model development.

Data and Methods

Datasets

This section details the dataset employed to evaluate the predictive performance of deep learning models and LLMs, with real-world traffic data collected via a Split Cycle Offset Optimisation Technique (SCOOT) based Urban Traffic Control system 45. SCOOT uses a network of sensors to capture traffic flow data across the road network. The dataset includes traffic flows from the Glasgow City Council area over four consecutive years, from October 1, 2019, to September 30, 2023, which includes the COVID-19 pandemic period 9. There are 470 sensors in the SCOOT dataset, which record traffic flows at 60-minute intervals. Figure 1 compares the attributes of the SCOOT dataset with other traffic flow datasets applied in recent traffic flow prediction research from 2022 to 2024 15,21,30,52,73–82. Most datasets cover no more than one year during normal periods and use time intervals of less than 30 minutes 20,83–93, while the SCOOT dataset covers a longer period than many previous studies.
Although one study utilised a seven-year traffic dataset, which is longer than the SCOOT dataset, it covers only a period of stable traffic conditions. In contrast, the SCOOT dataset captures traffic flows before, during, and after COVID-19, providing valuable insights into the drastic changes in human mobility patterns in response to government mobility interventions during a period of significant disruption. Its long-term coverage allows models to capture long-term traffic trends, seasonal fluctuations, and emergency-related variations, contributing to more robust predictive performance. In addition, the bubble size in Fig. 1 represents the number of data points, and the SCOOT dataset contains a relatively large volume of observations. This large volume of data benefits both the deep learning models and the LLMs by improving their ability to learn complex traffic patterns and reducing the risk of overfitting.

Deep Learning and Large Language Models

We select two widely used deep learning models for time series analysis, CNN 94 and LSTM 95 , and two recently developed LLM-based time series prediction models, Lag-Llama 44 and Chronos 22 , to assess traffic flow prediction performance. The details of the models are outlined as follows:

Convolutional Neural Network (CNN)

CNN 94 captures temporal patterns by applying convolutional filters to sliding windows of sequential data. This approach effectively detects local trends and short-term dependencies, enhancing prediction accuracy. Our CNN model employs two one-dimensional convolutional layers, each using the ReLU activation function. The first layer specifies the input shape based on the sequence length and extracts initial local features from the traffic flow data. A subsequent convolutional layer further refines these features. A max-pooling layer then reduces the dimensionality of the resulting feature maps, preserving essential representations while reducing computational cost.
Finally, the feature maps are flattened and regularised using a dropout layer to prevent overfitting before a dense layer produces forecasts over six time steps.

Long Short-Term Memory (LSTM)

LSTM 95 effectively models long-term dependencies and sequential relationships in time series data through gated memory cells, which retain relevant historical information while addressing vanishing gradient issues. Our LSTM model includes two hidden layers. The first LSTM layer, configured with the tanh activation function and set to return sequences, processes the input sequence to extract temporal features and passes the entire sequence to the subsequent layer. The second LSTM layer further refines these features using the tanh activation function. Each LSTM layer is followed by a dropout layer to reduce the risk of overfitting. Finally, a dense layer with six neurons is employed for multi-step prediction.

Lag-Llama

Lag-Llama 44 is a foundation model for univariate probabilistic time series prediction, built on the decoder-only transformer architecture of LLaMA 36 . Lag-Llama tokenises input data by constructing lagged feature vectors from historical observations at predetermined lag intervals. These lags are derived from multiple standard frequencies: quarterly, monthly, weekly, daily, hourly, and second-level. Each token also incorporates temporal covariates derived from date-time features such as hour-of-day, day-of-week, and month-of-year, enriching the representation and providing contextual information to the model. The input tokens, composed of lagged features and temporal covariates, are projected into a hidden representation and passed through a series of causally masked transformer decoder layers, employing RMSNorm and Rotary Positional Encoding (RoPE) at each attention layer.
The final output from the transformer decoder is fed into a distribution head that predicts the parameters of a Student's t-distribution (degrees of freedom, mean, and scale) used for probabilistic forecasting. Lag-Llama applies a robust scaling procedure based on median and interquartile range (IQR) normalisation to handle numerical scale variations across different time series, which significantly improves training stability and forecast accuracy. During training, Lag-Llama minimises the negative log-likelihood of the forecast distribution for future values. Lag-Llama is pre-trained on 27 datasets categorised into six domains: air quality, transportation, economics, nature, energy, and cloud operations. The pre-training corpus includes 7,965 univariate series consisting of about 352 million data tokens. This extensive and diverse corpus improves Lag-Llama's ability to generalise and deliver strong zero-shot forecasting performance.

Chronos

Chronos 22 is a pre-trained probabilistic forecasting framework designed specifically for time series, built on transformer-based language models. The core innovation of Chronos is to treat time series forecasting like a natural language modelling task. It achieves this by converting continuous time series data into discrete tokens in two steps: scaling and quantisation. Firstly, Chronos scales each series individually using mean scaling, which normalises the data by the mean of the absolute historical values. Then, the scaled data are quantised into discrete bins, forming tokens from a fixed-size vocabulary. This vocabulary includes numerical bins and special tokens such as PAD (for padding sequences to equal lengths) and EOS (end-of-sequence). Chronos primarily employs variants of the T5 family of transformer-based language models 71 , ranging from smaller models with approximately 8 million parameters to larger models of up to 710 million.
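The scaling and quantisation steps described above can be illustrated with a minimal sketch. It is not the Chronos implementation: the function names, the bin range of [-4, 4], the 32 uniform bins, and the example counts are all assumptions chosen for readability, and the special PAD and EOS tokens are left out.

```python
import math

def mean_scale(series):
    """Divide a series by the mean of its absolute values (mean scaling)."""
    scale = sum(abs(x) for x in series) / len(series)
    if scale == 0:
        scale = 1.0  # assumption: leave an all-zero series unchanged
    return [x / scale for x in series], scale

def quantise(scaled, low=-4.0, high=4.0, n_bins=32):
    """Map each scaled value to the index of a uniform bin on [low, high].

    The range and bin count are illustrative assumptions, not Chronos settings.
    """
    width = (high - low) / n_bins
    tokens = []
    for x in scaled:
        idx = math.floor((x - low) / width)
        tokens.append(min(max(idx, 0), n_bins - 1))  # clip out-of-range values
    return tokens

def dequantise(tokens, scale, low=-4.0, high=4.0, n_bins=32):
    """Map tokens back to bin centres, then undo the mean scaling."""
    width = (high - low) / n_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]

flows = [130.0, 70.0, 95.0, 105.0]  # made-up hourly vehicle counts
scaled, scale = mean_scale(flows)
tokens = quantise(scaled)
recovered = dequantise(tokens, scale)
```

With uniform bins, the round trip loses at most half a bin width (times the scale) for values inside the bin range, which is the price paid for turning regression into classification over a fixed vocabulary.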
These models are trained in five sizes, named Tiny (8M), Mini (20M), Small (46M), Base (200M), and Large (710M), using a cross-entropy loss function, which effectively frames regression as a classification task over discrete quantised bins. Chronos models provide probabilistic forecasts by autoregressively sampling from the learned categorical distributions and then mapping the sampled tokens back to continuous numerical values via dequantisation and inverse scaling. To enhance training, Chronos uses two data augmentation methods: TSMixup, which creates augmented series through convex combinations of existing series, and KernelSynth, which generates synthetic series using Gaussian processes. Chronos was pre-trained on 28 publicly available datasets from domains including transport, retail, energy, finance, healthcare, and climate science, complemented by synthetic datasets. The comprehensive benchmark evaluation involved 42 datasets to assess in-domain and zero-shot forecasting performance.

Model Implementations

To evaluate model performance on both usual traffic patterns and unusual traffic dynamics, we divide the SCOOT dataset into two subgroups: the entire dataset, which includes the pandemic period, and the post-COVID-19 dataset. The entire dataset contains hourly traffic flow data from October 1, 2019, to September 30, 2023, while the post-COVID-19 dataset, delimited using the Stringency Index 96 , includes data from June 3, 2022. Each subgroup is chronologically divided into training (60%), validation (20%), and testing (20%) sets, with a 60-minute interval for both training and prediction. To assess the impact of context length on prediction accuracy, we train models with varying context lengths. Specifically, the context length is set to 24 × n hours, where n ranges from 1 to 21, limited by the available computational memory.
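The chronological split and the context-length grid described above can be sketched as follows. This is an illustration only: the helper names `chronological_split` and `make_windows` are hypothetical, the toy series stands in for one sensor's hourly counts, and the six-step forecasting horizon is taken from the six-output dense layers described for the CNN and LSTM models.

```python
def chronological_split(series, train_pct=60, val_pct=20):
    """Split a series in time order into training, validation, and test parts."""
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]

def make_windows(series, context_length, horizon=6):
    """Pair each context window with the `horizon` values that follow it."""
    inputs, targets = [], []
    last_start = len(series) - context_length - horizon
    for start in range(last_start + 1):
        split = start + context_length
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return inputs, targets

# Context lengths of 24 x n hours for n = 1..21, as stated in the text.
context_lengths = [24 * n for n in range(1, 22)]

hourly_flow = list(range(100))  # toy stand-in for one sensor's hourly counts
train, val, test = chronological_split(hourly_flow)
X, Y = make_windows(train, context_length=context_lengths[0])
```

Splitting before windowing keeps every training target strictly earlier in time than the validation and test data, which is the point of a chronological rather than random split.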
These context lengths are used to predict traffic flow over the next 6 hours, a common forecasting horizon in existing research 19,83,97,98 . To train the deep learning models, we experiment with different hyperparameters for each context length and dataset and select the best configuration for comparison. All experiments are repeated 10 times, and we record the mean value of each evaluation metric to reduce the randomness of individual training runs. The Adam optimiser is employed over 100 epochs, and the best hyperparameters of each model are shown in Table A1. Mean Square Error (MSE) is used as the loss function during model training 99 :

$$Loss=\frac{1}{n}\sum_{i=1}^{n}{(\widehat{y}_{i}-y_{i})}^{2}$$ (1)

All models are implemented in Python 3.12.4 and executed on a 64-bit Ubuntu server with two Intel Xeon Gold 6334 8-core processors @ 3.60 GHz, 125 GB of RAM, and an NVIDIA A100 GPU with 24 GB of memory. The deep learning models are developed using TensorFlow 2.17.0, and the LLMs are run with PyTorch 2.3.1.

Evaluation Metrics

The accuracy of traffic prediction models is typically evaluated using performance metrics that quantify their ability to forecast traffic conditions. In this research, we employ three widely recognised metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). RMSE and MAE assess absolute errors, while MAPE evaluates relative errors 100 . For all metrics, lower values indicate better prediction performance. The formulas are as follows:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{(\widehat{y}_{i}-y_{i})}^{2}}$$ (2)

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|\widehat{y}_{i}-y_{i}\right|$$ (3)

$$MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{\widehat{y}_{i}-y_{i}}{y_{i}}\right|\times 100$$ (4)

where \(y_{i}\) and \(\widehat{y}_{i}\) represent the ground truth and the predicted value for the \(i\)-th traffic flow sample.
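As a worked illustration of Eqs. (2) to (4), the three metrics can be computed directly from paired ground-truth and predicted values. The sketch below uses only the Python standard library; the function names and the example numbers are made up for illustration and are not results from this study. Note that Eq. (4) is undefined when a ground-truth value is zero, so such samples would need separate handling.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error, following Eq. (2)."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean Absolute Error, following Eq. (3)."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent, following Eq. (4).

    Assumes every ground-truth value is non-zero.
    """
    n = len(y_true)
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n * 100

y_true = [100.0, 200.0, 400.0]  # illustrative hourly flows
y_pred = [110.0, 180.0, 400.0]  # illustrative forecasts

print(f"RMSE={rmse(y_true, y_pred):.3f}")
print(f"MAE={mae(y_true, y_pred):.3f}")
print(f"MAPE={mape(y_true, y_pred):.3f}%")
```

In this toy case the two non-zero errors (10 and 20 vehicles) are both 10% of their ground-truth values, so MAPE weights them equally while RMSE is dominated by the larger absolute error.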
Here, \(n\) is the total number of prediction samples.

Declarations

Acknowledgement

The first author is funded by the China Scholarship Council (CSC) from the Ministry of Education of P.R. China. Dr Qunshan Zhao has received the ESRC's ongoing support for the Urban Big Data Centre (UBDC) [ES/L011921/1 and ES/S007105/1], and the Royal Society International Exchange Scheme [IEC\NSFC\223042]. The authors thank the anonymous reviewers for their insightful comments and suggestions on an earlier version of this manuscript.

CRediT authorship contribution statement

Yue Li: Conceptualization; Data curation; Formal analysis; Methodology; Visualization; Writing – original draft; Writing – review & editing. Qunshan Zhao: Conceptualization; Writing – review & editing; Supervision; Resources; Project administration; Funding acquisition. Mingshu Wang: Conceptualization; Writing – review & editing; Supervision.

Data availability

The data used in this paper are publicly available. Full details about the data acquisition can be found in the documentation available at the GitHub repository: https://github.com/YueLi-0816/trafficFlowPrediction.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Yin, X. et al. Deep Learning on Traffic Prediction: Methods, Analysis, and Future Directions. IEEE Transactions on Intelligent Transportation Systems 23, 4927–4943 (2022).
2. Liu, Y., Rasouli, S., Wong, M., Feng, T. & Huang, T. RT-GCN: Gaussian-based spatiotemporal graph convolutional network for robust traffic prediction. Information Fusion 102, 102078 (2024).
3. Chen, H. & Rakha, H. A. Real-time travel time prediction using particle filtering with a non-explicit state-transition model. Transportation Research Part C: Emerging Technologies 43, 112–126 (2014).
4. Guo, G. & Yuan, W.
Short-term traffic speed forecasting based on graph attention temporal convolutional networks. Neurocomputing 410, 387–393 (2020).
5. Liu, C. et al. Spatial-Temporal Large Language Model for Traffic Prediction. Preprint at https://doi.org/10.48550/arXiv.2401.10134 (2024).
6. Kim, Y., Tak, H., Kim, S. & Yeo, H. A hybrid approach of traffic simulation and machine learning techniques for enhancing real-time traffic prediction. Transportation Research Part C: Emerging Technologies 160, 104490 (2024).
7. Chen, J. et al. Traffic flow matrix-based graph neural network with attention mechanism for traffic flow prediction. Information Fusion 104, 102146 (2024).
8. Fan, J. et al. RGDAN: A random graph diffusion attention network for traffic prediction. Neural Networks 172, 106093 (2024).
9. Li, Y., Zhao, Q. & Wang, M. Understanding urban traffic flows in response to COVID-19 pandemic with emerging urban big data in Glasgow. Cities 154, 105381 (2024).
10. Kalašová, A. & Stacho, M. Smooth traffic flow as one of the most important factors for safety increase in road transport. Transport (2006).
11. Ren, Y. et al. TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2403.02221 (2024).
12. Sattarzadeh, A. R., Kutadinata, R. J., Pathirana, P. N. & Huynh, V. T. A novel hybrid deep learning model with ARIMA Conv-LSTM networks and shuffle attention layer for short-term traffic flow prediction. Transportmetrica A: Transport Science (2025).
13. Zhang, Y., Tang, S. & Yu, G. An interpretable hybrid predictive model of COVID-19 cases using autoregressive model and LSTM. Sci Rep 13, 6708 (2023).
14. Wang, Y., Jia, R., Dai, F. & Ye, Y. Traffic Flow Prediction Method Based on Seasonal Characteristics and SARIMA-NAR Model. Applied Sciences 12, 2190 (2022).
15. Kashyap, A. A. et al. Traffic flow prediction models - A review of deep learning techniques. Cogent Engineering 9 (2022).
16. Li, Y., Chai, S., Ma, Z. & Wang, G.
A Hybrid Deep Learning Framework for Long-Term Traffic Flow Prediction. IEEE Access 9, 11264–11271 (2021).
17. Méndez, M., Merayo, M. G. & Núñez, M. Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model. Engineering Applications of Artificial Intelligence 121, 106041 (2023).
18. Li, Y. et al. Modeling Temporal Patterns with Dilated Convolutions for Time-Series Forecasting. ACM Transactions on Knowledge Discovery from Data 16, 14:1–14:22 (2021).
19. Wu, D., Peng, K., Wang, S. & Leung, V. C. M. Spatial-Temporal Graph Attention Gated Recurrent Transformer Network for Traffic Flow Forecasting. IEEE Internet of Things Journal 11, 14267–14281 (2024).
20. Xia, Z., Zhang, Y., Yang, J. & Xie, L. Dynamic spatial-temporal graph convolutional recurrent networks for traffic flow forecasting. Expert Systems with Applications 240 (2024).
21. Zhao, Y. et al. Dual flow fusion graph convolutional network for traffic flow prediction. International Journal of Machine Learning and Cybernetics 15, 3425–3437 (2024).
22. Ansari, A. F. et al. Chronos: Learning the Language of Time Series. Preprint at https://doi.org/10.48550/arXiv.2403.07815 (2024).
23. Parr, S., Wolshon, B., Renne, J., Murray-Tuite, P. & Kim, K. Traffic Impacts of the COVID-19 Pandemic: Statewide Analysis of Social Separation and Activity Restriction. Natural Hazards Review 21, 04020025 (2020).
24. Warren, M. S. & Skillman, S. W. Mobility Changes in Response to COVID-19. Preprint at https://doi.org/10.48550/arXiv.2003.14228 (2020).
25. Borkowski, P., Jażdżewska-Gutta, M. & Szmelter-Jarosz, A. Lockdowned: Everyday mobility changes in response to COVID-19. Journal of Transport Geography 90, 102906 (2021).
26. Nouvellet, P. et al. Reduction in mobility and COVID-19 transmission. Nat Commun 12, 1090 (2021).
27. Patra, S. S., Chilukuri, B. R. & Vanajakshi, L. Analysis of road traffic pattern changes due to activity restrictions during COVID-19 pandemic in Chennai. Transportation Letters 13, 473–481 (2021).
28. Ebrahim Shaik, Md.
& Ahmed, S. An overview of the impact of COVID-19 on road traffic safety and travel behavior. Transportation Engineering 9, 100119 (2022).
29. Hu, Y. et al. Impacts of Covid-19 mode shift on road traffic. Preprint at https://doi.org/10.48550/arXiv.2005.01610 (2023).
30. Ma, C., Dai, G. & Zhou, J. Short-Term Traffic Flow Prediction for Urban Road Sections Based on Time Series Analysis and LSTM_BILSTM Method. IEEE Transactions on Intelligent Transportation Systems 23, 5615–5624 (2022).
31. Ghanim, M. S., Muley, D. & Kharbeche, M. ANN-Based traffic volume prediction models in response to COVID-19 imposed measures. Sustainable Cities and Society 81, 103830 (2022).
32. Liapis, S. et al. A methodology using classification for traffic prediction: Featuring the impact of COVID-19. Integrated Computer-Aided Engineering 28, 417–435 (2021).
33. Li, H. et al. Traffic Flow Forecasting in the COVID-19: A Deep Spatial-temporal Model Based on Discrete Wavelet Transformation. ACM Trans. Knowl. Discov. Data 17, 64:1–64:28 (2023).
34. Brown, T. et al. Language Models are Few-Shot Learners. in Advances in Neural Information Processing Systems vol. 33 1877–1901 (Curran Associates, Inc., 2020).
35. Chung, H. W. et al. Scaling Instruction-Finetuned Language Models. Preprint at https://doi.org/10.48550/arXiv.2210.11416 (2022).
36. Touvron, H. et al. LLaMA: Open and Efficient Foundation Language Models. Preprint at https://doi.org/10.48550/arXiv.2302.13971 (2023).
37. Mirchandani, S. et al. Large Language Models as General Pattern Machines. Preprint at https://doi.org/10.48550/arXiv.2307.04721 (2023).
38. Gruver, N., Finzi, M., Qiu, S. & Wilson, A. G. Large Language Models Are Zero-Shot Time Series Forecasters. Preprint at https://doi.org/10.48550/arXiv.2310.07820 (2024).
39. Liu, H., Zhao, Z., Wang, J., Kamarthi, H. & Prakash, B. A. LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting. Preprint at https://doi.org/10.48550/arXiv.2402.16132 (2024).
40. OpenAI et al.
GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2024).
41. Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving Language Understanding by Generative Pre-Training. (2018).
42. Radford, A. et al. Language Models are Unsupervised Multitask Learners. (2019).
43. Xue, H. & Salim, F. D. PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting. Preprint at https://doi.org/10.48550/arXiv.2210.08964 (2023).
44. Rasul, K. et al. Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. Preprint at https://doi.org/10.48550/arXiv.2310.08278 (2024).
45. Li, Y., Zhao, Q. & Wang, M. High-resolution traffic flow data from the urban traffic control system in Glasgow. Sci Data 12, 253 (2025).
46. Park, J. et al. Real time vehicle speed prediction using a Neural Network Traffic Model. in The 2011 International Joint Conference on Neural Networks 2991–2996 (2011). doi:10.1109/IJCNN.2011.6033614.
47. Jia, Y., Wu, J. & Du, Y. Traffic speed prediction using deep learning method. in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC) 1217–1222 (2016). doi:10.1109/ITSC.2016.7795712.
48. Jia, Y., Wu, J., Ben-Akiva, M., Seshadri, R. & Du, Y. Rainfall-integrated traffic speed prediction using deep learning method. IET Intelligent Transport Systems 11, 531–536 (2017).
49. Akhtar, M. & Moridpour, S. A Review of Traffic Congestion Prediction Using Artificial Intelligence. Journal of Advanced Transportation 2021, 8878011 (2021).
50. Chen, C., Liu, Z., Wan, S., Luan, J. & Pei, Q. Traffic Flow Prediction Based on Deep Learning in Internet of Vehicles. IEEE Transactions on Intelligent Transportation Systems 22, 3776–3789 (2021).
51. Aljebreen, M. et al. Enhancing Traffic Flow Prediction in Intelligent Cyber-Physical Systems: A Novel Bi-LSTM-Based Approach With Kalman Filter Integration. IEEE Transactions on Consumer Electronics 70, 1889–1902 (2024).
52. Alvi, M., Minerva, R., Rajapaksha, P., Crespi, N. & Alvi, U.
Traffic Flow Prediction in Sensor-Limited Areas Through Synthetic Sensing and Data Fusion. IEEE Sensors Letters 8 (2024).
53. Van Der Voort, M., Dougherty, M. & Watson, S. Combining Kohonen maps with ARIMA time series models to forecast traffic flow. Transportation Research Part C: Emerging Technologies 4, 307–318 (1996).
54. Okutani, I. & Stephanedes, Y. J. Dynamic prediction of traffic volume through Kalman filtering theory. Transportation Research Part B: Methodological 18, 1–11 (1984).
55. Leshem, G. & Ritov, Y. Traffic Flow Prediction using Adaboost Algorithm with Random Forests as a Weak Learner. International Journal of Electrical and Computer Engineering 21 (2007).
56. Tang, J. et al. Traffic flow prediction based on combination of support vector machine and data denoising schemes. Physica A: Statistical Mechanics and its Applications 534, 120642 (2019).
57. Yang, S. & Qian, S. Understanding and Predicting Travel Time with Spatio-Temporal Features of Network Traffic Flow, Weather and Incidents. IEEE Intelligent Transportation Systems Magazine 11, 12–28 (2019).
58. Tian, Y. & Pan, L. Predicting Short-Term Traffic Flow by Long Short-Term Memory Recurrent Neural Network. in 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity) 153–158 (2015). doi:10.1109/SmartCity.2015.63.
59. Zhu, H. et al. A Novel Traffic Flow Forecasting Method Based on RNN-GCN and BRB. Journal of Advanced Transportation 2020, 7586154 (2020).
60. Lu, S., Zhang, Q., Chen, G. & Seng, D. A combined method for short-term traffic flow prediction based on recurrent neural network. Alexandria Engineering Journal 60, 87–94 (2021).
61. Xiao, Y. & Yin, Y. Hybrid LSTM Neural Network for Short-Term Traffic Flow Prediction. Information 10 (2019).
62. Wang, S., Zhao, J., Shao, C., Dong, C. D. & Yin, C. Truck Traffic Flow Prediction Based on LSTM and GRU Methods With Sampled GPS Data. IEEE Access 8, 208158–208169 (2020).
63. Xiong, L., Ding, W., Huang, X. & Huang, W.
CLSTAN: ConvLSTM-Based Spatiotemporal Attention Network for Traffic Flow Forecasting. Mathematical Problems in Engineering 2022 (2022).
64. Wang, J.-D. & Susanto, C. O. N. Traffic Flow Prediction with Heterogenous Data Using a Hybrid CNN-LSTM Model. CMC-Computers Materials & Continua 76, 3097–3112 (2023).
65. Guo, C., Zhu, J. & Wang, X. MVHS-LSTM: The Comprehensive Traffic Flow Prediction Based on Improved LSTM via Multiple Variables Heuristic Selection. Applied Sciences 14 (2024).
66. Touvron, H. et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. Preprint at https://doi.org/10.48550/arXiv.2307.09288 (2023).
67. Zhao, W. X. et al. A Survey of Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2303.18223 (2024).
68. Sennrich, R., Haddow, B. & Birch, A. Neural Machine Translation of Rare Words with Subword Units. Preprint at https://doi.org/10.48550/arXiv.1508.07909 (2016).
69. Vaswani, A. et al. Attention is All you Need. in Advances in Neural Information Processing Systems vol. 30 (Curran Associates, Inc., 2017).
70. Lewis, M. et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Preprint at https://doi.org/10.48550/arXiv.1910.13461 (2019).
71. Raffel, C. et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Preprint at https://doi.org/10.48550/arXiv.1910.10683 (2023).
72. Chowdhery, A. et al. PaLM: Scaling Language Modeling with Pathways. Preprint at https://doi.org/10.48550/arXiv.2204.02311 (2022).
73. Chen, Z. et al. Spatial-temporal short-term traffic flow prediction model based on dynamical-learning graph convolution mechanism. Information Sciences 611, 522–539 (2022).
74. Chen, J. et al. Node Connection Strength Matrix-Based Graph Convolution Network for Traffic Flow Prediction. IEEE Transactions on Vehicular Technology 72, 12063–12074 (2023).
75. Gao, H., Jia, H. & Yang, L. An Improved CEEMDAN-FE-TCN Model for Highway Traffic Flow Prediction.
Journal of Advanced Transportation 2022 (2022).
76. Huang, X., Tang, J., Yang, X. & Xiong, L. A time-dependent attention convolutional LSTM method for traffic flow prediction. Applied Intelligence 52, 17371–17386 (2022).
77. Xu, X., Liu, C., Zhao, Y. & Lv, X. Short-term traffic flow prediction based on whale optimization algorithm optimized BiLSTM_Attention. Concurrency and Computation: Practice & Experience 34 (2022).
78. Xu, X., Yang, C., Bilal, M., Li, W. & Wang, H. Computation Offloading for Energy and Delay Trade-Offs With Traffic Flow Prediction in Edge Computing-Enabled IoV. IEEE Transactions on Intelligent Transportation Systems 24, 15613–15623 (2023).
79. He, R., Xiao, Y., Lu, X., Zhang, S. & Liu, Y. ST-3DGMR: Spatio-temporal 3D grouped multiscale ResNet network for region-based urban traffic flow prediction. Information Sciences 624, 68–93 (2023).
80. Zhou, S. et al. Short-Term Traffic Flow Prediction of the Smart City Using 5G Internet of Vehicles Based on Edge Computing. IEEE Transactions on Intelligent Transportation Systems 24, 2229–2238 (2023).
81. Naheliya, B., Redhu, P. & Kumar, K. MFOA-Bi-LSTM: An optimized bidirectional long short-term memory model for short-term traffic flow prediction. Physica A: Statistical Mechanics and its Applications 634 (2024).
82. Tan, G. et al. A noise-immune and attention-based multi-modal framework for short-term traffic flow forecasting. Soft Computing 28, 4775–4790 (2024).
83. Duan, Y. et al. FDSA-STG: Fully Dynamic Self-Attention Spatio-Temporal Graph Networks for Intelligent Traffic Flow Prediction. IEEE Transactions on Vehicular Technology 71, 9250–9260 (2022).
84. Yan, B., Wang, G., Yu, J., Jin, X. & Zhang, H. Spatial-Temporal Chebyshev Graph Neural Network for Traffic Flow Prediction in IoT-Based ITS. IEEE Internet of Things Journal 9, 9266–9279 (2022).
85. Huo, G. et al. Hierarchical Spatio-Temporal Graph Convolutional Networks and Transformer Network for Traffic Flow Forecasting.
IEEE Transactions on Intelligent Transportation Systems 24, 3855–3867 (2023).
86. Lai, Q., Tian, J., Wang, W. & Hu, X. Spatial-Temporal Attention Graph Convolution Network on Edge Cloud for Traffic Flow Prediction. IEEE Transactions on Intelligent Transportation Systems 24, 4565–4576 (2023).
87. Narmadha, S. & Vijayakumar, V. Spatio-Temporal vehicle traffic flow prediction using multivariate CNN and LSTM model. Materials Today: Proceedings 81, 826–833 (2023).
88. Wang, Z., Sun, P., Hu, Y. & Boukerche, A. A novel hybrid method for achieving accurate and timeliness vehicular traffic flow prediction in road networks. Computer Communications 209, 378–386 (2023).
89. Wu, K. et al. Error-distribution-free kernel extreme learning machine for traffic flow forecasting. Engineering Applications of Artificial Intelligence 123 (2023).
90. Xing, H., Chen, A. & Zhang, X. RL-GCN: Traffic flow prediction based on graph convolution and reinforcement for smart cities. Displays 80 (2023).
91. Yang, D. & Lv, L. A Graph Deep Learning-Based Fast Traffic Flow Prediction Method in Urban Road Networks. IEEE Access 11, 93754–93763 (2023).
92. Jia, Q., Zang, J. & Liu, S. Deep learning based traffic flow prediction model on highway research. in (eds. Ghanizadeh, A. & Jia, H.) vol. 13064 (2024).
93. Lu, W. et al. Traffic flow prediction for highway vehicle detectors through decomposition and machine learning. Transportation Letters: The International Journal of Transportation Research (2024) doi:10.1080/19427867.2024.2339631.
94. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K. & Lang, K. J. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing 37, 328–339 (1989).
95. Hochreiter, S. & Schmidhuber, J. Long Short-Term Memory. Neural Computation 9, 1735–1780 (1997).
96. Hale, T. et al. A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat Hum Behav 5, 529–538 (2021).
97. Huang, X., Lan, Y., Ye, Y., Wang, J. & Jiang, Y.
Traffic Flow Prediction Based on Multi-Mode Spatial-Temporal Convolution of Mixed Hop Diffuse ODE. Electronics 11 (2022).
98. Su, Z., Liu, T., Hao, X. & Hu, X. Spatial-temporal graph convolutional networks for traffic flow prediction considering multiple traffic parameters. Journal of Supercomputing 79, 18293–18312 (2023).
99. Wang, Z. & Bovik, A. C. Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures. IEEE Signal Processing Magazine 26, 98–117 (2009).
100. De Gooijer, J. G. & Hyndman, R. J. 25 years of time series forecasting. International Journal of Forecasting 22, 443–473 (2006).
101. Yi, H. & Bui, K.-H. N. An Automated Hyperparameter Search-Based Deep Learning Model for Highway Traffic Prediction. IEEE Trans. Intell. Transport. Syst. 22, 5486–5495 (2021).
102. Hyperparameter Tuning for Machine and Deep Learning with R: A Practical Guide. (Springer Nature, 2023). doi:10.1007/978-981-19-5170-1.

Additional Declarations

No competing interests reported.

Supplementary Files

Appendix.docx

Status: Under Review. Version 1 posted.

Editorial decision: Revision requested, 30 Jul, 2025
Reviews received at journal, 27 Jun, 2025
Reviews received at journal, 23 Jun, 2025
Reviews received at journal, 21 Jun, 2025
Reviewers agreed at journal, 16 Jun, 2025
Reviews received at journal, 12 Jun, 2025
Reviewers agreed at journal, 12 Jun, 2025
Reviewers agreed at journal, 12 Jun, 2025
Reviewers agreed at journal, 11 Jun, 2025
Reviewers invited by journal, 11 Jun, 2025
Editor assigned by journal, 15 May, 2025
Editor invited by journal, 15 May, 2025
Submission checks completed at journal, 14 May, 2025
First submitted to journal, 01 May, 2025
Figure 1: Comparison of traffic flow datasets by time span, time interval, and data volume.
Figure 2: Comparison of the post-COVID-19 and entire dataset performance (MAE) of models across different input lengths.
Figure 3: Comparison of the post-COVID-19 and entire dataset performance (MAPE) of models across different input lengths.
Figure 4: Comparison of the post-COVID-19 and entire dataset performance (RMSE) of models across different context lengths.
Figure 5: Training time of CNN and LSTM.
Figure 6: Inference time of models.
Figure 7: Comparison of the post-COVID-19 and entire dataset performance of models across different model
sizes.\u003c/p\u003e","description":"","filename":"Figure7.png","url":"https://assets-eu.researchsquare.com/files/rs-6572761/v1/e88612175aa860e0ace584d2.png"},{"id":82231164,"identity":"72b4f0a6-a062-4fd6-842c-d464eaccedfe","added_by":"auto","created_at":"2025-05-08 06:06:05","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":172483,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of the cumulative running time of models.\u003c/p\u003e","description":"","filename":"Figure8.png","url":"https://assets-eu.researchsquare.com/files/rs-6572761/v1/49a98f5d60bdea22fc603af1.png"},{"id":82231177,"identity":"23ce178b-eb89-48f8-baeb-e86e9ff9b3fa","added_by":"auto","created_at":"2025-05-08 06:06:06","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":472938,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of model performance across different context lengths.\u003c/p\u003e","description":"","filename":"Figure9.png","url":"https://assets-eu.researchsquare.com/files/rs-6572761/v1/8f00346fb962a7fccf2e388b.png"},{"id":82233017,"identity":"49333443-c087-4ece-99af-3aacade14c41","added_by":"auto","created_at":"2025-05-08 06:30:13","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4600378,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6572761/v1/b3438259-deb1-4e2e-bb56-82f52d5ac638.pdf"},{"id":82231162,"identity":"5b6d80b3-aae4-4ff9-89f8-5c047b45310c","added_by":"auto","created_at":"2025-05-08 06:06:05","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":17997,"visible":true,"origin":"","legend":"","description":"","filename":"Appendix.docx","url":"https://assets-eu.researchsquare.com/files/rs-6572761/v1/5abed1800337351ace01f8bd.docx"}],"financialInterests":"No 
competing interests reported.","formattedTitle":"Zero-Shot Traffic Flow Prediction with Large Language Models: A Comparison with Deep Learning Approaches","fulltext":[{"header":"Introduction","content":"\u003cp\u003eTraffic flow prediction is a critical task in intelligent transportation systems (ITS) \u003csup\u003e1,2\u003c/sup\u003e, focusing on predicting future traffic flow conditions based on historical and real-time data \u003csup\u003e3\u0026ndash;6\u003c/sup\u003e. Accurate traffic prediction plays a crucial role in urban planning, infrastructure management, and traffic control, helping to reduce traffic congestion, enhance road safety, and lower environmental impacts \u003csup\u003e6\u0026ndash;9\u003c/sup\u003e. Due to the rapid growth of urban populations and vehicle ownership worldwide, cities face increasing challenges in maintaining smooth traffic flow \u003csup\u003e10\u003c/sup\u003e. Effective prediction of traffic flows provides transportation authorities, city planners, and travellers with timely information that enables better decision-making, more efficient resource allocation, and an improved quality of life \u003csup\u003e2,11,12\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTraffic flow prediction models have evolved from traditional statistical approaches to deep learning models in recent years \u003csup\u003e5\u003c/sup\u003e. Initially, statistical models such as Autoregressive Integrated Moving Average (ARIMA) dominated the field due to their simplicity, interpretability \u003csup\u003e13\u003c/sup\u003e, and effectiveness in modelling linear \u003csup\u003e14\u003c/sup\u003e and stationary time-series data. However, these models often struggled to capture complex, non-linear relationships in real-world traffic conditions \u003csup\u003e15\u003c/sup\u003e. 
To overcome these limitations, deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have been applied broadly \u003csup\u003e2,6,15\u0026ndash;17\u003c/sup\u003e. CNNs are good at capturing local temporal patterns in traffic data by applying convolution operations along the time dimension \u003csup\u003e18\u003c/sup\u003e, while LSTMs are effective in modelling long-term temporal dependencies because of their gating mechanisms \u003csup\u003e19\u0026ndash;21\u003c/sup\u003e. Despite their improved accuracy and adaptability to complex scenarios, deep learning models typically require extensive computational resources and large datasets for training \u003csup\u003e11\u003c/sup\u003e, which presents challenges for practical implementation and real-time applications. Besides, these models are often developed for specific tasks \u003csup\u003e22\u003c/sup\u003e, which may lead to overfitting and reduce their ability to generalise effectively to new or unseen data.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;In the context of government mobility interventions and large-scale disruptive events, such as COVID-19, traffic flow prediction becomes even more challenging. The imposed restrictions, such as lockdowns and social distancing measures, caused rapid changes in travel behavior, leading to increased irregularities and deviations from historical traffic trends \u003csup\u003e23\u0026ndash;29\u003c/sup\u003e. Most existing traffic prediction models were trained only on pre-pandemic data \u003csup\u003e12,13,21,30\u003c/sup\u003e, which limits their ability to make accurate predictions of traffic flows during periods of significant disruption. A few models have attempted to address this challenge by considering external factors, such as the effects of imposed response measures \u003csup\u003e31\u003c/sup\u003e and the evolving status of COVID-19 \u003csup\u003e32\u003c/sup\u003e. 
Another study decomposed irregular traffic flows into distinct attributes and predicted them separately to improve model performance during the pandemic \u003csup\u003e33\u003c/sup\u003e. However, these models often rely on the availability of extensive labeled datasets that capture the effects of mobility restrictions and event status over time, making them difficult to implement in real time and across regions with differing policies.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;Recently, the development of Large Language Models (LLMs) has introduced new opportunities for traffic flow prediction \u003csup\u003e5\u003c/sup\u003e. Initially designed for natural language processing tasks, LLMs learn extensive general knowledge by pre-training on large amounts of textual data \u003csup\u003e34\u0026ndash;36\u003c/sup\u003e. A distinctive strength of pre-trained LLMs is their capability for zero-shot prediction, allowing them to perform various tasks without requiring task-specific training examples \u003csup\u003e37\u0026ndash;39\u003c/sup\u003e. Models such as GPT have demonstrated impressive zero-shot performance across numerous language understanding and generation tasks from different domains \u003csup\u003e34,40\u0026ndash;42\u003c/sup\u003e. Motivated by these advances, researchers have started exploring pre-trained LLMs for time series prediction \u003csup\u003e38,39,43\u003c/sup\u003e. When existing transformer-based language model architectures are pre-trained specifically on large-scale time series datasets, the resulting models learn to capture the temporal patterns and dynamics in sequential data effectively \u003csup\u003e22,44\u003c/sup\u003e. 
Motivated by this capability, we apply pre-trained time series LLMs to zero-shot traffic flow prediction tasks, enabling accurate forecasting of traffic conditions without the need for large task-specific datasets.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;The overarching goal of this research is to present a comprehensive comparison of the performance of traditional deep learning models and pre-trained time series LLMs for traffic flow prediction under normal conditions and disruptive events. The contribution of this paper is threefold. First, it bridges the research gap in applying pre-trained LLMs to traffic flow prediction by comparing the performance of two groups of time-series foundation models and traditional deep learning models on the SCOOT dataset \u003csup\u003e45\u003c/sup\u003e. Second, it evaluates prediction accuracy on heterogeneous and unusual traffic patterns, an area that has been sparsely explored in previous research. It utilises a long-term traffic flow dataset that includes a unique global pandemic period, which allows models to capture long-term traffic trends, seasonal fluctuations, and emergency-related variations, contributing to more robust predictive performance. Third, it highlights the performance gap between pre-trained LLMs that differ in model size and in the diversity and temporal coverage of their training data. An LLM pre-trained on comprehensive datasets is more likely to achieve superior zero-shot performance, making it a practical and efficient choice for real-world traffic flow prediction applications. The research outputs will support city planners in integrating pre-trained time-series LLMs into intelligent traffic control systems, enhancing their ability to respond effectively to both routine traffic conditions and unexpected disruptions. 
This work can also help transportation authorities and urban policymakers make informed, data-driven decisions in traffic management for future large-scale emergencies.\u003c/p\u003e"},{"header":"Background","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003eTraffic Flow Prediction\u003c/h2\u003e \u003cp\u003eTraffic prediction aims to forecast key factors such as vehicle flow, speed, and congestion levels \u003csup\u003e\u003cspan additionalcitationids=\"CR47 CR48 CR49 CR50\" citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e,\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e. Traffic flow prediction is one of the most fundamental and widely studied tasks in ITS. Traditional approaches typically rely on statistical models, such as ARIMA \u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e or the Kalman filter \u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e, to capture traffic patterns and seasonalities, and often serve as strong baselines when data exhibit relatively stable trends \u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. While these models are relatively straightforward and interpretable \u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e, they may struggle with irregular fluctuations in large-scale transportation networks \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. 
To address these complexities, researchers introduced machine learning models such as Random Forests \u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e\u003c/sup\u003e and Support Vector Machines \u003csup\u003e\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e. By integrating a richer set of input features, these methods can account for additional factors like weather conditions, special events, or road incidents \u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. Although more flexible than purely statistical techniques, they often struggle to achieve consistently robust performance across diverse traffic scenarios.\u003c/p\u003e \u003cp\u003eIn recent years, deep learning approaches have shown significant promise due to their capability for automatic feature extraction and handling complex dependencies. Convolutional Neural Networks (CNNs), traditionally used for image data, have been adapted for time series prediction by applying convolutional filters along the temporal dimension \u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. This allows CNNs to extract local features, detect short-term patterns, and reduce noise in traffic data. Recurrent Neural Networks (RNNs) further enhance sequence modelling by processing time series data, capturing the dynamic behaviour of traffic flow \u003csup\u003e\u003cspan additionalcitationids=\"CR59\" citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e. 
To overcome limitations like vanishing gradients in standard RNNs, Long Short-Term Memory (LSTM) networks, an advanced type of RNN, employ gating mechanisms to maintain long-term dependencies \u003csup\u003e\u003cspan additionalcitationids=\"CR62 CR63 CR64\" citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e, making them especially effective for predicting complex temporal patterns such as rush-hour peaks or irregular traffic flows.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eLarge Language Models\u003c/h2\u003e \u003cp\u003eRecent progress in computer hardware and the availability of large text datasets have led to the development of transformer-based LLMs that demonstrate impressive performance on various natural language processing tasks \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e,\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e,\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e,\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u003c/sup\u003e. Language models are designed to predict the next token in a sequence by estimating the probability of each token based on those that have already appeared \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. Tokens may be characters, subwords \u003csup\u003e\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e\u003c/sup\u003e, or words from a vocabulary. 
The transformer architecture \u003csup\u003e\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e\u003c/sup\u003e was initially developed as an encoder-decoder system for machine translation \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e and is currently applied in many popular models, such as BART \u003csup\u003e\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e\u003c/sup\u003e and T5 \u003csup\u003e71\u003c/sup\u003e. In these models, the input text is first converted into a continuous representation using an encoder, after which the decoder generates output tokens sequentially based on the representation and previous tokens. Alternatively, a decoder-only architecture, used in models like GPT-3 \u003csup\u003e34\u003c/sup\u003e and Llama 2 \u003csup\u003e66\u003c/sup\u003e, only considers tokens before the current token when making predictions. This architecture simplifies the model's design while still achieving robust performance.\u003c/p\u003e \u003cp\u003eLLMs are trained on extensive collections of text and can have millions to hundreds of billions of parameters \u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e,\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e\u003c/sup\u003e. Researchers have found that increasing the number of parameters in these models leads to better performance \u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. When the number of parameters becomes large enough, LLMs perform traditional language tasks more accurately and show new abilities that smaller models lack \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. 
Zero-shot generalisation is one such ability, in which the model makes predictions on tasks it was not explicitly trained for \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. For example, Brown \u003cem\u003eet al.\u003c/em\u003e (2020) demonstrated that as the number of parameters grows, LLMs acquire the skill to handle new tasks without additional, task-specific training. This connection between model parameters and zero-shot generalisation suggests that, as LLMs scale, they not only become more flexible and powerful in language understanding and generation but also become capable of tackling challenges such as forecasting time series data.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eLarge Language Models for Time-series Prediction\u003c/h3\u003e\n\u003cp\u003eLLMs have recently emerged as useful tools for time series forecasting thanks to their powerful sequence modelling and pattern recognition capabilities \u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e. PromptCast \u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e first treats time series forecasting as a natural language generation task, converting numerical inputs and outputs into textual prompts, thus allowing general-purpose language models to serve as core forecasting engines. However, PromptCast often requires carefully designed prompts, whose design can be time-consuming in complex or domain-specific scenarios. LLMTime \u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e addresses these limitations by directly tokenising time series data and treating forecasting as next-token prediction. 
This tokenisation strategy not only avoids extensive prompt engineering but also allows pre-trained LLMs, like GPT-3 and LLaMA, to produce robust zero-shot forecasts across a variety of benchmark datasets \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. However, LLMTime can be computationally and memory-intensive due to the large size of the underlying models, and it requires careful rescaling of data to handle varying magnitudes or precision.\u003c/p\u003e \u003cp\u003eUnlike PromptCast and LLMTime, which repurpose large pre-trained LLMs with textual or digit-based prompts, more recent work develops time series-specific LLMs by training foundation models on large, diverse time series datasets. Rasul \u003cem\u003eet al.\u003c/em\u003e (2024) propose a foundation model (Lag-Llama) designed explicitly for univariate time series forecasting. Built on a decoder-only transformer architecture that uses lag features as covariates, Lag-Llama is pre-trained on a broad collection of real-world time series across multiple domains including energy, transportation, economics, environmental science, air quality and cloud operations. This large-scale pre-training process allows it to capture a wide range of time series patterns, enabling strong performance in zero-shot generalisation. Recent concurrent work, Chronos \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, offers a similarly broad framework for pre-trained time series forecasting but adapts a standard language model architecture, T5 \u003csup\u003e71\u003c/sup\u003e, to treat real-valued time series as discrete tokens. Using scaling and uniform binning, Chronos converts continuous sequences into tokens from a fixed vocabulary. 
Once the series are tokenised, a language model is trained on an extensive collection of public and synthetic time series datasets, thus learning to model a wide range of temporal patterns. Chronos demonstrates superior performance across 42 benchmark datasets in both in-domain and zero-shot scenarios.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eModel Performance Comparison\u003c/h2\u003e \u003cp\u003eThis study compares the traffic flow prediction performance of deep learning models and LLMs across the entire dataset and the post-COVID-19 dataset at different context lengths (input lengths) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e). The evaluation results differ clearly between the post-COVID-19 dataset and the entire dataset. Across all models, evaluation metrics (MAE, MAPE and RMSE) are consistently lower for the post-COVID-19 dataset. This suggests that both deep learning models and LLMs perform better with stable traffic patterns. Specifically, the improvements of deep learning models are moderate, with a slight decrease in RMSE and MAPE when evaluated on the post-COVID-19 dataset, suggesting that traditional deep learning models may be less sensitive to different data patterns. In contrast, LLMs demonstrate a noticeable performance gap between different datasets. When evaluated on post-COVID-19 data, the reduction in MAE, MAPE and RMSE is more evident than for deep learning models, particularly for Lag-Llama, indicating improved model adaptability to stable traffic dynamics.\u003c/p\u003e \u003cp\u003eIncreasing the context length leads to improved prediction performance across LLMs but limited improvement in deep learning models. 
CNN and LSTM show relatively worse performance with increased context length, especially for the entire dataset. While LSTM maintains a relatively stable trend, CNN exhibits more fluctuations, particularly for longer context lengths, suggesting potential overfitting or inefficiencies in capturing long-term dependencies. LLMs reduce MAE, MAPE and RMSE as context length increases, with only slight fluctuations. Lag-Llama, in particular, demonstrates the most significant improvement, indicating that it can exploit long historical sequences effectively. These findings highlight the superior capacity of LLMs to process and utilise long-term dependencies in time series prediction.\u003c/p\u003e \u003cp\u003eAlthough LLMs generally benefit from longer context lengths, their performance declines when the context is short, often falling behind traditional deep learning models. In particular, Lag-Llama consistently yields higher MAE and RMSE values than both CNN and LSTM across all context lengths when evaluated on the full dataset. This can be attributed to the zero-shot nature of these pre-trained models, which rely on broadly learned universal patterns from large-scale, high-quality data rather than task-specific training \u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. The entire dataset, which includes more heterogeneous and unusual traffic patterns, makes zero-shot prediction more demanding for these models. In contrast, the post-COVID-19 dataset exhibits more stable and universal traffic flow dynamics, enabling the LLMs to utilise their extensive pre-training more effectively and outperform the traditional deep learning models. 
Besides, the consistent performance improvements observed with longer context lengths demonstrate the importance of providing LLMs with sufficient historical information to enhance their zero-shot time series predictions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eTraining Time and Inference Time Analysis\u003c/h3\u003e\n\u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003e illustrates the trade-off between training time and Mean Absolute Error (MAE) for CNN and LSTM models on the post-COVID-19 and entire datasets. The CNN consistently achieves lower MAE and demonstrates superior computational efficiency on both datasets, outperforming the LSTM by all measures. Specifically, the CNN completes training epochs in considerably less time, suggesting that its convolution-based structure may capture features of time series data faster and with fewer parameters. Meanwhile, the LSTM exhibits an evident increase in training time, requiring more than twice that of the CNN, and tends to produce higher MAE values, especially on the entire dataset. This difference is likely due to the larger data size and the sequential nature of LSTM, which requires more complex computations per timestep. However, the increased training time and larger dataset help bridge the performance gap between LSTM and CNN. This suggests that LSTM may achieve better accuracy by capturing complex temporal dependencies when provided with sufficient data and training time.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe also compare the inference time for all the models, including deep learning models and LLMs. 
It is important to note that Lag-Llama is intentionally omitted from the inference time figure (Figure 6) due to its exceptionally high inference times \u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. It requires approximately 132 seconds per epoch on the post-COVID-19 dataset and 875 seconds per epoch on the entire dataset, substantially longer than the inference times observed for the other models. For both post-COVID-19 and entire datasets, CNN and LSTM exhibit relatively short inference times but moderately higher MAEs, while the Chronos models show a broader range of inference times, generally increasing with model size. Similar to the training time comparison, LSTM consistently exhibits longer inference times and higher MAEs than CNN. This highlights that the sequential structure of LSTM requires significantly more computational resources, which may limit its applicability in time-sensitive and resource-constrained scenarios compared to CNN.\u003c/p\u003e \u003cp\u003eAmong the Chronos models, larger configurations tend to achieve lower MAEs at the cost of longer inference durations. This is because Chronos models are probabilistic time series models that generate forecasts by autoregressively sampling from the predicted distribution \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, which leads to longer inference times than deep learning models producing point predictions. This effect is particularly significant for larger Chronos models with more parameters, which demand more computational resources. A key observation is that Chronos (Small) achieves shorter inference times and better performance than Chronos (Mini). This indicates that specific Chronos configurations may effectively balance computational effort and predictive accuracy. 
Beyond this point, increasing model size leads to significantly longer inference times but only slightly better performance.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eModel Size Analysis\u003c/h2\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003e plots performance, averaged over context lengths, against model size, which is measured by the number of parameters. Specifically, the number of parameters for CNN and LSTM depends on their architecture, hyperparameters, and context length, while pre-trained LLMs have a fixed number of parameters. From Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003e, it is clear that deep learning models and LLMs demonstrate different performance trends as their sizes change. For LLMs, increasing the number of parameters generally leads to improved prediction accuracy on both the post-COVID-19 dataset and the entire dataset, suggesting that additional parameters help capture complex traffic patterns. Meanwhile, CNN slightly outperforms LSTM on the post-COVID-19 dataset, which might be because CNN is more effective at capturing local temporal features on smaller data sizes. LSTM remains stable on the entire dataset, possibly due to its ability to exploit longer sequence dependencies for complex traffic flow prediction. Besides, Lag-Llama's performance degrades markedly on the entire dataset, while Chronos models demonstrate robust performance across different sizes, indicating that they effectively balance complexity and accuracy.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we compare the traffic flow prediction performance of traditional deep learning models and cutting-edge LLMs. 
We train the deep learning models, CNN and LSTM, on the SCOOT dataset ourselves, while Lag-Llama and Chronos are pre-trained time series LLMs that can be applied directly for zero-shot prediction. We find that LLMs with longer context lengths and larger model sizes tend to achieve higher prediction accuracy, while deep learning models show limited improvement. This suggests that deep learning models may suffer from overfitting or inefficiencies in capturing long-term dependencies. In contrast, LLMs demonstrate a superior ability to process and utilise long-term dependencies in time series prediction. However, there is a trade-off between model performance and inference time, as increasing context length and model size requires more computational resources. Moreover, although LLMs are more sensitive to traffic patterns, they outperform traditional deep learning models in both usual and unusual traffic conditions, with Chronos demonstrating particularly strong performance.\u003c/p\u003e \u003cp\u003eIn our experiments, we evaluate training and inference times separately to provide a clear understanding of the computational demands of each stage. However, for a comprehensive comparison between deep learning models and LLMs, we calculate the total running time, combining training and inference durations for the deep learning models. Specifically, we calculate the cumulative running time over 100 training epochs for deep learning models and one prediction epoch for all models. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003e, the running times of Chronos are significantly lower than those of other models, particularly for the smaller dataset. While the running time of CNN is shorter than that of Lag-Llama in this case, this comparison overlooks hyperparameter tuning, which is essential in the deep learning training process for each prediction task. 
The tuning process involves testing dozens of hyperparameter combinations \u003csup\u003e\u003cspan citationid=\"CR101\" class=\"CitationRef\"\u003e101\u003c/span\u003e,\u003cspan citationid=\"CR102\" class=\"CitationRef\"\u003e102\u003c/span\u003e\u003c/sup\u003e, with each test requiring an amount of time equivalent to the running time observed here (since the inference times of CNN and LSTM are negligible by comparison). The practical running time of deep learning models is therefore many times greater than the running time reported here, so in practice the running time for LLMs is shorter than that of deep learning models. In addition, deep learning models require separate training for each prediction task. In contrast, pre-trained LLMs can be directly applied to different prediction tasks across various datasets with varying context lengths. This significantly simplifies model deployment and streamlines forecasting pipelines, eliminating the need for task-specific training.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eA key limitation of LLMs is the performance gap between different models. Our findings indicate that Lag-Llama shows only limited improvements in prediction accuracy compared to deep learning models, while Chronos consistently demonstrates strong performance across various context lengths. To further illustrate this performance gap, we use another publicly available and widely used traffic flow dataset collected by the Caltrans Performance Measurement System (PeMS) in California, USA from January 1 to December 31, 2018. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003e, the results are consistent with those from the SCOOT dataset in our research, with Chronos significantly outperforming Lag-Llama across all context lengths. Since both Chronos and Lag-Llama are pre-trained on a diverse set of publicly available datasets, the observed performance difference may stem from their training data.
Comparing their datasets, we find that Chronos is trained on seven different datasets, while Lag-Llama uses only three. In addition, Chronos incorporates synthetic data generated using Gaussian processes to enhance its training process. The training data for Chronos ranges from 2009 to 2022, while Lag-Llama's training data is limited to 2009 and 2014\u0026ndash;2016. This suggests that training LLMs on a more extensive corpus of time series data improves zero-shot performance. Moreover, model size plays a crucial role in performance. Chronos offers models ranging from 8M (Tiny) to 710M (Large) parameters, which are significantly larger than Lag-Llama's 2.45M parameters and likely contribute to its superior predictive accuracy.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn summary, this research highlights that LLMs can achieve excellent zero-shot performance in traffic flow prediction under both normal conditions and disruptive events. Unlike traditional deep learning models, which require extensive task-specific training and domain expertise, pre-trained LLMs can be directly applied to datasets with different data sizes, traffic dynamics, and context lengths. Moreover, while deep learning models require large-scale historical data for training and validation, LLMs can make accurate zero-shot predictions with only a small subset of contextual data. These advantages can address critical limitations of traditional deep learning methods, such as time-consuming training processes, overfitting due to task-specific model training, and limited generalisation capabilities. Additionally, they can contribute to the practical deployment of traffic prediction models across diverse and dynamically changing urban scenarios. However, choosing an appropriate LLM is crucial, as performance depends on factors such as training data diversity, time coverage, and model size.
An LLM pre-trained on a comprehensive dataset is more likely to achieve superior zero-shot performance, making it a practical and efficient choice for real-world traffic prediction applications.\u003c/p\u003e \u003cp\u003eUsing LLMs for traffic flow prediction and other time series analysis tasks in urban settings presents both opportunities and challenges. The development of LLMs has been one of the fastest-growing areas over the past two and a half years, since OpenAI released the first version of ChatGPT (based on GPT-3.5) in November 2022. As foundation models continue to evolve in both capability and efficiency, the prediction accuracy of LLM-based time series models is expected to improve correspondingly. Although the LLMs in this research exhibit superior performance, with faster inference and higher accuracy, they still have several limitations. Firstly, the training process of LLMs is typically very expensive and time-consuming, making it less accessible for academic institutions or small research groups to extend or revise pre-trained models. Furthermore, the efficiency of LLMs heavily relies on the underlying foundation models, and the most advanced foundation models are often closed-source and developed by leading companies in generative AI. As a result, most researchers can only rely on \"less advanced\" or \"older generation\" publicly available foundation models to design fine-tuned models for time series prediction. In the future, closer collaboration between AI companies and academia is necessary to enable further customised model development. Lastly, the limited scope and diversity of training data \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e for time-series LLMs lead to performance disparities and prediction biases across foundation models.
Future work can focus on building and maintaining large-scale, diverse traffic datasets to improve model training and predictive accuracy across various scenarios. The model can also be further evaluated in regions with limited data, such as those in the Global South, to assess the effectiveness of using foundation models trained on datasets from Global North countries in different geographical contexts. Governments can encourage collaborative data-sharing initiatives between public and private sectors to expand the availability of high-quality traffic data for model development.\u003c/p\u003e"},{"header":"Data and Methods","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eDatasets\u003c/h2\u003e \u003cp\u003eThis section details the dataset employed to evaluate the predictive performance of deep learning and LLMs, with real-world traffic data collected via a Split Cycle Offset Optimisation Technique (SCOOT) based Urban Traffic Control system \u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e. The SCOOT uses a network of sensors to capture traffic flow data across the road network. The dataset includes traffic flows from the Glasgow City Council area over four consecutive years, from October 1, 2019, to September 30, 2023, which includes the COVID-19 pandemic period \u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. There are 470 sensors in the SCOOT dataset which record traffic flows at 60-minute intervals.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e1\u003c/span\u003e compares the attributes of the SCOOT dataset with other traffic flow datasets applied in recent traffic flow prediction research from 2022 to 2024 \u003csup\u003e15,21,30,52,73\u0026ndash;82\u003c/sup\u003e. 
Most datasets cover no more than one year during normal periods and use time intervals of less than 30 minutes \u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan additionalcitationids=\"CR84 CR85\" citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e,\u003cspan additionalcitationids=\"CR87 CR88 CR89 CR90 CR91 CR92\" citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e93\u003c/span\u003e\u003c/sup\u003e, while the SCOOT dataset covers a longer period than the datasets used in many previous studies. Although one study utilised a seven-year traffic dataset, which is longer than the SCOOT dataset, it only covers a period of stable traffic conditions. In contrast, the SCOOT dataset captures traffic flows before, during, and after COVID-19, providing valuable insights into the drastic changes in human mobility patterns in response to government mobility interventions during a period of significant disruption. Its long-term coverage allows models to capture long-term traffic trends, seasonal fluctuations, and emergency-related variations, contributing to more robust predictive performance. In addition, the bubble size in Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e1\u003c/span\u003e represents the number of data points, and the SCOOT dataset contains a relatively large volume of observations.
This large volume of data enhances deep learning and LLMs by improving their ability to learn complex traffic patterns and reducing the risk of overfitting.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eDeep Learning and Large Language Models\u003c/h2\u003e \u003cp\u003eWe select two widely used deep learning models for time series analysis, CNN \u003csup\u003e\u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e\u003c/sup\u003e and LSTM \u003csup\u003e\u003cspan citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e\u003c/sup\u003e, and two recently developed LLM-based time series prediction models, Lag-Llama \u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e and Chronos \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e, to assess the performance of traffic flow prediction. The details of the models are outlined as follows:\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eConvolutional Neural Network (CNN)\u003c/h2\u003e \u003cp\u003eCNN \u003csup\u003e\u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e\u003c/sup\u003e captures temporal patterns by applying convolutional filters to sliding windows of sequential data. This approach effectively detects local trends and short-term dependencies, enhancing prediction accuracy. In our CNN model, two one-dimensional convolutional layers are employed, with each utilising the ReLU activation function. The first layer specifies the input shape based on the sequence length and extracts initial local features from the traffic flow data. A subsequent convolutional layer further refines these features. 
A max-pooling layer then reduces the dimensionality of the resulting feature maps, preserving essential representations while reducing computational cost. Finally, the network is flattened and regularised using a dropout layer to prevent overfitting before a dense layer produces forecasts over six time steps.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eLong Short-Term Memory (LSTM)\u003c/h2\u003e \u003cp\u003eLSTM \u003csup\u003e\u003cspan citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e\u003c/sup\u003e effectively models long-term dependencies and sequential relationships in time series data through gated memory cells, which retain relevant historical information while addressing vanishing gradient issues. Our LSTM model includes two hidden layers. The first LSTM layer, configured with the tanh activation function and set to return sequences, processes the input sequence to extract temporal features and passes the entire sequence to the subsequent layer. The second LSTM layer further refines these features using the tanh activation function. Each LSTM layer is followed by a dropout layer to reduce the risk of overfitting. Finally, a dense layer with six neurons is employed for multi-step prediction.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eLag-Llama\u003c/h2\u003e \u003cp\u003eLag-Llama \u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e is a foundation model for univariate probabilistic time series prediction, built on a decoder-only transformer architecture, LLaMA \u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. Lag-Llama tokenises input data by constructing lagged feature vectors using historical observations at predetermined lag intervals.
These intervals include multiple standard frequencies such as quarterly, monthly, weekly, daily, hourly, and second-level frequencies. Each token also incorporates temporal covariates derived from date-time features such as hour-of-day, day-of-week, and month-of-year, enriching the representation and providing contextual information to the model. The input tokens, composed of lagged features and temporal covariates, are projected into a hidden representation and passed through a series of causally masked transformer decoder layers, employing RMSNorm and Rotary Positional Encoding (RoPE) at each attention layer. The final output from the transformer decoder is fed into a distribution head designed to predict parameters of a Student's t-distribution (degrees of freedom, mean, and scale) used for probabilistic forecasting.\u003c/p\u003e \u003cp\u003eLag-Llama applies a robust scaling procedure using median and interquartile range (IQR) normalisation to handle numerical scale variations across different time series, significantly improving training stability and forecast accuracy. During training, Lag-Llama minimises the negative log-likelihood of the forecast distribution for future values. Lag-Llama is pre-trained on 27 datasets categorised into six domains: air quality, transportation, economics, nature, energy, and cloud operations. The pre-training corpus includes 7,965 univariate series consisting of about 352\u0026nbsp;million data tokens. This extensive and diverse corpus improves Lag-Llama's ability to generalise and deliver strong zero-shot forecasting performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eChronos\u003c/h2\u003e \u003cp\u003eChronos \u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e is a pre-trained probabilistic forecasting framework designed specifically for time series, built on transformer-based language models. 
The core innovation of Chronos is its approach to treating time series forecasting similarly to natural language modelling tasks. It achieves this by tokenising continuous time series data into discrete tokens using a two-step approach: scaling and quantisation. Firstly, Chronos tokenises time series data by scaling each series individually using mean scaling, which normalises the data based on the mean of absolute historical values. Then, the scaled data are quantised into discrete bins, forming tokens from a fixed-size vocabulary. This vocabulary includes numerical bins and special tokens such as PAD (for padding sequences to equal lengths) and EOS (end-of-sequence).\u003c/p\u003e \u003cp\u003eChronos primarily employs variants of the T5 family of transformer-based language models, ranging from smaller models with approximately 8\u0026nbsp;million parameters to larger models of up to 710\u0026nbsp;million \u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u003c/sup\u003e. These models are trained in 5 sizes, named Tiny (8M), Mini (20M), Small (46M), Base (200M) and Large (710M), using a cross-entropy loss function, effectively framing regression as a classification task over discrete quantised bins. Chronos models provide probabilistic forecasts by autoregressively sampling from the learned categorical distributions and subsequently mapping these sampled tokens back to continuous numerical values via dequantisation and inverse scaling. To enhance training, Chronos utilises data augmentation methods: TSMixup, which creates augmented series through convex combinations of existing series, and KernelSynth, which generates synthetic series using Gaussian processes. Chronos was pre-trained on 28 datasets comprising publicly available datasets, including transport, retail, energy, finance, healthcare, and climate science, complemented by synthetic datasets. 
The comprehensive benchmark evaluation involved 42 datasets to assess in-domain and zero-shot forecasting performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eModel Implementations\u003c/h2\u003e \u003cp\u003eTo evaluate the model performance on usual traffic patterns and unusual traffic dynamics, we divide the SCOOT dataset into two subgroups \u0026ndash; the entire dataset, which includes the pandemic period, and the post-COVID-19 dataset. The entire dataset contains hourly traffic flow data from October 1, 2019, to September 30, 2023, while the post-COVID-19 dataset, delimited using the Stringency Index, includes data from June 3, 2022 \u003csup\u003e96\u003c/sup\u003e onwards. Each subgroup is chronologically divided into training (60%), validation (20%), and testing (20%) sets, with a 60‑minute interval for both training and prediction. To assess the impact of context length on prediction accuracy, we run the models with varying context lengths. Specifically, the context length is set to 24\u0026times;\u003cem\u003en\u003c/em\u003e hours, where \u003cem\u003en\u003c/em\u003e ranges from 1 to 21, limited by the available computational memory. These context lengths are used to predict traffic flow over the next 6 hours, a common forecasting horizon in existing research \u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e,\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e,\u003cspan citationid=\"CR98\" class=\"CitationRef\"\u003e98\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWe conduct experiments with different hyperparameters for each context length and dataset to train deep learning models, selecting the best configuration for comparison.
All the experiments are repeated 10 times, and we report the mean of each evaluation metric to reduce the effect of randomness in individual training runs. The Adam optimizer is employed over 100 epochs, and the best hyperparameters of each model are shown in Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003eA1\u003c/span\u003e. Mean Squared Error (MSE) is used as the loss function during model training \u003csup\u003e\u003cspan citationid=\"CR99\" class=\"CitationRef\"\u003e99\u003c/span\u003e\u003c/sup\u003e:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$Loss=\\frac{1}{n}\\sum_{i=1}^{n}{({\\widehat{y}}_{i}-{y}_{i})}^{2}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eAll the models are implemented in Python 3.12.4, and executed on a 64-bit Ubuntu server with Intel Xeon Gold 6334 8-Core Processor \u0026times; 2 @ 3.60GHz CPU, 125 GB of RAM, and an NVIDIA A100 GPU with 24 GB of memory. The deep learning models are developed using TensorFlow 2.17.0, and the LLMs are run with PyTorch 2.3.1.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation Metrics\u003c/h2\u003e \u003cp\u003eThe accuracy of traffic prediction models is typically evaluated using performance metrics that quantify their ability to forecast traffic conditions. In this research, we employ three widely recognised metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). RMSE and MAE assess absolute errors, while MAPE evaluates relative errors \u003csup\u003e\u003cspan citationid=\"CR100\" class=\"CitationRef\"\u003e100\u003c/span\u003e\u003c/sup\u003e. In all metrics, lower values indicate better prediction performance.
The formulas are as follows:\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(RMSE=\\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}{({\\widehat{y}}_{i}-{y}_{i})}^{2}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(2)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(MAE=\\frac{1}{n}\\sum_{i=1}^{n}|{\\widehat{y}}_{i}-{y}_{i}|\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(3)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(MAPE=\\frac{1}{n}\\sum_{i=1}^{n}\\left|\\frac{{\\widehat{y}}_{i}-{y}_{i}}{{y}_{i}}\\right|\\times 100\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(4)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({y}_{i}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan
class=\"mathinline\"\u003e\\({\\widehat{y}}_{i}\\)\u003c/span\u003e\u003c/span\u003e represent the ground truth and the predicted value, respectively, for the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(i\\)\u003c/span\u003e\u003c/span\u003eth traffic flow sample. \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(n\\)\u003c/span\u003e\u003c/span\u003e is the total number of prediction samples.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003eAcknowledgement\u003c/p\u003e\n\u003cp\u003eThe first author is funded by the China Scholarship Council (CSC) from the Ministry of Education of P.R. China. Dr Qunshan Zhao has received the ESRC\u0026apos;s ongoing support for the Urban Big Data Centre (UBDC) [ES/L011921/1 and ES/S007105/1], and Royal Society International Exchange Scheme [IEC\\NSFC\\223042]. The authors want to thank the anonymous reviewers for their insightful comments and suggestions on an earlier version of this manuscript.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;CRediT authorship contribution statement\u003c/p\u003e\n\u003cp\u003eYue Li: Conceptualization; Data curation; Formal analysis; Methodology; Visualization; Writing \u0026ndash; original draft; Writing \u0026ndash; review \u0026amp; editing. Qunshan Zhao: Conceptualization; Writing \u0026ndash; review \u0026amp; editing; Supervision; Resources; Project administration; Funding acquisition. Mingshu Wang: Conceptualization; Writing \u0026ndash; review \u0026amp; editing; Supervision.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;Data availability\u003c/p\u003e\n\u003cp\u003eThe data used in this paper is publicly available.
Full details about the data acquisition can be found in the documentation available at the GitHub repository: https://github.com/YueLi-0816/trafficFlowPrediction.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;Declaration of competing interest\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eYin, X. \u003cem\u003eet al.\u003c/em\u003e Deep Learning on Traffic Prediction: Methods, Analysis, and Future Directions. \u003cem\u003eIEEE Transactions on Intelligent Transportation Systems\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 4927\u0026ndash;4943 (2022).\u003c/li\u003e\n\u003cli\u003eLiu, Y., Rasouli, S., Wong, M., Feng, T. \u0026amp; Huang, T. RT-GCN: Gaussian-based spatiotemporal graph convolutional network for robust traffic prediction. \u003cem\u003eInformation Fusion\u003c/em\u003e \u003cstrong\u003e102\u003c/strong\u003e, 102078 (2024).\u003c/li\u003e\n\u003cli\u003eChen, H. \u0026amp; Rakha, H. A. Real-time travel time prediction using particle filtering with a non-explicit state-transition model. \u003cem\u003eTransportation Research Part C: Emerging Technologies\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 112\u0026ndash;126 (2014).\u003c/li\u003e\n\u003cli\u003eGuo, G. \u0026amp; Yuan, W. Short-term traffic speed forecasting based on graph attention temporal convolutional networks. \u003cem\u003eNeurocomputing\u003c/em\u003e \u003cstrong\u003e410\u003c/strong\u003e, 387\u0026ndash;393 (2020).\u003c/li\u003e\n\u003cli\u003eLiu, C. \u003cem\u003eet al.\u003c/em\u003e Spatial-Temporal Large Language Model for Traffic Prediction. Preprint at https://doi.org/10.48550/arXiv.2401.10134 (2024).\u003c/li\u003e\n\u003cli\u003eKim, Y., Tak, H., Kim, S. \u0026amp; Yeo, H. 
A hybrid approach of traffic simulation and machine learning techniques for enhancing real-time traffic prediction. \u003cem\u003eTransportation Research Part C: Emerging Technologies\u003c/em\u003e \u003cstrong\u003e160\u003c/strong\u003e, 104490 (2024).\u003c/li\u003e\n\u003cli\u003eChen, J. \u003cem\u003eet al.\u003c/em\u003e Traffic flow matrix-based graph neural network with attention mechanism for traffic flow prediction. \u003cem\u003eInformation Fusion\u003c/em\u003e \u003cstrong\u003e104\u003c/strong\u003e, 102146 (2024).\u003c/li\u003e\n\u003cli\u003eFan, J. \u003cem\u003eet al.\u003c/em\u003e RGDAN: A random graph diffusion attention network for traffic prediction. \u003cem\u003eNeural Networks\u003c/em\u003e \u003cstrong\u003e172\u003c/strong\u003e, 106093 (2024).\u003c/li\u003e\n\u003cli\u003eLi, Y., Zhao, Q. \u0026amp; Wang, M. Understanding urban traffic flows in response to COVID-19 pandemic with emerging urban big data in Glasgow. \u003cem\u003eCities\u003c/em\u003e \u003cstrong\u003e154\u003c/strong\u003e, 105381 (2024).\u003c/li\u003e\n\u003cli\u003eKala\u0026scaron;ov\u0026aacute;, A. \u0026amp; Stacho, M. Smooth traffic flow as one of the most important factors for safety increase in road transport. \u003cem\u003eTransport\u003c/em\u003e (2006).\u003c/li\u003e\n\u003cli\u003eRen, Y. \u003cem\u003eet al.\u003c/em\u003e TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2403.02221 (2024).\u003c/li\u003e\n\u003cli\u003eSattarzadeh, A. R., Kutadinata, R. J., Pathirana, P. N. \u0026amp; Huynh, V. T. A novel hybrid deep learning model with ARIMA Conv-LSTM networks and shuffle attention layer for short-term traffic flow prediction. \u003cem\u003eTransportmetrica A: Transport Science\u003c/em\u003e (2025).\u003c/li\u003e\n\u003cli\u003eZhang, Y., Tang, S. \u0026amp; Yu, G. An interpretable hybrid predictive model of COVID-19 cases using autoregressive model and LSTM. 
\u003cem\u003eSci Rep\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 6708 (2023).\u003c/li\u003e\n\u003cli\u003eWang, Y., Jia, R., Dai, F. \u0026amp; Ye, Y. Traffic Flow Prediction Method Based on Seasonal Characteristics and SARIMA-NAR Model. \u003cem\u003eApplied Sciences\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 2190 (2022).\u003c/li\u003e\n\u003cli\u003eKashyap, A. A. \u003cem\u003eet al.\u003c/em\u003e Traffic flow prediction models - A review of deep learning techniques. \u003cem\u003eCogent Engineering\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eLi, Y., Chai, S., Ma, Z. \u0026amp; Wang, G. A Hybrid Deep Learning Framework for Long-Term Traffic Flow Prediction. \u003cem\u003eIEEE Access\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 11264\u0026ndash;11271 (2021).\u003c/li\u003e\n\u003cli\u003eM\u0026eacute;ndez, M., Merayo, M. G. \u0026amp; N\u0026uacute;\u0026ntilde;ez, M. Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model. \u003cem\u003eEngineering Applications of Artificial Intelligence\u003c/em\u003e \u003cstrong\u003e121\u003c/strong\u003e, 106041 (2023).\u003c/li\u003e\n\u003cli\u003eLi, Y. \u003cem\u003eet al.\u003c/em\u003e Modeling Temporal Patterns with Dilated Convolutions for Time-Series Forecasting. \u003cem\u003eACM Transactions on Knowledge Discovery from Data\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 14:1-14:22 (2021).\u003c/li\u003e\n\u003cli\u003eWu, D., Peng, K., Wang, S. \u0026amp; Leung, V. C. M. Spatial-Temporal Graph Attention Gated Recurrent Transformer Network for Traffic Flow Forecasting. \u003cem\u003eIEEE Internet of Things Journal\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 14267\u0026ndash;14281 (2024).\u003c/li\u003e\n\u003cli\u003eXia, Z., Zhang, Y., Yang, J. \u0026amp; Xie, L. Dynamic spatial-temporal graph convolutional recurrent networks for traffic flow forecasting.
\u003cem\u003eExpert Systems with Applications\u003c/em\u003e \u003cstrong\u003e240\u003c/strong\u003e, (2024).\u003c/li\u003e\n\u003cli\u003eZhao, Y. \u003cem\u003eet al.\u003c/em\u003e Dual flow fusion graph convolutional network for traffic flow prediction. \u003cem\u003eInternational Journal of Machine Learning and Cybernetics\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 3425\u0026ndash;3437 (2024).\u003c/li\u003e\n\u003cli\u003eAnsari, A. F. \u003cem\u003eet al.\u003c/em\u003e Chronos: Learning the Language of Time Series. Preprint at https://doi.org/10.48550/arXiv.2403.07815 (2024).\u003c/li\u003e\n\u003cli\u003eParr, S., Wolshon, B., Renne, J., Murray-Tuite, P. \u0026amp; Kim, K. Traffic Impacts of the COVID-19 Pandemic: Statewide Analysis of Social Separation and Activity Restriction. \u003cem\u003eNatural Hazards Review\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 04020025 (2020).\u003c/li\u003e\n\u003cli\u003eWarren, M. S. \u0026amp; Skillman, S. W. Mobility Changes in Response to COVID-19. Preprint at https://doi.org/10.48550/arXiv.2003.14228 (2020).\u003c/li\u003e\n\u003cli\u003eBorkowski, P., Jażdżewska-Gutta, M. \u0026amp; Szmelter-Jarosz, A. Lockdowned: Everyday mobility changes in response to COVID-19. \u003cem\u003eJournal of Transport Geography\u003c/em\u003e \u003cstrong\u003e90\u003c/strong\u003e, 102906 (2021).\u003c/li\u003e\n\u003cli\u003eNouvellet, P. \u003cem\u003eet al.\u003c/em\u003e Reduction in mobility and COVID-19 transmission. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 1090 (2021).\u003c/li\u003e\n\u003cli\u003ePatra, S. S., Chilukuri, B. R. \u0026amp; Vanajakshi, L. Analysis of road traffic pattern changes due to activity restrictions during COVID-19 pandemic in Chennai. \u003cem\u003eTransportation Letters\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 473\u0026ndash;481 (2021).\u003c/li\u003e\n\u003cli\u003eEbrahim Shaik, Md. \u0026amp; Ahmed, S.
An overview of the impact of COVID-19 on road traffic safety and travel behavior. \u003cem\u003eTransportation Engineering\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 100119 (2022).\u003c/li\u003e\n\u003cli\u003eHu, Y. \u003cem\u003eet al.\u003c/em\u003e Impacts of Covid-19 mode shift on road traffic. Preprint at https://doi.org/10.48550/arXiv.2005.01610 (2023).\u003c/li\u003e\n\u003cli\u003eMa, C., Dai, G. \u0026amp; Zhou, J. Short-Term Traffic Flow Prediction for Urban Road Sections Based on Time Series Analysis and LSTM_BILSTM Method. \u003cem\u003eIEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 5615\u0026ndash;5624 (2022).\u003c/li\u003e\n\u003cli\u003eGhanim, M. S., Muley, D. \u0026amp; Kharbeche, M. ANN-Based traffic volume prediction models in response to COVID-19 imposed measures. \u003cem\u003eSustainable Cities and Society\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 103830 (2022).\u003c/li\u003e\n\u003cli\u003eLiapis, S. \u003cem\u003eet al.\u003c/em\u003e A methodology using classification for traffic prediction: Featuring the impact of COVID-19. \u003cem\u003eIntegrated Computer-Aided Engineering\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 417\u0026ndash;435 (2021).\u003c/li\u003e\n\u003cli\u003eLi, H. \u003cem\u003eet al.\u003c/em\u003e Traffic Flow Forecasting in the COVID-19: A Deep Spatial-temporal Model Based on Discrete Wavelet Transformation. \u003cem\u003eACM Trans. Knowl. Discov. Data\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 64:1-64:28 (2023).\u003c/li\u003e\n\u003cli\u003eBrown, T. \u003cem\u003eet al.\u003c/em\u003e Language Models are Few-Shot Learners. in \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e vol. 33 1877\u0026ndash;1901 (Curran Associates, Inc., 2020).\u003c/li\u003e\n\u003cli\u003eChung, H. W. \u003cem\u003eet al.\u003c/em\u003e Scaling Instruction-Finetuned Language Models. 
Preprint at https://doi.org/10.48550/arXiv.2210.11416 (2022).\u003c/li\u003e\n\u003cli\u003eTouvron, H. \u003cem\u003eet al.\u003c/em\u003e LLaMA: Open and Efficient Foundation Language Models. Preprint at https://doi.org/10.48550/arXiv.2302.13971 (2023).\u003c/li\u003e\n\u003cli\u003eMirchandani, S. \u003cem\u003eet al.\u003c/em\u003e Large Language Models as General Pattern Machines. Preprint at https://doi.org/10.48550/arXiv.2307.04721 (2023).\u003c/li\u003e\n\u003cli\u003eGruver, N., Finzi, M., Qiu, S. \u0026amp; Wilson, A. G. Large Language Models Are Zero-Shot Time Series Forecasters. Preprint at https://doi.org/10.48550/arXiv.2310.07820 (2024).\u003c/li\u003e\n\u003cli\u003eLiu, H., Zhao, Z., Wang, J., Kamarthi, H. \u0026amp; Prakash, B. A. LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting. Preprint at https://doi.org/10.48550/arXiv.2402.16132 (2024).\u003c/li\u003e\n\u003cli\u003eOpenAI \u003cem\u003eet al.\u003c/em\u003e GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2024).\u003c/li\u003e\n\u003cli\u003eRadford, A., Narasimhan, K., Salimans, T. \u0026amp; Sutskever, I. Improving Language Understanding by Generative Pre-Training. (2018).\u003c/li\u003e\n\u003cli\u003eRadford, A. \u003cem\u003eet al.\u003c/em\u003e Language Models are Unsupervised Multitask Learners. (2019).\u003c/li\u003e\n\u003cli\u003eXue, H. \u0026amp; Salim, F. D. PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting. Preprint at https://doi.org/10.48550/arXiv.2210.08964 (2023).\u003c/li\u003e\n\u003cli\u003eRasul, K. \u003cem\u003eet al.\u003c/em\u003e Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. Preprint at https://doi.org/10.48550/arXiv.2310.08278 (2024).\u003c/li\u003e\n\u003cli\u003eLi, Y., Zhao, Q. \u0026amp; Wang, M. High-resolution traffic flow data from the urban traffic control system in Glasgow. 
\u003cem\u003eSci Data\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 253 (2025).\u003c/li\u003e\n\u003cli\u003ePark, J. \u003cem\u003eet al.\u003c/em\u003e Real time vehicle speed prediction using a Neural Network Traffic Model. in \u003cem\u003eThe 2011 International Joint Conference on Neural Networks\u003c/em\u003e 2991\u0026ndash;2996 (2011). doi:10.1109/IJCNN.2011.6033614.\u003c/li\u003e\n\u003cli\u003eJia, Y., Wu, J. \u0026amp; Du, Y. Traffic speed prediction using deep learning method. in \u003cem\u003e2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC)\u003c/em\u003e 1217\u0026ndash;1222 (2016). doi:10.1109/ITSC.2016.7795712.\u003c/li\u003e\n\u003cli\u003eJia, Y., Wu, J., Ben-Akiva, M., Seshadri, R. \u0026amp; Du, Y. Rainfall-integrated traffic speed prediction using deep learning method. \u003cem\u003eIET Intelligent Transport Systems\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 531\u0026ndash;536 (2017).\u003c/li\u003e\n\u003cli\u003eAkhtar, M. \u0026amp; Moridpour, S. A Review of Traffic Congestion Prediction Using Artificial Intelligence. \u003cem\u003eJournal of Advanced Transportation\u003c/em\u003e \u003cstrong\u003e2021\u003c/strong\u003e, 8878011 (2021).\u003c/li\u003e\n\u003cli\u003eChen, C., Liu, Z., Wan, S., Luan, J. \u0026amp; Pei, Q. Traffic Flow Prediction Based on Deep Learning in Internet of Vehicles. \u003cem\u003eIEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 3776\u0026ndash;3789 (2021).\u003c/li\u003e\n\u003cli\u003eAljebreen, M. \u003cem\u003eet al.\u003c/em\u003e Enhancing Traffic Flow Prediction in Intelligent Cyber-Physical Systems: A Novel Bi-LSTM-Based Approach With Kalman Filter Integration. \u003cem\u003eIEEE TRANSACTIONS ON CONSUMER ELECTRONICS\u003c/em\u003e \u003cstrong\u003e70\u003c/strong\u003e, 1889\u0026ndash;1902 (2024).\u003c/li\u003e\n\u003cli\u003eAlvi, M., Minerva, R., Rajapaksha, P., Crespi, N. 
\u0026amp; Alvi, U. Traffic Flow Prediction in Sensor-Limited Areas Through Synthetic Sensing and Data Fusion. \u003cem\u003eIEEE Sensors Letters\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, (2024).\u003c/li\u003e\n\u003cli\u003eVan Der Voort, M., Dougherty, M. \u0026amp; Watson, S. Combining Kohonen maps with ARIMA time series models to forecast traffic flow. \u003cem\u003eTransportation Research Part C: Emerging Technologies\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 307\u0026ndash;318 (1996).\u003c/li\u003e\n\u003cli\u003eOkutani, I. \u0026amp; Stephanedes, Y. J. Dynamic prediction of traffic volume through Kalman filtering theory. \u003cem\u003eTransportation Research Part B: Methodological\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1\u0026ndash;11 (1984).\u003c/li\u003e\n\u003cli\u003eLeshem, G. \u0026amp; Ritov, Y. Traffic Flow Prediction using Adaboost Algorithm with Random Forests as a Weak Learner. \u003cem\u003eInternational Journal of Electrical and Computer Engineering\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, (2007).\u003c/li\u003e\n\u003cli\u003eTang, J. \u003cem\u003eet al.\u003c/em\u003e Traffic flow prediction based on combination of support vector machine and data denoising schemes. \u003cem\u003ePhysica A: Statistical Mechanics and its Applications\u003c/em\u003e \u003cstrong\u003e534\u003c/strong\u003e, 120642 (2019).\u003c/li\u003e\n\u003cli\u003eYang, S. \u0026amp; Qian, S. Understanding and Predicting Travel Time with Spatio-Temporal Features of Network Traffic Flow, Weather and Incidents. \u003cem\u003eIEEE Intelligent Transportation Systems Magazine\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 12\u0026ndash;28 (2019).\u003c/li\u003e\n\u003cli\u003eTian, Y. \u0026amp; Pan, L. Predicting Short-Term Traffic Flow by Long Short-Term Memory Recurrent Neural Network. 
in \u003cem\u003e2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity)\u003c/em\u003e 153\u0026ndash;158 (2015). doi:10.1109/SmartCity.2015.63.\u003c/li\u003e\n\u003cli\u003eZhu, H. \u003cem\u003eet al.\u003c/em\u003e A Novel Traffic Flow Forecasting Method Based on RNN-GCN and BRB. \u003cem\u003eJournal of Advanced Transportation\u003c/em\u003e \u003cstrong\u003e2020\u003c/strong\u003e, 7586154 (2020).\u003c/li\u003e\n\u003cli\u003eLu, S., Zhang, Q., Chen, G. \u0026amp; Seng, D. A combined method for short-term traffic flow prediction based on recurrent neural network. \u003cem\u003eAlexandria Engineering Journal\u003c/em\u003e \u003cstrong\u003e60\u003c/strong\u003e, 87\u0026ndash;94 (2021).\u003c/li\u003e\n\u003cli\u003eXiao, Y. \u0026amp; Yin, Y. Hybrid LSTM Neural Network for Short-Term Traffic Flow Prediction. \u003cem\u003eINFORMATION\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, (2019).\u003c/li\u003e\n\u003cli\u003eWang, S., Zhao, J., Shao, C., Dong, C. D. \u0026amp; Yin, C. Truck Traffic Flow Prediction Based on LSTM and GRU Methods With Sampled GPS Data. \u003cem\u003eIEEE ACCESS\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 208158\u0026ndash;208169 (2020).\u003c/li\u003e\n\u003cli\u003eXiong, L., Ding, W., Huang, X. \u0026amp; Huang, W. CLSTAN: ConvLSTM-Based Spatiotemporal Attention Network for Traffic Flow Forecasting. \u003cem\u003eMATHEMATICAL PROBLEMS IN ENGINEERING\u003c/em\u003e \u003cstrong\u003e2022\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eWang, J.-D. \u0026amp; Susanto, C. O. N. Traffic Flow Prediction with Heterogenous Data Using a Hybrid CNN-LSTM Model. \u003cem\u003eCMC-COMPUTERS MATERIALS \u0026amp; CONTINUA\u003c/em\u003e \u003cstrong\u003e76\u003c/strong\u003e, 3097\u0026ndash;3112 (2023).\u003c/li\u003e\n\u003cli\u003eGuo, C., Zhu, J. \u0026amp; Wang, X. MVHS-LSTM: The Comprehensive Traffic Flow Prediction Based on Improved LSTM via Multiple Variables Heuristic Selection. 
\u003cem\u003eAPPLIED SCIENCES-BASEL\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, (2024).\u003c/li\u003e\n\u003cli\u003eTouvron, H. \u003cem\u003eet al.\u003c/em\u003e Llama 2: Open Foundation and Fine-Tuned Chat Models. Preprint at https://doi.org/10.48550/arXiv.2307.09288 (2023).\u003c/li\u003e\n\u003cli\u003eZhao, W. X. \u003cem\u003eet al.\u003c/em\u003e A Survey of Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2303.18223 (2024).\u003c/li\u003e\n\u003cli\u003eSennrich, R., Haddow, B. \u0026amp; Birch, A. Neural Machine Translation of Rare Words with Subword Units. Preprint at https://doi.org/10.48550/arXiv.1508.07909 (2016).\u003c/li\u003e\n\u003cli\u003eVaswani, A. \u003cem\u003eet al.\u003c/em\u003e Attention is All you Need. in \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e vol. 30 (Curran Associates, Inc., 2017).\u003c/li\u003e\n\u003cli\u003eLewis, M. \u003cem\u003eet al.\u003c/em\u003e BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Preprint at https://doi.org/10.48550/arXiv.1910.13461 (2019).\u003c/li\u003e\n\u003cli\u003eRaffel, C. \u003cem\u003eet al.\u003c/em\u003e Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Preprint at https://doi.org/10.48550/arXiv.1910.10683 (2023).\u003c/li\u003e\n\u003cli\u003eChowdhery, A. \u003cem\u003eet al.\u003c/em\u003e PaLM: Scaling Language Modeling with Pathways. Preprint at https://doi.org/10.48550/arXiv.2204.02311 (2022).\u003c/li\u003e\n\u003cli\u003eChen, Z. \u003cem\u003eet al.\u003c/em\u003e Spatial-temporal short-term traffic flow prediction model based on dynamical-learning graph convolution mechanism. \u003cem\u003eINFORMATION SCIENCES\u003c/em\u003e \u003cstrong\u003e611\u003c/strong\u003e, 522\u0026ndash;539 (2022).\u003c/li\u003e\n\u003cli\u003eChen, J. 
\u003cem\u003eet al.\u003c/em\u003e Node Connection Strength Matrix-Based Graph Convolution Network for Traffic Flow Prediction. \u003cem\u003eIEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY\u003c/em\u003e \u003cstrong\u003e72\u003c/strong\u003e, 12063\u0026ndash;12074 (2023).\u003c/li\u003e\n\u003cli\u003eGao, H., Jia, H. \u0026amp; Yang, L. An Improved CEEMDAN-FE-TCN Model for Highway Traffic Flow Prediction. \u003cem\u003eJOURNAL OF ADVANCED TRANSPORTATION\u003c/em\u003e \u003cstrong\u003e2022\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eHuang, X., Tang, J., Yang, X. \u0026amp; Xiong, L. A time-dependent attention convolutional LSTM method for traffic flow prediction. \u003cem\u003eAPPLIED INTELLIGENCE\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 17371\u0026ndash;17386 (2022).\u003c/li\u003e\n\u003cli\u003eXu, X., Liu, C., Zhao, Y. \u0026amp; Lv, X. Short-term traffic flow prediction based on whale optimization algorithm optimized BiLSTM_Attention. \u003cem\u003eCONCURRENCY AND COMPUTATION-PRACTICE \u0026amp; EXPERIENCE\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eXu, X., Yang, C., Bilal, M., Li, W. \u0026amp; Wang, H. Computation Offloading for Energy and Delay Trade-Offs With Traffic Flow Prediction in Edge Computing-Enabled IoV. \u003cem\u003eIEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 15613\u0026ndash;15623 (2023).\u003c/li\u003e\n\u003cli\u003eHe, R., Xiao, Y., Lu, X., Zhang, S. \u0026amp; Liu, Y. ST-3DGMR: Spatio-temporal 3D grouped multiscale ResNet network for region-based urban traffic flow prediction. \u003cem\u003eINFORMATION SCIENCES\u003c/em\u003e \u003cstrong\u003e624\u003c/strong\u003e, 68\u0026ndash;93 (2023).\u003c/li\u003e\n\u003cli\u003eZhou, S. \u003cem\u003eet al.\u003c/em\u003e Short-Term Traffic Flow Prediction of the Smart City Using 5G Internet of Vehicles Based on Edge Computing. 
\u003cem\u003eIEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 2229\u0026ndash;2238 (2023).\u003c/li\u003e\n\u003cli\u003eNaheliya, B., Redhu, P. \u0026amp; Kumar, K. MFOA-Bi-LSTM: An optimized bidirectional long short-term memory model for short-term traffic flow prediction. \u003cem\u003ePHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS\u003c/em\u003e \u003cstrong\u003e634\u003c/strong\u003e, (2024).\u003c/li\u003e\n\u003cli\u003eTan, G. \u003cem\u003eet al.\u003c/em\u003e A noise-immune and attention-based multi-modal framework for short-term traffic flow forecasting. \u003cem\u003eSOFT COMPUTING\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 4775\u0026ndash;4790 (2024).\u003c/li\u003e\n\u003cli\u003eDuan, Y. \u003cem\u003eet al.\u003c/em\u003e FDSA-STG: Fully Dynamic Self-Attention Spatio-Temporal Graph Networks for Intelligent Traffic Flow Prediction. \u003cem\u003eIEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY\u003c/em\u003e \u003cstrong\u003e71\u003c/strong\u003e, 9250\u0026ndash;9260 (2022).\u003c/li\u003e\n\u003cli\u003eYan, B., Wang, G., Yu, J., Jin, X. \u0026amp; Zhang, H. Spatial-Temporal Chebyshev Graph Neural Network for Traffic Flow Prediction in IoT-Based ITS. \u003cem\u003eIEEE INTERNET OF THINGS JOURNAL\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 9266\u0026ndash;9279 (2022).\u003c/li\u003e\n\u003cli\u003eHuo, G. \u003cem\u003eet al.\u003c/em\u003e Hierarchical Spatio-Temporal Graph Convolutional Networks and Transformer Network for Traffic Flow Forecasting. \u003cem\u003eIEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 3855\u0026ndash;3867 (2023).\u003c/li\u003e\n\u003cli\u003eLai, Q., Tian, J., Wang, W. \u0026amp; Hu, X. Spatial-Temporal Attention Graph Convolution Network on Edge Cloud for Traffic Flow Prediction. 
\u003cem\u003eIEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 4565\u0026ndash;4576 (2023).\u003c/li\u003e\n\u003cli\u003eNarmadha, S. \u0026amp; Vijayakumar, V. Spatio-Temporal vehicle traffic flow prediction using multivariate CNN and LSTM model. \u003cem\u003eMaterials Today: Proceedings\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 826\u0026ndash;833 (2023).\u003c/li\u003e\n\u003cli\u003eWang, Z., Sun, P., Hu, Y. \u0026amp; Boukerche, A. A novel hybrid method for achieving accurate and timeliness vehicular traffic flow prediction in road networks. \u003cem\u003eCOMPUTER COMMUNICATIONS\u003c/em\u003e \u003cstrong\u003e209\u003c/strong\u003e, 378\u0026ndash;386 (2023).\u003c/li\u003e\n\u003cli\u003eWu, K. \u003cem\u003eet al.\u003c/em\u003e Error-distribution-free kernel extreme learning machine for traffic flow forecasting. \u003cem\u003eENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE\u003c/em\u003e \u003cstrong\u003e123\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eXing, H., Chen, A. \u0026amp; Zhang, X. RL-GCN: Traffic flow prediction based on graph convolution and reinforcement for smart cities. \u003cem\u003eDISPLAYS\u003c/em\u003e \u003cstrong\u003e80\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eYang, D. \u0026amp; Lv, L. A Graph Deep Learning-Based Fast Traffic Flow Prediction Method in Urban Road Networks. \u003cem\u003eIEEE ACCESS\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 93754\u0026ndash;93763 (2023).\u003c/li\u003e\n\u003cli\u003eJia, Q., Zang, J. \u0026amp; Liu, S. Deep learning based traffic flow prediction model on highway research. in (eds. Ghanizadeh, A. \u0026amp; Jia, H.) vol. 13064 (2024).\u003c/li\u003e\n\u003cli\u003eLu, W. \u003cem\u003eet al.\u003c/em\u003e Traffic flow prediction for highway vehicle detectors through decomposition and machine learning. 
\u003cem\u003eTRANSPORTATION LETTERS-THE INTERNATIONAL JOURNAL OF TRANSPORTATION RESEARCH\u003c/em\u003e (2024) doi:10.1080/19427867.2024.2339631.\u003c/li\u003e\n\u003cli\u003eWaibel, A., Hanazawa, T., Hinton, G., Shikano, K. \u0026amp; Lang, K. J. Phoneme recognition using time-delay neural networks. \u003cem\u003eIEEE Transactions on Acoustics, Speech, and Signal Processing\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 328\u0026ndash;339 (1989).\u003c/li\u003e\n\u003cli\u003eHochreiter, S. \u0026amp; Schmidhuber, J. Long Short-Term Memory. \u003cem\u003eNeural Computation\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1735\u0026ndash;1780 (1997).\u003c/li\u003e\n\u003cli\u003eHale, T. \u003cem\u003eet al.\u003c/em\u003e A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). \u003cem\u003eNat Hum Behav\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 529\u0026ndash;538 (2021).\u003c/li\u003e\n\u003cli\u003eHuang, X., Lan, Y., Ye, Y., Wang, J. \u0026amp; Jiang, Y. Traffic Flow Prediction Based on Multi-Mode Spatial-Temporal Convolution of Mixed Hop Diffuse ODE. \u003cem\u003eELECTRONICS\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eSu, Z., Liu, T., Hao, X. \u0026amp; Hu, X. Spatial-temporal graph convolutional networks for traffic flow prediction considering multiple traffic parameters. \u003cem\u003eJOURNAL OF SUPERCOMPUTING\u003c/em\u003e \u003cstrong\u003e79\u003c/strong\u003e, 18293\u0026ndash;18312 (2023).\u003c/li\u003e\n\u003cli\u003eWang, Z. \u0026amp; Bovik, A. C. Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures. \u003cem\u003eIEEE Signal Processing Magazine\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 98\u0026ndash;117 (2009).\u003c/li\u003e\n\u003cli\u003eDe Gooijer, J. G. \u0026amp; Hyndman, R. J. 25 years of time series forecasting. 
\u003cem\u003eInternational Journal of Forecasting\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 443\u0026ndash;473 (2006).\u003c/li\u003e\n\u003cli\u003eYi, H. \u0026amp; Bui, K.-H. N. An Automated Hyperparameter Search-Based Deep Learning Model for Highway Traffic Prediction. \u003cem\u003eIEEE Trans. Intell. Transport. Syst.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 5486\u0026ndash;5495 (2021).\u003c/li\u003e\n\u003cli\u003e\u003cem\u003eHyperparameter Tuning for Machine and Deep Learning with R: A Practical Guide\u003c/em\u003e. (Springer Nature, 2023). doi:10.1007/978-981-19-5170-1.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":false,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Traffic flows, Time-series prediction, Deep learning, Large language models (LLMs)","lastPublishedDoi":"10.21203/rs.3.rs-6572761/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6572761/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTraffic flow prediction plays an important role in managing urban transportation systems, helping to reduce congestion and improve road safety. Although existing deep learning models improve their prediction accuracy with complex structures, they always require large datasets for task-specific training. Recently, the rapidly developed pre-trained large language models (LLMs) have shown outstanding performance in time series prediction. Motivated by the development, we apply two foundation models, Lag-Llama and Chronos, for zero-shot traffic flow prediction and compare their accuracy against traditional deep learning models. Our results show that LLMs outperform deep learning models in traffic flow prediction under both normal conditions and disruptive events. Unlike deep learning models, which require large-scale historical data and extensive training time for each task, pre-trained LLMs can be directly applied to datasets with different data sizes, traffic dynamics, and context lengths. We also find that LLMs with longer context lengths and larger model sizes achieve higher prediction accuracy but require increased inference times. 
Selecting an appropriate LLM is also crucial \u0026ndash; models trained on a comprehensive dataset are more likely to achieve superior zero-shot performance, making them a practical and efficient choice for real-world traffic prediction applications.\u003c/p\u003e","manuscriptTitle":"Zero-Shot Traffic Flow Prediction with Large Language Models: A Comparison with Deep Learning Approaches","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-08 06:06:00","doi":"10.21203/rs.3.rs-6572761/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-07-30T04:50:34+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-27T05:22:17+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-24T00:43:56+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-21T16:41:05+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"81253432109524779955448084695665013822","date":"2025-06-17T01:42:38+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-12T18:46:29+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"258976254743231037576431265521574151486","date":"2025-06-12T18:25:25+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"158530240346525596356833018748780656849","date":"2025-06-12T17:56:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"122100903037698160348878367507929622782","date":"2025-06-12T02:49:58+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-06-12T01:39:11+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-05-16T02:26:51+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-05-16T02:24:18+00:00","index":"","fulltext":""},{"type":"checksComple
te","content":"","date":"2025-05-14T10:29:31+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-05-01T14:53:29+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"fa1e8e01-429c-4ac6-b05b-137389e7a99e","owner":[],"postedDate":"May 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":47953998,"name":"Physical sciences/Engineering"},{"id":47953999,"name":"Physical sciences/Mathematics and computing"}],"tags":[],"updatedAt":"2026-04-21T09:26:56+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-08 06:06:00","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6572761","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6572761","identity":"rs-6572761","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}