Multi-Source Traffic State Estimation: Exploring Advanced Filtering Algorithms for Rural Arterial Networks

doi:10.21203/rs.3.rs-5927838/v1


2025 · doi:10.21203/rs.3.rs-5927838/v1

preprint OA: closed



Research Article

Taimor Ali Khan

This is a preprint; it has not been peer reviewed by a journal.
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Traffic state estimation (TSE) is essential for enhancing transportation systems by providing critical, real-time data on road conditions to support decision-making and optimize network performance. Traditional TSE methods have predominantly focused on highways, relying on single-source data inputs like loop detectors or GPS data, which may limit adaptability in diverse traffic scenarios.
However, the integration of multi-source data spanning loop detectors, GPS, and Bluetooth has opened new pathways for improved accuracy and responsiveness in TSE models, particularly within rural arterial networks and at complex intersections. This review analyzes the progression of TSE methodologies, focusing on model-based techniques such as the Kalman Filter (KF), Sliding Kalman Filter (SKF), and cell transmission models. By examining the combined use of varied data inputs, this review underscores the benefits of multi-source fusion in accurately capturing dynamic traffic conditions in rural settings. Key challenges, including non-linear traffic flows, inherent data noise, and the limitations of current validation methods, are discussed. Future research directions are identified, highlighting the need for adaptable algorithms that can effectively manage the complex, variable datasets characteristic of rural traffic environments.

Introduction

Historically, single-source data, such as from loop detectors, GPS, or camera-based systems, have been employed for traffic state estimation. While these data sources are capable of providing useful insights in controlled environments, they often fall short when applied to the more variable conditions found in rural areas. Traffic State Estimation (TSE) plays a vital role in modern transportation systems, providing real-time insights into traffic flow and enabling effective traffic management. Accurate TSE supports a variety of applications, including congestion management, traffic signal control, and resource allocation, all of which contribute to the operational efficiency of transportation networks (Xing et al., 2022).
Traditionally, research in this area has focused on highways and motorways, where traffic tends to follow relatively uniform patterns, making it easier to model (Muhammed T & Mathew, 2022). However, estimating traffic conditions in rural environments, particularly on arterial roads with frequent intersections, presents a more complex challenge due to the dynamic and often unpredictable nature of traffic flow in these settings. For instance, loop detectors are effective at providing point-based measurements but lack the ability to offer continuous spatial coverage. Similarly, GPS data can provide extensive coverage but may suffer from inaccuracies, especially in dense rural areas where signal interference is common (Williams et al., 2010). The limitations inherent in using a single data source have driven research toward multi-source data fusion techniques, which combine data from various sensors to create a more comprehensive view of traffic conditions. However, while multi-source data fusion offers great promise, it also introduces challenges, particularly in terms of handling noise and harmonizing data from different sources. For example, while loop detectors offer precise but localized measurements, GPS data can be noisy and prone to inaccuracies. Multi-source data fusion has become an increasingly important method for improving TSE accuracy, especially in complex rural environments. By combining information from diverse sensors, such as loop detectors, GPS, Bluetooth devices, and connected vehicles, traffic estimation models can offer more robust and accurate assessments of traffic flow (Ghiassi & Lee, 2018). This fusion approach compensates for the weaknesses of individual data sources by integrating their strengths, resulting in a richer understanding of traffic conditions across the network. 
The benefits of this technique are particularly evident in rural arterial networks, where traffic is influenced by intersections, pedestrian activity, and varying control systems, all of which introduce significant complexities. To address these challenges, advanced filtering techniques such as the Kalman Filter (KF) and its variants have been widely applied (Rafique et al., 2023). These filtering methods are designed to manage noise and discrepancies in data while maintaining the integrity of the traffic state estimation process. The Kalman Filter, for instance, is a powerful tool for recursive state estimation in dynamic systems, making it well-suited for real-time applications. Nevertheless, the standard Kalman Filter assumes a linear system, which can be limiting in rural environments where traffic dynamics are highly nonlinear due to factors like signalized intersections and mixed traffic flows. To overcome these limitations, researchers have developed extended versions of the Kalman Filter, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), which are better equipped to handle nonlinear systems. Additionally, the Sliding Kalman Filter (SKF) has emerged as a promising approach for adapting to the ever-changing traffic conditions in rural networks, offering the flexibility to adjust model parameters over time (J. Huang et al., 2023; Qi et al., 2018). Noise, missing data, and sensor inaccuracies are common in real-world datasets, posing additional challenges for traffic state estimation models. Further research is needed to validate these models in real-world settings, especially in rural networks where the dynamics of traffic flow are less predictable (Yokoya, 2004). Despite significant progress in the development of multi-source TSE models and advanced filtering techniques, much of the research remains confined to simulation environments (Geetha & Subramani, 2019; A. Wang et al., 2022). 
While simulations offer a controlled setting for testing new algorithms, they often fail to capture the full complexity and variability of real-world traffic conditions. Connected vehicles can provide real-time data on vehicle location, speed, and acceleration, while IoT sensors placed at intersections can offer valuable insights into traffic flow, vehicle types, and pedestrian activity (H. Yang et al., 2023). Integrating these emerging data sources with traditional sensors offers the potential to further improve the accuracy and granularity of traffic state estimates, ultimately leading to more efficient and responsive transportation management systems (Garg et al., 2023). In addition to enhancing TSE models, the growing availability of data from connected vehicles and Internet of Things (IoT)-enabled sensors presents new opportunities for improving traffic estimation (Lamberty & Kreyenschmidt, 2022). Traffic state estimation (TSE) is essential for the effective management and optimization of transportation systems, especially in rural areas facing increasing congestion. TSE provides real-time insights into traffic flow, speed, and density, enabling traffic managers to make informed decisions regarding signal control, traffic diversion, and congestion management. Traditionally, TSE has concentrated on motorway traffic using single data sources like loop detectors or GPS (Ibarra-Espinosa et al., 2019), which can limit their effectiveness in capturing the intricate dynamics of rural environments, particularly at intersections. With the rise of various sensing technologies—such as GPS, loop detectors, Bluetooth, and LiDAR—multi-source data fusion has emerged as a promising approach to enhance TSE models (J. Zhang et al., 2020). This integration of diverse datasets offers a more comprehensive understanding of traffic conditions across different road segments and critical points like intersections. 
However, challenges arise in integrating these data sources due to variations in quality, noise levels, and resolution (Khan et al., 2022; Naqvi et al., 2023). To address these issues, filtering algorithms such as the Kalman Filter (KF) and its variants, including the Sliding Kalman Filter (SKF), are commonly employed to combine data sources while minimizing noise and providing real-time estimates (Ning et al., 2020). This review paper aims to explore advancements in multi-source TSE models (Kore & Patil, 2019). By examining various filtering techniques and their applications in data fusion, we assess their capacity to manage the nonlinearities present in real-world traffic systems, particularly within arterial networks featuring intersections. Additionally, we evaluate the extent to which these models have been validated beyond simulation environments for practical applicability. Finally, we discuss the challenges associated with data fusion in traffic estimation and outline potential future research directions in this field (Zhou et al., 2018). In summary, TSE is vital for understanding rural traffic dynamics. While traditional methods have focused on uniform conditions found on motorways, rural arterial roads present unique challenges due to their complex nature. The limitations of relying on single-source data collection methods have led to the exploration of multi-source data fusion techniques that provide a more holistic view of traffic conditions. However, these methods also introduce new challenges related to noise management and nonlinearities in the data. The Kalman Filter and its variations offer effective solutions to these challenges but require further validation in real-world scenarios. By leveraging emerging data sources such as connected vehicles and IoT-enabled sensors, future TSE models can potentially enhance accuracy and reliability in rural transportation systems (Bao et al., 2023; Prazeres et al., 2023; Rambabu & Venkatram, 2018).
Traffic state estimation (TSE) is a vital component of transportation systems, offering real-time insights into traffic conditions. It plays a significant role in traffic management, congestion reduction, and the development of intelligent transportation systems (ITS). TSE enables transportation authorities to make informed decisions regarding traffic control strategies, signal timing adjustments, and resource distribution, all of which enhance the overall efficiency and safety of transportation networks (A. J. Huang & Agarwal, 2022; Y. Wang et al., 2022; Z. Zhang et al., 2023). Traditionally, TSE research has concentrated on motorways and highways, where traffic patterns are more consistent and easier to model due to fewer interruptions like intersections. In contrast, rural arterial roads, which feature both signalized and unsignalized intersections, present considerable challenges for estimating traffic states using conventional single-source data collection methods. The dependence on individual data sources such as loop detectors, GPS, or camera systems has been a significant limitation in earlier studies. While these sources can provide accurate information in controlled settings, they often struggle to account for the complexities of rural traffic flow influenced by factors such as pedestrian activity, varying vehicle types, and traffic signal phases. For instance, loop detectors yield precise measurements at specific locations but lack continuous coverage across the road network. Similarly, while GPS data can offer ongoing information, it is often subject to noise particularly in densely populated rural areas where signal loss or multipath errors frequently occur. These limitations have led to the exploration of multi-source data fusion approaches that deliver a more comprehensive perspective of the traffic network. Multi-source data fusion integrates information from various sensors to create a more detailed understanding of traffic conditions. 
By utilizing data from multiple sources, including loop detectors, GPS devices, Bluetooth sensors, and connected vehicles, traffic state estimation models can achieve greater accuracy and robustness in their predictions (Bekiaris-Liberis et al., 2016; Grumert & Tapani, 2018). This method compensates for the weaknesses of individual data sources by combining their strengths, facilitating a more thorough and reliable assessment of traffic conditions (Lu et al., 2014; B. Wang et al., 2022). Moreover, multi-source data fusion is especially beneficial in complex rural environments where traffic flow is significantly affected by intersections and pedestrian crossings. These elements introduce nonlinearities and noise that single-source methods are ill-equipped to manage. One of the primary challenges in TSE within rural settings is addressing the inherent noise and inconsistencies present in the data. While multi-source data fusion offers advantages, it also introduces additional complexities regarding data harmonization and noise management. For example, GPS data may contain positional inaccuracies, while loop detectors provide only localized information. Effectively combining these sources necessitates sophisticated algorithms capable of filtering out noise while preserving data integrity. Filtering techniques such as the Kalman Filter (KF) and its variants are commonly used to manage noisy data and deliver robust estimates by continuously updating state predictions based on new observations. The Kalman Filter is a widely recognized tool for state estimation in dynamic systems and has been extensively applied in TSE models because its recursive nature makes it suitable for real-time applications. However, its linear assumptions may not hold true in complex environments like rural arterial networks, where traffic dynamics are often nonlinear due to intersections and variable flows.
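The recursive predict/update cycle described above can be illustrated with a minimal scalar sketch. It assumes a random-walk model for a single traffic variable (here called density); the function names, the noise variances `q` and `r`, and the sample measurements are hypothetical choices for illustration, not values taken from the reviewed studies.

```python
def kalman_step(x_est, p_est, z, q, r):
    """One predict/update cycle of a scalar Kalman filter (random-walk model)."""
    # Predict: the state is assumed to persist; uncertainty grows by q.
    x_pred = x_est
    p_pred = p_est + q
    # Update: blend the prediction and the new measurement z via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new, k


def filter_series(measurements, x0, p0, q, r):
    """Run the filter over a sequence of noisy measurements."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p, _ = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates


# Hypothetical noisy density readings (veh/km) from a single detector.
noisy_density = [31.0, 28.5, 33.2, 30.1, 29.4, 35.0, 34.2]
estimates = filter_series(noisy_density, x0=30.0, p0=4.0, q=0.5, r=4.0)
print([round(e, 2) for e in estimates])
```

Each step needs only the previous estimate and its variance, which is what makes the filter attractive for real-time use. The linear, Gaussian assumptions are built into the update rule, which is where the limitation noted above comes from.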
To address these challenges, researchers have developed variations such as the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF), which are more adept at handling nonlinear systems (Gunda & Dhanikonda, 2021; Kim & Park, 2020). Another variation explored for multi-source TSE is the Sliding Kalman Filter (SKF). Unlike the standard Kalman Filter that relies on a fixed model, the SKF adjusts its parameters over time to adapt to changing traffic conditions. This adaptability is particularly useful in rural environments where rapid changes occur due to factors like traffic signal timings or road incidents. By dynamically adjusting model parameters, the SKF can provide more accurate estimates of traffic states even amid significant nonlinearities and noise (Chin & Tay, 2001; Tuna & Soysal, 2021; N. Yang et al., 2023). The combination of SKF with multi-source data fusion presents a promising strategy for enhancing TSE accuracy in complex rural networks. Recent studies have also investigated hybrid filtering algorithms that integrate different variations of the Kalman Filter with other state estimation techniques. These hybrid approaches aim to leverage the strengths of various algorithms to yield more robust and accurate traffic state estimates. For instance, combining the Extended Kalman Filter with machine learning models has been shown to improve prediction accuracy by capturing both linear and nonlinear aspects of traffic flow (Yin et al., 2007). Similarly, integrating the Sliding Kalman Filter with unsupervised learning algorithms can help identify patterns in the data that may not be immediately apparent, facilitating more nuanced traffic state estimates. Despite advancements in multi-source data fusion and filtering algorithms, much existing research on TSE remains limited to simulation-based validation. 
While simulations provide a controlled environment for testing new methodologies, they often fail to capture the full complexity of real-world traffic conditions. Real-world traffic data tends to be noisy, incomplete, and inconsistent, complicating the application of models developed under simulated conditions. For example, GPS data in rural areas may suffer from signal loss or multipath errors while loop detector data has limited spatial coverage. Another challenge lies in managing varying levels of noise across different data sources in real-world applications. Factors such as sensor accuracy and environmental conditions can significantly influence noise levels in rural settings. For instance, while loop detector data may be highly accurate within localized areas, GPS data can offer broader coverage but may be subject to substantial errors in crowded environments. Given the increasing availability of information from connected vehicles and IoT-enabled sensors, there is considerable potential for enhancing TSE models through these new data sources. Connected vehicles provide valuable real-time insights into position and speed that can further refine traffic state estimates. Likewise, IoT sensors deployed at intersections or along roadways can deliver additional information on traffic flow and pedestrian activity. In conclusion, estimating traffic states within rural arterial networks remains a complex challenge due to variable conditions. The limitations inherent in traditional single-source methods have led to the development of multi-source data fusion techniques that offer a more comprehensive view by integrating various sensor inputs. However, this approach also introduces new challenges related to noise management and nonlinearities within the data. The Kalman Filter family provides promising solutions but requires further validation against real-world scenarios. 
By leveraging emerging technologies such as connected vehicles and IoT sensors into future models, there is significant potential for improving accuracy and reliability in rural transportation systems. One of the key difficulties in traffic state estimation (TSE), particularly in rural settings, is managing the inherent noise and inconsistencies present in the data. While multi-source data fusion offers advantages, it also brings additional challenges related to harmonizing data and controlling noise. For example, GPS data can suffer from positional inaccuracies, whereas loop detectors provide information only at specific points. Merging these different sources necessitates advanced algorithms capable of filtering out noise and inconsistencies while preserving data integrity. This is where filtering techniques such as the Kalman Filter (KF) and its variants become essential. These model-based approaches are designed to handle noisy data and deliver reliable estimates by recursively updating predictions based on new observations, making them well-suited for real-time traffic estimation. The Kalman Filter is a widely recognized method for state estimation in dynamic systems and has been extensively utilized in TSE models. Its recursive nature allows for continuous refinement of estimates with incoming data. The Kalman Filter assumes a linear system, which has proven effective in certain traffic situations, especially on highways where dynamics are more straightforward. However, rural arterial networks present more complex scenarios where traffic dynamics are often nonlinear due to intersections, variable flows, and mixed traffic conditions. To tackle these complexities, researchers have developed variations like the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), which are more adept at managing nonlinear systems. Another variation worth noting is the Sliding Kalman Filter (SKF), which has been investigated for multi-source TSE. 
Unlike the standard Kalman Filter that relies on a fixed model, the SKF adjusts its parameters over time, making it more adaptable to fluctuating traffic conditions. This adaptability is particularly beneficial in rural areas where rapid changes can occur due to factors such as traffic signal adjustments or pedestrian crossings. By dynamically modifying its parameters, the SKF can yield more precise traffic state estimates even amidst substantial nonlinearities and noise. The integration of SKF with multi-source data fusion presents a promising strategy for enhancing TSE accuracy in intricate rural networks. Recent studies have also examined hybrid filtering algorithms that combine various variations of the Kalman Filter with other estimation techniques. These hybrid methods aim to capitalize on the strengths of different algorithms to produce more robust and accurate traffic state estimates. For instance, merging the Extended Kalman Filter with machine learning models has demonstrated improved prediction accuracy by capturing both linear and nonlinear elements of traffic flow. Similarly, combining the Sliding Kalman Filter with unsupervised learning algorithms can help uncover patterns in the data that may not be immediately evident, leading to more refined traffic state estimates. These hybrid approaches are especially valuable in rural contexts where traffic conditions are more unpredictable than those found on highways. Despite advancements in multi-source data fusion and filtering techniques, much existing research on TSE has relied heavily on simulation-based validation. While simulations provide a controlled environment for testing new methodologies, they often do not reflect the full complexity of real-world traffic scenarios. Real-world traffic data is frequently noisy, incomplete, and inconsistent, complicating the application of models developed under simulated conditions. 
For example, GPS data in rural areas is susceptible to signal loss or multipath errors, while loop detector data has limited spatial coverage. Another challenge in real-world TSE is managing varying levels of noise across different data sources. In rural environments, noise can be influenced by numerous factors including sensor precision, environmental conditions, and road network layout. For instance, loop detector data may be highly accurate but limited spatially, while GPS can offer broader coverage but may experience significant errors in densely populated areas. Effectively managing these varying noise levels requires sophisticated filtering techniques that can smooth out inconsistencies while accounting for differing confidence levels associated with each data source. The Kalman Filter and its variations provide a robust framework for addressing noise in multi-source traffic data; however, further research is necessary to refine these methods for application in more complex rural settings. With the growing availability of data from connected vehicles, IoT-enabled sensors, and other emerging technologies, there is substantial potential to enhance TSE models by integrating these new data sources. Connected vehicles offer a wealth of real-time information regarding vehicle position, speed, and acceleration that can significantly improve traffic state estimates. Similarly, IoT sensors deployed at intersections or along roadways can provide additional insights into traffic flow patterns, vehicle types, and pedestrian activity. In summary, estimating traffic states within rural arterial networks continues to be a challenging endeavor due to the complexity and variability of traffic conditions. The limitations associated with traditional single-source data collection methods have driven the development of multi-source data fusion techniques that offer a more comprehensive perspective by integrating various sensor inputs. 
Nevertheless, this approach also introduces new challenges related to noise management and nonlinearities within the data. The Kalman Filter family presents promising solutions but requires additional validation in real-world contexts. By leveraging new technologies such as connected vehicles and IoT sensors in future models, there is significant potential for enhancing accuracy and reliability in rural transportation systems. Recent studies have investigated hybrid filtering algorithms that merge various versions of the Kalman Filter with other state estimation methods. These hybrid strategies aim to utilize the strengths of multiple algorithms to yield more reliable and precise traffic state estimates. For instance, integrating the Extended Kalman Filter with machine learning models has been found to enhance the accuracy of traffic predictions by addressing both linear and nonlinear aspects of traffic flow. Likewise, combining the Sliding Kalman Filter with unsupervised learning techniques can uncover patterns in the data that might not be immediately visible, leading to more refined traffic state assessments. Such hybrid methods are especially valuable in rural settings, where traffic conditions are often more unpredictable and variable compared to highways. Despite advancements in multi-source data fusion and filtering techniques, much of the current research on traffic state estimation relies heavily on simulation-based validation. While simulations provide a controlled environment for testing new algorithms, they frequently do not reflect the full complexity of real-world traffic scenarios. Real-world traffic data is often characterized by noise, incompleteness, and inconsistencies, complicating the application of models developed in simulated contexts. For example, GPS data in rural areas is susceptible to signal loss and multipath errors, while loop detector data has limited spatial coverage. 
These challenges underscore the need for further research to assess the effectiveness of multi-source traffic state estimation models in real-world applications, particularly within rural arterial networks where traffic dynamics are intricate. Another significant challenge in real-world traffic state estimation is the varying levels of noise present across different data sources. In rural environments, this noise can be influenced by factors such as sensor accuracy, environmental conditions, and the physical configuration of the road network. For instance, loop detector data may be highly reliable but limited in coverage, whereas GPS data can offer continuous information yet suffer from considerable errors in densely populated areas. Effectively managing these varying noise levels requires sophisticated filtering techniques that not only smooth out inconsistencies but also consider the differing confidence levels associated with each data source. The Kalman Filter and its variations provide a solid framework for addressing noise in multi-source traffic data; however, further refinement is necessary for their application in more complex rural settings. With the increasing availability of information from connected vehicles, IoT-enabled sensors, and other emerging technologies, there is considerable potential to enhance traffic state estimation models by incorporating these new data sources. Connected vehicles can provide a wealth of real-time information regarding vehicle position, speed, and acceleration, which can significantly improve traffic state estimates. Similarly, IoT sensors positioned at intersections or along roadways can gather additional data on traffic flow, vehicle types, and pedestrian activity. Integrating these new data sources into existing traffic state estimation models represents a significant opportunity for future research as it could greatly enhance the accuracy and detail of traffic estimates in rural environments. 
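One common way to account for the differing confidence levels of each source, mentioned above, is inverse-variance weighting. The sketch below assumes independent, unbiased sensor errors with known variances; the function name `fuse` and the example readings are hypothetical.

```python
def fuse(readings):
    """Fuse (value, variance) pairs from several sources into one estimate.

    Assumes independent, unbiased errors; sources with smaller variance
    (higher confidence) receive proportionally larger weights.
    """
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    fused_value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance


# Hypothetical speed readings (km/h) with assumed error variances.
readings = [
    (48.0, 4.0),   # loop detector: precise but local
    (55.0, 25.0),  # GPS probe: broad coverage, noisier
    (52.0, 16.0),  # Bluetooth travel-time sensor
]
speed, variance = fuse(readings)
print(round(speed, 2), round(variance, 2))
```

The fused variance is never larger than that of the best single source, which is the formal sense in which fusion compensates for individual sensor weaknesses. Applying a scalar Kalman update once per source gives the same result under the same assumptions.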
In summary, while significant advances have been made in traffic state estimation, particularly through multi-source data fusion and advanced filtering techniques, challenges remain in effectively applying these models to real-world rural environments. The integration of new data sources, such as connected vehicles and IoT sensors, represents a promising direction for future research, with the potential to significantly enhance the accuracy of TSE models and improve the overall management of rural traffic networks.

Literature

Traffic state estimation (TSE) has a rich history within transportation research, with conventional models typically depending on a single data source to assess traffic conditions. Early frameworks, such as the cell transmission model (CTM), aimed to break down traffic flow into discrete segments and update each segment's state based on established rules. Although effective for simulating highway traffic, this model's limitations became evident in rural settings where traffic dynamics are more intricate. To enhance estimation accuracy, the Kalman Filter (KF) was introduced as a model-based approach for TSE (Bao et al., 2023; Rambabu & Venkatram, 2018). The KF is a recursive filter that estimates the state of dynamic systems from a series of noisy measurements, making it particularly suitable for managing the uncertainties associated with traffic data. In recent years, advancements such as the Sliding Kalman Filter (SKF) and other variants have emerged to address some of the standard KF's shortcomings, especially in nonlinear traffic environments. The SKF modifies state estimates using a sliding window approach, allowing it to respond more effectively to changes in traffic conditions (Z. Zhang et al., 2023). Furthermore, the integration of multi-source data fusion techniques has greatly enhanced the reliability of TSE models.
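The cell-based update rule of the CTM mentioned earlier in this section can be sketched as follows. This is a minimal illustration assuming a triangular fundamental diagram on a single link without intersections; the function name `ctm_step` and all parameter values are hypothetical and not drawn from the reviewed studies.

```python
def ctm_step(rho, inflow, v_free, w, rho_jam, q_max, dx, dt):
    """Advance cell densities (veh/km) by one time step of length dt (hours)."""
    # Sending: flow each cell could emit; receiving: flow each cell could absorb.
    sending = [min(v_free * r, q_max) for r in rho]
    receiving = [min(q_max, w * (rho_jam - r)) for r in rho]
    # Boundary flows: flows[i] enters cell i, flows[i + 1] leaves it.
    flows = [min(inflow, receiving[0])]
    for i in range(len(rho) - 1):
        flows.append(min(sending[i], receiving[i + 1]))
    flows.append(sending[-1])  # assumed unobstructed exit downstream
    # Conservation of vehicles within each cell.
    return [r + dt / dx * (flows[i] - flows[i + 1]) for i, r in enumerate(rho)]


# Hypothetical parameters: 60 km/h free-flow speed, 20 km/h backward wave
# speed, 0.5 km cells, 18 s time step (v_free * dt <= dx for stability).
params = dict(v_free=60.0, w=20.0, rho_jam=120.0, q_max=1800.0, dx=0.5, dt=0.005)
rho = [20.0, 20.0, 90.0, 20.0]  # one congested cell in an otherwise free link
for _ in range(10):
    rho = ctm_step(rho, inflow=1200.0, **params)
print([round(r, 1) for r in rho])
```

Signalized intersections would enter such a model as time-varying limits on the flow between cells, which is one source of the nonlinearity that makes rural arterials harder to estimate than highways.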
By incorporating data from loop detectors, GPS devices, Bluetooth sensors, and mobile sensors, researchers can capture a broader range of traffic parameters, leading to a more thorough understanding of traffic states at intersections and along arterial roads. Numerous studies have validated the application of KF and SKF in traffic estimation; however, many of these validations have occurred solely within simulated environments. While simulations offer a controlled setting for testing new algorithms, they often fail to replicate the full complexity of real-world traffic scenarios, which can be noisy and incomplete (Zhou et al., 2018). Additionally, much of the existing research has primarily focused on using traffic density as the main parameter for state estimation. Other critical parameters such as speed and flow are equally important, yet few studies have explored their estimation across various data sources. TSE is vital for effective transportation management and plays a significant role in intelligent transportation systems (ITS). Accurate information regarding traffic states enables real-time monitoring and forecasting of conditions, which can lead to improved traffic flow, reduced congestion, and enhanced road safety. Historically, research has concentrated on motorway traffic where uniform flow assumptions simplify state estimation tasks. Early studies relied heavily on single-source data like loop detectors or GPS to estimate parameters such as density, speed, or flow. However, as rural networks introduce greater complexity, the limitations of these traditional methods have become clear, leading to increased interest in multi-source data fusion and advanced filtering techniques.

1. Traffic State Estimation Techniques: A Historical Perspective

Traffic state estimation (TSE) has a longstanding history in transportation research, traditionally relying on single-source data to assess traffic conditions.
Early models, such as the Cell Transmission Model (CTM) (Jin, 2021), represented traffic as discrete cells updated based on predetermined rules. While effective for highway traffic simulations, the CTM struggled with the more complex dynamics of rural traffic. To enhance estimation accuracy, the Kalman Filter (KF) emerged as a significant advancement. KF is a recursive algorithm that estimates a dynamic system's state based on noisy measurements, making it a robust tool for handling the inherent uncertainties in traffic data (Pang & Yang, 2020). In recent years, more advanced variants of the Kalman Filter, such as the Sliding Kalman Filter (SKF), have been developed to address its limitations, especially in nonlinear traffic systems. SKF improves estimation responsiveness by recalibrating within a sliding window of time, better reflecting real-time traffic changes. Additionally, integrating multi-source data fusion has substantially improved TSE accuracy. Data from diverse sources like GPS, loop detectors, and mobile sensors now provide a more comprehensive and accurate depiction of traffic conditions in rural settings. Several studies have demonstrated KF and SKF's effectiveness, albeit primarily in simulation settings. While simulations offer a controlled environment for testing, they often fail to capture the noisy and incomplete nature of real-world traffic data. Moreover, most research has focused on traffic density as the primary parameter for estimation, despite the importance of other variables like speed and flow. Few studies have validated these models across diverse traffic parameters and data sources. 2. Evolution of Kalman Filter-Based Models The Kalman Filter has long been a staple in TSE due to its robustness in handling noisy data. 
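The recursive predict/update cycle that makes the Kalman Filter suitable for noisy traffic measurements can be illustrated with a one-dimensional sketch. This is not the estimator used in this study: it assumes a random-walk state model (the speed is expected to stay roughly constant between samples), and the class name `ScalarKalmanFilter`, the helper `kalman_smooth`, and the noise variances `q` and `r` are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalmanFilter:
    """One-dimensional Kalman filter with an assumed random-walk state model."""
    x: float  # current state estimate, e.g. mean speed in km/h
    p: float  # variance of the current estimate
    q: float  # process noise variance (assumed value)
    r: float  # measurement noise variance (assumed value)

    def step(self, z: float) -> float:
        # Predict: the state is carried over unchanged, its uncertainty grows by q.
        p_pred = self.p + self.q
        # Update: the Kalman gain weighs the prediction against the measurement z.
        gain = p_pred / (p_pred + self.r)
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x


def kalman_smooth(measurements, x0, p0=100.0, q=1.0, r=25.0):
    """Run the filter over a sequence of noisy readings and return all estimates."""
    kf = ScalarKalmanFilter(x=x0, p=p0, q=q, r=r)
    return [kf.step(z) for z in measurements]


# Hypothetical noisy speed readings (km/h) around a true value near 60.
estimates = kalman_smooth([62.0, 57.5, 61.0, 59.0, 60.5], x0=50.0)
```

A large initial variance `p0` lets the first measurements pull the estimate quickly away from the initial guess; as the variance shrinks, later readings are weighted less and the output becomes smoother.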
Introduced by Kalman in 1960, the filter has been applied across various domains, including transportation, where it is used to estimate vehicle density on highways with relatively linear traffic dynamics (Al-Selwi et al., 2023). However, KF assumes linearity and Gaussian noise, limiting its utility in more complex traffic environments like rural networks with intersections and fluctuating patterns. Researchers have responded by developing enhanced KF models, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). EKF linearizes the system around its current state estimate to handle nonlinearity and has shown some success in estimating highway traffic densities. However, the complex dynamics of rural networks, influenced by intersections, traffic signals, and pedestrian activity, limit EKF's effectiveness. To better address nonlinearity, the UKF propagates a deterministic set of sample points through the nonlinear model instead of linearizing it, which improves the estimation accuracy of rural traffic dynamics. Liu and Ban (2013) reported that UKF performs better than EKF in these settings. Further innovations, such as the Sliding Kalman Filter (SKF), allow for model parameter adjustments over time. SKF's adaptability makes it particularly suitable for rapidly changing environments like arterial roads with signalized intersections. SKF has been found to be highly effective in such scenarios, especially when fusing data from multiple sources to accommodate real-time conditions. This adaptability makes SKF better suited than traditional KF models to rural traffic dynamics.

3. Emerging Hybrid Models for Traffic State Estimation

The integration of data-driven machine learning models with traditional filtering techniques represents a promising advancement in TSE (A. J. Huang & Agarwal, 2022). Hybrid models leverage the strengths of both approaches: machine learning captures complex, nonlinear patterns in traffic data, while Kalman-based models provide robust, real-time updates and noise handling.
Recent research by Zheng et al. (2019) demonstrates that hybrid models combining KF with neural networks can outperform conventional KF methods, especially in the variable traffic dynamics of rural environments. These models offer the flexibility to handle both linear and nonlinear aspects of traffic flow, making them a compelling option for modern TSE challenges. 4. Multi-Source Data Fusion in Traffic State Estimation One of the most significant advancements in TSE is the shift from single-source data collection to multi-source data fusion. Early methods relying on loop detectors or GPS data were limited in scope, often missing the complex dynamics present in rural environments. Loop detectors provide accurate but spatially limited data, while GPS offers continuous coverage but suffers from noise, particularly in rural areas where signal loss is prevalent. By fusing data from multiple sources—such as Bluetooth, mobile sensors, and even connected vehicles—researchers have created more comprehensive and accurate traffic state models. Work by Pan et al. (2015) demonstrated the advantages of multi-source data fusion in traffic estimation, compensating for individual data sources' weaknesses. While this approach offers improved accuracy, it introduces new challenges, such as harmonizing different data types and managing varying noise levels. Work and Bayen (2008) explored using KF to integrate multiple data sources effectively, managing inconsistencies and ensuring robust real-time traffic state estimation. 5. Future Directions: Leveraging IoT and Connected Vehicle Data The growing prevalence of connected vehicles and IoT-enabled sensors presents new opportunities for refining TSE models. Connected vehicles offer real-time data on position, speed, and acceleration, while IoT sensors embedded in road infrastructure provide additional insights into traffic flow, vehicle types, and pedestrian behavior. 
Studies such as those by Kim and Coifman (2017) highlight the potential of connected vehicle data to enhance traffic state estimation, particularly in rural areas. As the volume and variety of data sources increase, advanced techniques like adaptive filtering and machine learning-based noise reduction will become crucial in managing the complexities of modern traffic networks. Additionally, the combination of data-driven and model-driven approaches in hybrid models offers a promising solution for future traffic state estimation challenges, ensuring both real-time accuracy and robustness in handling diverse traffic dynamics.

6. Validation of Traffic State Estimation Models in Real-World Settings

A growing body of research emphasizes the need to validate TSE models using real-world data, moving beyond the controlled environments of simulations. While simulations provide a useful test bed, they often fail to capture the unpredictable nature of real-world traffic conditions. Chen et al. (2018) stress the importance of real-world validation, especially in rural environments where traffic dynamics are highly variable. Models validated with real-world data tend to be more robust and adaptable to the complexities of modern traffic systems.

7. Application of Multi-Source Data Fusion Beyond Traffic State Estimation

Multi-source data fusion techniques have broad applications beyond TSE, including traffic signal control and incident detection. Prazeres et al. (2023) explored using multi-source data for adaptive traffic signal control, showing that it can improve traffic flow and reduce congestion in rural networks. These techniques have also been applied in real-time incident detection systems, allowing faster responses to traffic accidents or breakdowns, which further underscores their versatility in traffic management.
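The fusion idea reviewed above (an accurate but spatially limited loop detector combined with a noisier but continuous GPS source) can be illustrated with inverse-variance weighting, which is equivalent to a Kalman measurement update for independent measurements of the same quantity. The sketch below is an illustration only, not a method taken from the cited studies; the function name `fuse` and the example variances are assumptions.

```python
def fuse(estimates):
    """Fuse independent estimates of one quantity by inverse-variance weighting.

    `estimates` is a list of (value, variance) pairs, one per data source.
    Returns the fused value and its variance.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")
    # A source with a smaller variance (less noise) receives a larger weight.
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical readings: loop detector 60 km/h (variance 4), GPS 70 km/h (variance 16).
fused_speed, fused_variance = fuse([(60.0, 4.0), (70.0, 16.0)])
```

The fused variance is always smaller than the smallest input variance, which is the formal sense in which combining sources compensates for the weaknesses of each one.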
Material and Methods

Data Collection and Preprocessing

In this study, the dataset used for analyzing traffic flow and vehicle classification at toll stations includes variables such as vehicle speed, length, type, and occupancy time at the toll booth. The data was collected from a toll station and comprised several columns, including device number, date, timestamp, road and loop number, speed, vehicle length, type, license plate (often missing), entry time, and occupancy time (Z. Zhang et al., 2023). These features were used to classify vehicles based on their speed and length and to divide vehicles into clusters based on natural groupings. The dataset underwent a thorough preprocessing phase to ensure quality and usability for machine learning tasks. Missing values, incorrect data types, and anomalies were identified and addressed. Critical columns like speed and vehicle length, essential for clustering and classification, were specifically checked for completeness. Records with missing values in these fields were removed to maintain analytical integrity. Numeric fields were converted to appropriate formats to ensure compatibility with machine learning algorithms, and fields with special characters (e.g., device number) were cleaned for consistency.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) was performed after preprocessing to understand the dataset's characteristics and uncover patterns. Through clustering, vehicles were grouped into three types based on their speed and length, and the speed and flow of these types were analyzed over different time periods: morning, noon, and night. This clustering approach allowed separate analyses of speed and flow variations for each vehicle type across these time windows.

Histogram Analysis: Histograms of vehicle speeds revealed most vehicles traveled between 50 km/h and 90 km/h, indicating typical traffic flow patterns.
Similarly, vehicle lengths exhibited two peaks corresponding to smaller vehicles (e.g., passenger cars at ~4.5 meters) and larger vehicles (e.g., trucks at ~9 meters). Scatter Plot Analysis: Scatter plots explored the relationship between speed and length. A moderate negative correlation was observed, indicating larger vehicles tend to travel at lower speeds. Temporal Variations: Time-series plots demonstrated how speed and flow varied across morning, noon, and night, highlighting distinct traffic dynamics during different periods of the day. These findings provided critical insights for traffic prediction models. In the context of Traffic State Estimation (TSE) for intelligent transportation systems, the two figures represent important aspects of traffic data analysis related to speed and holding time at intersections. This plot shows how vehicle speed varies with time over a given period, likely measured at an intersection or a specific rural area. X-axis (timestamp): Represents time in seconds or milliseconds. The high values on this axis suggest that the data was collected over an extended duration. Y-axis (speed km/h): Displays the vehicle speed in kilometers per hour. The dense, overlapping orange lines suggest that this data might have been gathered from multiple vehicles over time or from different points on the same vehicle's journey. The rapid fluctuations in speed may represent a highly dynamic traffic environment, such as an intersection where vehicles frequently stop, accelerate, or decelerate due to signal changes, pedestrian crossings, or congestion. The significant variations in speed, ranging from near zero to over 100 km/h, are typical of rural traffic scenarios, especially at intersections where stop-and-go patterns are common. This visualization can provide insights into the nonlinearities of traffic dynamics, especially when vehicles are accelerating or decelerating quickly. 
In the context of TSE, it highlights the challenge of accurately estimating traffic conditions given the variability in speeds over time, which may be influenced by factors like traffic signals, lane changes, or road obstructions. This plot represents the distribution of holding time (likely the time vehicles spend stopped or idling) at a certain location. X-axis (X): Represents the sequential index of vehicles or events, which could indicate individual vehicles or time slots during the data collection period. Y-axis (Holding time in seconds): Displays the duration for which vehicles were held (stationary) in seconds. The spikes in the plot indicate times when vehicles were held for longer durations. These could correspond to times when vehicles were stopped at traffic signals, pedestrian crossings, or traffic congestion. The general trend shows a large number of occurrences of low holding times (less than 1 second), but there are some instances where holding times are much longer (up to around 3.5 seconds). This indicates variability in how long vehicles are stationary, which is common in rural traffic where traffic flow is interrupted by various factors. In the context of TSE, holding time is crucial for understanding congestion at intersections or signalized points. High holding times often indicate bottlenecks in traffic flow, which are critical for optimizing traffic signal timings and reducing overall travel delays. Moreover, holding time data can be integrated with speed data to refine the accuracy of TSE models, particularly when addressing rural traffic network complexities. Both figures reflect the importance of multi-source data in traffic state estimation. For example, integrating speed data from GPS or loop detectors with holding time data can provide a more comprehensive view of traffic conditions, especially in complex environments like intersections. 
These figures also highlight the real-world noise and variability in traffic data, which sophisticated TSE models (such as Kalman Filters) must account for to provide accurate, actionable insights. These analyses can further be enhanced by considering advanced data fusion techniques, which combine different datasets (e.g., speed, holding time, GPS) to produce more reliable traffic state estimates. The fluctuations in speed and holding time underscore the need for nonlinear models capable of capturing the dynamic nature of traffic, particularly in rural environments. Additionally, a scatter plot was utilized to investigate the relationship between speed and vehicle length. Correlation analysis between these two variables indicated a moderate negative correlation, suggesting that larger vehicles generally travel at lower speeds, which aligns with real-world traffic expectations.

Machine Learning Models for Classification

To classify vehicles based on their speed and length, a Logistic Regression model was trained using vehicle speed and length as independent variables and vehicle type as the dependent variable. The dataset was split into 80% training and 20% testing subsets. Model performance was evaluated using classification metrics: precision, recall, F1-score, and accuracy.

Model Performance:
Small Vehicles: Precision = 0.69, Recall = 0.66, F1-Score = 0.67.
Medium Vehicles: Precision = 0.95, Recall = 0.93, F1-Score = 0.94.
Large Vehicles: Precision = 0.89, Recall = 0.95, F1-Score = 0.92.
Overall, the model achieved a classification accuracy of 91.9%.

Clustering Using K-Means

In addition to classification, the K-means clustering algorithm was applied to group vehicles based on their natural characteristics, such as speed and length. K-means is an unsupervised learning algorithm that groups data points into clusters without predefined labels.
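The way K-means groups unlabeled points can be sketched with a plain implementation of Lloyd's algorithm on (speed, length) pairs. This is a minimal illustration and not the code used in the study: the function names, the toy data, and the hand-picked starting centers are all hypothetical, and the raw units are used directly, whereas a real analysis would normally scale the features first.

```python
import math


def assign(points, centers):
    """Label each point with the index of its nearest center (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, initial_centers, iterations=20):
    """Lloyd's algorithm: alternate between assignment and center update."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
            )
        if new_centers == centers:  # no center moved, so the result is stable
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical (speed km/h, length m) records, not taken from the dataset.
toy_points = [(58.0, 4.3), (60.0, 4.5), (59.0, 4.4),
              (62.0, 9.1), (64.0, 8.9), (63.0, 9.0),
              (92.0, 4.4), (94.0, 4.3), (95.0, 4.5)]
centers, labels = kmeans(toy_points, [toy_points[0], toy_points[3], toy_points[6]])
```

Because speed spans a much wider numeric range than length, unscaled distances are dominated by speed; this is one reason standardization is usually applied before clustering.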
In this study, K-means was used to divide vehicles into three clusters based on speed and length. The algorithm identified natural groupings corresponding to small cars, medium-sized vehicles, and large trucks.

Cluster Centers:
Cluster 1: Average Speed = 59.3 km/h, Length = 4.4 m.
Cluster 2: Average Speed = 63.1 km/h, Length = 9.0 m.
Cluster 3: Average Speed = 93.5 km/h, Length = 4.4 m.

| Metric/Category | Cluster 1 | Cluster 2 | Cluster 3 | Overall |
| Vehicle Type | Small Cars | Large Trucks | Medium Vehicles | All Types |
| Average Speed (km/h) | 59.3 | 63.1 | 93.5 | - |
| Average Length (m) | 4.4 | 9.0 | 4.4 | - |
| Number of Vehicles | 1,200 | 800 | 1,000 | 3,000 |
| Precision | 69% | 89% | 95% | - |
| Recall | 66% | 95% | 93% | - |
| F1-Score | 67% | 92% | 94% | - |

By analyzing speed and flow for these clusters separately across different time periods (morning, noon, night), distinct traffic dynamics for each group were observed. These findings demonstrate the benefits of clustering in traffic state estimation and flow prediction. These clusters align with the general observation of different vehicle types, indicating that K-means successfully identified natural groupings in the dataset.

Results & Discussion

Clustering Analysis Results

The K-means clustering approach grouped the vehicles into three distinct clusters based on their speed and length. These clusters represent different types of vehicles typically seen at toll stations, providing important insights into traffic patterns.
The clustering results are shown in Table 1, with the cluster centers representing the average speed and vehicle length for each group:

Table 1: Cluster Centers for Vehicle Classification Based on Speed and Length

| Cluster | Average Speed (km/h) | Average Vehicle Length (m) | Interpretation |
| 1 | 59.3 | 4.4 | Passenger Cars |
| 2 | 63.1 | 9.0 | Medium-Sized Vehicles (e.g., SUVs, vans) |
| 3 | 93.5 | 4.4 | Smaller, High-Speed Vehicles |

The clustering analysis revealed three key insights:

Cluster 1: Vehicles in this group had an average speed of 59.3 km/h and a length of 4.4 meters. These characteristics align with the profile of passenger cars, which are typically shorter in length and travel at moderate speeds.

Cluster 2: This group had an average speed of 63.1 km/h and a length of 9.0 meters, suggesting that it consists of medium-sized vehicles such as SUVs and vans. These vehicles are larger than passenger cars but travel at similar speeds, possibly due to size and load restrictions.

Cluster 3: Vehicles in this group were characterized by high speeds, with an average speed of 93.5 km/h, and a length of 4.4 meters, similar to Cluster 1. These vehicles are likely smaller, more agile vehicles, such as sports cars, that maintain high speeds through toll stations.

The combined findings from the clustering, correlation, and logistic regression models provide comprehensive insights into the traffic flow and vehicle classification at toll stations. The clustering analysis revealed distinct vehicle types, while the logistic regression model demonstrated strong predictive capabilities for real-time vehicle classification. Additionally, the correlation analysis confirmed the expected inverse relationship between vehicle speed and length, reinforcing the need for infrastructure solutions that account for different vehicle types. The clustering analysis successfully grouped vehicles based on their natural characteristics, offering insights into how vehicle types utilize toll stations.
The identification of distinct vehicle clusters can inform toll station infrastructure planning, particularly for optimizing lane usage based on vehicle type.

Correlation Analysis

A correlation analysis was conducted to explore the relationship between vehicle speed and length. Table 2 presents the correlation results.

Table 2: Correlation Between Vehicle Speed and Length

| Variables | Correlation Coefficient |
| Speed vs. Length | -0.58 |

The negative correlation coefficient of -0.58 indicates a moderate inverse relationship between vehicle speed and length. This suggests that larger vehicles, such as trucks and commercial vehicles, tend to travel at slower speeds compared to smaller vehicles like passenger cars. The correlation is consistent with traffic behavior patterns, where larger, heavier vehicles typically reduce speed due to their size and load, especially at toll stations.

Exploratory Data Analysis (EDA)

The exploratory data analysis revealed patterns in the dataset related to vehicle behavior at toll stations. Two histograms were plotted to visualize the distribution of vehicle speeds and lengths.

Vehicle Speed Distribution: The histogram showed that most vehicles travel between 50 and 90 km/h. However, a few outliers travel at significantly higher or lower speeds, possibly due to congestion, vehicle type, or driver behavior.

Vehicle Length Distribution: The length distribution was bimodal, with peaks around 4.5 meters and 9 meters. This suggests that toll stations experience a mix of smaller passenger cars and larger commercial vehicles, with relatively few vehicles falling in between.

The combination of speed and length distributions supports the existence of distinct vehicle groups, which was further confirmed by the clustering results.
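A coefficient such as the one in Table 2 can be computed as follows, assuming it is a Pearson correlation (the text does not name the coefficient type). The function name `pearson_r` and the four sample records are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and the root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical records: longer vehicles paired with lower speeds.
lengths_m = [4.4, 4.4, 9.0, 9.0]
speeds_kmh = [90.0, 70.0, 65.0, 55.0]
r = pearson_r(lengths_m, speeds_kmh)  # negative: speed falls as length grows
```

Values near -1 or +1 indicate a nearly linear relationship, while a value such as -0.58 points to a clear but far from deterministic inverse trend.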
| Metric | Device ID | Date | Timestamp | CoilID | RoadID | Speed (km/h) | Vehicle Length (m) | Vehicle Type | Entry Timestamp | Occupancy Time (s) | Hour |
| Min | NaN | 2020-01-18 00:02:36 | 1.579277e+09 | 1.0 | 1.0 | -3.658 | -1.342 | 0.0 | 1.579277e+12 | -1.374 | 0.0 |
| 25th Percentile | NaN | 2020-01-18 07:46:03 | 1.579305e+09 | 1.0 | 1.0 | -0.654 | -0.346 | 1.0 | 1.579305e+12 | -0.729 | 7.0 |
| Median (50%) | NaN | 2020-01-18 11:15:16.500 | 1.579317e+09 | 2.0 | 2.0 | 0.027 | -0.346 | 1.0 | 1.579317e+12 | -0.413 | 11.0 |
| 75th Percentile | NaN | 2020-01-18 15:04:44.500 | 1.579331e+09 | 2.0 | 2.0 | 0.723 | 1.417 | 2.0 | 1.579331e+12 | 0.517 | 15.0 |
| Max | NaN | 2020-01-18 23:59:30 | 1.579363e+09 | 2.0 | 2.0 | 2.945 | 1.417 | 2.0 | 1.579363e+12 | 13.637 | 23.0 |
| Std Dev | NaN | NaN | 1.736961e+04 | 0.4997 | 0.4997 | 1.0001 | 1.0001 | 0.5615 | 1.736959e+07 | 1.0001 | 4.825 |

Logistic Regression Model for Vehicle Classification

A logistic regression model was applied to predict vehicle types based on speed and length. The model achieved an overall accuracy of 91.9%, with detailed performance metrics provided in Table 3.

Table 3: Performance Metrics for Logistic Regression Model

| Vehicle Type | Precision | Recall | F1-Score |
| Small Vehicles | 0.69 | 0.66 | 0.67 |
| Medium Vehicles | 0.95 | 0.93 | 0.94 |
| Large Vehicles | 0.89 | 0.95 | 0.92 |

The model performed well for medium and large vehicles, with F1-scores of 0.94 and 0.92, respectively. However, it underperformed for small vehicles, with an F1-score of 0.67. This discrepancy may be due to the overlap in speed and length characteristics between small vehicles and other categories, making them harder to distinguish.
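The relationship between the columns of Table 3 can be checked directly, since the F1-score is the harmonic mean of precision and recall. The sketch below first recomputes the F1 values from the reported precision and recall, then shows how per-class metrics are derived from true and predicted labels. The helper names and the six toy labels are hypothetical and are not the study's data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} and the overall accuracy."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = (precision, recall, f1_score(precision, recall))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report, accuracy


# Precision and recall as reported in Table 3, rounded F1 recomputed from them.
table3_f1 = {
    "small": round(f1_score(0.69, 0.66), 2),
    "medium": round(f1_score(0.95, 0.93), 2),
    "large": round(f1_score(0.89, 0.95), 2),
}

# Hypothetical labels: one small vehicle is mistaken for a medium one.
y_true = ["small", "small", "medium", "medium", "large", "large"]
y_pred = ["small", "medium", "medium", "medium", "large", "large"]
report, accuracy = per_class_metrics(y_true, y_pred)
```

In the toy example the single misclassification lowers recall for the small class and precision for the medium class, the same kind of class overlap that the text offers as an explanation for the weaker small-vehicle scores.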
| Metric | Timestamp | CoilID | RoadID | Speed (km/h) | Vehicle Length (m) | Vehicle Type | Entry Timestamp | Occupancy Time (s) |
| Count | 6,240 | 6,240 | 6,240 | 6,240 | 6,240 | 6,240 | 6,240 | 6,240 |
| Mean | 1.579318e+09 | 1.5179 | 1.5179 | 58.04 | 5.30 | 2.21 | 1.579318e+12 | 0.5077 |
| Std Dev | 1.736961e+04 | 0.4997 | 0.4997 | 13.89 | 2.61 | 0.5615 | 1.736959e+07 | 0.2560 |
| Min | 1.579277e+09 | 1.0 | 1.0 | 7.24 | 1.80 | 1.0 | 1.579277e+12 | 0.1560 |
| 25th Percentile | 1.579305e+09 | 1.0 | 1.0 | 48.96 | 4.40 | 2.0 | 1.579305e+12 | 0.3210 |
| Median (50%) | 1.579317e+09 | 2.0 | 2.0 | 58.41 | 4.40 | 2.0 | 1.579317e+12 | 0.4020 |
| 75th Percentile | 1.579331e+09 | 2.0 | 2.0 | 68.08 | 9.00 | 3.0 | 1.579331e+12 | 0.6400 |
| Max | 1.579363e+09 | 2.0 | 2.0 | 98.94 | 9.00 | 3.0 | 1.579363e+12 | 3.9990 |

Explanation of the Analysis and Predictive Model Code

1. Data Preprocessing

The dataset is first loaded and inspected to understand its structure, including column names, data types, and missing values. Non-essential columns like LicensePlate are dropped to focus on relevant attributes. Missing values are handled using forward filling (ffill) to ensure data continuity. The Date column is converted to a datetime object for time-based analysis, and an additional Hour column is extracted to study hourly patterns. Categorical data, such as VehicleType, is encoded into numeric values using LabelEncoder. Numerical columns like Speed_kmh and VehicleLength_m are scaled using StandardScaler for normalization, which improves clustering and predictive model performance.

2. Exploratory Data Analysis (EDA)

The EDA phase provides insights into the dataset's relationships and distributions:

Pair Plots: Visualize the pairwise relationships between features, helping to identify correlations and clusters.

Correlation Heatmap: Highlights linear correlations among features like speed, vehicle length, and occupancy time. Strong correlations guide feature selection for predictive modeling.

Descriptive Statistics: A summary table is generated to show key statistical metrics (e.g., mean, median, standard deviation) for all numerical features.
This step ensures data integrity and identifies outliers.

3. Clustering Using KMeans

KMeans clustering is applied to group vehicles into clusters based on features like Speed_kmh, VehicleLength_m, and OccupancyTime_s. The number of clusters is set to three, representing distinct categories of vehicles with similar characteristics. Clusters are visualized using a scatter plot, where colors represent different groups. This helps understand how vehicles are distributed in terms of speed and size, providing insights into traffic patterns or classifications (e.g., small cars, heavy vehicles).

4. Artificial Neural Network (ANN) for Speed Prediction

To predict vehicle speed:

Feature Selection: The input features (VehicleLength_m, OccupancyTime_s, VehicleType, RoadID, Cluster) are selected based on their relevance to speed prediction.

Data Splitting: The dataset is split into training (80%) and testing (20%) subsets to evaluate model performance.

ANN Model: A Multi-Layer Perceptron (MLP) regressor is built with three hidden layers:
Layer sizes: 64, 32, and 16 neurons.
Activation function: ReLU (Rectified Linear Unit).
Optimizer: Adam for efficient weight updates.
Maximum iterations: 500 for convergence.

The model is trained on the training set and validated on the test set. Predictions are compared against actual speeds using performance metrics:

Mean Squared Error (MSE): Measures prediction error magnitude.

R² Score: Indicates how well the model explains variance in the data (higher is better).

Discussion

The clustering analysis and logistic regression model provide comprehensive insights into traffic flow and vehicle classification at toll stations. The results show that machine learning techniques can effectively predict vehicle types based on speed and length, offering potential applications in optimizing toll station operations.
For instance, cluster analysis can guide lane assignment based on vehicle type, while the logistic regression model can be used for real-time vehicle classification, improving toll collection efficiency. The correlation between vehicle speed and length confirms well-established traffic patterns, where larger vehicles tend to travel at slower speeds. This finding supports previous research in traffic flow analysis, reinforcing the need for customized infrastructure solutions at toll stations to accommodate different vehicle types. Overall, the combination of clustering, correlation analysis, and machine learning provides a robust framework for analyzing and managing traffic at toll stations, contributing to improved traffic flow and enhanced decision-making in infrastructure planning.

Conclusions

In this research, we used machine learning techniques to categorize vehicles at toll stations based on their speed and length, providing insights for traffic flow management. The K-means clustering algorithm identified three distinct vehicle categories: small passenger cars, medium-sized vehicles (like SUVs and vans), and larger vehicles (such as trucks or commercial vehicles). Correlation analysis revealed a moderate inverse relationship between vehicle speed and length, consistent with existing traffic studies indicating that larger vehicles generally travel at slower speeds. Furthermore, the logistic regression model achieved an accuracy of 91.9% in predicting vehicle types, with particularly strong performance in identifying medium and large vehicles. This model can be implemented for real-time vehicle classification at toll stations, aiding in the optimization of lanes and overall infrastructure management.
The results of this study underscore the significance of data-driven methodologies in enhancing traffic operations and planning, ultimately contributing to more efficient toll station management and better resource allocation. Declarations Funding Acquisition KKF0202302375 "Development and Testing of an Intelligent Integrated Information Module for Monitoring and Identifying Risks on General National and Provincial Trunk Roads" References Al-Selwi, H. F., Aziz, A. A., Abas, F. Bin, Kayani, A., & Noor, N. M. (2023). Attention Based Spatial-Temporal GCN with Kalman filter for Traffic Flow Prediction. International Journal of Technology , 14 (6). https://doi.org/10.14716/ijtech.v14i6.6646 Bao, J., Kantarcioglu, M., Vorobeychik, Y., & Kamhoua, C. (2023). IoTFlowGenerator: Crafting Synthetic IoT Device Traffic Flows for Cyber Deception. Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS , 36 . https://doi.org/10.32473/flairs.36.133376 Bekiaris-Liberis, N., Roncoli, C., & Papageorgiou, M. (2016). Highway traffic state estimation with mixed connected and conventional vehicles. IEEE Transactions on Intelligent Transportation Systems , 17 (12). https://doi.org/10.1109/TITS.2016.2552639 Chin, A. T. H., & Tay, J. H. (2001). Developments in air transport: Implications on investment decisions, profitability and survival of Asian airlines. Journal of Air Transport Management , 7 (5). https://doi.org/10.1016/S0969-6997(01)00026-6 Garg, P., Khaparde, P., Patle, K. S., Bhaliya, C., Kumar, A., Joshi, M. V., & Palaparthy, V. S. (2023). Environmental and Soil Parameters for Germination of Leaf Spot Disease in the Groundnut Plant Using IoT-Enabled Sensor System. IEEE Sensors Letters , 7 (12). https://doi.org/10.1109/LSENS.2023.3330923 Geetha, A., & Subramani, C. (2019). Development of driving cycle under real world traffic conditions: A case study. International Journal of Electrical and Computer Engineering , 9 (6). 
https://doi.org/10.11591/ijece.v9i6.pp4798-4803
Ghiassi, M., & Lee, S. (2018). A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach. Expert Systems with Applications, 106. https://doi.org/10.1016/j.eswa.2018.04.006
Grumert, E. F., & Tapani, A. (2018). Traffic State Estimation Using Connected Vehicles and Stationary Detectors. Journal of Advanced Transportation, 2018. https://doi.org/10.1155/2018/4106086
Gunda, S. K., & Dhanikonda, V. S. S. S. S. (2021). Discrimination of transformer inrush currents and internal fault currents using extended Kalman filter algorithm (EKF). Energies, 14(19). https://doi.org/10.3390/en14196020
Huang, A. J., & Agarwal, S. (2022). Physics-Informed Deep Learning for Traffic State Estimation: Illustrations with LWR and CTM Models. IEEE Open Journal of Intelligent Transportation Systems, 3. https://doi.org/10.1109/OJITS.2022.3182925
Huang, J., Song, G., He, F., & Tan, Z. (2023). Energetic Impacts of Autonomous Vehicles in Real-World Traffic Conditions from Nine Open-Source Datasets. IEEE Transactions on Intelligent Transportation Systems, 24(9). https://doi.org/10.1109/TITS.2023.3272914
Ibarra-Espinosa, S., Ynoue, R., Giannotti, M., Ropkins, K., & de Freitas, E. D. (2019). Generating traffic flow and speed regional model data using internet GPS vehicle records. MethodsX, 6. https://doi.org/10.1016/j.mex.2019.08.018
Jin, W.-L. (2021). The Cell Transmission Model (CTM). In Introduction to Network Traffic Flow Theory. https://doi.org/10.1016/b978-0-12-815840-1.00017-5
Khan, M. A., Ghazal, T. M., Lee, S. W., & Rehman, A. (2022). Data fusion-based machine learning architecture for intrusion detection. Computers, Materials and Continua, 70(2). https://doi.org/10.32604/cmc.2022.020173
Kim, T., & Park, T. H. (2020). Extended Kalman filter (EKF) design for vehicle position tracking using reliability function of radar and lidar. Sensors (Switzerland), 20(15). https://doi.org/10.3390/s20154126
Kore, A., & Patil, S. (2019). Internet of things (IoT) enabled wireless sensor networks security challenges and current solutions. International Journal of Innovative Technology and Exploring Engineering, 9(1). https://doi.org/10.35940/ijitee.A4023.119119
Lamberty, A., & Kreyenschmidt, J. (2022). Ambient Parameter Monitoring in Fresh Fruit and Vegetable Supply Chains Using Internet of Things-Enabled Sensor and Communication Technology. Foods, 11(12). https://doi.org/10.3390/foods11121777
Lu, N., Cheng, N., Zhang, N., Shen, X., & Mark, J. W. (2014). Connected vehicles: Solutions and challenges. In IEEE Internet of Things Journal (Vol. 1, Issue 4). https://doi.org/10.1109/JIOT.2014.2327587
Muhammed T, S., & Mathew, S. K. (2022). The disaster of misinformation: a review of research in social media. In International Journal of Data Science and Analytics (Vol. 13, Issue 4). https://doi.org/10.1007/s41060-022-00311-6
Naqvi, F. H., Ali, S., Haseeb, B., Khan, N., Qureshi, S., Sajid, T., & Aslam, M. I. (2023). Design and Implementation of Smart Contract in Supply Chain Management Using Blockchain and Internet of Things. Engineering Proceedings, 32(1). https://doi.org/10.3390/engproc2023032015
Ning, H., Farha, F., Mohammad, Z. N., & Daneshmand, M. (2020). A Survey and Tutorial on “Connection Exploding Meets Efficient Communication” in the Internet of Things. IEEE Internet of Things Journal, 7(11). https://doi.org/10.1109/JIOT.2020.2996615
Pang, M., & Yang, M. (2020). Coordinated control of urban expressway integrating adjacent signalized intersections based on pinning synchronization of complex networks. Transportation Research Part C: Emerging Technologies, 116. https://doi.org/10.1016/j.trc.2020.102645
Prazeres, N., Costa, R. L. de C., Santos, L., & Rabadão, C. (2023). Engineering the application of machine learning in an IDS based on IoT traffic flow. Intelligent Systems with Applications, 17. https://doi.org/10.1016/j.iswa.2023.200189
Qi, X., Wu, G., Boriboonsomsin, K., & Barth, M. J. (2018). Data-driven decomposition analysis and estimation of link-level electric vehicle energy consumption under real-world traffic conditions. Transportation Research Part D: Transport and Environment, 64. https://doi.org/10.1016/j.trd.2017.08.008
Rafique, A. A., Al-Rasheed, A., Ksibi, A., Ayadi, M., Jalal, A., Alnowaiser, K., Meshref, H., Shorfuzzaman, M., Gochoo, M., & Park, J. (2023). Smart Traffic Monitoring Through Pyramid Pooling Vehicle Detection and Filter-Based Tracking on Aerial Images. IEEE Access, 11. https://doi.org/10.1109/ACCESS.2023.3234281
Rambabu, K., & Venkatram, N. (2018). Traffic flow features as metrics (TFFM): Detection of application layer level DDOS attack scope of IOT traffic flows. International Journal of Engineering and Technology (UAE), 7(2). https://doi.org/10.14419/ijet.v7i2.7.10293
Tuna, E., & Soysal, A. (2021). LSTM and GRU based traffic prediction using live network data. SIU 2021 - 29th IEEE Conference on Signal Processing and Communications Applications, Proceedings. https://doi.org/10.1109/SIU53274.2021.9478011
Wang, A., Xu, J., Zhang, M., Zhai, Z., Song, G., & Hatzopoulou, M. (2022). Emissions and fuel consumption of a hybrid electric vehicle in real-world metropolitan traffic conditions. Applied Energy, 306. https://doi.org/10.1016/j.apenergy.2021.118077
Wang, B., Han, Y., Wang, S., Tian, D., Cai, M., Liu, M., & Wang, L. (2022). A Review of Intelligent Connected Vehicle Cooperative Driving Development. In Mathematics (Vol. 10, Issue 19). https://doi.org/10.3390/math10193635
Wang, Y., Zhao, M., Yu, X., Hu, Y., Zheng, P., Hua, W., Zhang, L., Hu, S., & Guo, J. (2022). Real-time joint traffic state and model parameter estimation on freeways with fixed sensors and connected vehicles: State-of-the-art overview, methods, and case studies. Transportation Research Part C: Emerging Technologies, 134. https://doi.org/10.1016/j.trc.2021.103444
Williams, B., Onsman, A., & Brown, T. (2010). Exploratory factor analysis: A five-step guide for novices. Journal of Emergency Primary Health Care, 8(3). https://doi.org/10.33151/ajp.8.3.93
Xing, J., Wu, W., Cheng, Q., & Liu, R. (2022). Traffic state estimation of urban road networks by multi-source data fusion: Review and new insights. In Physica A: Statistical Mechanics and its Applications (Vol. 595). https://doi.org/10.1016/j.physa.2022.127079
Yang, H., Du, L., Zhang, G., & Ma, T. (2023). A Traffic Flow Dependency and Dynamics based Deep Learning Aided Approach for Network-Wide Traffic Speed Propagation Prediction. Transportation Research Part B: Methodological, 167. https://doi.org/10.1016/j.trb.2022.11.009
Yang, N., Yang, L., Du, X., Guo, X., Meng, F., & Zhang, Y. (2023). Blockchain based trusted execution environment architecture analysis for multi-source data fusion scenario. Journal of Cloud Computing, 12(1). https://doi.org/10.1186/s13677-023-00494-8
Yin, R., Li, K., & Yu, J. (2007). Traffic forecast for visitiors in World Expo 2010 Shanghai Arena. Tongji Daxue Xuebao/Journal of Tongji University, 35(8).
Yokoya, Y. (2004). Dynamics of traffic flow with real-time traffic information. Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 69(1). https://doi.org/10.1103/PhysRevE.69.016121
Zhang, J., Xiao, W., Coifman, B., & Mills, J. P. (2020). Vehicle Tracking and Speed Estimation from Roadside Lidar. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13. https://doi.org/10.1109/JSTARS.2020.3024921
Zhang, Z., Yang, X. T., & Yang, H. (2023). A review of hybrid physics-based machine learning approaches in traffic state estimation. In Intelligent Transportation Infrastructure (Vol. 2). https://doi.org/10.1093/iti/liad002
Zhou, D., Fang, J., Yan, F., Zhao, T., Zhang, F., Yang, R., Ma, Y., & Wang, L. (2018).
Simulating LIDAR Point Cloud for Autonomous Driving using Real-world Scenes and Traffic Flows. ArXiv Preprint ArXiv:1811.07112. Additional Declarations No competing interests reported. {"props":{"pageProps":{"initialData":{"identity":"rs-5927838","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":409537610,"identity":"76845414-7cce-486b-ac55-3f7e695d801f","order_by":0,"name":"Taimor Ali 
Khan","email":"","orcid":"","institution":"Kunming University of Science and Technology","correspondingAuthor":true,"prefix":"","firstName":"Taimor","middleName":"Ali","lastName":"Khan","suffix":""}],"badges":[],"createdAt":"2025-01-30 05:53:08","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5927838/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5927838/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":75625098,"identity":"5f56afcc-cc58-49ed-8d0a-d9ca11c1869a","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":203024,"visible":true,"origin":"","legend":"\u003cp\u003eMap relation\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/e0af72c7fa8eeb1dd717ed85.png"},{"id":75626072,"identity":"9dc88a99-3f13-41e1-9e25-bef57384de87","added_by":"auto","created_at":"2025-02-06 13:00:48","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":100012,"visible":true,"origin":"","legend":"\u003cp\u003eNested Structures\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/732d9f038bb4b10171093a17.png"},{"id":75625100,"identity":"8dc17126-f396-41bb-b5a3-8c2673320634","added_by":"auto","created_at":"2025-02-06 
12:52:48","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":43994,"visible":true,"origin":"","legend":"\u003cp\u003eRoad Systems\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/733df53ece0f9a0bf0d1c596.png"},{"id":75626069,"identity":"b2c6f42c-dc90-4283-ac36-04960d7fe015","added_by":"auto","created_at":"2025-02-06 13:00:48","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":626858,"visible":true,"origin":"","legend":"\u003cp\u003eO vs D Points\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/acea0e23fa35ff33617c28c4.png"},{"id":75626071,"identity":"0bfb5465-3124-46fd-ac0a-fe6d0496b6ee","added_by":"auto","created_at":"2025-02-06 13:00:48","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":64458,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution Plot\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/67762ef03d06e527d4463a00.png"},{"id":75627520,"identity":"97ba3ca4-51b0-4b02-bf0c-4263d8982f1f","added_by":"auto","created_at":"2025-02-06 13:16:48","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":74619,"visible":true,"origin":"","legend":"\u003cp\u003eEntity vs Space plot\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/c45b0d36d0c0e3b4d6cc8ffc.png"},{"id":75625103,"identity":"af0f7528-26ac-4522-b164-8dae00653761","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":107517,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization of Kalman 
Filters\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/23da614be435040b4739e66f.png"},{"id":75625109,"identity":"8793130e-77aa-4b36-a856-04188d9b9da6","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":104378,"visible":true,"origin":"","legend":"\u003cp\u003eConnected Vehicles\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/5655a980093ef482d7032803.png"},{"id":75627117,"identity":"68603704-1369-49bf-8588-b82c68789741","added_by":"auto","created_at":"2025-02-06 13:08:48","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":10855,"visible":true,"origin":"","legend":"\u003cp\u003eMethodology\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/a085428a31cb434b65228844.png"},{"id":75626074,"identity":"fd9d8b4c-f7fd-4ce1-ae5e-0bbc3f706d3a","added_by":"auto","created_at":"2025-02-06 13:00:48","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":81151,"visible":true,"origin":"","legend":"\u003cp\u003eTime-Series Plot\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/2c42d3a75a46ed5b8ef29124.png"},{"id":75625125,"identity":"ab2ac0d9-a033-468a-b7ac-57085ce481e4","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":46254,"visible":true,"origin":"","legend":"\u003cp\u003eHolding Time 
visualization\u003c/p\u003e","description":"","filename":"11.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/2c97761c693dc6068dd1d1ee.png"},{"id":75627114,"identity":"ba29f5d7-e10a-48b5-a0a5-77cb74628ceb","added_by":"auto","created_at":"2025-02-06 13:08:48","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":72286,"visible":true,"origin":"","legend":"\u003cp\u003eData Distribution Plots\u003c/p\u003e","description":"","filename":"12.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/7b953c9624e0404b6ec8b926.png"},{"id":75625113,"identity":"88e7a8c8-28ef-407a-9052-c85dcae8ef62","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":47833,"visible":true,"origin":"","legend":"\u003cp\u003eUnnumbered image in the Results \u0026amp; Discussion.\u003c/p\u003e","description":"","filename":"UnnumberFigures1.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/400b91f4c787d218e98ad757.png"},{"id":75625121,"identity":"2e0c7a21-a041-4d54-8e71-7c4c2dc7f44f","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":32492,"visible":true,"origin":"","legend":"\u003cp\u003eUnnumbered image in the Results \u0026amp; Discussion.\u003c/p\u003e","description":"","filename":"UnnumberFigures2.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/6ec844ecd61918d36124261e.png"},{"id":75625117,"identity":"0d55be9a-7c57-4eb2-83f2-bcb426255c92","added_by":"auto","created_at":"2025-02-06 12:52:48","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":57968,"visible":true,"origin":"","legend":"\u003cp\u003eUnnumbered image in the Results \u0026amp; 
Discussion.\u003c/p\u003e","description":"","filename":"UnnumberFigures3.png","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/eb3ce4fe8e652ebc6c106834.png"},{"id":80727419,"identity":"bf068185-2f7f-4d20-9633-70b0af0cd55f","added_by":"auto","created_at":"2025-04-16 12:01:53","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3181125,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5927838/v1/616d8e75-9de4-4fbb-a694-cc5c30bc8493.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Multi-Source Traffic State Estimation: Exploring Advanced Filtering Algorithms for Rural Arterial Networks","fulltext":[{"header":"Introduction","content":"\u003cp\u003eHistorically, single-source data, such as from loop detectors, GPS, or camera-based systems, have been employed for traffic state estimation. While these data sources are capable of providing useful insights in controlled environments, they often fall short when applied to the more variable conditions found in rural areas. Traffic State Estimation (TSE) plays a vital role in modern transportation systems, providing real-time insights into traffic flow and enabling effective traffic management. Accurate TSE supports a variety of applications, including congestion management, traffic signal control, and resource allocation, all of which contribute to the operational efficiency of transportation networks (Xing et al., 2022). Traditionally, research in this area has focused on highways and motorways, where traffic tends to follow relatively uniform patterns, making it easier to model (Muhammed T \u0026amp; Mathew, 2022). 
However, estimating traffic conditions in rural environments, particularly on arterial roads with frequent intersections, presents a more complex challenge due to the dynamic and often unpredictable nature of traffic flow in these settings.\u003c/p\u003e\n\u003cp\u003eFor instance, loop detectors are effective at providing point-based measurements but lack the ability to offer continuous spatial coverage. Similarly, GPS data can provide extensive coverage but may suffer from inaccuracies, especially in dense rural areas where signal interference is common (Williams et al., 2010). The limitations inherent in using a single data source have driven research toward multi-source data fusion techniques, which combine data from various sensors to create a more comprehensive view of traffic conditions.\u003c/p\u003e\n\u003cp\u003eHowever, while multi-source data fusion offers great promise, it also introduces challenges, particularly in terms of handling noise and harmonizing data from different sources. For example, while loop detectors offer precise but localized measurements, GPS data can be noisy and prone to inaccuracies.\u003c/p\u003e\n\u003cp\u003eMulti-source data fusion has become an increasingly important method for improving TSE accuracy, especially in complex rural environments. By combining information from diverse sensors, such as loop detectors, GPS, Bluetooth devices, and connected vehicles, traffic estimation models can offer more robust and accurate assessments of traffic flow (Ghiassi \u0026amp; Lee, 2018). This fusion approach compensates for the weaknesses of individual data sources by integrating their strengths, resulting in a richer understanding of traffic conditions across the network. The benefits of this technique are particularly evident in rural arterial networks, where traffic is influenced by intersections, pedestrian activity, and varying control systems, all of which introduce significant complexities. 
To address these challenges, advanced filtering techniques such as the Kalman Filter (KF) and its variants have been widely applied (Rafique et al., 2023). These filtering methods are designed to manage noise and discrepancies in data while maintaining the integrity of the traffic state estimation process. The Kalman Filter, for instance, is a powerful tool for recursive state estimation in dynamic systems, making it well-suited for real-time applications.\u003c/p\u003e\n\u003cp\u003eNevertheless, the standard Kalman Filter assumes a linear system, which can be limiting in rural environments where traffic dynamics are highly nonlinear due to factors like signalized intersections and mixed traffic flows. To overcome these limitations, researchers have developed extended versions of the Kalman Filter, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), which are better equipped to handle nonlinear systems. Additionally, the Sliding Kalman Filter (SKF) has emerged as a promising approach for adapting to the ever-changing traffic conditions in rural networks, offering the flexibility to adjust model parameters over time (J. Huang et al., 2023; Qi et al., 2018).\u003c/p\u003e\n\u003cp\u003eNoise, missing data, and sensor inaccuracies are common in real-world datasets, posing additional challenges for traffic state estimation models. Further research is needed to validate these models in real-world settings, especially in rural networks where the dynamics of traffic flow are less predictable (Yokoya, 2004). Despite significant progress in the development of multi-source TSE models and advanced filtering techniques, much of the research remains confined to simulation environments (Geetha \u0026amp; Subramani, 2019; A. Wang et al., 2022). 
While simulations offer a controlled setting for testing new algorithms, they often fail to capture the full complexity and variability of real-world traffic conditions.\u003c/p\u003e\n\u003cp\u003eConnected vehicles can provide real-time data on vehicle location, speed, and acceleration, while IoT sensors placed at intersections can offer valuable insights into traffic flow, vehicle types, and pedestrian activity (H. Yang et al., 2023). Integrating these emerging data sources with traditional sensors offers the potential to further improve the accuracy and granularity of traffic state estimates, ultimately leading to more efficient and responsive transportation management systems (Garg et al., 2023). In addition to enhancing TSE models, the growing availability of data from connected vehicles and Internet of Things (IoT)-enabled sensors presents new opportunities for improving traffic estimation (Lamberty \u0026amp; Kreyenschmidt, 2022).\u003c/p\u003e\n\u003cp\u003eTraffic state estimation (TSE) is essential for the effective management and optimization of transportation systems, especially in rural areas facing increasing congestion. TSE provides real-time insights into traffic flow, speed, and density, enabling traffic managers to make informed decisions regarding signal control, traffic diversion, and congestion management. Traditionally, TSE has concentrated on motorway traffic using single data sources like loop detectors or GPS (Ibarra-Espinosa et al., 2019), which can limit their effectiveness in capturing the intricate dynamics of rural environments, particularly at intersections.\u003c/p\u003e\n\u003cp\u003eWith the rise of various sensing technologies—such as GPS, loop detectors, Bluetooth, and LiDAR—multi-source data fusion has emerged as a promising approach to enhance TSE models (J. Zhang et al., 2020). 
This integration of diverse datasets offers a more comprehensive understanding of traffic conditions across different road segments and critical points like intersections. However, challenges arise in integrating these data sources due to variations in quality, noise levels, and resolution (Khan et al., 2022; Naqvi et al., 2023). To address these issues, filtering algorithms such as the Kalman Filter (KF) and its variants—including the Sliding Kalman Filter (SKF)—are commonly employed to effectively combine data sources while minimizing noise and providing real-time estimates (Ning et al., 2020).\u003c/p\u003e\n\u003cp\u003eAdditionally, we evaluate the extent to which these models have been validated beyond simulation environments for practical applicability. Finally, we discuss the challenges associated with data fusion in traffic estimation and outline potential future research directions in this field (Zhou et al., 2018). This review paper aims to explore advancements in multi-source TSE models (Kore \u0026amp; Patil, 2019). By examining various filtering techniques and their applications in data fusion, we assess their capacity to manage the nonlinearities present in real-world traffic systems, particularly within arterial networks featuring intersections.\u003c/p\u003e\n\u003cp\u003eIn summary, TSE is vital for understanding rural traffic dynamics. While traditional methods have focused on uniform conditions found on motorways, rural arterial roads present unique challenges due to their complex nature. The limitations of relying on single-source data collection methods have led to the exploration of multi-source data fusion techniques that provide a more holistic view of traffic conditions. However, these methods also introduce new challenges related to noise management and nonlinearities in the data. The Kalman Filter and its variations offer effective solutions to these challenges but require further validation in real-world scenarios. 
By leveraging emerging data sources such as connected vehicles and IoT-enabled sensors, future TSE models can potentially enhance accuracy and reliability in rural transportation systems (Bao et al., 2023; Prazeres et al., 2023; Rambabu \u0026amp; Venkatram, 2018).\u003c/p\u003e\n\u003cp\u003eTraffic state estimation (TSE) is a vital component of transportation systems, offering real-time insights into traffic conditions. It plays a significant role in traffic management, congestion reduction, and the development of intelligent transportation systems (ITS). TSE enables transportation authorities to make informed decisions regarding traffic control strategies, signal timing adjustments, and resource distribution, all of which enhance the overall efficiency and safety of transportation networks (A. J. Huang \u0026amp; Agarwal, 2022; Y. Wang et al., 2022; Z. Zhang et al., 2023). Traditionally, TSE research has concentrated on motorways and highways, where traffic patterns are more consistent and easier to model due to fewer interruptions like intersections. In contrast, rural arterial roads, which feature both signalized and unsignalized intersections, present considerable challenges for estimating traffic states using conventional single-source data collection methods.\u003c/p\u003e\n\u003cp\u003eThe dependence on individual data sources such as loop detectors, GPS, or camera systems has been a significant limitation in earlier studies. While these sources can provide accurate information in controlled settings, they often struggle to account for the complexities of rural traffic flow influenced by factors such as pedestrian activity, varying vehicle types, and traffic signal phases. For instance, loop detectors yield precise measurements at specific locations but lack continuous coverage across the road network. 
Similarly, while GPS data can offer ongoing information, it is often subject to noise, particularly in densely populated rural areas where signal loss or multipath errors frequently occur. These limitations have led to the exploration of multi-source data fusion approaches that deliver a more comprehensive perspective of the traffic network.\u003c/p\u003e\n\u003cp\u003eMulti-source data fusion integrates information from various sensors to create a more detailed understanding of traffic conditions. By utilizing data from multiple sources, including loop detectors, GPS devices, Bluetooth sensors, and connected vehicles, traffic state estimation models can achieve greater accuracy and robustness in their predictions (Bekiaris-Liberis et al., 2016; Grumert \u0026amp; Tapani, 2018). This method compensates for the weaknesses of individual data sources by combining their strengths, facilitating a more thorough and reliable assessment of traffic conditions (Lu et al., 2014; B. Wang et al., 2022). Moreover, multi-source data fusion is especially beneficial in complex rural environments where traffic flow is significantly affected by intersections and pedestrian crossings. These elements introduce nonlinearities and noise that single-source methods are ill-equipped to manage.\u003c/p\u003e\n\u003cp\u003eOne of the primary challenges in TSE within rural settings is addressing the inherent noise and inconsistencies present in the data. While multi-source data fusion offers advantages, it also introduces additional complexities regarding data harmonization and noise management. For example, GPS data may contain positional inaccuracies while loop detectors provide only localized information. Effectively combining these sources necessitates sophisticated algorithms capable of filtering out noise while preserving data integrity. 
Filtering techniques such as the Kalman Filter (KF) and its variants are commonly used to manage noisy data and deliver robust estimates by continuously updating state predictions based on new observations.\u003c/p\u003e\n\u003cp\u003eThe Kalman Filter is a widely recognized tool for state estimation in dynamic systems and has been extensively applied in TSE models due to its recursive nature, making it suitable for real-time applications. However, its linear assumptions may not hold true in complex environments like rural arterial networks where traffic dynamics are often nonlinear due to intersections and variable flows. To address these challenges, researchers have developed variations such as the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF), which are more adept at handling nonlinear systems (Gunda \u0026amp; Dhanikonda, 2021; Kim \u0026amp; Park, 2020).\u003c/p\u003e\n\u003cp\u003eAnother variation explored for multi-source TSE is the Sliding Kalman Filter (SKF). Unlike the standard Kalman Filter that relies on a fixed model, the SKF adjusts its parameters over time to adapt to changing traffic conditions. This adaptability is particularly useful in rural environments where rapid changes occur due to factors like traffic signal timings or road incidents. By dynamically adjusting model parameters, the SKF can provide more accurate estimates of traffic states even amid significant nonlinearities and noise (Chin \u0026amp; Tay, 2001; Tuna \u0026amp; Soysal, 2021; N. Yang et al., 2023). The combination of SKF with multi-source data fusion presents a promising strategy for enhancing TSE accuracy in complex rural networks.\u003c/p\u003e\n\u003cp\u003eRecent studies have also investigated hybrid filtering algorithms that integrate different variations of the Kalman Filter with other state estimation techniques. These hybrid approaches aim to leverage the strengths of various algorithms to yield more robust and accurate traffic state estimates. 
For instance, combining the Extended Kalman Filter with machine learning models has been shown to improve prediction accuracy by capturing both linear and nonlinear aspects of traffic flow (Yin et al., 2007). Similarly, integrating the Sliding Kalman Filter with unsupervised learning algorithms can help identify patterns in the data that may not be immediately apparent, facilitating more nuanced traffic state estimates.\u003c/p\u003e\n\u003cp\u003eDespite advancements in multi-source data fusion and filtering algorithms, much existing research on TSE remains limited to simulation-based validation. While simulations provide a controlled environment for testing new methodologies, they often fail to capture the full complexity of real-world traffic conditions. Real-world traffic data tends to be noisy, incomplete, and inconsistent, complicating the application of models developed under simulated conditions. For example, GPS data in rural areas may suffer from signal loss or multipath errors while loop detector data has limited spatial coverage.\u003c/p\u003e\n\u003cp\u003eAnother challenge lies in managing varying levels of noise across different data sources in real-world applications. Factors such as sensor accuracy and environmental conditions can significantly influence noise levels in rural settings. For instance, while loop detector data may be highly accurate within localized areas, GPS data can offer broader coverage but may be subject to substantial errors in crowded environments.\u003c/p\u003e\n\u003cp\u003eGiven the increasing availability of information from connected vehicles and IoT-enabled sensors, there is considerable potential for enhancing TSE models through these new data sources. Connected vehicles provide valuable real-time insights into position and speed that can further refine traffic state estimates. 
Likewise, IoT sensors deployed at intersections or along roadways can deliver additional information on traffic flow and pedestrian activity.\u003c/p\u003e\n\u003cp\u003eIn conclusion, estimating traffic states within rural arterial networks remains a complex challenge due to variable conditions. The limitations inherent in traditional single-source methods have led to the development of multi-source data fusion techniques that offer a more comprehensive view by integrating various sensor inputs. However, this approach also introduces new challenges related to noise management and nonlinearities within the data. The Kalman Filter family provides promising solutions but requires further validation against real-world scenarios. By incorporating emerging technologies such as connected vehicles and IoT sensors into future models, there is significant potential for improving accuracy and reliability in rural transportation systems.\u003c/p\u003e\n\u003cp\u003eOne of the key difficulties in traffic state estimation (TSE), particularly in rural settings, is managing the inherent noise and inconsistencies present in the data. While multi-source data fusion offers advantages, it also brings additional challenges related to harmonizing data and controlling noise. For example, GPS data can suffer from positional inaccuracies, whereas loop detectors provide information only at specific points. Merging these different sources necessitates advanced algorithms capable of filtering out noise and inconsistencies while preserving data integrity. This is where filtering techniques such as the Kalman Filter (KF) and its variants become essential. 
These model-based approaches are designed to handle noisy data and deliver reliable estimates by recursively updating predictions based on new observations, making them well-suited for real-time traffic estimation.\u003c/p\u003e\n\u003cp\u003eThe Kalman Filter is a widely recognized method for state estimation in dynamic systems and has been extensively utilized in TSE models. Its recursive nature allows for continuous refinement of estimates with incoming data. The Kalman Filter assumes a linear system, which has proven effective in certain traffic situations, especially on highways where dynamics are more straightforward. However, rural arterial networks present more complex scenarios where traffic dynamics are often nonlinear due to intersections, variable flows, and mixed traffic conditions. To tackle these complexities, researchers have developed variations like the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF), which are more adept at managing nonlinear systems.\u003c/p\u003e\n\u003cp\u003eAnother variation worth noting is the Sliding Kalman Filter (SKF), which has been investigated for multi-source TSE. Unlike the standard Kalman Filter that relies on a fixed model, the SKF adjusts its parameters over time, making it more adaptable to fluctuating traffic conditions. This adaptability is particularly beneficial in rural areas where rapid changes can occur due to factors such as traffic signal adjustments or pedestrian crossings. By dynamically modifying its parameters, the SKF can yield more precise traffic state estimates even amidst substantial nonlinearities and noise. The integration of SKF with multi-source data fusion presents a promising strategy for enhancing TSE accuracy in intricate rural networks.\u003c/p\u003e\n\u003cp\u003eRecent studies have also examined hybrid filtering algorithms that combine variants of the Kalman Filter with other estimation techniques. 
These hybrid methods aim to capitalize on the strengths of different algorithms to produce more robust and accurate traffic state estimates. For instance, merging the Extended Kalman Filter with machine learning models has demonstrated improved prediction accuracy by capturing both linear and nonlinear elements of traffic flow. Similarly, combining the Sliding Kalman Filter with unsupervised learning algorithms can help uncover patterns in the data that may not be immediately evident, leading to more refined traffic state estimates. These hybrid approaches are especially valuable in rural contexts where traffic conditions are more unpredictable than those found on highways.\u003c/p\u003e\n\u003cp\u003eDespite advancements in multi-source data fusion and filtering techniques, much existing research on TSE has relied heavily on simulation-based validation. While simulations provide a controlled environment for testing new methodologies, they often do not reflect the full complexity of real-world traffic scenarios. Real-world traffic data is frequently noisy, incomplete, and inconsistent, complicating the application of models developed under simulated conditions. For example, GPS data in rural areas is susceptible to signal loss or multipath errors, while loop detector data has limited spatial coverage.\u003c/p\u003e\n\u003cp\u003eAnother challenge in real-world TSE is managing varying levels of noise across different data sources. In rural environments, noise can be influenced by numerous factors including sensor precision, environmental conditions, and road network layout. For instance, loop detector data may be highly accurate but limited spatially, while GPS can offer broader coverage but may experience significant errors in densely populated areas. Effectively managing these varying noise levels requires sophisticated filtering techniques that can smooth out inconsistencies while accounting for differing confidence levels associated with each data source. 
The Kalman Filter and its variations provide a robust framework for addressing noise in multi-source traffic data; however, further research is necessary to refine these methods for application in more complex rural settings.\u003c/p\u003e\n\u003cp\u003eWith the growing availability of data from connected vehicles, IoT-enabled sensors, and other emerging technologies, there is substantial potential to enhance TSE models by integrating these new data sources. Connected vehicles offer a wealth of real-time information regarding vehicle position, speed, and acceleration that can significantly improve traffic state estimates. Similarly, IoT sensors deployed at intersections or along roadways can provide additional insights into traffic flow patterns, vehicle types, and pedestrian activity.\u003c/p\u003e\n\u003cp\u003eIn summary, estimating traffic states within rural arterial networks continues to be a challenging endeavor due to the complexity and variability of traffic conditions. The limitations associated with traditional single-source data collection methods have driven the development of multi-source data fusion techniques that offer a more comprehensive perspective by integrating various sensor inputs. Nevertheless, this approach also introduces new challenges related to noise management and nonlinearities within the data. The Kalman Filter family presents promising solutions but requires additional validation in real-world contexts. By leveraging new technologies such as connected vehicles and IoT sensors in future models, there is significant potential for enhancing accuracy and reliability in rural transportation systems.\u003c/p\u003e\n\u003cp\u003eRecent studies have investigated hybrid filtering algorithms that merge various versions of the Kalman Filter with other state estimation methods. These hybrid strategies aim to utilize the strengths of multiple algorithms to yield more reliable and precise traffic state estimates. 
For instance, integrating the Extended Kalman Filter with machine learning models has been found to enhance the accuracy of traffic predictions by addressing both linear and nonlinear aspects of traffic flow. Likewise, combining the Sliding Kalman Filter with unsupervised learning techniques can uncover patterns in the data that might not be immediately visible, leading to more refined traffic state assessments. Such hybrid methods are especially valuable in rural settings, where traffic conditions are often more unpredictable and variable compared to highways.\u003c/p\u003e\n\u003cp\u003eDespite advancements in multi-source data fusion and filtering techniques, much of the current research on traffic state estimation relies heavily on simulation-based validation. While simulations provide a controlled environment for testing new algorithms, they frequently do not reflect the full complexity of real-world traffic scenarios. Real-world traffic data is often characterized by noise, incompleteness, and inconsistencies, complicating the application of models developed in simulated contexts. For example, GPS data in rural areas is susceptible to signal loss and multipath errors, while loop detector data has limited spatial coverage. These challenges underscore the need for further research to assess the effectiveness of multi-source traffic state estimation models in real-world applications, particularly within rural arterial networks where traffic dynamics are intricate.\u003c/p\u003e\n\u003cp\u003eAnother significant challenge in real-world traffic state estimation is the varying levels of noise present across different data sources. In rural environments, this noise can be influenced by factors such as sensor accuracy, environmental conditions, and the physical configuration of the road network. 
For instance, loop detector data may be highly reliable but limited in coverage, whereas GPS data can offer continuous information yet suffer from considerable errors in densely populated areas. Effectively managing these varying noise levels requires sophisticated filtering techniques that not only smooth out inconsistencies but also consider the differing confidence levels associated with each data source. The Kalman Filter and its variations provide a solid framework for addressing noise in multi-source traffic data; however, further refinement is necessary for their application in more complex rural settings.\u003c/p\u003e\n\u003cp\u003eWith the increasing availability of information from connected vehicles, IoT-enabled sensors, and other emerging technologies, there is considerable potential to enhance traffic state estimation models by incorporating these new data sources. Connected vehicles can provide a wealth of real-time information regarding vehicle position, speed, and acceleration, which can significantly improve traffic state estimates. Similarly, IoT sensors positioned at intersections or along roadways can gather additional data on traffic flow, vehicle types, and pedestrian activity. Integrating these new data sources into existing traffic state estimation models represents a significant opportunity for future research as it could greatly enhance the accuracy and detail of traffic estimates in rural environments.\u003c/p\u003e\n\u003cp\u003eIn summary, while significant advances have been made in traffic state estimation, particularly through multi-source data fusion and advanced filtering techniques, challenges remain in effectively applying these models to real-world rural environments. 
The integration of new data sources, such as connected vehicles and IoT sensors, represents a promising direction for future research, with the potential to significantly enhance the accuracy of TSE models and improve the overall management of rural traffic networks.\u003c/p\u003e\n\u003ch3\u003eLiterature\u003c/h3\u003e\n\u003cp\u003eTraffic state estimation (TSE) has a rich history within transportation research, with conventional models typically depending on a single data source to assess traffic conditions. Early frameworks, such as the cell transmission model (CTM), aimed to break down traffic flow into discrete segments and update each segment\u0026apos;s state based on established rules. Although effective for simulating highway traffic, this model\u0026apos;s limitations became evident in rural settings where traffic dynamics are more intricate. To enhance estimation accuracy, the Kalman Filter (KF) was introduced as a model-based approach for TSE (Bao et al., 2023; Rambabu \u0026amp; Venkatram, 2018). The KF is a recursive filter that estimates the state of dynamic systems from a series of noisy measurements, making it particularly suitable for managing the uncertainties associated with traffic data.\u003c/p\u003e\n\u003cp\u003eIn recent years, advancements such as the Sliding Kalman Filter (SKF) and other variants have emerged to address some of the standard KF\u0026apos;s shortcomings, especially in nonlinear traffic environments. The SKF modifies state estimates using a sliding window approach, allowing it to respond more effectively to changes in traffic conditions (Z. Zhang et al., 2023). Furthermore, the integration of multi-source data fusion techniques has greatly enhanced the reliability of TSE models. 
By incorporating data from loop detectors, GPS devices, Bluetooth sensors, and mobile sensors, researchers can capture a broader range of traffic parameters, leading to a more thorough understanding of traffic states at intersections and along arterial roads.\u003c/p\u003e\n\u003cp\u003eNumerous studies have validated the application of KF and SKF in traffic estimation; however, many of these validations have occurred solely within simulated environments. While simulations offer a controlled setting for testing new algorithms, they often fail to replicate the full complexity of real-world traffic scenarios, which can be noisy and incomplete (Zhou et al., 2018). Additionally, much of the existing research has primarily focused on using traffic density as the main parameter for state estimation. Other critical parameters such as speed and flow are equally important, yet few studies have explored their estimation across various data sources.\u003c/p\u003e\n\u003cp\u003eTSE is vital for effective transportation management and plays a significant role in intelligent transportation systems (ITS). Accurate information regarding traffic states enables real-time monitoring and forecasting of conditions, which can lead to improved traffic flow, reduced congestion, and enhanced road safety. Historically, research has concentrated on motorway traffic where uniform flow assumptions simplify state estimation tasks. Early studies relied heavily on single-source data like loop detectors or GPS to estimate parameters such as density, speed, or flow. However, as rural networks introduce greater complexity, the limitations of these traditional methods have become clear, leading to increased interest in multi-source data fusion and advanced filtering techniques.\u003c/p\u003e\n\u003ch3\u003e1. 
Traffic State Estimation Techniques: A Historical Perspective\u003c/h3\u003e\n\u003cp\u003eTraffic state estimation (TSE) has a longstanding history in transportation research, traditionally relying on single-source data to assess traffic conditions. Early models, such as the Cell Transmission Model (CTM) (Jin, 2021), represented traffic as discrete cells updated based on predetermined rules. While effective for highway traffic simulations, the CTM struggled with the more complex dynamics of rural traffic. To enhance estimation accuracy, the Kalman Filter (KF) emerged as a significant advancement. KF is a recursive algorithm that estimates a dynamic system\u0026apos;s state based on noisy measurements, making it a robust tool for handling the inherent uncertainties in traffic data (Pang \u0026amp; Yang, 2020).\u003c/p\u003e\n\u003cp\u003eIn recent years, more advanced variants of the Kalman Filter, such as the Sliding Kalman Filter (SKF), have been developed to address its limitations, especially in nonlinear traffic systems. SKF improves estimation responsiveness by recalibrating within a sliding window of time, better reflecting real-time traffic changes. Additionally, integrating multi-source data fusion has substantially improved TSE accuracy. Data from diverse sources like GPS, loop detectors, and mobile sensors now provide a more comprehensive and accurate depiction of traffic conditions in rural settings.\u003c/p\u003e\n\u003cp\u003eSeveral studies have demonstrated KF and SKF\u0026apos;s effectiveness, albeit primarily in simulation settings. While simulations offer a controlled environment for testing, they often fail to capture the noisy and incomplete nature of real-world traffic data. Moreover, most research has focused on traffic density as the primary parameter for estimation, despite the importance of other variables like speed and flow. 
Few studies have validated these models across diverse traffic parameters and data sources.\u003c/p\u003e\n\u003cdiv id=\"Sec3\"\u003e\n \u003ch2\u003e2. Evolution of Kalman Filter-Based Models\u003c/h2\u003e\n \u003cp\u003eThe Kalman Filter has long been a staple in TSE due to its robustness in handling noisy data. Introduced by Kalman in 1960, the filter has been applied across various domains, including transportation, where it is used to estimate vehicle density on highways with relatively linear traffic dynamics (Al-Selwi et al., 2023). However, KF assumes linearity and Gaussian noise, limiting its utility in more complex traffic environments like rural networks with intersections and fluctuating patterns.\u003c/p\u003e\n \u003cp\u003eResearchers have responded by developing enhanced KF models, such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). EKF handles nonlinearity by linearizing the system around its current state estimate and has shown some success in estimating highway traffic densities. However, the complex dynamics of rural networks, influenced by intersections, traffic signals, and pedestrian activity, limit EKF\u0026apos;s effectiveness. To better address nonlinearity, the UKF avoids explicit linearization and instead propagates a deterministically chosen set of sample points (sigma points) through the nonlinear model, which improves the estimation accuracy of rural traffic dynamics. Studies by Liu and Ban (2013) have shown that UKF performs better than EKF in these settings.\u003c/p\u003e\n \u003cp\u003eFurther innovations, such as the Sliding Kalman Filter (SKF), allow for model parameter adjustments over time. SKF\u0026apos;s adaptability makes it particularly suitable for rapidly changing environments like arterial roads with signalized intersections. SKF has been reported to be highly effective in such scenarios, especially when fusing data from multiple sources to accommodate real-time conditions. This adaptability makes SKF better suited than traditional KF models to rural traffic dynamics.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003e3. 
Emerging Hybrid Models for Traffic State Estimation\u003c/h3\u003e\n\u003cp\u003eThe integration of data-driven machine learning models with traditional filtering techniques represents a promising advancement in TSE (A. J. Huang \u0026amp; Agarwal, 2022). Hybrid models leverage the strengths of both approaches: machine learning captures complex, nonlinear patterns in traffic data, while Kalman-based models provide robust, real-time updates and noise handling. Recent research by Zheng et al. (2019) demonstrates that hybrid models combining KF with neural networks can outperform conventional KF methods, especially in the variable traffic dynamics of rural environments. These models offer the flexibility to handle both linear and nonlinear aspects of traffic flow, making them a compelling option for modern TSE challenges.\u003c/p\u003e\n\u003ch3\u003e4. Multi-Source Data Fusion in Traffic State Estimation\u003c/h3\u003e\n\u003cp\u003eOne of the most significant advancements in TSE is the shift from single-source data collection to multi-source data fusion. Early methods relying on loop detectors or GPS data were limited in scope, often missing the complex dynamics present in rural environments. Loop detectors provide accurate but spatially limited data, while GPS offers continuous coverage but suffers from noise, particularly in rural areas where signal loss is prevalent. By fusing data from multiple sources\u0026mdash;such as Bluetooth, mobile sensors, and even connected vehicles\u0026mdash;researchers have created more comprehensive and accurate traffic state models.\u003c/p\u003e\n\u003cp\u003eWork by Pan et al. (2015) demonstrated the advantages of multi-source data fusion in traffic estimation, compensating for individual data sources\u0026apos; weaknesses. While this approach offers improved accuracy, it introduces new challenges, such as harmonizing different data types and managing varying noise levels. 
Work and Bayen (2008) explored using KF to integrate multiple data sources effectively, managing inconsistencies and ensuring robust real-time traffic state estimation.\u003c/p\u003e\n\u003ch3\u003e5. Future Directions: Leveraging IoT and Connected Vehicle Data\u003c/h3\u003e\n\u003cp\u003eThe growing prevalence of connected vehicles and IoT-enabled sensors presents new opportunities for refining TSE models. Connected vehicles offer real-time data on position, speed, and acceleration, while IoT sensors embedded in road infrastructure provide additional insights into traffic flow, vehicle types, and pedestrian behavior. Studies such as those by Kim and Coifman (2017) highlight the potential of connected vehicle data to enhance traffic state estimation, particularly in rural areas.\u003c/p\u003e\n\u003cp\u003eAs the volume and variety of data sources increase, advanced techniques like adaptive filtering and machine learning-based noise reduction will become crucial in managing the complexities of modern traffic networks. Additionally, the combination of data-driven and model-driven approaches in hybrid models offers a promising solution for future traffic state estimation challenges, ensuring both real-time accuracy and robustness in handling diverse traffic dynamics.\u003c/p\u003e\n\u003ch3\u003e6. Validation of Traffic State Estimation Models in Real-World Settings\u003c/h3\u003e\n\u003cp\u003eA growing body of research emphasizes the need to validate TSE models using real-world data, moving beyond the controlled environments of simulations. While simulations provide a useful test bed, they often fail to capture the unpredictable nature of real-world traffic conditions. Chen et al. (2018) stress the importance of real-world validation, especially in rural environments where traffic dynamics are highly variable. 
Models validated with real-world data tend to be more robust and adaptable to the complexities of modern traffic systems.\u003c/p\u003e\n\u003cdiv id=\"Sec8\"\u003e\n \u003ch2\u003e7. Application of Multi-Source Data Fusion Beyond Traffic State Estimation\u003c/h2\u003e\n \u003cp\u003eMulti-source data fusion techniques have broad applications beyond TSE, including traffic signal control and incident detection. Prazeres et al. (2023) explored using multi-source data for adaptive traffic signal control, showing that it can improve traffic flow and reduce congestion in rural networks. Moreover, these techniques have been applied in real-time incident detection systems, allowing for faster response times to traffic accidents or breakdowns, further underscoring their versatility in traffic management.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Material and Methods","content":"\u003ch4\u003eData Collection and Preprocessing\u003c/h4\u003e\n\u003cp\u003eIn this study, the dataset used for analyzing traffic flow and vehicle classification at toll stations includes variables such as \u003cstrong\u003evehicle speed, length, type, and occupancy time at the toll booth.\u003c/strong\u003e The data was collected from a toll station and comprised several columns, including \u003cstrong\u003edevice number, date, timestamp, road and loop number, speed, vehicle length, type, license plate (often missing), entry time, and occupancy time\u003c/strong\u003e (Z. Zhang et al., 2023). These features were used to classify vehicles based on their speed and length and to divide vehicles into clusters based on natural groupings.\u003c/p\u003e\n\u003cp\u003eThe dataset underwent a thorough preprocessing phase to ensure quality and usability for machine learning tasks. Missing values, incorrect data types, and anomalies were identified and addressed. Critical columns like speed and vehicle length, essential for clustering and classification, were specifically checked for completeness. 
Records with missing values in these fields were removed to maintain analytical integrity. Numeric fields were converted to appropriate formats to ensure compatibility with machine learning algorithms, and fields with special characters (e.g., device number) were cleaned for consistency.\u003c/p\u003e\n\u003ch4\u003eExploratory Data Analysis (EDA)\u003c/h4\u003e\n\u003cp\u003eExploratory Data Analysis (EDA) was performed after preprocessing to understand the dataset\u0026apos;s characteristics and uncover patterns. Through clustering, vehicles were grouped into \u003cstrong\u003ethree types\u003c/strong\u003e based on their speed and length, and the speed and flow of these types were analyzed over different time periods: \u003cstrong\u003emorning, noon, and night.\u003c/strong\u003e This clustering approach allowed separate analyses of speed and flow variations for each vehicle type across these time windows.\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003eHistogram Analysis:\u003c/strong\u003e\u003cbr\u003eHistograms of vehicle speeds revealed most vehicles traveled between \u003cstrong\u003e50 km/h and 90 km/h,\u003c/strong\u003e indicating typical traffic flow patterns. Similarly, vehicle lengths exhibited two peaks corresponding to smaller vehicles (e.g., passenger cars at ~4.5 meters) and larger vehicles (e.g., trucks at ~9 meters).\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eScatter Plot Analysis:\u003c/strong\u003e\u003cbr\u003eScatter plots explored the relationship between speed and length. A \u003cstrong\u003emoderate negative correlation\u003c/strong\u003e was observed, indicating larger vehicles tend to travel at lower speeds.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eTemporal Variations:\u003c/strong\u003e\u003cbr\u003e\u0026nbsp;Time-series plots demonstrated how speed and flow varied across morning, noon, and night, highlighting distinct traffic dynamics during different periods of the day. 
These findings provided critical insights for traffic prediction models.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIn the context of Traffic State Estimation (TSE) for intelligent transportation systems, the two figures represent important aspects of traffic data analysis related to speed and holding time at intersections. The first figure shows how vehicle speed varies with time over a given period, likely measured at an intersection or a specific rural area. Its x-axis (timestamp) represents time in seconds or milliseconds; the high values on this axis suggest that the data was collected over an extended duration. Its y-axis displays vehicle speed in kilometers per hour. The dense, overlapping orange lines suggest that this data might have been gathered from multiple vehicles over time or from different points on the same vehicle\u0026apos;s journey. The rapid fluctuations in speed may represent a highly dynamic traffic environment, such as an intersection where vehicles frequently stop, accelerate, or decelerate due to signal changes, pedestrian crossings, or congestion.\u003c/p\u003e\n\u003cp\u003eThe significant variations in speed, ranging from near zero to over 100 km/h, are typical of rural traffic scenarios, especially at intersections where stop-and-go patterns are common.\u003c/p\u003e\n\u003cp\u003eThis visualization can provide insights into the nonlinearities of traffic dynamics, especially when vehicles are accelerating or decelerating quickly. In the context of TSE, it highlights the challenge of accurately estimating traffic conditions given the variability in speeds over time, which may be influenced by factors like traffic signals, lane changes, or road obstructions.\u003c/p\u003e\n\u003cp\u003eThe second figure shows the holding time (likely the time vehicles spend stopped or idling) recorded at a certain location. 
Its x-axis (X) represents the sequential index of vehicles or events, which could indicate individual vehicles or time slots during the data collection period. Its y-axis displays the duration, in seconds, for which vehicles were held stationary. The spikes in the plot indicate times when vehicles were held for longer durations. These could correspond to times when vehicles were stopped at traffic signals or pedestrian crossings, or held up in congestion. The general trend shows a large number of occurrences of low holding times (less than 1 second), but there are some instances where holding times are much longer (up to around 3.5 seconds). This indicates variability in how long vehicles are stationary, which is common in rural traffic where traffic flow is interrupted by various factors.\u003c/p\u003e\n\u003cp\u003eIn the context of TSE, holding time is crucial for understanding congestion at intersections or signalized points. High holding times often indicate bottlenecks in traffic flow, and identifying them is critical for optimizing traffic signal timings and reducing overall travel delays. Moreover, holding time data can be integrated with speed data to refine the accuracy of TSE models, particularly when addressing rural traffic network complexities.\u003c/p\u003e\n\u003cp\u003eBoth figures reflect the importance of multi-source data in traffic state estimation. For example, integrating speed data from GPS or loop detectors with holding time data can provide a more comprehensive view of traffic conditions, especially in complex environments like intersections. 
These figures also highlight the real-world noise and variability in traffic data, which sophisticated TSE models (such as Kalman Filters) must account for to provide accurate, actionable insights.\u003c/p\u003e\n\u003cp\u003eThese analyses can further be enhanced by considering advanced data fusion techniques, which combine different datasets (e.g., speed, holding time, GPS) to produce more reliable traffic state estimates. The fluctuations in speed and holding time underscore the need for nonlinear models capable of capturing the dynamic nature of traffic, particularly in rural environments.\u003c/p\u003e\n\u003cp\u003eAdditionally, a scatter plot was utilized to investigate the relationship between speed and vehicle length. Correlation analysis between these two variables indicated a moderate negative correlation, suggesting that larger vehicles generally travel at lower speeds, which aligns with real-world traffic expectations.\u003c/p\u003e\n\u003ch4\u003eMachine Learning Models for Classification\u003c/h4\u003e\n\u003cp\u003eTo classify vehicles based on their speed and length, a \u003cstrong\u003eLogistic Regression\u003c/strong\u003e model was trained using vehicle speed and length as independent variables and vehicle type as the dependent variable. 
The dataset was split into \u003cstrong\u003e80% training and 20% testing subsets.\u003c/strong\u003e Model performance was evaluated using classification metrics: precision, recall, F1-score, and accuracy.\u003c/p\u003e\n\u003cul type=\"disc\"\u003e\n \u003cli\u003e\u003cstrong\u003eModel Performance:\u003c/strong\u003e\n \u003cul type=\"circle\"\u003e\n \u003cli\u003e\u003cstrong\u003eSmall Vehicles:\u003c/strong\u003e Precision = 0.69, Recall = 0.66, F1-Score = 0.67.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eMedium Vehicles:\u003c/strong\u003e Precision = 0.95, Recall = 0.93, F1-Score = 0.94.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eLarge Vehicles:\u003c/strong\u003e Precision = 0.89, Recall = 0.95, F1-Score = 0.92.\u003cbr\u003eOverall, the model achieved a classification accuracy of \u003cstrong\u003e91.9%.\u003c/strong\u003e\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n\u003c/ul\u003e\n\u003ch4\u003eClustering Using K-Means\u003c/h4\u003e\n\u003cp\u003eIn addition to classification, the K-means clustering algorithm was applied to group vehicles based on their natural characteristics, such as speed and length. K-means is an unsupervised learning algorithm that groups data points into clusters without predefined labels. In this study, K-means was used to identify patterns in the data by grouping vehicles into three clusters, each corresponding to different vehicle types (e.g., small cars, medium vehicles, and large trucks).\u003c/p\u003e\n\u003cp\u003eK-means clustering was used to divide vehicles into three clusters based on speed and length. 
The algorithm identified natural groupings corresponding to small cars, medium-sized vehicles, and large trucks.\u003c/p\u003e\n\u003cul type=\"disc\"\u003e\n \u003cli\u003e\u003cstrong\u003eCluster Centers:\u003c/strong\u003e\n \u003cul type=\"circle\"\u003e\n \u003cli\u003e\u003cstrong\u003eCluster 1:\u003c/strong\u003e Average Speed = 59.3 km/h, Length = 4.4 m.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCluster 2:\u003c/strong\u003e Average Speed = 63.1 km/h, Length = 9.0 m.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCluster 3:\u003c/strong\u003e Average Speed = 93.5 km/h, Length = 4.4 m.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n\u003c/ul\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eMetric/Category\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eCluster 1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eCluster 2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eCluster 3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eOverall\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eVehicle Type\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eSmall Cars\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eLarge Trucks\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eMedium Vehicles\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eAll Types\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eAverage Speed (km/h)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e59.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e63.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e93.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eAverage Length (m)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e4.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e9.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e4.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eNumber of Vehicles\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e1,200\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e800\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e1,000\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e3,000\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e69%\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e89%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e95%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eRecall\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e66%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e95%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e93%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003eF1-Score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e67%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e92%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e94%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 125px;\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eAnalyzing speed and flow separately for each cluster across different time periods (morning, noon, night) revealed distinct traffic dynamics for each group. 
These findings demonstrate the benefits of clustering in traffic state estimation and flow prediction.\u003c/p\u003e\n\u003cp\u003eThese clusters align with the general observation of different vehicle types, indicating that K-means successfully identified natural groupings in the dataset.\u003c/p\u003e"},{"header":"Results \u0026 Discussion","content":"\u003ch4\u003eClustering Analysis Results\u003c/h4\u003e\n\u003cp\u003eThe K-means clustering approach grouped the vehicles into three distinct clusters based on their speed and length. These clusters represent different types of vehicles typically seen at toll stations, providing important insights into traffic patterns.\u003c/p\u003e\n\u003cp\u003eThe clustering results are shown in Table 1, with the cluster centers representing the average speed and vehicle length for each group:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1: Cluster Centers for Vehicle Classification Based on Speed and Length\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eCluster\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eAverage Speed (km/h)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eAverage Vehicle Length (m)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eInterpretation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003e1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e59.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e4.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003ePassenger Cars\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e2\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e63.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e9.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eMedium-Sized Vehicles (e.g., SUVs, vans)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003e3\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e93.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e4.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSmaller, High-Speed Vehicles\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe clustering analysis revealed three key insights:\u003c/p\u003e\n\u003col start=\"1\" type=\"1\"\u003e\n \u003cli\u003e\u003cstrong\u003eCluster 1\u003c/strong\u003e: Vehicles in this group had an average speed of 59.3 km/h and a length of 4.4 meters. These characteristics align with the profile of passenger cars, which are typically shorter in length and travel at moderate speeds.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCluster 2\u003c/strong\u003e: This group had an average speed of 63.1 km/h and a length of 9.0 meters, suggesting that it consists of medium-sized vehicles such as SUVs and vans. These vehicles are larger than passenger cars but travel at similar speeds, possibly due to size and load restrictions.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCluster 3\u003c/strong\u003e: Vehicles in this group were characterized by high speeds, with an average speed of 93.5 km/h, and a length of 4.4 meters, similar to Cluster 1. 
These vehicles are likely smaller, more agile vehicles, such as sports cars, that maintain high speeds through toll stations.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe combined findings from the clustering, correlation, and logistic regression models provide comprehensive insights into the traffic flow and vehicle classification at toll stations. The clustering analysis revealed distinct vehicle types, while the logistic regression model demonstrated strong predictive capabilities for real-time vehicle classification. Additionally, the correlation analysis confirmed the expected inverse relationship between vehicle speed and length, reinforcing the need for infrastructure solutions that account for different vehicle types.\u003c/p\u003e\n\u003cp\u003eThe clustering analysis successfully grouped vehicles based on their natural characteristics, offering insights into how vehicle types utilize toll stations. The identification of distinct vehicle clusters can inform toll station infrastructure planning, particularly for optimizing lane usage based on vehicle type.\u003c/p\u003e\n\u003ch4\u003eCorrelation Analysis\u003c/h4\u003e\n\u003cp\u003eA correlation analysis was conducted to explore the relationship between vehicle speed and length. 
Table 2 presents the correlation results.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2: Correlation Between Vehicle Speed and Length\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eVariables\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eCorrelation Coefficient\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSpeed vs. Length\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e-0.58\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe negative correlation coefficient of \u003cstrong\u003e-0.58\u003c/strong\u003e indicates a moderate inverse relationship between vehicle speed and length. This suggests that larger vehicles, such as trucks and commercial vehicles, tend to travel at slower speeds compared to smaller vehicles like passenger cars. The correlation is consistent with traffic behavior patterns, where larger, heavier vehicles typically reduce speed due to their size and load, especially at toll stations.\u003c/p\u003e\n\u003ch4\u003eExploratory Data Analysis (EDA)\u003c/h4\u003e\n\u003cp\u003eThe exploratory data analysis revealed patterns in the dataset related to vehicle behavior at toll stations. Two histograms were plotted to visualize the distribution of vehicle speeds and lengths.\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003e\u003cstrong\u003eVehicle Speed Distribution\u003c/strong\u003e: The histogram showed that most vehicles travel between \u003cstrong\u003e50 and 90 km/h\u003c/strong\u003e. 
However, a few outliers travel at significantly higher or lower speeds, possibly due to congestion, vehicle type, or driver behavior.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eVehicle Length Distribution\u003c/strong\u003e: The length distribution was bimodal, with peaks around \u003cstrong\u003e4.5 meters\u003c/strong\u003e and \u003cstrong\u003e9 meters\u003c/strong\u003e. This suggests that toll stations experience a mix of smaller passenger cars and larger commercial vehicles, with relatively few vehicles falling in between.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe combination of speed and length distributions supports the existence of distinct vehicle groups, which was further confirmed by the clustering results.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"898\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMetric\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDevice ID\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDate\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTimestamp\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCoilID\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRoadID\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSpeed (z-score)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003eVehicle Length (z-score)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eVehicle Type\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eEntry Timestamp\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eOccupancy Time (z-score)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eHour\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMin\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e2020-01-18 00:02:36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n \u003cp\u003e1.579277e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e-3.658\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e-1.342\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e1.579277e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e-1.374\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e25th Percentile\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e2020-01-18 07:46:03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n \u003cp\u003e1.579305e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e-0.654\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e-0.346\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e1.579305e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e-0.729\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e7.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMedian (50%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e2020-01-18 11:15:16.500\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n 
\u003cp\u003e1.579317e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.027\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e-0.346\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e1.579317e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e-0.413\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e11.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e75th Percentile\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e2020-01-18 15:04:44.500\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n \u003cp\u003e1.579331e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e0.723\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e1.417\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e1.579331e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e0.517\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e15.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMax\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003e2020-01-18 23:59:30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n \u003cp\u003e1.579363e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e2.945\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e1.417\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e1.579363e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e13.637\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e23.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 78px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eStd Dev\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 60px;\"\u003e\n \u003cp\u003eNaN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 146px;\"\u003e\n \u003cp\u003e1.736961e+04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 59px;\"\u003e\n \u003cp\u003e0.4997\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 65px;\"\u003e\n \u003cp\u003e0.4997\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 58px;\"\u003e\n \u003cp\u003e1.0001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e1.0001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 64px;\"\u003e\n \u003cp\u003e0.5615\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 107px;\"\u003e\n \u003cp\u003e1.736959e+07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 86px;\"\u003e\n \u003cp\u003e1.0001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 50px;\"\u003e\n \u003cp\u003e4.825\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch4\u003eLogistic Regression Model for Vehicle Classification\u003c/h4\u003e\n\u003cp\u003eA logistic regression model was applied to predict vehicle types based on speed and length. 
The model achieved an overall accuracy of \u003cstrong\u003e91.9%\u003c/strong\u003e, with detailed performance metrics provided in Table 3.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 3: Performance Metrics for Logistic Regression Model\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eVehicle Type\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eRecall\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eF1-Score\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eSmall Vehicles\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.69\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.66\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.67\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eMedium Vehicles\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.95\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.93\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.94\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eLarge Vehicles\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.95\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.92\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe model performed exceptionally well for medium and large vehicles, with F1-scores of \u003cstrong\u003e0.94\u003c/strong\u003e and \u003cstrong\u003e0.92\u003c/strong\u003e, respectively. However, the model performed considerably worse for small vehicles, with an F1-score of \u003cstrong\u003e0.67\u003c/strong\u003e. This discrepancy may be due to the overlap in speed and length characteristics between small vehicles and other categories, making them harder to distinguish.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"834\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMetric\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTimestamp\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCoilID\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRoadID\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSpeed (km/h)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eVehicle Length (m)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eVehicle Type\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eEntry Timestamp\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eOccupancy Time 
(s)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCount\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e6,240\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMean\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.579318e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e1.5179\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e1.5179\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e58.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e5.30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e2.21\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n 
\u003cp\u003e1.579318e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e0.5077\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eStd Dev\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.736961e+04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e0.4997\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e0.4997\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e13.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e2.61\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e0.5615\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e1.736959e+07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e0.2560\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMin\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.579277e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e7.24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e1.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 
114px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e1.579277e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e0.1560\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e25th Percentile\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.579305e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e1.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e48.96\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e4.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e1.579305e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e0.3210\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMedian (50%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.579317e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e58.41\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e4.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e1.579317e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e0.4020\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e75th Percentile\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.579331e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e68.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e9.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e3.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e1.579331e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e0.6400\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 73px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMax\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 93px;\"\u003e\n \u003cp\u003e1.579363e+09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 68px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 72px;\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 71px;\"\u003e\n \u003cp\u003e98.94\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 103px;\"\u003e\n \u003cp\u003e9.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 114px;\"\u003e\n \u003cp\u003e3.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 138px;\"\u003e\n \u003cp\u003e1.579363e+12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 102px;\"\u003e\n \u003cp\u003e3.9990\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cstrong\u003eExplanation of the Analysis and Predictive Model Code\u003c/strong\u003e\u003c/p\u003e\n\u003ch4\u003e\u003cstrong\u003e1. Data Preprocessing\u003c/strong\u003e\u003c/h4\u003e\n\u003cp\u003eThe dataset is first loaded and inspected to understand its structure, including column names, data types, and missing values. Non-essential columns such as LicensePlate are dropped to focus on relevant attributes. Missing values are filled by forward filling (ffill), which carries the last observed value forward to preserve data continuity. The Date column is converted to a datetime object for time-based analysis, and an additional Hour column is extracted to study hourly patterns. Categorical data, such as VehicleType, is encoded into numeric values using LabelEncoder. Numerical columns such as Speed_kmh and VehicleLength_m are standardized using StandardScaler (zero mean, unit variance), which puts them on a comparable scale for distance-based clustering and helps the predictive model train stably.\u003c/p\u003e\n\u003ch4\u003e\u003cstrong\u003e2. 
Exploratory Data Analysis (EDA)\u003c/strong\u003e\u003c/h4\u003e\n\u003cp\u003eThe EDA phase examines the relationships among features and their distributions:\u003c/p\u003e\n\u003cul type=\"disc\"\u003e\n \u003cli\u003e\u003cstrong\u003ePair Plots\u003c/strong\u003e: Visualize the pairwise relationships between features, helping to identify correlations and clusters.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eCorrelation Heatmap\u003c/strong\u003e: Highlights linear correlations among features like speed, vehicle length, and occupancy time. Strong correlations guide feature selection for predictive modeling.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eDescriptive Statistics\u003c/strong\u003e: A summary table is generated to show key statistical metrics (e.g., mean, median, standard deviation) for all numerical features. This step helps verify data integrity and flag outliers.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch4\u003e\u003cstrong\u003e3. Clustering Using KMeans\u003c/strong\u003e\u003c/h4\u003e\n\u003cp\u003eKMeans clustering is applied to group vehicles into clusters based on features like Speed_kmh, VehicleLength_m, and OccupancyTime_s. The number of clusters is set to three, so that each cluster represents a category of vehicles with similar characteristics.\u003c/p\u003e\n\u003cp\u003eClusters are visualized using a scatter plot, where colors represent different groups. The plot shows how vehicles are distributed in terms of speed and size, which helps interpret the clusters as traffic classes (e.g., small cars, heavy vehicles).\u003c/p\u003e\n\u003ch4\u003e\u003cstrong\u003e4. 
Artificial Neural Network (ANN) for Speed Prediction\u003c/strong\u003e\u003c/h4\u003e\n\u003cp\u003eTo predict vehicle speed:\u003c/p\u003e\n\u003col start=\"1\" type=\"1\"\u003e\n \u003cli\u003e\u003cstrong\u003eFeature Selection\u003c/strong\u003e: The input features (VehicleLength_m, OccupancyTime_s, VehicleType, RoadID, Cluster) are selected based on their relevance to speed prediction.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eData Splitting\u003c/strong\u003e: The dataset is split into training (80%) and testing (20%) subsets to evaluate model performance.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eANN Model\u003c/strong\u003e: A Multi-Layer Perceptron (MLP) regressor is built with three hidden layers:\u003cul type=\"circle\"\u003e\n \u003cli\u003eLayer sizes: 64, 32, and 16 neurons.\u003c/li\u003e\n \u003cli\u003eActivation function: ReLU (Rectified Linear Unit).\u003c/li\u003e\n \u003cli\u003eOptimizer: Adam for efficient weight updates.\u003c/li\u003e\n \u003cli\u003eMaximum iterations: 500, the cap on training iterations.\u003c/li\u003e\n \u003c/ul\u003e\n \u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe model is trained on the training set and evaluated on the held-out test set. Predictions are compared against actual speeds using two performance metrics:\u003c/p\u003e\n\u003cul type=\"disc\"\u003e\n \u003cli\u003e\u003cstrong\u003eMean Squared Error (MSE)\u003c/strong\u003e: The average squared difference between predicted and actual speeds (lower is better).\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eR\u003csup\u003e2\u003c/sup\u003e Score\u003c/strong\u003e: The proportion of variance in speed that the model explains (higher is better).\u003c/li\u003e\n\u003c/ul\u003e\u003ch4\u003eDiscussion\u003c/h4\u003e\n\u003cp\u003eThe clustering analysis and logistic regression model provide comprehensive insights into traffic flow and vehicle classification at toll stations. 
The results show that machine learning techniques can effectively predict vehicle types based on speed and length, offering potential applications in optimizing toll station operations. For instance, cluster analysis can guide lane assignment based on vehicle type, while the logistic regression model can be used for real-time vehicle classification, improving toll collection efficiency. The correlation between vehicle speed and length confirms well-established traffic patterns, where larger vehicles tend to travel at slower speeds. This finding supports previous research in traffic flow analysis, reinforcing the need for customized infrastructure solutions at toll stations to accommodate different vehicle types. Overall, the combination of clustering, correlation analysis, and machine learning provides a robust framework for analyzing and managing traffic at toll stations, contributing to improved traffic flow and enhanced decision-making in infrastructure planning.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eIn this research, we effectively utilized machine learning techniques to categorize vehicles at toll stations based on their speed and length, providing valuable insights for traffic flow management. The K-means clustering algorithm identified three distinct vehicle categories: small passenger cars, medium-sized vehicles (like SUVs and vans), and larger vehicles (such as trucks or commercial vehicles). Correlation analysis revealed a moderate inverse relationship between vehicle speed and length, consistent with existing traffic studies indicating that larger vehicles generally travel at slower speeds. Furthermore, the logistic regression model achieved an impressive accuracy rate of 91.9% in predicting vehicle types, demonstrating particularly strong performance in identifying medium and large vehicles. 
This model can be implemented for real-time vehicle classification at toll stations, aiding in the optimization of lanes and overall infrastructure management. The results of this study underscore the significance of data-driven methodologies in enhancing traffic operations and planning, ultimately contributing to more efficient toll station management and better resource allocation.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFunding Acquisition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eKKF0202302375 \"Development and Testing of an Intelligent Integrated Information Module for Monitoring and Identifying Risks on General National and Provincial Trunk Roads\"\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAl-Selwi, H. F., Aziz, A. A., Abas, F. Bin, Kayani, A., \u0026amp; Noor, N. M. (2023). Attention Based Spatial-Temporal GCN with Kalman filter for Traffic Flow Prediction. \u003cem\u003eInternational Journal of Technology\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(6). https://doi.org/10.14716/ijtech.v14i6.6646\u003c/li\u003e\n\u003cli\u003eBao, J., Kantarcioglu, M., Vorobeychik, Y., \u0026amp; Kamhoua, C. (2023). IoTFlowGenerator: Crafting Synthetic IoT Device Traffic Flows for Cyber Deception. \u003cem\u003eProceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS\u003c/em\u003e, \u003cem\u003e36\u003c/em\u003e. https://doi.org/10.32473/flairs.36.133376\u003c/li\u003e\n\u003cli\u003eBekiaris-Liberis, N., Roncoli, C., \u0026amp; Papageorgiou, M. (2016). Highway traffic state estimation with mixed connected and conventional vehicles. \u003cem\u003eIEEE Transactions on Intelligent Transportation Systems\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e(12). https://doi.org/10.1109/TITS.2016.2552639\u003c/li\u003e\n\u003cli\u003eChin, A. T. H., \u0026amp; Tay, J. H. (2001). 
Developments in air transport: Implications on investment decisions, profitability and survival of Asian airlines. \u003cem\u003eJournal of Air Transport Management\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(5). https://doi.org/10.1016/S0969-6997(01)00026-6\u003c/li\u003e\n\u003cli\u003eGarg, P., Khaparde, P., Patle, K. S., Bhaliya, C., Kumar, A., Joshi, M. V., \u0026amp; Palaparthy, V. S. (2023). Environmental and Soil Parameters for Germination of Leaf Spot Disease in the Groundnut Plant Using IoT-Enabled Sensor System. \u003cem\u003eIEEE Sensors Letters\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(12). https://doi.org/10.1109/LSENS.2023.3330923\u003c/li\u003e\n\u003cli\u003eGeetha, A., \u0026amp; Subramani, C. (2019). Development of driving cycle under real world traffic conditions: A case study. \u003cem\u003eInternational Journal of Electrical and Computer Engineering\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(6). https://doi.org/10.11591/ijece.v9i6.pp4798-4803\u003c/li\u003e\n\u003cli\u003eGhiassi, M., \u0026amp; Lee, S. (2018). A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach. \u003cem\u003eExpert Systems with Applications\u003c/em\u003e, \u003cem\u003e106\u003c/em\u003e. https://doi.org/10.1016/j.eswa.2018.04.006\u003c/li\u003e\n\u003cli\u003eGrumert, E. F., \u0026amp; Tapani, A. (2018). Traffic State Estimation Using Connected Vehicles and Stationary Detectors. \u003cem\u003eJournal of Advanced Transportation\u003c/em\u003e, \u003cem\u003e2018\u003c/em\u003e. https://doi.org/10.1155/2018/4106086\u003c/li\u003e\n\u003cli\u003eGunda, S. K., \u0026amp; Dhanikonda, V. S. S. S. S. (2021). Discrimination of transformer inrush currents and internal fault currents using extended kalman filter algorithm (Ekf). \u003cem\u003eEnergies\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(19). https://doi.org/10.3390/en14196020\u003c/li\u003e\n\u003cli\u003eHuang, A. J., \u0026amp; Agarwal, S. (2022). 
Physics-Informed Deep Learning for Traffic State Estimation: Illustrations with LWR and CTM Models. \u003cem\u003eIEEE Open Journal of Intelligent Transportation Systems\u003c/em\u003e, \u003cem\u003e3\u003c/em\u003e. https://doi.org/10.1109/OJITS.2022.3182925\u003c/li\u003e\n\u003cli\u003eHuang, J., Song, G., He, F., \u0026amp; Tan, Z. (2023). Energetic Impacts of Autonomous Vehicles in Real-World Traffic Conditions from Nine Open-Source Datasets. \u003cem\u003eIEEE Transactions on Intelligent Transportation Systems\u003c/em\u003e, \u003cem\u003e24\u003c/em\u003e(9). https://doi.org/10.1109/TITS.2023.3272914\u003c/li\u003e\n\u003cli\u003eIbarra-Espinosa, S., Ynoue, R., Giannotti, M., Ropkins, K., \u0026amp; de Freitas, E. D. (2019). Generating traffic flow and speed regional model data using internet GPS vehicle records. \u003cem\u003eMethodsX\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e. https://doi.org/10.1016/j.mex.2019.08.018\u003c/li\u003e\n\u003cli\u003eJin, W.-L. (2021). The Cell Transmission Model (CTM). In \u003cem\u003eIntroduction to Network Traffic Flow Theory\u003c/em\u003e. https://doi.org/10.1016/b978-0-12-815840-1.00017-5\u003c/li\u003e\n\u003cli\u003eKhan, M. A., Ghazal, T. M., Lee, S. W., \u0026amp; Rehman, A. (2022). Data fusion-based machine learning architecture for intrusion detection. \u003cem\u003eComputers, Materials and Continua\u003c/em\u003e, \u003cem\u003e70\u003c/em\u003e(2). https://doi.org/10.32604/cmc.2022.020173\u003c/li\u003e\n\u003cli\u003eKim, T., \u0026amp; Park, T. H. (2020). Extended kalman filter (Ekf) design for vehicle position tracking using reliability function of radar and lidar. \u003cem\u003eSensors (Switzerland)\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e(15). https://doi.org/10.3390/s20154126\u003c/li\u003e\n\u003cli\u003eKore, A., \u0026amp; Patil, S. (2019). Internet of things (Iot) enabled wireless sensor networks security challenges and current solutions. 
\u003cem\u003eInternational Journal of Innovative Technology and Exploring Engineering\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(1). https://doi.org/10.35940/ijitee.A4023.119119\u003c/li\u003e\n\u003cli\u003eLamberty, A., \u0026amp; Kreyenschmidt, J. (2022). Ambient Parameter Monitoring in Fresh Fruit and Vegetable Supply Chains Using Internet of Things‐Enabled Sensor and Communication Technology. \u003cem\u003eFoods\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(12). https://doi.org/10.3390/foods11121777\u003c/li\u003e\n\u003cli\u003eLu, N., Cheng, N., Zhang, N., Shen, X., \u0026amp; Mark, J. W. (2014). Connected vehicles: Solutions and challenges. In \u003cem\u003eIEEE Internet of Things Journal\u003c/em\u003e (Vol. 1, Issue 4). https://doi.org/10.1109/JIOT.2014.2327587\u003c/li\u003e\n\u003cli\u003eMuhammed T, S., \u0026amp; Mathew, S. K. (2022). The disaster of misinformation: a review of research in social media. In \u003cem\u003eInternational Journal of Data Science and Analytics\u003c/em\u003e (Vol. 13, Issue 4). https://doi.org/10.1007/s41060-022-00311-6\u003c/li\u003e\n\u003cli\u003eNaqvi, F. H., Ali, S., Haseeb, B., Khan, N., Qureshi, S., Sajid, T., \u0026amp; Aslam, M. I. (2023). Design and Implementation of Smart Contract in Supply Chain Management Using Blockchain and Internet of Things \u0026dagger;. \u003cem\u003eEngineering Proceedings\u003c/em\u003e, \u003cem\u003e32\u003c/em\u003e(1). https://doi.org/10.3390/engproc2023032015\u003c/li\u003e\n\u003cli\u003eNing, H., Farha, F., Mohammad, Z. N., \u0026amp; Daneshmand, M. (2020). A Survey and Tutorial on \u0026ldquo;Connection Exploding Meets Efficient Communication\u0026rdquo; in the Internet of Things. \u003cem\u003eIEEE Internet of Things Journal\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(11). https://doi.org/10.1109/JIOT.2020.2996615\u003c/li\u003e\n\u003cli\u003ePang, M., \u0026amp; Yang, M. (2020). 
Coordinated control of urban expressway integrating adjacent signalized intersections based on pinning synchronization of complex networks. \u003cem\u003eTransportation Research Part C: Emerging Technologies\u003c/em\u003e, \u003cem\u003e116\u003c/em\u003e. https://doi.org/10.1016/j.trc.2020.102645\u003c/li\u003e\n\u003cli\u003ePrazeres, N., Costa, R. L. de C., Santos, L., \u0026amp; Rabad\u0026atilde;o, C. (2023). Engineering the application of machine learning in an IDS based on IoT traffic flow. \u003cem\u003eIntelligent Systems with Applications\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e. https://doi.org/10.1016/j.iswa.2023.200189\u003c/li\u003e\n\u003cli\u003eQi, X., Wu, G., Boriboonsomsin, K., \u0026amp; Barth, M. J. (2018). Data-driven decomposition analysis and estimation of link-level electric vehicle energy consumption under real-world traffic conditions. \u003cem\u003eTransportation Research Part D: Transport and Environment\u003c/em\u003e, \u003cem\u003e64\u003c/em\u003e. https://doi.org/10.1016/j.trd.2017.08.008\u003c/li\u003e\n\u003cli\u003eRafique, A. A., Al-Rasheed, A., Ksibi, A., Ayadi, M., Jalal, A., Alnowaiser, K., Meshref, H., Shorfuzzaman, M., Gochoo, M., \u0026amp; Park, J. (2023). Smart Traffic Monitoring Through Pyramid Pooling Vehicle Detection and Filter-Based Tracking on Aerial Images. \u003cem\u003eIEEE Access\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e. https://doi.org/10.1109/ACCESS.2023.3234281\u003c/li\u003e\n\u003cli\u003eRambabu, K., \u0026amp; Venkatram, N. (2018). Traffic flow features as metrics (TFFM): Detection of application layer level DDOS attack scope of IOT traffic flows. \u003cem\u003eInternational Journal of Engineering and Technology(UAE)\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(2). https://doi.org/10.14419/ijet.v7i2.7.10293\u003c/li\u003e\n\u003cli\u003eTuna, E., \u0026amp; Soysal, A. (2021). LSTM and GRU based traffic prediction using live network data. 
\u003cem\u003eSIU 2021 - 29th IEEE Conference on Signal Processing and Communications Applications, Proceedings\u003c/em\u003e. https://doi.org/10.1109/SIU53274.2021.9478011\u003c/li\u003e\n\u003cli\u003eWang, A., Xu, J., Zhang, M., Zhai, Z., Song, G., \u0026amp; Hatzopoulou, M. (2022). Emissions and fuel consumption of a hybrid electric vehicle in real-world metropolitan traffic conditions. \u003cem\u003eApplied Energy\u003c/em\u003e, \u003cem\u003e306\u003c/em\u003e. https://doi.org/10.1016/j.apenergy.2021.118077\u003c/li\u003e\n\u003cli\u003eWang, B., Han, Y., Wang, S., Tian, D., Cai, M., Liu, M., \u0026amp; Wang, L. (2022). A Review of Intelligent Connected Vehicle Cooperative Driving Development. In \u003cem\u003eMathematics\u003c/em\u003e (Vol. 10, Issue 19). https://doi.org/10.3390/math10193635\u003c/li\u003e\n\u003cli\u003eWang, Y., Zhao, M., Yu, X., Hu, Y., Zheng, P., Hua, W., Zhang, L., Hu, S., \u0026amp; Guo, J. (2022). Real-time joint traffic state and model parameter estimation on freeways with fixed sensors and connected vehicles: State-of-the-art overview, methods, and case studies. \u003cem\u003eTransportation Research Part C: Emerging Technologies\u003c/em\u003e, \u003cem\u003e134\u003c/em\u003e. https://doi.org/10.1016/j.trc.2021.103444\u003c/li\u003e\n\u003cli\u003eWilliams, B., Onsman, A., \u0026amp; Brown, T. (2010). Exploratory factor analysis: A five-step guide for novices. \u003cem\u003eJournal of Emergency Primary Health Care\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e(3). https://doi.org/10.33151/ajp.8.3.93\u003c/li\u003e\n\u003cli\u003eXing, J., Wu, W., Cheng, Q., \u0026amp; Liu, R. (2022). Traffic state estimation of urban road networks by multi-source data fusion: Review and new insights. In \u003cem\u003ePhysica A: Statistical Mechanics and its Applications\u003c/em\u003e (Vol. 595). https://doi.org/10.1016/j.physa.2022.127079\u003c/li\u003e\n\u003cli\u003eYang, H., Du, L., Zhang, G., \u0026amp; Ma, T. (2023). 
A Traffic Flow Dependency and Dynamics based Deep Learning Aided Approach for Network-Wide Traffic Speed Propagation Prediction. \u003cem\u003eTransportation Research Part B: Methodological\u003c/em\u003e, \u003cem\u003e167\u003c/em\u003e. https://doi.org/10.1016/j.trb.2022.11.009\u003c/li\u003e\n\u003cli\u003eYang, N., Yang, L., Du, X., Guo, X., Meng, F., \u0026amp; Zhang, Y. (2023). Blockchain based trusted execution environment architecture analysis for multi - source data fusion scenario. \u003cem\u003eJournal of Cloud Computing\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(1). https://doi.org/10.1186/s13677-023-00494-8\u003c/li\u003e\n\u003cli\u003eYin, R., Li, K., \u0026amp; Yu, J. (2007). Traffic forecast for visitiors in World Expo 2010 Shanghai Arena. \u003cem\u003eTongji Daxue Xuebao/Journal of Tongji University\u003c/em\u003e, \u003cem\u003e35\u003c/em\u003e(8).\u003c/li\u003e\n\u003cli\u003eYokoya, Y. (2004). Dynamics of traffic flow with real-time traffic information. \u003cem\u003ePhysical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics\u003c/em\u003e, \u003cem\u003e69\u003c/em\u003e(1). https://doi.org/10.1103/PhysRevE.69.016121\u003c/li\u003e\n\u003cli\u003eZhang, J., Xiao, W., Coifman, B., \u0026amp; Mills, J. P. (2020). Vehicle Tracking and Speed Estimation from Roadside Lidar. \u003cem\u003eIEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e. https://doi.org/10.1109/JSTARS.2020.3024921\u003c/li\u003e\n\u003cli\u003eZhang, Z., Yang, X. T., \u0026amp; Yang, H. (2023). A review of hybrid physics-based machine learning approaches in traffic state estimation. In \u003cem\u003eIntelligent Transportation Infrastructure\u003c/em\u003e (Vol. 2). https://doi.org/10.1093/iti/liad002\u003c/li\u003e\n\u003cli\u003eZhou, D., Fang, J., Yan, F., Zhao, T., Zhang, F., Yang, R., Ma, Y., \u0026amp; Wang, L. (2018). 
Simulating LIDAR Point Cloud for Autonomous Driving using Real-world Scenes and Traffic Flows. \u003cem\u003eArXiv Preprint ArXiv:1811.07112\u003c/em\u003e.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-5927838/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5927838/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eTraffic state estimation (TSE) is essential for enhancing transportation systems by providing critical, real-time data on road conditions to support decision-making and optimize network performance. Traditional TSE methods have predominantly focused on highways, relying on single-source data inputs like loop detectors or GPS data, which may limit adaptability in diverse traffic scenarios. However, the integration of multi-source data spanning loop detectors, GPS, and Bluetooth has opened new pathways for improved accuracy and responsiveness in TSE models, particularly within rural arterial networks and at complex intersections. 
This review analyzes the progression of TSE methodologies, focusing on model-based techniques such as the Kalman Filter (KF), Sliding Kalman Filter (SKF), and cell transmission models. By examining the combined use of varied data inputs, this review underscores the benefits of multi-source fusion in accurately capturing dynamic traffic conditions in rural settings. Key challenges, including non-linear traffic flows, inherent data noise, and the limitations of current validation methods, are discussed. Future research directions are identified, highlighting the need for adaptable algorithms that can effectively manage the complex, variable datasets characteristic of rural traffic environments.\u003c/p\u003e","manuscriptTitle":"Multi-Source Traffic State Estimation: Exploring Advanced Filtering Algorithms for Rural Arterial Networks","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-02-06 12:52:43","doi":"10.21203/rs.3.rs-5927838/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c72d3c6a-9268-46b0-9ea4-6a97e73cb875","owner":[],"postedDate":"February 6th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-04-28T01:53:06+00:00","versionOfRecord":[],"versionCreatedAt":"2025-02-06 
12:52:43","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5927838","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5927838","identity":"rs-5927838","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
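
The "Data Preprocessing" and "Clustering Using KMeans" steps described in the article body above can be illustrated with a small, self-contained sketch. The preprint does not include its code, so everything here is an assumption: the names StandardScaler and KMeans in the text suggest scikit-learn, whereas this sketch reimplements z-score standardization and plain Lloyd's k-means with the Python standard library only. The `vehicles` rows (speed in km/h, length in m, occupancy time in s) are invented for illustration, and the helper names `standardize`, `sq_dist`, and `kmeans` are hypothetical.

```python
from statistics import mean, pstdev

# Invented sample rows: (Speed_kmh, VehicleLength_m, OccupancyTime_s).
# These are NOT values from the paper's dataset.
vehicles = [
    (70, 4.4, 0.30), (68, 4.5, 0.32), (72, 4.3, 0.29), (65, 4.6, 0.35),  # car-like
    (58, 5.5, 0.42), (56, 5.8, 0.45), (60, 5.3, 0.40), (55, 6.0, 0.47),  # van-like
    (45, 9.0, 0.80), (42, 9.2, 0.85), (48, 8.8, 0.75), (40, 9.5, 0.90),  # truck-like
]


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the population std."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # constant columns are left at zero
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=3, max_iter=100):
    """Lloyd's algorithm with a deterministic start (assumes 2 <= k <= len(points)).

    Initial centroids are taken at evenly spaced positions of the sorted points,
    so the result is reproducible without a random seed.
    """
    ordered = sorted(points)
    step = (len(ordered) - 1) / (k - 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


scaled = standardize(vehicles)
labels, centroids = kmeans(scaled, k=3)
for row, label in zip(vehicles, labels):
    print(row, "-> cluster", label)
```

Standardizing first matters here because k-means uses Euclidean distance: without it, speed (tens of km/h) would dominate occupancy time (fractions of a second). In practice a library implementation with multiple random restarts would likely be preferable to this fixed initialization.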
