A Deep Hybrid CNN–ViT Architecture Incorporating Advanced 3D Features for the Estimation of Visibility and Runway Visual Range

Research Square
Research Article
Anand Shankar, Bikash Chandra Sahana
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8678337/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1

Abstract
Estimating visibility poses significant issues for transportation safety and operational decision-making, especially in severe weather circumstances where image-based evaluation becomes unreliable. Conventional deep learning (DL) models demonstrate limited feature extraction capabilities from compromised images, while physics-based methods require predefined parameters and exhibit inadequate generalization across diverse atmospheric conditions.
This study introduces a hybrid architecture that amalgamates various information sources for the continuous assessment of visibility and runway visual range (RVR) from individual images. The proposed architecture includes a three-dimensional feature matrix—the DDT matrix—encoding dark channel, depth, and transmittance components based on atmospheric scattering theory. Physically informed features are combined with learned representations obtained from Convolutional Neural Networks (CNNs) for local degradation pattern identification and Vision Transformers (ViT) for global contextual modelling through self-attention mechanisms. Meteorological factors such as temperature, winds, and atmospheric pressure are integrated to furnish environmental context. A random forest regressor executes multimodal fusion and final estimation from these diverse feature streams. The quantitative assessment of three datasets—Visibility Image Dataset I (daytime), Dataset II (night-time), and Dataset III (mixed climatic conditions)—results in a Root Mean Squared Error (RMSE) of 117 and a Mean Absolute Error (MAE) of 68.81. This indicates a 22% decrease in error relative to single physical feature methodologies (RMSE ≈ 150). Ablation experiments illustrate the impact of each component on total performance. The approach overcomes shortcomings in current methodologies by integrating local and global feature extraction, including explicit physical models with learned representations, and facilitating continuous regression instead of discrete classification. Cross-dataset validation demonstrates consistent performance across several environmental contexts, encompassing both urban and rural environments with differing availability of reference objects. 
The findings indicate practical usefulness for aviation safety systems, transportation management infrastructure, and atmospheric monitoring networks that necessitate dependable real-time visibility evaluation under adverse meteorological situations.

Keywords: Transformers, Deep learning, Feature extraction, Multi-parameter fusion, Visibility estimation and runway visual range

1 Introduction
Visibility is the maximum distance at which the human eye can distinguish an object [ 1 ], [ 2 ]. The atmospheric extinction coefficient, a key indicator of air clarity, directly impacts visibility. RVR is the distance a pilot on the runway centreline can see the runway surface markings or lights identifying the runway centreline [ 3 , 4 ]. RVR is crucial for aviation operations in low-visibility conditions such as fog and rain. Accurate estimation of visibility and RVR is critical for guaranteeing safe and efficient aviation services, including take-off, landing, and ground movement. In addition, reliable visibility estimation is essential for marine navigation, where it aids in collision avoidance and route planning, and for road traffic, where it enhances driving safety and helps in traffic management under adverse weather conditions. Statistics indicate that the likelihood of traffic accidents is significantly higher on foggy days compared to clear ones [ 5 ]. Accurate estimation of visibility and RVR is challenging in operational weather services but has become crucial for aviation services and overall transportation safety [ 6 – 8 ]. Initially, researchers typically used both instrument-based and manual observational methods to estimate visibility. Visibility-estimating equipment, although expensive, utilizes optical components for detection, whereas visual methods are still susceptible to human error caused by subjective influences.
Recognizing the potential of image processing and computer vision, researchers began using these techniques to estimate visibility. Image-based visibility estimation approaches have evolved from physical models [ 9 , 10 ], which rely on the atmospheric scattering model and require scene-specific parameters such as depth and distance, to advanced deep learning (DL) methods [ 11 ] that learn direct mappings from images to visibility or RVR values. However, current CNN-based visibility models do not explicitly extract visibility-related discriminative features, which limits their estimation capability. In contrast, other domains have successfully combined CNNs, Vision Transformers (ViT), and multi-stream feature fusion to capture spatial and contextual relationships more effectively [ 12 – 14 ]. Motivated by these advances, this study introduces an end-to-end framework for visibility and RVR estimation that fuses engineered meteorological inputs, deep features from CNNs and ViT, and atmospheric model principles to improve estimation accuracy and reliability. The proposed CNN-Transformer hybrid is a well-motivated and interpretable design choice for robust, data-driven estimation of visibility and RVR. The combination of CNN and Transformer, as depicted in Fig. 1 , is especially advantageous because the two have complementary strengths in feature representation. CNNs excel at detecting minute spatial details such as textures, edges, and variations in haze intensity, all of which contribute to the identification of degradation patterns in visibility. Transformers, on the other hand, are adept at learning global contextual dependencies through self-attention mechanisms. This lets the model capture interactions across the complete scene and depth-dependent scattering effects, which are critical for estimating visibility correctly in varied situations.
Combining CNNs and transformers therefore lets the network use both fine-grained local features and global contextual reasoning at the same time, which makes it more robust and better able to generalize. The framework centres on the DDT matrix. Its three component matrices, derived from scattering models, dark channel priors, and depth data, together offer a holistic description of visibility and RVR and supply the critical engineered properties needed for accurate estimation. A machine learning (ML) block fuses these parameters, using a fully connected (FC) layer and feature combination before generating the final visibility and RVR estimates. Three visibility datasets were created to thoroughly assess the proposed technique. The datasets comprise real-time photographs obtained at Patna Airport under varying illumination conditions: daytime (Dataset I), night-time (Dataset II), and a combination of both (Dataset III). Each dataset is labelled with corresponding standard visibility and RVR values obtained from instrument-based measurements using a forward scatterometer (FSM). Testing and validation show that our strategy surpasses other well-known DL approaches [ 15 – 18 ]. This research leverages advancements in CNNs, transformers, and 3D feature extraction techniques, along with the widespread use of surveillance systems (e.g., CCTV cameras). This demonstrates that image-based visibility estimation algorithms are both practical and applicable to real-world scenarios, including land-based, maritime, and critical aviation services. This research makes several contributions. It provides distinct real-time image datasets under different illumination conditions for evaluating visibility estimation techniques for aviation services. The proposed method uses real-time CCTV and FSM data from Patna Airport, markedly enhancing the precision of visibility and RVR estimation compared to conventional techniques.
The model is entirely automated and hardware-agnostic and eliminates the need for manual feature extraction, which supports consistent performance. A novel 3D multi-feature stream, comprising transmittance, dark channel, and depth matrices, is introduced. The study unfolds as follows: Section 1 introduces the study, Section 2 surveys related work, Section 3 outlines the materials, Section 4 explains the methods, Section 5 shares the results, Section 6 interprets the findings, and Section 7 concludes.

2 Related Works
The first part of this section provides a concise overview of image-based visibility estimation methods based on physical models, while the second part discusses DL approaches.

2.1 Visibility via Physics-based Models
Various image-based visibility estimation methods have been proposed. The Dark Channel Prior (DCP) theory [ 19 ] utilizes haze effects and dark channel features to assess visibility, employing techniques such as identifying vanishing points and averaging pixel distances. Although DCP is effective on its own, guided filtering [ 20 ] improves its accuracy by refining transmittance maps through noise reduction and edge preservation. Local contrast-based approaches [ 21 ] are another option; however, they tend to be less accurate in scenes with complicated lighting or rich texture. Camera calibration methods [ 22 ] that use nonlinear least squares fitting to correlate image features with real-world distances provide accurate estimates; nevertheless, they require very specific image conditions and complex parameter tuning. In visibility estimation, the transmittance, dark channel, and depth matrices are often considered together because they capture complementary aspects of atmospheric scattering and scene depth. The transmittance map shows how much light reaches the camera without being scattered, which is directly related to how well the scene can be seen.
The dark channel identifies haze-affected areas by using the DCP assumption that, in the absence of haze, at least one colour channel in a local patch has very low intensity; elevated dark channel values therefore indicate haze. This indicator provides a qualitative clue about how much haze is present. The depth matrix improves this representation further by preserving spatial coherence and texture consistency, which reduces artefacts caused by changes in light or surface. Combining these three physically interpretable and visually robust cues makes the overall visibility estimate more accurate and reliable across a wider range of lighting and atmospheric conditions.

2.2 Visibility via DL Models
Studies [ 23 – 25 ] have applied CNNs for single-image visibility estimation, but missing domain-specific features have hampered their accuracy. Integrating engineered features like brightness improved results [ 26 , 27 ], though such features lacked specificity to visibility. Despite using structural similarity (SSIM) across pairs of hazy and clear images, HazDesNet [ 28 ] ran into scalability problems because of data limitations. Despite limitations in the dataset and the model's architecture, a CNN model trained on Korean CCTV data attained 84% accuracy [ 15 ]. A hybrid DCNN-SVM model [ 29 ] achieved only limited success. TVRNet [ 30 ] provided a highly effective trainable end-to-end CNN model for estimating fog density, while a PCA-DBN model [ 31 ] achieved 79% accuracy in predicting visibility trends. VisNet [ 32 ] enabled webcam-based estimation by connecting three CNN streams; however, manual sub-region selection introduced errors [ 33 , 34 ]. While DL approaches show greater accuracy and resilience, classic methods like vanishing points and DCP have their uses, as shown in Table 1 . Hence, this study focuses on DL-based visibility estimation.

Table 1 Overview and comparison of visibility estimation strategies.
| Strategy | Technique of Analysis | Major Achievements | Limitations/Setbacks |
| --- | --- | --- | --- |
| Vanishing Point | Estimates visibility using geometric cues from convergent lines. | Simple and effective in structured scenarios; quick and easy method. | Restricted to scenes with distinct linear patterns; noise and obstructions are problems. |
| DCP Theory | Utilizes information on color, polarization, and depth. | Improves accuracy by combining several cues; functions well in different lighting conditions. | Specific sensors are needed; processing is time-consuming. |
| DL | Uses neural networks trained on image datasets. | Superb precision and applicability; manages intricate scenes. | Requires big datasets with labels; training and inference are expensive. |

3 Materials
Publicly available real-world time-series datasets for image-based visibility estimation are few, and none focuses on image-based RVR estimation. To fill this void, three specialized datasets were created from optical surveillance data collected by IP cameras: Dataset I, Dataset II, and Dataset III. Each image is tagged with the corresponding standard visibility and RVR values from a co-located FSM, which ensures accurate ground-truth references. Dataset I comprises 8,820 daytime images captured at the Meteorological Park on Runway 25 of Jay Prakash Narayan International (JPNI) Airport, Patna. These images cover the whole daytime surveillance window, which begins at 0600 IST and ends at 1800 IST. Dataset II comprises 4,410 night-time images taken from the same location between 1800 IST and 0600 IST. Dataset III includes both day and night situations, with a total of 13,230 images acquired over an intensive surveillance period from December 26, 2023, to January 31, 2024. The experimental setup, located in the Meteorological Park on Runway 25 at JPNI Airport in Patna (ICAO: VEPT; 25.5947° N, 85.0908° E), is presented in Fig. 2 .
Part of the system is a weather sensor (FSM PWD22/52 Present Weather Detector) and an infrared (IR) fixed bullet IP camera (Hicks Vision) with full high definition (HD). The IP camera, which is equipped with CMOS sensors and infrared LEDs for night vision, was set up to capture images at 1-minute intervals during low-visibility events between December 25, 2023, and January 31, 2024. It operates at a frame rate of 25 fps and has the ability to pan and pivot at 355° and 90°, respectively. The PWD22/52 scatterometer is a multivariable optical sensor that measures visibility, RVR, and weather conditions by utilizing 45° forward light scattering from atmospheric particulates. It has a sampling volume of 0.1 litres and an integrated background luminance sensor. This co-located arrangement facilitated the synchronized acquisition of visual and meteorological data, thereby establishing a solid foundation for the estimation of visibility and RVR in various fog and illumination conditions. Figure 3 (a–c) illustrates representative sample images from Datasets I, II, and III. The distribution of visibility ranges ( 1000 m) is depicted in Fig. 3 (d) to further elucidate the composition and characteristics of these datasets. This visualization emphasizes the balance and variability of the dataset across various visibility conditions, providing insight into its overall suitability for model training and evaluation, as well as its diversity and coverage. 4 Methodology CNNs capture low-level visual features, such as edges and textures, while ViT model complex spatial relationships within images by employing multi-head self-attention. Standard Scaler was employed to perform exhaustive data pre-processing, which included image normalization, resizing, and feature scaling, to ensure numerical consistency across datasets prior to model training. 
Furthermore, to ensure the reliability and quality of the training data, missing or invalid meteorological records were interpolated and filtered to remove anomalies. The DDT matrix augments spatial awareness by encoding dark channel, depth, and transmittance information. Figure 1 depicts the comprehensive framework, which includes real-time CCTV footage, meteorological data, and features generated from images. This multi-stream fusion method integrates visual and contextual meteorological data, achieving strong performance across diverse illumination and atmospheric conditions.

4.1 Dark Channel Prior Framework
DCP is a widely used image dehazing approach based on empirical observations. A study of 5,000 clear outdoor images found that each local patch typically contains at least one pixel with a near-zero value in one Red, Green, Blue (RGB) channel [ 20 , 35 ]. The dark channel \(J^{dark}(x)\) is defined as:
$$J^{dark}(x)=\min_{y\in\Omega(x)}\left[\min_{c\in\{r,g,b\}}J^{c}(y)\right] \tag{1}$$
For a haze-free RGB image \(J^{c}\) captured on a sunny day, the dark channel \(J^{dark}\), calculated over local regions \(\Omega(x)\), exhibits exceedingly low intensity values, approaching zero, in illuminated, non-sky regions. This is expressed in Eq. ( 2 ):
$$J^{dark}\to 0 \tag{2}$$

4.2 Atmospheric Scattering Framework
The atmospheric model treats the brightest point in the dark channel as a fixed light source L to guide the dehazing process [ 36 ], as shown in Eq. ( 3 ):
$$I(x)=R(x)\,\tau(x)+L\,\bigl(1-\tau(x)\bigr) \tag{3}$$
The observed intensity I(x) at any given pixel x in the hazy image is a combination of the actual scene radiance R(x), which is the clear image, and the global atmospheric light L, modulated by the transmission factor τ(x).
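Eqs. (1) and (3) can be illustrated with a minimal NumPy sketch. The function names `dark_channel` and `add_haze`, the patch sizes, and the synthetic image are assumptions made for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def dark_channel(image, patch=15):
    """Eq. (1): minimum over colour channels, then over a patch x patch window.

    `image` is an H x W x 3 float array; the default patch size of 15 is an
    assumed value, not one reported in the paper.
    """
    per_pixel_min = image.min(axis=2)             # min over c in {r, g, b}
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    h, w = per_pixel_min.shape
    dark = np.empty_like(per_pixel_min)
    for i in range(h):
        for j in range(w):
            # min over the local region Omega(x) centred on pixel (i, j)
            dark[i, j] = padded[i:i + patch, j:j + patch].min()
    return dark

def add_haze(radiance, tau, light):
    """Eq. (3): I(x) = R(x) * tau(x) + L * (1 - tau(x))."""
    return radiance * tau[..., None] + light * (1.0 - tau[..., None])

# Synthetic example: a random "clear" image with uniform transmission 0.6.
rng = np.random.default_rng(0)
radiance = rng.uniform(0.0, 1.0, size=(8, 8, 3))
tau = np.full((8, 8), 0.6)
hazy = add_haze(radiance, tau, light=1.0)

clear_dark = dark_channel(radiance, patch=3)
hazy_dark = dark_channel(hazy, patch=3)
```

With uniform transmission and L = 1, the haze model lifts every dark channel value from d to 0.6 d + 0.4, which is why the dark channel responds so directly to haze.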
The transmission τ(x) indicates the portion of scene radiance that reaches the camera without being scattered or absorbed by the medium. For practical colour processing, the haze model is reformulated per channel by normalizing with the atmospheric light \(L^{c}\), yielding:
$$\frac{I^{c}(x)}{L^{c}}=\tau(x)\frac{R^{c}(x)}{L^{c}}+1-\tau(x) \tag{4}$$
Equation ( 5 ) follows from taking the minimum over colour channels and over the local patch Ω(x) on both sides of Eq. ( 4 ) and applying the dark channel prior of Eq. ( 2 ), under which the radiance term vanishes [ 37 ]:
$$\tilde{\tau}(x)=1-\min_{y\in\Omega(x)}\left(\min_{c}\frac{I^{c}(y)}{L^{c}}\right) \tag{5}$$
Equation ( 5 ) estimates the transmittance τ(x), but using it directly for dehazing removes haze completely and can result in loss of depth cues in the image [ 30 ]. To preserve depth perception, a weighting factor w (0 ≤ w < 1) is introduced, allowing distant objects to retain slight haze, as shown in Eq. ( 6 ):
$$\tilde{\tau}(x)=1-w\min_{y\in\Omega(x)}\left(\min_{c}\frac{I^{c}(y)}{L^{c}}\right) \tag{6}$$
The transmittance map of an RGB image is consistent across all three channels [ 30 ]. When ambient light is uniformly distributed, τ(x) can be calculated as shown in Eq. ( 7 ):
$$\tau(x)=e^{-\beta d(x)} \tag{7}$$
In this context, β signifies the atmospheric attenuation coefficient, while d(x) indicates the depth at pixel x.

4.3 Depth and Transmission Estimation via DDT: A Matrix-Based Method
Because it responds strongly to haze, the DCP plays a pivotal role in dehazing and visibility estimation. It helps derive the transmittance map via the atmospheric scattering model. Monodepth2 [ 35 ], a self-supervised monocular method, estimates depth to overcome the lack of depth data in typical surveillance. Eq.
( 7 ) highlights the link between scene depth and transmittance. Testing shows that the depth, dark channel, and transmittance matrices reflect changes in pixel appearance under various lighting and visibility conditions, as shown in Fig. 4 . Because their pixel value distributions change significantly with visibility conditions, these three matrices are key inputs for visibility estimation. The DDT method builds on these principles by merging the three matrices into a single 3D stream matrix that is used to estimate RVR and visibility.

4.4 Proposed Model
This study introduces a new approach to visibility and RVR estimation. It combines CNN and transformer-based models with a 3D multi-feature DDT matrix that includes depth, dark channel, and transmittance matrices. The method combines important image features, with or without meteorological factors, to improve the accuracy of estimates. The DDT matrix fuses three critical components. The transmittance matrix highlights atmospheric interference such as fog or haze. The dark channel matrix identifies low-intensity pixels across colour channels that indicate reduced visibility. The depth matrix captures the spatial layout of the scene, reflecting object distances and how visibility changes with distance. To capture both local details and global context from the images, a hybrid model combining CNNs and transformers is employed. CNNs are proficient at extracting low-level features like edges and textures but have limitations in modelling complex spatial relationships. Transformers complement CNNs by capturing these intricate spatial dependencies. The specific CNN architecture used in this work is shown in Fig. 5 .
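The construction of the DDT matrix described above can be sketched as follows. This is a minimal illustration, assuming the transmittance is estimated from the hazy image normalised by the atmospheric light (the standard dark channel prior form of Eq. (6)), that w = 0.95, and that the three maps are stacked in the order dark channel, depth, transmittance. The helper names are hypothetical, and the depth map is a placeholder for the Monodepth2 output used in the paper.

```python
import numpy as np

def min_filter(channel, patch):
    """Sliding-window minimum over a patch x patch neighbourhood."""
    pad = patch // 2
    padded = np.pad(channel, pad, mode="edge")
    h, w = channel.shape
    out = np.empty_like(channel)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def dark_channel(image, patch=15):
    """Eq. (1): channel-wise minimum followed by a local minimum filter."""
    return min_filter(image.min(axis=2), patch)

def estimate_transmittance(hazy, light, w=0.95, patch=15):
    """Eq. (6): tau = 1 - w * dark_channel(I / L); w = 0.95 is an assumed value."""
    normalised = hazy / np.asarray(light, dtype=float)
    return 1.0 - w * dark_channel(normalised, patch)

def transmittance_from_depth(depth, beta):
    """Eq. (7): tau = exp(-beta * d)."""
    return np.exp(-beta * depth)

def build_ddt(hazy, depth, light, w=0.95, patch=15):
    """Stack dark channel, depth, and transmittance into one H x W x 3 array."""
    dark = dark_channel(hazy, patch)
    tau = estimate_transmittance(hazy, light, w, patch)
    return np.stack([dark, depth, tau], axis=-1)

# Toy input: a uniformly grey hazy frame and a placeholder depth map (metres).
hazy = np.full((6, 6, 3), 0.5)
depth = np.linspace(10.0, 500.0, 36).reshape(6, 6)
ddt = build_ddt(hazy, depth, light=1.0, patch=3)
```

Stacking the maps along the last axis gives the DDT matrix the same layout as an RGB image, so in principle it can be passed to image-oriented feature extractors without changing their input handling.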
By associating haze intensity, depth signals, and scene composition, the ViT models long-range dependencies and global contextual relationships, which makes it well suited to the estimation of visibility and RVR. The proposed ViT architecture is illustrated in Fig. 6 ; image patches are linearly projected and given positional embeddings before being processed by a Transformer encoder. Through the multi-head self-attention mechanism in each encoder layer, the model attends to multiple spatial regions simultaneously and can capture a wide range of visual dependencies across haze gradients and depth layers. Each attention head computes query, key, and value representations to accentuate pertinent spatial features, and the feature representation is then refined by normalization and multi-layer perceptron (MLP) layers. This hybrid design combines ViT's global attention-based modelling with CNN-derived local features to improve the accuracy and robustness of visibility and RVR estimation. Features extracted from the CNN, ViT, and DDT matrix are combined into a single vector and input to regression models for visibility and RVR estimation. Random forest (RF) regressors are used because of their robust estimates and capacity to manage high-dimensional data. With these combined attributes, the models are trained to provide reliable visibility estimates in a variety of environments. The detailed workflow is shown in Fig. 7 .

5 Analysis and Results of the Experiment
The experiments show that adding meteorological data to the CNN and ViT features together with the DDT matrix gives better visibility and RVR estimates than using the DDT matrix alone or its components separately. The experiments provide a thorough performance evaluation with theoretical insights by comparing the proposed method to existing leading algorithms.
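The fusion and evaluation steps can be sketched in NumPy as below. The feature dimensions, the random placeholder features, and the `fuse_features` helper are hypothetical. To keep the sketch dependency-free, an ordinary least-squares fit stands in for the random forest regressor used in the paper, so it illustrates only the concatenation of feature streams and the RMSE, MAE, and R² metrics, not the paper's regressor.

```python
import numpy as np

def fuse_features(cnn_feat, vit_feat, ddt_feat, weather=None):
    """Concatenate per-image feature vectors; weather inputs are optional."""
    parts = [cnn_feat, vit_feat, ddt_feat]
    if weather is not None:
        parts.append(weather)
    return np.concatenate(parts, axis=1)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Placeholder features standing in for CNN, ViT, DDT, and weather streams.
rng = np.random.default_rng(0)
n = 200
cnn = rng.normal(size=(n, 16))
vit = rng.normal(size=(n, 16))
ddt = rng.normal(size=(n, 8))
weather = rng.normal(size=(n, 3))   # e.g. temperature, wind, pressure

fused = fuse_features(cnn, vit, ddt, weather)
fused_no_weather = fuse_features(cnn, vit, ddt)

# Synthetic target so the example is self-contained (not real visibility data).
true_weights = rng.normal(size=fused.shape[1])
visibility = fused @ true_weights + rng.normal(scale=0.1, size=n)

# Stand-in regressor: least squares with an intercept, trained on 150 samples.
design = np.column_stack([fused, np.ones(n)])
coef, *_ = np.linalg.lstsq(design[:150], visibility[:150], rcond=None)
pred = design[150:] @ coef

scores = {
    "rmse": rmse(visibility[150:], pred),
    "mae": mae(visibility[150:], pred),
    "r2": r2(visibility[150:], pred),
}
```

Making the weather block optional in `fuse_features` mirrors the paper's statement that the model can run with or without meteorological inputs; a regressor trained on one variant would need to be retrained for the other because the vector length changes.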
Several ML regression models use the integrated feature set, which includes meteorological parameters, for visibility estimation. Figure 8 shows the general workflow of the proposed hybrid CNN-ViT architecture for visibility and RVR estimation, while Table 2 details the software and hardware setup, including CPU, GPU, RAM, operating system, programming tools, and DL frameworks. The system consists of three complementary components: a CNN branch for local spatial feature extraction, a ViT encoder for global contextual learning, and a 3D feature stream that incorporates the physical factors of transmittance, dark channel, and depth. The CNN stream captures edges, textures, and haze density patterns at a fine level through successive convolution and pooling layers. A global average pooling layer then aggregates these localized characteristics, after which dense and multi-head attention layers are applied to improve feature selectivity. The Transformer encoder uses eight attention heads (key dimension = 64), which lets the model attend to numerous spatial regions at once. Using the query-key-value approach, it learns inter-patch relationships and identifies the most haze-affected regions, which improves its representation of visibility and depth. The network is trained using the Adam optimizer with a learning rate of 1×10⁻⁴, a batch size of 32, and 50 epochs. For the final visibility prediction, the features derived from the CNN and ViT are combined and sent to an ensemble regressor, which consists of RF and XGBoost. This hybrid architecture balances local texture awareness, global scene interpretation, and physical haze factors, which supports robust performance in different lighting and weather conditions.
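The query-key-value computation with eight heads and a key dimension of 64 can be sketched in NumPy as follows. The token count (16 patches), the embedding width (128), and the random projection weights are assumptions for illustration; this shows standard scaled dot-product multi-head self-attention, not the authors' trained encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads=8):
    """Scaled dot-product self-attention over patch tokens.

    tokens: (n_tokens, d_model); wq, wk, wv: (d_model, num_heads * d_k);
    wo: (num_heads * d_k, d_model).
    """
    n_tokens = tokens.shape[0]

    def split_heads(x):
        # (n_tokens, heads * d_k) -> (heads, n_tokens, d_k)
        return x.reshape(n_tokens, num_heads, -1).transpose(1, 0, 2)

    q = split_heads(tokens @ wq)
    k = split_heads(tokens @ wk)
    v = split_heads(tokens @ wv)
    d_k = q.shape[-1]

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # (heads, n, n)
    weights = softmax(scores, axis=-1)                 # attention per head
    heads = weights @ v                                # (heads, n, d_k)
    concat = heads.transpose(1, 0, 2).reshape(n_tokens, -1)
    return concat @ wo, weights

# Assumed sizes: 16 patch tokens, 128-dim embeddings, 8 heads, d_k = 64.
rng = np.random.default_rng(0)
n_tokens, d_model, num_heads, d_k = 16, 128, 8, 64
tokens = rng.normal(size=(n_tokens, d_model))
wq = rng.normal(scale=0.05, size=(d_model, num_heads * d_k))
wk = rng.normal(scale=0.05, size=(d_model, num_heads * d_k))
wv = rng.normal(scale=0.05, size=(d_model, num_heads * d_k))
wo = rng.normal(scale=0.05, size=(num_heads * d_k, d_model))

attended, attn_weights = multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads)
```

Each row of `attn_weights[h]` is a probability distribution over all patches, which is what allows a head to relate a hazy region to distant parts of the scene.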
Learning rate, batch size, number of epochs, optimizer, and loss function are some of the critical training parameters shown in Fig. 8 , which also provides an overview of the network architecture. CNNs are adept at picking up fine-grained details like textures and edges, but they have difficulty processing more abstract visual information or modelling spatial relationships between distant regions. In contrast, transformers [ 38 – 40 ] capture global dependencies across an image using self-attention, making them more effective for large-scale datasets where contextual relationships are important.

Table 2 System Specifications and Development Frameworks

| Category | Item | Details |
| --- | --- | --- |
| System Specifications | CPU | Intel(R) Core(TM) [email protected] GHz, 12th Gen |
| System Specifications | RAM | 16.0 GB (15.7 GB usable) |
| System Specifications | GPU | UHD Graphics 770 (7.8 GB) |
| Development Framework | System Configuration | Windows 10, 64-bit |
| Development Framework | Software | Python 3.11.5; tensorflow 2.15.0; Keras-applications 1.0.8; opencv-python 4.8.1.78 |

The end-to-end paradigm for visibility and RVR estimation integrates CNNs, ViT, and 3D feature streams. CNNs extract detailed features, transformers model long-range dependencies, and the DDT matrix enhances spatial awareness with dark channel, depth, and transmittance data. This hybrid approach improves estimation accuracy by integrating local feature extraction with environmental context. The model functions with or without meteorological data, and accuracy improves when such data are available.

5.1 Effective Uses of the DDT Matrix
The model estimates visibility and RVR by using dark channel, transmittance, and depth matrices, with each component being independently tested to determine its contribution.
A modified CNN and ViT combination is used as a feature extractor, with the DDT matrix central to the estimation process. The original image and the DDT matrix are used in a baseline experiment for comparative analysis. Table 3 shows how well the different inputs perform. The dark channel matrix, which represents the intensity of the fog, achieves R² = 0.96 for visibility and 0.97 for RVR; these values are marginally lower than those for the original image. The transmittance matrix, which represents the concentration of particles in the atmosphere, reaches the same R² values. The depth matrix, which records spatial information, raises accuracy to R² = 0.97 for visibility and 0.98 for RVR. The DDT matrix, a fusion of all three, matches the highest R² (0.97 for visibility and 0.98 for RVR) and gives the lowest RMSE for both targets (117 and 112), with an MAE comparable to that of the depth matrix alone. It is therefore the most reliable input for visibility and RVR estimation, improving precision and operational utility in aviation meteorology.

Table 3 Comparative Analysis of Feature Matrices for Visibility and RVR Estimation

| Metric | Base Image | Dark Channel Feature | Transmittance Component | Depth Information | 3D DDT Composite |
| --- | --- | --- | --- | --- | --- |
| R² (Visibility) | 0.97 | 0.96 | 0.96 | 0.97 | 0.97 |
| RMSE (Visibility) | 126.2 | 150 | 150.3 | 121.5 | 117 |
| MAE (Visibility) | 71.8 | 85.5 | 85.42 | 68.01 | 68.81 |
| R² (RVR) | 0.98 | 0.97 | 0.97 | 0.98 | 0.98 |
| RMSE (RVR) | 114.6 | 139.4 | 139.4 | 113.23 | 112 |
| MAE (RVR) | 62.3 | 76.43 | 76.43 | 57.03 | 57.03 |

5.2 Comparative Analysis
The study evaluated CNN and ViT feature extractors for visibility and RVR estimation and analysed the influence of atmospheric conditions. To obtain reliable and operationally applicable results, the study compared the proposed method with prior visibility estimation techniques [ 15 ], [ 29 ]. Key performance metrics, including RMSE, MAE, and R², were used to assess visibility and RVR accuracy. A full comparative analysis for Dataset III under both day and night situations is provided in Fig.
9 (a), which exhibits visibility estimation performance, and Fig. 9 (b), which emphasizes RVR estimation. The complete approach shows how the proposed strategy works better in real-world applications and gives useful insights into how meteorological data can improve model precision. The results in Figure 9 demonstrate the significant impact of integrating weather parameters and the DDT matrix on visibility and RVR estimation. The most accurate estimates come from the model that contains both the DDT matrix and meteorological parameters in addition to the CNN and ViT extractors, with low MSE (26.71 for visibility and 23.47 for RVR) and high R² values (0.99). The significance of meteorological data is underscored when weather parameters are excluded, which raises the MSE to 13,689.9 for visibility and 9,654.2 for RVR, with reduced R² values. Removing both the weather factors and the DDT matrix degrades performance further: the visibility MSE is 32,108, the RVR MSE is 19,204, and the R² scores are the lowest, at 0.94 and 0.96, respectively. The CNN & ViT model with weather data but without the DDT matrix still performs well, achieving an MSE of 19.45 for visibility and 17.42 for RVR, which indicates that the DDT matrix adds value but that weather parameters play the larger role. When applied to aviation meteorology, the combined use of hybrid DL, DDT feature extraction, and weather data greatly enhances visibility and RVR estimation and surpasses conventional approaches.

6 Discussion
Over the past ten years, visibility estimation techniques have shifted from conventional ML methods to advanced DL frameworks. Initial studies utilized SVM and Multi-Output Support Vector Regression (MSVR) frameworks, wherein handcrafted features were identified and input into conventional ML methods.
[41] illustrated this methodology using a VGG16-MSVR model that attained 85% classification accuracy on structured datasets. [42] combined AlexNet and Deep Convolutional Neural Networks (DCNN) with SVM classifiers, achieving 99.02% accuracy on the FROSI dataset. These methods established fundamental capabilities in automated visibility evaluation; nevertheless, their dependence on static, hand-designed features limited their adaptability to variable meteorological conditions and different scene contexts.

The advent of end-to-end deep learning systems marked a substantial methodological transition towards automatic feature extraction from raw images. CNN methods removed the need for manual feature engineering by learning hierarchical spatial representations through successive convolution and pooling operations. [29] employed AlexNet, ResNet, and DenseNet architectures, achieving a classification accuracy of 99.02%, whereas [27] reported an accuracy of 98.3% with DenseNet variants. Despite these gains in classification performance, pure CNN architectures predominantly capture local spatial patterns and short-range relationships within the receptive field of the convolutional kernels. This architectural property restricts their ability to represent broad atmospheric context, long-distance spatial relationships, and the temporal dynamics that characterize real-world visibility degradation.

Recognition of these constraints prompted the development of hybrid architectures that combine spatial and temporal modelling. [43] introduced a CNN-LSTM system that integrates convolutional feature extraction with Long Short-Term Memory networks to capture both spatial degradation patterns and temporal atmospheric shifts. This design attained a Mean Squared Error (MSE) of 19.45 in regression-based visibility estimation tasks, a significant advance over prior methods.
The CNN component extracts local degradation features, including texture loss, contrast reduction, and scattering effects, while the LSTM network models sequential dependencies across consecutive frames, allowing the system to track atmospheric changes such as fog intensification or the onset of precipitation. Comparable hybrid methods include [44], which attained an accuracy of 90.4% with an MSE of 9.6 using a CNN-RNN architecture, and [45], which achieved Root Mean Squared Error (RMSE) values ranging from 6.71 to 8.63 by integrating satellite imagery, Numerical Weather Prediction (NWP) data, and surface observations. [46] created DMRVisNet, which integrated explicit physical degradation models grounded in atmospheric scattering theory; however, this method showed limited efficacy for long-range estimates, with an RMSE of 93.11.

The incorporation of weather variables is a crucial factor in the precision of visibility estimation and has been inadequately explored in prior research. [18] performed ablation studies revealing that their CNN-LSTM framework attained an MSE of 26.71 with solely image-based features, which escalated to MSE values of 13,689.9 and 32,108 upon the complete exclusion of temperature, humidity, and precipitation data from the model input. These results quantify the critical role of atmospheric conditions in improving contextual understanding and numerical precision in visibility evaluation. Meteorological factors offer direct insight into the physical processes that reduce visibility, such as condensation in fog, particle concentration in haze, and scattering during precipitation.

Recent studies have examined transformer-based designs that use self-attention to capture global spatial relationships, free from the locality limits of convolutional operations.
[18] created STCN-Net, which integrates a Swin Transformer with ResNet18 and attains 97.9% accuracy in visibility classification tasks. [47] performed a comparative analysis of DenseNet121 and Vision Transformer (ViT) architectures, achieving an accuracy of 96.69% with the transformer-based method. These experiments illustrate the efficacy of attention mechanisms in modelling long-range atmospheric relationships; nevertheless, pure transformer architectures may compromise the local feature sensitivity that CNNs offer for identifying fine-grained degradation patterns at the pixel level.

The proposed framework addresses these constraints by integrating multimodal features: local spatial characteristics, global contextual representations, explicit physical modelling, and environmental parameters. The architecture uses CNNs to extract local degradation patterns, vision transformers to capture global atmospheric context via self-attention, a three-dimensional DDT matrix that encodes depth, dark channel, and transmittance information grounded in atmospheric physics, and meteorological variables such as temperature, wind speed, and atmospheric pressure. A Random Forest regressor performs the final fusion and estimation by merging these information sources. The quantitative assessment yields a Root Mean Squared Error (RMSE) of 117 and a Mean Absolute Error (MAE) of 68.81 across multi-condition datasets that include daytime, night-time, and diverse weather conditions. The baseline evaluation with a single physical feature (the dark channel or transmittance matrix alone, Table 3) yields an RMSE of roughly 150, so the full configuration corresponds to a 22% decrease in estimation error. Cross-dataset examination demonstrates uniform performance across several environmental contexts, encompassing urban settings rich in structural references and rural landscapes with little visual anchoring.
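As an illustration of how a three-channel DDT matrix of this kind could be assembled, the following minimal Python sketch may help. It assumes the dark channel prior of He et al. [35] and the single-scattering relation t = exp(−β·d); the function names, the patch radius, and the constants `airlight`, `omega`, `t_min`, and `beta` are hypothetical choices made for illustration. In particular, depth is derived here from transmittance, which may differ from how this study obtains its depth matrix.

```python
import math


def dark_channel(image, patch=1):
    """Dark channel of an H x W image given as nested lists of (r, g, b) in [0, 1].

    For each pixel, take the minimum over colour channels and over a square
    neighbourhood of radius `patch` (an assumed, illustrative window size).
    """
    h, w = len(image), len(image[0])
    min_rgb = [[min(px) for px in row] for row in image]
    dark = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - patch), min(h, y + patch + 1))
            xs = range(max(0, x - patch), min(w, x + patch + 1))
            row.append(min(min_rgb[j][i] for j in ys for i in xs))
        dark.append(row)
    return dark


def transmittance(dark, airlight=1.0, omega=0.95, t_min=0.1):
    """Transmission estimate t = 1 - omega * dark / airlight, clamped below at t_min.

    `airlight`, `omega`, and `t_min` are assumed constants, not values from the study.
    """
    return [[max(t_min, 1.0 - omega * d / airlight) for d in row] for row in dark]


def relative_depth(trans, beta=1.0):
    """Relative depth from t = exp(-beta * d), i.e. d = -ln(t) / beta (beta is assumed)."""
    return [[-math.log(t) / beta for t in row] for row in trans]


def ddt_matrix(image, patch=1, beta=1.0):
    """Stack (dark channel, depth, transmittance) per pixel into an H x W x 3 structure."""
    dark = dark_channel(image, patch)
    trans = transmittance(dark)
    depth = relative_depth(trans, beta)
    h, w = len(image), len(image[0])
    return [[(dark[y][x], depth[y][x], trans[y][x]) for x in range(w)] for y in range(h)]


# Tiny hypothetical 2 x 2 image: hazier pixels have brighter minimum channels.
img = [
    [(0.2, 0.5, 0.9), (0.8, 0.8, 0.8)],
    [(0.6, 0.7, 0.9), (1.0, 1.0, 1.0)],
]
ddt = ddt_matrix(img, patch=0)
print(ddt[0][0])  # (dark, depth, transmittance) for the clearest pixel
```

In this sketch a brighter dark channel implies lower transmittance and larger relative depth, which is the qualitative behaviour the three DDT components are meant to encode.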
The framework's generalization across urban and rural settings mitigates the shortcomings of previous approaches, whose performance declined under certain environmental circumstances or visibility ranges. The framework's capacity to distinguish between degradation mechanisms (fog resulting from condensation, haze due to particulate suspension, and visibility impairment during precipitation) arises from combining learned features that capture visual appearance with meteorological variables that represent the underlying physical processes.

This study used a regression-based methodology instead of discrete classification. Previous studies predominantly employed classification frameworks that assign visibility observations to predefined categorical ranges; [15, 27, 32] all reported accuracy measures derived from categorical visibility bins. The regression method enables continuous visibility and runway visual range assessment, delivering numerical values instead of categorical labels. This capability is particularly relevant for operational applications in aviation, maritime navigation, and transportation management, where safety regulations and decision-making protocols require precise continuous measurements.

Direct numerical comparisons among studies must account for methodological differences in evaluation procedures, dataset attributes, and performance indicators. Studies reporting classification accuracy used distinct visibility categories and measured correct categorization rates, while regression-based methods applied error metrics such as MSE, RMSE, and MAE to quantify continuous prediction accuracy. Variations in geographic location, temporal coverage, weather diversity, and image capture conditions across datasets further confound cross-study comparisons.
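Because the studies above report different error metrics, it can help to see how they relate. The following is a minimal Python sketch using the standard definitions of MSE, RMSE, MAE, and R², plus the relative error reduction behind the 22% figure; the sample values in `y_true` and `y_pred` are hypothetical and are not data from this study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline, improved):
    """Fractional decrease of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical visibility values, for illustration only.
y_true = [200.0, 500.0, 800.0, 1000.0]
y_pred = [250.0, 450.0, 800.0, 900.0]

print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE :", mae(y_true, y_pred))
print("R2  :", r2(y_true, y_pred))

# Figures quoted in the text: RMSE of about 150 (single physical feature) vs 117.
print("Relative RMSE reduction:", relative_reduction(150.0, 117.0))

# MSE and RMSE are related by a square root, so reported values can be cross-checked.
print("sqrt(13689.9) =", math.sqrt(13689.9))
```

Since RMSE is the square root of MSE, an MSE of 13,689.9 corresponds to an RMSE of about 117, so values quoted as MSE in one place and RMSE in another should be converted to a common scale before they are compared.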
The reduction from an RMSE of about 150 with single physical features to an RMSE of 117 with full multimodal integration is a measurable improvement. However, absolute performance metrics depend on the dataset's visibility distribution and measurement range.

7 Conclusion

This study presents a hybrid framework that integrates engineered and automatically learned features for the estimation of visibility and RVR. The model combines ViT and CNNs with a 3D multi-channel feature matrix, the DDT matrix, which comprises depth, dark channel, and transmittance components. This design captures local degradation patterns and global contextual information simultaneously across varied weather conditions. The system integrates meteorological data (temperature, wind, atmospheric pressure, etc.) to distinguish between visibility degradation mechanisms, such as the onset of fog and precipitation-induced scattering. The quantitative assessment indicates that the hybrid CNN-ViT-DDT model attains an RMSE of 117 and an MAE of 68.81, a 22% improvement over standalone physical feature methods (dark channel and transmittance alone: RMSE about 150). A Random Forest (RF) regressor serves as the integration layer, combining multi-stream features from the CNN, ViT, DDT components, and meteorological data. A comparative investigation on three real-world datasets (daytime, night-time, and mixed settings) shows consistent performance gains over conventional ML and single-stream DL benchmarks. The framework maintains estimation accuracy across diverse environmental contexts, from settings with minimal visual indicators to urban settings marked by abundant structural elements. The experimental findings demonstrate that integrating multimodal features significantly enhances visibility and RVR estimation accuracy under adverse conditions such as dense fog, heavy precipitation, and low-light environments.
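The integration step and the ablation runs reported earlier can be pictured with a small Python sketch. It assumes that each stream (CNN, ViT, DDT, meteorological) has already been reduced to a flat feature vector and that fusion is a plain concatenation before the regressor; the stream names, vector lengths, and values are hypothetical, and the study's actual fusion details may differ.

```python
def fuse_features(streams, exclude=()):
    """Concatenate named feature streams into one vector.

    streams: dict mapping a stream name to a list of floats (insertion order is kept).
    exclude: stream names to drop, mimicking an ablation run.
    Returns (fused_vector, slices), where slices maps each kept stream
    to its (start, stop) index range in the fused vector.
    """
    fused, slices = [], {}
    for name, values in streams.items():
        if name in exclude:
            continue
        start = len(fused)
        fused.extend(values)
        slices[name] = (start, len(fused))
    return fused, slices


# Hypothetical per-image features; real vectors would be far longer.
sample = {
    "cnn": [0.12, 0.80, 0.33],   # local degradation features
    "vit": [0.45, 0.07],         # global context features
    "ddt": [0.20, 0.21, 0.81],   # pooled dark channel, depth, transmittance
    "met": [25.0, 3.0, 1010.0],  # temperature, wind speed, pressure (assumed order)
}

full, full_slices = fuse_features(sample)
no_met, _ = fuse_features(sample, exclude=("met",))
no_met_ddt, _ = fuse_features(sample, exclude=("met", "ddt"))

print(len(full), len(no_met), len(no_met_ddt))
print(full_slices)
```

Each fused vector would then be passed to a Random Forest regressor trained against instrument-measured visibility or RVR. Keeping the index ranges per stream makes it straightforward to rerun the same pipeline with a stream removed, as in the ablation comparisons.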
The findings indicate practical relevance for aviation safety systems, transportation management infrastructure, and atmospheric monitoring networks, where accurate visibility assessment is essential for operational decision-making.

Declarations

Conflict of Interest
The authors affirm that there are no conflicts of interest related to the publication of this article.

Competing Interests
The authors confirm that they have no competing interests associated with this study.

Compliance with Ethical Standards
The authors declare that the research was conducted without a commercial or financial relationship that could be interpreted as a potential conflict of interest.

Funding
The authors assert that they did not obtain any financial support, grants, or assistance during the development of the work.

Author Contribution
Conceptualization, A.S.; Data Curation, A.S.; Formal Analysis, A.S.; Methodology, A.S.; Software, A.S.; Validation, A.S.; Visualization, A.S.; Writing – original draft, A.S.; Writing – revised draft, A.S., B.C.S.; Supervision, B.C.S. All authors have read and agreed to the published version of the manuscript.

Acknowledgement
The authors express gratitude to the committed officials of the India Meteorological Department for their proactive assistance in instrument deployment and in maintaining real-time data acquisition systems. Special thanks to Mantosh Kumar, Deepak Kumar Singh, and their team for maintaining the data acquisition system. A.S. also acknowledges the Director General of Meteorology, M. Mohapatra, for his continued motivation and support.

References

1. Kim, K.W.: The comparison of visibility measurement between image-based visual range, human eye-based visual range, and meteorological optical range. Atmos. Environ. 190, 74–86 (2018). https://doi.org/10.1016/j.atmosenv.2018.07.020
2. Shankar, A., Sahana, B.C.: Early warning of low visibility using the ensembling of machine learning approaches for aviation services at Jay Prakash Narayan International (JPNI) Airport Patna. SN Appl. Sci. 5, 132 (2023). https://doi.org/10.1007/s42452-023-05350-7
3. International Civil Aviation Organization (ICAO): Manual of Runway Visual Range Observing and Reporting Practices. 105: (2005)
4. Shankar, A., Sahana, B.C.: Efficient prediction of runway visual range by using a hybrid CNN-LSTM network architecture for aviation services. Theor. Appl. Climatol. 155, 2215–2232 (2024). https://doi.org/10.1007/s00704-023-04751-3
5. Shankar, A.: The Impacts of Low Visibility on the Aviation Services of Patna Airport During the Period from 2016 to 2023. J. Airl. Oper. Aviat. Manag. 3, 46–57 (2024). https://doi.org/10.56801/jaoam.v3i1.5 3
6. Shankar, A., Sahana, B.C., Singh, S.P.: Prediction of Low-Visibility Events by Integrating the Potential of Persistence and Machine Learning for Aviation Services. Mausam. 75, 977–992 (2024). https://doi.org/10.54302/mausam.v75i4.6624
7. Shankar, A., Kumar, A., Sinha, V.: Machine Learning Approach in the Prediction of Fog: An Early Warning System. Mausam. 75, 1039–1050 (2024). https://doi.org/10.54302/mausam.v75i4.5919
8. Zhai, B., Wang, Y., Wu, B.: An ensemble learning method for low visibility prediction on freeway using meteorological data. IET Intell. Transp. Syst. 17, 2237–2250 (2023). https://doi.org/10.1049/itr2.12404
9. Xu, Q., Su, W., Qi, Y., Tao, W., Pollefeys, M.: Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks. Int. J. Comput. Vis. 130, 2040–2059 (2022). https://doi.org/10.1007/s11263-022-01628-2
10. Lee, J.Y., DeGol, J., Zou, C., Hoiem, D.: PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility. Proc. IEEE Int. Conf. Comput. Vis. 6138–6147 (2021). https://doi.org/10.1109/ICCV48922.2021.00610
11. Palvanov, A., Im Cho, Y.: DHCNN for visibility estimation in foggy weather conditions. Proc. – 2018 Jt. 10th Int. Conf. Soft Comput. Intell. Syst. 19th Int. Symp. Adv. Intell. Syst. SCIS-ISIS 2018. 240–243 (2018). https://doi.org/10.1109/SCIS-ISIS.2018.00050
12. Chen, J., Dowman, I., Li, S., Li, Z., Madden, M., Mills, J., Paparoditis, N., Rottensteiner, F., Sester, M., Toth, C., Trinder, J., Heipke, C.: Information from imagery: ISPRS scientific vision and research agenda. ISPRS J. Photogramm. Remote Sens. 115, 3–21 (2016). https://doi.org/10.1016/j.isprsjprs.2015.09.008
13. Papari, G., Petkov, N.: Edge and line oriented contour detection: State of the art. Image Vis. Comput. 29, 79–103 (2011). https://doi.org/10.1016/j.imavis.2010.08.009
14. Zhang, Y., Wu, Y., Chen, H.: Research progress of visual simultaneous localization and mapping based on deep learning. Yi Qi Yi Biao Xue Bao/Chinese J. Sci. Instrum. 44, 214–241 (2023). https://doi.org/10.19650/j.cnki.cjsi.J2311081
15. Giyenko, A., Palvanov, A., Cho, Y.: Application of convolutional neural networks for visibility estimation of CCTV images. Int. Conf. Inf. Netw. 2018-Janua. 875–879 (2018). https://doi.org/10.1109/ICOIN.2018.8343247
16. Hemalatha, J., Roseline, S.A., Geetha, S., Kadry, S., Damaševičius, R.: An efficient densenet-based deep learning model for Malware detection. Entropy. 23, 1–23 (2021). https://doi.org/10.3390/e23030344
17. Chaabani, H., Kamoun, F., Bargaoui, H., Outay, F., Yasar, A.U.H.: A Neural network approach to visibility range estimation under foggy weather conditions. Procedia Comput. Sci. 113, 466–471 (2017). https://doi.org/10.1016/j.procs.2017.08.304
18. Liu, J., Chang, X., Li, Y., Ji, Y., Fu, J., Zhong, J.: STCN-Net: A Novel Multi-Feature Stream Fusion Visibility Estimation Approach. IEEE Access. 10, 120329–120342 (2022). https://doi.org/10.1109/ACCESS.2022.3218456
19. Bae, T.W., Han, J.H., Kim, K.J., Kim, Y.T.: Coastal visibility distance estimation using dark channel prior and distance map under sea-fog: Korean Peninsula case. Sens. (Switzerland). 19 (2019). https://doi.org/10.3390/s19204432
20. He, Y., Ding, J., Teng, H., Han, X., Chen, Y., Zhou, W.: Visibility detection and prediction of foggy highway based on lane line detection and Winters additive model. In: 2021 40th Chinese Control Conference (CCC). pp. 7254–7259 (2021)
21. Graves, N., Newsam, S.: Using visibility cameras to estimate atmospheric light extinction. IEEE Work. Appl. Comput. Vision, WACV 2011. 577–584 (2011). https://doi.org/10.1109/WACV.2011.5711556
22. Zou, J.: Visibility detection method based on camera model calibration. Proc. – 2017 4th Int. Conf. Inf. Sci. Control Eng. ICISCE 770–776 (2017). https://doi.org/10.1109/ICISCE.2017.165
23. Ortega, L.C., Otero, L.D., Solomon, M., Otero, C.E., Fabregas, A.: Deep learning models for visibility forecasting using climatological data. Int. J. Forecast. 39, 992–1004 (2023). https://doi.org/10.1016/j.ijforecast.2022.03.009
24. Xiyu, M., Qi, X., Qiang, Z., Junchi, R., Hongbin, W., Linyi, Z.: An Improved Diracnet Convolutional Neural Network for Haze Visibility Detection. In: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). pp. 1–5 (2021)
25. Wang, J., Zhang, L.: Research on Deep Learning Model of Fog Visibility Estimation Based on CNN. In: 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). pp. 1355–1359 (2021)
26. Chincholkar, S., Rajapandy, M.: Fog Image Classification and Visibility Detection Using CNN. In: Intelligent Computing, Information and Control Systems (2020)
27. Wang, H., Shen, K., Yu, P., Shi, Q., Ko, H.: Multimodal Deep Fusion Network for Visibility Assessment with a Small Training Dataset. IEEE Access. 8, 217057–217067 (2020). https://doi.org/10.1109/ACCESS.2020.3031283
28. Zhang, J., Min, X., Zhu, Y., Zhai, G., Zhou, J., Yang, X., Zhang, W.: HazDesNet: An End-to-End Network for Haze Density Prediction. IEEE Trans. Intell. Transp. Syst. 23, 3087–3102 (2022). https://doi.org/10.1109/TITS.2020.3030673
29. Outay, F., Taha, B., Chaabani, H., Kamoun, F., Werghi, N., Yasar, A.U.H.: Estimating ambient visibility in the presence of fog: a deep convolutional neural network approach. Pers. Ubiquitous Comput. 25, 51–62 (2021). https://doi.org/10.1007/s00779-019-01334-w
30. Qin, H., Qin, H.: An End-to-End Traffic Visibility Regression Algorithm. IEEE Access. 10, 25448–25454 (2022). https://doi.org/10.1109/ACCESS.2021.3101323
31. Wang, Y., Du, J., Yan, Z., Song, Y., Hua, D.: Atmospheric visibility prediction by using the DBN deep learning model and principal component analysis. Appl. Opt. 61, 2657–2666 (2022). https://doi.org/10.1364/AO.449148
32. Palvanov, A., Cho, Y.I.: Visnet: Deep convolutional neural networks for forecasting atmospheric visibility. Sens. (Switzerland). 19 (2019). https://doi.org/10.3390/s19061343
33. Choi, W., Park, J., Kim, D., Park, J., Kim, S., Lee, H.: Development of Two-Dimensional Visibility Estimation Model Using Machine Learning: Preliminary Results for South Korea. Atmos. (Basel). 13 (2022). https://doi.org/10.3390/atmos13081233
34. Amiri, M., Soleimani, S.: A Hybrid Atmospheric Satellite Image-Processing Method for Dust and Horizontal Visibility Detection through Feature Extraction and Machine Learning Techniques. J. Indian Soc. Remote Sens. 50, 523–532 (2022). https://doi.org/10.1007/s12524-021-01460-0
35. He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2341–2353 (2011). https://doi.org/10.1109/TPAMI.2010.168
36. Nayar, S.K., Narasimhan, S.G.: Vision in bad weather. Proc. IEEE Int. Conf. Comput. Vis. 2, 820–827 (1999). https://doi.org/10.1109/iccv.1999.790306
37. Negru, M., Nedevschi, S.: Image based fog detection and visibility estimation for driving assistance systems. Proc. – 2013 IEEE 9th Int. Conf. Intell. Comput. Commun. Process. ICCP 163–168 (2013). https://doi.org/10.1109/ICCP.2013.6646102
38. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention Is All You Need. Adv. Neural Inf. Process. Syst. 30 (2017). https://doi.org/10.48550/arXiv.1706.03762
39. Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in Vision: A Survey. ACM Comput. Surv. 54 (2022). https://doi.org/10.1145/3505244
40. Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y., Yang, Z., Zhang, Y., Tao, D.: A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 45, 87–110 (2023). https://doi.org/10.1109/TPAMI.2022.3152247
41. Lo, W.L., Zhu, M., Fu, H.: Meteorology visibility estimation by using multi-support vector regression method. J. Adv. Inf. Technol. 11, 40–47 (2020). https://doi.org/10.12720/jait.11.2.40-47
42. Xun, L., Zhang, H., Yan, Q., Wu, Q., Zhang, J.: VISOR-NET: Visibility Estimation Based on Deep Ordinal Relative Learning under Discrete-Level Labels. Sensors. 22, 1–20 (2022). https://doi.org/10.3390/s22166227
43. Shankar, A., Sahana, B.C.: System to Estimate Visibility and Runway Visual Range (RVR) from Image Data (2024)
44. You, Y., Lu, C., Wang, W., Tang, C.K.: Relative CNN-RNN: Learning relative atmospheric visibility from images. IEEE Trans. Image Process. 28, 45–55 (2019). https://doi.org/10.1109/TIP.2018.2857219
45. Wang, J., Zhang, L.: Research on Deep Learning Model of Fog Visibility Estimation Based on CNN. IMCEC 2021 - IEEE 4th Adv. Inf. Manag. Commun. Electron. Autom. Control Conf. 1355–1359 (2021). https://doi.org/10.1109/IMCEC51613.2021.9482258
46. You, J., Jia, S., Pei, X., Yao, D.: DMRVisNet: Deep Multihead Regression Network for Pixel-Wise Visibility Estimation under Foggy Weather. IEEE Trans. Intell. Transp. Syst. 23, 22354–22366 (2022). https://doi.org/10.1109/TITS.2022.3180229
47. Bouhsine, T., Idbraim, S., Bouaynaya, N.C., Alfergani, H., Ouadil, K.A., Johnson, C.C.: Atmospheric Visibility Image-Based System for Instrument Meteorological Conditions Estimation: A Deep Learning Approach. Proc. – 2022 9th Int. Conf. Wirel. Networks Mob. Commun. WINCOM 2022. 1–6 (2022). https://doi.org/10.1109/WINCOM55661.2022.9966454

Status: Under Review. Reviewers invited by journal: 27 Jan, 2026. Editor assigned by journal: 24 Jan, 2026. Submission checks completed at journal: 24 Jan, 2026. First submitted to journal: 23 Jan, 2026.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8678337","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":581508635,"identity":"b2fd046e-a744-415c-92a9-d6a5035c6859","order_by":0,"name":"Anand Shankar","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/ElEQVRIiWNgGAWjYBACAzBpw5DAwMAGZFQAMTNzAyEtjA0MaTAtZ0BaGEnRwtgGEiOgxZy9/fmDDwl1efyz2xI/V86rjeZvB2r5UbENpxbLnjOGjTMSDhdL3Dl2WPLstuO5Mw4zNjD2nLmN22E3chibeX8cSGy4kd4g2bjtWG4DUAszYxseLfefP2zmSahLnH8jvfln45xjufMJarnBYAjUwpy44UbaMcnGhprcDYS0WPbkGM4E+iVx4420NMuGYwdyNwK1HMTnF3P24w8+AEMscd6NNOObDTV1ufPOHz744EcFbi3o4DCYPEC0eiCoI0XxKBgFo2AUjBAAABE0ZZaEby+gAAAAAElFTkSuQmCC","orcid":"","institution":"Ministry of Earth Sciences","correspondingAuthor":true,"prefix":"","firstName":"Anand","middleName":"","lastName":"Shankar","suffix":""},{"id":581508636,"identity":"cb7d2b97-62d4-494d-bf9a-d4c0ef7a9294","order_by":1,"name":"Bikash Chandra Sahana","email":"","orcid":"","institution":"National Institute of Technology Patna","correspondingAuthor":false,"prefix":"","firstName":"Bikash","middleName":"Chandra","lastName":"Sahana","suffix":""}],"badges":[],"createdAt":"2026-01-23 10:53:25","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8678337/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8678337/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":101988591,"identity":"340761c7-716d-4bf9-9b8f-d228ef3a9565","added_by":"auto","created_at":"2026-02-05 
18:55:23","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":45992,"visible":true,"origin":"","legend":"\u003cp\u003eOverall architecture of the proposed models\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/b2bb5507644ea93bdb8ca6aa.png"},{"id":101988589,"identity":"ee38fe62-2b6c-4f09-99b1-56de58a2dbfa","added_by":"auto","created_at":"2026-02-05 18:55:23","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":815149,"visible":true,"origin":"","legend":"\u003cp\u003eDetails of the Meteorological Sensor and Collocated IP Camera Installed at the Meteorological Park of Runway 25 at Patna Airport.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/159f058414daa6ec75d118e6.png"},{"id":101988590,"identity":"f2a406af-be37-430c-9329-07b4a8db499f","added_by":"auto","created_at":"2026-02-05 18:55:23","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":485742,"visible":true,"origin":"","legend":"\u003cp\u003e(a–c) Examples of photographs from Datasets I–III taken in different lighting situations. 
(d) A histogram displaying the distribution of visibility ranges (from less than 200 m to more than 1000 m), which shows how balanced and diverse the dataset is.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/40e441f5c694abc901703c0e.jpeg"},{"id":101988593,"identity":"245ceaa6-3177-466d-851c-310ce0bdddd4","added_by":"auto","created_at":"2026-02-05 18:55:23","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1210553,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization of the same scene under varying visibility levels, showing (top to bottom): base image, dark channel feature, transmittance component, and depth information.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/d8aeb500fb7906e7bc434c17.png"},{"id":101988588,"identity":"a1e7ec18-8db9-4023-beff-7d636c5f20f2","added_by":"auto","created_at":"2026-02-05 18:55:23","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":79081,"visible":true,"origin":"","legend":"\u003cp\u003eStructure of the proposed CNN .\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/5e7315c4bf1c3e57558e1972.png"},{"id":101988596,"identity":"f7bb9332-69d0-4147-9da9-ec326958891d","added_by":"auto","created_at":"2026-02-05 18:55:23","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":428010,"visible":true,"origin":"","legend":"\u003cp\u003eStructure of the proposed ViT used in this work\u003c/p\u003e","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/43eaccad78ca64bd4242c806.jpeg"},{"id":102295088,"identity":"e19b504d-aede-4494-a554-f431c7c7f130","added_by":"auto","created_at":"2026-02-10 
10:08:36","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":102496,"visible":true,"origin":"","legend":"\u003cp\u003eThe full workflow for estimating visibility and RVR has several important steps for processing and combining data from different feature sources.\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/36ddba589f3c41157df47b7e.png"},{"id":102295417,"identity":"dfd7a12d-e808-4850-b7cf-e8e35f364c2a","added_by":"auto","created_at":"2026-02-10 10:11:07","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":84174,"visible":true,"origin":"","legend":"\u003cp\u003eOutlines layer-specific training parameters, including learning rate, batch size, epochs, optimizer, and loss function.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/436fc35e1d932b84b7fcfe53.png"},{"id":101988594,"identity":"b2f53033-486f-41c2-9b9a-19083fd9f585","added_by":"auto","created_at":"2026-02-05 18:55:23","extension":"jpeg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":612823,"visible":true,"origin":"","legend":"\u003cp\u003eComparative analysis of various methods for (a) visibility estimation and (b) RVR estimation.\u003c/p\u003e","description":"","filename":"floatimage9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/721afde710e5baf2d62f2510.jpeg"},{"id":102962089,"identity":"978e8d8f-52f3-47ef-bfc6-1ef0414c8e81","added_by":"auto","created_at":"2026-02-19 
04:01:03","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4264263,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8678337/v1/2f1efe02-49a3-487b-b0ea-72e8e925fab0.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"A Deep Hybrid CNN–ViT Architecture Incorporating Advanced 3D Features for the Estimation of Visibility and Runway Visual Range","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eVisibility is the maximum distance at which the human eye can distinguish an object [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e],[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. The atmospheric extinction coefficient, a key indicator of air clarity, directly impacts visibility. RVR is the distance a pilot on the runway centreline can see the runway surface markings or lights identifying the runway centreline [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. RVR is crucial for aviation operations in low-visibility conditions like fog, rain, etc. Accurate estimation of visibility and RVR is critical for guaranteeing safe and efficient aviation services, including take off, landing, and ground movement. In addition, reliable visibility estimation is essential for marine navigation, where it aids in collision avoidance and route planning, and for road traffic, where it enhances driving safety and helps in traffic management under adverse weather conditions.\u003c/p\u003e \u003cp\u003eStatistics indicate that the likelihood of traffic accidents is significantly higher on foggy days compared to clear ones [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
Accurate estimation of visibility and RVR is challenging in operational weather services but has become crucial for aviation services and overall transportation safety [\u003cspan additionalcitationids=\"CR7\" citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Initially, researchers typically used both instrument-based and manual observational methods to estimate visibility. Visibility-estimating equipment, although expensive, utilizes optical components for detection, whereas visual methods are still susceptible to human error caused by subjective influences. Recognizing the potential of image processing and computer vision, researchers began using these techniques to estimate visibility.\u003c/p\u003e \u003cp\u003eImage-based visibility estimation approaches have evolved from physical models [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u0026mdash;which rely on the atmospheric scattering model and require scene-specific parameters such as depth and distance\u0026mdash;to advanced deep learning (DL) methods [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] that learn direct mappings from images to visibility or RVR values. However, current CNN-based visibility models fail to extract visibility-related discriminative features explicitly, limiting their estimating capability. In contrast, other domains have successfully combined CNNs, Vision Transformers (ViT), and multi-stream feature fusion to capture spatial and contextual relationships more effectively [\u003cspan additionalcitationids=\"CR13\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
Motivated by these advances, this study introduces an end-to-end framework for visibility and RVR estimation that fuses engineered meteorological inputs, deep features from CNNs and ViT, and atmospheric model principles to improve estimation accuracy and reliability. The proposed CNN-Transformer hybrid is intended as an effective and interpretable design choice for robust, data-driven estimation of visibility and RVR.\u003c/p\u003e \u003cp\u003eThe combination of CNN and Transformer, as depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, suits the proposed architecture because the two have complementary strengths in feature representation. CNNs excel at detecting minute spatial details such as textures, edges, and variations in haze intensity, all of which contribute to the identification of degradation patterns in visibility. Transformers, on the other hand, are adept at learning global contextual dependencies through self-attention mechanisms. This lets the model capture interactions across the complete scene as well as depth-dependent scattering effects, which are critical for estimating visibility in varied situations. Combining CNNs and transformers therefore lets the network use fine-grained local features and global contextual reasoning at the same time, which makes it more robust and better able to generalize.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe framework centres on a DDT matrix. Its component matrices, derived from scattering models, dark channel priors, and depth data, offer a holistic measure of visibility and RVR. For accurate visibility and RVR estimation, the DDT matrix includes critical engineered properties. A machine learning (ML) block is used to fuse these parameters effectively. 
This involves using a fully connected (FC) layer and feature combination before generating the final visibility and RVR estimate. Three unique visibility datasets were created to thoroughly assess the suggested technique. The datasets comprise real-time photographs obtained at Patna Airport under varying illumination conditions: daytime (Dataset I), night-time (Dataset II), and a combination of both (Dataset III). Each dataset is labelled with corresponding standard visibility and RVR values obtained from instrument-based measurements using a forward scatterometer (FSM).\u003c/p\u003e \u003cp\u003eThe test and validation demonstrate that our strategy surpasses other well-known DL approaches [\u003cspan additionalcitationids=\"CR16 CR17\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. This research leverages advancements in CNNs, transformers, and 3D feature extraction techniques, along with the widespread use of surveillance systems (e.g., CCTV cameras). 
This demonstrates that image-based visibility estimation algorithms are both practical and applicable to real-world scenarios, including land-based, maritime, and critical aviation services.\u003c/p\u003e \u003cp\u003eThis research presents multiple significant contributions:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eIt generates distinct real-time image datasets under different illumination conditions to evaluate visibility estimation techniques for aviation services.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThe proposed method utilizes real-time CCTV and FSM data from Patna Airport, markedly enhancing the precision of visibility and RVR estimation compared to conventional techniques.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eThe model is entirely automated and hardware-agnostic and eliminates the need for manual feature extraction, guaranteeing consistent performance.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eA novel 3D multi-feature stream\u0026mdash;comprising transmittance, dark channel, and depth matrices\u0026mdash;is introduced.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe study unfolds as follows: Section 1 introduces the study, Section 2 surveys related work, Section 3 outlines the materials, Section 4 explains the methods, Section 5 presents the results, Section 6 interprets the findings, and Section 7 concludes.\u003c/p\u003e"},{"header":"2 Related Works","content":"\u003cp\u003eThe first part of this section provides a concise overview of image-based visibility estimation methods based on physical models, while the second part discusses DL approaches.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Visibility via Physics-based Models\u003c/h2\u003e \u003cp\u003eVarious image-based visibility estimation methods have been proposed. 
The Dark Channel Prior (DCP) theory [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] utilizes haze effects and dark channel features to assess visibility, employing techniques such as identifying vanishing points and averaging pixel distances. Guided filtering [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] further improves DCP accuracy by refining transmittance maps, reducing noise while preserving edge information. Local contrast-based approaches [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] are another option; however, these tend to be less accurate in situations with complicated lighting or rich texture. Camera calibration methods [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] that use nonlinear least squares fitting to correlate picture features with real-world distances provide accurate estimates; nevertheless, these methods necessitate very specific image conditions and complex parameter tuning.\u003c/p\u003e \u003cp\u003eIn visibility estimation, the transmittance, dark channel, and depth matrices are often considered together because they capture complementary aspects of atmospheric scattering and scene depth. The transmittance map shows how much light reaches the camera without being scattered, which is directly related to how well the scene can be seen. The dark channel finds areas that are affected by haze by using the DCP assumption that at least one colour channel in a local patch has low intensity when no haze is present. This indicator provides a qualitative clue about how much haze is present. The depth matrix further improves this representation by preserving spatial coherence and texture consistency, which reduces artefacts caused by changes in light or surface. 
By combining these three physically interpretable and visually robust estimates of visibility, the overall estimate becomes more accurate and reliable across a wider range of lighting and atmospheric conditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Visibility via DL Models\u003c/h2\u003e \u003cp\u003eStudies [\u003cspan additionalcitationids=\"CR24\" citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] have applied CNNs for single-image visibility estimation, but missing domain-specific features have hampered their accuracy. Integrating engineered features like brightness improved results [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], though such features lacked specificity to visibility. Despite using structural similarity (SSIM) across pairs of hazy and clear images, HazDesNet [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] ran into complications with scalability because of data limitations. Despite limitations in the dataset and the model's architecture, a CNN model trained on Korean CCTV data attained 84% accuracy [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. A hybrid DCNN-SVM model [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] achieved only limited success. TVRNet [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] provided a highly effective trainable end-to-end CNN model for estimating fog density, while a PCA-DBN model [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] achieved 79% accuracy in predicting visibility trends. 
Webcam-based estimation was made possible by VisNet [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e], which connects three CNN streams; however, errors were introduced by manual subregion selection [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. While DL approaches show more accuracy and resilience, classic methods like vanishing points and DCP have their uses, as shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eHence, this study focuses on DL-based visibility estimation.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eOverview and comparison of visibility estimation strategies.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStrategy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTechnique of Analysis\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMajor Achievements\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLimitations/Setbacks\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVanishing Point\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEstimate visibility using geometric cues from convergent lines.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e- In structured scenarios, it is both simple and effective.\u003c/p\u003e \u003cp\u003e- Quick and easy method.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-Restricted to scenes with distinct linear patterns.\u003c/p\u003e \u003cp\u003e- Noise and obstructions are problems\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDCP Theory\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUtilizes information on color, polarization, and depth.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e- Improves accuracy by combining several cues.\u003c/p\u003e \u003cp\u003e- Functions well in different lighting conditions.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e- Specific sensors are needed.\u003c/p\u003e \u003cp\u003e- Processing is time-consuming.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNeural networks that have been trained using image datasets are utilized.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e- Superb precision and applicability.\u003c/p\u003e \u003cp\u003e-Manages intricate scenes.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-Requires big datasets with labels. 
\u003c/p\u003e \u003cp\u003e-Training and inference are expensive.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3 Materials","content":"\u003cp\u003ePublicly available real-world time-series datasets for visibility estimation using images are few, and none of them focus on RVR estimation using images. To fill this void, three specialized datasets were created utilizing optical surveillance data collected from IP cameras: Dataset I, Dataset II, and Dataset III. The relevant standard visibility and RVR values are tagged on each image using a co-located FSM, which ensures accurate ground-truth references.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eThe Meteorological Park on Runway 25 of Jay Prakash Narayan International (JPNI) Airport, Patna, was the site of 8,820 daytime pictures that make up Dataset I. The photographs cover the whole surveillance window, which begins at 0600 IST and ends at 1800 IST.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eFrom 1800 IST to 0600 IST, 4,410 photos were taken at night from the same location for Dataset II.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eA total of 13,230 photos were acquired over an intensive surveillance period from December 26, 2023, to January 31, 2024, for Dataset III, which includes both day and night situations.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe experimental setup is presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, which is located in the Meteorological Park on Runway 25 at JPNI Airport in Patna (ICAO: VEPT; 25.5947\u0026deg; N, 85.0908\u0026deg; E). Part of the system is a weather sensor (FSM PWD22/52 Present Weather Detector) and an infrared (IR) fixed bullet IP camera (Hicks Vision) with full high definition (HD). 
The IP camera, which is equipped with CMOS sensors and infrared LEDs for night vision, was set up to capture images at 1-minute intervals during low-visibility events between December 25, 2023, and January 31, 2024. It operates at a frame rate of 25 fps and can pan through 355\u0026deg; and pivot through 90\u0026deg;. The PWD22/52 scatterometer is a multivariable optical sensor that measures visibility, RVR, and weather conditions by utilizing 45\u0026deg; forward light scattering from atmospheric particulates. It has a sampling volume of 0.1 litres and an integrated background luminance sensor. This co-located arrangement facilitated the synchronized acquisition of visual and meteorological data, thereby establishing a solid foundation for the estimation of visibility and RVR in various fog and illumination conditions.\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(a\u0026ndash;c) illustrates representative sample images from Datasets I, II, and III. The distribution of visibility ranges (\u0026lt;\u0026thinsp;200 m, 200\u0026ndash;500 m, 500\u0026ndash;1000 m, and \u0026gt;\u0026thinsp;1000 m) is depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e(d) to further elucidate the composition and characteristics of these datasets. This visualization shows the balance and variability of the datasets across visibility conditions and indicates their suitability for model training and evaluation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"4 Methodology","content":"\u003cp\u003eCNNs capture low-level visual features, such as edges and textures, while ViTs model complex spatial relationships within images by employing multi-head self-attention. 
Data pre-processing included image normalization, resizing, and feature scaling with a standard scaler, to ensure numerical consistency across datasets prior to model training. Furthermore, to guarantee the reliability and quality of the training data, missing or invalid meteorological records were interpolated and filtered to remove anomalies. The DDT matrix augments physical and spatial awareness by encoding dark channel, depth, and transmittance information. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e depicts the comprehensive framework, which includes real-time CCTV footage, meteorological data, and features generated from images. This multi-stream fusion method effectively integrates visual and contextual meteorological data, achieving strong performance across diverse illumination and atmospheric conditions.\u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Dark Channel Prior Framework\u003c/h2\u003e \u003cp\u003eDCP is a widely used image dehazing approach based on empirical observations. 
A study of 5,000 clear outdoor images found that each local patch typically contains at least one pixel with a near-zero value in one Red, Green, Blue (RGB) channel [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe dark channel \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{J}^{dark}\\left(x\\right)\\)\u003c/span\u003e\u003c/span\u003e is defined as:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{J}^{dark}\\left(x\\right)=\\min_{y\\in\\:{\\Omega\\:}\\left(x\\right)}\\left(\\min_{c\\in\\:\\{r,g,b\\}}{J}^{c}\\left(y\\right)\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eFor a haze-free RGB image \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{J}^{c}\\)\u003c/span\u003e\u003c/span\u003e captured on a sunny day, the dark channel \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{J}^{dark}\\)\u003c/span\u003e\u003c/span\u003e, calculated over local regions \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\Omega\\:}\\left(x\\right)\\)\u003c/span\u003e\u003c/span\u003e, exhibits exceedingly low intensity values\u0026mdash;approaching zero\u0026mdash;in illuminated, non-sky regions. 
The result is expressed as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{J}^{dark}\\)\u003c/span\u003e\u003c/span\u003e\u0026asymp;0 in Eq.\u0026nbsp;(\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:{J}^{dark}\\to\\:0$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Atmospheric Scattering Framework\u003c/h2\u003e \u003cp\u003eThe atmospheric model treats the brightest point in the dark channel as a fixed light source L to guide the dehazing process [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e] (shown in Eq.\u0026nbsp;(3)).\u003c/p\u003e \u003cp\u003eI(x)\u0026thinsp;=\u0026thinsp;R(x).τ(x)\u0026thinsp;+\u0026thinsp;L(1-τ(x)) (3)\u003c/p\u003e \u003cp\u003eThe observed intensity I(x) at any given pixel x in the hazy image is a combination of the actual scene radiance R(x), which is the clear image, and the global atmospheric light L, modulated by the transmission factor τ(x). 
The transmission τ(x) indicates the portion of scene radiance that reaches the camera without being scattered or absorbed by the medium.\u003c/p\u003e \u003cp\u003eFor practical colour processing, the haze model is reformulated per channel by normalizing with atmospheric light L\u003csup\u003ec\u003c/sup\u003e, yielding:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:\\frac{{I}^{c}\\left(x\\right)}{{L}^{c}}=\\tau\\:\\left(x\\right)\\frac{{R}^{c}\\left(x\\right)}{{L}^{c}}+1-\\tau\\:\\left(x\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eEquation (\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e5\u003c/span\u003e) is derived by taking the minimum over colour channels and over the local patch Ω(x) on both sides of the normalized form (Eq.\u0026nbsp;\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e4\u003c/span\u003e), assuming that τ is constant within the patch, and applying the dark channel prior (Eq.\u0026nbsp;\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) to the scene radiance R, under which the scene-radiance term vanishes [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e].\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:\\tilde{\\tau}\\left(x\\right)=1-\\min_{y\\in\\:{\\Omega\\:}\\left(x\\right)}\\left(\\min_{c}\\frac{{I}^{c}\\left(y\\right)}{{L}^{c}}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eEquation (\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e5\u003c/span\u003e) estimates transmittance τ(x), but applying it directly for DCP-based dehazing can result in a loss of depth cues in the image [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTo preserve depth perception, a weighting factor w (0\u0026thinsp;\u0026le;\u0026thinsp;w\u0026thinsp;\u0026lt;\u0026thinsp;1) is introduced, allowing distant objects to retain slight haze, as shown in Eq.\u0026nbsp;(\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:\\tilde{\\tau}\\left(x\\right)=1-w\\:\\min_{y\\in\\:{\\Omega\\:}\\left(x\\right)}\\left(\\min_{c}\\frac{{I}^{c}\\left(y\\right)}{{L}^{c}}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe transmittance map of an RGB image is consistent across all three channels [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. When ambient light is uniformly distributed, the function τ(x) can be calculated as shown in Eq.\u0026nbsp;(\u003cspan refid=\"Equ6\" class=\"InternalRef\"\u003e7\u003c/span\u003e).\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:\\tau\\:\\left(x\\right){=e}^{-\\beta\\:d\\left(x\\right)}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn this context, β signifies the atmospheric attenuation coefficient, while d(x) indicates the depth at pixel x.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Depth and Transmission Estimation via DDT: A Matrix-Based Method\u003c/h2\u003e \u003cp\u003eDue to its excellent haze responsiveness, the DCP plays a pivotal role in dehazing and visibility estimation. It helps derive the transmittance map via the atmospheric scattering model. Monodepth2 [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u0026mdash;a self-supervised monocular method\u0026mdash;estimates depth to overcome the lack of depth data in typical surveillance. 
Eq.\u0026nbsp;(\u003cspan refid=\"Equ6\" class=\"InternalRef\"\u003e7\u003c/span\u003e) captures the relationship between scene depth and transmittance. Testing shows that depth, dark channel, and transmittance matrices effectively reflect changes in pixel appearance under various lighting and visibility conditions, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eAs demonstrated experimentally, the dark channel, transmittance, and depth matrices are key inputs for visibility estimation because their pixel value distributions change significantly with visibility conditions. The DDT method builds on these principles by merging the three matrices into a single 3D feature stream used to estimate RVR and visibility.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Proposed Model\u003c/h2\u003e \u003cp\u003eThis study introduces a new approach to visibility and RVR estimation. It involves combining CNN and transformer-based models with a 3D multi-feature DDT matrix that includes depth, dark channel, and transmittance matrices. This method uses state-of-the-art ML approaches to combine important image features, with or without meteorological factors, to improve the accuracy of estimates.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe DDT matrix fuses three critical components. The transmittance matrix draws attention to meteorological interference like fog or haze. The dark channel matrix identifies low-intensity pixels across colour channels that indicate reduced visibility. 
The depth matrix captures the spatial layout of the scene, reflecting object distances and how visibility changes with distance.\u003c/p\u003e \u003cp\u003eTo effectively capture both local details and global context from the images, a hybrid model combining CNNs and transformers is employed. CNNs are proficient at extracting low-level features like edges and textures but have limitations in modelling complex spatial relationships. Transformers complement CNNs by capturing these intricate spatial dependencies. The specific CNN architecture used in this work is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eBy associating haze intensity, depth signals, and scene composition, the ViT effectively models long-range dependencies and global contextual relationships, rendering it highly suitable for the estimation of visibility and RVR. Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e illustrates the proposed ViT architecture, in which image patches are linearly projected and combined with positional embeddings before being processed by a Transformer encoder. The multi-head self-attention mechanism in each encoder layer attends to multiple spatial regions simultaneously, so the model can capture a wide range of visual dependencies across haze gradients and depth layers. Each attention head computes query, key, and value representations to accentuate pertinent spatial features, and the resulting feature representation is refined by normalization and multi-layer perceptron (MLP) layers. This hybrid design combines ViT's global attention-based modelling with CNN-derived local features to improve the accuracy and robustness of visibility and RVR estimation.\u003c/p\u003e \u003cp\u003eFeatures extracted from the CNN, ViT, and DDT matrix are combined into a single vector and input to regression models for visibility and RVR estimation. 
Because of their robust estimations and capacity to manage high-dimensional data, random forest (RF) regressors are commonly used. With these combined attributes, the models are trained to provide reliable visibility estimates in a variety of environments. The detailed workflow is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"5 Analysis and Results of the Experiment","content":"\u003cp\u003eThe experiments show that when meteorological data is added to CNNs and ViT with the DDT matrix, the results are better than when the DDT matrix is used alone or separately for visibility and RVR estimates. The experiments provide a thorough performance evaluation with theoretical insights by comparing the proposed method to existing leading algorithms. Several ML regression models utilize the integrated feature set, which includes meteorological parameters, for visibility estimation. Figure\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the general workflow of the suggested hybrid CNN-ViT architecture for accurate visibility and RVR estimation, while Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e details the software and hardware setup, including CPU, GPU, RAM, storage, operating system, programming tools, and DL frameworks. The system comprises three complementary components: a CNN branch for local spatial feature extraction, a ViT encoder for global contextual learning, and a 3D feature stream that incorporates physical factors like transmittance, dark channel, and depth.\u003c/p\u003e \u003cp\u003eThe CNN stream captures edges, textures, and haze density patterns at a finer level through successive convolution and pooling layers. Next, a global average pooling layer is used to aggregate these localized characteristics. 
Then, to improve feature selectivity, dense and multi-head attention layers are applied. The Transformer encoder component uses eight attention heads (key dimension\u0026thinsp;=\u0026thinsp;64), which allows the model to attend to multiple spatial regions at once. Using the query-key-value approach, it improves visibility and depth comprehension by learning inter-patch relationships and identifying the most haze-affected locations.\u003c/p\u003e \u003cp\u003eThe network is trained using the Adam optimizer with the following parameters: a learning rate of 1\u0026times;10⁻⁴, a batch size of 32, and 50 epochs. For the final visibility prediction, the features derived from CNN and ViT are combined and sent to an ensemble regressor, which consists of RF and XGBoost. This hybrid architecture supports robust performance in different lighting and weather conditions by integrating local texture awareness, global scene interpretation, and physical haze factors.\u003c/p\u003e \u003cp\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e provides an overview of the network architecture and lists critical training parameters, including the learning rate, batch size, number of epochs, optimizer, and loss function. 
CNNs are adept at picking up on fine-grained details like textures and edges, but they struggle to model more abstract visual structure and global spatial relationships.\u003c/p\u003e \u003cp\u003eIn contrast, transformers [\u003cspan additionalcitationids=\"CR39\" citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e] capture global dependencies across an image using self-attention, making them more effective for large-scale datasets where contextual relationships are important.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSystem Specifications and Development Frameworks\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eSystem Specifications\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003eDevelopment Framework\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCPU\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eIntel(R) Core(TM)
[email protected] GHz, 12th Gen\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eSystem Configuration\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eWindows 10, 64-bit.\u003c/p\u003e \u003cp\u003ePython 3.11.5, tensorflow: 2.15.0\u0026thinsp;+\u0026thinsp;Keras-applications: 1.0.8\u0026thinsp;+\u0026thinsp;opencv-python: 4.8.1.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRAM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e16.0 GB (15.7 GB usable)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGPU\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUHD Graphics 770 (7.8 GB)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eA novel end-to-end paradigm for visibility and RVR estimation integrates CNNs, ViT, and 3D feature streams. CNNs extract detailed features, transformers model long-range dependencies, and the DDT matrix enhances spatial awareness with dark channel, depth, and transmittance data.\u003c/p\u003e \u003cp\u003eThis hybrid approach improves estimation accuracy by integrating local feature extraction with environmental context. 
The model functions with or without meteorological data, and its accuracy improves when such data are available.\u003c/p\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Effective Uses of the DDT Matrix\u003c/h2\u003e \u003cp\u003eThe model estimates visibility and RVR from the dark channel, transmittance, and depth matrices, and each component is tested independently to determine its contribution. A modified CNN and ViT combination serves as the feature extractor, with the DDT matrix central to the estimation process. The original image and the DDT matrix are used in a baseline experiment for comparative analysis.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows how well each input feature performs. The dark channel matrix, which represents fog intensity, achieves R\u0026sup2; = 0.96 for visibility and 0.97 for RVR; these values are marginally lower than those obtained with the original image (0.97 and 0.98). The transmittance matrix, which represents the concentration of particles in the atmosphere, reaches the same R\u0026sup2; values (0.96 and 0.97). The depth matrix, which records spatial information, raises R\u0026sup2; to 0.97 for visibility and 0.98 for RVR. 
The DDT matrix, a fusion of all three, matches the highest R\u003csup\u003e2\u003c/sup\u003e (0.97 for visibility and 0.98 for RVR) and yields the lowest RMSE (117 for visibility and 112 for RVR). Its MAE is on par with that of the depth matrix alone (68.81 versus 68.01 for visibility, and 57.03 for both on RVR), which supports its use as a reliable input for visibility and RVR estimation in aviation meteorology.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparative Analysis of Feature Matrices for Visibility and RVR Estimation\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInput Source\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBase Image\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDark Channel Feature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTransmittance Component\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDepth Information\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e3D DDT Composite\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e (Visibility)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRMSE (Visibility)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e126.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e150\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e150.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e121.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e117\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMAE (Visibility)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e71.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e85.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e85.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e68.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e68.81\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e (RVR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRMSE (RVR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e114.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e139.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e139.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e113.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e112\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMAE (RVR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e62.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e76.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e76.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e57.03\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e57.03\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e 
\u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Comparative Analysis\u003c/h2\u003e \u003cp\u003eThe study comprehensively evaluated CNN and ViT feature extractors for visibility and RVR estimation, analysing the influence of atmospheric conditions. To obtain reliable and operationally applicable results, the study compared the proposed method with prior visibility estimation techniques [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. Key performance metrics, including RMSE, MAE, and R\u0026sup2;, were used to assess visibility and RVR accuracy. A full comparative analysis for Dataset III under both day and night conditions is provided in Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e (a), which shows visibility estimation performance, and Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e (b), which shows RVR estimation performance.\u003c/p\u003e \u003cp\u003eThis comparison shows how the proposed strategy performs in real-world applications and how meteorological data can improve model precision.\u003c/p\u003e \u003cp\u003eThe results in Figure\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e demonstrate the significant impact of integrating weather parameters and the DDT matrix on visibility and RVR estimation. The model that combines the DDT matrix and meteorological parameters with the CNN and ViT extractors produces the most accurate estimates, with low error values (26.71 for visibility and 23.47 for RVR) and high R\u0026sup2; values (0.99). 
The significance of meteorological data becomes clear when weather parameters are excluded: the MSE rises to 13,689.9 for visibility and 9,654.2 for RVR, and the R\u0026sup2; values decrease.\u003c/p\u003e \u003cp\u003eRemoving both the weather factors and the DDT matrix degrades performance further. The visibility MSE rises to 32,108 and the RVR MSE to 19,204, and the R\u0026sup2; scores are the lowest, at 0.94 and 0.96, respectively. The CNN \u0026amp; ViT model with weather data but without the DDT matrix still performs well, achieving an MSE of 19.45 for visibility and 17.42 for RVR, which indicates that weather parameters play the crucial role while the DDT matrix provides a further enhancement.\u003c/p\u003e \u003cp\u003eWhen applied to aviation meteorology, the combined use of hybrid DL, DDT feature extraction, and weather data greatly enhances visibility and RVR estimation and surpasses conventional approaches.\u003c/p\u003e \u003c/div\u003e"},{"header":"6 Discussion","content":"\u003cp\u003eOver the past ten years, visibility estimation has shifted from conventional ML methods to advanced DL frameworks. Initial studies used SVM and Multi-Output Support Vector Regression (MSVR) frameworks, in which handcrafted features were extracted and fed into conventional ML models. [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] illustrated this approach with a VGG16-MSVR model that attained 85% classification accuracy on structured datasets. [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e] combined AlexNet and Deep Convolutional Neural Networks (DCNN) with SVM classifiers, achieving 99.02% accuracy on the FROSI dataset. 
These methods established fundamental capabilities in automated visibility evaluation; nevertheless, their dependence on static, hand-designed features limited their adaptability to variable meteorological conditions and different contextual situations.\u003c/p\u003e \u003cp\u003eThe advent of end-to-end deep learning systems marked a substantial methodological transition towards automatic feature extraction from raw images. CNN methods removed the need for manual feature engineering by learning hierarchical spatial representations through successive convolutional and pooling operations. [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] employed AlexNet, ResNet, and DenseNet architectures, achieving a classification accuracy of 99.02%, whereas [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] reported an accuracy of 98.3% with DenseNet variants. Despite these gains in classification performance, pure CNN architectures predominantly capture local spatial patterns and short-range relationships within the convolutional kernels' receptive field. This architectural property restricts their ability to represent extensive atmospheric contexts, long-distance spatial relationships, and the temporal dynamics that characterize real-world visibility degradation events.\u003c/p\u003e \u003cp\u003eRecognition of these constraints prompted the creation of hybrid architectures that combine spatial and temporal modelling abilities. [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e] introduced a CNN-LSTM system that integrates convolutional feature extraction with Long Short-Term Memory networks to capture spatial degradation patterns and temporal atmospheric shifts. This design attained a Mean Squared Error (MSE) of 19.45 in regression-based visibility estimation tasks, a significant advance over prior methods. 
The CNN component extracts local degradation features, including texture loss, contrast reduction, and scattering effects, while the LSTM network models sequential dependencies across consecutive frames, allowing the system to monitor atmospheric changes such as fog intensification or the onset of precipitation. Comparable hybrid approaches include [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], who attained an accuracy of 90.4% with a Mean Squared Error (MSE) of 9.6 using a CNN-RNN architecture, and [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e], who obtained Root Mean Squared Error (RMSE) values ranging from 6.71 to 8.63 by integrating satellite imagery, Numerical Weather Prediction (NWP) data, and surface observations. [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e] created DMRVisNet, which integrated explicit physical degradation models grounded in atmospheric scattering theory; however, this method showed limited efficacy for long-range estimates, with an RMSE of 93.11.\u003c/p\u003e \u003cp\u003eThe incorporation of weather variables is a crucial element in the precision of visibility estimation, yet it has been inadequately explored in prior research. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] performed ablation studies revealing that their CNN-LSTM framework attained a mean squared error (MSE) of 26.71 with solely image-based features, which rose sharply to MSE values of 13,689.9 and 32,108 when temperature, humidity, and precipitation data were completely excluded from the model input. The results quantify the critical role of atmospheric conditions in enhancing contextual comprehension and numerical precision in visibility evaluation. 
Meteorological factors offer clear insights into the physical processes that lead to visibility reduction, such as condensation mechanisms in fog, particle concentration in haze, and scattering characteristics during precipitation.\u003c/p\u003e \u003cp\u003eRecent studies have examined transformer-based designs that use self-attention to capture global spatial relationships, free from the locality limits of convolutional operations. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] created STCN-Net, which integrates Swin Transformer with ResNet18, attaining 97.9% accuracy in visibility classification tasks. [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e] performed a comparative analysis of DenseNet121 and Vision Transformer (ViT) architectures, achieving an accuracy of 96.69% with the transformer-based method. These experiments illustrate the efficacy of attention mechanisms in modelling long-range atmospheric relationships; nevertheless, pure transformer architectures may compromise the local feature sensitivity that CNNs offer for identifying fine-grained degradation patterns at the pixel level.\u003c/p\u003e \u003cp\u003eThe proposed paradigm addresses the constraints highlighted above by integrating multimodal features, which encompass local spatial characteristics, global contextual representations, explicit physical modelling, and environmental parameters. The architecture uses CNNs to extract local degradation patterns, vision transformers to capture global atmospheric context via self-attention, a three-dimensional DDT matrix that encodes depth, dark channel, and transmittance information grounded in atmospheric physics, and meteorological variables such as temperature, wind speed, and atmospheric pressure. A Random Forest regressor performs the final fusion and estimation by merging these diverse information sources. 
The quantitative assessment yields a Root Mean Square Error (RMSE) of 117 and a Mean Absolute Error (MAE) of 68.81 across multi-condition datasets that include daytime, night-time, and diverse weather conditions. The baseline evaluation with solely physical attributes from the DDT matrix, devoid of learned components, yields an RMSE of roughly 150, so combining deep learning features with physical modelling reduces the estimation error by 22% ((150 \u0026minus; 117) / 150 \u0026asymp; 0.22). Cross-dataset examination demonstrates uniform performance across several environmental contexts, encompassing urban settings rich in structural references and rural landscapes with little visual anchoring. This generalization capability mitigates the shortcomings identified in previous approaches, which showed performance decline under certain environmental circumstances or visibility ranges. The framework's capacity to distinguish between distinct degradation mechanisms, such as fog resulting from condensation, haze due to particulate suspension, and visibility impairment during precipitation, arises from combining learned features that capture visual appearance patterns with meteorological variables that represent the underlying physical processes.\u003c/p\u003e \u003cp\u003eThis study uses a regression-based methodology instead of discrete categorization. Previous studies predominantly employed classification frameworks that assign visibility observations to predefined categorical ranges. [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] all presented accuracy measures derived from categorical visibility bins. The regression method enables continuous visibility and runway visual range assessment, delivering detailed numerical values instead of categorical classifications. 
This capability is particularly relevant for operational applications in aviation, maritime navigation, and transportation management, where safety regulations and decision-making protocols require precise continuous measurements instead of categorical ranges.\u003c/p\u003e \u003cp\u003eDirect numerical comparisons among studies must account for methodological differences in evaluation processes, dataset attributes, and performance indicators. Studies reporting classification accuracy used distinct visibility categories and assessed correct categorization rates, while regression-based methods applied error metrics such as MSE, RMSE, and MAE to quantify continuous prediction accuracy. Variations in geographic location, temporal coverage, weather diversity, and image capture conditions across datasets further confound cross-study comparisons. The reduction from an RMSE of about 150 with only physical features to an RMSE of 117 with full multimodal integration is a measurable improvement. However, absolute performance metrics depend on the dataset's visibility distributions and measurement ranges.\u003c/p\u003e"},{"header":"7 Conclusion","content":"\u003cp\u003eThis study presents a hybrid framework that integrates designed and automatically learned features for the estimation of visibility and RVR. The model combines ViT and CNNs with a 3D multi-channel feature matrix, known as the DDT matrix, which includes depth, dark channel, and transmittance elements. This design enables the simultaneous capture of local degradation patterns and global contextual information across various weather conditions.\u003c/p\u003e \u003cp\u003eThe system integrates meteorological data (temperature, wind, atmospheric pressure, etc.) to distinguish between various visibility degradation mechanisms, such as the onset of fog and precipitation-induced scattering. 
The quantitative assessment indicates that the hybrid CNN-ViT-DDT model attains an RMSE of 117 and an MAE of 68.81, a 22% improvement over standalone physical feature methods (dark channel and transmittance alone: RMSE about 150).\u003c/p\u003e \u003cp\u003eA Random Forest (RF) regressor serves as the integration layer, combining multi-stream features from CNN, ViT, DDT components, and meteorological data. A comparative investigation of three real-world datasets (daytime, night-time, and mixed settings) reveals consistent performance gains over conventional ML and single-stream DL benchmarks. The framework maintains estimation accuracy across diverse environmental contexts, from rural landscapes with minimal visual indicators to urban settings marked by abundant structural elements.\u003c/p\u003e \u003cp\u003eThe experimental findings demonstrate that the integration of multimodal features significantly enhances visibility and RVR estimation accuracy under adverse conditions such as dense fog, heavy precipitation, and low-light environments. 
The findings indicate practical relevance for aviation safety systems, transportation management infrastructure, and atmospheric monitoring networks, where accurate visibility assessment is essential for operational decision-making.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eConflict of Interest\u003c/h2\u003e \u003cp\u003eThe authors affirm that there are no conflicts of interest related to the publication of this article.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eCompeting Interests\u003c/strong\u003e \u003cp\u003eThe authors confirm that they have no competing interests associated with this study.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompliance with Ethical Standards\u003c/h2\u003e \u003cp\u003eThe authors declare that the research was conducted without a commercial or financial relationship that could be interpreted as a potential conflict of interest.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThe authors state that they did not receive any financial support, grants, or assistance during the development of the work.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eConceptualization, A.S.; Data Curation, A.S.; Formal Analysis, A.S.; Methodology, A.S.; Software, A.S.; Validation, A.S.; Visualization, A.S.; Writing \u0026ndash; original draft, A.S.; Writing \u0026ndash; revised draft, A.S., B.C.S.; Supervision, B.C.S. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors express gratitude to the committed officials of the India Meteorological Department for their proactive assistance in instrument deployment and in maintaining real-time data acquisition systems. Special thanks to Mantosh Kumar, Deepak Kumar Singh, and their team for maintaining the data acquisition system. A.S. also acknowledges the Director General of Meteorology, M. 
Mohapatra, for his continued motivation and support.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eKim, K.W.: The comparison of visibility measurement between image-based visual range, human eye-based visual range, and meteorological optical range. Atmos. Environ. \u003cb\u003e190\u003c/b\u003e, 74\u0026ndash;86 (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.atmosenv.2018.07.020\u003c/span\u003e\u003cspan address=\"10.1016/j.atmosenv.2018.07.020\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShankar, A., Sahana, B.C.: Early warning of low visibility using the ensembling of machine learning approaches for aviation services at Jay Prakash Narayan International (JPNI) Airport Patna. SN Appl. Sci. \u003cb\u003e5\u003c/b\u003e, 132 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s42452-023-05350-7\u003c/span\u003e\u003cspan address=\"10.1007/s42452-023-05350-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInternational Civil Aviation Organization (ICAO): Manual of Runway Visual Range Observing and Reporting Practices. 105: (2005)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShankar, A., Sahana, B.C.: Efficient prediction of runway visual range by using a hybrid CNN-LSTM network architecture for aviation services. Theor. Appl. Climatol. \u003cb\u003e155\u003c/b\u003e, 2215\u0026ndash;2232 (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00704-023-04751-3\u003c/span\u003e\u003cspan address=\"10.1007/s00704-023-04751-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShankar, A.: The Impacts of Low Visibility on the Aviation Services of Patna Airport During the Period from 2016 to 2023. J. Airl. Oper. Aviat. Manag. \u003cb\u003e3\u003c/b\u003e, 46\u0026ndash;57 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.56801/jaoam.v3i1.5 3\u003c/span\u003e\u003cspan address=\"10.56801/jaoam.v3i1.5 3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShankar, A., Sahana, B.C., Singh, S.P.: Prediction of Low-Visibility Events by Integrating the Potential of Persistence and Machine Learning for Aviation Services. Mausam. \u003cb\u003e75\u003c/b\u003e, 977\u0026ndash;992 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.54302/mausam.v75i4.6624\u003c/span\u003e\u003cspan address=\"10.54302/mausam.v75i4.6624\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShankar, A., Kumar, A., Sinha, V.: Machine Learning Approach in the Prediction of Fog: An Early Warning System. Mausam. \u003cb\u003e75\u003c/b\u003e, 1039\u0026ndash;1050 (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.54302/mausam.v75i4.5919\u003c/span\u003e\u003cspan address=\"10.54302/mausam.v75i4.5919\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhai, B., Wang, Y., Wu, B.: An ensemble learning method for low visibility prediction on freeway using meteorological data. IET Intell. Transp. Syst. \u003cb\u003e17\u003c/b\u003e, 2237\u0026ndash;2250 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1049/itr2.12404\u003c/span\u003e\u003cspan address=\"10.1049/itr2.12404\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, Q., Su, W., Qi, Y., Tao, W., Pollefeys, M.: Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks. Int. J. Comput. Vis. \u003cb\u003e130\u003c/b\u003e, 2040\u0026ndash;2059 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11263-022-01628-2\u003c/span\u003e\u003cspan address=\"10.1007/s11263-022-01628-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee, J.Y., DeGol, J., Zou, C., Hoiem, D.: PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility. Proc. IEEE Int. Conf. Comput. Vis. 6138\u0026ndash;6147 (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICCV48922.2021.00610\u003c/span\u003e\u003cspan address=\"10.1109/ICCV48922.2021.00610\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePalvanov, A., Im Cho, Y.: DHCNN for visibility estimation in foggy weather conditions. 
Proc. \u0026ndash;\u0026thinsp;2018 Jt. 10th Int. Conf. Soft Comput. Intell. Syst. 19th Int. Symp. Adv. Intell. Syst. SCIS-ISIS 2018. 240\u0026ndash;243 (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/SCIS-ISIS.2018.00050\u003c/span\u003e\u003cspan address=\"10.1109/SCIS-ISIS.2018.00050\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, J., Dowman, I., Li, S., Li, Z., Madden, M., Mills, J., Paparoditis, N., Rottensteiner, F., Sester, M., Toth, C., Trinder, J., Heipke, C.: Information from imagery: ISPRS scientific vision and research agenda. ISPRS J. Photogramm Remote Sens. \u003cb\u003e115\u003c/b\u003e, 3\u0026ndash;21 (2016). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.isprsjprs.2015.09.008\u003c/span\u003e\u003cspan address=\"10.1016/j.isprsjprs.2015.09.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePapari, G., Petkov, N.: Edge and line oriented contour detection: State of the art. Image Vis. Comput. \u003cb\u003e29\u003c/b\u003e, 79\u0026ndash;103 (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.imavis.2010.08.009\u003c/span\u003e\u003cspan address=\"10.1016/j.imavis.2010.08.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Y., Wu, Y., Chen, H.: Research progress of visual simultaneous localization and mapping based on deep learning. 
Yi Qi Yi Biao Xue Bao/Chinese J. Sci. Instrum. \u003cb\u003e44\u003c/b\u003e, 214\u0026ndash;241 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.19650/j.cnki.cjsi.J2311081\u003c/span\u003e\u003cspan address=\"10.19650/j.cnki.cjsi.J2311081\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGiyenko, A., Palvanov, A., Cho, Y.: Application of convolutional neural networks for visibility estimation of CCTV images. Int. Conf. Inf. Netw. 2018-Janua. 875\u0026ndash;879 (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICOIN.2018.8343247\u003c/span\u003e\u003cspan address=\"10.1109/ICOIN.2018.8343247\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHemalatha, J., Roseline, S.A., Geetha, S., Kadry, S., Damaševičius, R.: An efficient densenet-based deep learning model for Malware detection. Entropy. \u003cb\u003e23\u003c/b\u003e, 1\u0026ndash;23 (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/e23030344\u003c/span\u003e\u003cspan address=\"10.3390/e23030344\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChaabani, H., Kamoun, F., Bargaoui, H., Outay, F., Yasar, A.U.H.: A Neural network approach to visibility range estimation under foggy weather conditions. Procedia Comput. Sci. \u003cb\u003e113\u003c/b\u003e, 466\u0026ndash;471 (2017). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.procs.2017.08.304\u003c/span\u003e\u003cspan address=\"10.1016/j.procs.2017.08.304\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, J., Chang, X., Li, Y., Ji, Y., Fu, J., Zhong, J.: STCN-Net: A Novel Multi-Feature Stream Fusion Visibility Estimation Approach. IEEE Access. \u003cb\u003e10\u003c/b\u003e, 120329\u0026ndash;120342 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2022.3218456\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2022.3218456\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBae, T.W., Han, J.H., Kim, K.J., Kim, Y.T.: Coastal visibility distance estimation using dark channel prior and distance map under sea-fog: Korean Peninsula case. Sens. (Switzerland). \u003cb\u003e19\u003c/b\u003e (2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s19204432\u003c/span\u003e\u003cspan address=\"10.3390/s19204432\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe, Y., Ding, J., Teng, H., Han, X., Chen, Y., Zhou, W.: Visibility detection and prediction of foggy highway based on lane line detection and Winters additive model. In: 2021 40th Chinese Control Conference (CCC). pp. 7254\u0026ndash;7259 (2021)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGraves, N., Newsam, S.: Using visibility cameras to estimate atmospheric light extinction. IEEE Work. Appl. Comput. Vision, WACV 2011. 577\u0026ndash;584 (2011). (2011). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/WACV.2011.5711556\u003c/span\u003e\u003cspan address=\"10.1109/WACV.2011.5711556\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZou, J.: Visibility detection method based on camera model calibration. Proc. \u0026ndash;\u0026thinsp;2017 4th Int. Conf. Inf. Sci. Control Eng. ICISCE 770\u0026ndash;776 (2017). (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICISCE.2017.165\u003c/span\u003e\u003cspan address=\"10.1109/ICISCE.2017.165\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrtega, L.C., Otero, L.D., Solomon, M., Otero, C.E., Fabregas, A.: Deep learning models for visibility forecasting using climatological data. Int. J. Forecast. \u003cb\u003e39\u003c/b\u003e, 992\u0026ndash;1004 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijforecast.2022.03.009\u003c/span\u003e\u003cspan address=\"10.1016/j.ijforecast.2022.03.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiyu, M., Qi, X., Qiang, Z., Junchi, R., Hongbin, W., Linyi, Z.: An Improved Diracnet Convolutional Neural Network for Haze Visibility Detection. In: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). pp. 1\u0026ndash;5 (2021)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, J., Zhang, L.: Research on Deep Learning Model of Fog Visibility Estimation Based on CNN. In: 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). pp. 
1355\u0026ndash;1359 (2021)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChincholkar, S., Rajapandy, M.: Fog Image Classification and Visibility Detection Using CNN BT - Intelligent Computing, Information and Control Systems. Presented at the (2020)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, H., Shen, K., Yu, P., Shi, Q., Ko, H.: Multimodal Deep Fusion Network for Visibility Assessment with a Small Training Dataset. IEEE Access. \u003cb\u003e8\u003c/b\u003e, 217057\u0026ndash;217067 (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2020.3031283\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2020.3031283\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, J., Min, X., Zhu, Y., Zhai, G., Zhou, J., Yang, X., Zhang, W.: HazDesNet: An End-to-End Network for Haze Density Prediction. IEEE Trans. Intell. Transp. Syst. \u003cb\u003e23\u003c/b\u003e, 3087\u0026ndash;3102 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TITS.2020.3030673\u003c/span\u003e\u003cspan address=\"10.1109/TITS.2020.3030673\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOutay, F., Taha, B., Chaabani, H., Kamoun, F., Werghi, N., Yasar, A.U.H.: Estimating ambient visibility in the presence of fog: a deep convolutional neural network approach. Pers. Ubiquitous Comput. \u003cb\u003e25\u003c/b\u003e, 51\u0026ndash;62 (2021). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00779-019-01334-w\u003c/span\u003e\u003cspan address=\"10.1007/s00779-019-01334-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQin, H., Qin, H.: An End-to-End Traffic Visibility Regression Algorithm. IEEE Access. \u003cb\u003e10\u003c/b\u003e, 25448\u0026ndash;25454 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2021.3101323\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3101323\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, Y., Du, J., Yan, Z., Song, Y., Hua, D.: Atmospheric visibility prediction by using the DBN deep learning model and principal component analysis. Appl. Opt. \u003cb\u003e61\u003c/b\u003e, 2657\u0026ndash;2666 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1364/AO.449148\u003c/span\u003e\u003cspan address=\"10.1364/AO.449148\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePalvanov, A., Cho, Y.I.: Visnet: Deep convolutional neural networks for forecasting atmospheric visibility. Sens. (Switzerland). \u003cb\u003e19\u003c/b\u003e (2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s19061343\u003c/span\u003e\u003cspan address=\"10.3390/s19061343\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChoi, W., Park, J., Kim, D., Park, J., Kim, S., Lee, H.: Development of Two-Dimensional Visibility Estimation Model Using Machine Learning: Preliminary Results for South Korea. Atmos. (Basel). 
\u003cb\u003e13\u003c/b\u003e (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/atmos13081233\u003c/span\u003e\u003cspan address=\"10.3390/atmos13081233\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmiri, M., Soleimani, S.: A Hybrid Atmospheric Satellite Image-Processing Method for Dust and Horizontal Visibility Detection through Feature Extraction and Machine Learning Techniques. J. Indian Soc. Remote Sens. \u003cb\u003e50\u003c/b\u003e, 523\u0026ndash;532 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s12524-021-01460-0\u003c/span\u003e\u003cspan address=\"10.1007/s12524-021-01460-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. \u003cb\u003e33\u003c/b\u003e, 2341\u0026ndash;2353 (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TPAMI.2010.168\u003c/span\u003e\u003cspan address=\"10.1109/TPAMI.2010.168\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNayar, S.K., Narasimhan, S.G.: Vision in bad weather. Proc. IEEE Int. Conf. Comput. Vis. 2, 820\u0026ndash;827 (1999). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/iccv.1999.790306\u003c/span\u003e\u003cspan address=\"10.1109/iccv.1999.790306\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNegru, M., Nedevschi, S.: Image based fog detection and visibility estimation for driving assistance systems. Proc. 
\u0026ndash;\u0026thinsp;2013 IEEE 9th Int. Conf. Intell. Comput. Commun. Process. ICCP 163\u0026ndash;168 (2013). (2013). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICCP.2013.6646102\u003c/span\u003e\u003cspan address=\"10.1109/ICCP.2013.6646102\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaswani Ashish, S., Noam, P.N., Jakob, U., Llion, J., Gomez Aidan, N., Kaiser Lukasz, I.P.: Attention Is All You Need. Adv. Neural Inf. Process. Syst. \u003cb\u003e30\u003c/b\u003e (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/https://doi.org/10.48550/arXiv.1706.03762\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1706.03762\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in Vision: A Survey. ACM Comput. Surv. \u003cb\u003e54\u003c/b\u003e (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/3505244\u003c/span\u003e\u003cspan address=\"10.1145/3505244\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y., Yang, Z., Zhang, Y., Tao, D.: A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. \u003cb\u003e45\u003c/b\u003e, 87\u0026ndash;110 (2023). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TPAMI.2022.3152247\u003c/span\u003e\u003cspan address=\"10.1109/TPAMI.2022.3152247\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLo, W.L., Zhu, M., Fu, H.: Meteorology visibility estimation by using multi-support vector regression method. J. Adv. Inf. Technol. \u003cb\u003e11\u003c/b\u003e, 40\u0026ndash;47 (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.12720/jait.11.2.40-47\u003c/span\u003e\u003cspan address=\"10.12720/jait.11.2.40-47\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXun, L., Zhang, H., Yan, Q., Wu, Q., Zhang, J.: VISOR-NET: Visibility Estimation Based on Deep Ordinal Relative Learning under Discrete-Level Labels. Sensors. \u003cb\u003e22\u003c/b\u003e, 1\u0026ndash;20 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s22166227\u003c/span\u003e\u003cspan address=\"10.3390/s22166227\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShankar, A., Sahana, B.C.: System to Estimate Visibility and Runway Visual Range (RVR) from Image Data, (2024)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYou, Y., Lu, C., Wang, W., Tang, C.K.: Relative CNN-RNN: Learning relative atmospheric visibility from images. IEEE Trans. Image Process. \u003cb\u003e28\u003c/b\u003e, 45\u0026ndash;55 (2019). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TIP.2018.2857219\u003c/span\u003e\u003cspan address=\"10.1109/TIP.2018.2857219\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, J., Zhang, L.: Research on Deep Learning Model of Fog Visibility Estimation Based on CNN. IMCEC 2021 - IEEE 4th Adv. Inf. Manag. Commun. Electron. Autom. Control Conf. 1355\u0026ndash;1359 (2021). (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/IMCEC51613.2021.9482258\u003c/span\u003e\u003cspan address=\"10.1109/IMCEC51613.2021.9482258\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYou, J., Jia, S., Pei, X., Yao, D.: DMRVisNet: Deep Multihead Regression Network for Pixel-Wise Visibility Estimation under Foggy Weather. IEEE Trans. Intell. Transp. Syst. \u003cb\u003e23\u003c/b\u003e, 22354\u0026ndash;22366 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TITS.2022.3180229\u003c/span\u003e\u003cspan address=\"10.1109/TITS.2022.3180229\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBouhsine, T., Idbraim, S., Bouaynaya, N.C., Alfergani, H., Ouadil, K.A., Johnson, C.C.: Atmospheric Visibility Image-Based System for Instrument Meteorological Conditions Estimation: A Deep Learning Approach. Proc. \u0026ndash;\u0026thinsp;2022 9th Int. Conf. Wirel. Networks Mob. Commun. WINCOM 2022. 1\u0026ndash;6 (2022). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/WINCOM55661.2022.9966454\u003c/span\u003e\u003cspan address=\"10.1109/WINCOM55661.2022.9966454\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Unsectioned Paragraphs","content":"\u003cp\u003e\u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eContributions\u003c/b\u003e \u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e\u003c/p\u003e\u003cp\u003eConceptualization, A.S.; Data Curation, A.S.; Formal Analysis, A.S.; Methodology, A.S.; Software, A.S.; Validation, A.S.; Visualization, A.S.; Writing\u0026mdash;original draft, A.S.; Writing\u0026mdash;revised draft, A.S., B.C.S.; Supervision, B.C.S. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"signal-image-and-video-processing","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"sivp","sideBox":"Learn more about [Signal, Image and Video Processing](http://link.springer.com/journal/11760)","snPcode":"11760","submissionUrl":"https://submission.nature.com/new-submission/11760/3","title":"Signal, Image and Video Processing","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Transformers, Deep learning, Feature extraction, Multi-parameter fusion, Visibility estimation, and runway visual range","lastPublishedDoi":"10.21203/rs.3.rs-8678337/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8678337/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEstimating visibility poses significant issues for transportation safety and operational decision-making, especially in severe weather circumstances where image-based evaluation becomes unreliable. Conventional deep learning (DL) models demonstrate limited feature extraction capabilities from compromised images, while physics-based methods require predefined parameters and exhibit inadequate generalization across diverse atmospheric conditions. This study introduces a hybrid architecture that amalgamates various information sources for the continuous assessment of visibility and runway visual range (RVR) from individual images. The proposed architecture includes a three-dimensional feature matrix\u0026mdash;the DDT matrix\u0026mdash;encoding dark channel, depth, and transmittance components based on atmospheric scattering theory. 
Physically informed features are combined with learned representations obtained from Convolutional Neural Networks (CNNs) for local degradation pattern identification and Vision Transformers (ViT) for global contextual modelling through self-attention mechanisms. Meteorological factors such as temperature, winds, and atmospheric pressure are integrated to furnish environmental context. A random forest regressor executes multimodal fusion and final estimation from these diverse feature streams. The quantitative assessment of three datasets\u0026mdash;Visibility Image Dataset I (daytime), Dataset II (night-time), and Dataset III (mixed climatic conditions)\u0026mdash;results in a Root Mean Squared Error (RMSE) of 117 and a Mean Absolute Error (MAE) of 68.81. This indicates a 22% decrease in error relative to single physical feature methodologies (RMSE\u0026thinsp;\u0026asymp;\u0026thinsp;150). Ablation experiments illustrate the impact of each component on total performance. The approach overcomes shortcomings in current methodologies by integrating local and global feature extraction, including explicit physical models with learned representations, and facilitating continuous regression instead of discrete classification. Cross-dataset validation demonstrates consistent performance across several environmental contexts, encompassing both urban and rural environments with differing availability of reference objects. 
The findings indicate practical usefulness for aviation safety systems, transportation management infrastructure, and atmospheric monitoring networks that necessitate dependable real-time visibility evaluation under adverse meteorological situations.\u003c/p\u003e","manuscriptTitle":"A Deep Hybrid CNN–ViT Architecture Incorporating Advanced 3D Features for the Estimation of Visibility and Runway Visual Range","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-05 18:55:16","doi":"10.21203/rs.3.rs-8678337/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewersInvited","content":"","date":"2026-01-28T04:06:38+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-01-24T11:06:15+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-01-24T11:05:54+00:00","index":"","fulltext":""},{"type":"submitted","content":"Signal, Image and Video Processing","date":"2026-01-23T10:39:53+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"signal-image-and-video-processing","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"sivp","sideBox":"Learn more about [Signal, Image and Video Processing](http://link.springer.com/journal/11760)","snPcode":"11760","submissionUrl":"https://submission.nature.com/new-submission/11760/3","title":"Signal, Image and Video Processing","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"ea5eab80-d567-4522-9f79-846032391552","owner":[],"postedDate":"February 5th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-02-05T18:55:16+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-05 18:55:16","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8678337","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8678337","identity":"rs-8678337","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}