Data-driven Combination of METAR Observations and CAMS Reanalysis Aerosols to Enhance Satellite Retrieval of Surface Solar Irradiance

Arindam Roy, Detlev Heinemann, Marion Schroedter-Homscheidt, Jorge Lezaca

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7820256/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 16 Feb, 2026 in Scientific Reports. Version 1 posted 16

Abstract

Accurate solar irradiance forecasts are vital for photovoltaic (PV) power prediction, especially in tropical and subtropical regions affected by dust, wildfire smoke, and pollution.
Yet, aerosol detection from satellites is often obstructed by clouds, AErosol RObotic NETwork (AERONET) stations are sparsely distributed, and climatological datasets cannot capture intra-day variability. Global products such as the Copernicus Atmosphere Monitoring Service (CAMS) provide broad coverage but miss local events due to coarse resolution and uncertainties in the underlying emission database. In this study, atmospheric parameters from METeorological Aerodrome Report (METAR) observations and CAMS reanalysis are used as inputs to data-driven models trained on normalized pseudo global horizontal clear sky irradiance (\({GHI}_{CS}^{*}\)) targets. Models tested include gradient boosting methods, Random Forests, neural networks, and a quantum variational circuit. The predicted global horizontal clear sky irradiance (\({GHI}_{CS}\)) is then used in the Heliosat-3 method, which uses the satellite-derived cloud index (CI) to estimate the all-sky global horizontal irradiance (GHI). The result is benchmarked against the all-sky GHI output of Heliosat-3 coupled with \({GHI}_{CS}\) from the physics-based McClear model. Results show the largest root mean squared error (RMSE) reductions of 3–7% under visibility of 6–8 km, with Neural Network and eXtreme Gradient Boosting (XGBoost) achieving the highest overall gain (2.6%). During dust and sand events, performance improves substantially, with Light Gradient-Boosting Machine (LightGBM) achieving a 22% reduction. These findings demonstrate the value of a \({GHI}_{CS}^{*}\)-based machine learning approach for improving solar irradiance estimates in aerosol-rich environments.

Subjects: Earth and environmental sciences/Climate sciences; Earth and environmental sciences/Environmental sciences
Keywords: Satellite-estimated solar irradiance, aerosol, classical and quantum learning, CAMS, McClear, METAR

1.
Introduction

The integration of solar energy into the electricity grid presents unique challenges due to the fluctuating nature of solar irradiance, which can significantly affect power generation and grid stability. Accurate day-ahead and intra-day forecasts of all-sky global horizontal irradiance (GHI) are therefore essential: they support power system scheduling, reduce balancing costs, and help photovoltaic (PV) operators avoid penalties arising from forecast–production mismatches [1–3]. While day-ahead forecasts typically rely on numerical weather prediction (NWP), intra-day corrections are often derived from geostationary satellite imagery, which provides more accurate cloud information owing to its higher resolution [4, 5]. Clouds remain the dominant source of irradiance variability [6], but extreme aerosol events, such as dust storms, biomass burning, or urban smog, can also cause GHI reductions comparable to cloud cover [7–9]. These effects are particularly significant in tropical and subtropical regions, including the Indian subcontinent, eastern China, and Indochina, where some of the highest PV deployment rates coincide with frequent aerosol episodes [10–12].

Estimating global horizontal clear-sky irradiance (\({GHI}_{CS}\)) is a critical step for satellite-based all-sky GHI retrieval. Conventional approaches rely on aerosol optical depth (AOD) inputs for radiative transfer or empirical models [13, 14]. Information on atmospheric aerosol concentration can be obtained at different spatio-temporal resolutions from satellite observations, numerical modelling, ground measurements or climatological datasets [15–18]. However, aerosol information is imperfect across all available sources. Satellite retrievals are limited by cloud contamination, the choice of aerosol model and assumptions about aerosol properties [19].
Ground-based networks such as AERONET provide high-quality AOD measurements, but coverage is sparse and point-based observations are often unrepresentative [20, 21]. Climatological datasets cannot capture rapid intra-day aerosol fluctuations [18].

A widely used tool for \({GHI}_{CS}\) estimation is the McClear model [22], which computes \({GHI}_{CS}\) using AOD and other inputs from the Copernicus Atmosphere Monitoring Service (CAMS). McClear has been shown to perform well under many conditions globally [23]. However, it inherits the limitations of the CAMS aerosol data. CAMS provides global, hourly, 40 km–resolution fields and offers valuable large-scale coverage, but its spatial and temporal resolution makes it less suited to representing local or short-lived aerosol events. Regional assessments have reported systematic biases, such as underestimation of AOD in high-load conditions in Australia [24], misrepresentation of fine-mode aerosols over the Indo-Gangetic Basin [25], and inconsistencies in regions strongly influenced by biomass burning, desert dust, or mixed aerosol sources including Brazil and the Eastern Mediterranean [26–28]. Therefore, the following uncertainties can be identified with the different sources of aerosol information: (i) limitations in retrieval and numerical modelling algorithms, (ii) naïve aerosol constancy assumptions in climatology, and (iii) limited representativeness of sparsely available ground measurements.

Surface horizontal visibility has long been recognized as a proxy for aerosol extinction [29–35], with early work such as the Elterman model establishing a link between visibility and vertical aerosol profiles. Modern retrievals have refined these methods with empirical corrections, optimization techniques, and calibration against satellite AOD products [36–40].
Importantly, visibility is routinely reported in METeorological Aerodrome Reports (METAR) at airports worldwide, yielding a dense, near-real-time dataset that far surpasses the spatial coverage of dedicated aerosol networks [41–43]. Yet, visibility is influenced not only by aerosols but also by humidity, fog, precipitation, and wind [44–46]. As a result, its correlation with ground-based AOD is modest except under dust-dominated conditions [47, 48], and the interaction between AOD and relative humidity (RH) further complicates the relationship [49, 50]. On its own, visibility is therefore insufficient as a direct substitute for AOD, but it holds promise when integrated with complementary datasets.

Machine learning (ML) offers a flexible framework for combining heterogeneous inputs and extracting non-linear relationships that elude traditional parameterizations [51, 52]. Previous studies have applied tree-based models such as decision trees and random forests to estimate visibility from monthly or daily aerosol information and vice versa [41, 52], but at these time scales they fail to resolve rapid aerosol changes, leading to biased irradiance forecasts [53, 54]. More advanced ML methods, including gradient boosting frameworks (e.g., XGBoost, LightGBM, CatBoost) and Neural Networks, offer improved performance, scalability, and robustness across diverse datasets [55–59]. Quantum variational circuits (QVCs) have also been proposed for ML applications [60, 61], though their application to solar energy meteorology remains largely exploratory.

Despite these advances, few studies have systematically explored the integration of real-time visibility (METAR) with reanalysis products (CAMS) to improve clear-sky irradiance estimation. This study addresses that gap with a data-driven framework for estimating \({GHI}_{CS}\).
Specifically, it:
- Presents a data-driven approach using machine learning (ML) models for estimating global horizontal clear sky irradiance (\({GHI}_{CS}\)) by combining METAR and CAMS aerosol datasets.
- Presents an approach for obtaining normalized pseudo global horizontal clear sky irradiance (\({GHI}_{CS}^{*}\)) targets using ground measured GHI, satellite estimated cloud index (CI) and the top of atmosphere (TOA) irradiance, in order to compensate for the lack of direct measurements of \({GHI}_{CS}\) in all-weather situations.
- Benchmarks the accuracy of satellite-estimated all-sky GHI derived using the \({GHI}_{CS}\) output from the ML models utilizing METAR and CAMS data against the satellite-estimated all-sky GHI derived using the McClear model, at four unseen sites.
- Validates the improvement in estimated all-sky GHI across a range of visibility situations and aerosol-related METAR weather codes.
- Validates the improvement in estimated all-sky GHI across a range of relative humidity (RH) conditions.

2. Data and Method

2.1. Ground measured GHI

Ground observations of GHI are obtained from five stations located in regions strongly influenced by diverse aerosol conditions (Table 1). Data for Cairo, Gurgaon, Da Nang and Chiba are obtained from the CAMS Evaluation and Quality Control database hosted at MinesParis [62], while Xianghe measurements are retrieved via the BSRN FTP server [63]. The Cairo and Xianghe stations are equipped with Kipp & Zonen CM21 secondary standard class A pyranometers, Gurgaon uses an Eppley PSP pyranometer, Da Nang is equipped with a Hukseflux SR20 secondary standard class A pyranometer, and the Chiba SKYNET station employs a POM-01 sky radiometer. Table 1 provides a summary of the GHI measurement stations used in this study. All datasets are quality controlled using the libinsitu software package [62]. This includes removal of values flagged as invalid by the physically possible limit (PPL) and extremely rare limit tests.
Following quality control, the GHI datasets are averaged from 1-minute to 30-minute resolution before being used in this study.

These stations are selected because they are located in regions characterized by frequent and diverse aerosol loading:
- Cairo: Strongly affected by a mix of urban emissions, biomass burning, and desert dust [64]. Dust storms, especially in spring, contribute to high AOD and influence cloud properties [65]. A unique "urban haze" composed of submicron ammonium chloride (from biomass burning) and supermicron dust has been reported [66].
- Gurgaon (Delhi): High aerosol concentrations result from industrial and vehicular emissions, biomass burning and dust storms, with significant seasonal variations. Biomass burning dominates in the post-monsoon and winter periods [67, 68], industrial emissions persist year-round with peaks after the monsoon [69], and dust storms are common during the pre-monsoon and monsoon periods [70].
- Xianghe (Beijing): Summer exhibits the highest AOD and fine-mode fraction due to urban haze [71], winter has moderate AOD with increased coarse-mode aerosols from heating activities [72], and spring is influenced by desert dust [71].
- Chiba (Tokyo): Organic aerosols dominate the composition (40–60%) across seasons, with daytime peaks [73]. Diesel exhaust is a major source of fine particulate matter [74].
- Da Nang: Rice straw burning during the harvest season, from late summer to early autumn, elevates PM2.5 and NO2 [75]. Black carbon from quarrying and vehicular pollution peaks in the dry season (June–July) [76].
Table 1 Stations providing GHI ground observations

| Site | Network | Location | Source of data | Time period | Distance to nearest airport/METAR observation (km) |
|---|---|---|---|---|---|
| Cairo | enerMENA | 30.04 °N, 31.01 °E | http://tds.webservice-energy.org/ | 2015–2019 | 39 |
| Gurgaon | BSRN | 28.42 °N, 77.16 °E | http://tds.webservice-energy.org/ | 2018–2019 | 12 |
| Da Nang | ESMAP | 16.01 °N, 108.19 °E | http://tds.webservice-energy.org/ | 2017–2019 | 8 |
| Xianghe | BSRN | 39.75 °N, 116.96 °E | ftp://ftp.bsrn.awi.de/ | 2010–2015 | 71 |
| Chiba | SKYNET | 35.63 °N, 140.10 °E | http://tds.webservice-energy.org/ | 2015–2017 | 20 |

2.2. Cloud observations from satellites

The Surface Solar Radiation Data Set – Heliosat (SARAH-3) [77], available at 30-minute temporal resolution on a 0.05° x 0.05° regular grid, is generated by applying the MAGICSOL algorithm to images from Meteosat, located at 0 °E. MAGICSOL derives the effective cloud albedo (CAL) using the original Heliosat method [78]. In the SARAH-3 dataset, CAL is the variable corresponding to CI. For this study, CAL values for the Cairo IEA-PVPS station (30.04 °N, 31.01 °E) are extracted via spatial interpolation for the time period 2015–2019 (Table 2). CAL is converted to the clear sky index (\({k}_{c}\)) following the procedure in [79], summarized in Equation 1.

$$
{k}_{c}=\begin{cases}
1.2, & CI\le -0.2\\
1-CI, & -0.2<CI\le 0.8\\
1.661-1.7814\,CI+0.7250\,{CI}^{2}, & 0.8<CI\le 1.05\\
0.09, & CI>1.05
\end{cases}
\tag{1}
$$

where,
\({k}_{c}\): clear sky index
\(CI\): cloud index

Complementary datasets of cloud opacity at 30-minute resolution for Xianghe, Chiba, Gurgaon and Da Nang are obtained from the Solcast platform [80].
Solcast does not release the full details of its proprietary methodology; however, published studies indicate that its approach is based on semi-empirical retrievals of cloud properties from geostationary satellite imagery [81]. In line with prior literature [82], cloud opacity is considered equivalent to CI (or CAL), and is therefore converted to \({k}_{c}\) using Equation 1.

Table 2 Satellite-estimated products used in this study

| Site name | Satellite product name | Source |
|---|---|---|
| Cairo | Cloud albedo | Online repository of the Satellite Application Facility on Climate Monitoring (CM-SAF), SARAH-3 dataset |
| Gurgaon | Cloud opacity | Solcast web platform and API |
| Da Nang | Cloud opacity | Solcast web platform and API |
| Xianghe | Cloud opacity | Solcast web platform and API |
| Chiba | Cloud opacity | Solcast web platform and API |

2.3. Aerosols and other atmospheric parameters

2.3.1. McClear Clear Sky Irradiance and CAMS Aerosol

\({GHI}_{CS}\) for the sites used in this study is obtained from the McClear service of CAMS [22]. The atmospheric composition input into the McClear model comes from the CAMS global reanalysis, which has a horizontal resolution of ~40 km and a temporal resolution of 3 hours [83]. In addition, McClear internally calculates solar geometry parameters and the top of atmosphere (TOA) irradiance. For this study, McClear \({GHI}_{CS}\) and atmospheric composition data are retrieved for each site through the CAMS Atmosphere Data Store (ADS) using cdsapi in expert mode. Outputs are requested at 30-minute temporal resolution, consistent with the temporal resolution of the METAR data. As the CAMS reanalysis is available at 3-hourly resolution, the 30-minute values are obtained by assuming constant atmospheric conditions within each 3-hour window. The full list of parameters used in this study is summarized in Table 3.
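The constant-within-window assumption can be illustrated with a short pandas sketch. It is not the retrieval code used in the study: the function name `hold_to_30min`, the column name `aod_total`, the sample values, and the convention that each 3-hourly timestamp marks the start of its window are all assumptions made for illustration.

```python
import pandas as pd

def hold_to_30min(cams_3h: pd.DataFrame) -> pd.DataFrame:
    """Repeat each 3-hourly value over the six 30-minute steps of its window.

    Assumes (hypothetically) that every timestamp marks the START of a window.
    """
    # Extend the index so that the last 3-hour window is filled as well.
    end = cams_3h.index[-1] + pd.Timedelta(hours=3) - pd.Timedelta(minutes=30)
    target_index = pd.date_range(cams_3h.index[0], end, freq="30min")
    # Reindexing leaves gaps between the 3-hourly steps; forward-fill holds
    # the last available value constant until the next one arrives.
    return cams_3h.reindex(target_index).ffill()

# Illustrative 3-hourly total AOD values (made-up numbers).
cams = pd.DataFrame(
    {"aod_total": [0.30, 0.45]},
    index=pd.to_datetime(["2019-01-01 00:00", "2019-01-01 03:00"]),
)
cams_30min = hold_to_30min(cams)
```

A step-hold is the simplest choice consistent with the description above; interpolating between the 3-hourly values would be a different assumption.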
Table 3 Summary of the CAMS Global Reanalysis parameters used in this study

| Parameter | Description |
|---|---|
| TOA | Irradiation on a horizontal plane at the top of atmosphere |
| sza | Solar zenith angle in degrees |
| tco3 | Total column content of ozone in Dobson units |
| tcwv | Total column content of water vapour in kg/m² |
| AOD BC | Partial aerosol optical depth at 550 nm for black carbon |
| AOD DU | Partial aerosol optical depth at 550 nm for dust |
| AOD SS | Partial aerosol optical depth at 550 nm for sea salt |
| AOD OR | Partial aerosol optical depth at 550 nm for organic matter |
| AOD SU | Partial aerosol optical depth at 550 nm for sulphate |

2.3.2. METAR

Atmospheric parameters recorded in METAR reports once every 30 minutes are obtained for the airport closest to each of the five sites. The datasets shown in Table 4 are downloaded from the Iowa Environmental Mesonet repository [84] maintained by the Iowa State University of Science and Technology, which holds a long-term archive of weather parameters from airport Automated Surface/Weather Observation Stations (ASOS/AWOS). The temperature, wind speed and visibility measurements are converted to SI units, i.e., °C, m/s and km. Visibility measurements for METAR reports at airports commonly use transmissometers and forward-scatter sensors [85]. Quality checks involve comparing sensor data with human observations and reference instruments.

Table 4 Atmospheric parameters from METAR data

| Parameter | Description |
|---|---|
| relh | RH in % |
| vsby | Visibility in miles |
| wxcodes | Significant weather observations |

Furthermore, METAR provides observations of significant weather. The classes Haze (HZ), Smoke (FU), Widespread Dust (DU), Sand (SA), Sandstorm (SS), Duststorm (DS) and Dust/Sand whirls (PO) are related to aerosols and are used for diagnostic classification of the results.

3. Machine Learning setup

3.1.
Training-validation-test data split

Table 5 Availability of quality controlled datapoints for the analysis (number of quality checked datapoints)

| Site | Training and validation | Testing |
|---|---|---|
| Cairo | 20,914 | - |
| Gurgaon | - | 6,222 |
| Da Nang | - | 12,800 |
| Xianghe | - | 16,268 |
| Chiba | - | 14,682 |

Cairo is the site with the largest number of datapoints and is characterized by both dust and anthropogenic pollution conditions. Therefore, it is chosen for the development of the ML models. Two-thirds of the available datapoints from Cairo, as shown in Table 5, are used for training the models and the remaining one-third for validation and hyperparameter tuning. The training-validation split is not done randomly but chronologically, to ensure that datapoints from the same day do not appear in both the training and validation datasets. Otherwise, because the atmospheric situation is similar over the course of a day, the model may reproduce memorized results instead of learning. The data from the remaining four sites are used to test the performance of the models on previously unseen sites. This is done to check whether the trained models are able to overcome site-dependency.

3.2. Predictor preparation

In order to reduce the computational load, the CAMS AOD values of the different species at 550 nm are not entered separately as inputs into the models. Instead, they are summed into a single predictor. The input parameters of the ML models are:
(i) Total AOD at 550 nm, the sum of the CAMS species AODs.
(ii) Visibility measurements from the airport, which provide local information on the atmospheric aerosol loading at the surface.
(iii) RH, as it is correlated to the presence of fog and mist, which are known to occur with smog.
(iv) Solar zenith angle (SZA), as the air mass that the TOA irradiance traverses, and in which it is attenuated before reaching the surface, is approximately inversely proportional to the cosine of SZA.
(v) Solar azimuth angle, as it is correlated to the diurnal movement of the Sun.
(vi) Total column water vapour (TCWV), as it is found to be a significant contributor to the reduction of GHI, and its attenuating effect increases with increasing SZA [86].

3.3. Target preparation

The cloud-free component of irradiance in all-sky situations cannot be measured directly. Only GHI measurements taken during cloudless periods are equivalent to \({GHI}_{CS}\). Various approaches for filtering clear sky situations are found in the literature; most of them use a clear sky model, require all three components of solar irradiance, or rely on statistical approaches [87–89]. On the one hand, such filtering considerably reduces the number of datapoints available for training and validation. On the other hand, it introduces the problem that sudden changes in irradiance due to aerosol loading could be classified as cloudy situations and eliminated from the dataset. Severe dust storms and smog events are often associated with the presence of clouds and low-level stratus respectively, and therefore it is often difficult to isolate the aerosol impact from the cloud impact in such situations. Furthermore, as \({GHI}_{CS}\) is ultimately used for deriving satellite-estimated GHI from CI in both clear and cloudy situations, it is necessary to evaluate its performance in both situations. For these reasons, a normalized pseudo global horizontal clear sky irradiance (\({GHI}_{CS}^{*}\)) is derived, starting from the expression of \({GHI}_{CS}\) at ground level shown in Equation 2.

$$
{GHI}_{CS}=\frac{GHI}{{k}_{c}}\approx \frac{{GHI}_{ground}}{\left(1-n\right)}
\tag{2}
$$

where,
\({GHI}_{ground}\): ground measured GHI
\({k}_{c}\): clear sky index
\({GHI}_{CS}\): clear sky GHI
\(n\): satellite estimated cloud index or cloud opacity

\({GHI}_{ground}\) is obtained from surface measurements and CI from satellite images.
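As a minimal sketch of how such a target could be computed from Equation 2 and then normalized by the TOA irradiance, consider the NumPy function below. The function name and the two guard thresholds (`max_cloud_index`, `min_toa`) are assumptions introduced here to avoid dividing by values near zero; they are not taken from the study.

```python
import numpy as np

def pseudo_clear_sky_target(ghi_ground, cloud_index, toa,
                            max_cloud_index=0.8, min_toa=1.0):
    """Normalized pseudo clear-sky target: GHI_ground / ((1 - n) * TOA).

    max_cloud_index and min_toa are hypothetical guards; samples outside
    them are returned as NaN instead of being divided by a tiny number.
    """
    ghi_ground, n, toa = np.broadcast_arrays(
        np.asarray(ghi_ground, dtype=float),
        np.asarray(cloud_index, dtype=float),
        np.asarray(toa, dtype=float),
    )
    valid = (n < max_cloud_index) & (toa > min_toa)
    target = np.full(ghi_ground.shape, np.nan)
    # Equation 2 gives the pseudo clear-sky GHI; dividing by TOA normalizes it.
    target[valid] = ghi_ground[valid] / ((1.0 - n[valid]) * toa[valid])
    return target

# Illustrative values in W/m^2 (made-up numbers).
target = pseudo_clear_sky_target(
    ghi_ground=[600.0, 300.0, 50.0],
    cloud_index=[0.0, 0.5, 0.95],
    toa=[1000.0, 1000.0, 1000.0],
)
```

In this example the clear and the half-cloudy samples map to the same target value, while the nearly overcast sample is discarded by the assumed guard.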
Equation 2 does not hold in situations where the cloudiness seen by the pyranometer at the surface does not match the cloudiness seen from the satellite, owing to the effects of parallax and spatial resolution. However, it is expected that the statistics-based machine learning methods will be able to handle these outlier situations. Furthermore, the above expression is normalized by the TOA irradiance in order to restrict the \({GHI}_{CS}^{*}\) values to the range \(\left[0,1\right]\), as shown in Equation 3, which is more efficient for training ML models. Overshooting of GHI values beyond the TOA irradiance due to cloud enhancement is neglected in this approach, which is justified by the 30-minute averages of GHI analyzed.

$$
{GHI}_{CS}^{*}=\frac{{GHI}_{CS}}{TOA}=\frac{{GHI}_{ground}}{\left(1-n\right)\times TOA}
\tag{3}
$$

3.4. Machine Learning models

Popular models for multi-variate regression are used in this analysis, including gradient boosting methods – (i) XGBoost, (ii) LightGBM, (iii) CatBoost, tree-based methods – (iv) Extra Trees, (v) Random Forest, and (vi) Neural Network. Furthermore, the more recent approach of using a QVC for machine learning is also explored. The following subsections provide a brief description of each model.

3.4.1. EXtreme Gradient Boosting (XGBoost)

XGBoost leverages the principles of boosting ensemble techniques to enhance prediction accuracy. It operates on the premise of sequentially adding weak learners (typically decision trees) to improve the performance of the overall model. XGBoost employs a unique regularization approach and handles missing values internally while optimizing computation speed and model robustness through parallel processing. An efficient and scalable Python implementation of XGBoost published by the original authors is used in this analysis [90].

3.4.2.
Light Gradient-Boosting Machine (LightGBM)

LightGBM, developed by Microsoft, improves upon traditional gradient boosting frameworks by integrating Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) techniques. These innovations allow LightGBM to handle vast datasets effectively while reducing memory usage and computation time. Similar to XGBoost, LightGBM uses a decision-tree-based learning algorithm but optimizes the training process by exclusively focusing on the gradients of the chosen data subset. The latest version of the official LightGBM Python implementation from Microsoft is used in this analysis [91].

3.4.3. Categorical Boosting (CatBoost)

CatBoost is a gradient boosting algorithm that uses ordered boosting to reduce prediction shift and target leakage [92]. In several studies, CatBoost achieved competitive or enhanced accuracy on tasks with imbalanced or categorical data, although its training speed was generally slower than that of LightGBM and XGBoost [93].

3.4.4. Random Forest

Random Forests combine many decision trees to improve predictions [94]. They build trees by drawing bootstrap samples and choosing splits that optimize measures such as impurity or variance reduction.

3.4.5. Extremely Randomized Trees (Extra-Trees)

Extra-Trees averages the predictions from multiple decision trees, obtained by partitioning the input space with randomly generated splits [95]. Unlike Random Forests, Extra-Trees work on the full training set and select both the splitting feature and the split point at random [58]. Empirical work indicates that in high-dimensional or noisy settings Extra-Trees may match or exceed the performance of Random Forests.

3.4.6. Neural Network (NeuralNetTorch)

A neural network consists of multiple layers of perceptrons or neurons, which learn to transform input data into the desired output through a process of weighted connections.
It utilizes backpropagation to adjust the weights based on the error between predicted and actual outputs, which facilitates learning intricate patterns in data. A PyTorch implementation of the Neural Network is used in this analysis [96].

3.4.7. Quantum Variational Circuit (QVC)

QVCs encode classical data into quantum states and employ a parameterized quantum circuit (ansatz) to produce the predictions [93]. The data encoder circuit determines the frequency spectrum of the quantum model, which in turn affects its expressivity and thereby its ability to learn different types of functions [76]. In this study, a feature encoder with learnable parameters is used, as shown in Fig. 1, for the chosen input predictors \({x}_{m}\). The inputs are encoded through parameterized rotations \({R}_{X}\left({\theta}_{mX}\cdot {x}_{m}\right)\), \({R}_{Y}\left({\theta}_{mY}\cdot {x}_{m}\right)\) and \({R}_{Z}\left({\theta}_{mZ}\cdot {x}_{m}\right)\), as it has been shown that angle encoding with learnable parameters can help reduce circuit depth [97]. \({\theta}_{mX}\), \({\theta}_{mY}\) and \({\theta}_{mZ}\) are the learnable rotation parameters corresponding to the input feature \({x}_{m}\).

4. Results and Discussion

As already mentioned, it is not straightforward to evaluate the quality of \({GHI}_{CS}\) estimates in all-sky situations because \({GHI}_{CS}\) cannot be directly measured in cloudy situations. Therefore, the \({GHI}_{CS}\) estimates obtained from the ML models are evaluated by using them in the Heliosat-3 method and validating the satellite-estimated all-sky GHI derived from them against the ground measured GHI. \({GHI}_{CS}\) from the physics-based McClear model, which utilizes CAMS AOD, is also used in the Heliosat-3 method to produce satellite-estimated all-sky GHI, which serves as the reference benchmark. All GHI datasets are averaged to 30-minute resolution prior to validation.
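The evaluation chain described above can be sketched in a few lines, assuming the ML model outputs the normalized target \({GHI}_{CS}^{*}\): the prediction is rescaled by the TOA irradiance to give \({GHI}_{CS}\), which is then multiplied by the clear sky index from Equation 1 to give all-sky GHI. The function names are hypothetical and this is not the Heliosat-3 implementation itself. The quadratic coefficients are taken as printed in Equation 1; with the constant 1.661 the quadratic branch does not join the linear branch continuously at CI = 0.8, so they are exposed as arguments in case the cited reference [79] gives different values.

```python
import numpy as np

def clear_sky_index(ci, c0=1.661, c1=-1.7814, c2=0.7250):
    """Piecewise CI -> k_c conversion following Equation 1 (coefficients as printed)."""
    ci = np.asarray(ci, dtype=float)
    return np.select(
        [ci <= -0.2, ci <= 0.8, ci <= 1.05],   # evaluated in order, first match wins
        [np.full_like(ci, 1.2), 1.0 - ci, c0 + c1 * ci + c2 * ci**2],
        default=0.09,
    )

def all_sky_ghi(target_pred, toa, ci):
    """All-sky GHI = k_c(CI) * GHI_CS, with GHI_CS = predicted target * TOA."""
    ghi_cs = np.asarray(target_pred, dtype=float) * np.asarray(toa, dtype=float)
    return clear_sky_index(ci) * ghi_cs

# Illustrative inputs (made-up numbers): the same predicted target under
# a very clear, a partly cloudy and a fully overcast cloud index.
ghi = all_sky_ghi(
    target_pred=[0.7, 0.7, 0.7],
    toa=[1000.0, 1000.0, 1000.0],
    ci=[-0.3, 0.25, 1.2],
)
```

Swapping `target_pred * toa` for the McClear \({GHI}_{CS}\) in the same function would give the benchmark series against which the ML-based estimate is compared.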
The general performance for all the test datapoints used in this analysis is evaluated using the coefficient of determination (R²) and the root mean square error (RMSE), shown in Eqs. 4 and 5 respectively. The R² metric indicates the overall fit of the estimated values to the measured values. RMSE shows the average deviation of the estimated values with strong emphasis on large errors. The utility of the additional METAR data is analyzed by evaluating the percentage improvement in RMSE due to the ML models in comparison to the McClear model, across the available range of visibility values.

$$
{R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({y}_{target}^{i}-{y}_{model}^{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{target}^{i}-\frac{1}{n}\sum_{j=1}^{n}{y}_{target}^{j}\right)}^{2}}
\tag{4}
$$

$$
RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{model}^{i}-{y}_{target}^{i}\right)}^{2}}
\tag{5}
$$

The overall all-sky RMSE of the Heliosat-3 estimated GHI using the \({GHI}_{CS}\) values obtained from the ML models is slightly lower than the RMSE obtained when \({GHI}_{CS}\) from McClear is used (Fig. 2). Out of the models tested in this study, XGBoost and NeuralNetTorch show the largest reduction in RMSE. While the QVC shows the least improvement in RMSE, it must also be considered that it uses a very low number of learnable parameters (188) in comparison to the other models such as the Neural Network (which uses 50,561 learnable parameters). Also, the number of layers had to be restricted due to the computational requirements. Most of the ML models did not perform well at the Xianghe site. This could be attributed to the fact that the visibility measurement station was the farthest from the GHI measurement station among all the sites considered here (shown in Table 1). The R² metric (Table 6) shows that the accuracy of the all-sky GHI derived using the different ML models and McClear is comparable. The overall impact is low, but positive.
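For illustration, the two metrics in Eqs. 4 and 5 and the percentage RMSE improvement per visibility bin can be written as follows. This is a sketch under assumptions, not the evaluation code of the study: the function names, the bin edges and the sample values are hypothetical.

```python
import numpy as np

def rmse(y_model, y_target):
    """Root mean square error (Eq. 5)."""
    y_model = np.asarray(y_model, dtype=float)
    y_target = np.asarray(y_target, dtype=float)
    return float(np.sqrt(np.mean((y_model - y_target) ** 2)))

def r2(y_model, y_target):
    """Coefficient of determination (Eq. 4)."""
    y_model = np.asarray(y_model, dtype=float)
    y_target = np.asarray(y_target, dtype=float)
    ss_res = np.sum((y_target - y_model) ** 2)
    ss_tot = np.sum((y_target - y_target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse_improvement_by_bin(y_ml, y_ref, y_obs, visibility_km, bin_edges):
    """Percentage RMSE reduction of the ML estimate relative to the reference, per bin."""
    y_ml, y_ref, y_obs = (np.asarray(a, dtype=float) for a in (y_ml, y_ref, y_obs))
    # np.digitize returns k such that bin_edges[k-1] <= visibility < bin_edges[k].
    idx = np.digitize(np.asarray(visibility_km, dtype=float), bin_edges)
    result = {}
    for k in range(1, len(bin_edges)):
        mask = idx == k
        if not mask.any():
            continue  # no samples in this bin
        ref = rmse(y_ref[mask], y_obs[mask])
        if ref == 0.0:
            continue  # relative improvement is undefined
        result[(bin_edges[k - 1], bin_edges[k])] = (
            100.0 * (ref - rmse(y_ml[mask], y_obs[mask])) / ref
        )
    return result

# Illustrative values (made-up numbers): observed, reference and ML-based GHI.
improvement = rmse_improvement_by_bin(
    y_ml=[105, 205, 310, 410],
    y_ref=[110, 210, 320, 420],
    y_obs=[100, 200, 300, 400],
    visibility_km=[6.5, 7.0, 9.0, 9.5],
    bin_edges=[6, 8, 10],
)
```

Positive values indicate that the ML-based estimate has a lower RMSE than the reference in that bin; negative values indicate a deterioration.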
Table 6 R² of the satellite-estimated GHI against ground measured GHI using \({GHI}_{CS}\) from different models

| CatBoost | ExtraTrees | LightGBM | NeuralNet | QVC | RandomForest | XGBoost | McClear |
|---|---|---|---|---|---|---|---|
| 0.92 | 0.92 | 0.92 | 0.92 | 0.91 | 0.92 | 0.92 | 0.91 |

Consistent improvement in RMSE is observed for visibility values between 6 and 10 km (Fig. 3). The value of 10 km is the operational threshold of visibility reporting at airports, beyond which no significant weather phenomena such as haze, smog, dust storms or smoke are found according to WMO guidelines [98]. However, it is also noticeable that for visibility values lower than 6 km, limited or no improvement is observed. Such drastically low values of visibility are often caused by the presence of hydrometeors. Since the cloud sources of hydrometeors are already taken into account by the CI parameter, the lower visibility values may overcompensate for the reduction in GHI. Although the dew point temperature and RH parameters are used as inputs in order to eliminate such situations, the filtering may not have been effective enough. Large errors in visibility-derived AOD in situations with higher RH were noted in [37]. In general, the observations in this study are in line with previous findings that show that visibility is not a perfect proxy for AOD [99]. The largest improvements in RMSE are observed within the visibility range of 6 to 8 km.

Figure 4 shows the improvement or deterioration of the RMSE in Heliosat-3 estimated GHI, when using \({GHI}_{CS}\) values from the ML models instead of the physics-based McClear model, for aerosol-relevant significant weather situations classified in the METAR data. The largest and most consistent improvement in RMSE is observed in the presence of dust and sand aerosol with all ML models. In particular, the LightGBM model shows the highest reduction in RMSE (almost 20%). Only three models, CatBoost, LightGBM and XGBoost, show a significant reduction of RMSE during smoke events.
None of the ML models showed an improvement in RMSE during situations with haze. The lowest visibility values, ranging from 1 to 3 km, are observed for weather situations with smoke (FU). Smoke particles are typically small, which leads to a more effective extinction of light at shorter wavelengths and hence to a greater reduction of visibility [ 100 ]. Dust particles, which are often larger [ 101 ], tend to scatter light less efficiently but can still cause significant attenuation at high concentrations. Depending on the distance traveled, larger particles are removed by dry deposition. This explains the larger range of visibility values, between 2.5 and 5.5 km, observed during dust and sand aerosol events. Haze (HZ) primarily consists of dispersed secondary aerosols, which can originate from anthropogenic sources as well as from biomass burning [ 102 ]. Because haze concentrations are lower than smoke concentrations at the source of origin, a higher average visibility is observed during haze conditions in Fig. 4.

The improvement in RMSE of the Heliosat-3 estimated GHI shows a bimodal distribution with respect to the RH values (Fig. 5). For very low and high RH values, an improvement in RMSE is observed. However, almost no improvement or even a deterioration is observed for intermediate values of RH. The existing literature shows that variations in visibility are well correlated with variations in particulate matter or AOD under low RH conditions and display an inversely proportional or exponential relationship at higher RH values [ 103 ]. Hygroscopic aerosol types increase their size and scattering cross-section by absorbing moisture from the air, which leads to a greater visibility reduction at higher levels of RH. As the ML models are trained on visibility, AOD and RH datasets, this could account for the improvements in RMSE observed with the ML models at higher RH values.
However, for intermediate values of RH, visibility measurements are very sensitive to AOD or particulate matter concentrations [ 103 ].

5. Summary and Conclusion

This study introduced a machine learning (ML) framework for estimating global horizontal clear sky irradiance ( \(\:{GHI}_{CS}\) ) at 30-minute resolution by combining atmospheric parameters from the METeorological Aerodrome Report (METAR) with aerosol information from the Copernicus Atmosphere Monitoring Service (CAMS) reanalysis. To address the absence of direct \(\:{GHI}_{CS}\) measurements, a normalized pseudo global horizontal clear sky irradiance ( \(\:{GHI}_{CS}^{*}\) ) target was employed for model training. Models trained on data from Cairo were tested on four unseen sites in tropical and sub-tropical environments. When coupled with the Heliosat-3 model to derive all-sky GHI, the ML-derived \(\:{GHI}_{CS}\) values outperformed the physics-based McClear estimates overall. The Neural Network (NeuralNetTorch) and eXtreme Gradient Boosting (XGBoost) models yielded the most robust overall improvements, while the quantum variational circuit (QVC) achieved notable gains despite its limited number of parameters. The strongest benefits were observed for visibility values between 6 and 8 km. Large reductions in RMSE of up to 22% were observed during dust and sand aerosol events, with moderate improvements under smoke, while haze events showed no improvement. Performance also exhibited a bimodal dependence on relative humidity (RH), with gains most pronounced in the low and high RH regimes and little to no improvement in the intermediate range. This behavior likely reflects the changing relationship between visibility and RH, which is weak at low RH, becomes strongly inverse at high RH, and transitions nonlinearly in the mid-range.
These findings demonstrate that ML-based \(\:{GHI}_{CS}\) estimates using local METAR data offer a useful enhancement for existing satellite-based GHI estimation models, particularly in aerosol-rich regions where existing physics-based models face limitations due to spatial resolution. Looking ahead, expanding the training domain to multiple sites, incorporating aerosol-type specific AOD, and exploring domain adaptation techniques may further improve the accuracy of satellite-retrieved GHI. This approach holds promise for advancing operational PV power prediction and solar resource assessment in regions strongly impacted by aerosols.

Declarations

Funding
The work was supported with funding from the German Academic Exchange Service (DAAD).

Author Contribution
A.R and D.H conceived the idea and designed the study. M.SH formulated the evaluation techniques. J.L acquired the datasets and implemented Heliosat-3. A.R did the simulations with ML models. A.R, D.H and M.SH wrote the paper.

Data Availability
SARAH3, METAR, and ground observations used in this study are openly available for any purpose; SOLCAST data is available for research and education purposes. All download links are mentioned in the data section. The output data of the machine learning models will be made available upon request.

References
Al-Dahidi, S., Ayadi, O., Alrbai, M., Adeeb, J. (2019) Ensemble Approach of Optimized Artificial Neural Networks for Solar Photovoltaic Power Prediction. IEEE Access , 7 , 81741–81758. Khodayar, M., Mohammadi, S., Khodayar, M.E., Wang, J., Liu, G. (2020) Convolutional Graph Autoencoder: A Generative Deep Neural Network for Probabilistic Spatio-Temporal Solar Irradiance Forecasting. IEEE Trans. Sustain. Energy , 11 (2), 571–583. Najdawi, F.Z. and Villarreal, R. (2023) Utilizing the Vector Autoregression Model (VAR) for Short-Term Solar Irradiance Forecasting. EPE , 15 (11), 353–362. Edoli, E., Fiorenzani, S., Vargiolu, T.
(2016) Optimal Trading Strategies in Intraday Power Markets, in Optimization Methods for Gas and Power Markets: Theory and Cases (eds E. Edoli, S. Fiorenzani, T. Vargiolu), Palgrave Macmillan, London, pp. 161–184. Ewan D. Dunlop, Lucien Wald, Marcel Suri (2006) Solar Energy Resource Management for Electricity Generation from Local Level to Global Scale , Nova Science Publishers Inc. Yamasoe, M.A., do Rosário, N.M.E., Barros, K.M. (2017) Downward solar global irradiance at the surface in São Paulo city—The climatological effects of aerosol and clouds. JGR Atmospheres , 122 (1), 391–404. Kosmopoulos, P.G., Kazadzis, S., Taylor, M., Athanasopoulou, E., Speyer, O., Raptis, P.I., Marinou, E., Proestakis, E., Solomos, S., Gerasopoulos, E., Amiridis, V., Bais, A., Kontoes, C. (2017) Dust impact on surface solar irradiance assessed with model simulations, satellite observations and ground-based measurements. Atmos. Meas. Tech. , 10 (7), 2435–2453. Schafer, J.S., Eck, T.F., Holben, B.N., Artaxo, P., Yamasoe, M.A., Procopio, A.S. (2002) Observed reductions of total solar irradiance by biomass‐burning aerosols in the Brazilian Amazon and Zambian Savanna. Geophysical Research Letters , 29 (17). Costa, R.S., Martins, F.R., Pereira, E.B. (2016) Atmospheric aerosol influence on the Brazilian solar energy assessment: Experiments with different horizontal visibility bases in radiative transfer model. Renewable Energy , 90 , 120–135. Husar, R.B., Husar, J.D., Martin, L. (2000) Distribution of continental surface aerosol extinction based on visual range data. Atmospheric Environment , 34 (29-30), 5067–5078. Tantiwechwuttikul, R., Yarime, M., Ito, K. (2019) Solar Photovoltaic Market Adoption: Dilemma of Technological Exploitation vs Technological Exploration, in Technologies and Eco-innovation towards Sustainability II: Eco Design Assessment and Management (eds A.H. Hu, M. Matsumoto, T.C. Kuo, S. Smith), Springer Singapore, Singapore, pp. 215–227. 
Hermann, M., Heintzenberg, J., Wiedensohler, A., Zahn, A., Heinrich, G., Brenninkmeijer, C.A.M. (2003) Meridional distributions of aerosol particle number concentrations in the upper troposphere and lower stratosphere obtained by Civil Aircraft for Regular Investigation of the Atmosphere Based on an Instrument Container (CARIBIC) flights. J. Geophys. Res. , 108 (D3). Sun, X., Bright, J.M., Gueymard, C.A., Bai, X., Acord, B., Wang, P. (2021) Worldwide performance assessment of 95 direct and diffuse clear-sky irradiance models using principal component analysis. Renewable and Sustainable Energy Reviews , 135 , 110087. Kamath, H.G. and Srinivasan, J. (2020) Validation of global irradiance derived from INSAT-3D over India. Solar Energy , 202 , 45–54. Gueymard, C.A., Habte, A., Sengupta, M. (2018) Reducing Uncertainties in Large-Scale Solar Resource Data: The Impact of Aerosols. IEEE J. Photovoltaics , 8 (6), 1732–1737. Foyo-Moreno, I., Alados, I., Antón, M., Fernández-Gálvez, J., Cazorla, A., Alados-Arboledas, L. (2014) Estimating aerosol characteristics from solar irradiance measurements at an urban location in southeastern Spain. JGR Atmospheres , 119 (4), 1845–1859. Houborg, R., Soegaard, H., Emmerich, W., Moran, S. (2007) Inferences of all‐sky solar irradiance using Terra and Aqua MODIS satellite data. International Journal of Remote Sensing , 28 (20), 4509–4535. Remund, J., Wald, L., Lefèvre, M., Ranchin, T., Page, J.H. (2003) Worldwide Linke turbidity information, in Proceedings of ISES Solar World Congress 2003 . International Solar Energy Society (ISES), Göteborg, Sweden, 13 p. Kim, M., Levy, R.C., Remer, L.A., Mattoo, S., Gupta, P. (2024) Parameterizing spectral surface reflectance relationships for the Dark Target aerosol algorithm applied to a geostationary imager. Atmos. Meas. Tech. , 17 (7), 1913–1939. Schutgens, N.A.J. 
(2020) Site representativity of AERONET and GAW remotely sensed aerosol optical thickness and absorbing aerosol optical thickness observations. Atmos. Chem. Phys. , 20 (12), 7473–7488. Lee, K.-H., Yoo, J.-M., Wong, M.-S. (2020) Estimation of Directional Surface Reflectance and Atmospheric Aerosols Over East Asia Using a Multi-Channel Geostationary Satellite, in 2020 IEEE International Geoscience & Remote Sensing Symposium: Proceedings : September 26-October 2, 2020, virtual . IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, 9/26/2020 - 10/2/2020, Waikoloa, HI, USA. IEEE, Piscataway, NJ, pp. 5600–5603. Lefèvre, M., Oumbe, A., Blanc, P., Espinar, B., Gschwind, B., Qu, Z., Wald, L., Schroedter-Homscheidt, M., Hoyer-Klick, C., Arola, A., Benedetti, A., Kaiser, J.W., Morcrette, J.-J. (2013) McClear: a new model estimating downwelling solar radiation at ground level in clear-sky conditions. Atmos. Meas. Tech. , 6 (9), 2403–2418. Gueymard, C.A. and Yang, D. (2020) Worldwide validation of CAMS and MERRA-2 reanalysis aerosol optical depth products using 15 years of AERONET observations. Atmospheric Environment , 225 , 117216. Isaza, A., Kay, M., Evans, J.P., Bremner, S., Prasad, A. (2021) Validation of Australian atmospheric aerosols from reanalysis data and CMIP6 simulations. Atmospheric Research , 264 , 105856. Ansari, K. and Ramachandran, S. (2024) Optical and physical characteristics of aerosols over Asia: AERONET, MERRA-2 and CAMS. Atmospheric Environment , 326 , 120470. Witthuhn, J., Hünerbein, A., Deneke, H. (2020) Evaluation of satellite-based aerosol datasets and the CAMS reanalysis over ocean utilizing shipborne reference observations. Atmos. Meas. Tech. , 13 (3), 1387–1412. Júnior, A.L.P., Curado, L.F.A., Da Palácios, R.S., Santos, L.O.F.d., Querino, C.A.S., Da Querino, J.K.A.S., Rodrigues, T.R., Marques, J.B. (2025) Evaluation of Aerosol Optical Depth (Aod) Estimated by Copernicus Atmosphere Monitoring Service (Cams) in Brazil. 
Theor Appl Climatol , 156 (2). Tuna Tuygun, G. and Elbir, T. (2024) Comparative analysis of CAMS aerosol optical depth data and AERONET observations in the Eastern Mediterranean over 19 years. Environ Sci Pollut Res , 31 (18), 27069–27084. Koschmieder, H. (1924) Theorie der horizontalen sichtweite, Beitrage zur Physik der Freien Atmosphare. Meteorologische Zeitschrift , 12 , 3353. Horvath, H. (1971) On the applicability of the koschmieder visibility formula. Atmospheric Environment (1967) , 5 (3), 177–184. Ozkaynak, H., Schatz, A.D., Thurston, G.D., Isaacs, R.G., Husar, R.B. (1985) Relationships between Aerosol Extinction Coefficients Derived from Airport Visual Range Observations and Alternative Measures of Airborne Particle Mass. Journal of the Air Pollution Control Association , 35 (11), 1176–1185. Friedlander, S.K. (2000) Smoke, Dust and Haze, Oxford University Press. Peterson, J.T. and Fee, C.J. (1981) Visibility-atmospheric turbidity dependence at Raleigh, North Carolina. Atmospheric Environment (1967) , 15 (12), 2561–2563. Elterman, L. (1970) Relationships between vertical attenuation and surface meteorological range. Appl. Opt. , 9 (8), 1804–1810. Zhang, S., Wu, J., Fan, W., Yang, Q., Zhao, D. (2020) Review of aerosol optical depth retrieval using visibility data. Earth-Science Reviews , 200 , 102986. Qiu, J. and Lin, Y. (2001) A parameterization model of aerosol optical depths in China. Acta Meteorol. Sin , 59 (3), 368–372. Wu, J., Luo, J., Zhang, L., Xia, L., Zhao, D., Tang, J. (2014) Improvement of aerosol optical depth retrieval using visibility data in China during the past 50 years. JGR Atmospheres , 119 (23). Zhang, Z., Wu, W., Wei, J., Song, Y., Yan, X., Zhu, L., Wang, Q. (2017) Aerosol optical depth retrieval from visibility in China during 1973–2014. Atmospheric Environment , 171 , 38–48. Li, F., Zhang, L., Wei, Q., Yang, Y., Han, F., Li, W., Zhao, C., Wang, W. 
(2022) An improved method for retrieving aerosol optical depth using the ground-level meteorological data over the South-central Plain of Hebei Province, China. Atmospheric Pollution Research , 13 (3), 101334. Wu, J., Zhang, S., Yang, Q., Zhao, D., Fan, W., Zhao, J., Shen, C. (2021) Using particle swarm optimization to improve visibility-aerosol optical depth retrieval method. npj Clim Atmos Sci , 4 (1), 1–12. Hao, H., Wang, K., Zhao, C., Wu, G., Li, J. (2024) Visibility-derived aerosol optical depth over global land from 1959 to 2021. Earth Syst. Sci. Data , 16 (7), 3233–3260. Vijayakumar, K., Devara, P.C.S., Sonbawne, S.M., Giles, D.M., Holben, B.N., Rao, S.V.B., Jayasankar, C.K. (2020) Solar radiometer sensing of multi-year aerosol features over a tropical urban station: direct-Sun and inversion products. Atmos. Meas. Tech. , 13 (10), 5569–5593. Ineichen, P. and Perez, R. (2010) Aerosol quantification based on global irradiance. Solar Paces 2010 proceedings . Sequeira, R. and Lai, K.-H. (1998) The effect of meteorological parameters and aerosol constituents on visibility in urban Hong Kong. Atmospheric Environment , 32 (16), 2865–2871. Wen, C.-C. and Yeh, H.-H. (2010) Comparative influences of airborne pollutants and meteorological parameters on atmospheric visibility and turbidity. Atmospheric Research , 96 (4), 496–509. Peng, Y., Wang, H., Hou, M., Jiang, T., Zhang, M., Zhao, T., Che, H. (2020) Improved method of visibility parameterization focusing on high humidity and aerosol concentrations during fog–haze events: Application in the GRAPES_CAUCE model in Jing-Jin-Ji, China. Atmospheric Environment , 222 , 117139. Goudie, A.S. and Middleton, N.J. (1992) The changing frequency of dust storms through time. Climatic Change , 20 (3), 197–225. Mahowald, N.M., Ballantine, J.A., Feddema, J., Ramankutty, N. (2007) Global trends in visibility: implications for dust sources. Atmos. Chem. Phys. , 7 (12), 3309–3339. Tavartkiladze, K.A. and Amiranashvili, A.G. 
(2007) The Influence of Relative Humidity on the Changeability of the Atmospheric Aerosol Optical Depth, in Nucleation and atmospheric aerosols: 17th international conference, Galway, Ireland, 2007 (eds C.D. O'Dowd and P.E. Wagner), Springer, Berlin, pp. 761–765. Wilson, R.T., Milton, E.J., Nield, J.M. (2015) Are visibility-derived AOT estimates suitable for parameterizing satellite data atmospheric correction algorithms? International Journal of Remote Sensing , 36 (6), 1675–1688. Verbois, H., Rusydi, A., Thiery, A. (2018) Probabilistic forecasting of day-ahead solar irradiance using quantile gradient boosting. Solar Energy , 173 , 313–327. Nabavi, S.O., Haimberger, L., Abbasi, R., Samimi, C. (2018) Prediction of aerosol optical depth in West Asia using deterministic models and machine learning algorithms. Aeolian Research , 35 , 69–84. Kosmopoulos, P. (2024) Impact of aerosols on solar energy production, in Planning and Management of Solar Power from Space (ed P. Kosmopoulos), Academic Press, pp. 89–104. Ina Neher, Tina Buchmann, Susanne Crewell, Bernhard Pospichal, Stefanie Meilinger (2019) Impact of atmospheric aerosols on solar power. Meteorologische Zeitschrift , 28 (4), 305–321. Bentéjac, C., Csörgő, A., Martínez-Muñoz, G. (2021) A comparative analysis of gradient boosting algorithms. Artif Intell Rev , 54 (3), 1937–1967. Zhang, J., Mucs, D., Norinder, U., Svensson, F. (2019) LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity-Application to the Tox21 and Mutagenicity Data Sets. Journal of Chemical Information and Modeling , 59 (10), 4150–4158. Hancock, J.T. and Khoshgoftaar, T.M. (2020) CatBoost for big data: an interdisciplinary review. Journal of big data , 7 (1), 94. Geurts, P., Ernst, D., Wehenkel, L. (2006) Extremely randomized trees. Machine Learning , 63 (1), 3–42. Car, Z., Baressi Šegota, S., Anđelić, N., Lorencin, I., Mrzljak, V. (2020) Modeling the Spread of COVID-19 Infection Using a Multilayer Perceptron. 
Computational and Mathematical Methods in Medicine , 2020 (1), 5714714. Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M. (2019) Parameterized quantum circuits as machine learning models. Quantum Sci. Technol. , 4 (4), 43001. Schuld, M., Sinayskiy, I., Petruccione, F. (2015) An introduction to quantum machine learning. Contemporary Physics , 56 (2), 172–185. Blanc, P., Jolivet, R., Ménard, L., Saint-Drenan, Y.-M. (2022) Data sharing of in-situ measurements following GEO and FAIR principles in the solar energy sector. Centre O.I.E. MINES Paris, Working document (ed 1.0). Driemel, A., Augustine, J., Behrens, K., Colle, S., Cox, C., Cuevas-Agulló, E., Denn, F.M., Duprat, T., Fukuda, M., Grobe, H., Haeffelin, M., Hodges, G., Hyett, N., Ijima, O., Kallis, A., Knap, W., Kustov, V., Long, C.N., Longenecker, D., Lupi, A., Maturilli, M., Mimouni, M., Ntsangwane, L., Ogihara, H., Olano, X., Olefs, M., Omori, M., Passamani, L., Pereira, E.B., Schmithüsen, H., Schumacher, S., Sieger, R., Tamlyn, J., Vogt, R., Vuilleumier, L., Xia, X., Ohmura, A., König-Langlo, G. (2018) Baseline Surface Radiation Network (BSRN): structure and data description (1992–2017). Earth Syst. Sci. Data , 10 (3), 1491–1501. El‐Metwally, M., Alfaro, S.C., Abdel Wahab, M., Chatenet, B. (2008) Aerosol characteristics over urban Cairo: Seasonal variations as retrieved from Sun photometer measurements. JGR Atmospheres , 113 (D14). El‐Askary, H. and Kafatos, M. (2008) Dust storm and black cloud influence on aerosol optical properties over Cairo and the Greater Delta region, Egypt. International Journal of Remote Sensing , 29 (24), 7199–7211. Christodoulou, A., Bezantakos, S., Bourtsoukidis, E., Stavroulas, I., Afif, C., Borbon, A., Vrekoussis, M., Mihalopoulos, N., Sauvage, S., Sciare, J. (2024) Submicron aerosol pollution in Greater Cairo (Egypt): A new type of urban haze? , Copernicus GmbH. 
Lalchandani, V., Srivastava, D., Dave, J., Mishra, S., Tripathi, N., Shukla, A.K., Sahu, R., Thamban, N.M., Gaddamidi, S., Dixit, K., Ganguly, D., Tiwari, S., Srivastava, A.K., Sahu, L., Rastogi, N., Gargava, P., Tripathi, S.N. (2022) Effect of Biomass Burning on PM 2.5 Composition and Secondary Aerosol Formation During Post‐Monsoon and Winter Haze Episodes in Delhi. JGR Atmospheres , 127 (1). Bhowmik, H.S., Tripathi, S.N., Shukla, A.K., Lalchandani, V., Murari, V., Devaprasad, M., Shivam, A., Bhushan, R., Prévôt, A.S.H., Rastogi, N. (2024) Contribution of fossil and biomass-derived secondary organic carbon to winter water-soluble organic aerosols in Delhi, India. The Science of the total environment , 912 , 168655. Jain, S., Sharma, S.K., Vijayan, N., Mandal, T.K. (2020) Seasonal characteristics of aerosols (PM2.5 and PM10) and their source apportionment using PMF: A four year study over Delhi, India. Environmental pollution (Barking, Essex : 1987) , 262 , 114337. Sharma, M., Kaskaoutis, D.G., Singh, R.P., Singh, S. (2014) Seasonal Variability of Atmospheric Aerosol Parameters over Greater Noida Using Ground Sunphotometer Observations. Aerosol Air Qual. Res. , 14 (3), 608–622. Yan, L. and Liu, X. (2009) Seasonal variation of atmospheric aerosol and its relation to cloud faction over Beijing-Tianjin-Hebei region. Chin. Res. Environ. Sci , 22 , 924–931. Li, B.G., Ran, Y., Tao, S. (2008) Seasonal variation and spatial distribution of atmospheric aerosols in Beijing. Acta Scientiae Circumstantiae , 28 (7), 1425–1429. Takegawa, N., Miyakawa, T., Kondo, Y., Jimenez, J.L., Zhang, Q., Worsnop, D.R., Fukuda, M. (2006) Seasonal and diurnal variations of submicron organic aerosol in Tokyo observed using the Aerodyne aerosol mass spectrometer. JGR Atmospheres , 111 (D11). Iijima, A., Tago, H., Kumagai, K., Kato, M., Kozawa, K., Sato, K., Furuta, N. 
(2008) Regional and seasonal characteristics of emission sources of fine airborne particulate matter collected in the center and suburbs of Tokyo, Japan as determined by multielement analysis and source receptor models. Journal of environmental monitoring : JEM , 10 (9), 1025–1032. Phan, N.-T. and Dinh-Tri, C. (2024) Assessment of air pollutant emissions from rice straw open burning in Hoa Vang district, Da Nang city, Vietnam. UD-JST , 25–32. Pham, T.T.K., Le, S.H., Nguyen, T., Balasubramanian, R., Tran, P.T.M. (2024) Characteristics of airborne particles in stone quarrying areas: Human exposure assessment and mitigation. Environmental research , 245 , 118087. Pfeifroth, U., Kothe, S., Drücke, J., Trentmann, J., Schröder, M., Selbach, N., Hollmann, R. (2023) Surface Radiation Data Set - Heliosat (SARAH) - Edition 3, Satellite Application Facility on Climate Monitoring (CM SAF). https://wui.cmsaf.eu/safira/action/viewDoiDetails?acronym=SARAH_V003. Hammer, A., Heinemann, D., Hoyer, C., Kuhlemann, R., Lorenz, E., Müller, R., Beyer, H.G. (2003) Solar energy assessment using remote sensing technologies. Remote Sensing of Environment , 86 (3), 423–432. Hammer, A., Kühnert, J., Weinreich, K., Lorenz, E. (2015) Short-Term Forecasting of Surface Solar Irradiance Based on Meteosat-SEVIRI Data Using a Nighttime Cloud Index. Remote Sensing , 7 (7), 9070–9090. Solar API and Weather Forecasting Tool | Solcast™. https://solcast.com/ (9 April 2025). Bright, J.M. (2019) Solcast: Validation of a satellite-derived solar irradiance dataset. Solar Energy , 189 , 435–449. Bright, J.M., Killinger, S., Lingfors, D., Engerer, N.A. (2017) Integration of distributed solar forecasting with distribution network operations in Australia. ISES Sol. World Congr. 2015, Abu Dhabi, United Arab Emirates, Oct. 29-Novemb. 2 . 
Inness, A., Ades, M., Agustí-Panareda, A., Barré, J., Benedictow, A., Blechschmidt, A.-M., Dominguez, J.J., Engelen, R., Eskes, H., Flemming, J., others (2019) The CAMS reanalysis of atmospheric composition. Atmos. Chem. Phys. , 19 (6), 3515–3556. Herzmann, D., Arritt, R., Todey, D. (2004) Iowa environmental mesonet. Available at mesonet. agron. iastate. edu/request/coop/fe. phtml (verified 27 Sept. 2005). Iowa State Univ., Dep. of Agron., Ames, IA . Chan, P.W. (2016) A test of visibility sensors at Hong Kong International Airport. Weather , 71 (10), 241–246. Bilbao, J., Román, R., Yousif, C., Mateos, D., Miguel, A. de (2014) Total ozone column, water vapour and aerosol effects on erythemal and global solar irradiance in Marsaxlokk, Malta. Atmospheric Environment , 99 , 508–518. Bright, J.M., Sun, X., Gueymard, C.A., Acord, B., Wang, P., Engerer, N.A. (2020) Bright-Sun: A globally applicable 1-min irradiance clear-sky detection model. Renewable and Sustainable Energy Reviews , 121 , 109706. Alia-Martinez, M., Antonanzas, J., Urraca, R., Martinez-de-Pison, F.J., Antonanzas-Torres, F. (2016) Benchmark of algorithms for solar clear-sky detection. Journal of Renewable and Sustainable Energy , 8 (3). Ellis, B.H., Deceglie, M., Jain, A. (2019) Automatic Detection of Clear-Sky Periods From Irradiance Data. IEEE J. Photovoltaics , 9 (4), 998–1005. Chen, T. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . Association for Computing Machinery, New York, NY, USA, pp. 785–794. Shi, Y., Ke, G., Chen, Z., Zheng, S., Liu, T.-Y. (2022) Quantized Training of Gradient Boosting Decision Trees, in Advances in Neural Information Processing Systems (eds S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh). Curran Associates, Inc, pp. 18822–18833. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A. 
(2018) CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems , 31 . Sahin, E.K. (2022) Comparative analysis of gradient boosting algorithms for landslide susceptibility mapping. Geocarto International , 37 (9), 2441–2465. Scornet, E. (2016) Random forests and kernel methods. IEEE Transactions on Information Theory , 62 (3), 1485–1500. Wehenkel, L., Ernst, D., Geurts, P. (2006) Ensembles of extremely randomized trees and some generic applications, in Robust methods for power system state estimation and load forecasting . Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S. (2019) PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems , 32 . Ovalle-Magallanes, E., Alvarado-Carrillo, D.E., Avina-Cervantes, J.G., Cruz-Aceves, I., Ruiz-Pinales, J. (2023) Quantum angle encoding with learnable rotation applied to quantum–classical convolutional neural networks. Applied Soft Computing , 141 , 110307. Manual on Codes, International Codes, vol. I. 1 (Annex II to WMO Technical Regulations), part A, Alphanumeric Code s (1995). Zhang, Z.Y., Wong, M.S., Lee, K.H. (2016) Evaluation of the representativeness of ground-based visibility for analysing the spatial and temporal variability of aerosol optical thickness in China. Atmospheric Environment , 147 , 31–45. YuFeng Yang and Ting Li (2018) Study on the relationship between PM2.5 concentration and visibility in Beijing based on light scattering theory. Fourth Seminar on Novel Optoelectronic Detection Technology and Application. SPIE, pp. 165–171. Janicka, L., Stachlewska, I.S., Veselovskii, I., Baars, H. 
(2017) Temporal variations in optical and microphysical properties of mineral dust and biomass burning aerosol derived from daytime Raman lidar observations over Warsaw, Poland. Atmospheric Environment , 169 , 162–174. Guo, B., Wang, Y., Zhang, X., Che, H., Zhong, J., Chu, Y., Cheng, L. (2020) Temporal and spatial variations of haze and fog and the characteristics of PM2.5 during heavy pollution episodes in China from 2013 to 2018. Atmospheric Pollution Research , 11 (10), 1847–1856. Wang, X., Zhang, R., Yu, W. (2019) The Effects of PM 2.5 Concentrations and Relative Humidity on Atmospheric Visibility in Beijing. JGR Atmospheres , 124 (4), 2235–2259.

Additional Declarations
No competing interests reported.
Circuit\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7820256/v1/6bb272b2fb06f0438159c99a.png"},{"id":94582181,"identity":"eff45de6-71b6-4a73-ac60-39c76cd5e0ca","added_by":"auto","created_at":"2025-10-28 18:12:59","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":124790,"visible":true,"origin":"","legend":"\u003cp\u003ePercentage change of RMSE in Heliosat-3 estimated GHI when using \u003cem\u003eGHI\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e from the ML models instead of McClear. Blue indicates a positive change (improvement) and red indicates a negative change (deterioration)\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7820256/v1/6cd77fa130fdac78588e3b9d.png"},{"id":94583774,"identity":"eeeb87cc-6ce9-4b92-b995-6783f3b35ad6","added_by":"auto","created_at":"2025-10-28 18:14:32","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":259231,"visible":true,"origin":"","legend":"\u003cp\u003ePercentage change of RMSE in Heliosat-3 estimated GHI when using \u003cem\u003eGHI\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e from the ML models instead of McClear for different visibility ranges at the four unseen sites. 
Positive values indicate improvement in RMSE while negative values indicate deterioration, with respect to McClear\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7820256/v1/0c29ad8792b0533548686145.png"},{"id":94582262,"identity":"2d7e015c-8806-4600-adb0-6ef19e798218","added_by":"auto","created_at":"2025-10-28 18:13:02","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":43060,"visible":true,"origin":"","legend":"\u003cp\u003ePercentage change of RMSE in Heliosat-3 estimated GHI when using \u003cem\u003eGHI\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e from the ML models instead of McClear in three different aerosol situations. HZ = Haze; FU = Smoke; DU_SA_DS_SS_PO = Widespread Dust, Sand, Duststorm, Sandstorm, Dust/Sand whirls. Positive values indicate improvement in RMSE while negative values indicate deterioration, with respect to McClear\u0026nbsp;\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7820256/v1/bf457b19af5674d3a1c719c9.png"},{"id":94582924,"identity":"7f624342-5042-49cb-b0dd-8d528e0833fe","added_by":"auto","created_at":"2025-10-28 18:13:36","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":318537,"visible":true,"origin":"","legend":"\u003cp\u003ePercentage change of RMSE in Heliosat-3 estimated GHI when using \u003cem\u003eGHI\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e from the ML models instead of McClear for different RH ranges at the four unseen sites. 
Positive values indicate improvement in RMSE while negative values indicate deterioration, with respect to McClear\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7820256/v1/ede9a9e5cd12cb7fc0c94833.png"},{"id":103251194,"identity":"37d437b3-45ea-4023-9d8a-eeb02ee5ab32","added_by":"auto","created_at":"2026-02-23 16:05:53","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2018013,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7820256/v1/cf93a67c-1cf0-4663-b224-03a6494c8ad8.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Data-driven Combination of METAR Observations and CAMS Reanalysis Aerosols to Enhance Satellite Retrieval of Surface Solar Irradiance","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eThe integration of solar energy into the electricity grid presents unique challenges due to the fluctuating nature of solar irradiance, which can significantly affect power generation and grid stability. Accurate day-ahead and intra-day forecasts of all-sky global horizontal irradiance (GHI) are therefore essential: they support power system scheduling, reduce balancing costs, and help photovoltaic (PV) operators avoid penalties arising from forecast\u0026ndash;production mismatches [\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. 
While day-ahead forecasts typically rely on numerical weather prediction (NWP), intra-day corrections are often derived from geostationary satellite imagery, which provides more accurate cloud information owing to its higher resolution [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Clouds remain the dominant source of irradiance variability [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], but extreme aerosol events\u0026mdash;such as dust storms, biomass burning, or urban smog\u0026mdash;can also cause GHI reductions comparable to cloud cover [\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. These effects are particularly significant in tropical and subtropical regions, including the Indian subcontinent, eastern China, and Indochina, where some of the highest PV deployment rates coincide with frequent aerosol episodes [\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eEstimating global horizontal clear-sky irradiance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e) is a critical step for satellite-based all-sky GHI retrieval. Conventional approaches rely on aerosol optical depth (AOD) inputs for radiative transfer or empirical models [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
Information on atmospheric aerosol concentration can be obtained at different spatio-temporal resolutions from satellite observations, numerical modelling, ground measurements or climatological datasets [\u003cspan additionalcitationids=\"CR16 CR17\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. However, aerosol information is imperfect across all available sources. Satellite retrievals are limited by cloud contamination, choice of aerosol model and assumptions about aerosol properties [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Ground-based networks such as AERONET provide high-quality AOD measurements, but coverage is sparse and point-based observations are often unrepresentative [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Climatological datasets cannot capture rapid intra-day aerosol fluctuations [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. A widely used tool for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e estimation is the McClear model [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], which computes \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e using AOD and other inputs from the Copernicus Atmosphere Monitoring Service (CAMS). McClear has been shown to perform well under many conditions globally [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. However, it inherits the limitations of the CAMS aerosol data. 
CAMS provides global, hourly, 40 km\u0026ndash;resolution fields and offers valuable large-scale coverage, but its spatial and temporal resolution makes it less suited to representing local or short-lived aerosol events. Regional assessments have reported systematic biases, such as underestimation of AOD in high-load conditions in Australia [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], misrepresentation of fine-mode aerosols over the Indo-Gangetic Basin [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], and inconsistencies in regions strongly influenced by biomass burning, desert dust, or mixed aerosol sources including Brazil and the Eastern Mediterranean [\u003cspan additionalcitationids=\"CR27\" citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. Therefore, the following uncertainties can be identified with the different sources of aerosol information: (i) limitations in retrieval and numerical modelling algorithms, (ii) na\u0026iuml;ve aerosol constancy assumptions in climatology, and (iii) limited representativeness of sparsely available ground measurements.\u003c/p\u003e\u003cp\u003eSurface horizontal visibility has long been recognized as a proxy for aerosol extinction [\u003cspan additionalcitationids=\"CR30 CR31 CR32 CR33 CR34\" citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e], with early work such as the Elterman model establishing a link between visibility and vertical aerosol profiles. 
Modern retrievals have refined these methods with empirical corrections, optimization techniques, and calibration against satellite AOD products [\u003cspan additionalcitationids=\"CR37 CR38 CR39\" citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Importantly, visibility is routinely reported in METeorological Aerodrome Reports (METAR) at airports worldwide, yielding a dense, near-real-time dataset that far surpasses the spatial coverage of dedicated aerosol networks [\u003cspan additionalcitationids=\"CR42\" citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Yet, visibility is influenced not only by aerosols but also by humidity, fog, precipitation, and wind [\u003cspan additionalcitationids=\"CR45\" citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. As a result, its correlation with ground-based AOD is modest except under dust-dominated conditions [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e], and the interaction between AOD and relative humidity (RH) further complicates the relationship [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. 
On its own, visibility is therefore insufficient as a direct substitute for AOD, but it holds promise when integrated with complementary datasets.\u003c/p\u003e\u003cp\u003eMachine learning (ML) offers a flexible framework for combining heterogeneous inputs and extracting non-linear relationships that elude traditional parameterizations [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Previous studies have applied tree-based models such as decision trees and random forests to estimate visibility from monthly or daily aerosol information and vice versa [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e], but these approaches fail to resolve rapid aerosol changes, leading to biased irradiance forecasts [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. More advanced ML methods, including gradient boosting frameworks (e.g., XGBoost, LightGBM, CatBoost) and neural networks, offer improved performance, scalability, and robustness across diverse datasets [\u003cspan additionalcitationids=\"CR56 CR57 CR58\" citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]. Quantum variational circuits (QVCs) have also been proposed for ML applications [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e, \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e], though their application to solar energy meteorology remains largely exploratory. 
Despite these advances, few studies have systematically explored the integration of real-time visibility (METAR) with reanalysis products (CAMS) to improve clear-sky irradiance estimation.\u003c/p\u003e\u003cp\u003eThis study addresses that gap with a data-driven framework for estimating \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e. Specifically, it:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003ePresents a data-driven approach using ML models for estimating global horizontal clear-sky irradiance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e) by combining METAR and CAMS aerosol datasets.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003ePresents an approach for obtaining normalized pseudo global horizontal clear-sky irradiance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}^{*}\\)\u003c/span\u003e\u003c/span\u003e) targets using ground-measured GHI, satellite-estimated cloud index (CI) and top-of-atmosphere (TOA) irradiance, in order to compensate for the lack of direct measurements of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e in all-weather situations.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eBenchmarks the accuracy of satellite-estimated all-sky GHI derived using the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e output from the ML models utilizing METAR and CAMS data against the satellite-estimated all-sky GHI derived using the McClear model, at four unseen 
sites.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eValidates the improvement in estimated all-sky GHI across a range of visibility situations and aerosol-related METAR weather codes.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eValidates the improvement in estimated all-sky GHI across a range of relative humidity (RH) conditions.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e"},{"header":"2. Data and Method","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1. Ground measured GHI\u003c/h2\u003e\u003cp\u003eGround observations of GHI are obtained from five stations located in regions strongly influenced by diverse aerosol conditions (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Data for Cairo, Gurgaon, Da Nang and Chiba are obtained from the CAMS Evaluation and Quality Control database hosted at MinesParis [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e], while Xianghe measurements are retrieved via the BSRN FTP server [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. The Cairo and Xianghe stations are equipped with Kipp \u0026amp; Zonen CM21 secondary standard class A pyranometers, Gurgaon uses an Eppley PSP pyranometer, Da Nang is equipped with a Hukseflux SR20 secondary standard class A pyranometer, and the Chiba SKYNET station employs a POM-01 sky radiometer. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e provides a summary of the GHI measurement stations used in this study. All datasets are quality controlled using the libinsitu software package [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e]. This includes removal of values flagged as invalid by the physically possible limit (PPL) and extremely rare limit tests. 
Following quality control, the GHI datasets are averaged from 1-minute to 30-minute resolution before being used in this study.\u003c/p\u003e\u003cp\u003eThese stations are selected because they are located in regions characterized by frequent and diverse aerosol loading:\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eCairo\u003c/strong\u003e\u003cp\u003eStrongly affected by a mix of urban emissions, biomass burning, and desert dust [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e]. Dust storms, especially in spring, contribute to high AOD and influence cloud properties [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e]. A unique \u0026ldquo;urban haze\u0026rdquo; composed of submicron ammonium chloride (from biomass burning) and supermicron dust has been reported [\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e].\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eGurgaon (Delhi)\u003c/strong\u003e\u003cp\u003eHigh aerosol concentrations result from industrial and vehicular emissions, biomass burning and dust storms, with significant seasonal variations. 
Biomass burning dominates in the post-monsoon and winter periods [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e, \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e], industrial emissions persist year-round with peaks after the monsoon [\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e], and dust storms are common during the pre-monsoon and monsoon seasons [\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e].\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eXianghe (Beijing)\u003c/strong\u003e\u003cp\u003eSummer exhibits the highest AOD and fine-mode fraction due to urban haze [\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e], winter has moderate AOD with increased coarse-mode aerosols from heating activities [\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e], and spring is influenced by desert dust [\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e].\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eChiba (Tokyo)\u003c/strong\u003e\u003cp\u003eOrganic aerosols dominate composition (40\u0026ndash;60%) across seasons, with daytime peaks [\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e]. Diesel exhaust is a major source of fine particulate matter [\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e].\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eDa Nang\u003c/strong\u003e\u003cp\u003eRice straw burning, most prevalent during the harvest season from late summer to early autumn, elevates PM\u003csub\u003e2.5\u003c/sub\u003e and NO\u003csub\u003e2\u003c/sub\u003e [\u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e]. 
Black carbon from quarrying and vehicular pollution peaks in the dry season (June\u0026ndash;July) [\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e].\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eStations providing GHI ground observations\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSite\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNetwork\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eLocation\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSource of data\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eTime period\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eDistance to nearest airport/METAR observation site (km)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCairo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003eenerMENA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e30.04 ˚N, 31.01 ˚E\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://tds.webservice-energy.org/\u003c/span\u003e\u003cspan address=\"http://tds.webservice-energy.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2015\u0026ndash;2019\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e39\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGurgaon\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eBSRN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e28.42 ˚N, 77.16 ˚E\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://tds.webservice-energy.org/\u003c/span\u003e\u003cspan address=\"http://tds.webservice-energy.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2018\u0026ndash;2019\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e12\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDa Nang\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eESMAP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e16.01 ˚N, 108.19 ˚E\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan 
class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://tds.webservice-energy.org/\u003c/span\u003e\u003cspan address=\"http://tds.webservice-energy.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2017\u0026ndash;2019\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXianghe\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eBSRN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e39.75 ˚N, 116.96 ˚E\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003eftp://ftp.bsrn.awi.de/\u003c/span\u003e\u003cspan address=\"ftp://ftp.bsrn.awi.de/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2010\u0026ndash;2015\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e71\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eChiba\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSKYNET\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e35.63 ˚N, 140.10 ˚E\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://tds.webservice-energy.org/\u003c/span\u003e\u003cspan address=\"http://tds.webservice-energy.org/\" targettype=\"URL\" 
class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2015\u0026ndash;2017\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e20\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2. Cloud observations from satellites\u003c/h2\u003e\u003cp\u003eThe Surface Solar Radiation Data Set \u0026ndash; Heliosat (SARAH-3) [\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e], available at 30-minute temporal resolution on a 0.05˚ x 0.05˚ regular grid, is generated by applying the MAGICSOL algorithm to images from Meteosat, located at 0 ˚E. MAGICSOL derives the effective cloud albedo (CAL) using the original Heliosat method [\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e]. In the SARAH-3 dataset, CAL is the variable corresponding to CI.\u003c/p\u003e\u003cp\u003eFor this study, CAL values for the Cairo IEA-PVPS station (30.04 ˚N, 31.01 ˚E) are extracted via spatial interpolation for the time period 2015\u0026ndash;2019 (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
CAL is converted to clear sky index (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{k}_{c}\\)\u003c/span\u003e\u003c/span\u003e) following the procedure in [\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e], summarized in \u003cb\u003eEquation\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:1\\)\u003c/span\u003e\u003c/span\u003e.\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{k}_{c}=\\left\\{\\begin{array}{l}1.2,\\:\\text{for}\\:CI\\le\\:-0.2\\\\\\:1-CI,\\:\\text{for}\\:-0.2\u0026lt;CI\\le\\:0.8\\\\\\:1.1661-1.7814CI+0.7250{CI}^{2},\\:\\text{for}\\:0.8\u0026lt;CI\\le\\:1.05\\\\\\:0.09,\\:\\text{for}\\:1.05\u0026lt;CI\\end{array}\\right.\\:\\#\\left(1\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere,\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{k}_{c}:\\)\u003c/span\u003e\u003c/span\u003e clear sky index\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:CI:\\)\u003c/span\u003e\u003c/span\u003e cloud index\u003c/p\u003e\u003cp\u003eComplementary datasets of cloud opacity at 30-minute resolution for Xianghe, Chiba, Gurgaon and Da Nang are obtained from the Solcast platform [\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e]. Solcast does not release the full details of its proprietary methodology; however, published studies indicate that its approach is based on semi-empirical retrievals of cloud properties from geostationary satellite imagery [\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e]. 
In line with prior literature [\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e], cloud opacity is considered equivalent to CI (or CAL), and is therefore converted to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{k}_{c}\\)\u003c/span\u003e\u003c/span\u003e using \u003cb\u003eEquation\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:1\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSatellite-estimated products used in this study\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSite name\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSatellite product name\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSource\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCairo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCloud albedo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eOnline repository of the Satellite Application Facility (CM-SAF) on Climate Monitoring, SARAH-3 dataset\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eGurgaon\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCloud opacity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSolcast web platform and API\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDa Nang\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCloud opacity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSolcast web platform and API\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXianghe\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCloud opacity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSolcast web platform and API\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eChiba\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCloud opacity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSolcast web platform and API\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3. Aerosols and other atmospheric parameters\u003c/h2\u003e\u003cdiv id=\"Sec6\" class=\"Section3\"\u003e\u003ch2\u003e2.3.1. McClear Clear Sky Irradiance and CAMS Aerosol\u003c/h2\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e for the sites used in this study are obtained from the McClear service of CAMS [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. 
The atmospheric composition input into the McClear model comes from the CAMS global reanalysis, which has a horizontal resolution of ~\u0026thinsp;40 km and a temporal resolution of 3 hours [\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e]. In addition, McClear internally calculates solar geometry parameters and top of atmosphere irradiance (TOA). For this study, McClear \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e and atmospheric composition data are retrieved for each site through the CAMS Atmosphere Data Store (ADS) using cdsapi in expert mode. Outputs are requested at 30-minute temporal resolution, consistent with the temporal resolution of the METAR data. As CAMS reanalysis is available at 3 hourly resolution, the 30-minute values are obtained by assuming constant atmospheric conditions within each 3 hour window. The full list of parameters used in this study is summarized in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSummary of the CAMS Global Reanalysis parameters used in this study\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eParameter\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDescription\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTOA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eIrradiation on a horizontal plane at the top of atmosphere\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003esza\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSolar zenith angle in degrees\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003etco3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTotal column content of ozone in Dobson unit\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003etcwv\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTotal column content of water vapour in kg/m\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAOD BC\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePartial aerosol optical depth at 550 nm for black carbon\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAOD DU\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePartial aerosol optical depth at 550 nm for dust\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAOD SS\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePartial aerosol optical depth at 550 nm for sea salt\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAOD OR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePartial aerosol optical depth at 550 nm for organic 
matter\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAOD SU\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ePartial aerosol optical depth at 550 nm for sulphate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e2.3.2. METAR\u003c/p\u003e\u003cp\u003eAtmospheric parameters recorded in METAR reports, observed once every 30 minutes, are obtained for the airport closest to each of the five sites. The datasets shown in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e are downloaded from the Iowa Environmental Mesonet repository [\u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e84\u003c/span\u003e] maintained by the Iowa State University of Science and Technology, which holds a long-term archive of weather parameters from airport Automated Surface/Weather Observing System (ASOS/AWOS) stations. The temperature, wind speed and visibility measurements are converted to ˚C, m/s and km, respectively. Visibility measurements for METAR reports at airports commonly rely on transmissometers and forward-scatter sensors [\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e]. 
Quality checks involve comparing sensor data with human observations and reference instruments.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eAtmospheric parameters from METAR data\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eParameter\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDescription\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003erelh\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRH in %\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003evsby\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eVisibility in miles\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ewxcodes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSignificant weather observations\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eFurthermore, METAR provides observations of the significant weather. 
Namely, the classes Haze (HZ), Smoke (FU), Widespread Dust (DU), Sand (SA), Sandstorm (SS), Duststorm (DS) and Dust/ Sand whirls (PO) are related to aerosols and are used for diagnostic classification of the results.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e"},{"header":"3. Machine Learning setup","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e3.1. Training-validation-test data split\u003c/h2\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eAvailability of quality controlled datapoints for the analysis\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eSite\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eQuality checked datapoints\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTraining and Validation\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eTesting\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCairo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e20,914\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e-\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGurgaon\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e-\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e6,222\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDa Nang\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e-\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e12,800\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXianghe\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e-\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e16,268\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eChiba\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e-\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e14,682\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eCairo is the site with the largest number of datapoints and is characterized by both dust and anthropogenic pollution conditions. Therefore, it is chosen for the development of the ML models.\u003c/p\u003e\u003cp\u003eTwo-thirds of the available datapoints from Cairo, as shown in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, are used for training the models and the remaining one-third for validation and hyperparameter tuning. The training-validation split is not random but chronological, to ensure that datapoints from the same day do not appear in both the training and validation datasets. 
Otherwise, because atmospheric conditions are similar within a day, the model could memorize the training data instead of learning generalizable relationships. The data from the remaining four sites are used to test the performance of the model on previously unseen sites. This is done to check whether the trained models are able to overcome site-dependency.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e3.2. Predictor preparation\u003c/h2\u003e\u003cp\u003eIn order to reduce the computational load, the CAMS AOD values of the different species at 550 nm are not entered as separate inputs into the models. Instead, they are summed up to produce (i) total AOD at 550 nm. Further input parameters into the ML models are selected as follows: (ii) Visibility measurements from the airport, which provide local information on the atmospheric aerosol loading at the surface. (iii) RH, as it is correlated to the presence of fog and mist, which are known to occur with smog. (iv) Solar zenith angle (SZA), as the cosine of SZA is approximately inversely proportional to the air mass through which the solar radiation travels and is attenuated before reaching the surface. (v) Solar azimuth angle, as it is correlated to the diurnal movement of the Sun. (vi) Total column water vapour (TCWV), as it is found to be a significant contributor to the reduction of GHI, and its attenuating effect increases with increasing SZA [\u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e3.3. Target preparation\u003c/h2\u003e\u003cp\u003eThe cloud-free component of irradiance in all-sky situations cannot be measured directly. GHI measurements taken during cloudless periods are equivalent to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e. 
Various approaches for filtering clear sky situations are found in the literature; most of them use a clear sky model, require all three components of solar irradiance, or rely on statistical approaches [\u003cspan additionalcitationids=\"CR88\" citationid=\"CR87\" class=\"CitationRef\"\u003e87\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR89\" class=\"CitationRef\"\u003e89\u003c/span\u003e]. On the one hand, the filtering leads to a considerable reduction in the number of datapoints available for training and validation. On the other hand, this again introduces the problem that sudden changes in irradiance due to aerosol loading could be classified as cloudy situations and eliminated from the dataset. Severe dust storms and smog events are often associated with the presence of clouds and low-level stratus, respectively, and therefore it is often difficult to isolate the aerosol impact from the cloud impact in such situations. Furthermore, as \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e is ultimately used for deriving satellite-estimated GHI from CI in both clear and cloudy situations, it is necessary to evaluate its performance in both situations. 
For these reasons, a normalized pseudo global horizontal clear sky irradiance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}^{*}\\)\u003c/span\u003e\u003c/span\u003e) is derived, starting from the expression of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e at ground level shown in \u003cb\u003eEquation\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:2\\)\u003c/span\u003e\u003c/span\u003e.\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}GH{I}_{CS}=\\frac{GHI}{{k}_{c}}\\approx\\:\\frac{GH{I}_{ground}}{\\left(1-n\\right)}\\:\\#\\left(2\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003ewhere,\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:GH{I}_{ground}:\\)\u003c/span\u003e\u003c/span\u003e ground measured GHI\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{k}_{c}:\\)\u003c/span\u003e\u003c/span\u003e clear sky index\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}:\\)\u003c/span\u003e\u003c/span\u003e clear sky GHI\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n:\\)\u003c/span\u003e\u003c/span\u003e satellite-estimated cloud index or cloud opacity\u003c/p\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:GH{I}_{ground}\\)\u003c/span\u003e\u003c/span\u003e is obtained from surface measurements and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:n\\)\u003c/span\u003e\u003c/span\u003e (CI or cloud opacity) from satellite images. 
Of course, this equation will not hold true for situations where the cloudiness seen by the pyranometer at the surface level does not match the cloudiness seen from the satellite due to the effects of parallax and spatial resolution. However, it is expected that the statistics-based machine learning methods will be able to handle these outlier situations. Furthermore, the above expression is normalized by the TOA irradiance in order to restrict the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}^{*}\\)\u003c/span\u003e\u003c/span\u003e values within the range \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\left[0,1\\right]\\)\u003c/span\u003e\u003c/span\u003e, as shown in \u003cb\u003eEquation\u003c/b\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:3\\)\u003c/span\u003e\u003c/span\u003e, which is more efficient for training ML models. Overshooting of GHI values beyond the TOA irradiance due to cloud enhancement is neglected in this approach, which is justified because 30-minute averages of GHI are analyzed.\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}Target={GHI}_{CS}^{*}=\\frac{GH{I}_{CS}}{TOA}=\\frac{GH{I}_{ground}}{\\left(1-n\\right)\\times\\:TOA}\\:\\#\\left(3\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e3.4. Machine Learning models\u003c/h2\u003e\u003cp\u003ePopular models for multi-variate regression are used in this analysis, including gradient boosting methods \u0026ndash; (i) XGBoost, (ii) LightGBM, (iii) CatBoost, tree-based methods \u0026ndash; (iv) Extra Trees, (v) Random Forest, and (vi) Neural Network. Furthermore, a more recent approach of using QVC for machine learning has also been explored. 
The following subsections provide a brief description of each model.\u003c/p\u003e\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\u003ch2\u003e3.4.1. EXtreme Gradient Boosting (XGBoost)\u003c/h2\u003e\u003cp\u003eXGBoost leverages the principles of boosting ensemble techniques to enhance prediction accuracy. It operates on the premise of sequentially adding weak learners (typically decision trees) to improve the performance of the overall model. XGBoost employs a unique regularization approach and handles missing values internally while optimizing computation speed and model robustness through parallel processing. An efficient and scalable Python implementation of XGBoost published by the original authors has been used in this analysis [\u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e90\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section3\"\u003e\u003ch2\u003e3.4.2. Light Gradient-Boosting Machine (LightGBM)\u003c/h2\u003e\u003cp\u003eLightGBM, developed by Microsoft, improves upon traditional gradient boosting frameworks by integrating Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) techniques. These innovations allow LightGBM to handle vast datasets effectively while reducing memory usage and computation time. Similar to XGBoost, LightGBM uses a decision-tree-based learning algorithm but optimizes the training process by exclusively focusing on the gradients of the chosen data subset. The latest version of the official LightGBM python implementation from Microsoft is used in this analysis [\u003cspan citationid=\"CR91\" class=\"CitationRef\"\u003e91\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section3\"\u003e\u003ch2\u003e3.4.3. 
Categorical Boosting (CatBoost)\u003c/h2\u003e\u003cp\u003eCatBoost is a gradient boosting algorithm that uses ordered boosting to reduce prediction shift and target leakage [\u003cspan citationid=\"CR92\" class=\"CitationRef\"\u003e92\u003c/span\u003e]. In several studies, CatBoost achieved competitive or enhanced accuracy on tasks with imbalanced or categorical data, although its training speed was generally slower than that of LightGBM and XGBoost [\u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e93\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section3\"\u003e\u003ch2\u003e3.4.4. Random Forest\u003c/h2\u003e\u003cp\u003eRandom Forests combine many decision trees to improve predictions [\u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. They build trees by drawing bootstrap samples and choosing splits that optimize measures such as impurity or variance reduction.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\u003ch2\u003e3.4.5. Extremely Randomized Trees (Extra-Trees)\u003c/h2\u003e\u003cp\u003eExtra-Trees averages the predictions from multiple decision trees, obtained by partitioning the input space with randomly generated splits [\u003cspan citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e]. In contrast to Random Forests, Extra-Trees work on the full training set and select both the splitting feature and the split point at random [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]. Empirical work indicates that in high-dimensional or noisy settings Extra-Trees may match or exceed the performance of Random Forests.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section3\"\u003e\u003ch2\u003e3.4.6. 
Neural Network (NeuralNetTorch)\u003c/h2\u003e\u003cp\u003eA neural network consists of multiple layers of perceptrons or neurons, which learn to transform input data into the desired output through weighted connections. It utilizes backpropagation to adjust the weights based on the error between predicted and actual outputs, which facilitates learning intricate patterns in data. A PyTorch implementation of the neural network is used in this analysis [\u003cspan citationid=\"CR96\" class=\"CitationRef\"\u003e96\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section3\"\u003e\u003ch2\u003e3.4.7. Quantum Variational Circuit (QVC)\u003c/h2\u003e\u003cp\u003eQVCs encode classical data into quantum states and employ a parameterized quantum circuit (ansatz) to produce the predictions [\u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e93\u003c/span\u003e]. The data encoder circuit determines the frequency spectrum of the quantum model, which in turn affects its expressivity and thereby its ability to learn different types of functions [\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e]. In this study, a feature encoder with learnable parameters is used, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003e, for the chosen input predictors \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{x}_{m}\\)\u003c/span\u003e\u003c/span\u003e. 
The inputs are encoded through parameterized rotations \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}_{X}\\left({\\theta\\:}_{mX}\\cdot\\:{x}_{m}\\right)\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}_{Y}\\left({\\theta\\:}_{mY}\\cdot\\:{x}_{m}\\right)\\:\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}_{Z}\\left({\\theta\\:}_{mZ}\\cdot\\:{x}_{m}\\right)\\)\u003c/span\u003e\u003c/span\u003e, as it has been shown that angle encoding with learnable parameters can help reduce circuit depth [\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e]. \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\theta\\:}_{mX}\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\theta\\:}_{mY}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\theta\\:}_{mZ}\\)\u003c/span\u003e\u003c/span\u003e are the learnable rotation parameters corresponding to the input feature \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{x}_{m}\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e"},{"header":"4. Results and Discussion","content":"\u003cp\u003eAs already mentioned, it is not straightforward to evaluate the quality of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e estimates in all-sky situations because \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e cannot be directly measured in cloudy situations. 
Therefore, the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e estimates obtained from the ML models are evaluated by using them in the Heliosat-3 method and validating the resulting satellite-estimated all-sky GHI against the ground measured GHI. \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e from the physics-based McClear model, which utilizes CAMS AOD, is also used in the Heliosat-3 method to produce satellite-estimated all-sky GHI, which serves as the reference benchmark. All GHI datasets are averaged to 30-minute resolution prior to validation.\u003c/p\u003e\u003cp\u003eThe general performance for all the test datapoints used in this analysis is evaluated using the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e) and the root mean square error (RMSE), shown in Eqs.\u0026nbsp;\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e4\u003c/span\u003e and \u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e5\u003c/span\u003e, respectively. The R\u003csup\u003e2\u003c/sup\u003e metric indicates the overall fit of the estimated values to the measured values. RMSE shows the average deviation of the estimated values, with strong emphasis on large errors. 
The utility of the additional METAR data is analyzed by evaluating the percentage improvement in RMSE due to the ML models in comparison to the McClear model, across the available range of visibility values.\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{R}^{2}=1-\\frac{{\\sum\\:}_{i=1}^{n}{\\left({y}_{target}^{i}-{y}_{model}^{i}\\right)}^{2}}{{\\sum\\:}_{i=1}^{n}{\\left({y}_{target}^{i}-\\frac{1}{n}{\\sum\\:}_{j=1}^{n}{y}_{target}^{j}\\right)}^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:RMSE\\:=\\sqrt{\\frac{1}{n}{\\sum\\:}_{i=1}^{n}{\\left({y}_{model}^{i}-{y}_{target}^{i}\\right)}^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe overall all-sky RMSE of the Heliosat-3 estimated GHI using the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e values obtained from the ML models is slightly reduced compared to the RMSE obtained when \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e from McClear is used (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Out of the models tested in this study, XGBoost and NeuralNetTorch show the largest reduction in RMSE. While the QVC shows the least improvement in RMSE, it must also be considered that it uses a very low number of learnable parameters (188) in comparison to the other models such as the Neural Network (which uses 50,561 learnable parameters). Also, the number of QVC layers had to be restricted due to the computational requirements. 
Most of the ML models did not perform well at the Xianghe site. This could be because its visibility measurement station is the farthest from the GHI measurement station among all the sites considered here (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The R\u003csup\u003e2\u003c/sup\u003e metric (Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e) shows that the accuracy of the all-sky GHI derived using the different ML models and McClear is comparable. The overall impact is small but positive.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e of the satellite-estimated GHI against ground-measured GHI using \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e from different models\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"8\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" 
colnum=\"8\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatBoost\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExtraTrees\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNeuralNet\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eQVC\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eRandomForest\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eMcClear\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.91\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.91\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eConsistent improvement in RMSE is observed for visibility values between 6 and 10 km (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e3\u003c/span\u003e). 
A visibility of 10 km is the operational reporting threshold at airports, beyond which no significant weather phenomena such as haze, smog, dust storms or smoke are found according to WMO guidelines [\u003cspan citationid=\"CR98\" class=\"CitationRef\"\u003e98\u003c/span\u003e]. However, limited or no improvement is observed for visibility values lower than 6 km. Such low visibility values are often caused by the presence of hydrometeors. Since cloud-related hydrometeors are already taken into account by the CI parameter, the low visibility values may lead to an overcompensation of the reduction in GHI. Although the dew point temperature and RH parameters are used as inputs in order to eliminate such situations, the filtering may not have been effective enough. Large errors in visibility-derived AOD in situations with higher RH were noted in [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. In general, the observations in this study are in line with previous findings that visibility is not a perfect proxy for AOD [\u003cspan citationid=\"CR99\" class=\"CitationRef\"\u003e99\u003c/span\u003e]. The largest improvements in RMSE are observed within the visibility range of 6 to 8 km.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eFigure \u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the improvement or deterioration of the RMSE in Heliosat-3 estimated GHI when using \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e values from the ML models instead of the physics-based McClear model, for aerosol-relevant significant weather situations classified in the METAR data. The largest and most consistent improvement in RMSE is observed in the presence of dust and sand aerosol with all ML models. 
In particular, the LightGBM model shows the highest reduction in RMSE (almost 20%). Only three models (CatBoost, LightGBM and XGBoost) show a significant reduction of RMSE during smoke events, while none of the ML models shows an improvement in RMSE during situations with haze. The lowest visibility values, ranging from 1 to 3 km, are observed for weather situations with smoke (FU). Smoke particles are typically small, which leads to a more effective extinction of light at shorter wavelengths and hence a greater reduction of visibility [\u003cspan citationid=\"CR100\" class=\"CitationRef\"\u003e100\u003c/span\u003e]. Dust particles, which are often larger [\u003cspan citationid=\"CR101\" class=\"CitationRef\"\u003e101\u003c/span\u003e], tend to scatter light less efficiently but can still cause significant attenuation in high concentrations. Depending on the travel distance, larger particles are removed by dry deposition. This explains the larger range of visibility values, between 2.5 and 5.5 km, observed in the presence of dust and sand aerosol events. Haze (HZ) primarily consists of dispersed secondary aerosols, which can originate from anthropogenic sources as well as from biomass burning [\u003cspan citationid=\"CR102\" class=\"CitationRef\"\u003e102\u003c/span\u003e]. Because haze concentrations are lower than those of smoke near its source, higher average visibility is observed during haze conditions (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe improvement in RMSE of the Heliosat-3 estimated GHI shows a bimodal distribution with respect to the RH values (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e5\u003c/span\u003e). For very low and high RH values, an improvement in RMSE is observed. 
However, almost no improvement or even deterioration is observed for intermediate values of RH. Previous studies have shown that variations in visibility are well correlated with variations in particulate matter or AOD under low RH conditions and display an inversely proportional or exponential relationship at higher RH values [\u003cspan citationid=\"CR103\" class=\"CitationRef\"\u003e103\u003c/span\u003e]. Because some aerosol types are hygroscopic, they increase in size and scattering cross-section by absorbing moisture from the air, which leads to a greater visibility reduction at higher levels of RH. As the ML models are trained on visibility, AOD and RH datasets, this could account for the improvements in RMSE observed with the ML models at higher RH values. However, for intermediate values of RH, visibility measurements are very sensitive to AOD or particulate matter concentrations [\u003cspan citationid=\"CR103\" class=\"CitationRef\"\u003e103\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e"},{"header":"5. Summary and Conclusion","content":"\u003cp\u003eThis study introduced a machine learning (ML) framework for estimating global horizontal clear sky irradiance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e) at 30-minute resolution by combining atmospheric parameters from the METeorological Aerodrome Report (METAR) with aerosol information from the Copernicus Atmosphere Monitoring Service (CAMS) reanalysis. 
To address the absence of direct \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e measurements, a normalized pseudo global horizontal clear sky irradiance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}^{*}\\)\u003c/span\u003e\u003c/span\u003e) target was employed for model training. Models trained on data from Cairo were tested on four unseen sites in tropical and subtropical environments.\u003c/p\u003e\u003cp\u003eWhen coupled with the Heliosat-3 model to derive all-sky GHI, the ML-derived \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e values slightly outperformed the physics-based McClear estimates overall. Neural Network (NeuralNetTorch) and eXtreme Gradient Boosting (XGBoost) yielded the most robust overall improvements, while the quantum variational circuit (QVC) showed the smallest gains, albeit with a far lower number of learnable parameters. The strongest benefits were observed for visibility values between 6 and 8 km. Large reductions in RMSE of up to 22% were observed during dust and sand aerosol events, with moderate improvements under smoke, while haze events showed no improvement. Performance also exhibited a bimodal dependence on relative humidity (RH), with gains most pronounced in low and high RH regimes, and little to no improvement in the intermediate range. 
This behavior likely reflects the changing relationship between visibility and RH, which is weak at low RH, becomes strongly inverse at high RH, and transitions nonlinearly in the mid-range.\u003c/p\u003e\u003cp\u003eThese findings demonstrate that ML-based \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{GHI}_{CS}\\)\u003c/span\u003e\u003c/span\u003e estimates using local METAR data offer a useful enhancement for existing satellite-based GHI estimation models, particularly in aerosol-rich regions where physics-based models face limitations due to spatial resolution. Looking ahead, expanding the training domain to multiple sites, incorporating aerosol-type specific AOD, and exploring domain adaptation techniques may further improve the accuracy of satellite-retrieved GHI. This approach holds promise for advancing operational PV power prediction and solar resource assessment in regions strongly impacted by aerosols.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThe work was supported with funding from the German Academic Exchange Service (DAAD).\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eA.R. and D.H. conceived the idea and designed the study. M.SH. formulated the evaluation techniques. J.L. acquired the datasets and implemented Heliosat-3. A.R. performed the simulations with the ML models. A.R., D.H. and M.SH. wrote the paper.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eSARAH3, METAR, and ground observations used in this study are openly available for any purpose; SOLCAST data are available for research and education purposes. All download links are given in the data section. The output data of the machine learning models will be made available upon request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAl-Dahidi, S., Ayadi, O., Alrbai, M., Adeeb, J. 
(2019) Ensemble Approach of Optimized Artificial Neural Networks for Solar Photovoltaic Power Prediction. \u003cem\u003eIEEE Access\u003c/em\u003e, \u003cstrong\u003e7\u003c/strong\u003e, 81741\u0026ndash;81758.\u003c/li\u003e\n\u003cli\u003eKhodayar, M., Mohammadi, S., Khodayar, M.E., Wang, J., Liu, G. (2020) Convolutional Graph Autoencoder: A Generative Deep Neural Network for Probabilistic Spatio-Temporal Solar Irradiance Forecasting. \u003cem\u003eIEEE Trans. Sustain. Energy\u003c/em\u003e, \u003cstrong\u003e11\u003c/strong\u003e (2), 571\u0026ndash;583.\u003c/li\u003e\n\u003cli\u003eNajdawi, F.Z. and Villarreal, R. (2023) Utilizing the Vector Autoregression Model (VAR) for Short-Term Solar Irradiance Forecasting. \u003cem\u003eEPE\u003c/em\u003e, \u003cstrong\u003e15\u003c/strong\u003e (11), 353\u0026ndash;362.\u003c/li\u003e\n\u003cli\u003eEdoli, E., Fiorenzani, S., Vargiolu, T. (2016) Optimal Trading Strategies in Intraday Power Markets, in \u003cem\u003eOptimization Methods for Gas and Power Markets: Theory and Cases\u003c/em\u003e (eds E. Edoli, S. Fiorenzani, T. Vargiolu), Palgrave Macmillan, London, pp. 161\u0026ndash;184.\u003c/li\u003e\n\u003cli\u003eDunlop, E.D., Wald, L., Suri, M. (2006) \u003cem\u003eSolar Energy Resource Management for Electricity Generation from Local Level to Global Scale\u003c/em\u003e, Nova Science Publishers Inc.\u003c/li\u003e\n\u003cli\u003eYamasoe, M.A., do Ros\u0026aacute;rio, N.M.E., Barros, K.M. (2017) Downward solar global irradiance at the surface in S\u0026atilde;o Paulo city\u0026mdash;The climatological effects of aerosol and clouds. \u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e122\u003c/strong\u003e (1), 391\u0026ndash;404.\u003c/li\u003e\n\u003cli\u003eKosmopoulos, P.G., Kazadzis, S., Taylor, M., Athanasopoulou, E., Speyer, O., Raptis, P.I., Marinou, E., Proestakis, E., Solomos, S., Gerasopoulos, E., Amiridis, V., Bais, A., Kontoes, C. 
(2017) Dust impact on surface solar irradiance assessed with model simulations, satellite observations and ground-based measurements. \u003cem\u003eAtmos. Meas. Tech.\u003c/em\u003e, \u003cstrong\u003e10\u003c/strong\u003e (7), 2435\u0026ndash;2453.\u003c/li\u003e\n\u003cli\u003eSchafer, J.S., Eck, T.F., Holben, B.N., Artaxo, P., Yamasoe, M.A., Procopio, A.S. (2002) Observed reductions of total solar irradiance by biomass‐burning aerosols in the Brazilian Amazon and Zambian Savanna. \u003cem\u003eGeophysical Research Letters\u003c/em\u003e, \u003cstrong\u003e29\u003c/strong\u003e (17).\u003c/li\u003e\n\u003cli\u003eCosta, R.S., Martins, F.R., Pereira, E.B. (2016) Atmospheric aerosol influence on the Brazilian solar energy assessment: Experiments with different horizontal visibility bases in radiative transfer model. \u003cem\u003eRenewable Energy\u003c/em\u003e, \u003cstrong\u003e90\u003c/strong\u003e, 120\u0026ndash;135.\u003c/li\u003e\n\u003cli\u003eHusar, R.B., Husar, J.D., Martin, L. (2000) Distribution of continental surface aerosol extinction based on visual range data. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e34\u003c/strong\u003e (29-30), 5067\u0026ndash;5078.\u003c/li\u003e\n\u003cli\u003eTantiwechwuttikul, R., Yarime, M., Ito, K. (2019) Solar Photovoltaic Market Adoption: Dilemma of Technological Exploitation vs Technological Exploration, in \u003cem\u003eTechnologies and Eco-innovation towards Sustainability II: Eco Design Assessment and Management\u003c/em\u003e (eds A.H. Hu, M. Matsumoto, T.C. Kuo, S. Smith), Springer Singapore, Singapore, pp. 215\u0026ndash;227.\u003c/li\u003e\n\u003cli\u003eHermann, M., Heintzenberg, J., Wiedensohler, A., Zahn, A., Heinrich, G., Brenninkmeijer, C.A.M. 
(2003) Meridional distributions of aerosol particle number concentrations in the upper troposphere and lower stratosphere obtained by Civil Aircraft for Regular Investigation of the Atmosphere Based on an Instrument Container (CARIBIC) flights. \u003cem\u003eJ. Geophys. Res.\u003c/em\u003e, \u003cstrong\u003e108\u003c/strong\u003e (D3).\u003c/li\u003e\n\u003cli\u003eSun, X., Bright, J.M., Gueymard, C.A., Bai, X., Acord, B., Wang, P. (2021) Worldwide performance assessment of 95 direct and diffuse clear-sky irradiance models using principal component analysis. \u003cem\u003eRenewable and Sustainable Energy Reviews\u003c/em\u003e, \u003cstrong\u003e135\u003c/strong\u003e, 110087.\u003c/li\u003e\n\u003cli\u003eKamath, H.G. and Srinivasan, J. (2020) Validation of global irradiance derived from INSAT-3D over India. \u003cem\u003eSolar Energy\u003c/em\u003e, \u003cstrong\u003e202\u003c/strong\u003e, 45\u0026ndash;54.\u003c/li\u003e\n\u003cli\u003eGueymard, C.A., Habte, A., Sengupta, M. (2018) Reducing Uncertainties in Large-Scale Solar Resource Data: The Impact of Aerosols. \u003cem\u003eIEEE J. Photovoltaics\u003c/em\u003e, \u003cstrong\u003e8\u003c/strong\u003e (6), 1732\u0026ndash;1737.\u003c/li\u003e\n\u003cli\u003eFoyo-Moreno, I., Alados, I., Ant\u0026oacute;n, M., Fern\u0026aacute;ndez-G\u0026aacute;lvez, J., Cazorla, A., Alados-Arboledas, L. (2014) Estimating aerosol characteristics from solar irradiance measurements at an urban location in southeastern Spain. \u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e119\u003c/strong\u003e (4), 1845\u0026ndash;1859.\u003c/li\u003e\n\u003cli\u003eHouborg, R., Soegaard, H., Emmerich, W., Moran, S. (2007) Inferences of all‐sky solar irradiance using Terra and Aqua MODIS satellite data. 
\u003cem\u003eInternational Journal of Remote Sensing\u003c/em\u003e, \u003cstrong\u003e28\u003c/strong\u003e (20), 4509\u0026ndash;4535.\u003c/li\u003e\n\u003cli\u003eRemund, J., Wald, L., Lef\u0026egrave;vre, M., Ranchin, T., Page, J.H. (2003) Worldwide Linke turbidity information, in \u003cem\u003eProceedings of ISES Solar World Congress 2003\u003c/em\u003e. International Solar Energy Society (ISES), G\u0026ouml;teborg, Sweden, 13 p.\u003c/li\u003e\n\u003cli\u003eKim, M., Levy, R.C., Remer, L.A., Mattoo, S., Gupta, P. (2024) Parameterizing spectral surface reflectance relationships for the Dark Target aerosol algorithm applied to a geostationary imager. \u003cem\u003eAtmos. Meas. Tech.\u003c/em\u003e, \u003cstrong\u003e17\u003c/strong\u003e (7), 1913\u0026ndash;1939.\u003c/li\u003e\n\u003cli\u003eSchutgens, N.A.J. (2020) Site representativity of AERONET and GAW remotely sensed aerosol optical thickness and absorbing aerosol optical thickness observations. \u003cem\u003eAtmos. Chem. Phys.\u003c/em\u003e, \u003cstrong\u003e20\u003c/strong\u003e (12), 7473\u0026ndash;7488.\u003c/li\u003e\n\u003cli\u003eLee, K.-H., Yoo, J.-M., Wong, M.-S. (2020) Estimation of Directional Surface Reflectance and Atmospheric Aerosols Over East Asia Using a Multi-Channel Geostationary Satellite, in \u003cem\u003e2020 IEEE International Geoscience \u0026amp; Remote Sensing Symposium: Proceedings : September 26-October 2, 2020, virtual\u003c/em\u003e. IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, 9/26/2020 - 10/2/2020, Waikoloa, HI, USA. IEEE, Piscataway, NJ, pp. 5600\u0026ndash;5603.\u003c/li\u003e\n\u003cli\u003eLef\u0026egrave;vre, M., Oumbe, A., Blanc, P., Espinar, B., Gschwind, B., Qu, Z., Wald, L., Schroedter-Homscheidt, M., Hoyer-Klick, C., Arola, A., Benedetti, A., Kaiser, J.W., Morcrette, J.-J. (2013) McClear: a new model estimating downwelling solar radiation at ground level in clear-sky conditions. \u003cem\u003eAtmos. Meas. 
Tech.\u003c/em\u003e, \u003cstrong\u003e6\u003c/strong\u003e (9), 2403\u0026ndash;2418.\u003c/li\u003e\n\u003cli\u003eGueymard, C.A. and Yang, D. (2020) Worldwide validation of CAMS and MERRA-2 reanalysis aerosol optical depth products using 15 years of AERONET observations. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e225\u003c/strong\u003e, 117216.\u003c/li\u003e\n\u003cli\u003eIsaza, A., Kay, M., Evans, J.P., Bremner, S., Prasad, A. (2021) Validation of Australian atmospheric aerosols from reanalysis data and CMIP6 simulations. \u003cem\u003eAtmospheric Research\u003c/em\u003e, \u003cstrong\u003e264\u003c/strong\u003e, 105856.\u003c/li\u003e\n\u003cli\u003eAnsari, K. and Ramachandran, S. (2024) Optical and physical characteristics of aerosols over Asia: AERONET, MERRA-2 and CAMS. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e326\u003c/strong\u003e, 120470.\u003c/li\u003e\n\u003cli\u003eWitthuhn, J., H\u0026uuml;nerbein, A., Deneke, H. (2020) Evaluation of satellite-based aerosol datasets and the CAMS reanalysis over ocean utilizing shipborne reference observations. \u003cem\u003eAtmos. Meas. Tech.\u003c/em\u003e, \u003cstrong\u003e13\u003c/strong\u003e (3), 1387\u0026ndash;1412.\u003c/li\u003e\n\u003cli\u003eJ\u0026uacute;nior, A.L.P., Curado, L.F.A., Da Pal\u0026aacute;cios, R.S., Santos, L.O.F.d., Querino, C.A.S., Da Querino, J.K.A.S., Rodrigues, T.R., Marques, J.B. (2025) Evaluation of Aerosol Optical Depth (Aod) Estimated by Copernicus Atmosphere Monitoring Service (Cams) in Brazil. \u003cem\u003eTheor Appl Climatol\u003c/em\u003e, \u003cstrong\u003e156\u003c/strong\u003e (2).\u003c/li\u003e\n\u003cli\u003eTuna Tuygun, G. and Elbir, T. (2024) Comparative analysis of CAMS aerosol optical depth data and AERONET observations in the Eastern Mediterranean over 19 years. 
\u003cem\u003eEnviron Sci Pollut Res\u003c/em\u003e, \u003cstrong\u003e31\u003c/strong\u003e (18), 27069\u0026ndash;27084.\u003c/li\u003e\n\u003cli\u003eKoschmieder, H. (1924) Theorie der horizontalen Sichtweite, Beitr\u0026auml;ge zur Physik der freien Atmosph\u0026auml;re. \u003cem\u003eMeteorologische Zeitschrift\u003c/em\u003e, \u003cstrong\u003e12\u003c/strong\u003e, 33\u0026ndash;53.\u003c/li\u003e\n\u003cli\u003eHorvath, H. (1971) On the applicability of the Koschmieder visibility formula. \u003cem\u003eAtmospheric Environment (1967)\u003c/em\u003e, \u003cstrong\u003e5\u003c/strong\u003e (3), 177\u0026ndash;184.\u003c/li\u003e\n\u003cli\u003eOzkaynak, H., Schatz, A.D., Thurston, G.D., Isaacs, R.G., Husar, R.B. (1985) Relationships between Aerosol Extinction Coefficients Derived from Airport Visual Range Observations and Alternative Measures of Airborne Particle Mass. \u003cem\u003eJournal of the Air Pollution Control Association\u003c/em\u003e, \u003cstrong\u003e35\u003c/strong\u003e (11), 1176\u0026ndash;1185.\u003c/li\u003e\n\u003cli\u003eFriedlander, S.K. (2000) \u003cem\u003eSmoke, Dust and Haze\u003c/em\u003e, Oxford University Press.\u003c/li\u003e\n\u003cli\u003ePeterson, J.T. and Fee, C.J. (1981) Visibility-atmospheric turbidity dependence at Raleigh, North Carolina. \u003cem\u003eAtmospheric Environment (1967)\u003c/em\u003e, \u003cstrong\u003e15\u003c/strong\u003e (12), 2561\u0026ndash;2563.\u003c/li\u003e\n\u003cli\u003eElterman, L. (1970) Relationships between vertical attenuation and surface meteorological range. \u003cem\u003eAppl. Opt.\u003c/em\u003e, \u003cstrong\u003e9\u003c/strong\u003e (8), 1804\u0026ndash;1810.\u003c/li\u003e\n\u003cli\u003eZhang, S., Wu, J., Fan, W., Yang, Q., Zhao, D. (2020) Review of aerosol optical depth retrieval using visibility data. \u003cem\u003eEarth-Science Reviews\u003c/em\u003e, \u003cstrong\u003e200\u003c/strong\u003e, 102986.\u003c/li\u003e\n\u003cli\u003eQiu, J. and Lin, Y. (2001) A parameterization model of aerosol optical depths in China. 
\u003cem\u003eActa Meteorol. Sin\u003c/em\u003e, \u003cstrong\u003e59\u003c/strong\u003e (3), 368\u0026ndash;372.\u003c/li\u003e\n\u003cli\u003eWu, J., Luo, J., Zhang, L., Xia, L., Zhao, D., Tang, J. (2014) Improvement of aerosol optical depth retrieval using visibility data in China during the past 50 years. \u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e119\u003c/strong\u003e (23).\u003c/li\u003e\n\u003cli\u003eZhang, Z., Wu, W., Wei, J., Song, Y., Yan, X., Zhu, L., Wang, Q. (2017) Aerosol optical depth retrieval from visibility in China during 1973\u0026ndash;2014. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e171\u003c/strong\u003e, 38\u0026ndash;48.\u003c/li\u003e\n\u003cli\u003eLi, F., Zhang, L., Wei, Q., Yang, Y., Han, F., Li, W., Zhao, C., Wang, W. (2022) An improved method for retrieving aerosol optical depth using the ground-level meteorological data over the South-central Plain of Hebei Province, China. \u003cem\u003eAtmospheric Pollution Research\u003c/em\u003e, \u003cstrong\u003e13\u003c/strong\u003e (3), 101334.\u003c/li\u003e\n\u003cli\u003eWu, J., Zhang, S., Yang, Q., Zhao, D., Fan, W., Zhao, J., Shen, C. (2021) Using particle swarm optimization to improve visibility-aerosol optical depth retrieval method. \u003cem\u003enpj Clim Atmos Sci\u003c/em\u003e, \u003cstrong\u003e4\u003c/strong\u003e (1), 1\u0026ndash;12.\u003c/li\u003e\n\u003cli\u003eHao, H., Wang, K., Zhao, C., Wu, G., Li, J. (2024) Visibility-derived aerosol optical depth over global land from 1959 to 2021. \u003cem\u003eEarth Syst. Sci. Data\u003c/em\u003e, \u003cstrong\u003e16\u003c/strong\u003e (7), 3233\u0026ndash;3260.\u003c/li\u003e\n\u003cli\u003eVijayakumar, K., Devara, P.C.S., Sonbawne, S.M., Giles, D.M., Holben, B.N., Rao, S.V.B., Jayasankar, C.K. (2020) Solar radiometer sensing of multi-year aerosol features over a tropical urban station: direct-Sun and inversion products. \u003cem\u003eAtmos. Meas. 
Tech.\u003c/em\u003e, \u003cstrong\u003e13\u003c/strong\u003e (10), 5569\u0026ndash;5593.\u003c/li\u003e\n\u003cli\u003eIneichen, P. and Perez, R. (2010) Aerosol quantification based on global irradiance. \u003cem\u003eSolar Paces 2010 proceedings\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eSequeira, R. and Lai, K.-H. (1998) The effect of meteorological parameters and aerosol constituents on visibility in urban Hong Kong. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e32\u003c/strong\u003e (16), 2865\u0026ndash;2871.\u003c/li\u003e\n\u003cli\u003eWen, C.-C. and Yeh, H.-H. (2010) Comparative influences of airborne pollutants and meteorological parameters on atmospheric visibility and turbidity. \u003cem\u003eAtmospheric Research\u003c/em\u003e, \u003cstrong\u003e96\u003c/strong\u003e (4), 496\u0026ndash;509.\u003c/li\u003e\n\u003cli\u003ePeng, Y., Wang, H., Hou, M., Jiang, T., Zhang, M., Zhao, T., Che, H. (2020) Improved method of visibility parameterization focusing on high humidity and aerosol concentrations during fog\u0026ndash;haze events: Application in the GRAPES_CAUCE model in Jing-Jin-Ji, China. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e222\u003c/strong\u003e, 117139.\u003c/li\u003e\n\u003cli\u003eGoudie, A.S. and Middleton, N.J. (1992) The changing frequency of dust storms through time. \u003cem\u003eClimatic Change\u003c/em\u003e, \u003cstrong\u003e20\u003c/strong\u003e (3), 197\u0026ndash;225.\u003c/li\u003e\n\u003cli\u003eMahowald, N.M., Ballantine, J.A., Feddema, J., Ramankutty, N. (2007) Global trends in visibility: implications for dust sources. \u003cem\u003eAtmos. Chem. Phys.\u003c/em\u003e, \u003cstrong\u003e7\u003c/strong\u003e (12), 3309\u0026ndash;3339.\u003c/li\u003e\n\u003cli\u003eTavartkiladze, K.A. and Amiranashvili, A.G. 
(2007) The Influence of Relative Humidity on the Changeability of the Atmospheric Aerosol Optical Depth, in \u003cem\u003eNucleation and atmospheric aerosols: 17th international conference, Galway, Ireland, 2007\u003c/em\u003e (eds C.D. O\u0026apos;Dowd and P.E. Wagner), Springer, Berlin, pp. 761\u0026ndash;765.\u003c/li\u003e\n\u003cli\u003eWilson, R.T., Milton, E.J., Nield, J.M. (2015) Are visibility-derived AOT estimates suitable for parameterizing satellite data atmospheric correction algorithms? \u003cem\u003eInternational Journal of Remote Sensing\u003c/em\u003e, \u003cstrong\u003e36\u003c/strong\u003e (6), 1675\u0026ndash;1688.\u003c/li\u003e\n\u003cli\u003eVerbois, H., Rusydi, A., Thiery, A. (2018) Probabilistic forecasting of day-ahead solar irradiance using quantile gradient boosting. \u003cem\u003eSolar Energy\u003c/em\u003e, \u003cstrong\u003e173\u003c/strong\u003e, 313\u0026ndash;327.\u003c/li\u003e\n\u003cli\u003eNabavi, S.O., Haimberger, L., Abbasi, R., Samimi, C. (2018) Prediction of aerosol optical depth in West Asia using deterministic models and machine learning algorithms. \u003cem\u003eAeolian Research\u003c/em\u003e, \u003cstrong\u003e35\u003c/strong\u003e, 69\u0026ndash;84.\u003c/li\u003e\n\u003cli\u003eKosmopoulos, P. (2024) Impact of aerosols on solar energy production, in \u003cem\u003ePlanning and Management of Solar Power from Space\u003c/em\u003e (ed P. Kosmopoulos), Academic Press, pp. 89\u0026ndash;104.\u003c/li\u003e\n\u003cli\u003eNeher, I., Buchmann, T., Crewell, S., Pospichal, B., Meilinger, S. (2019) Impact of atmospheric aerosols on solar power. \u003cem\u003eMeteorologische Zeitschrift\u003c/em\u003e, \u003cstrong\u003e28\u003c/strong\u003e (4), 305\u0026ndash;321.\u003c/li\u003e\n\u003cli\u003eBent\u0026eacute;jac, C., Cs\u0026ouml;rgő, A., Mart\u0026iacute;nez-Mu\u0026ntilde;oz, G. (2021) A comparative analysis of gradient boosting algorithms. 
\u003cem\u003eArtif Intell Rev\u003c/em\u003e, \u003cstrong\u003e54\u003c/strong\u003e (3), 1937\u0026ndash;1967.\u003c/li\u003e\n\u003cli\u003eZhang, J., Mucs, D., Norinder, U., Svensson, F. (2019) LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity-Application to the Tox21 and Mutagenicity Data Sets. \u003cem\u003eJournal of Chemical Information and Modeling\u003c/em\u003e, \u003cstrong\u003e59\u003c/strong\u003e (10), 4150\u0026ndash;4158.\u003c/li\u003e\n\u003cli\u003eHancock, J.T. and Khoshgoftaar, T.M. (2020) CatBoost for big data: an interdisciplinary review. \u003cem\u003eJournal of big data\u003c/em\u003e, \u003cstrong\u003e7\u003c/strong\u003e (1), 94.\u003c/li\u003e\n\u003cli\u003eGeurts, P., Ernst, D., Wehenkel, L. (2006) Extremely randomized trees. \u003cem\u003eMachine Learning\u003c/em\u003e, \u003cstrong\u003e63\u003c/strong\u003e (1), 3\u0026ndash;42.\u003c/li\u003e\n\u003cli\u003eCar, Z., Baressi \u0026Scaron;egota, S., Anđelić, N., Lorencin, I., Mrzljak, V. (2020) Modeling the Spread of COVID-19 Infection Using a Multilayer Perceptron. \u003cem\u003eComputational and Mathematical Methods in Medicine\u003c/em\u003e, \u003cstrong\u003e2020\u003c/strong\u003e (1), 5714714.\u003c/li\u003e\n\u003cli\u003eBenedetti, M., Lloyd, E., Sack, S., Fiorentini, M. (2019) Parameterized quantum circuits as machine learning models. \u003cem\u003eQuantum Sci. Technol.\u003c/em\u003e, \u003cstrong\u003e4\u003c/strong\u003e (4), 43001.\u003c/li\u003e\n\u003cli\u003eSchuld, M., Sinayskiy, I., Petruccione, F. (2015) An introduction to quantum machine learning. \u003cem\u003eContemporary Physics\u003c/em\u003e, \u003cstrong\u003e56\u003c/strong\u003e (2), 172\u0026ndash;185.\u003c/li\u003e\n\u003cli\u003eBlanc, P., Jolivet, R., M\u0026eacute;nard, L., Saint-Drenan, Y.-M. (2022) Data sharing of in-situ measurements following GEO and FAIR principles in the solar energy sector. \u003cem\u003eCentre O.I.E. 
MINES Paris, Working document\u003c/em\u003e (ed 1.0).\u003c/li\u003e\n\u003cli\u003eDriemel, A., Augustine, J., Behrens, K., Colle, S., Cox, C., Cuevas-Agull\u0026oacute;, E., Denn, F.M., Duprat, T., Fukuda, M., Grobe, H., Haeffelin, M., Hodges, G., Hyett, N., Ijima, O., Kallis, A., Knap, W., Kustov, V., Long, C.N., Longenecker, D., Lupi, A., Maturilli, M., Mimouni, M., Ntsangwane, L., Ogihara, H., Olano, X., Olefs, M., Omori, M., Passamani, L., Pereira, E.B., Schmith\u0026uuml;sen, H., Schumacher, S., Sieger, R., Tamlyn, J., Vogt, R., Vuilleumier, L., Xia, X., Ohmura, A., K\u0026ouml;nig-Langlo, G. (2018) Baseline Surface Radiation Network (BSRN): structure and data description (1992\u0026ndash;2017). \u003cem\u003eEarth Syst. Sci. Data\u003c/em\u003e, \u003cstrong\u003e10\u003c/strong\u003e (3), 1491\u0026ndash;1501.\u003c/li\u003e\n\u003cli\u003eEl‐Metwally, M., Alfaro, S.C., Abdel Wahab, M., Chatenet, B. (2008) Aerosol characteristics over urban Cairo: Seasonal variations as retrieved from Sun photometer measurements. \u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e113\u003c/strong\u003e (D14).\u003c/li\u003e\n\u003cli\u003eEl‐Askary, H. and Kafatos, M. (2008) Dust storm and black cloud influence on aerosol optical properties over Cairo and the Greater Delta region, Egypt. \u003cem\u003eInternational Journal of Remote Sensing\u003c/em\u003e, \u003cstrong\u003e29\u003c/strong\u003e (24), 7199\u0026ndash;7211.\u003c/li\u003e\n\u003cli\u003eChristodoulou, A., Bezantakos, S., Bourtsoukidis, E., Stavroulas, I., Afif, C., Borbon, A., Vrekoussis, M., Mihalopoulos, N., Sauvage, S., Sciare, J. 
(2024) \u003cem\u003eSubmicron aerosol pollution in Greater Cairo (Egypt): A new type of urban haze?\u003c/em\u003e, Copernicus GmbH.\u003c/li\u003e\n\u003cli\u003eLalchandani, V., Srivastava, D., Dave, J., Mishra, S., Tripathi, N., Shukla, A.K., Sahu, R., Thamban, N.M., Gaddamidi, S., Dixit, K., Ganguly, D., Tiwari, S., Srivastava, A.K., Sahu, L., Rastogi, N., Gargava, P., Tripathi, S.N. (2022) Effect of Biomass Burning on PM 2.5 Composition and Secondary Aerosol Formation During Post‐Monsoon and Winter Haze Episodes in Delhi. \u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e127\u003c/strong\u003e (1).\u003c/li\u003e\n\u003cli\u003eBhowmik, H.S., Tripathi, S.N., Shukla, A.K., Lalchandani, V., Murari, V., Devaprasad, M., Shivam, A., Bhushan, R., Pr\u0026eacute;v\u0026ocirc;t, A.S.H., Rastogi, N. (2024) Contribution of fossil and biomass-derived secondary organic carbon to winter water-soluble organic aerosols in Delhi, India. \u003cem\u003eThe Science of the total environment\u003c/em\u003e, \u003cstrong\u003e912\u003c/strong\u003e, 168655.\u003c/li\u003e\n\u003cli\u003eJain, S., Sharma, S.K., Vijayan, N., Mandal, T.K. (2020) Seasonal characteristics of aerosols (PM2.5 and PM10) and their source apportionment using PMF: A four year study over Delhi, India. \u003cem\u003eEnvironmental pollution (Barking, Essex : 1987)\u003c/em\u003e, \u003cstrong\u003e262\u003c/strong\u003e, 114337.\u003c/li\u003e\n\u003cli\u003eSharma, M., Kaskaoutis, D.G., Singh, R.P., Singh, S. (2014) Seasonal Variability of Atmospheric Aerosol Parameters over Greater Noida Using Ground Sunphotometer Observations. \u003cem\u003eAerosol Air Qual. Res.\u003c/em\u003e, \u003cstrong\u003e14\u003c/strong\u003e (3), 608\u0026ndash;622.\u003c/li\u003e\n\u003cli\u003eYan, L. and Liu, X. (2009) Seasonal variation of atmospheric aerosol and its relation to cloud faction over Beijing-Tianjin-Hebei region. \u003cem\u003eChin. Res. Environ. 
Sci\u003c/em\u003e, \u003cstrong\u003e22\u003c/strong\u003e, 924\u0026ndash;931.\u003c/li\u003e\n\u003cli\u003eLi, B.G., Ran, Y., Tao, S. (2008) Seasonal variation and spatial distribution of atmospheric aerosols in Beijing. \u003cem\u003eActa Scientiae Circumstantiae\u003c/em\u003e, \u003cstrong\u003e28\u003c/strong\u003e (7), 1425\u0026ndash;1429.\u003c/li\u003e\n\u003cli\u003eTakegawa, N., Miyakawa, T., Kondo, Y., Jimenez, J.L., Zhang, Q., Worsnop, D.R., Fukuda, M. (2006) Seasonal and diurnal variations of submicron organic aerosol in Tokyo observed using the Aerodyne aerosol mass spectrometer. \u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e111\u003c/strong\u003e (D11).\u003c/li\u003e\n\u003cli\u003eIijima, A., Tago, H., Kumagai, K., Kato, M., Kozawa, K., Sato, K., Furuta, N. (2008) Regional and seasonal characteristics of emission sources of fine airborne particulate matter collected in the center and suburbs of Tokyo, Japan as determined by multielement analysis and source receptor models. \u003cem\u003eJournal of environmental monitoring : JEM\u003c/em\u003e, \u003cstrong\u003e10\u003c/strong\u003e (9), 1025\u0026ndash;1032.\u003c/li\u003e\n\u003cli\u003ePhan, N.-T. and Dinh-Tri, C. (2024) Assessment of air pollutant emissions from rice straw open burning in Hoa Vang district, Da Nang city, Vietnam. \u003cem\u003eUD-JST\u003c/em\u003e, 25\u0026ndash;32.\u003c/li\u003e\n\u003cli\u003ePham, T.T.K., Le, S.H., Nguyen, T., Balasubramanian, R., Tran, P.T.M. (2024) Characteristics of airborne particles in stone quarrying areas: Human exposure assessment and mitigation. \u003cem\u003eEnvironmental research\u003c/em\u003e, \u003cstrong\u003e245\u003c/strong\u003e, 118087.\u003c/li\u003e\n\u003cli\u003ePfeifroth, U., Kothe, S., Dr\u0026uuml;cke, J., Trentmann, J., Schr\u0026ouml;der, M., Selbach, N., Hollmann, R. (2023) Surface Radiation Data Set - Heliosat (SARAH) - Edition 3, Satellite Application Facility on Climate Monitoring (CM SAF). 
https://wui.cmsaf.eu/safira/action/viewDoiDetails?acronym=SARAH_V003.\u003c/li\u003e\n\u003cli\u003eHammer, A., Heinemann, D., Hoyer, C., Kuhlemann, R., Lorenz, E., M\u0026uuml;ller, R., Beyer, H.G. (2003) Solar energy assessment using remote sensing technologies. \u003cem\u003eRemote Sensing of Environment\u003c/em\u003e, \u003cstrong\u003e86\u003c/strong\u003e (3), 423\u0026ndash;432.\u003c/li\u003e\n\u003cli\u003eHammer, A., K\u0026uuml;hnert, J., Weinreich, K., Lorenz, E. (2015) Short-Term Forecasting of Surface Solar Irradiance Based on Meteosat-SEVIRI Data Using a Nighttime Cloud Index. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cstrong\u003e7\u003c/strong\u003e (7), 9070\u0026ndash;9090.\u003c/li\u003e\n\u003cli\u003eSolar API and Weather Forecasting Tool | Solcast\u0026trade;. https://solcast.com/ (9 April 2025).\u003c/li\u003e\n\u003cli\u003eBright, J.M. (2019) Solcast: Validation of a satellite-derived solar irradiance dataset. \u003cem\u003eSolar Energy\u003c/em\u003e, \u003cstrong\u003e189\u003c/strong\u003e, 435\u0026ndash;449.\u003c/li\u003e\n\u003cli\u003eBright, J.M., Killinger, S., Lingfors, D., Engerer, N.A. (2017) Integration of distributed solar forecasting with distribution network operations in Australia. \u003cem\u003eISES Sol. World Congr. 2015, Abu Dhabi, United Arab Emirates, Oct. 29-Novemb. 2\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eInness, A., Ades, M., Agust\u0026iacute;-Panareda, A., Barr\u0026eacute;, J., Benedictow, A., Blechschmidt, A.-M., Dominguez, J.J., Engelen, R., Eskes, H., Flemming, J., others (2019) The CAMS reanalysis of atmospheric composition. \u003cem\u003eAtmos. Chem. Phys.\u003c/em\u003e, \u003cstrong\u003e19\u003c/strong\u003e (6), 3515\u0026ndash;3556.\u003c/li\u003e\n\u003cli\u003eHerzmann, D., Arritt, R., Todey, D. (2004) Iowa environmental mesonet. \u003cem\u003eAvailable at mesonet. agron. iastate. edu/request/coop/fe. phtml (verified 27 Sept. 2005). Iowa State Univ., Dep. 
of Agron., Ames, IA\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eChan, P.W. (2016) A test of visibility sensors at Hong Kong International Airport. \u003cem\u003eWeather\u003c/em\u003e, \u003cstrong\u003e71\u003c/strong\u003e (10), 241\u0026ndash;246.\u003c/li\u003e\n\u003cli\u003eBilbao, J., Rom\u0026aacute;n, R., Yousif, C., Mateos, D., Miguel, A. de (2014) Total ozone column, water vapour and aerosol effects on erythemal and global solar irradiance in Marsaxlokk, Malta. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e99\u003c/strong\u003e, 508\u0026ndash;518.\u003c/li\u003e\n\u003cli\u003eBright, J.M., Sun, X., Gueymard, C.A., Acord, B., Wang, P., Engerer, N.A. (2020) Bright-Sun: A globally applicable 1-min irradiance clear-sky detection model. \u003cem\u003eRenewable and Sustainable Energy Reviews\u003c/em\u003e, \u003cstrong\u003e121\u003c/strong\u003e, 109706.\u003c/li\u003e\n\u003cli\u003eAlia-Martinez, M., Antonanzas, J., Urraca, R., Martinez-de-Pison, F.J., Antonanzas-Torres, F. (2016) Benchmark of algorithms for solar clear-sky detection. \u003cem\u003eJournal of Renewable and Sustainable Energy\u003c/em\u003e, \u003cstrong\u003e8\u003c/strong\u003e (3).\u003c/li\u003e\n\u003cli\u003eEllis, B.H., Deceglie, M., Jain, A. (2019) Automatic Detection of Clear-Sky Periods From Irradiance Data. \u003cem\u003eIEEE J. Photovoltaics\u003c/em\u003e, \u003cstrong\u003e9\u003c/strong\u003e (4), 998\u0026ndash;1005.\u003c/li\u003e\n\u003cli\u003eChen, T. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System, in \u003cem\u003eProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\u003c/em\u003e. Association for Computing Machinery, New York, NY, USA, pp. 785\u0026ndash;794.\u003c/li\u003e\n\u003cli\u003eShi, Y., Ke, G., Chen, Z., Zheng, S., Liu, T.-Y. 
(2022) Quantized Training of Gradient Boosting Decision Trees, in \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e (eds S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh). Curran Associates, Inc, pp. 18822\u0026ndash;18833.\u003c/li\u003e\n\u003cli\u003eProkhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A. (2018) CatBoost: unbiased boosting with categorical features. \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e, \u003cstrong\u003e31\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eSahin, E.K. (2022) Comparative analysis of gradient boosting algorithms for landslide susceptibility mapping. \u003cem\u003eGeocarto International\u003c/em\u003e, \u003cstrong\u003e37\u003c/strong\u003e (9), 2441\u0026ndash;2465.\u003c/li\u003e\n\u003cli\u003eScornet, E. (2016) Random forests and kernel methods. \u003cem\u003eIEEE Transactions on Information Theory\u003c/em\u003e, \u003cstrong\u003e62\u003c/strong\u003e (3), 1485\u0026ndash;1500.\u003c/li\u003e\n\u003cli\u003eWehenkel, L., Ernst, D., Geurts, P. (2006) Ensembles of extremely randomized trees and some generic applications, in \u003cem\u003eRobust methods for power system state estimation and load forecasting\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003ePaszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S. (2019) PyTorch: An Imperative Style, High-Performance Deep Learning Library. \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e, \u003cstrong\u003e32\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eOvalle-Magallanes, E., Alvarado-Carrillo, D.E., Avina-Cervantes, J.G., Cruz-Aceves, I., Ruiz-Pinales, J. 
(2023) Quantum angle encoding with learnable rotation applied to quantum\u0026ndash;classical convolutional neural networks. \u003cem\u003eApplied Soft Computing\u003c/em\u003e, \u003cstrong\u003e141\u003c/strong\u003e, 110307.\u003c/li\u003e\n\u003cli\u003e\u003cem\u003eManual on Codes, International Codes, vol. I. 1 (Annex II to WMO Technical Regulations), part A, Alphanumeric Code\u003c/em\u003e\u003cem\u003es\u003c/em\u003e (1995).\u003c/li\u003e\n\u003cli\u003eZhang, Z.Y., Wong, M.S., Lee, K.H. (2016) Evaluation of the representativeness of ground-based visibility for analysing the spatial and temporal variability of aerosol optical thickness in China. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e147\u003c/strong\u003e, 31\u0026ndash;45.\u003c/li\u003e\n\u003cli\u003eYuFeng Yang and Ting Li (2018) Study on the relationship between PM2.5 concentration and visibility in Beijing based on light scattering theory. Fourth Seminar on Novel Optoelectronic Detection Technology and Application. SPIE, pp. 165\u0026ndash;171.\u003c/li\u003e\n\u003cli\u003eJanicka, L., Stachlewska, I.S., Veselovskii, I., Baars, H. (2017) Temporal variations in optical and microphysical properties of mineral dust and biomass burning aerosol derived from daytime Raman lidar observations over Warsaw, Poland. \u003cem\u003eAtmospheric Environment\u003c/em\u003e, \u003cstrong\u003e169\u003c/strong\u003e, 162\u0026ndash;174.\u003c/li\u003e\n\u003cli\u003eGuo, B., Wang, Y., Zhang, X., Che, H., Zhong, J., Chu, Y., Cheng, L. (2020) Temporal and spatial variations of haze and fog and the characteristics of PM2.5 during heavy pollution episodes in China from 2013 to 2018. \u003cem\u003eAtmospheric Pollution Research\u003c/em\u003e, \u003cstrong\u003e11\u003c/strong\u003e (10), 1847\u0026ndash;1856.\u003c/li\u003e\n\u003cli\u003eWang, X., Zhang, R., Yu, W. (2019) The Effects of PM 2.5 Concentrations and Relative Humidity on Atmospheric Visibility in Beijing. 
\u003cem\u003eJGR Atmospheres\u003c/em\u003e, \u003cstrong\u003e124\u003c/strong\u003e (4), 2235\u0026ndash;2259.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":true,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Satellite-estimated solar irradiance, aerosol, classical and quantum learning, CAMS, McClear, METAR","lastPublishedDoi":"10.21203/rs.3.rs-7820256/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7820256/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAccurate solar irradiance forecasts are vital for photovoltaic (PV) power prediction, especially in tropical and subtropical regions affected by dust, wildfire smoke, and pollution. Yet, aerosol detection from satellites is often obstructed by clouds, AErosol RObotic NETwork (AERONET) stations are sparsely distributed, and climatological datasets cannot capture intra-day variability. 
Global products such as the Copernicus Atmosphere Monitoring Service (CAMS) provide broad coverage but miss local events due to coarse resolution and uncertainties in the underlying emission database. In this study, atmospheric parameters from METeorological Aerodrome Report (METAR) observations and CAMS reanalysis are used as inputs to data-driven models trained on normalized pseudo global horizontal clear sky irradiance (\u003cem\u003eGHI*\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e) targets. Models tested include gradient boosting methods, Random Forests, neural networks, and a quantum variational circuit. The predicted global horizontal clear sky irradiance (\u003cem\u003eGHI\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e) is then used in the Heliosat-3 method, which uses satellite-derived cloud index (CI) to estimate the all-sky global horizontal irradiance (GHI), for benchmarking against the all-sky GHI output of Heliosat-3 coupled with \u003cem\u003eGHI\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e from the physics-based McClear model. Results show the largest root mean squared error (RMSE) reductions of 3–7% under visibility of 6–8 km, with Neural Network and eXtreme Gradient Boosting (XGBoost) achieving the highest overall gain (2.6%). During dust and sand events, performance improves substantially, with Light Gradient-Boosting Machine (LightGBM) achieving a 22% reduction. 
These findings demonstrate the value of \u003cem\u003eGHI*\u003c/em\u003e\u003csub\u003e\u003cem\u003eCS\u003c/em\u003e\u003c/sub\u003e based machine learning approach for improving solar irradiance estimates in aerosol-rich environments.\u003c/p\u003e","manuscriptTitle":"Data-driven Combination of METAR Observations and CAMS Reanalysis Aerosols to Enhance Satellite Retrieval of Surface Solar Irradiance","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-28 16:25:26","doi":"10.21203/rs.3.rs-7820256/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-11-07T08:36:11+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-06T09:35:26+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-05T22:32:52+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-03T18:48:21+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-29T10:48:01+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"235246532426375589853920673557935352713","date":"2025-10-15T13:21:42+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"154816178473056839258022461873943682758","date":"2025-10-15T13:01:37+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"146611018129675915142986669013823347803","date":"2025-10-15T08:05:51+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"277591926484368610067645019911099316877","date":"2025-10-14T17:14:29+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"223850546632729681139087207635675924226","date":"2025-10-14T12:32:45+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"86444092605796872017368310028981353519","date":"2025-10-14T12:00:24+00:00","index":"hide","fulltext":""},{"type":"reviewersI
nvited","content":"","date":"2025-10-14T11:48:03+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-10-14T09:53:41+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-10-14T01:41:58+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-10-14T01:41:02+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-10-09T18:02:03+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"2dc2a54d-2552-42da-9123-e0ea9636feff","owner":[],"postedDate":"October 28th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":56832040,"name":"Earth and environmental sciences/Climate sciences"},{"id":56832041,"name":"Earth and environmental sciences/Environmental sciences"}],"tags":[],"updatedAt":"2026-02-23T16:03:04+00:00","versionOfRecord":{"articleIdentity":"rs-7820256","link":"https://doi.org/10.1038/s41598-026-39971-w","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-02-16 15:59:25","publishedOnDateReadable":"February 16th, 2026"},"versionCreatedAt":"2025-10-28 
16:25:26","video":"","vorDoi":"10.1038/s41598-026-39971-w","vorDoiUrl":"https://doi.org/10.1038/s41598-026-39971-w","workflowStages":[]},"version":"v1","identity":"rs-7820256","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7820256","identity":"rs-7820256","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
