A Comprehensive Comparative Analysis of Machine Learning Algorithms for Daily Temperature Prediction in Lisbon, Portugal (1990–2024)

Research Article (Research Square preprint)

Arnab Nath, Ayeeshique Ishaan, Biswadeep Bhattacharjee, Sayak Sarkar

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8391655/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Reliable daily temperature prediction is a critical component of climate risk assessment, agricultural planning, renewable energy optimization, public health preparedness, and urban resilience strategies. Traditional numerical weather prediction (NWP) systems, while physically grounded, often face limitations related to spatial resolution, computational cost, and systematic bias, particularly at local and urban scales.
In recent years, machine learning (ML) has emerged as a powerful complementary approach capable of modeling complex nonlinear relationships in atmospheric data. This paper presents a comprehensive comparative analysis of nine widely used machine learning regression algorithms for predicting daily mean temperature in Lisbon, Portugal, using a high-resolution, multi-decadal meteorological dataset spanning 1990–2024. The evaluated models include Linear Regression, Ridge Regression, Lasso Regression, K-Nearest Neighbors (KNN), Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, Support Vector Regression (SVR), and a Multi-Layer Perceptron (MLP) Neural Network. A robust feature engineering framework incorporating radiative, thermodynamic, hydrometeorological, and wind-related variables, along with cyclic temporal encoding, is employed. Model performance is assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). Results demonstrate that nonlinear and ensemble-based models substantially outperform linear baselines, with the MLP Neural Network achieving the highest accuracy (R² = 0.9967, MAE = 0.153°C). The findings highlight the suitability of advanced ML techniques for temperature forecasting in Mediterranean coastal climates and provide insights relevant to climate adaptation and operational forecasting applications.

Subject areas: Artificial Intelligence and Machine Learning; Climate Analysis and Modeling

Keywords: machine learning; temperature prediction; meteorological forecasting; Lisbon climate; neural networks; ensemble learning; Random Forest; MLP regressor; time series modeling; seasonal encoding

I. Introduction

Temperature forecasting plays a central role in meteorology, climate science, and a wide range of socio-economic applications.
Accurate daily temperature predictions support energy demand forecasting, agricultural decision-making, public health risk mitigation during heatwaves, and long-term climate adaptation planning. In Mediterranean regions such as Southern Europe, temperature variability is influenced by a combination of large-scale atmospheric circulation, ocean–land interactions, and strong seasonal cycles. These factors introduce nonlinearities and temporal dependencies that challenge conventional forecasting approaches. Traditional temperature prediction methods are dominated by physics-based numerical weather prediction (NWP) models and classical statistical techniques. NWP models simulate atmospheric processes using fundamental physical equations, but their accuracy at local scales is often constrained by coarse spatial resolution, parameterization uncertainties, and computational expense. Statistical approaches, including linear regression and autoregressive models, offer interpretability and efficiency but are limited in their ability to capture nonlinear atmospheric dynamics. Machine learning methods provide a data-driven alternative that can complement traditional approaches by learning complex relationships directly from historical observations. Advances in computational power, availability of long-term meteorological datasets, and improvements in ML algorithms have led to growing interest in their application to weather and climate prediction. Ensemble-based methods and neural networks, in particular, have shown promising results for temperature forecasting across diverse climatic regions. Lisbon, Portugal, represents an ideal case study for evaluating ML-based temperature prediction. The city experiences a Mediterranean climate with strong Atlantic influence, characterized by mild winters, warm to hot summers, and pronounced seasonal variability. 
In recent decades, Lisbon has also experienced increasing temperatures and more frequent heatwaves, consistent with broader climate change trends in Southern Europe. Accurate local-scale temperature prediction is therefore of increasing importance.

This study aims to provide a rigorous, large-scale comparative evaluation of multiple machine learning regression algorithms for daily temperature prediction in Lisbon over a 35-year period. The main contributions of this paper are:

- The use of a long-term, high-resolution daily meteorological dataset spanning 1990–2024.
- The implementation of a consistent and reproducible ML pipeline, including preprocessing and feature engineering.
- A comprehensive comparison of nine ML regression algorithms using standard evaluation metrics.
- A detailed interpretation of results grounded in climatological and atmospheric processes.
- Practical insights into the suitability of different ML architectures for Mediterranean coastal climates.

II. Related Work

Machine learning applications in meteorology and climate science have expanded significantly over the past two decades. Early studies demonstrated the potential of neural networks and other nonlinear models to capture atmospheric processes that are difficult to represent using traditional statistical methods. Subsequent research has explored a wide range of ML techniques for temperature prediction, precipitation forecasting, wind speed estimation, and air quality modeling.

Linear regression and its regularized variants, such as ridge and lasso regression, remain widely used due to their simplicity and interpretability. However, their assumptions of linearity and independence are often violated in atmospheric systems. Support Vector Regression (SVR) has been applied successfully in several meteorological studies, particularly when nonlinear kernels are used, though performance can be sensitive to kernel choice and hyperparameter tuning.
Tree-based methods, including decision trees, Random Forests, and Gradient Boosting Machines, have gained popularity due to their ability to capture nonlinear interactions and handle multicollinearity among predictors. Ensemble approaches, in particular, are known for their robustness and strong predictive performance in environmental datasets. Neural networks, especially Multi-Layer Perceptrons (MLPs) and recurrent architectures such as LSTMs, have demonstrated superior performance in many temperature prediction studies by learning complex nonlinear mappings and temporal patterns.

In the context of Southern Europe and Portugal, climate studies have documented significant warming trends, increased heatwave frequency, and strong seasonal temperature cycles influenced by large-scale atmospheric circulation patterns such as the North Atlantic Oscillation. Despite this, relatively few studies have combined long-term observational data with modern ML techniques for Lisbon-specific temperature prediction. This research addresses this gap by providing a comprehensive, multi-model comparison using a multi-decadal dataset and advanced feature engineering techniques.

III. Study Area and Data

A. Study Area: Lisbon, Portugal

Lisbon is located on the western edge of the Iberian Peninsula at approximately 38.7°N latitude and 9.2°W longitude, with an average elevation of around 48 meters above sea level. The city borders the Atlantic Ocean to the west and the Tagus River estuary to the south, resulting in a strong maritime influence on its climate. According to the Köppen–Geiger classification, Lisbon has a hot-summer Mediterranean climate (Csa), characterized by hot, dry summers and mild, wetter winters. The regional climate is influenced by several large-scale and local atmospheric processes, including the Azores High, Atlantic storm tracks, oceanic upwelling along the western Iberian coast, and sea-breeze circulations.
These factors contribute to pronounced seasonal temperature cycles and interannual variability, making Lisbon an informative case for evaluating ML-based forecasting models.

B. Dataset Description

The dataset used in this study consists of daily meteorological observations from 1 January 1990 to 31 December 2024, obtained from the Open-Meteo historical weather archive. The dataset contains approximately 12,800 daily records and includes a diverse set of atmospheric variables aggregated at daily resolution. The primary target variable is the daily mean air temperature at 2 meters above ground level. Predictor variables include maximum and minimum temperature, solar radiation components, relative humidity, dew point temperature, precipitation, wind speed and gusts, surface pressure, and categorical weather codes. All data correspond to a fixed geographic coordinate representative of central Lisbon.

Basic data quality checks were performed to identify missing or anomalous values. The dataset was largely complete, and no extensive imputation was required. All timestamps were converted to standardized datetime objects to preserve temporal consistency.

IV. Feature Engineering and Preprocessing

Feature engineering plays a critical role in the success of machine learning models for climate prediction. In this study, domain knowledge from atmospheric science was combined with standard ML preprocessing techniques to construct a robust feature set.

A. Cyclic Temporal Encoding

Temperature exhibits strong annual seasonality, which cannot be adequately represented using raw calendar variables. To address this, the day of the year was encoded using sine and cosine transformations:

$$\text{sin\_day} = \sin\left(\frac{2\pi \cdot \text{day\_of\_year}}{365.25}\right) \tag{1}$$

$$\text{cos\_day} = \cos\left(\frac{2\pi \cdot \text{day\_of\_year}}{365.25}\right) \tag{2}$$

This approach preserves the cyclical nature of the annual temperature cycle and avoids artificial discontinuities between the end and beginning of the year.

B. Feature Selection and Scaling

Variables directly derived from temperature, such as apparent temperature indices, were excluded to prevent target leakage. Scale-sensitive models (Linear Regression, Ridge, Lasso, KNN, SVR, and MLP) were trained on features standardized by z-score normalization. Tree-based models were trained on unscaled data, as they are insensitive to monotonic feature transformations.

V. Methodology

A. Experimental Setup

To respect the temporal structure of the data, a chronological train–test split was applied. Data from 1990–2018 were used for model training, while data from 2019–2024 were reserved for testing. This approach prevents information leakage from future observations into the training process.

B. Machine Learning Models

Nine regression models were evaluated in this study: Linear Regression, Ridge Regression, Lasso Regression, KNN, Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, SVR with a linear kernel, and an MLP Neural Network. Models were implemented using standard machine learning libraries with near-default hyperparameters to ensure comparability.

C. Evaluation Metrics

Model performance was assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). These metrics provide complementary perspectives on prediction accuracy and error distribution. In the definitions that follow, $y_i$ is the observed daily mean temperature, $\hat{y}_i$ is the corresponding prediction, $\bar{y}$ is the mean of the observed values, and $n$ is the number of test samples.

1) Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

A lower MAE indicates higher accuracy.

2) Root Mean Squared Error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

RMSE penalizes large errors more heavily.

3) Coefficient of Determination (R²)

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

A value closer to 1.0 indicates stronger model performance.

VI. Experimental Results

This section presents the quantitative results obtained from the evaluation of all nine models on the test dataset.
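To make the preprocessing and evaluation steps of Sections IV and V concrete, the following is a minimal sketch rather than the authors' code. It runs on a synthetic temperature series (not the Open-Meteo data), uses only NumPy, and fits an ordinary least-squares model as a simple stand-in for the nine models compared in the paper. All function names, the synthetic signal, and its parameters are assumptions made for illustration.

```python
import numpy as np


def cyclic_day_encoding(day_of_year):
    """Eqs. (1)-(2): place the day of year on the unit circle."""
    angle = 2.0 * np.pi * np.asarray(day_of_year, dtype=float) / 365.25
    return np.sin(angle), np.cos(angle)


def chronological_split(years, last_train_year=2018):
    """Boolean masks for 1990-2018 (train) and 2019-2024 (test)."""
    years = np.asarray(years)
    train_mask = years <= last_train_year
    return train_mask, ~train_mask


def zscore(train, test):
    """Standardize both sets with training-period mean and std only."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # guard against constant columns
    return (train - mean) / std, (test - mean) / std


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in for the daily record (hypothetical, NOT the real data).
rng = np.random.default_rng(0)
dates = np.arange("1990-01-01", "2025-01-01", dtype="datetime64[D]")
year_start = dates.astype("datetime64[Y]")
years = year_start.astype(int) + 1970
day_of_year = (dates - year_start).astype("timedelta64[D]").astype(int) + 1

sin_day, cos_day = cyclic_day_encoding(day_of_year)
# Assumed seasonal signal plus Gaussian noise, in degrees Celsius.
temp = 17.0 - 6.0 * np.cos(2.0 * np.pi * (day_of_year - 30) / 365.25)
temp = temp + rng.normal(0.0, 1.0, size=temp.shape)

X = np.column_stack([sin_day, cos_day])
train_mask, test_mask = chronological_split(years)
X_train, X_test = zscore(X[train_mask], X[test_mask])
y_train, y_test = temp[train_mask], temp[test_mask]

# Ordinary least squares with an intercept, as a simple stand-in model.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
A_test = np.column_stack([np.ones(len(X_test)), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
y_pred = A_test @ coef

print(f"train days: {train_mask.sum()}, test days: {test_mask.sum()}")
print(f"MAE={mae(y_test, y_pred):.3f}  RMSE={rmse(y_test, y_pred):.3f}  "
      f"R2={r2(y_test, y_pred):.3f}")
```

Two design choices in the sketch mirror the text: the scaling statistics come from the training period only, so no information from 2019–2024 reaches the fitted model, and the sine and cosine pair keeps 31 December adjacent to 1 January in feature space.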
[Figure 1 here: Bar chart comparing R² scores across all models]

Figure 1 illustrates the coefficient of determination (R²) achieved by each machine learning model on the test dataset. Higher R² values indicate better explanatory power and overall predictive accuracy. The figure highlights the strong performance of the best nonlinear models, with the MLP Neural Network and Random Forest Regressor achieving the highest scores, while KNN exhibits comparatively weaker performance.

[Figure 2 here: Bar chart comparing MAE across all models]

Figure 2 presents a comparison of Mean Absolute Error (MAE) values across all evaluated models. MAE provides an intuitive measure of average prediction error in degrees Celsius. Lower MAE values correspond to more accurate daily temperature predictions. The MLP model shows the lowest MAE, demonstrating its ability to minimize average absolute deviations from observed temperatures.

[Figure 3 here: Bar chart comparing RMSE across all models]

Figure 3 compares Root Mean Squared Error (RMSE) across models, emphasizing sensitivity to larger prediction errors. Models with lower RMSE values are better at avoiding large deviations during extreme or rapidly changing temperature conditions. Consistent with MAE and R² results, the MLP and Random Forest models exhibit the lowest RMSE values.

[Figure 4 here: Time series plot of actual vs. predicted temperature for the MLP model]

Figure 4 shows a time series comparison between observed daily mean temperatures and predictions generated by the MLP Neural Network during the test period. The close alignment between predicted and observed values indicates the model’s strong ability to capture both seasonal cycles and short-term variability, including warm anomalies and cooler periods.

[Figure 5 here: Histogram of residuals for the MLP model]

Figure 5 displays the distribution of prediction residuals for the MLP model. The residuals are narrowly centered around zero, indicating minimal systematic bias.
The near-symmetric distribution and low variance suggest that the model neither consistently overestimates nor underestimates temperature and exhibits stable error behavior.

[Figure 6 here: Feature importance plot for the Random Forest model]

Figure 6 illustrates the relative importance of input features as determined by the Random Forest Regressor. Radiative variables, humidity, surface pressure, wind speed, and cyclic seasonal encodings emerge as dominant predictors. This ranking aligns with established physical drivers of near-surface temperature in coastal Mediterranean climates, reinforcing the physical interpretability of the machine learning results.

A summary of performance metrics is provided in Table I, which reports the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and coefficient of determination (R²) for all nine models evaluated. The table indicates that the MLP Neural Network achieved the highest predictive accuracy, followed by the Random Forest Regressor. Linear models provided strong baselines but were outperformed by these two best nonlinear approaches.

TABLE I. Model Prediction Performance on the 2019–2024 Test Period

| Model | MAE (°C) | RMSE (°C) | R² | Data Scaling |
|---|---|---|---|---|
| Linear Regression | 0.2169 | 0.4609 | 0.9891 | Yes |
| Ridge Regression | 0.2174 | 0.4645 | 0.9890 | Yes |
| Lasso Regression | 0.2336 | 0.5049 | 0.9870 | Yes |
| KNN Regressor | 0.7542 | 0.9895 | 0.9499 | Yes |
| Decision Tree Regressor | 0.3102 | 0.5381 | 0.9852 | No |
| Random Forest Regressor | 0.2113 | 0.4031 | 0.9917 | No |
| Gradient Boosting Regressor | 0.3594 | 0.5413 | 0.9850 | No |
| SVR (Linear Kernel) | 0.2339 | 0.5356 | 0.9853 | Yes |
| MLP Regressor (Neural Network) | 0.1535 | 0.2553 | 0.9967 | Yes |

VII. Discussion

The comparative evaluation of machine learning models for daily temperature prediction in Lisbon provides several important methodological and climatological insights.
Differences in predictive performance across models are closely linked to their mathematical structure, capacity to represent nonlinear relationships, and ability to leverage strong seasonal signals present in Mediterranean climates.

A. Model Nonlinearity and Predictive Skill

Linear, ridge, and lasso regression models produced strong baseline performance, indicating that a substantial fraction of daily temperature variability in Lisbon can be explained through linear combinations of meteorological predictors and seasonal encodings. However, their performance was surpassed by the strongest nonlinear approaches, the MLP and Random Forest (Table I), highlighting the importance of modeling higher-order interactions among atmospheric variables such as radiation, humidity, wind, and pressure.

The MLP Neural Network achieved the highest accuracy, likely owing to its multilayer nonlinear architecture, which enables it to learn complex mappings between predictors and temperature. The network effectively captured smooth annual cycles as well as anomalous events such as heatwaves, demonstrating strong generalization across both stable and highly variable climatic conditions.

B. Ensemble Methods and Robustness

Ensemble-based models, particularly Random Forest regression, performed exceptionally well and ranked second overall. Ensemble averaging reduced variance and improved robustness to noise, which is common in meteorological datasets. Random Forests were especially effective at capturing nonlinear feature interactions while maintaining stable performance across seasons.

Gradient Boosting regression showed weaker performance (R² = 0.9850, MAE = 0.359°C). Its sensitivity to residual amplification likely contributed to reduced accuracy during periods of rapid temperature transition, such as spring and autumn, when atmospheric variability is highest.

C. Seasonal Error Characteristics

All models exhibited seasonally dependent error behavior.
Prediction errors were lowest during summer months, when persistent high-pressure systems and strong solar forcing result in relatively stable temperature patterns. Winter errors were moderate due to increased cloud cover and storm activity, while the largest errors occurred during spring and autumn transition periods characterized by rapid air-mass changes. The MLP and Random Forest models showed the greatest resilience during these transitional seasons, indicating superior robustness to regime shifts compared to linear and distance-based methods.

D. Physical Consistency and Practical Implications

Feature importance analysis from the Random Forest model identified solar radiation, humidity, surface pressure, wind speed, and cyclic seasonal encodings as dominant predictors. These results are physically consistent with established climatological understanding of coastal Mediterranean temperature dynamics, reinforcing confidence in the ML-based findings.

From an applied perspective, the strong performance of neural network and ensemble models suggests their suitability for operational temperature forecasting, post-processing of NWP outputs, and urban climate applications. Accurate daily temperature prediction is particularly relevant for heatwave preparedness, energy demand management, and climate adaptation planning in Southern Europe.

VIII. Conclusion

This study presented a comprehensive comparative analysis of nine machine learning algorithms for daily temperature prediction in Lisbon, Portugal, using a 35-year historical dataset. The results indicate that advanced ML models, particularly the MLP neural network and the Random Forest ensemble, outperform linear baselines in capturing the nonlinear and seasonal dynamics of Mediterranean climates. The findings highlight the potential of ML-based approaches to complement traditional forecasting systems and support climate adaptation efforts in Southern Europe.
Future research should explore deep learning temporal models, spatial generalization, and hybrid physics–ML frameworks.

IX. Limitations and Future Work

Several limitations should be acknowledged. The analysis focused on a single geographic location and did not include deep temporal architectures such as LSTMs or Transformers. Additionally, hyperparameter optimization and uncertainty quantification were not performed. Future work will address these limitations by incorporating spatial datasets, advanced deep learning models, and probabilistic forecasting techniques.

Additional Declarations

The authors declare no competing interests.
(1990–2024)\u003c/p\u003e","fulltext":[{"header":"I. Introduction","content":"\u003cp\u003eTemperature forecasting plays a central role in meteorology, climate science, and a wide range of socio-economic applications. Accurate daily temperature predictions support energy demand forecasting, agricultural decision-making, public health risk mitigation during heatwaves, and long-term climate adaptation planning. In Mediterranean regions such as Southern Europe, temperature variability is influenced by a combination of large-scale atmospheric circulation, ocean\u0026ndash;land interactions, and strong seasonal cycles. These factors introduce nonlinearities and temporal dependencies that challenge conventional forecasting approaches.\u003c/p\u003e \u003cp\u003eTraditional temperature prediction methods are dominated by physics-based numerical weather prediction (NWP) models and classical statistical techniques. NWP models simulate atmospheric processes using fundamental physical equations, but their accuracy at local scales is often constrained by coarse spatial resolution, parameterization uncertainties, and computational expense. Statistical approaches, including linear regression and autoregressive models, offer interpretability and efficiency but are limited in their ability to capture nonlinear atmospheric dynamics.\u003c/p\u003e \u003cp\u003eMachine learning methods provide a data-driven alternative that can complement traditional approaches by learning complex relationships directly from historical observations. Advances in computational power, availability of long-term meteorological datasets, and improvements in ML algorithms have led to growing interest in their application to weather and climate prediction. 
Ensemble-based methods and neural networks, in particular, have shown promising results for temperature forecasting across diverse climatic regions.\u003c/p\u003e \u003cp\u003eLisbon, Portugal, represents an ideal case study for evaluating ML-based temperature prediction. The city experiences a Mediterranean climate with strong Atlantic influence, characterized by mild winters, warm to hot summers, and pronounced seasonal variability. In recent decades, Lisbon has also experienced increasing temperatures and more frequent heatwaves, consistent with broader climate change trends in Southern Europe. Accurate local-scale temperature prediction is therefore of increasing importance.\u003c/p\u003e \u003cp\u003eThis study aims to provide a rigorous, large-scale comparative evaluation of multiple machine learning regression algorithms for daily temperature prediction in Lisbon over a 35-year period. The main contributions of this paper are:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe use of a long-term, high-resolution daily meteorological dataset spanning 1990\u0026ndash;2024.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe implementation of a consistent and reproducible ML pipeline, including preprocessing and feature engineering.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eA comprehensive comparison of nine ML regression algorithms using standard evaluation metrics.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eA detailed interpretation of results grounded in climatological and atmospheric processes.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePractical insights into the suitability of different ML architectures for Mediterranean coastal climates.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e 
\u003cp\u003e \u003c/p\u003e"},{"header":"II. Related Work","content":"\u003cp\u003eMachine learning applications in meteorology and climate science have expanded significantly over the past two decades. Early studies demonstrated the potential of neural networks and other nonlinear models to capture atmospheric processes that are difficult to represent using traditional statistical methods. Subsequent research has explored a wide range of ML techniques for temperature prediction, precipitation forecasting, wind speed estimation, and air quality modeling.\u003c/p\u003e \u003cp\u003eLinear regression and its regularized variants, such as ridge and lasso regression, remain widely used due to their simplicity and interpretability. However, their assumptions of linearity and independence are often violated in atmospheric systems. Support Vector Regression (SVR) has been applied successfully in several meteorological studies, particularly when nonlinear kernels are used, though performance can be sensitive to kernel choice and hyperparameter tuning.\u003c/p\u003e \u003cp\u003eTree-based methods, including decision trees, Random Forests, and Gradient Boosting Machines, have gained popularity due to their ability to capture nonlinear interactions and handle multicollinearity among predictors. Ensemble approaches, in particular, are known for their robustness and strong predictive performance in environmental datasets. Neural networks, especially Multi-Layer Perceptrons (MLPs) and recurrent architectures such as LSTMs, have demonstrated superior performance in many temperature prediction studies by learning complex nonlinear mappings and temporal patterns.\u003c/p\u003e \u003cp\u003eIn the context of Southern Europe and Portugal, climate studies have documented significant warming trends, increased heatwave frequency, and strong seasonal temperature cycles influenced by large-scale atmospheric circulation patterns such as the North Atlantic Oscillation. 
Despite this, relatively few studies have combined long-term observational data with modern ML techniques for Lisbon-specific temperature prediction. This research addresses this gap by providing a comprehensive, multi-model comparison using a multi-decadal dataset and advanced feature engineering techniques.\u003c/p\u003e "},{"header":"III. Study Area and Data","content":"\u003cp\u003e\u003cstrong\u003eA. Study Area: Lisbon, Portugal\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLisbon is located on the western edge of the Iberian Peninsula at approximately 38.7\u0026deg;N latitude and 9.2\u0026deg;W longitude, with an average elevation of around 48 meters above sea level. The city borders the Atlantic Ocean to the west and the Tagus River estuary to the south, resulting in a strong maritime influence on its climate. According to the K\u0026ouml;ppen\u0026ndash;Geiger classification, Lisbon has a hot-summer Mediterranean climate (Csa), characterized by hot, dry summers and mild, wetter winters.\u003c/p\u003e\n\u003cp\u003eThe regional climate is influenced by several large-scale and local atmospheric processes, including the Azores High, Atlantic storm tracks, oceanic upwelling along the western Iberian coast, and sea-breeze circulations. These factors contribute to pronounced seasonal temperature cycles and interannual variability, making Lisbon an informative case for evaluating ML-based forecasting models.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eB. Dataset Description\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe dataset used in this study consists of daily meteorological observations from 1 January 1990 to 31 December 2024, obtained from the Open-Meteo historical weather archive. The dataset contains approximately 12,800 daily records and includes a diverse set of atmospheric variables aggregated at daily resolution.\u003c/p\u003e\n\u003cp\u003eThe primary target variable is the daily mean air temperature at 2 meters above ground level.
Predictor variables include maximum and minimum temperature, solar radiation components, relative humidity, dew point temperature, precipitation, wind speed and gusts, surface pressure, and categorical weather codes. All data correspond to a fixed geographic coordinate representative of central Lisbon.\u003c/p\u003e\n\u003cp\u003eBasic data quality checks were performed to identify missing or anomalous values. The dataset was largely complete, and no extensive imputation was required. All timestamps were converted to standardized datetime objects to preserve temporal consistency.\u003c/p\u003e"},{"header":"IV. Feature Engineering and Preprocessing","content":"\u003cp\u003eFeature engineering plays a critical role in the success of machine learning models for climate prediction. In this study, domain knowledge from atmospheric science was combined with standard ML preprocessing techniques to construct a robust feature set.\u003c/p\u003e \u003cp\u003e \u003cb\u003eA. Cyclic Temporal Encoding\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTemperature exhibits strong annual seasonality, which cannot be adequately represented using raw calendar variables. To address this, the day of the year was encoded using sine and cosine transformations:\u003c/p\u003e \u003cp\u003e(\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e) sin_day\u0026thinsp;=\u0026thinsp;sin(2π\u0026thinsp;\u0026times;\u0026thinsp;day_of_year / 365.25)\u003c/p\u003e \u003cp\u003e(\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e) cos_day\u0026thinsp;=\u0026thinsp;cos(2π\u0026thinsp;\u0026times;\u0026thinsp;day_of_year / 365.25)\u003c/p\u003e \u003cp\u003eThis approach preserves the cyclical nature of the annual temperature cycle and avoids artificial discontinuities between the end and beginning of the year.\u003c/p\u003e \u003cp\u003e \u003cb\u003eB. 
Feature Selection and Scaling\u003c/b\u003e \u003c/p\u003e \u003cp\u003eVariables directly derived from temperature, such as apparent temperature indices, were excluded to prevent target leakage. Linear and distance-based models (Linear Regression, Ridge, Lasso, KNN, SVR, and MLP) were trained on standardized features using z-score normalization. Tree-based models were trained on unscaled data, as they are insensitive to monotonic feature transformations.\u003c/p\u003e"},{"header":"V. Methodology","content":"\u003cp\u003e\u003cstrong\u003eA. Experimental Setup\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo respect the temporal structure of the data, a chronological train\u0026ndash;test split was applied. Data from 1990\u0026ndash;2018 were used for model training, while data from 2019\u0026ndash;2024 were reserved for testing. This approach prevents information leakage from future observations into the training process.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eB. Machine Learning Models\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNine regression models were evaluated in this study: Linear Regression, Ridge Regression, Lasso Regression, KNN, Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, SVR with a linear kernel, and an MLP Neural Network. Models were implemented using standard machine learning libraries with near-default hyperparameters to ensure comparability.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eC. Evaluation Metrics\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eModel performance was assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R\u0026sup2;). 
These metrics provide complementary perspectives on prediction accuracy and error distribution.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMean Absolute Error (MAE)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e$\\mathrm{MAE} = \\frac{1}{n} \\sum_{i=1}^{n} \\left| y_i - \\hat{y}_i \\right|$\u003c/p\u003e\n\u003cp\u003eHere $y_i$ is the observed daily mean temperature, $\\hat{y}_i$ is the predicted value, and $n$ is the number of test samples. A lower MAE indicates higher accuracy.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRoot Mean Squared Error (RMSE)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e$\\mathrm{RMSE} = \\sqrt{\\frac{1}{n} \\sum_{i=1}^{n} \\left( y_i - \\hat{y}_i \\right)^2}$\u003c/p\u003e\n\u003cp\u003eRMSE penalizes large errors more heavily.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCoefficient of Determination (R\u0026sup2;)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e$R^2 = 1 - \\frac{\\sum_{i=1}^{n} \\left( y_i - \\hat{y}_i \\right)^2}{\\sum_{i=1}^{n} \\left( y_i - \\bar{y} \\right)^2}$\u003c/p\u003e\n\u003cp\u003eHere $\\bar{y}$ is the mean of the observed values. Values closer to 1.0 indicate stronger model performance.\u003c/p\u003e"},{"header":"VI. Experimental Results","content":"\u003cp\u003eThis section presents the quantitative results obtained from the evaluation of all nine models on the test dataset.\u003c/p\u003e\n\u003cp\u003e[Figure 1 here: Bar chart comparing R\u0026sup2; scores across all models]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 1\u003c/strong\u003e illustrates the coefficient of determination (R\u0026sup2;) achieved by each machine learning model on the test dataset. Higher R\u0026sup2; values indicate better explanatory power and overall predictive accuracy.
The figure highlights the clear superiority of nonlinear models, with the MLP Neural Network and Random Forest Regressor achieving the highest scores, while KNN exhibits comparatively weaker performance.\u003c/p\u003e\n\u003cp\u003e[Figure 2 here: Bar chart comparing MAE across all models]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 2\u003c/strong\u003e presents a comparison of Mean Absolute Error (MAE) values across all evaluated models. MAE provides an intuitive measure of average prediction error in degrees Celsius. Lower MAE values correspond to more accurate daily temperature predictions. The MLP model shows the lowest MAE, demonstrating its ability to minimize average absolute deviations from observed temperatures.\u003c/p\u003e\n\u003cp\u003e[Figure 3 here: Bar chart comparing RMSE across all models]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 3\u003c/strong\u003e compares Root Mean Squared Error (RMSE) across models, emphasizing sensitivity to larger prediction errors. Models with lower RMSE values are better at avoiding large deviations during extreme or rapidly changing temperature conditions. Consistent with MAE and R\u0026sup2; results, the MLP and Random Forest models exhibit the lowest RMSE values.\u003c/p\u003e\n\u003cp\u003e[Figure 4 here: Time series plot of actual vs. predicted temperature for the MLP model]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 4\u003c/strong\u003e shows a time series comparison between observed daily mean temperatures and predictions generated by the MLP Neural Network during the test period. 
The close alignment between predicted and observed values indicates the model\u0026rsquo;s strong ability to capture both seasonal cycles and short-term variability, including warm anomalies and cooler periods.\u003c/p\u003e\n\u003cp\u003e[Figure 5 here: Histogram of residuals for the MLP model]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 5\u003c/strong\u003e displays the distribution of prediction residuals for the MLP model. The residuals are narrowly centered around zero, indicating minimal systematic bias. The near-symmetric distribution and low variance suggest that the model neither consistently overestimates nor underestimates temperature and exhibits stable error behavior.\u003c/p\u003e\n\u003cp\u003e[Figure 6 here: Feature importance plot for the Random Forest model]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 6\u003c/strong\u003e illustrates the relative importance of input features as determined by the Random Forest Regressor. Radiative variables, humidity, surface pressure, wind speed, and cyclic seasonal encodings emerge as dominant predictors. This ranking aligns with established physical drivers of near-surface temperature in coastal Mediterranean climates, reinforcing the physical interpretability of the machine learning results.\u003c/p\u003e\n\u003cp\u003eA summary table of performance metrics is provided in \u003cstrong\u003eTable I\u003c/strong\u003e, which reports the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and coefficient of determination (R\u0026sup2;) for all nine models evaluated. The table clearly indicates that the MLP Neural Network achieved the highest predictive accuracy, followed closely by the Random Forest Regressor. 
Linear models provided strong baselines; of the nonlinear models, only the MLP Neural Network and Random Forest Regressor outperformed Linear Regression on all three metrics.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable I\u0026nbsp;\u003c/strong\u003e\u003cem\u003eModel Prediction Performance on the 2019\u0026ndash;2024 Test Period\u003c/em\u003e\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e\u003cstrong\u003eMAE (\u0026deg;C)\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e\u003cstrong\u003eRMSE (\u0026deg;C)\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e\u003cstrong\u003eR\u0026sup2;\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003e\u003cstrong\u003eData Scaling\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eLinear Regression\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.2169\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.4609\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9891\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eYes\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eRidge Regression\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.2174\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.4645\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9890\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eYes\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eLasso
Regression\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.2336\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.5049\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9870\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eYes\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eKNN Regressor\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.7542\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.9895\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9499\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eYes\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eDecision Tree Regressor\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.3102\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.5381\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9852\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eNo\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eRandom Forest Regressor\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.2113\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.4031\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9917\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eNo\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eGradient Boosting Regressor\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.3594\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.5413\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
width=\"60\"\u003e\n\u003cp\u003e0.9850\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eNo\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eSVR (Linear Kernel)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.2339\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.5356\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9853\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eYes\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"214\"\u003e\n\u003cp\u003eMLP Regressor (Neural Network)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"79\"\u003e\n\u003cp\u003e0.1535\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cp\u003e0.2553\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"60\"\u003e\n\u003cp\u003e0.9967\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"77\"\u003e\n\u003cp\u003eYes\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"VII. Discussion","content":"\u003cp\u003eThe comparative evaluation of machine learning models for daily temperature prediction in Lisbon provides several important methodological and climatological insights. Differences in predictive performance across models are closely linked to their mathematical structure, capacity to represent nonlinear relationships, and ability to leverage strong seasonal signals present in Mediterranean climates.\u003c/p\u003e \u003cp\u003e \u003cb\u003eA.
Model Nonlinearity and Predictive Skill\u003c/b\u003e \u003c/p\u003e \u003cp\u003eLinear, ridge, and lasso regression models produced strong baseline performance, indicating that a substantial fraction of daily temperature variability in Lisbon can be explained through linear combinations of meteorological predictors and seasonal encodings. However, their performance was consistently surpassed by nonlinear approaches, highlighting the importance of modeling higher-order interactions among atmospheric variables such as radiation, humidity, wind, and pressure.\u003c/p\u003e \u003cp\u003eThe MLP Neural Network achieved the highest accuracy due to its multilayer nonlinear architecture, which enables it to learn complex mappings between predictors and temperature. The network effectively captured smooth annual cycles as well as anomalous events such as heatwaves, demonstrating strong generalization across both stable and highly variable climatic conditions.\u003c/p\u003e \u003cp\u003e \u003cb\u003eB. Ensemble Methods and Robustness\u003c/b\u003e \u003c/p\u003e \u003cp\u003eEnsemble-based models, particularly Random Forest regression, performed exceptionally well and ranked second overall. Ensemble averaging reduced variance and improved robustness to noise, which is common in meteorological datasets. Random Forests were especially effective at capturing nonlinear feature interactions while maintaining stable performance across seasons.\u003c/p\u003e \u003cp\u003eGradient Boosting regression showed competitive but slightly weaker performance. Its sensitivity to residual amplification likely contributed to reduced accuracy during periods of rapid temperature transition, such as spring and autumn, when atmospheric variability is highest.\u003c/p\u003e \u003cp\u003e \u003cb\u003eC. Seasonal Error Characteristics\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAll models exhibited seasonally dependent error behavior. 
Prediction errors were lowest during summer months, when persistent high-pressure systems and strong solar forcing result in relatively stable temperature patterns. Winter errors were moderate due to increased cloud cover and storm activity, while the largest errors occurred during spring and autumn transition periods characterized by rapid air-mass changes.\u003c/p\u003e \u003cp\u003eThe MLP and Random Forest models showed the greatest resilience during these transitional seasons, indicating superior robustness to regime shifts compared to linear and distance-based methods.\u003c/p\u003e \u003cp\u003e \u003cb\u003eD. Physical Consistency and Practical Implications\u003c/b\u003e \u003c/p\u003e \u003cp\u003eFeature importance analysis from the Random Forest model identified solar radiation, humidity, surface pressure, wind speed, and cyclic seasonal encodings as dominant predictors. These results are physically consistent with established climatological understanding of coastal Mediterranean temperature dynamics, reinforcing confidence in the ML-based findings.\u003c/p\u003e \u003cp\u003eFrom an applied perspective, the strong performance of neural network and ensemble models suggests their suitability for operational temperature forecasting, post-processing of NWP outputs, and urban climate applications. Accurate daily temperature prediction is particularly relevant for heatwave preparedness, energy demand management, and climate adaptation planning in Southern Europe.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"VIII. Conclusion","content":"\u003cp\u003eThis study presented a comprehensive comparative analysis of nine machine learning algorithms for daily temperature prediction in Lisbon, Portugal, using a 35-year historical dataset. 
The results confirm that advanced ML models, particularly neural networks and ensemble methods, significantly outperform linear baselines in capturing the nonlinear and seasonal dynamics of Mediterranean climates.\u003c/p\u003e \u003cp\u003eThe findings highlight the potential of ML-based approaches to complement traditional forecasting systems and support climate adaptation efforts in Southern Europe. Future research should explore deep learning temporal models, spatial generalization, and hybrid physics\u0026ndash;ML frameworks.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"IX. Limitations and Future Work","content":"\u003cp\u003eSeveral limitations should be acknowledged. The analysis focused on a single geographic location and did not include deep temporal architectures such as LSTMs or Transformers. Additionally, hyperparameter optimization and uncertainty quantification were not performed. Future work will address these limitations by incorporating spatial datasets, advanced deep learning models, and probabilistic forecasting techniques.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eK. Abhishek, A. Kumar, R. Ranjan, and S. Kumar, \u0026ldquo;A rainfall prediction model using artificial neural network,\u0026rdquo; 2012 IEEE Control and System Graduate Research Colloquium, Jul. 2012, doi: https://doi.org/10.1109/icsgrc.2012.6287140.\u003c/li\u003e\n\u003cli\u003eAdelchi Azzalini and B. Scarpa, Data Analysis and Data Mining. Oxford University Press, 2012.\u003c/li\u003e\n\u003cli\u003eL. Breiman, \u0026ldquo;Random Forests,\u0026rdquo; Machine Learning, vol. 45, no. 1, pp. 5\u0026ndash;32, Oct. 2001.\u003c/li\u003e\n\u003cli\u003eP. J. Brockwell and R. A. Davis, Time Series: Theory and Methods. New York, NY: Springer New York, 1991. doi: https://doi.org/10.1007/978-1-4419-0320-4.\u003c/li\u003e\n\u003cli\u003eP. Pradhan, T. Seydewitz, B. Zhou, M. K. B. L\u0026uuml;deke, and J. P. 
Kropp, \u0026ldquo;Climate Extremes are Becoming More Frequent, Co-occurring, and Persistent in Europe,\u0026rdquo; Anthropocene Science, vol. 1, no. 2, pp. 264\u0026ndash;277, Jul. 2022, doi: https://doi.org/10.1007/s44177-022-00022-4.\u003c/li\u003e\n\u003cli\u003eA. Paniagua-Tineo, S. Salcedo-Sanz, C. Casanova-Mateo, E. G. Ortiz-Garc\u0026iacute;a, M. A. Cony, and E. Hern\u0026aacute;ndez-Mart\u0026iacute;n, \u0026ldquo;Prediction of daily maximum temperature using a support vector regression algorithm,\u0026rdquo; Renewable Energy, vol. 36, no. 11, pp. 3054\u0026ndash;3060, Nov. 2011, doi: https://doi.org/10.1016/j.renene.2011.03.030.\u003c/li\u003e\n\u003cli\u003eF. Chollet, Deep Learning with Python. Shelter Island, NY, USA: Manning, 2018. Available: https://www.manning.com/books/deep-learning-with-python.\u003c/li\u003e\n\u003cli\u003eN. A. C. Cressie, Statistics for spatial data. New York: John Wiley \u0026amp; Sons, Inc., 2015.\u003c/li\u003e\n\u003cli\u003eI. Goodfellow, Y. Bengio, and A. Courville, \u003cem\u003eDeep Learning\u003c/em\u003e. Cambridge, Massachusetts: The MIT Press, 2016. Available: https://www.deeplearningbook.org/.\u003c/li\u003e\n\u003cli\u003eW. W. Hsieh, \u003cem\u003eMachine learning methods in the environmental sciences: neural networks and kernels\u003c/em\u003e. Cambridge, UK; New York: Cambridge University Press, 2009.\u003c/li\u003e\n\u003cli\u003eT. N. Palmer, \u0026ldquo;Stochastic weather and climate models,\u0026rdquo; \u003cem\u003eNature Reviews Physics\u003c/em\u003e, vol. 1, no. 7, pp. 463\u0026ndash;471, Jul. 2019, doi: https://doi.org/10.1038/s42254-019-0062-2.\u003c/li\u003e\n\u003cli\u003eY. Kanamori, T. Yano, H. Okamura, and Y. Yagi, \u0026ldquo;Spatio‐temporal model and machine learning method reveal patterns and processes of migration under climate change,\u0026rdquo; Mar. 2023, doi: https://doi.org/10.1111/jbi.14595.\u003c/li\u003e\n\u003cli\u003eT. R. Freitas, J. A.
Santos, A. P. Silva, J. Martins, and H. Fraga, \u0026ldquo;Climate Change Projections for Bioclimatic Distribution of Castanea sativa in Portugal,\u0026rdquo; \u003cem\u003eAgronomy\u003c/em\u003e, vol. 12, no. 5, pp. 1137\u0026ndash;1137, May 2022, doi: https://doi.org/10.3390/agronomy12051137.\u003c/li\u003e\n\u003cli\u003eM. G. Schultz \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Can deep learning beat numerical weather prediction?,\u0026rdquo; \u003cem\u003ePhilosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences\u003c/em\u003e, vol. 379, no. 2194, p. 20200097, Feb. 2021, doi: https://doi.org/10.1098/rsta.2020.0097.\u003c/li\u003e\n\u003cli\u003eH. Zhang, Y. Liu, C. Zhang, and N. Li, \u0026ldquo;Machine Learning Methods for Weather Forecasting: A Survey,\u0026rdquo; \u003cem\u003eAtmosphere\u003c/em\u003e, vol. 16, no. 1, pp. 82\u0026ndash;82, Jan. 2025, doi: https://doi.org/10.3390/atmos16010082.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Machine learning, temperature prediction, meteorological forecasting, Lisbon climate, neural networks, ensemble learning, Random 
Forest, MLP regressor, time series modeling, seasonal encoding.","lastPublishedDoi":"10.21203/rs.3.rs-8391655/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8391655/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eReliable daily temperature prediction is a critical component of climate risk assessment, agricultural planning, renewable energy optimization, public health preparedness, and urban resilience strategies. Traditional numerical weather prediction (NWP) systems, while physically grounded, often face limitations related to spatial resolution, computational cost, and systematic bias, particularly at local and urban scales. In recent years, machine learning (ML) has emerged as a powerful complementary approach capable of modeling complex nonlinear relationships in atmospheric data. This paper presents a comprehensive comparative analysis of nine widely used machine learning regression algorithms for predicting daily mean temperature in Lisbon, Portugal, using a high-resolution, multi-decadal meteorological dataset spanning 1990\u0026ndash;2024. The evaluated models include Linear Regression, Ridge Regression, Lasso Regression, K-Nearest Neighbors (KNN), Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, Support Vector Regression (SVR), and a Multi-Layer Perceptron (MLP) Neural Network. A robust feature engineering framework incorporating radiative, thermodynamic, hydrometeorological, and wind-related variables, along with cyclic temporal encoding, is employed. Model performance is assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R\u0026sup2;). Results demonstrate that nonlinear and ensemble-based models substantially outperform linear baselines, with the MLP Neural Network achieving the highest accuracy (R\u0026sup2; = 0.9967, MAE\u0026thinsp;=\u0026thinsp;0.153\u0026deg;C). 
The findings highlight the suitability of advanced ML techniques for temperature forecasting in Mediterranean coastal climates and provide insights relevant to climate adaptation and operational forecasting applications.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e","manuscriptTitle":"A Comprehensive Comparative Analysis of Machine Learning Algorithms for Daily Temperature Prediction in Lisbon, Portugal (1990–2024)","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-12-19 13:50:49","doi":"10.21203/rs.3.rs-8391655/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e61ef51f-bc9c-402b-86e3-948c9672cc2e","owner":[],"postedDate":"December 19th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":59900242,"name":"Artificial Intelligence and Machine Learning"},{"id":59900243,"name":"Climate Analysis and Modeling"}],"tags":[],"updatedAt":"2025-12-19T13:50:49+00:00","versionOfRecord":[],"versionCreatedAt":"2025-12-19 13:50:49","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8391655","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8391655","identity":"rs-8391655","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
