Fine-tuning a global weather model for superior subseasonal forecasting
Research Square
Vateanui SANSINE, Takeshi Izumo, Marania Hopuare, Damien Specq, and 1 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5619528/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1
Abstract
Accurate subseasonal forecasting is socio-economically critical yet remains a great scientific challenge. Recent advances in machine-learning based global weather forecasting demonstrate superior skill on medium-range (1 to 15 days ahead) and subseasonal-range (15 to 42 days ahead) than the best traditional weather forecasting system. These data-driven models require immense computational resources for training, which are not widely available.
Here we show, by using the medium-range Graphcast model as a pre-trained model and focusing on reducing iterative error accumulation, that fine-tuning is an efficient strategy to achieve impressive results for subseasonal forecasting. Our fine-tuned model GraphFT rapidly converges (trained on just three years of data), and significantly outperforms Graphcast and the leading deterministic traditional subseasonal forecasting system, even outperforming this system’s ensemble mean for key variables. This demonstrates the potential of fine-tuning to improve forecasts, possibly for both atmosphere and ocean, at low computational cost.
Earth and environmental sciences/Climate sciences/Atmospheric science/Atmospheric dynamics
Physical sciences/Mathematics and computing/Computer science
Introduction
Subseasonal forecasting is the process of predicting weather patterns that occur 2 to 6 weeks or more in the future 1 . It fills the gap between medium-range forecasting, which extends up to 15 days, and seasonal forecasting, which gives an outlook of the coming trends months ahead. A skillful subseasonal forecast is paramount across multiple societally relevant sectors 2 and for proactive disaster impact mitigation efforts, since these efforts may take several weeks to implement 3 . The difficulty of producing accurate subseasonal forecasts originates from the dampening and loss of atmospheric initial conditions beyond a sufficiently long lead time 4 , from iterative error accumulation, and from the slow changes in boundary conditions such as Sea Surface Temperatures (SSTs), soil moisture and sea-ice components. It is those different time and space scales of atmosphere, land and ocean, and the ability to predict them, that make subseasonal forecasting a major challenge 5,6,7 . This timescale has long been seen as a “predictability desert” 8 by meteorologists.
However, there are important potential sources of predictability for subseasonal timescales to be found in certain large-scale phenomena such as the Madden-Julian Oscillation (MJO) and the El Niño Southern Oscillation (ENSO). Recent advancements in machine learning models represent an alternative to the “traditional” dynamical models, based on fluid mechanics and thermodynamics equations, for weather forecasting at medium-range 9,10,11,12 , subseasonal-range 13 , and for long-term weather and climate forecasting 14 . The data-driven medium-range weather forecasting models only utilize machine-learning techniques and were trained using 40 years of historical data obtained from the European Center for Medium-Range Weather Forecasts (ECMWF) reanalysis v5 (ERA5) 15 . They achieved notable success in forecasting skillfully weather variables globally for up to 10-days ahead, while requiring only a fraction of the computational resources used by traditional dynamical models 9,11 . At longer lead-times, i.e. for subseasonal forecasts, machine learning models have also made significant improvements. However, there are some challenges and limitations 16,17 , notably arising from the incorporation of a limited amount of variables into the models and from the iterative error accumulation. One of the interesting novelty of the recent medium-range models 9,11,12 is that they have a comprehensive amount of meteorological variables compared to earlier studies with a limited number of variables. Indeed, FourCastNet 12 has 20 variables at 5 pressure levels, Pangu-Weather 11 has 5 upper-air atmospheric variables at 13 pressure levels with 4 surface variables and Graphcast 9 possesses 6 upper-air atmospheric variables at 13 pressure levels with 5 surface variables. However, the evaluation of such models has been limited to a 10-day lead-time. Yet, we expect their forecasting skills to rapidly decrease with lead-time, due to iterative error accumulation. 
Our first objective was to assess their accuracy for subseasonal forecasts and identify methods to enhance their precision. We have thus developed an efficient fine-tuning process, that leads to impressive improvements in forecasting skills, even at 4-week leads, while remaining computationally cost-effective. Fine-tuning in artificial intelligence (AI) is the procedure of adjusting a pre-trained model and adapt it to a new and specific task. When a pre-trained model is fine-tuned, it uses its knowledge acquired from the original task to learn the nuances of the task more efficiently. The primary hypothesis of this work is that by fine-tuning medium-range models, the long-term prediction errors can be decreased. Consequently, the obtained results can surpass those of their non-fine-tuned counterparts, with a much higher computing efficiency than if the model was developed and trained from the very beginning. The process of fine-tuning machine learning models for weather forecasting has already been applied to medium-range timescales, for example 3 pre-trained FuXi 18 models were fine-tuned for optimal forecast performance for one of the forecast time windows: 0–5 days, 5–10 days, and 10–15 days. Additionally, the Graphcast operational model has undergone fine-tuning using HRES data from 2016 to 2021. Nevertheless, it has not yet been accomplished for subseasonal timescales. In this study, we utilize Graphcast as a pre-trained global weather model and adjust it through fine-tuning to get significantly-improved weekly global subseasonal forecasts at a 100 km resolution (1°). The benefit of this strategy is the ability to generate precise deterministic subseasonal forecasts with minimal computational power required for training and without the need for creating a new model structure, which is a time-consuming procedure. This is achieved by exploiting the existing knowledge of medium-range pre-trained models and adjusting it to longer timescales. 
The proposed fine-tuning strategy utilizes Graphcast as a pre-trained model and adjusts it by substituting its standard input and target data. Graphcast typically takes the two most recent states of the Earth’s atmosphere (ERA5), i.e. the current time and 6 hours earlier, and forecasts the next atmospheric state 6 hours ahead. We replaced the earlier of the two inputs with the mean over the week preceding the current time, and replaced the target with the mean over the week to be predicted. This allows the model to retrain on the timescale relevant for subseasonal forecasts, with the advantage of reducing the iterative error accumulation that arises in classical data-driven roll-out forecasts. Roll-out forecasts are produced by feeding the forecast back to the model in order to reach an arbitrarily long lead-time 9 . We implement data-driven subseasonal forecasts that predict weather conditions for the second (7–14 days ahead), third (14–21 days ahead) and fourth (21–28 days ahead) week in advance, here referred to as week 2, week 3 and week 4. The goal is to forecast weeks 2, 3 and 4 using a fine-tuned version of Graphcast, named GraphFT. Our deterministic fine-tuned model GraphFT was compared to the control (deterministic) reforecasts and to the perturbed (ensemble) mean reforecasts of the ECMWF Subseasonal to Seasonal (S2S) system, which is recognized as the best ocean-atmosphere modeling system for producing deterministic and probabilistic subseasonal forecasts 19,20 . Despite certain limits and areas for improvement, we argue that fine-tuning medium-range weather models is a viable method for significantly increasing the accuracy of subseasonal forecasts at low computational and human-resource costs.
Results The present study presents a strategy for fine-tuning the medium-range Graphcast global weather forecasting model to enhance its predictions at longer lead-times, on a subseasonal time-scale. The fine-tuned GraphFT model provides weekly global forecasts with a resolution of 1° for lead-times of 7–14, 14–21 and 21–28 days ahead. The forecasts generated by GraphFT incorporate 6 upper-air atmospheric variables at 13 pressure levels and 5 surface variables. The performance of GraphFT was evaluated against ERA5 reanalysis, considered here as ground truth, and compared to GraphOP, Pangu-Weather, control and ensemble reforecasts from S2S ECMWF. Here we first compare the performances of subseasonal forecasts from our fine-tuned GraphFT model, Graphcast Operational (GraphOP), Pangu-Weather (another ML-based medium-range forecasting model) and control reforecasts (i.e. hindcasts) from ECMWF S2S. The forecasting models are evaluated throughout the 2020–2021 period (which was not used for training/fine-tuning), with three forecast lead-times: week 2, 3 and 4. For each lead-time, the daily forecasts are reduced to their weekly mean before being evaluated against observations, except for GraphFT which was fine-tuned to produce weekly mean forecasts directly. Figure 1 displays the global Anomaly Correlation Coefficient (ACC) associated with Pangu-Weather, GraphOP, GraphFT and ECMWF S2S model for week 3 (ACC is a classical metrics in meteorology to estimate forecast skills and represents the temporal mean of the global spatial correlation between observed and forecasted anomalies, see Methods). The evaluated variables are the surface temperature (T2M), zonal wind at 10 meters (U10), meridional winds at 10 meters (V10) and the total precipitation (TP). Note that only the control S2S is utilized here, as it provides a fair comparison to a deterministic GraphFT. 
The prospect of implementing an ensemble GraphFT model to further improve forecast accuracy will be discussed in the next section. GraphFT significantly outperforms the control S2S, GraphOP and Pangu-Weather for all variables from this week 3 lead-time onwards. GraphFT is significantly better than its non-fine-tuned version (at least doubling ACC to the square, ACC 2 , i.e. the explained variance), demonstrating that our fine-tuning strategy can considerably improve the forecasts of medium-range models such as GraphOP. These results also confirm the hypothesis that Graphcast’s prediction error, strongly increasing with forecast lead-time because of the roll-out strategy, can be significantly reduced via fine-tuning for longer lead-time forecasting. GraphFT yields significantly better results than the ECMWF S2S control. This finding illustrates that fine-tuning enables an impressive increase in forecast accuracy for 3 weeks ahead. Comparatively, the control S2S model and GraphOP exhibit similar performance across all variables. Pangu-Weather also shows good performance for U10 and V10 but performs poorly for T2M and does not include precipitation forecasts. Figure 2 provides the spatial distribution of the temporal Correlation Coefficient R t between observed and forecasted anomalies (cf. Methods), for air temperature at surface, T2M (see Supplementary Figs. 6, 8 and 10 for the other variables), for the control S2S, GraphOP and GraphFT models, and the differences between GraphFT and both S2S and GraphOP models, still for week 3 lead. The spatial distributions of R t and their differences reveal in general higher values for GraphFT than for GraphOP and S2S over land and ocean, except in a few regions mostly in the tropics. Comparing GraphFT to GraphOP (Fig. 2 e), we observe clear improvements in the subtropical to polar regions, notably in the northeastern and southeastern Pacific and its islands (e.g. 
Hawaii and French Polynesia (FP)), North America, the northern polar region, the Indian subcontinent to the Tibetan plateau, northeastern China and Japan, the North and southeastern Atlantic, and Europe. The few regions without improvement are the north-equatorial Pacific, South America, Australia and the central part of Asia. This overall improvement illustrates the benefits of our fine-tuning strategy in enhancing the forecasting skill of medium-range data-driven models for subseasonal time-scales. Comparing GraphFT to control S2S (Fig. 2 d), GraphFT remains better overall, notably in the subtropical to polar regions. In the Tropics, where the ocean-atmosphere coupling is the strongest since SST is the warmest and is strongly coupled to atmospheric deep convection, ECMWF S2S is more difficult to outperform, as it includes full ocean physics, notably equatorial Kelvin and Rossby waves and oceanic mixed layer thermodynamics, and thus captures this coupling well. Conversely, Graphcast only partially captures these ocean processes through T2M (strongly related to SST in the tropics). Yet, while ECMWF S2S outperforms GraphOP and GraphFT in the Tropics for the point-wise R t metric, GraphFT still performs better than S2S there for a complementary metric, R x,t , which assesses the skill of a model in predicting temporal and longitudinal variability together, and which better synthesizes the evolution of the skill as a function of latitude (Suppl. Fig. S5 ). Concerning land processes, GraphFT performs clearly better than S2S over the Tibetan plateau and northern polar regions, likely because the complex dynamics of snow cover are difficult to simulate in a model based on physical equations such as the ECMWF S2S model, especially over the Tibetan plateau for week 3 and beyond 21 . This suggests that snow cover intraseasonal variability is more easily forecasted by an ML-based model such as GraphFT.
Moreover, GraphFT also outperforms GraphOP and S2S in most regions worldwide for week 3 for the other variables: U10, V10 and TP (Supplementary Figs. 6, 8 and 10). Only around the equator for U10 does S2S slightly outperform GraphFT, when considering both R t and R x,t metrics. The next question that naturally emerges is: how many years of training do we need to fine-tune GraphFT with, in order to obtain accurate subseasonal forecasting skill? Figure 3 presents the ACC for a forecast lead-time of week 3 for T2M, U10, V10 and TP for 4 different models: GraphOP followed by 3 different versions of GraphFT, each of which were respectively trained on 2019, 2018–2019 and 2017-2018-2019 weekly values. Already with one year of training only, the ACC increase from original GraphOP to GraphFT is substantial for all variables. As expected, further increases are apparent for most variables the more years we train GraphFT on (especially for T2M from the 2019 to the 2018–2019 version). However, these increases are not highly significant between the three versions of GraphFT, as confidence intervals from these versions mostly overlap, especially from the 2018–2019 to the 2017-2018-2019 version. The ACCs tend to converge already after 2–3 years, except for U10, with GraphFT-2017-2018-2019 slightly and significantly outperforming GraphFT-2019. For TP, the difference in ACC for the 3 GraphFT models is not significant. Those results show that fine-tuning significantly increases the performance of GraphFT over GraphOP, but the difference of performance between the 3 training sets is overall less significant. In order to improve GraphFT even more, and for its subseasonal forecasting skill to completely converge, one would need a larger number of years for training, which would notably explore a larger range of ENSO conditions. Nevertheless, the outcomes are already highly encouraging with just 3 years for training. 
We now explore how the forecasting skill comparison between the different forecasting models evolves across lead-times of weeks 2, 3 and 4. Figure 4 shows the ACC values for T2M for GraphOP, GraphFT, S2S control and ensemble reforecasts across these lead-times. Extending GraphOP forecasts for week 4 was considered unnecessary, as GraphFT already outperforms GraphOP at week 3. Therefore, GraphOP forecasts are only included for weeks 2 and 3. For a lead-time of week 2, the S2S ensemble mean model achieved the highest performance, with control S2S and GraphOP closely following. Supplementary Fig. 1, representing the ACC values for each model and for all the evaluated variables during week 2, further indicates that GraphFT already outperforms all other models in terms of TP, but not yet for U10, V10 and T2M. This underperformance from GraphFT could be explained by two facts. First, the standard training of GraphOP is already sufficient and well suited for week 2 (which still stands in the medium-range lead-time). Second, our fine-tuning strategy for GraphFT, has been aimed towards longer lead-times. This weekly mean might be a too strong temporal averaging, to be relevant for a lead-time of week 2. Therefore, it can be inferred that our proposed fine-tuning method is more appropriate for extended lead-times, i.e. weeks 3 and 4. For week 3, GraphFT achieves an ACC score that is comparable to the S2S ensemble mean and significantly surpasses both S2S control and GraphOP for T2M, as aforementioned in Fig. 1 . Regarding week 4, GraphFT emerges as the best forecasting model, surpassing even the S2S ensemble mean for T2M. Supplementary Figs. 3 and 12–20 further elaborate on GraphFT forecasting capabilities for week 4 and for other variables, and their spatial distribution. GraphFT surpasses the control S2S forecast for all variables, which aligns with the findings from week 3. 
But it now generates comparable results to S2S ensemble mean for U10 and V10 and even outperforms significantly S2S ensemble mean for T2M and TP. An ensemble GraphFT implementation would then certainly outperform S2S ensemble mean for all variables. In line with our previous findings, this result illustrates that GraphOP's plasticity, flexibility, and adaptability can be improved over subseasonal lead times through a pertinent and cost-effective fine-tuning strategy, attaining an accuracy level comparable (or better for T2M and TP) to that of the ECMWF S2S ensemble system, recognized as the most efficient dynamical model for subseasonal forecasts. In order to gain a more synthetic view of the forecasting skill, Fig. 5 displays the mean values of R t coefficient for T2M for GraphFT, S2S control and S2S ensemble for week 4 in several geographical boxes across the globe. When comparing S2S control and GraphFT, we observe that GraphFT significantly outperforms S2S in almost all of the defined geographical boxes, except for the tropical band (30°S to 30°N) where their forecasting skills are equivalent. This finding aligns with Fig. 2 results and highlights once again the effectiveness of fine-tuning in enhancing Graphcast to an accuracy equivalent or better than that of S2S control, with S2S still having an advantage in the tropics thanks to the ocean-atmosphere coupling. Discussion Our proposed fine-tuning strategy on the medium-range Graphcast global weather forecasting model has significantly increased its performance on subseasonal timescales for all considered variables. Indeed, GraphFT performs better than the unperturbed S2S model, Pangu-Weather, and its “not-fine-tuned father” GraphOP during weeks 3 and 4. GraphFT also demonstrates similar skills to that of the ensemble S2S ensemble mean for week 3, and even better skills for week 4 for surface temperature and precipitation. 
An inherent benefit of the described fine-tuning procedure is its computational cost-effectiveness. Indeed, the fine-tuning process was carried out over a much shorter training period of 3 years (2017-2018-2019), requiring much less data than the usual multidecadal training periods, and using only a single 40 GB A100 GPU. The training process took only around 40 hours and yielded remarkable results that were similar to the S2S ensemble mean. For comparison, GraphOP was trained for around 4 weeks 9 on 32 GB Cloud Tensor Processing Unit (TPU) v4 devices, which typically perform about 15 to 30 times better than a GPU. Its training used data parallelism across 32 TPU devices, together with gradient checkpointing 22 to further decrease the memory usage. Thanks to the efficiency of our fine-tuning approach, several obvious enhancements could be implemented. The first is to enable GraphFT to provide ensemble forecasts at least similar to those of the ECMWF S2S ensemble system, which contains 11 members. GraphFT’s cost-effectiveness may allow us to have many more than 11 members. For instance, by introducing three distinct types of Perlin noise to the initial states fed into the model, Pangu-Weather 11 generates a 100-member ensemble forecast. FourCastNet 12 adds Gaussian random noise to the initial conditions using Ensemble Kalman Filtering 23 to obtain a 100-member ensemble forecast. It would also be interesting to extend the lead-time of GraphFT to week 5 and week 6, as changing the target of the model and adapting the fine-tuning process is fairly easy and does not involve any architecture modifications. Indeed, FuXi-S2S 13 , a state-of-the-art machine learning subseasonal-range weather forecasting model, provides skillful data-driven subseasonal forecasts up to 42 days ahead.
While an accurate comparison between FuXi-S2S and GraphFT is out of the scope of the present study (FuXi-S2S study having an additional detrending strategy to remove linear long-term trends possibly related to global warming, before evaluating the forecasting skill), a broad comparison can be done. Indeed FuXi-S2S performs as well as S2S ensemble for T2M for example at week 3 and 4. We perform even better than S2S ensemble and by extrapolation we should also perform better than FuXi-S2S for T2M. However, for a proper comparison, a new data processing would have to be implemented. Another possible improvement is the use of other base models for fine-tuning, of higher spatial resolution, such as the 0.25° version of Graphcast or its equivalent at ECMWF, AIFS 24 so as to test our strategy on other systems and to obtain subseasonal forecasts at a higher spatial resolution. These should better capture the small-spatial scale processes such as orographic features, e.g. in mountain ranges such as the Himalayas, and in high islands such as Tahiti in French Polynesia or Big Island in Hawaii archipelagos. The implementation is straight-forward, but obviously needs a larger set of higher-resolution reanalysis data and more memory to retrain the model. The natural next step would be to implement a ‘GraphOAFT’ trained by both ocean and atmosphere variables. Indeed, integrating Sea Surface Temperature (SST) into GraphFT as a predicted variable appears to be an effective approach for enhancing forecast accuracy. Indeed, slow changing SST conditions provide valuable source of predictability at subseasonal time-scales 25,26 . Especially, for regions like the north-equatorial Pacific where the North Equatorial CounterCurrent (NECC) lies below the InterTropical Convergence Zone (ITCZ). Also, integrating land moisture and ice-snow variables can be seen as a relevant way to improve subseasonal forecasts. 
As winds and precipitation variables are overall better forecasted by GraphFT than GraphOP, the MJO and its different impacts depending of its phases 27 are certainly better forecasted by GraphFT. While a proper evaluation of the MJO skill is out of the scope here, and might require the computation of weekly Real-time Multivariate MJO (RMM) 28 indexes, adding the ocean-atmosphere coupling likely further improves its skill in the Tropics, and thus the MJO forecasting skill. In conclusion, the potential of fine-tuning for improved subseasonal forecasts from cutting-edge medium-range weather forecasting models is undisputed, and yields impressive results, even exceeding the ECMWF S2S ensemble model at week 4 lead-time. Those results are even more impressive considering the computation-cost effectiveness of the process. The best prospect for even better forecasting accuracy is to implement ensemble forecasts integrating ocean variables and other land moisture and ice-snow variables, which would represent a much more complete description of the earth-ocean weather system. This ocean-atmosphere coupling in addition to ensemble forecasts represent the next step for even more precise data-driven subseasonal forecasts. Methods Data ERA5 ERA5 is the fifth iteration of the ECMWF reanalysis dataset and offers a rich array of surface and upper-air variables. It operates at an approximately 25 km horizontal resolution (0.25°) and a temporal resolution of 1 hour, spanning from 1940 to the present day 15 . ERA5 stands as the most comprehensive and precise reanalysis archive globally and is considered the best-known estimation for most atmospheric variables 29,30 . In this study, we utilize 6-hourly ERA5 reanalysis, regridded to a 1° resolution, to fine-tune Graphcast. This data was obtained through the WeatherBench2 10 python package, a publicly available, cloud-optimized ground truth and baseline datasets. It serves as the sole data for fine-tuning Graphcast. 
To remove the seasonal cycle from our forecasts before evaluation, we calculated a daily climatology using the 1-hourly ERA5 reanalysis dataset provided by the Copernicus ECMWF Center. The data spans from 1979 to 2019. The identical dataset, covering the time period from 2020 to 2021, serves as the ground truth (observations) and is employed for evaluating the accuracy of all forecasting models.
S2S
The S2S prediction project 31 was initiated in 2013 as an international effort to improve and develop various aspects of dynamical subseasonal predictions, including tropical cyclones 32 (TCs). The S2S project has created an extensive dataset containing subseasonal forecasts and reforecasts (also known as hindcasts) from 11 operational and research centers. A key goal of these efforts is to improve the forecasting skill and understand the sources of subseasonal predictability, most especially the MJO, with the best forecasts of the MJO originating from ECMWF 4 . Operational subseasonal forecasting models are often updated to incorporate recent research discoveries optimized for operational use. For instance, the ECMWF S2S hindcasts are generated on-the-fly with the most recent model version available at the time of forecast generation. This study utilizes the hindcasts from the ECMWF S2S CY47R3 model cycle with a resolution of 1.5° as the reference dynamical model (same as FuXi-S2S 13 ) for the forecast verification period from 2020 to 2021. Specifically, the verification process utilizes both the deterministic (control) forecasts and the ensemble forecasts, the latter being reduced from 11 ensemble members to their ensemble mean. For our study, we aimed to examine the variables that have the most influence on daily life. Specifically, we focused on the surface winds U10 and V10 to aid in forecasting wind gusts, the surface temperature T2M for heatwave prevention and the total precipitation TP for flood control.
For their evaluation process, FuXi-S2S 13 employs a different method for data post-processing. In our study, the data processing, as detailed in the previous subsection, consists firstly in removing a daily climatology computed from 1979 to 2019 from both our observations and our forecasts. The main motivation for using a daily climatology was to remove any seasonal component from the data. This difference in data processing makes the comparison between GraphFT and FuXi-S2S difficult.
Models
Graphcast Operational
The Graphcast medium-range model produces accurate weather forecasts at a resolution of 0.25° and has been evaluated up to 10 days ahead against the top deterministic operational system in the world, the ECMWF’s high-resolution forecast (HRES). HRES has a 9 km horizontal resolution, and is a product of the Integrated Forecasting System (IFS) that produces a 10-day forecast. After regridding to a resolution of 0.25°, Graphcast significantly outperforms HRES on 90% of 1380 verification targets 9 . As stated previously, this model takes as input two ERA5 reanalysis states, i.e. the current time and 6 hours earlier, and forecasts the next atmospheric state 6 hours ahead. Graphcast is also an autoregressive model, meaning that it can be “rolled-out” by feeding its own predictions back in as input, to generate weather forecasts at long lead-times (multiples of 6 hours). The operational Graphcast was trained with 39 years of ERA5 reanalysis data from 1979 to 2017. As stated previously, it took roughly 4 weeks to train the model on 32 GB Cloud TPU devices, and it can now make accurate predictions in under a minute on a single TPU. Graphcast is based on a Graph Neural Network (GNN) in an “encoder-decoder” configuration and has a total of 36.7 million parameters (demo available at https://github.com/deepmind/graphcast ).
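The autoregressive roll-out described above can be illustrated with a minimal sketch. It is not Graphcast's actual code: `toy_step` is a hypothetical stand-in for the trained model, and the function names are assumptions introduced only for this example. The only facts taken from the text are the 6-hour time step and the two-state input.

```python
import numpy as np

STEP_HOURS = 6  # native time step of the model, as stated in the text


def rollout_steps(lead_days, step_hours=STEP_HOURS):
    """Number of autoregressive steps needed to reach a given lead time."""
    return lead_days * 24 // step_hours


def rollout(step_fn, prev_state, curr_state, n_steps):
    """Feed each prediction back in as input, always keeping the last two states."""
    states = []
    for _ in range(n_steps):
        next_state = step_fn(prev_state, curr_state)
        states.append(next_state)
        prev_state, curr_state = curr_state, next_state
    return np.stack(states)


def toy_step(prev_state, curr_state):
    # Hypothetical stand-in for the model: simple linear extrapolation.
    return 2.0 * curr_state - prev_state
```

With a 6-hour step, `rollout_steps(14)` gives 56 model calls to reach day 14, and any error made at one step is passed on to every later step.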
Training workflow for fine-tuning Graphcast and developing GraphFT
Given the constraints of our local computational resources, we opted to leverage a Google Cloud environment, which grants us access to A100 Graphics Processing Units (GPUs). This allowed us to execute the training, inference, and fine-tuning procedures of the “small” Graphcast model. This “small” model is a version of Graphcast with a resolution of 1° and reduced computational demands. However, it still includes a wide range of variables, incorporating 6 upper-air atmospheric variables at 13 pressure levels and 5 surface variables. This “small” Graphcast was trained using ERA5 data from 1979 to 2015 and fine-tuned on data from 2017 to 2019. The training and fine-tuning periods were chosen so as not to overlap with each other. The Graphcast model requires as input two instantaneous ERA5 reanalysis states that are 6 hours apart. It then generates a forecast of the ERA5 state 6 hours ahead. This process was altered for the fine-tuning phase with the precise goal of training Graphcast to excel in subseasonal weekly forecasts. To do this, we followed the procedure outlined in Fig. 6 . In other words, the input of the original Graphcast model was composed of two atmospheric states, the most recent time step before the forecast starts and the state 6 hours before it, and Graphcast aimed to forecast the next 6-hour time step. GraphFT instead uses the previous week’s state as its previous state and aims to forecast week 2, 3 or 4. Three Graphcast models are then fine-tuned separately with the custom data for week 2, week 3 and week 4, for 20 epochs, with a learning rate of 1e-4 and the Adam optimizer. The fine-tuning is carried out first on 2019, then on 2018 and finally on 2017, consecutively.
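The substitution of inputs and targets can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes a 6-hourly array `fields` with time as the first axis, takes the "previous week" as the 28 steps strictly before the initialization index `t0`, and takes week `w` as the half-open window from `t0 + 28(w-1)` to `t0 + 28w`. The exact window boundaries are not given in the text, and `make_sample` is a hypothetical helper.

```python
import numpy as np

STEPS_PER_DAY = 4                    # 6-hourly data
STEPS_PER_WEEK = 7 * STEPS_PER_DAY   # 28 steps per week


def make_sample(fields, t0, week):
    """Build one (inputs, target) pair for a weekly-mean forecast.

    fields: array of shape (time, ...) holding 6-hourly states.
    t0:     index of the initialization time.
    week:   lead week to predict (2, 3 or 4 in the text).
    """
    if t0 < STEPS_PER_WEEK:
        raise ValueError("need a full week of data before t0")
    start = t0 + (week - 1) * STEPS_PER_WEEK
    stop = t0 + week * STEPS_PER_WEEK
    if stop > fields.shape[0]:
        raise ValueError("target week extends past the end of the data")

    # Assumed windowing: previous week excludes t0 itself.
    prev_week_mean = fields[t0 - STEPS_PER_WEEK:t0].mean(axis=0)
    current = fields[t0]
    target = fields[start:stop].mean(axis=0)

    inputs = np.stack([prev_week_mean, current])
    return inputs, target
```

One such sample replaces the usual pair of 6-hourly inputs and 6-hour-ahead target, so a separate model per target week sees only data at the timescale it is asked to predict.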
One advantage of directly predicting week 2, 3 or 4 is that there is less error accumulation than in a standard roll-out forecast, in which the forecast is fed back to the model to obtain a longer lead-time. Indeed, to obtain subseasonal forecasts from the standard version of Graphcast, we would need, for example, 56 roll-out steps (4 per day) to reach 14 days ahead. With GraphFT, a single forecast is enough to obtain a 14-day-ahead forecast. After fine-tuning, the accuracy of GraphFT's predictions is assessed by comparing them to the forecasts generated by Graphcast Operational and the S2S dynamical models. The metrics used for the evaluation are described in the next subsection of the Methods.

Metrics and evaluation

This section provides an overview of the metrics used in the evaluation process. Before assessment, we remove the climatology from all variables to remove any seasonal component that may be present, so as to focus on anomalies. More precisely, we calculated a weekly climatology in order to compare it with the weekly forecasts generated by GraphFT. The weekly climatology was derived from the daily climatology, which was calculated for all relevant variables from 1979 to 2019.
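The anomaly computation described above can be illustrated with a short Python sketch. The text does not specify how the weekly climatology is derived from the daily one; the version below assumes a plain average of the daily climatology over the 7 calendar days of the forecast week (wrapping around the end of the year), and all function names are hypothetical.

```python
import numpy as np


def daily_climatology(values, day_of_year, n_days=366):
    """Mean field for each calendar day.

    values      : array (time, ...) of daily fields
    day_of_year : array (time,) of integers in 1..n_days
    """
    values = np.asarray(values, dtype=float)
    day_of_year = np.asarray(day_of_year)
    clim = np.full((n_days,) + values.shape[1:], np.nan)
    for d in range(1, n_days + 1):
        mask = day_of_year == d
        if mask.any():
            clim[d - 1] = values[mask].mean(axis=0)
    return clim


def weekly_climatology(daily_clim, start_day):
    """Assumed definition: mean of the daily climatology over 7 days
    starting at start_day (1-based), wrapping around the year end."""
    idx = (start_day - 1 + np.arange(7)) % daily_clim.shape[0]
    return daily_clim[idx].mean(axis=0)


def weekly_anomaly(weekly_mean, daily_clim, start_day):
    """Weekly-mean field minus the matching weekly climatology."""
    return weekly_mean - weekly_climatology(daily_clim, start_day)
```

The same `weekly_anomaly` would be applied to forecasts and to ERA5, so that both are expressed relative to the same climatology before any correlation is computed.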
Anomaly Correlation Coefficient

We computed the Anomaly Correlation Coefficient (ACC) with the following equation:

$$ACC(\tau, k) = \frac{1}{\left|D_{eval}\right|} \sum_{t_0 \in D_{eval}} \frac{\sum_{i,j \in G_{1.5^\circ}} a_i \, \widehat{A}_{i,j,k}^{t_0+\tau} A_{i,j,k}^{t_0+\tau}}{\sqrt{\sum_{i,j \in G_{1.5^\circ}} a_i \left(\widehat{A}_{i,j,k}^{t_0+\tau}\right)^2 \sum_{i,j \in G_{1.5^\circ}} a_i \left(A_{i,j,k}^{t_0+\tau}\right)^2}}$$

where:

\(t_0 \in D_{eval}\) represents the forecast initialization date-times in the evaluation dataset

\(k \in K\) indexes the variables, e.g., K = {T2M, U10, V10, TP}

\(i, j \in G_{1.5^\circ}\) are the location (latitude and longitude) coordinates in the grid

\(a_i\) is the area of the latitude-longitude grid cell (normalized to unit mean over the grid), which varies with latitude

\(\tau\) refers to the forecast lead time added to \(t_0\)

\(\widehat{A}_{i,j,k}^{t_0+\tau}\) and \(A_{i,j,k}^{t_0+\tau}\) are the predicted and observed anomalies for a given variable, location, and lead time

R Correlation Coefficient

We also computed the temporal correlation coefficient \(R_t\), using the same notation as above:

$$R_t(\tau, i, j, k) = \frac{\sum_{t_0 \in D_{eval}} \widehat{A}_{i,j,k}^{t_0+\tau} A_{i,j,k}^{t_0+\tau}}{\sqrt{\sum_{t_0 \in D_{eval}} \left(\widehat{A}_{i,j,k}^{t_0+\tau}\right)^2 \sum_{t_0 \in D_{eval}} \left(A_{i,j,k}^{t_0+\tau}\right)^2}}$$

Spatial maps of the \(R_t\) coefficient are presented in the Results section.

Latitudinal dependence: the \(R_{t,x}(\tau, i, k)\) correlation coefficient

To better characterize the latitudinal distribution of the correlation, the coefficient \(R_{t,x}\) was plotted for each latitude.
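Before turning to the latitudinal coefficient, the two metrics defined so far, ACC and \(R_t\), can be sketched with NumPy as follows. This is a minimal illustration, not the evaluation code of this study: the array layout (initialization date, latitude, longitude) is an assumption, and the cell-area weights \(a_i\) are approximated here by the cosine of latitude normalized to unit mean, which is also an assumption.

```python
import numpy as np


def area_weights(lat_deg):
    """Cosine-latitude weights normalized to unit mean (assumed proxy for a_i)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return w / w.mean()


def acc(forecast_anom, observed_anom, lat_deg):
    """Global ACC for one variable and lead time.

    Inputs have shape (n_init, n_lat, n_lon). The area-weighted spatial
    correlation is computed per initialization date, then averaged over dates.
    """
    a = area_weights(lat_deg)[None, :, None]
    num = (a * forecast_anom * observed_anom).sum(axis=(1, 2))
    den = np.sqrt((a * forecast_anom**2).sum(axis=(1, 2))
                  * (a * observed_anom**2).sum(axis=(1, 2)))
    return (num / den).mean()


def r_t(forecast_anom, observed_anom):
    """Temporal correlation at each grid point; returns shape (n_lat, n_lon)."""
    num = (forecast_anom * observed_anom).sum(axis=0)
    den = np.sqrt((forecast_anom**2).sum(axis=0)
                  * (observed_anom**2).sum(axis=0))
    return num / den
```

Note that, as in the equations, neither function subtracts a mean from the anomalies: the correlations are uncentered, so the climatology must already have been removed from both inputs.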
\(R_{t,x}\) is calculated directly for each latitude index \(i\), over the time and longitude axes simultaneously, with the following equation:

$$R_{t,x}(\tau, i, k) = \frac{\sum_{t_0 \in D_{eval},\, j \in G_{1.5^\circ}} \widehat{A}_{i,j,k}^{t_0+\tau} A_{i,j,k}^{t_0+\tau}}{\sqrt{\sum_{t_0 \in D_{eval},\, j \in G_{1.5^\circ}} \left(\widehat{A}_{i,j,k}^{t_0+\tau}\right)^2 \sum_{t_0 \in D_{eval},\, j \in G_{1.5^\circ}} \left(A_{i,j,k}^{t_0+\tau}\right)^2}}$$

A Fisher transform was used to compute the 95% confidence intervals of \(R_{t,x}\) for each latitude. To estimate the effective number of degrees of freedom, we assumed that each time-step and each 10° of longitude represent one effective degree of freedom.

Bootstrapping for significance testing

We adopted a bootstrapping method as a significance test. Bootstrapping creates synthetic datasets (1000 in this work) by resampling the original data with replacement, and provides measures of accuracy for sample estimates 33 , such as variance or confidence intervals. Bootstrapping is used throughout this paper to compute the 95% confidence intervals for the mean global ACC values and for the mean values of \(R_t\) in the geographical boxes. The sample size n is equal to 96, corresponding to one prediction per week for 2 years (2020–2021).

Declarations

Competing interests

No competing interests.

Author contributions

Conceptualization, V.S., T.I., M.H.; methodology, V.S., T.I., M.H., D.S.; software, V.S., T.I., M.H.; validation, V.S., T.I., M.H., D.S., S.M-L.; formal analysis, V.S., T.I.; investigation, V.S., T.I.; resources, V.S., M.H.; data curation, V.S.; writing (original draft preparation), V.S.; writing (review and editing), V.S., T.I., M.H., D.S., S.M-L.; supervision, T.I., M.H.; project administration, T.I., M.H.; funding acquisition, V.S., T.I., M.H., S.M-L. All authors have read and agreed to the published version of the manuscript.

Acknowledgements
This project was carried out in conjunction with the national weather service Météo-France, with the aim of enhancing subseasonal forecasting, notably for French Polynesia, in order to minimize the impact of extreme weather events such as floods, strong winds, droughts, and tropical cyclones. Météo-France is responsible for supplying meteorological forecasts for mainland France and its overseas territories. In particular, in French Polynesia, where the MJO has an important impact 27 , Météo-France provides weekly forecasts of weather conditions for the second (7–14 days ahead) and third (14–21 days ahead) weeks, referred to as week 2 and week 3. We implement data-driven subseasonal weekly forecasts for this purpose.

Data availability statement

Data and code are provided with this article. The ECMWF S2S data were obtained from https://apps.ecmwf.int/datasets/data/s2s/ . For training and testing GraphFT, we downloaded a subset of the ERA5 dataset from https://cds.climate.copernicus.eu/ , the official website of the Copernicus Climate Data Store (CDS).

References

1. Next Generation Earth System Prediction: Strategies for Subseasonal to Seasonal Forecasts. (National Academies Press, Washington, D.C., 2016). doi:10.17226/21873.
2. Richter, J. H. et al. Quantifying sources of subseasonal prediction skill in CESM2. Npj Clim. Atmospheric Sci. 7, 1–9 (2024).
3. Coughlan de Perez, E. et al. Action-based flood forecasting for triggering humanitarian action. Hydrol. Earth Syst. Sci. 20, 3549–3560 (2016).
4. Vitart, F. & Robertson, A. W. The sub-seasonal to seasonal prediction project (S2S) and the prediction of extreme events. Npj Clim. Atmospheric Sci. 1, 1–7 (2018).
5. White, C. J. et al. Potential applications of subseasonal-to-seasonal (S2S) predictions. Meteorol. Appl. 24, 315–325 (2017).
6. Chen, M., Wang, W. & Kumar, A. Prediction of Monthly-Mean Temperature: The Roles of Atmospheric and Land Initial Conditions and Sea Surface Temperature. (2010) doi:10.1175/2009JCLI3090.1.
7. Doblas, F., García‐Serrano, J., Lienert, F., Biescas, A. & Rodrigues, L. Seasonal climate predictability and forecasting: Status and prospects. Wiley Interdiscip. Rev. Clim. Change 4, (2013).
8. Vitart, F., Robertson, A. & Anderson, D. Subseasonal to Seasonal Prediction Project: Bridging the gap between weather and climate. WMO Bull. 61, (2012).
9. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
10. Rasp, S. et al. WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models. J. Adv. Model. Earth Syst. 16, e2023MS004019 (2024).
11. Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
12. Pathak, J. et al. FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators. Preprint at http://arxiv.org/abs/2202.11214 (2022).
13. Chen, L. et al. A machine learning model that outperforms conventional global subseasonal forecast models. Nat. Commun. 15, 6425 (2024).
14. Kochkov, D. et al. Neural general circulation models for weather and climate. Nature 1–7 (2024) doi:10.1038/s41586-024-07744-y.
15. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
16. He, S., Li, X., DelSole, T., Ravikumar, P. & Banerjee, A. Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances. Proc. AAAI Conf. Artif. Intell. 35, 169–177 (2021).
17. Kiefer, S. M., Lerch, S., Ludwig, P. & Pinto, J. G. Can Machine Learning Models Be a Suitable Tool for Predicting Central European Cold Winter Weather on Subseasonal to Seasonal Time Scales? (2023) doi:10.1175/AIES-D-23-0020.1.
18. Chen, L. et al. FuXi: a cascade machine learning forecasting system for 15-day global weather forecast. Npj Clim. Atmospheric Sci. 6, 1–11 (2023).
19. Domeisen, D. I. V. et al. Advances in the Subseasonal Prediction of Extreme Events: Relevant Case Studies across the Globe. (2022) doi:10.1175/BAMS-D-20-0221.1.
20. de Andrade, F. M., Coelho, C. A. S. & Cavalcanti, I. F. A. Global precipitation hindcast quality assessment of the Subseasonal to Seasonal (S2S) prediction project models. Clim. Dyn. 52, 5451–5475 (2019).
21. Li, W., Hu, S., Hsu, P.-C., Guo, W. & Wei, J. Systematic bias of Tibetan Plateau snow cover in subseasonal-to-seasonal models. The Cryosphere 14, 3565–3579 (2020).
22. Chen, T., Xu, B., Zhang, C. & Guestrin, C. Training Deep Nets with Sublinear Memory Cost. Preprint at https://doi.org/10.48550/arXiv.1604.06174 (2016).
23. Evensen, G. The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation. Ocean Dyn. 53, 343–367 (2003).
24. Lang, S. et al. AIFS -- ECMWF’s data-driven forecasting system. Preprint at https://doi.org/10.48550/arXiv.2406.01465 (2024).
25. Albers, J. R. & Newman, M. Subseasonal predictability of the North Atlantic Oscillation. Environ. Res. Lett. 16, 044024 (2021).
26. Yan, Y., Liu, B. & Zhu, C. Subseasonal Predictability of South China Sea Summer Monsoon Onset With the ECMWF S2S Forecasting System. Geophys. Res. Lett. 48, e2021GL095943 (2021).
27. Hopuare, M., Guglielmino, M. & Ortega, P. Interactions between intraseasonal and diurnal variability of precipitation in the South Central Pacific: The case of a small high island, Tahiti, French Polynesia. Int. J. Climatol. 39, 670–686 (2019).
28. Wheeler, M. C. & Hendon, H. H. An All-Season Real-Time Multivariate MJO Index: Development of an Index for Monitoring and Prediction. Mon. Weather Rev. 132, 1917–1932 (2004).
29. Linus Magnusson, S. M. Tropical cyclone activities at ECMWF. ECMWF https://www.ecmwf.int/en/elibrary/81277-tropical-cyclone-activities-ecmwf (2021).
30. Jiao, D., Xu, N., Yang, F. & Xu, K. Evaluation of spatial-temporal variation performance of ERA5 precipitation data in China. Sci. Rep. 11, 17956 (2021).
31. Vitart, F. et al. The Subseasonal to Seasonal (S2S) Prediction Project Database. Bull. Am. Meteorol. Soc. 98, 163–173 (2017).
32. Lee, C.-Y., Camargo, S. J., Vitart, F., Sobel, A. H. & Tippett, M. K. Subseasonal Tropical Cyclone Genesis Prediction and MJO in the S2S Dataset. Weather Forecast. 33, 967–988 (2018).
33. Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap. (Chapman and Hall/CRC, New York, 1994). doi:10.1201/9780429246593.

Additional Declarations

There is NO Competing Interest.

Supplementary Files

barplottrainingyears.txt: Python notebook for Figure 3
plottingfields.txt: Plotting script for Figure 2
graphcastfinetuning.txt: Python notebook on the fine-tuning strategy
tccvstime.txt: Python code for Figure 4
readinggeographicalboxes.txt: Python code for Figure 5
meteofranceboxesgraphcastoptimized.txt: Main data processing Python code
datautils.txt: Data utilities
processingoutputs.txt: Secondary functions
barplotacc20202021.txt: Bar plot of ACC for Supplementary Figure 4
supplementarymaterials.docx: Supplementary Materials
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5619528","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":425231531,"identity":"2b92f10e-952f-4af0-a3e7-0bdea1e13c1b","order_by":0,"name":"Vateanui SANSINE","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA8UlEQVRIie3NsYrCQBCA4ZGFtVnZdkNO7xVWFqKg4KusBLSxOBBsPRDSHba+hyCWyoBphGstRAyCnaCNRFDQeIUoJLnSYv9mGJiPATCZ3jj2NyQAJwACqknH5IVYvYg00skjOUkh0v+drk9j/ACO2yD8WhaUnwsWoFfxZO6S4s8cGYhGSTG5VQ5mVRl0O5ZY3y4VOS8i4NggsT5CSkUm1PGkv8meLhHh/tEKJXaHvRsBHU+4cCm5f4GWI5hELUkq2Sg77zUZFa2OfSPFAVJS1gmE8npw2HmVGuf+yArP+Mn7s8xin0Ae9mn7BzCZTCZTQler30a0UeEWDAAAAABJRU5ErkJggg==","orcid":"","institution":"French National Research Institute for sustainable development","correspondingAuthor":true,"prefix":"","firstName":"Vateanui","middleName":"","lastName":"SANSINE","suffix":""},{"id":425231532,"identity":"9547ef03-93b8-49ab-9ee3-cf66254e3f9e","order_by":1,"name":"Takeshi Izumo","email":"","orcid":"https://orcid.org/0000-0001-9617-2234","institution":"Institut de Recherche pour le Développement (IRD)","correspondingAuthor":false,"prefix":"","firstName":"Takeshi","middleName":"","lastName":"Izumo","suffix":""},{"id":425231533,"identity":"06fbef3c-a864-47af-a36d-a8331a9c3b14","order_by":2,"name":"Marania Hopuare","email":"","orcid":"","institution":"Université de la Polynésie Française 
(UPF)","correspondingAuthor":false,"prefix":"","firstName":"Marania","middleName":"","lastName":"Hopuare","suffix":""},{"id":425231534,"identity":"cfe1e62f-2b9c-4eb0-8321-303d6029c70a","order_by":3,"name":"Damien Specq","email":"","orcid":"https://orcid.org/0000-0002-4572-0226","institution":"CNRM","correspondingAuthor":false,"prefix":"","firstName":"Damien","middleName":"","lastName":"Specq","suffix":""},{"id":425231535,"identity":"0069b506-0bef-4251-b7fe-30168591b0e4","order_by":4,"name":"Sophie Martinoni-La Pierre","email":"","orcid":"","institution":"Météo-France","correspondingAuthor":false,"prefix":"","firstName":"Sophie","middleName":"Martinoni-La","lastName":"Pierre","suffix":""}],"badges":[],"createdAt":"2024-12-10 21:35:28","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5619528/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5619528/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":78107181,"identity":"a2d66113-7b14-4635-93b3-618e901b87ec","added_by":"auto","created_at":"2025-03-10 04:00:22","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":50953,"visible":true,"origin":"","legend":"\u003cp\u003eBar plot of ACC for a lead-time of week 3 for Pangu-Weather, Graphcast Operational, GraphFT and S2S accompanied by their error bars representing 95% confidence intervals.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/d36f5292b8bb6d5543262ad9.png"},{"id":78107603,"identity":"e5233015-c2c7-4976-9488-665e739058bd","added_by":"auto","created_at":"2025-03-10 04:08:22","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":388659,"visible":true,"origin":"","legend":"\u003cp\u003eSpatial map of R correlation coefficient of S2S (upper left), GraphOP (upper middle) and GraphFT (upper right), along with R differences between GraphFT-S2S 
(lower left) and GraphFT-GraphOP (lower right) for T2M at forecast lead-time of week 3.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/c2899094e5eb8f829bdcc6dd.png"},{"id":78107188,"identity":"4d2cd453-171e-4469-834c-39ad6c4cb0a6","added_by":"auto","created_at":"2025-03-10 04:00:23","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":76914,"visible":true,"origin":"","legend":"\u003cp\u003eACC for GraphFT and GraphOP compared with respect of training year (2019-2018-2017), error bars represent the 95% confidence interval for the mean values of ACC.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/c18b1865f51407e68ca4fe64.png"},{"id":78107610,"identity":"fe45a13c-5406-4ef0-8a64-a56e78cc721e","added_by":"auto","created_at":"2025-03-10 04:08:23","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":105288,"visible":true,"origin":"","legend":"\u003cp\u003eACC for variable T2M for GraphFT, GraphOP and S2S compared with lead-time accompanied by their 95% confidence intervals\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/66145dcf3c22e4bcfc023109.png"},{"id":78107201,"identity":"545e8588-a4d4-480b-8934-1b5f0df118f2","added_by":"auto","created_at":"2025-03-10 04:00:23","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":90523,"visible":true,"origin":"","legend":"\u003cp\u003eMean values of R\u003csub\u003et\u003c/sub\u003e for GraphFT, S2S control for week 4 in several geographical boxes across the 
globe.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/8132292fd59e24344d74edda.png"},{"id":78107997,"identity":"c9662ad7-c345-4690-a334-1365974dbd74","added_by":"auto","created_at":"2025-03-10 04:16:23","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":94415,"visible":true,"origin":"","legend":"\u003cp\u003eFine-tuning workflow for Graphcast.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/9a951f03fb1f145fbfa27e7a.png"},{"id":78108961,"identity":"f0c9a611-79fd-4e29-85f3-65a3339d5c3c","added_by":"auto","created_at":"2025-03-10 04:32:24","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1357905,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/e5995382-15a0-48fe-8f2b-c33062ca3cd5.pdf"},{"id":78107180,"identity":"f231b6b5-7740-496c-892d-98ff8c1aae70","added_by":"auto","created_at":"2025-03-10 04:00:22","extension":"txt","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":2567,"visible":true,"origin":"","legend":"Python notebook for Figure 3","description":"","filename":"barplottrainingyears.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/39eae2ef13dc82b101c8cd18.txt"},{"id":78108767,"identity":"e0a0e516-0ae3-448f-a5fc-a666c19164b9","added_by":"auto","created_at":"2025-03-10 04:24:23","extension":"txt","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":40818,"visible":true,"origin":"","legend":"Plotting Script for Figure 
2","description":"","filename":"plottingfields.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/a0dc9c5bcc5baceecf9fc10c.txt"},{"id":78107185,"identity":"ab9b5c51-09df-43dd-b7d0-7a60f2ddae03","added_by":"auto","created_at":"2025-03-10 04:00:22","extension":"txt","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":60020,"visible":true,"origin":"","legend":"Python Notebook on the fine-tuning Strategy","description":"","filename":"graphcastfinetuning.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/748e7b800feb70a1dd60e0aa.txt"},{"id":78107197,"identity":"d0c5b695-4a9a-4b83-bc97-f3d3636548d5","added_by":"auto","created_at":"2025-03-10 04:00:23","extension":"txt","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":2674,"visible":true,"origin":"","legend":"Python code for Figure 4","description":"","filename":"tccvstime.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/006bde97f08d43fe00ace9b3.txt"},{"id":78107612,"identity":"af491715-b338-44df-aa92-16f044133d39","added_by":"auto","created_at":"2025-03-10 04:08:23","extension":"txt","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":8433,"visible":true,"origin":"","legend":"Python code for Figure 5","description":"","filename":"readinggeographicalboxes.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/f559263dff0fa8815f81876c.txt"},{"id":78107200,"identity":"8f6af534-73a6-4c1f-aff6-35feccd3f24a","added_by":"auto","created_at":"2025-03-10 04:00:23","extension":"txt","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":58014,"visible":true,"origin":"","legend":"Main Data processing python 
code","description":"","filename":"meteofranceboxesgraphcastoptimized.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/c3266797fcf0ad2f860278dd.txt"},{"id":78107192,"identity":"6126e3c4-99f0-4c1b-9a99-24cd8b114541","added_by":"auto","created_at":"2025-03-10 04:00:23","extension":"txt","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":8990,"visible":true,"origin":"","legend":"Data utilitary","description":"","filename":"datautils.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/acb522a8d971507e2b8bcccb.txt"},{"id":78107199,"identity":"ae022cdc-7b70-4ec0-b1e3-74bfab752b86","added_by":"auto","created_at":"2025-03-10 04:00:23","extension":"txt","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":17558,"visible":true,"origin":"","legend":"Secondary functions","description":"","filename":"processingoutputs.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/67fa8b5c37c144fdadcf99a9.txt"},{"id":78107614,"identity":"af95ed9a-bba5-4da1-9730-e869332ee723","added_by":"auto","created_at":"2025-03-10 04:08:23","extension":"txt","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":2165,"visible":true,"origin":"","legend":"Bar plot ACC for Supplementary Figure 4","description":"","filename":"barplotacc20202021.txt","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/891777dc553a9ef12427b70a.txt"},{"id":78107220,"identity":"509cb7ac-ad1d-4019-a1e3-33cc4019fe40","added_by":"auto","created_at":"2025-03-10 04:00:24","extension":"docx","order_by":11,"title":"","display":"","copyAsset":false,"role":"supplement","size":3712698,"visible":true,"origin":"","legend":"Supplementary Materials","description":"","filename":"supplementarymaterials.docx","url":"https://assets-eu.researchsquare.com/files/rs-5619528/v1/f4ea4700df8a888f582e26ef.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing 
Interest.","formattedTitle":"Fine-tuning a global weather model for superior subseasonal forecasting","fulltext":[{"header":"Introduction","content":"\u003cp\u003eSubseasonal forecasting is the process of predicting weather patterns that occur between 2 to 6 weeks or more in the future\u003csup\u003e1\u003c/sup\u003e. It fills the gap between medium-range forecasting, that extends up to 15 days, and seasonal forecasting that gives an outlook of the coming trends months ahead. A skillful subseasonal forecast is paramount across multiple society relevant sectors\u003csup\u003e2\u003c/sup\u003e and for pro-active disaster impact mitigation efforts, since these efforts may take several weeks to implement\u003csup\u003e3\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe difficulty of implementing accurate subseasonal forecasts originates from the dampening and the loss of atmospheric initial conditions beyond a sufficiently long lead time\u003csup\u003e4\u003c/sup\u003e, iterative error accumulation, as well as the slow changes in boundary conditions such as Sea Surface Temperatures (SSTs), soil moisture and sea-ice components. It is those different time and space scales of atmosphere, land and ocean, and the ability to predict them, that makes subseasonal forecasting a major challenge\u003csup\u003e5,6,7\u003c/sup\u003e. This timescale has long been seen as a \u0026ldquo;predictability desert\u0026rdquo;\u003csup\u003e8\u003c/sup\u003e by meteorologists. 
However, there are important potential sources of predictability for subseasonal timescales to be found in certain large-scale phenomena such as the Madden-Julian Oscillation (MJO) and the El Ni\u0026ntilde;o Southern Oscillation (ENSO).\u003c/p\u003e \u003cp\u003eRecent advancements in machine learning models represent an alternative to the \u0026ldquo;traditional\u0026rdquo; dynamical models, based on fluid mechanics and thermodynamics equations, for weather forecasting at medium-range\u003csup\u003e9,10,11,12\u003c/sup\u003e, subseasonal-range\u003csup\u003e13\u003c/sup\u003e, and for long-term weather and climate forecasting\u003csup\u003e14\u003c/sup\u003e. The data-driven medium-range weather forecasting models only utilize machine-learning techniques and were trained using 40 years of historical data obtained from the European Center for Medium-Range Weather Forecasts (ECMWF) reanalysis v5 (ERA5)\u003csup\u003e15\u003c/sup\u003e. They achieved notable success in forecasting skillfully weather variables globally for up to 10-days ahead, while requiring only a fraction of the computational resources used by traditional dynamical models\u003csup\u003e9,11\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAt longer lead-times, i.e. for subseasonal forecasts, machine learning models have also made significant improvements. However, there are some challenges and limitations\u003csup\u003e16,17\u003c/sup\u003e, notably arising from the incorporation of a limited amount of variables into the models and from the iterative error accumulation.\u003c/p\u003e \u003cp\u003eOne of the interesting novelty of the recent medium-range models\u003csup\u003e9,11,12\u003c/sup\u003e is that they have a comprehensive amount of meteorological variables compared to earlier studies with a limited number of variables. 
Indeed, FourCastNet\u003csup\u003e12\u003c/sup\u003e has 20 variables at 5 pressure levels, Pangu-Weather\u003csup\u003e11\u003c/sup\u003e has 5 upper-air atmospheric variables at 13 pressure levels with 4 surface variables and Graphcast\u003csup\u003e9\u003c/sup\u003e possesses 6 upper-air atmospheric variables at 13 pressure levels with 5 surface variables. However, the evaluation of such models has been limited to a 10-day lead-time. Yet, we expect their forecasting skills to rapidly decrease with lead-time, due to iterative error accumulation. Our first objective was to assess their accuracy for subseasonal forecasts and identify methods to enhance their precision. We have thus developed an efficient fine-tuning process, that leads to impressive improvements in forecasting skills, even at 4-week leads, while remaining computationally cost-effective.\u003c/p\u003e \u003cp\u003eFine-tuning in artificial intelligence (AI) is the procedure of adjusting a pre-trained model and adapt it to a new and specific task. When a pre-trained model is fine-tuned, it uses its knowledge acquired from the original task to learn the nuances of the task more efficiently. The primary hypothesis of this work is that by fine-tuning medium-range models, the long-term prediction errors can be decreased. Consequently, the obtained results can surpass those of their non-fine-tuned counterparts, with a much higher computing efficiency than if the model was developed and trained from the very beginning.\u003c/p\u003e \u003cp\u003eThe process of fine-tuning machine learning models for weather forecasting has already been applied to medium-range timescales, for example 3 pre-trained FuXi\u003csup\u003e18\u003c/sup\u003e models were fine-tuned for optimal forecast performance for one of the forecast time windows: 0\u0026ndash;5 days, 5\u0026ndash;10 days, and 10\u0026ndash;15 days. Additionally, the Graphcast operational model has undergone fine-tuning using HRES data from 2016 to 2021. 
Nevertheless, it has not yet been accomplished for subseasonal timescales. In this study, we utilize Graphcast as a pre-trained global weather model and adjust it through fine-tuning to get significantly-improved weekly global subseasonal forecasts at a 100 km resolution (1\u0026deg;).\u003c/p\u003e \u003cp\u003eThe benefit of this strategy is the ability to generate precise deterministic subseasonal forecasts with minimal computational power required for training and without the need for creating a new model structure, which is a time-consuming procedure. This is achieved by exploiting the existing knowledge of medium-range pre-trained models and adjusting it to longer timescales.\u003c/p\u003e \u003cp\u003eThe proposed fine-tuning strategy utilizes Graphcast as a pre-trained model and adjusts it by substituting its standard input and target data. Graphcast typically utilizes the two of the most recent Earth\u0026rsquo;s atmospheric states (ERA5), i.e the current time and 6 hours earlier, and forecasts the next atmospheric state 6 hours ahead. We substituted one of the inputs with the mean value from the preceding week in comparison to real-time data. The target data was substituted with the mean value of the week to be predicted. This allows the model to retrain on the relevant timescale of interest for subseasonal forecasts with the advantage of reducing iterative errors accumulation originating from classical data-driven roll-out forecasts. Roll-out forecasts being produced following the principle of feeding the forecast back to the model to have an arbitrarily long lead-time forecast\u003csup\u003e9\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWe implement data-driven subseasonal forecasts that predict weather conditions for the second (7\u0026ndash;14 days ahead), third (14\u0026ndash;21 days ahead) and fourth (21\u0026ndash;28 days ahead) week in advance, here referred to as week 2, week 3 and week 4. 
The goal is to forecast the outcomes for weeks 2, 3 and 4 in advance using a fine-tuned version of Graphcast, named GraphFT.\u003c/p\u003e \u003cp\u003eOur deterministic fine-tuned model GraphFT was compared to the control (deterministic) and perturbed (ensemble) mean reforecasts of the ECMWF Subseasonal to Seasonal (S2S) system, which is recognized as the best ocean-atmosphere modeling system for producing deterministic and probabilistic subseasonal forecasts\u003csup\u003e19,20\u003c/sup\u003e. Despite certain limits and areas for improvement, we argue that fine-tuning medium-range weather models is a viable method for significantly increasing accuracy in subseasonal forecasts with low computational and human-resources costs and with substantial results.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eThe present study presents a strategy for fine-tuning the medium-range Graphcast global weather forecasting model to enhance its predictions at longer lead-times, on a subseasonal time-scale. The fine-tuned GraphFT model provides weekly global forecasts with a resolution of 1\u0026deg; for lead-times of 7\u0026ndash;14, 14\u0026ndash;21 and 21\u0026ndash;28 days ahead. The forecasts generated by GraphFT incorporate 6 upper-air atmospheric variables at 13 pressure levels and 5 surface variables. The performance of GraphFT was evaluated against ERA5 reanalysis, considered here as ground truth, and compared to GraphOP, Pangu-Weather, control and ensemble reforecasts from S2S ECMWF.\u003c/p\u003e \u003cp\u003eHere we first compare the performances of subseasonal forecasts from our fine-tuned GraphFT model, Graphcast Operational (GraphOP), Pangu-Weather (another ML-based medium-range forecasting model) and control reforecasts (i.e. hindcasts) from ECMWF S2S. The forecasting models are evaluated throughout the 2020\u0026ndash;2021 period (which was not used for training/fine-tuning), with three forecast lead-times: week 2, 3 and 4. 
For each lead-time, the daily forecasts are reduced to their weekly mean before being evaluated against observations, except for GraphFT, which was fine-tuned to produce weekly mean forecasts directly.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e displays the global Anomaly Correlation Coefficient (ACC) of Pangu-Weather, GraphOP, GraphFT and the ECMWF S2S model for week 3 (ACC is a classical metric in meteorology for estimating forecast skill; it is the temporal mean of the global spatial correlation between observed and forecasted anomalies, see Methods). The evaluated variables are the surface temperature (T2M), zonal wind at 10 meters (U10), meridional wind at 10 meters (V10) and the total precipitation (TP). Note that only the control S2S is used here, as it provides a fair comparison to a deterministic GraphFT. The prospect of implementing an ensemble GraphFT model to further improve forecast accuracy is discussed in the next section.\u003c/p\u003e \u003cp\u003eGraphFT significantly outperforms the control S2S, GraphOP and Pangu-Weather for all variables from this week 3 lead-time onwards. GraphFT is significantly better than its non-fine-tuned version (at least doubling the squared ACC, ACC\u003csup\u003e2\u003c/sup\u003e, i.e. the explained variance), demonstrating that our fine-tuning strategy can considerably improve the forecasts of medium-range models such as GraphOP.\u003c/p\u003e \u003cp\u003eThese results also support the hypothesis that Graphcast\u0026rsquo;s prediction error, which increases strongly with forecast lead-time because of the roll-out strategy, can be significantly reduced via fine-tuning for longer lead-time forecasting. GraphFT yields significantly better results than the ECMWF S2S control. This finding illustrates that fine-tuning enables a large increase in forecast accuracy at 3 weeks ahead. 
In comparison, the control S2S model and GraphOP exhibit similar performance across all variables. Pangu-Weather also shows good performance for U10 and V10 but performs poorly for T2M and does not include precipitation forecasts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the spatial distribution of the temporal Correlation Coefficient R\u003csub\u003et\u003c/sub\u003e between observed and forecasted anomalies (cf. Methods) for surface air temperature, T2M (see Supplementary Figs.\u0026nbsp;6, 8 and 10 for the other variables), for the control S2S, GraphOP and GraphFT models, together with the differences between GraphFT and both the S2S and GraphOP models, still at week 3 lead-time.\u003c/p\u003e \u003cp\u003eThe spatial distributions of R\u003csub\u003et\u003c/sub\u003e and their differences reveal generally higher values for GraphFT than for GraphOP and S2S over land and ocean, except in a few regions, mostly in the tropics. Comparing GraphFT to GraphOP (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ee), we observe clear improvements in the subtropical to polar regions, notably in the northeastern and southeastern Pacific and its islands (e.g. Hawaii and French Polynesia (FP)), North America, the Arctic, the region from the Indian subcontinent to the Tibetan plateau, northeastern China and Japan, the North and southeastern Atlantic, and Europe. The few regions without improvement are the north-equatorial Pacific, South America, Australia and the central part of Asia. 
This overall improvement illustrates the benefits of our fine-tuning strategy in enhancing the forecasting skill of medium-range data-driven models for subseasonal time-scales.\u003c/p\u003e \u003cp\u003eComparing GraphFT to control S2S (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ed), GraphFT remains better overall, notably in the subtropical to polar regions. In the Tropics, where ocean-atmosphere coupling is strongest because SST is warmest and is strongly coupled to atmospheric deep convection, ECMWF S2S is more difficult to outperform: it includes full ocean physics, notably equatorial Kelvin and Rossby waves and oceanic mixed layer thermodynamics, and therefore captures this coupling well. Conversely, Graphcast only partially captures these ocean processes, through T2M (strongly related to SST in the tropics). Yet, while ECMWF S2S outperforms GraphOP and GraphFT for the point-wise metric R\u003csub\u003et\u003c/sub\u003e, GraphFT still performs better than S2S in the Tropics for a complementary metric, R\u003csub\u003et,x\u003c/sub\u003e, which assesses the skill of a model in predicting temporal and longitudinal variability together, and which better summarizes the evolution of the skill as a function of latitude (Suppl. Fig. \u003cspan refid=\"MOESM5\" class=\"InternalRef\"\u003eS5\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eConcerning land processes, GraphFT performs clearly better than S2S over the Tibetan plateau and the northern polar regions, likely because the complex dynamics of snow cover are difficult to simulate in a model based on physical equations such as the ECMWF S2S model, especially over the Tibetan plateau for week 3 and beyond\u003csup\u003e21\u003c/sup\u003e. 
This suggests that the intraseasonal variability of snow cover is more easily forecasted by an ML-based model such as GraphFT.\u003c/p\u003e \u003cp\u003eMoreover, GraphFT also outperforms GraphOP and S2S in most regions worldwide for week 3 for the other variables: U10, V10 and TP (Supplementary Figs.\u0026nbsp;6, 8 and 10). Only around the equator for U10 does S2S slightly outperform GraphFT, when considering both the R\u003csub\u003et\u003c/sub\u003e and R\u003csub\u003et,x\u003c/sub\u003e metrics.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe next question that naturally emerges is: how many years of data are needed to fine-tune GraphFT in order to obtain accurate subseasonal forecasting skill?\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e presents the ACC at a forecast lead-time of week 3 for T2M, U10, V10 and TP for 4 different models: GraphOP, followed by 3 versions of GraphFT trained respectively on 2019, 2018\u0026ndash;2019 and 2017-2018-2019 weekly values. Even with only one year of training, the ACC increase from the original GraphOP to GraphFT is substantial for all variables. As expected, further increases are apparent for most variables as more years are used to train GraphFT (especially for T2M from the 2019 to the 2018\u0026ndash;2019 version). However, these increases are not highly significant between the three versions of GraphFT, as their confidence intervals mostly overlap, especially between the 2018\u0026ndash;2019 and the 2017-2018-2019 versions. The ACCs tend to converge already after 2\u0026ndash;3 years, except for U10, for which GraphFT-2017-2018-2019 slightly but significantly outperforms GraphFT-2019. For TP, the difference in ACC between the 3 GraphFT models is not significant. 
These results show that fine-tuning significantly increases the performance of GraphFT over GraphOP, but that the difference in performance between the 3 training sets is overall less significant. To improve GraphFT further, and for its subseasonal forecasting skill to converge completely, a larger number of training years would be needed, which would notably sample a larger range of ENSO conditions. Nevertheless, the outcomes are already highly encouraging with just 3 years of training.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe now explore how the relative forecasting skill of the different models evolves across lead-times of weeks 2, 3 and 4. Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the ACC values for T2M for GraphOP, GraphFT, and the S2S control and ensemble reforecasts across these lead-times. Extending GraphOP forecasts to week 4 was considered unnecessary, as GraphFT already outperforms GraphOP at week 3. Therefore, GraphOP forecasts are only included for weeks 2 and 3. At a lead-time of week 2, the S2S ensemble mean achieves the highest performance, with control S2S and GraphOP closely following. Supplementary Fig.\u0026nbsp;1, which shows the ACC values for each model and for all the evaluated variables at week 2, further indicates that GraphFT already outperforms all other models for TP, but not yet for U10, V10 and T2M. This underperformance of GraphFT could be explained by two factors. First, the standard training of GraphOP is already sufficient and well suited for week 2 (which still lies within the medium-range lead-time). Second, our fine-tuning strategy for GraphFT is aimed at longer lead-times: weekly averaging may be too strong a temporal smoothing to be relevant at a lead-time of week 2. Therefore, it can be inferred that our proposed fine-tuning method is more appropriate for extended lead-times, i.e. weeks 3 and 4. 
For week 3, GraphFT achieves an ACC score that is comparable to the S2S ensemble mean and significantly surpasses both S2S control and GraphOP for T2M, as already shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eRegarding week 4, GraphFT emerges as the best forecasting model, surpassing even the S2S ensemble mean for T2M. Supplementary Figs.\u0026nbsp;3 and 12\u0026ndash;20 further document GraphFT forecasting capabilities for week 4 for the other variables, and their spatial distribution. GraphFT surpasses the control S2S forecast for all variables, which aligns with the findings from week 3. In addition, it now generates results comparable to the S2S ensemble mean for U10 and V10 and even significantly outperforms the S2S ensemble mean for T2M and TP. An ensemble GraphFT implementation would therefore likely outperform the S2S ensemble mean for all variables.\u003c/p\u003e \u003cp\u003eIn line with our previous findings, this result illustrates that GraphOP is flexible enough to be adapted to subseasonal lead times through a pertinent and cost-effective fine-tuning strategy, attaining an accuracy level comparable to (or better than, for T2M and TP) that of the ECMWF S2S ensemble system, recognized as the most efficient dynamical model for subseasonal forecasts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn order to gain a more synthetic view of the forecasting skill, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e displays the mean values of the R\u003csub\u003et\u003c/sub\u003e coefficient for T2M for GraphFT, S2S control and S2S ensemble for week 4 in several geographical boxes across the globe.\u003c/p\u003e \u003cp\u003eWhen comparing S2S control and GraphFT, we observe that GraphFT significantly outperforms S2S in almost all of the defined geographical boxes, except for the tropical band (30\u0026deg;S to 30\u0026deg;N) where their 
forecasting skills are equivalent.\u003c/p\u003e \u003cp\u003eThis finding aligns with the results of Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and highlights once again the effectiveness of fine-tuning in bringing Graphcast to an accuracy equivalent to or better than that of S2S control, with S2S still having an advantage in the tropics thanks to the ocean-atmosphere coupling.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur proposed fine-tuning strategy on the medium-range Graphcast global weather forecasting model has significantly increased its performance on subseasonal timescales for all considered variables. Indeed, GraphFT performs better than the unperturbed S2S model, Pangu-Weather, and its non-fine-tuned parent model GraphOP during weeks 3 and 4. GraphFT also demonstrates skill similar to that of the S2S ensemble mean for week 3, and even better skill for week 4 for surface temperature and precipitation.\u003c/p\u003e \u003cp\u003eAn inherent benefit of the described fine-tuning procedure is its computational cost-effectiveness. Indeed, the fine-tuning process was carried out over a much shorter training period of 3 years (2017-2018-2019), requiring much less data than the usual multidecadal training periods, and using only a single 40 GB A100 GPU. The training process took only around 40 hours and yielded results similar to the S2S ensemble mean. For comparison, GraphOP was trained on a 32 GB Cloud Tensor Processing Unit (TPU) v4 device, which typically performs about 15 to 30 times better than a GPU, for a duration of around 4 weeks\u003csup\u003e9\u003c/sup\u003e. GraphOP used data parallelism, distributing the training data across 32 TPU devices. 
Furthermore, its authors also employed gradient checkpointing\u003csup\u003e22\u003c/sup\u003e to further decrease memory usage.\u003c/p\u003e \u003cp\u003eThanks to the efficiency of our fine-tuning approach, several obvious enhancements could be implemented. The first is to enable GraphFT to provide ensemble forecasts at least similar to those of the ECMWF S2S ensemble system, which contains 11 members. GraphFT\u0026rsquo;s cost-effectiveness may allow many more than 11 members. For instance, by introducing three distinct types of Perlin noise to the initial states fed into the model, Pangu-Weather\u003csup\u003e11\u003c/sup\u003e generates a 100-member ensemble forecast. FourCastNet\u003csup\u003e12\u003c/sup\u003e adds Gaussian random noise to the initial conditions using Ensemble Kalman Filtering\u003csup\u003e23\u003c/sup\u003e to obtain a 100-member ensemble forecast.\u003c/p\u003e \u003cp\u003eIt would also be interesting to extend the lead-time of GraphFT to week 5 and week 6, as changing the target of the model and adapting the fine-tuning process is fairly easy and does not involve any architecture modifications. Indeed, FuXi-S2S\u003csup\u003e13\u003c/sup\u003e, a state-of-the-art machine learning subseasonal-range weather forecasting model, provides skillful data-driven subseasonal forecasts up to 42 days ahead. While an accurate comparison between FuXi-S2S and GraphFT is beyond the scope of the present study (the FuXi-S2S study applies an additional detrending step to remove linear long-term trends, possibly related to global warming, before evaluating the forecasting skill), a broad comparison can be made. FuXi-S2S performs as well as the S2S ensemble for T2M, for example at weeks 3 and 4. GraphFT performs even better than the S2S ensemble mean at week 4 and, by extrapolation, could be expected to also perform better than FuXi-S2S for T2M. 
However, a proper comparison would require implementing a new data processing procedure.\u003c/p\u003e \u003cp\u003eAnother possible improvement is the use of other base models of higher spatial resolution for fine-tuning, such as the 0.25\u0026deg; version of Graphcast or its equivalent at ECMWF, AIFS\u003csup\u003e24\u003c/sup\u003e, so as to test our strategy on other systems and to obtain subseasonal forecasts at a higher spatial resolution. These should better capture small-scale processes such as orographic effects, e.g. in mountain ranges such as the Himalayas, and on high islands such as Tahiti in French Polynesia or Big Island in the Hawaiian archipelago. The implementation is straightforward, but requires a larger set of higher-resolution reanalysis data and more memory to retrain the model.\u003c/p\u003e \u003cp\u003eThe natural next step would be to implement a \u0026lsquo;GraphOAFT\u0026rsquo; trained on both ocean and atmosphere variables. Integrating Sea Surface Temperature (SST) into GraphFT as a predicted variable is a promising approach for enhancing forecast accuracy, because slowly changing SST conditions provide a valuable source of predictability at subseasonal time-scales\u003csup\u003e25,26\u003c/sup\u003e, especially for regions like the north-equatorial Pacific where the North Equatorial CounterCurrent (NECC) lies below the InterTropical Convergence Zone (ITCZ). Integrating land moisture and ice-snow variables is another relevant way to improve subseasonal forecasts.\u003c/p\u003e \u003cp\u003eAs wind and precipitation variables are overall better forecasted by GraphFT than by GraphOP, the MJO and its different impacts depending on its phases\u003csup\u003e27\u003c/sup\u003e are likely also better forecasted by GraphFT. 
While a proper evaluation of the MJO skill is beyond the scope of this study, and might require the computation of weekly Real-time Multivariate MJO (RMM)\u003csup\u003e28\u003c/sup\u003e indices, adding the ocean-atmosphere coupling would likely further improve GraphFT\u0026rsquo;s skill in the Tropics, and thus its MJO forecasting skill.\u003c/p\u003e \u003cp\u003eIn conclusion, fine-tuning cutting-edge medium-range weather forecasting models shows clear potential for improving subseasonal forecasts and yields strong results, even exceeding the ECMWF S2S ensemble mean for T2M and TP at week 4 lead-time. These results are all the more notable considering the low computational cost of the process. The best prospect for even better forecasting accuracy is to implement ensemble forecasts integrating ocean variables as well as land moisture and ice-snow variables, which would represent a much more complete description of the coupled Earth system. This ocean-atmosphere coupling, in addition to ensemble forecasts, represents the next step towards even more precise data-driven subseasonal forecasts.\u003c/p\u003e "},{"header":"Methods","content":"\u003ch2\u003eData\u003c/h2\u003e\n\u003ch3\u003eERA5\u003c/h3\u003e\n\u003cp\u003eERA5 is the fifth generation of the ECMWF reanalysis dataset and offers a rich array of surface and upper-air variables. It has a horizontal resolution of approximately 25 km (0.25\u0026deg;) and a temporal resolution of 1 hour, spanning from 1940 to the present day\u003csup\u003e15\u003c/sup\u003e. ERA5 stands as the most comprehensive and precise reanalysis archive globally and is considered the best available estimate for most atmospheric variables\u003csup\u003e29,30\u003c/sup\u003e. In this study, we use 6-hourly ERA5 reanalysis, regridded to a 1\u0026deg; resolution, to fine-tune Graphcast. 
These data were obtained through the WeatherBench2\u003csup\u003e10\u003c/sup\u003e Python package, which provides publicly available, cloud-optimized ground-truth and baseline datasets. They serve as the sole data for fine-tuning Graphcast.\u003c/p\u003e \u003cp\u003eTo remove the seasonal cycle from our forecasts and for evaluation, we calculated a daily climatology using the 1-hourly ERA5 reanalysis dataset provided by the Copernicus ECMWF center. The data span from 1979 to 2019. The same dataset, covering the period from 2020 to 2021, serves as the ground truth (observations) and is used to evaluate the accuracy of all forecasting models.\u003c/p\u003e\n\u003ch3\u003eS2S\u003c/h3\u003e\n\u003cp\u003eThe S2S prediction project\u003csup\u003e31\u003c/sup\u003e was initiated in 2013 as an international effort to improve and develop various aspects of dynamical subseasonal predictions, including tropical cyclones\u003csup\u003e32\u003c/sup\u003e (TCs). The S2S project has created an extensive dataset containing subseasonal forecasts and reforecasts (also known as hindcasts) from 11 operational and research centers. A key goal of these efforts is to improve the forecasting skill and understand the sources of subseasonal predictability, especially the MJO, with the best forecasts of the MJO originating from ECMWF\u003csup\u003e4\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eOperational subseasonal forecasting models are often updated to incorporate recent research findings optimized for operational use. For instance, the ECMWF S2S hindcasts are generated on-the-fly using the most recent model version available at the time of forecast generation.\u003c/p\u003e \u003cp\u003eThis study uses the hindcasts from the ECMWF S2S CY47R3 model cycle, with a resolution of 1.5\u0026deg;, as the reference dynamical model (same as FuXi-S2S\u003csup\u003e13\u003c/sup\u003e) for the forecast verification period from 2020 to 2021. 
Specifically, the verification process uses both the deterministic (control) forecast and the ensemble forecast, the latter being reduced from its 11 members to their ensemble mean. For our study, we aimed to examine the variables that have the most influence on daily life: the surface winds U10 and V10 to aid in forecasting wind gusts, the surface temperature T2M for heatwave prevention and the total precipitation for flood control.\u003c/p\u003e \u003cp\u003eFor their evaluation process, FuXi-S2S\u003csup\u003e13\u003c/sup\u003e employs a different method for data post-processing. In our study, the data processing, as detailed in the previous subsection, consists first in removing a daily climatology computed from 1979 to 2019 from both our observations and our forecasts. The main motivation for using a daily climatology was to remove the seasonal cycle from the data. This difference in data processing makes the comparison between GraphFT and FuXi-S2S difficult.\u003c/p\u003e \u003cp\u003eModels\u003c/p\u003e \u003cp\u003eGraphcast Operational\u003c/p\u003e \u003cp\u003eThe Graphcast medium-range model produces accurate weather forecasts at a resolution of 0.25\u0026deg; and has been evaluated up to 10 days ahead against the top deterministic operational system in the world, the ECMWF\u0026rsquo;s high-resolution forecast (HRES). HRES has a 9 km horizontal resolution, and is a product of the Integrated Forecasting System (IFS) that produces a 10-day forecast. After regridding to a resolution of 0.25\u0026deg;, Graphcast significantly outperforms HRES on 90% of 1380 verification targets\u003csup\u003e9\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAs stated previously, this model takes as input two ERA5 reanalysis states, i.e. the current time and 6 hours earlier, and forecasts the next atmospheric state 6 hours ahead. 
Graphcast is also an autoregressive model, meaning that it can be \u0026ldquo;rolled-out\u0026rdquo; by feeding its own predictions back in as input, to generate weather forecasts at long lead-times (multiples of 6 hours).\u003c/p\u003e \u003cp\u003eThe operational Graphcast was trained with 39 years of ERA5 reanalysis data from 1979 to 2017. As stated previously, it took roughly 4 weeks to train the model on a 32 GB Cloud TPU, and it can now make accurate predictions in under a minute on a single TPU. Graphcast is based on a Graph Neural Network (GNN) architecture in an \u0026ldquo;encoder-decoder\u0026rdquo; configuration and has a total of 36.7\u0026nbsp;million parameters (demo available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/deepmind/graphcast\u003c/span\u003e\u003cspan address=\"https://github.com/deepmind/graphcast\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTraining workflow for fine-tuning Graphcast and developing GraphFT\u003c/p\u003e \u003cp\u003eGiven the constraints of our local computational resources, we opted to use a Google Cloud environment, which gave us access to A100 Graphics Processing Units (GPUs). This allowed us to run the training, inference, and fine-tuning procedures of the \u0026ldquo;small\u0026rdquo; Graphcast model. This \u0026ldquo;small\u0026rdquo; model is a version of Graphcast with a resolution of 1\u0026deg; and reduced computational demands. However, it still includes a wide range of variables, with 6 upper-air atmospheric variables at 13 pressure levels and 5 surface variables. This \u0026ldquo;small\u0026rdquo; Graphcast was trained using ERA5 data from 1979 to 2015 and fine-tuned on data from 2017 to 2019. 
The training and fine-tuning periods were chosen so as not to overlap with each other.\u003c/p\u003e \u003cp\u003eThe Graphcast model requires two instantaneous ERA5 reanalysis states as input, which are 6 hours apart. It then generates a forecast of the ERA5 state 6 hours ahead. This process was altered for the fine-tuning phase with the precise goal of training Graphcast to excel in subseasonal weekly forecasts. To do this, we followed the procedure outlined in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eIn other words, the input of the original Graphcast model is composed of two atmospheric states, the most recent time step before the forecast starts and the previous state 6 hours earlier, and Graphcast aims to forecast the next 6-hr timestep. In contrast, GraphFT uses the previous week\u0026rsquo;s mean state as its previous state and aims to forecast week 2, 3 or 4.\u003c/p\u003e \u003cp\u003eThree Graphcast models are then fine-tuned separately with the custom data for week 2, week 3 and week 4, for 20 epochs, with a learning rate of 1e-4 and the Adam optimizer. The fine-tuning is carried out first on 2019, then on 2018 and finally on 2017, consecutively.\u003c/p\u003e \u003cp\u003eOne advantage of directly predicting week 2, 3 or 4 is that there is less error accumulation than in a standard roll-out forecast, in which each forecast is fed back into the model to obtain a longer lead-time forecast. Indeed, to obtain subseasonal forecasts from the standard version of Graphcast, we would have needed to perform, for example, 56 roll-out steps (4 per day for 14 days) to have an idea of the weather 14 days ahead. 
With GraphFT, a single forecast step is enough to obtain, for example, a 14-day-ahead forecast.\u003c/p\u003e \u003cp\u003eFollowing the fine-tuning process, the accuracy of GraphFT's predictions is assessed by comparing them to the forecasts generated by Graphcast Operational and the S2S dynamical models. The metrics used for the evaluation are described in the next subsection.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eMetrics and evaluation\u003c/p\u003e \u003cp\u003eThis section provides an overview of the metrics used in the evaluation process. Before assessment, we subtract the daily climatology from all variables to remove any seasonal component that may be present, so as to focus on anomalies. More precisely, we calculated a weekly climatology in order to compare it with the weekly forecasts generated by GraphFT. The weekly climatology was derived from the daily climatology, which was calculated for all relevant variables from 1979 to 2019.\u003c/p\u003e\n\u003ch3\u003eAnomaly Correlation Coefficient\u003c/h3\u003e\n\u003cp\u003eWe computed the Anomaly Correlation Coefficient (ACC) with the following equation:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:ACC\\left(\\tau\\:,k\\right)=\\:\\frac{1}{\\left|{D}_{eval}\\right|}\\sum\\:_{{t}_{0}ϵ{D}_{eval}}\\frac{\\sum\\:_{i,jϵ{G}_{1.5^\\circ\\:}}{a}_{i}{\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}{A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}}{\\sqrt{\\sum\\:_{i,jϵ{G}_{1.5^\\circ\\:}}{a}_{i}({\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:})^{2}\\sum\\:_{i,jϵ{G}_{1.5^\\circ\\:}}{a}_{i}({A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:})^{2}}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eWhere:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{t}_{0}\\:ϵ\\:{D}_{eval}\\)\u003c/span\u003e 
\u003c/span\u003e represents forecast initialization date-times in the evaluation dataset\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:k\\:ϵ\\:K\\)\u003c/span\u003e \u003c/span\u003e index variables, e.g., k = {T2M, U10, V10, TP}\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:i,j\\:ϵ\\:{G}_{1.5^\\circ\\:}\\)\u003c/span\u003e \u003c/span\u003e are the location (latitude and longitude) coordinates in the grid\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{a}_{i}\\)\u003c/span\u003e \u003c/span\u003e is the area of the latitude-longitude grid cell (normalized to unit mean over the grid) which varies with latitude\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\tau\\:\\)\u003c/span\u003e \u003c/span\u003e refers to the forecast lead time steps added to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{t}_{0}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}\\)\u003c/span\u003e \u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}\\)\u003c/span\u003e\u003c/span\u003e are predicted and observed anomalies for a given variable, location, and lead time\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e\n\u003ch3\u003eR Correlation Coefficient\u003c/h3\u003e\n\u003cp\u003eWe also computed the temporal R\u003csub\u003et\u003c/sub\u003e Correlation Coefficient 
(R\u003csub\u003et\u003c/sub\u003e) using the same notation as above:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:{\\text{R}}_{t}\\left(\\tau\\:,i,j,k\\right)=\\frac{\\sum\\:_{{t}_{0}ϵ{D}_{eval}}{\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}{A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}}{\\sqrt{\\sum\\:_{{t}_{0}ϵ{D}_{eval}}({\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:})^{2}\\sum\\:_{{t}_{0}ϵ{D}_{eval}}({A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:})^{2}}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eSpatial maps of the R\u003csub\u003et\u003c/sub\u003e coefficient are presented in the \u003cspan refid=\"Sec2\" class=\"InternalRef\"\u003eresults\u003c/span\u003e section.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eLatitudinal dependence: the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{R}_{t,x}\\left(\\tau\\:,i,k\\right)\\)\u003c/span\u003e\u003c/span\u003e correlation coefficient\u003c/h2\u003e \u003cp\u003eTo better characterize the latitudinal distribution of the correlation, the correlation coefficient R\u003csub\u003et,x\u003c/sub\u003e was plotted for each latitude. 
R\u003csub\u003et,x\u003c/sub\u003e is calculated directly for each latitude i, over the time and longitude axes simultaneously, with the following equation:\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:{R}_{t,x}\\left(\\tau\\:,i,k\\right)=\\frac{\\sum\\:_{{t}_{0}ϵ{D}_{eval},\\:jϵ{G}_{1.5^\\circ\\:}}{\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}{A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:}}{\\sqrt{\\sum\\:_{{t}_{0}ϵ{D}_{eval},jϵ{G}_{1.5^\\circ\\:}}({\\widehat{A}}_{i,j,k}^{{t}_{0}+\\tau\\:\\:})^{2}\\sum\\:_{{t}_{0}ϵ{D}_{eval},jϵ{G}_{1.5^\\circ\\:}}({A}_{i,j,k}^{{t}_{0}+\\tau\\:\\:})^{2}}}\\:$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe Fisher transform was used to compute R\u003csub\u003et,x\u003c/sub\u003e confidence intervals at a 95% confidence level for each latitude. To estimate the effective number of degrees of freedom, we assumed that each time-step and each 10\u0026deg; of longitude represents one effective degree of freedom.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eBootstrapping for significance testing\u003c/h3\u003e\n\u003cp\u003eWe adopted a bootstrapping method as a significance test. Bootstrapping creates synthetic datasets (1000 in this work) by resampling the original data with replacement, and provides measures of accuracy for sample estimates\u003csup\u003e33\u003c/sup\u003e, such as variance or confidence intervals. Bootstrapping is used to compute the 95% confidence intervals for the mean global ACC values and for the mean values of R\u003csub\u003et\u003c/sub\u003e in the geographical boxes throughout this paper. 
The sample size n is equal to 96, corresponding to one prediction per week for 2 years (2020\u0026ndash;2021).\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\u003ch2\u003eAuthor contributions\u003c/h2\u003e \u003cp\u003eConceptualization, V.S., T.I., M.H.; methodology, V.S., T.I., M.H., D.S.; software, V.S., T.I., M.H.; validation, V.S., T.I., M.H., D.S., S.M-L.; formal analysis, V.S., T.I.; investigation, V.S., T.I.; resources, V.S., M.H.; data curation, V.S.; writing\u0026mdash;original draft preparation, V.S.; writing\u0026mdash;review and editing, V.S., T.I., M.H., D.S., S.M-L.; supervision, T.I., M.H.; project administration, T.I., M.H.; funding acquisition, V.S., T.I., M.H., S.M-L. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e \u003cp\u003eThis project was carried out in conjunction with the national weather service M\u0026eacute;t\u0026eacute;o-France, with the aim of enhancing subseasonal forecasting, notably for French Polynesia, in order to minimize the impact of extreme weather events such as floods, strong winds, droughts, and tropical cyclones. M\u0026eacute;t\u0026eacute;o-France is responsible for supplying meteorological forecasts for continental France and its overseas territories. In French Polynesia in particular, where the MJO has an important impact\u003csup\u003e27\u003c/sup\u003e, M\u0026eacute;t\u0026eacute;o-France provides weekly forecasts of weather conditions for the second (7\u0026ndash;14 days ahead) and third (14\u0026ndash;21 days ahead) weeks, referred to as week 2 and week 3. We implement data-driven subseasonal weekly forecasts for this purpose.\u003c/p\u003e\u003ch2\u003eData availability statement\u003c/h2\u003e \u003cp\u003eData and code are provided with this article. 
The ECMWF S2S data were obtained from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://apps.ecmwf.int/datasets/data/s2s/\u003c/span\u003e\u003cspan address=\"https://apps.ecmwf.int/datasets/data/s2s/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. For training and testing GraphFT, we downloaded a subset of the ERA5 dataset from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://cds.climate.copernicus.eu/\u003c/span\u003e\u003cspan address=\"https://cds.climate.copernicus.eu/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, the official website of the Copernicus Climate Data Store (CDS).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003e\u003cem\u003eNext Generation Earth System Prediction: Strategies for Subseasonal to Seasonal Forecasts\u003c/em\u003e. (National Academies Press, Washington, D.C., 2016). doi:10.17226/21873.\u003c/li\u003e\n\u003cli\u003eRichter, J. H. \u003cem\u003eet al.\u003c/em\u003e Quantifying sources of subseasonal prediction skill in CESM2. \u003cem\u003eNpj Clim. Atmospheric Sci.\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 1\u0026ndash;9 (2024).\u003c/li\u003e\n\u003cli\u003eCoughlan de Perez, E. \u003cem\u003eet al.\u003c/em\u003e Action-based flood forecasting for triggering humanitarian action. \u003cem\u003eHydrol. Earth Syst. Sci.\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 3549\u0026ndash;3560 (2016).\u003c/li\u003e\n\u003cli\u003eVitart, F. \u0026amp; Robertson, A. W. The sub-seasonal to seasonal prediction project (S2S) and the prediction of extreme events. \u003cem\u003eNpj Clim. Atmospheric Sci.\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, 1\u0026ndash;7 (2018).\u003c/li\u003e\n\u003cli\u003eWhite, C. J. \u003cem\u003eet al.\u003c/em\u003e Potential applications of subseasonal-to-seasonal (S2S) predictions. \u003cem\u003eMeteorol. 
Appl.\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 315\u0026ndash;325 (2017).\u003c/li\u003e\n\u003cli\u003eChen, M., Wang, W. \u0026amp; Kumar, A. Prediction of Monthly-Mean Temperature: The Roles of Atmospheric and Land Initial Conditions and Sea Surface Temperature. (2010) doi:10.1175/2009JCLI3090.1.\u003c/li\u003e\n\u003cli\u003eDoblas, F., Garc\u0026iacute;a‐Serrano, J., Lienert, F., Biescas, A. \u0026amp; Rodrigues, L. Seasonal climate predictability and forecasting: Status and prospects. \u003cem\u003eWiley Interdiscip. Rev. Clim. Change\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, (2013).\u003c/li\u003e\n\u003cli\u003eVitart, F., Robertson, A. \u0026amp; Anderson, D. Subseasonal to Seasonal Prediction Project: Bridging the gap between weather and climate. \u003cem\u003eWMO Bull.\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, (2012).\u003c/li\u003e\n\u003cli\u003eLam, R. \u003cem\u003eet al.\u003c/em\u003e Learning skillful medium-range global weather forecasting. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e382\u003c/strong\u003e, 1416\u0026ndash;1421 (2023).\u003c/li\u003e\n\u003cli\u003eRasp, S. \u003cem\u003eet al.\u003c/em\u003e WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models. \u003cem\u003eJ. Adv. Model. Earth Syst.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, e2023MS004019 (2024).\u003c/li\u003e\n\u003cli\u003eBi, K. \u003cem\u003eet al.\u003c/em\u003e Accurate medium-range global weather forecasting with 3D neural networks. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e619\u003c/strong\u003e, 533\u0026ndash;538 (2023).\u003c/li\u003e\n\u003cli\u003ePathak, J. \u003cem\u003eet al.\u003c/em\u003e FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators. Preprint at http://arxiv.org/abs/2202.11214 (2022).\u003c/li\u003e\n\u003cli\u003eChen, L. 
\u003cem\u003eet al.\u003c/em\u003e A machine learning model that outperforms conventional global subseasonal forecast models. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 6425 (2024).\u003c/li\u003e\n\u003cli\u003eKochkov, D. \u003cem\u003eet al.\u003c/em\u003e Neural general circulation models for weather and climate. \u003cem\u003eNature\u003c/em\u003e 1\u0026ndash;7 (2024) doi:10.1038/s41586-024-07744-y.\u003c/li\u003e\n\u003cli\u003eHersbach, H. \u003cem\u003eet al.\u003c/em\u003e The ERA5 global reanalysis. \u003cem\u003eQ. J. R. Meteorol. Soc.\u003c/em\u003e \u003cstrong\u003e146\u003c/strong\u003e, 1999\u0026ndash;2049 (2020).\u003c/li\u003e\n\u003cli\u003eHe, S., Li, X., DelSole, T., Ravikumar, P. \u0026amp; Banerjee, A. Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances. \u003cem\u003eProc. AAAI Conf. Artif. Intell.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 169\u0026ndash;177 (2021).\u003c/li\u003e\n\u003cli\u003eKiefer, S. M., Lerch, S., Ludwig, P. \u0026amp; Pinto, J. G. Can Machine Learning Models Be a Suitable Tool for Predicting Central European Cold Winter Weather on Subseasonal to Seasonal Time Scales? (2023) doi:10.1175/AIES-D-23-0020.1.\u003c/li\u003e\n\u003cli\u003eChen, L. \u003cem\u003eet al.\u003c/em\u003e FuXi: a cascade machine learning forecasting system for 15-day global weather forecast. \u003cem\u003eNpj Clim. Atmospheric Sci.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 1\u0026ndash;11 (2023).\u003c/li\u003e\n\u003cli\u003eDomeisen, D. I. V. \u003cem\u003eet al.\u003c/em\u003e Advances in the Subseasonal Prediction of Extreme Events: Relevant Case Studies across the Globe. (2022) doi:10.1175/BAMS-D-20-0221.1.\u003c/li\u003e\n\u003cli\u003ede Andrade, F. M., Coelho, C. A. S. \u0026amp; Cavalcanti, I. F. A. Global precipitation hindcast quality assessment of the Subseasonal to Seasonal (S2S) prediction project models. \u003cem\u003eClim. 
Dyn.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 5451\u0026ndash;5475 (2019).\u003c/li\u003e\n\u003cli\u003eLi, W., Hu, S., Hsu, P.-C., Guo, W. \u0026amp; Wei, J. Systematic bias of Tibetan Plateau snow cover in subseasonal-to-seasonal models. \u003cem\u003eThe Cryosphere\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 3565\u0026ndash;3579 (2020).\u003c/li\u003e\n\u003cli\u003eChen, T., Xu, B., Zhang, C. \u0026amp; Guestrin, C. Training Deep Nets with Sublinear Memory Cost. Preprint at https://doi.org/10.48550/arXiv.1604.06174 (2016).\u003c/li\u003e\n\u003cli\u003eEvensen, G. The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation. \u003cem\u003eOcean Dyn.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, 343\u0026ndash;367 (2003).\u003c/li\u003e\n\u003cli\u003eLang, S. \u003cem\u003eet al.\u003c/em\u003e AIFS -- ECMWF\u0026rsquo;s data-driven forecasting system. Preprint at https://doi.org/10.48550/arXiv.2406.01465 (2024).\u003c/li\u003e\n\u003cli\u003eAlbers, J. R. \u0026amp; Newman, M. Subseasonal predictability of the North Atlantic Oscillation. \u003cem\u003eEnviron. Res. Lett.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 044024 (2021).\u003c/li\u003e\n\u003cli\u003eYan, Y., Liu, B. \u0026amp; Zhu, C. Subseasonal Predictability of South China Sea Summer Monsoon Onset With the ECMWF S2S Forecasting System. \u003cem\u003eGeophys. Res. Lett.\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, e2021GL095943 (2021).\u003c/li\u003e\n\u003cli\u003eHopuare, M., Guglielmino, M. \u0026amp; Ortega, P. Interactions between intraseasonal and diurnal variability of precipitation in the South Central Pacific: The case of a small high island, Tahiti, French Polynesia. \u003cem\u003eInt. J. Climatol.\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 670\u0026ndash;686 (2019).\u003c/li\u003e\n\u003cli\u003eWheeler, M. C. \u0026amp; Hendon, H. H. 
An All-Season Real-Time Multivariate MJO Index: Development of an Index for Monitoring and Prediction. \u003cem\u003eMon. Weather Rev.\u003c/em\u003e \u003cstrong\u003e132\u003c/strong\u003e, 1917\u0026ndash;1932 (2004).\u003c/li\u003e\n\u003cli\u003eLinus Magnusson, S. M. Tropical cyclone activities at ECMWF. \u003cem\u003eECMWF\u003c/em\u003e https://www.ecmwf.int/en/elibrary/81277-tropical-cyclone-activities-ecmwf (2021).\u003c/li\u003e\n\u003cli\u003eJiao, D., Xu, N., Yang, F. \u0026amp; Xu, K. Evaluation of spatial-temporal variation performance of ERA5 precipitation data in China. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 17956 (2021).\u003c/li\u003e\n\u003cli\u003eVitart, F. \u003cem\u003eet al.\u003c/em\u003e The Subseasonal to Seasonal (S2S) Prediction Project Database. \u003cem\u003eBull. Am. Meteorol. Soc.\u003c/em\u003e \u003cstrong\u003e98\u003c/strong\u003e, 163\u0026ndash;173 (2017).\u003c/li\u003e\n\u003cli\u003eLee, C.-Y., Camargo, S. J., Vitart, F., Sobel, A. H. \u0026amp; Tippett, M. K. Subseasonal Tropical Cyclone Genesis Prediction and MJO in the S2S Dataset. \u003cem\u003eWeather Forecast.\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 967\u0026ndash;988 (2018).\u003c/li\u003e\n\u003cli\u003eEfron, B. \u0026amp; Tibshirani, R. J. \u003cem\u003eAn Introduction to the Bootstrap\u003c/em\u003e. (Chapman and Hall/CRC, New York, 1994). doi:10.1201/9780429246593.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-5619528/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5619528/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAccurate subseasonal forecasting is socio-economically critical yet remains a great scientific challenge. Recent advances in machine-learning based global weather forecasting demonstrate superior skill on medium-range (1 to 15 days ahead) and subseasonal-range (15 to 42 days ahead) than the best traditional weather forecasting system. These data-driven models require immense computational resources for training, which are not widely available. Here we show, by using medium-range Graphcast model as pre-trained model and focusing on reducing iterative error accumulation, that fine-tuning is an efficient strategy to achieve impressive results for subseasonal forecasting. Our fine-tuned model GraphFT rapidly converges (trained on just three years of data), and significantly outperforms Graphcast and the leading deterministic traditional subseasonal forecasting system, even outperforming this system\u0026rsquo;s ensemble mean for key variables. 
Demonstrating the potential of fine-tuning for improving possibly both atmosphere and ocean forecasts with low computational costs and remarkable results.\u003c/p\u003e","manuscriptTitle":"Fine-tuning a global weather model for superior subseasonal forecasting","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-03-10 04:00:18","doi":"10.21203/rs.3.rs-5619528/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"8d104e2d-8368-4bbb-a4ff-ae29b7148be0","owner":[],"postedDate":"March 10th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":45322718,"name":"Earth and environmental sciences/Climate sciences/Atmospheric science/Atmospheric dynamics"},{"id":45322719,"name":"Physical sciences/Mathematics and computing/Computer science"}],"tags":[],"updatedAt":"2026-04-10T08:06:01+00:00","versionOfRecord":[],"versionCreatedAt":"2025-03-10 04:00:18","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5619528","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5619528","identity":"rs-5619528","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}