Performance Evaluation of LSTM Networks for Earthquake Magnitude Prediction in Iran

Research Square, Research Article

Alireza Ghotbi, Mohammad Rahimi, Ahmad Zamani

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6395698/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Iran, located along an active seismic belt, frequently experiences destructive earthquakes, making accurate forecasting crucial for disaster preparedness and risk mitigation. This study employs Long Short-Term Memory (LSTM) networks to predict earthquake magnitudes using a dataset of 6,916 seismic events recorded in Iran from 1900 to the present, with magnitudes ranging from 4.0 to 7.7. Various loss functions and resampling methods were applied to optimize predictive accuracy, and the performance of four models was compared.
Results indicate that LSTM networks achieved a high correlation across the full magnitude range, with yearly resampling yielding the most accurate predictions. For large earthquakes (6.0 ≤ M < 7.7), the Pseudo-Huber loss function improved model stability, though predictions were constrained by data scarcity. While daily and monthly predictions exhibited higher variance, yearly forecasting provided more reliable long-term trends. This study underscores the importance of selecting appropriate time intervals and loss functions in earthquake prediction models. The findings contribute to seismic hazard assessment efforts and can aid in developing early warning systems for earthquake-prone regions.

Keywords: Iran, LSTM, earthquake, neural network, time-series

1- Introduction

Earthquakes pose a significant threat to human life and infrastructure, triggering fires, accidents, and tsunamis. Since 1998, they have caused over 750,000 deaths, particularly in seismically active regions such as Turkey, Japan, China, and Iran. The extent of earthquake damage depends on both the magnitude of the event and the resilience of buildings and infrastructure, with events exceeding magnitude 6.0 often leading to widespread destruction. A deeper understanding of earthquakes is essential for mitigating their impact and reducing casualties [1], [8]. Robust hazard models and advanced forecasting techniques, including LSTM models, can support this goal.

1.1 Geographical Setting

The study area encompasses Iran, approximately bounded by latitudes 25.5°N to 40.0°N and longitudes 45.0°E to 63.0°E (Fig. 1). Iran is located within the central segment of the Alpide Belt, a major orogenic system extending along the southern margin of Eurasia. This extensive belt comprises a series of mountain ranges stretching from Java and Sumatra, through the Himalayas and the Mediterranean, into the Atlantic Ocean.
Within this system, the Zagros Fold-Thrust Belt, situated in southwestern Iran, represents a significant structural feature. Composed of a thick, conformable sedimentary sequence from the Cambrian to the late Tertiary, it was deformed during the late Alpine orogeny, in the Plio-Pleistocene. The belt exhibits long, parallel, asymmetric folds, forming a linear intercontinental structure trending NW-SE between the Arabian Shield and Central Iran. It extends approximately 1,500 km in length and 200–300 km in width, consisting exclusively of sedimentary rocks with no evidence of metamorphic or igneous activity [17]. While Central and Eastern Iran experience less seismic activity, the entire region remains vulnerable to destructive earthquakes. Based on the collected data, the average earthquake magnitude in Iran from 1900 to the present was approximately 4.56 (Fig. 1).

1.2 Seismic Activity in Iran

Iran is divided into several seismotectonic provinces, each with its own seismic characteristics. The Alborz mountain range, for example, encompasses major cities such as Tehran, Rasht, Karaj, Zanjan, Qazvin, Sari, and Gorgan, which have experienced devastating historical earthquakes [12]. GPS horizontal velocities in Iran indicate that the Arabian plate moves northward relative to Eurasia at a rate of 2.1–2.5 centimeters per year. However, deformation varies across several active zones, including the Makran subduction complex, the Kopeh-Dagh Mountains, the Zagros, and the Alborz Mountains. Right-lateral displacement primarily occurs along the Main Recent Fault and the North Tabriz Fault. While Central Iran behaves as a rigid block relative to Eurasia, Eastern Iran exhibits slower movement. The velocity contrast between regions results in right-lateral strike-slip motion along the north-south trending faults bounding the Lut block [6].

1.2.3 Statistical Overview

The statistical analysis of the dataset reveals earthquake magnitudes ranging from 4.0 to 7.7.
The mean magnitude is 4.56, with a median of 4.4. The most frequently occurring magnitude is also 4.4, appearing 787 times. A standard deviation of 0.47 suggests that most earthquakes are clustered around the mean, with occasional larger events. The distribution of magnitudes, shown in Fig. 1, confirms that lower-magnitude earthquakes are more prevalent.

1.3 LSTM Network

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997. They were designed to address the vanishing gradient problem that limits traditional RNNs when learning long-term dependencies in sequential data. LSTMs incorporate memory cells and gating mechanisms to control the flow of information, allowing them to maintain and update information over extended time intervals. The input, forget, and output gates regulate this flow, enabling LSTMs to retain important patterns over long sequences.
These gates are mathematically defined as follows:

$$\begin{array}{c}
f_t = \sigma\left(W_f \cdot \left[h_{t-1}, x_t\right] + b_f\right)\\
i_t = \sigma\left(W_i \cdot \left[h_{t-1}, x_t\right] + b_i\right)\\
\tilde{C}_t = \tanh\left(W_C \cdot \left[h_{t-1}, x_t\right] + b_C\right)\\
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t = \sigma\left(W_o \cdot \left[h_{t-1}, x_t\right] + b_o\right)\\
h_t = o_t \odot \tanh\left(C_t\right)
\end{array}$$

where \(f_t\), \(i_t\), and \(o_t\) represent the forget, input, and output gate activations, respectively, \(\tilde{C}_t\) is the candidate cell state, \(C_t\) is the cell state, \(h_t\) is the hidden state, and \(\sigma\) and \(\tanh\) are the sigmoid and hyperbolic tangent activation functions (Hochreiter & Schmidhuber, 1997). These mechanisms allow LSTMs to selectively retain or discard information across long sequences, making them well-suited for capturing temporal patterns in earthquake data. By leveraging LSTM networks, this study aims to optimize earthquake magnitude forecasting in Iran, evaluating different hyperparameters to enhance predictive performance.

1.3.1 Advantages of LSTM Networks in Seismic Forecasting

LSTM models are effective for earthquake prediction because they capture long-term dependencies and temporal patterns in seismic data, addressing the nonlinear and complex nature of earthquakes. Unlike traditional statistical models, LSTMs retain historical information and mitigate the vanishing gradient problem, improving predictive accuracy. Research has shown that LSTMs outperform autoregressive models in forecasting earthquake magnitudes and occurrences. For example, a study on deep learning for earthquake prediction demonstrates that LSTM-based models reduce forecasting errors and enhance seismic hazard assessments [13]. Additionally, the integration of LSTMs with techniques like Variational Mode Decomposition (VMD) has been shown to enhance prediction accuracy for earthquake occurrence parameters, including time, location, and magnitude [21].

2- Methodology

To develop LSTM models for predicting earthquake magnitudes in Iran, historical seismic data from 1900 to the present were collected. This dataset, sourced from the United States Geological Survey (USGS), contains 6,916 earthquake events with magnitudes ranging from 4.0 to 7.7. It includes key attributes such as latitude, longitude, depth, magnitude, and other geophysical parameters. Prior to model training, the raw data underwent initial cleaning, including conversion of magnitude values to numeric format and removal of entries with missing magnitude data. The cleaned data were then resampled to daily, monthly, and yearly frequencies using the mean as the aggregation function to analyze trends at different temporal scales. Any NaN values generated during resampling were subsequently removed. Time intervals play a crucial role in earthquake prediction models, especially when utilizing LSTM networks: the choice of a daily, monthly, or yearly interval significantly influences the model's performance and the granularity of its predictions.

LSTM models were trained using three distinct loss functions: Custom Time-Weighted Mean Squared Error (MSE), Huber Loss, and Pseudo-Huber Loss. Optimal model hyperparameters were determined through a grid search approach, utilizing 3-fold cross-validation and negative mean squared error as the scoring metric. The grid search evaluated combinations of LSTM units, learning rates, and batch sizes. For earthquakes of magnitude 6.0 and above, the model was also trained using K-fold cross-validation to classify earthquake magnitudes into high-risk categories.
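The resampling step described above can be illustrated with a minimal standard-library sketch. It is not the authors' code (the study reports using Pandas); the sample records, the `PERIOD_FORMATS` mapping, and the `resample_mean` helper are hypothetical names introduced only for illustration. Because only periods that contain at least one event produce a bucket, the result corresponds to resampling with the mean and then dropping the empty (NaN) periods.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (ISO timestamp, magnitude) records; not real catalog data.
events = [
    ("2020-01-03T04:10:00", 4.2),
    ("2020-01-03T18:45:00", 4.6),
    ("2020-01-20T09:00:00", 5.0),
    ("2020-03-11T12:30:00", 4.4),
    ("2021-07-02T23:15:00", 6.1),
]

# strftime patterns that define which calendar bucket an event falls into.
PERIOD_FORMATS = {"daily": "%Y-%m-%d", "monthly": "%Y-%m", "yearly": "%Y"}


def resample_mean(records, frequency):
    """Average magnitudes per calendar period, skipping periods with no events."""
    fmt = PERIOD_FORMATS[frequency]
    buckets = defaultdict(list)
    for timestamp, magnitude in records:
        key = datetime.fromisoformat(timestamp).strftime(fmt)
        buckets[key].append(magnitude)
    # Empty periods never get a bucket, which mirrors dropping NaN rows.
    return {key: mean(values) for key, values in sorted(buckets.items())}


daily = resample_mean(events, "daily")
monthly = resample_mean(events, "monthly")
yearly = resample_mean(events, "yearly")
```

In this toy sample, February 2020 contains no events and therefore does not appear in `monthly`, which is the behaviour the NaN-removal step produces in the described pipeline.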
The overall model performance was visualized through loss curves, confusion matrices, and earthquake magnitude distribution plots, providing insight into the effectiveness of different approaches across varying time intervals.

Model performance was evaluated using the following metrics:
Mean Absolute Error (MAE) : The average absolute difference between predicted and actual magnitudes. Lower values indicate better performance.
Root Mean Squared Error (RMSE) : The square root of the average squared difference between predicted and actual magnitudes. Lower values indicate better performance. RMSE is more sensitive to large errors than MAE.
Correlation : The correlation coefficient between predicted and actual magnitudes, where values closer to 1 indicate stronger positive correlations.

The LSTM networks were trained using four loss configurations, each tailored to a specific magnitude range:
Time-Weighted Mean Squared Error (MSE) : Applied to magnitudes in the range 4.0 ≤ M < 5.0, emphasizing recent data while penalizing larger errors.
Huber Loss : Used for magnitudes in the range 5.0 ≤ M < 6.0, providing robustness against outliers while maintaining MSE-like behavior for small errors.
Pseudo-Huber Loss : Implemented for magnitudes in the range 6.0 ≤ M ≤ 7.7, combining the benefits of Huber Loss with a smooth transition between quadratic and linear loss regions.
Pseudo-Huber Loss (General Model) : Also applied to the entire dataset, covering magnitudes 4.0 ≤ M ≤ 7.7, to evaluate its effectiveness across all earthquake magnitudes.

Alongside selecting appropriate loss functions for the LSTM model, choosing the right optimizer is equally important. The Adam optimizer (Adaptive Moment Estimation) was chosen for training because of its reported effectiveness in earthquake signal classification, as supported by comparative studies.
Its adaptive learning rate, which adjusts based on past gradients, enables faster convergence and stable training, which is critical for handling noisy earthquake time-series data. Additionally, Adam can help stabilize training against vanishing or exploding gradients, making it a robust choice for LSTM networks in earthquake magnitude prediction. Its default hyperparameters also reduce the need for extensive tuning, improving efficiency [9], [11].

2.1 Computational Setup

The LSTM models in this study were implemented in Python 3.9.21 [16] using TensorFlow 2.12.0 on a machine running Windows 10 with an AMD x86-64 CPU (AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD) and 15.91 GB of RAM. The models were trained on the CPU, as GPU acceleration was not available during the experiments. For LSTM networks with large datasets, however, GPU acceleration is recommended to reduce training and prediction times. This is crucial for iterative tasks like hyperparameter tuning or for real-time applications [3]. Data preprocessing and visualization were performed using NumPy, Pandas, and Matplotlib.

The models were trained on different magnitude ranges with varying batch sizes to optimize performance for each specific range:
Model 1 (4.0 ≤ M < 5.0) with batch size 64
Model 2 (5.0 ≤ M < 6.0) with batch size 32
Model 3 (6.0 ≤ M < 7.7) with batch size 16
Model 4 (4.0 ≤ M ≤ 7.7) with batch size 64
The batch sizes were determined through grid search and cross-validation.

2.2 Model 1: Magnitude Range of 4.0 ≤ M < 5.0

For earthquakes within the 4.0 ≤ M < 5.0 range, an LSTM model was developed and trained using historical seismic data from Iran. The data were preprocessed, converted to moment magnitude (Mw), and resampled into daily, monthly, and yearly magnitudes before being scaled using MinMaxScaler to normalize values.
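As a hedged illustration of this preprocessing, the sketch below reproduces min-max scaling and the construction of fixed-length input sequences in plain Python. It is not the study's implementation (which uses scikit-learn's MinMaxScaler); the helper names `fit_min_max`, `scale`, `inverse_scale`, and `make_windows`, as well as the sample values, are assumptions made for this example. The window length of 3 matches the sequence length the study reports using for training.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) pair that defines the scaling."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0.0 and hi -> 1.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original magnitude units."""
    return [s * (hi - lo) + lo for s in scaled]


def make_windows(series, length):
    """Pair each run of `length` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - length):
        inputs.append(series[start:start + length])
        targets.append(series[start + length])
    return inputs, targets


# Hypothetical yearly mean magnitudes, for illustration only.
yearly_means = [4.4, 4.6, 4.5, 4.8, 4.3, 4.7]
lo, hi = fit_min_max(yearly_means)
scaled = scale(yearly_means, lo, hi)
X, y = make_windows(scaled, length=3)
```

Here the scaling bounds are fitted on the whole toy series for brevity. In a forecasting setting the bounds would normally be fitted on the training portion only, so that information from the test period does not leak into the inputs.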
Model Architecture & Training :
Input Layer : Sequences of past earthquake magnitudes, with a length chosen from {3, 7, 14, 30} through hyperparameter tuning.
LSTM Layers : First LSTM layer with {32, 64, 128} units (selected via tuning), returning sequences to maintain temporal dependencies. Dropout layer (0.1–0.3) for regularization. Second LSTM layer (same unit range), returning a final output sequence. Dropout layer (0.1–0.3) to prevent overfitting.
Dense Output Layer : A fully connected layer (Dense(1)) that predicts the next earthquake magnitude.
Loss Function : Custom Time-Weighted Mean Squared Error (MSE), prioritizing recent seismic activity while penalizing larger errors.
Optimization & Hyperparameter Tuning : Adam (learning_rate = 0.001). Tuning was conducted via GridSearchCV with 3-fold cross-validation, specifically optimizing the number of LSTM units.
Evaluation Metrics : MAE, RMSE, and Correlation Coefficient.

The model was trained using sequences of length 3, and its predictive performance was analyzed using statistical metrics. The results, visualized in Fig. 2, demonstrated the model’s effectiveness, with key performance metrics reported in Table 1.

Time-Weighted MSE

Mathematical Formula for the Loss Function :

$$\mathcal{L}_{\text{TW-MSE}} = \frac{1}{N}\sum_{i=1}^{N} w_i \left(y_{\text{true},i} - y_{\text{pred},i}\right)^2$$

Where:
\(N\) is the number of samples.
\(y_{\text{true},i}\) is the actual (true) value at time step \(i\).
\(y_{\text{pred},i}\) is the predicted value at time step \(i\).
\(w_i = \frac{i}{N}\) is the weight assigned to each data point, increasing over time, with \(i\) representing the index of the time step [18].

The given loss function differs from the standard Mean Squared Error (MSE) by incorporating time-dependent weighting for data points.
Unlike MSE, which treats all data points equally, this modified loss function assigns greater emphasis to more recent data, reflecting the intuition that recent errors are more critical in time-series forecasting. Prioritizing the minimization of recent discrepancies is intended to improve predictive accuracy and support decisions based on up-to-date information. This method aligns with recent work on time-series loss functions, such as that of Jadon et al. [10], which highlights the advantages of regression-based loss functions that account for time-dependent characteristics [14].

2.3 Model 2: Magnitude Range of 5.0 ≤ M < 6.0

For predicting earthquake magnitudes within the 5.0 to 6.0 range, an LSTM model was again developed using historical seismic data from Iran. The data were preprocessed, converted to moment magnitude (Mw), and resampled into daily, monthly, and yearly intervals.

Model Architecture & Training :
Input Sequences : Earthquake magnitudes were segmented into time sequences of length 3.
LSTM Configuration : A single LSTM layer with {50, 100, 150} units (optimized using GridSearchCV). ReLU activation and He-normal initialization for stable training.
Loss Function : A custom Huber loss function was implemented to mitigate the impact of outliers.
Optimization & Hyperparameter Tuning : Adam optimizer with a tuned learning rate (0.001, 0.0005, 0.0001). GridSearchCV (3-fold cross-validation) was used to optimize LSTM units, batch size (16, 32, 64), and learning rate.
Evaluation Metrics : MAE, RMSE, and Correlation Coefficient.

Figure 2 illustrates the actual vs. predicted earthquake magnitudes, while the corresponding evaluation metrics are detailed in Table 1.
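The three evaluation metrics used for every model can be written out in a few lines. The sketch below is a plain-Python illustration rather than the authors' code; it assumes that the "Correlation Coefficient" is the Pearson correlation, which the text does not state explicitly, and the sample values are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient (assumed here; the paper says only 'correlation')."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return float("nan")  # undefined when either series is constant
    return cov / math.sqrt(var_a * var_p)


# Hypothetical magnitudes, for illustration only.
actual = [5.0, 5.2, 5.4, 5.1]
predicted = [5.1, 5.1, 5.5, 5.3]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
corr = pearson_correlation(actual, predicted)
```

Because RMSE squares each error before averaging, the single larger miss in this sample (0.2) pulls RMSE above MAE, which is the sensitivity to large errors noted in the metric definitions.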
Huber Loss

Mathematical Formula for the Loss Function :

$$L_{\delta}(a) = \begin{cases} \frac{1}{2}a^{2} & \text{for } |a| < \delta \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{for } |a| \ge \delta \end{cases}$$

Where:
\(a = y_{\text{true}} - y_{\text{pred}}\) is the error (difference between the true and predicted values).
\(\delta\) (set to 1.0 in the code) is the threshold that separates the quadratic and linear loss regions [4].

This formulation makes Huber loss less sensitive to outliers than Mean Squared Error (MSE), as it behaves like MSE for small errors (\(|a| < \delta\)) and like Mean Absolute Error (MAE) for larger errors (\(|a| \ge \delta\)), providing robustness against outliers in magnitude predictions [4].

2.4 Model 3: Magnitude Range of 6.0 ≤ M < 7.7

LSTM models are known to require substantial amounts of data to achieve optimal performance (Hestness et al., 2017), so predicting earthquakes within the higher magnitude range of 6.0 to 7.7 presents a significant challenge due to the inherent scarcity of such events. Our dataset contained only 103 events in this magnitude range, which is a key limitation when assessing the model's performance. To address this issue, we implemented Gaussian Noise Augmentation as a strategy to enhance the dataset and improve model training [15].

Gaussian noise follows a normal distribution \(\mathcal{N}(\mu, \sigma^{2})\) with density:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

where:
\(\mu\) is the mean of the distribution.
\(\sigma^{2}\) is the variance, and \(\sigma\) the standard deviation.
\(x\) is a random variable following the normal distribution.

To introduce controlled noise into the magnitude 6–7.7 earthquake data, each value is modified as:

$$x' = x + \varepsilon, \quad \varepsilon \sim \mathcal{N}\left(\mu, \sigma^{2}\right)$$

where \(x'\) is the new magnitude value after adding Gaussian noise and \(\varepsilon\) is the noise sampled from a normal distribution. Since the magnitude range is relatively narrow (6–7.7), we set \(\mu = 0\) (zero-mean noise) and choose \(\sigma\) as a small fraction of the standard deviation of the magnitudes in this range to avoid unrealistic values.

To predict earthquakes in the 6.0–7.7 magnitude range, the LSTM model incorporates a custom Pseudo-Huber loss function to mitigate the impact of outliers.

Model Architecture & Training :
Input Sequences : Earthquake magnitudes were structured into time sequences of length 3.
LSTM Configuration : A single LSTM layer with {50, 100, 150} units, optimized via GridSearchCV. ReLU activation with He-normal initialization for stable weight updates.
Optimization and Hyperparameter Tuning : Adam optimizer (learning rate = 0.001, gradient clipping with clipvalue = 1.0). GridSearchCV (3-fold cross-validation) for hyperparameter tuning.
Data Scaling : MinMaxScaler was used for feature normalization.
Evaluation Metrics : MAE, RMSE, and Correlation Coefficient.

The Pseudo-Huber loss function is a smooth approximation of Huber Loss, designed to be less sensitive to outliers while maintaining differentiability.
It is defined as:

$$L_{\delta}(a) = \delta^{2}\left(\sqrt{1 + \left(\frac{a}{\delta}\right)^{2}} - 1\right)$$

Where:
\(a = y - \hat{y}\) is the difference between the actual and predicted values.
\(\delta\) is a parameter that controls the transition between quadratic and linear behavior.

For small errors (\(|a| \ll \delta\)) Pseudo-Huber loss behaves like the Mean Squared Error (MSE), while for large errors (\(|a| \gg \delta\)) it behaves more like the Mean Absolute Error (MAE), reducing the impact of outliers [2].

2.5 Model 4: Magnitude Range of 4.0 ≤ M ≤ 7.7

Unlike the previous models, which used only time and magnitude, this model utilizes a broader set of earthquake-related features to enhance predictive accuracy. The input consists of a 30-day time-series sequence, where each time step includes six features: location (latitude and longitude), depth, Root Mean Square (rms) error, azimuthal gap, and minimum distance to the station (dmin).

Model Architecture & Training :
LSTM Layers : LSTM Layer 1: A recurrent layer with 64 units, configured to return sequences to preserve temporal dependencies across time steps. LSTM Layer 2: A second LSTM layer with 32 units that extracts deeper temporal patterns.
Dropout Layer : A dropout rate of 0.3 is applied to mitigate overfitting.
Dense Layer : A fully connected layer with 32 neurons and ReLU activation, refining feature interactions.
Output Layer : A single neuron that predicts the earthquake magnitude.
Loss Function : The model is trained using the custom Pseudo-Huber loss, balancing robustness against outliers and sensitivity to small errors.
Optimizer : The Adam optimizer is used with gradient clipping (clipvalue = 1.0) to enhance training stability and prevent exploding gradients.
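To make the three loss functions of Sections 2.2 to 2.5 concrete, the following sketch evaluates them in plain Python directly from the formulas given above. It is an illustration only: the study's losses are custom TensorFlow functions that are not reproduced in the paper, and the function names `time_weighted_mse`, `huber`, and `pseudo_huber` are chosen here for readability.

```python
import math


def time_weighted_mse(y_true, y_pred):
    """Mean of w_i * error_i^2 with w_i = i / N for i = 1..N (later samples weigh more)."""
    n = len(y_true)
    total = 0.0
    for index, (t, p) in enumerate(zip(y_true, y_pred), start=1):
        total += (index / n) * (t - p) ** 2
    return total / n


def huber(a, delta=1.0):
    """Quadratic for |a| < delta, linear for |a| >= delta."""
    if abs(a) < delta:
        return 0.5 * a * a
    return delta * (abs(a) - 0.5 * delta)


def pseudo_huber(a, delta=1.0):
    """Smooth approximation of the Huber loss, differentiable everywhere."""
    return delta ** 2 * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)


# The same unit error costs more when it occurs at the end of the sequence.
early_error = time_weighted_mse([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
late_error = time_weighted_mse([0.0, 0.0, 1.0], [0.0, 0.0, 0.0])
```

With `delta = 1.0`, an error of 0.5 falls in the quadratic region of the Huber loss (0.125), while an error of 3.0 falls in the linear region (2.5). The Pseudo-Huber loss tracks the same two regimes without the kink at `|a| = delta`: it is close to `a**2 / 2` for small errors and grows roughly linearly for large ones.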
The model undergoes hyperparameter tuning using a Randomized Search approach, optimizing the following parameters:
Batch Size : [32, 64]
Epochs : [50, 100]
LSTM Units : [64, 128]
Learning Rate : [1e-3, 5e-4, 1e-4]
Delta (Pseudo-Huber Loss Parameter) : [0.5, 1.0, 1.5, 2.0]
A 3-fold cross-validation is used to ensure a robust model evaluation and prevent overfitting to specific data subsets.
Evaluation Metrics : MAE, RMSE, and Correlation Coefficient.

3- Models’ Performances and Analysis

3.1 Performance Evaluation of Model 1 (4.0 ≤ mag < 5.0)

The findings indicate a clear trend in predictive performance across different time intervals for earthquakes with magnitudes 4.0 ≤ mag < 5.0. Yearly data yielded the most accurate predictions, exhibiting the lowest MAE and RMSE values while achieving the highest correlation coefficient. Specifically, yearly predictions had an MAE of 0.0235, an RMSE of 0.0325, and a high correlation of 0.8949 (Table 1). These results suggest that yearly aggregated data allow the model to capture long-term seismic trends effectively while reducing short-term noise for this magnitude range.

3.2 Performance Evaluation of Model 2 (5.0 ≤ mag < 6.0)

Daily predictions were the least reliable within this magnitude range. The daily model exhibited an MAE of 0.1535, an RMSE of 0.1927, and a correlation coefficient of 0.4316, indicating a limited ability to capture short-term seismic variations. The relatively low correlation suggests that short-term fluctuations introduce noise, making daily models less effective for precise predictions. In contrast, monthly predictions provided a more balanced performance, improving upon the daily model. The monthly model achieved an MAE of 0.1401, an RMSE of 0.1781, and a correlation of 0.6073, demonstrating better predictive accuracy while retaining sufficient temporal resolution.
Yearly predictions yielded the highest accuracy, with an MAE of 0.0721, an RMSE of 0.0871, and a strong correlation of 0.8860, making it the most reliable approach for long-term trend forecasting. These results indicate that aggregating data over longer intervals helps mitigate noise and enhance prediction stability, making yearly models more suitable for strategic seismic risk assessment.

Table 1 LSTM Model Results for Predicting Earthquakes of Magnitude 4.0-7.7, showing MAE, RMSE, and Correlation at Daily, Monthly, and Yearly Time Intervals, with corresponding Loss Functions.

| Magnitude | Time Interval | MAE | RMSE | Correlation | Loss Function Used |
|---|---|---|---|---|---|
| 4.0 ≤ mag < 5.0 | Daily (D) | 0.2088 | 0.2650 | 0.1817 | Custom Time-Weighted MSE |
| 4.0 ≤ mag < 5.0 | Monthly (M) | 0.09712 | 0.1259 | 0.5341 | Custom Time-Weighted MSE |
| 4.0 ≤ mag < 5.0 | Yearly (Y) | 0.0235 | 0.0325 | 0.8949 | Custom Time-Weighted MSE |
| 5.0 ≤ mag < 6.0 | Daily (D) | 0.1535 | 0.1927 | 0.4316 | Huber Loss |
| 5.0 ≤ mag < 6.0 | Monthly (M) | 0.1401 | 0.1781 | 0.6073 | Huber Loss |
| 5.0 ≤ mag < 6.0 | Yearly (Y) | 0.0721 | 0.0871 | 0.8860 | Huber Loss |
| 6.0 ≤ mag < 7.7 | Daily (D) | 0.2345 | 0.3020 | 0.3046 | Pseudo-Huber Loss |
| 6.0 ≤ mag < 7.7 | Monthly (M) | 0.2495 | 0.3727 | 0.2071 | Pseudo-Huber Loss |
| 6.0 ≤ mag < 7.7 | Yearly (Y) | 0.1568 | 0.1790 | 0.3365 | Pseudo-Huber Loss |

3.3 Performance Evaluation of Model 3 (6.0 ≤ mag < 7.7)

The model's performance in predicting large-magnitude earthquakes was assessed across daily, monthly, and yearly time intervals using the Pseudo-Huber loss function, which is effective in handling outliers while maintaining stability. The results indicate that yearly predictions yield the best performance, with the lowest error values (MAE = 0.1568, RMSE = 0.1790) and the highest correlation (0.3365), suggesting that long-term seismic trends are more predictable than short-term fluctuations (Table 1). Conversely, daily and monthly predictions exhibit higher errors and weaker correlations, with daily predictions achieving a correlation of 0.3046 and monthly predictions dropping to 0.2071, indicating increased uncertainty in short-term forecasting.
This trend highlights the challenges of modeling seismic activity on finer time scales, where localized and unpredictable variations—such as aftershocks—may contribute to deviations in the predicted values. These findings reinforce the importance of selecting appropriate time intervals when forecasting large-magnitude earthquakes and suggest that long-term predictions provide more reliable insights into seismic trends. To ensure the model’s robustness and assess its generalization performance across different subsets of data, we performed 5-fold cross-validation on the earthquake magnitude dataset with the same model. The K-fold cross-validation results for predicting earthquakes of magnitude 6.0–7.7 show weaker performance compared to the model without K-fold. Short-term predictions (daily and monthly) remained unreliable, with high MAE and RMSE values and weak or negative correlations, likely due to the sporadic nature of large earthquakes. While yearly aggregation improved performance (MAE: 0.2205, RMSE: 0.2638, correlation: 0.2396), it still underperformed compared to the standard model without K-fold, which achieved a lower MAE (0.1568) and RMSE (0.1790) with a slightly better correlation (0.3365) (Table 2). It is important to note that K-fold cross-validation assumes that data points are independent and identically distributed (i.i.d.), which is often not valid for time-series data due to its temporal dependencies [ 20 ]. In contrast, time-series-specific validation methods could provide a more accurate assessment. 
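One example of such a time-series-specific scheme is an expanding-window (walk-forward) split, in which each test fold lies strictly after the data used for training. The sketch below is a minimal illustration under that assumption and was not part of the study; the function name `expanding_window_splits` and its parameters are hypothetical. scikit-learn's `TimeSeriesSplit` offers a comparable ready-made splitter.

```python
def expanding_window_splits(n_samples, n_splits, min_train_size):
    """Yield (train_indices, test_indices) pairs in which every test index
    comes after every train index, so no future data leaks into training."""
    if min_train_size >= n_samples:
        raise ValueError("min_train_size must be smaller than n_samples")
    test_size = (n_samples - min_train_size) // n_splits
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of splits")
    for fold in range(n_splits):
        train_end = min_train_size + fold * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Twelve chronologically ordered samples, three folds, at least six training samples.
splits = list(expanding_window_splits(n_samples=12, n_splits=3, min_train_size=6))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Unlike shuffled K-fold, the training set here only ever grows forward in time, which respects the temporal dependence that the i.i.d. assumption ignores.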
Table 2 Results of K-fold cross-validation for the magnitude 6.0-7.7 LSTM model.

| Frequency | Average MAE | Average RMSE | Average Correlation |
|---|---|---|---|
| Daily | 0.29188997 | 0.378925037 | -0.082527095 |
| Monthly | 0.285678448 | 0.35724987 | -0.021537634 |
| Yearly | 0.220487958 | 0.263773797 | 0.23957514 |

3.4 Performance Evaluation of Model 4 (4.0 ≤ M ≤ 7.7)

When the LSTM model was trained and evaluated on the entire magnitude range of earthquakes in Iran, a significant improvement in performance was observed. Figure 5 presents the scatter plot of predicted vs. true magnitudes, revealing a strong positive correlation of 0.90. This high correlation suggests that the model effectively captures the underlying relationship between predicted and actual magnitudes. Furthermore, the error distribution, as depicted in the histogram, is centered around zero and closely resembles a normal distribution, indicating unbiased and randomly distributed errors. This finding aligns with the research of Wang, Guo, Yu, and Li [22], which highlights the role of geographic information in improving LSTM models for earthquake prediction. While the model demonstrates excellent overall performance, some clustering of predictions appears in the lower magnitude range, indicating a potential area for further investigation. These results contrast sharply with the challenges encountered when focusing solely on the 6.0–7.7 magnitude range, where data scarcity and model instability were significant concerns.

4- Discussion

The results demonstrate the model’s capability to capture long-term seismic trends and achieve high prediction accuracy, particularly when trained on the entire magnitude range of earthquakes. They indicate that LSTM networks have considerable potential for building earthquake hazard models and warning systems. Our previous research on the seismicity of the Alborz and Zagros regions applied time-series analysis to study earthquake recurrence rates and temporal patterns [17].
While such statistical approaches provide valuable insights, they often struggle with capturing non-linearity, complex feature interactions, and long-term dependencies in seismic activity. Simple time-series models like moving averages or autoregressive methods assume linear relationships and rely on limited past values, making them insufficient for modeling seismic events influenced by multiple factors such as depth, location, and stress accumulation over time. In contrast, LSTM networks overcome these limitations by maintaining memory of long-term dependencies, adapting to non-linear patterns, and integrating multiple seismic parameters into the forecasting process. LSTM models use gated mechanisms to selectively retain or forget information, making them more suitable for handling irregular seismic events and improving predictive accuracy.

The results also show the impact of different loss functions and feature engineering approaches on model performance. Despite promising results, short-term forecasting (daily predictions) exhibited higher error rates due to seismic variability. Additionally, data scarcity for large-magnitude earthquakes impacted predictive stability. Future research should focus on:
Enhancing daily and monthly prediction accuracy through advanced feature selection and external data integration.
Developing hybrid models combining LSTM with other machine learning techniques to improve robustness.
Exploring real-time seismic data integration for early warning applications.
By addressing these challenges, the predictive framework could contribute significantly to seismic risk mitigation efforts.
5- Conclusion

This study evaluated LSTM networks for earthquake magnitude prediction in Iran, examining the effect of different loss functions, time intervals, and model architectures. The main findings are:

- Yearly earthquake predictions demonstrated the highest accuracy, while daily predictions were less reliable, particularly for magnitudes between 4.0 and 5.0.
- Multivariate models incorporating additional seismic features outperformed univariate models.
- Pseudo-Huber Loss achieved the best balance of stability and robustness to outliers, enhancing predictive accuracy across magnitudes, while time-weighted MSE worked best for lower magnitudes.
- Hyperparameter tuning significantly affected model performance, with optimized LSTM configurations yielding lower error rates. However, tuning hyperparameters for large datasets is computationally intensive, often requiring extensive grid search or random search. GPU acceleration, distributed computing, and model pruning can substantially reduce computational cost and training time, making large-scale seismic forecasting more feasible.
- Data augmentation techniques (e.g., Gaussian noise) enhanced model generalization for large-magnitude earthquake predictions.

Future studies should focus on refining short-term predictions, integrating external geophysical and seismic data such as peak ground acceleration (PGA) and peak ground velocity (PGV) to develop better warning systems, and exploring alternative deep learning architectures to enhance model performance. By improving earthquake forecasting methodologies, this research contributes to seismic hazard assessment and disaster preparedness in Iran.

Declarations

Declaration of interests
We have nothing to declare.

Clinical Trial Registration
Not applicable

Ethics, Consent to Participate, and Consent to Publish declarations
Not applicable

Funding Sources
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author Contribution
A.G.
(Alireza Ghotbi) developed the methodology, implemented all coding and modeling in Python, and led the overall research and manuscript preparation. M.R. (Mohammad Rahimi) contributed to the time-series methodology and provided domain expertise on the seismic data. A.Z. (Ahmad Zamani) supervised the research and provided guidance throughout the study. All authors reviewed and approved the final version of the manuscript.

Acknowledgment
The authors thank our co-author Professor Ahmad Zamani for supervising this study.

Data Availability
The raw seismic data used in this study were obtained from the USGS Earthquake Catalog (https://earthquake.usgs.gov/earthquakes/search/). The processed datasets and model outputs generated during the current study are not publicly available but can be obtained from the corresponding author upon reasonable request.

References

1. Berhich, A., Belouadha, F.-Z., & Kabbaj, M. I. (2023). An attention-based LSTM network for large earthquake prediction.
2. Charbonnier, P., Blanc-Féraud, L., Aubert, G., & Barlaud, M. (1997). Deterministic edge-preserving regularization in computed imaging. IEEE Transactions on Image Processing, 6(2), 298-311. https://doi.org/10.1109/83.551599
3. Danopoulos, D., et al. (2022). LSTM acceleration with FPGA and GPU devices for edge computing applications in B5G MEC. In A. Orailoglu, M. Reichenbach, & M. Jung (Eds.), Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2022. Lecture Notes in Computer Science, vol. 13511. Springer, Cham. https://doi.org/10.1007/978-3-031-15074-6_26
4. Gokcesu, K., & Gokcesu, H. (2021). Generalized Huber loss for robust learning and its efficient minimization for robust statistics.
5. Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., Patwary, M. M. A., Yang, Y., & Zhou, Y. (2017). Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409.
6. Hessami, K., & Jamali, F. (2006). Explanatory notes to the map of major active faults of Iran. Seismology Research Center, International Institute of Earthquake Engineering and Seismology (IIEES).
7. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. MIT Press.
8. Ghafory-Ashtiany, M. (1999). Seismic hazard assessment of Iran. Annals of Geophysics.
9. Mustika, I. W., Adi, H. N., & Najib, F. (2021). Comparison of Keras optimizers for earthquake signal classification based on deep neural networks. In Proceedings of the 2021 4th International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, pp. 304-308. https://doi.org/10.1109/ICOIACT53268.2021.9563990
10. Jadon, A., Patil, A., & Jadon, S. (2022). A comprehensive survey of regression-based loss functions for time series forecasting. arXiv. https://arxiv.org/abs/2211.02495
11. Karimpouli, S., Caus, D., Grover, H., Martínez-Garzón, P., Bohnhoff, M., Beroza, G. C., Dresen, G., Goebel, T., Weigel, T., & Kwiatek, G. (2023). Explainable machine learning for labquake prediction using catalog-driven features. Earth and Planetary Science Letters, 622, 118383. https://doi.org/10.1016/j.epsl.2023.118383
12. Khatib, M. M. (2023). Seismic risk in Alborz: Insights from geological moment rate estimation and fault activity analysis. Applied Sciences, 10(13), 2-15. Switzerland: Typographic.
13. Laurenti, L., Tinti, E., Galasso, F., Franco, L., & Marone, C. (2022). Deep learning for laboratory earthquake prediction and autoregressive forecasting of fault zone stress. arXiv preprint arXiv:2203.13313. https://arxiv.org/abs/2203.13313
14. Luxenberg, E., & Boyd, S. (2024). Exponentially weighted moving models. arXiv preprint arXiv:2404.08136.
15. Abbasi, M., Kargar, M., Ahmadian, F., NoormohammadZadehMaleki, D., Arandan, A., & Hosseini, N. S. (2024). GN-CNN-LSTM: Financial market prediction with Gaussian noise embedded CNN LSTM. In 2024 11th International Symposium on Telecommunications (IST), Tehran, Iran, pp. 287-294. https://doi.org/10.1109/IST64061.2024.10843452
16. Python Software Foundation. (2024). Python (Version 3.9.21) [Computer software]. https://www.python.org
17. Rahimi, M., Zamani, A., & Ghotbi, A. R. (2022). The study of seismicity of Alborz (Northern Iran) and Zagros (Southern Iran) regions by using time series analysis.
18. Shalizi, C. (2015). Lecture 24/25: Weighted and generalized least squares. stat.cmu.edu
19. United States Geological Survey. (n.d.). Earthquake Hazards Program. https://earthquake.usgs.gov
20. Vamsikrishna, A., & Gijo, E. V. (2024). New techniques to perform cross-validation for time series models. Operations Research Forum, 5, 51. https://doi.org/10.1007/s43069-024-00334-8
21. Wang, Q., Zhang, Y., Zhang, J., et al. (2024). On the use of VMD-LSTM neural network for approximate earthquake prediction. Natural Hazards, 120, 13351–13367. https://doi.org/10.1007/s11069-024-06724-9
22. Wang, Q., Guo, Y., Yu, L., & Li, P. (2017). Earthquake prediction based on spatio-temporal data mining: An LSTM network approach.

Additional Declarations
No competing interests reported.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6395698","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":451657335,"identity":"f636d542-f713-4e34-a2d5-fa126c426201","order_by":0,"name":"Alireza Ghotbi","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABD0lEQVRIiWNgGAWjYDACdsYGMG3AwMDGkFBhIwfiHHiATwszXAszUMuZNGOwlgS8WqA0WAtjy+FEsAn4tPA3M7d9+PDHzm47e/+xBw8b0tLnhx1+CLTFTk63AbsWicOMzTNn8CQn7+w5zG6QuMMmd+PtNAOglmRjswM4rAFqYeaRYE42uJHMJpF4Ji134+wEkJYDidtwaJEHafljUA/V0nY43XB2+ge8WgxAWhgSDtvBtCTIS+fgt8UQqIWx58DxBIMzh80NgIFsuEE6p+BAggFuv8gdb3/M8ONPtb3B8cZnD39U2MjLz07f/OFDhZ0cTu9DASQ6wE4FqzTArxwE7OEs+QbcqkbBKBgFo2BkAgCyE2YR+3MM4wAAAABJRU5ErkJggg==","orcid":"","institution":"Islamic Azad University, Science and Research Branch","correspondingAuthor":true,"prefix":"","firstName":"Alireza","middleName":"","lastName":"Ghotbi","suffix":""},{"id":451657336,"identity":"e13be983-16c2-4ea2-b0c1-2400e241f740","order_by":1,"name":"Mohammad Rahimi","email":"","orcid":"","institution":"Islamic Azad University, Science and Research Branch","correspondingAuthor":false,"prefix":"","firstName":"Mohammad","middleName":"","lastName":"Rahimi","suffix":""},{"id":451657337,"identity":"7b938af8-9236-4006-afd6-55ae978dbb61","order_by":2,"name":"Ahmad Zamani","email":"","orcid":"","institution":"Shiraz University","correspondingAuthor":false,"prefix":"","firstName":"Ahmad","middleName":"","lastName":"Zamani","suffix":""}],"badges":[],"createdAt":"2025-04-07 
15:23:22","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6395698/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6395698/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":82164199,"identity":"8fd96201-05f6-473d-86e6-38c185187ef5","added_by":"auto","created_at":"2025-05-07 09:03:37","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":332704,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eThe study area and geographic distribution of earthquake magnitudes in Iran (1900–2025). The scatter plot represents earthquake locations, with colors indicating magnitude—blue for lower magnitudes and red for higher magnitudes. The data illustrates the seismic activity concentrated along Iran’s major fault lines. Based on USGS earthquake catalogue.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-6395698/v1/b8ce598ad4333b65fbbe5f07.png"},{"id":82163714,"identity":"fcbaa2fc-bffc-40e3-96f0-a5a3569ee964","added_by":"auto","created_at":"2025-05-07 08:55:37","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":256443,"visible":true,"origin":"","legend":"\u003cp\u003eLSTM Model Predictions of Earthquake Magnitude 4.0 ≤ mag \u0026lt; 5.0 (mb) in Iran. 
The figure illustrates the model's performance at daily, monthly, and yearly resolutions, with corresponding accuracy metrics.\u003c/p\u003e","description":"","filename":"Fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-6395698/v1/8a8aa633287a81d5f5d3555e.png"},{"id":82161707,"identity":"46736182-de41-46eb-86aa-8c2ad4ba66b7","added_by":"auto","created_at":"2025-05-07 08:39:37","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":300510,"visible":true,"origin":"","legend":"\u003cp\u003eLSTM Model Predictions of Earthquake Magnitude 5.0 ≤ mag \u0026lt; 6.0 (Mw) in Iran. The figure illustrates the model's performance at daily, monthly, and yearly resolutions, with corresponding accuracy metrics.\u003c/p\u003e","description":"","filename":"Fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-6395698/v1/bbac6ba1b33eb71afe5bd59b.png"},{"id":82161697,"identity":"04c7d04b-a6e6-48b0-9c60-1c7d9be08d02","added_by":"auto","created_at":"2025-05-07 08:39:37","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":157105,"visible":true,"origin":"","legend":"\u003cp\u003eLSTM Model Predictions of Earthquake Magnitude 6.0 ≤ mag \u0026lt; 7.7 (Mw) in Iran. The figure illustrates the model's performance at daily, monthly, and yearly resolutions, with corresponding accuracy metrics.\u003c/p\u003e","description":"","filename":"Fig4.png","url":"https://assets-eu.researchsquare.com/files/rs-6395698/v1/840843e60a172725d411d8f2.png"},{"id":82161698,"identity":"c044d1e9-4bec-4233-891e-2e094d4e40eb","added_by":"auto","created_at":"2025-05-07 08:39:37","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":62723,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eResults of the LSTM model's prediction of earthquake magnitudes (4.0 - 7.7), including a scatter plot of predicted vs. 
true values with a correlation of 0.90 (left) and a histogram of the residuals\u003c/em\u003e\u003c/p\u003e","description":"","filename":"Fig5.png","url":"https://assets-eu.researchsquare.com/files/rs-6395698/v1/ace09ddd34b9607c58492bd5.png"},{"id":91329680,"identity":"77eeb551-c390-4115-b379-2d91e1a438a8","added_by":"auto","created_at":"2025-09-15 10:38:41","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2218091,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6395698/v1/7533ae86-ee55-4b4b-9c75-1b67b5498fa5.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Performance Evaluation of LSTM Networks for Earthquake Magnitude Prediction in Iran","fulltext":[{"header":"1- Introduction","content":"\u003cp\u003eEarthquakes pose a significant threat to human life and infrastructure, triggering fires, accidents, and tsunamis. Since 1998, they have caused over 750,000 deaths, particularly in seismically active regions such as Turkey, Japan, China, and Iran. The extent of earthquake damage depends on both its magnitude and the resilience of buildings and infrastructure, with events exceeding magnitude 6.0 often leading to widespread destruction. A deeper understanding of earthquakes is essential for mitigating their impact and reducing casualties [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e], [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. 
This task can be achieved through the development of robust hazard models and advanced forecasting techniques, including the LSTM model.\u003c/p\u003e \u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e1.1 Geographical Setting\u003c/h2\u003e \u003cp\u003eThe study area encompasses Iran, approximately bounded by latitudes 25.5\u0026deg;N to 40.0\u0026deg;N and longitudes 45.0\u0026deg;E to 63.0\u0026deg;E (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Iran is located within the central segment of the Alpide Belt, a major orogenic system extending along the southern margin of Eurasia. This extensive belt comprises a series of mountain ranges stretching from Java and Sumatra, through the Himalayas and the Mediterranean, into the Atlantic Ocean. Based on collected data, the average earthquake magnitude in Iran from 1900 to the present was approximately 4.56 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWithin this system, the Zagros Fold-Thrust Belt, situated in southwestern Iran, represents a significant structural feature. Composed of a thick, conformable sedimentary sequence from the Cambrian to the late Tertiary, it was deformed during the late Alpine orogeny, in the Plio-Pleistocene. The belt exhibits long, parallel, asymmetric folds, forming a linear intercontinental structure trending NW-SE between the Arabian Shield and Central Iran. It extends approximately 1,500 km in length and 200\u0026ndash;300 km in width, consisting exclusively of sedimentary rocks with no evidence of metamorphic or igneous activity [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. 
While Central and Eastern Iran experience less seismic activity, the entire region remains vulnerable to destructive earthquakes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e1.2 Seismic Activity in Iran\u003c/h2\u003e \u003cp\u003eIran is divided into several seismotectonic provinces, each with its own seismic characteristics. The Alborz Mountain range, for example, encompasses major cities such as Tehran, Rasht, Karaj, Zanjan, Qazvin, Sari, and Gorgan, which have experienced devastating historical earthquake [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Iran\u0026rsquo;s GPS horizontal velocities indicate that the Arabian plate moves northward relative to Eurasia at a rate of 2.1\u0026ndash;2.5 centimeters per year. However, deformation varies across several active zones, including the Makran subduction complex, Kopeh-Dagh Mountains, Zagros, and Alborz Mountains. Right-lateral displacement primarily occurs along the Main Recent Fault and North Tabriz Fault. While Central Iran behaves as a rigid block relative to Eurasia, Eastern Iran exhibits slower movement. The velocity contrast between regions results in right-lateral strike-slip motion along north-south trending faults bounding the Lut block [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec4\" class=\"Section3\"\u003e \u003ch2\u003e1.2.3 Statistical Overview\u003c/h2\u003e \u003cp\u003eThe statistical analysis of the dataset reveals earthquake magnitudes ranging from 4.0 to 7.7. The mean magnitude is 4.56, with a median of 4.4. The most frequently occurring magnitude is also 4.4, appearing 787 times. The minimum recorded magnitude is 4.0, while the maximum reaches 7.7, indicating a wide range of seismic activity in the region. 
A standard deviation of 0.47 suggests that most earthquakes are clustered around the mean, with occasional larger events. The distribution of magnitudes, shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, confirms that lower-magnitude earthquakes are more prevalent.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e1.3 LSTM Network\u003c/h2\u003e \u003cp\u003eLong Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) introduced by Sepp Hochreiter and J\u0026uuml;rgen Schmidhuber in 1997. They were designed to address the vanishing gradient problem in training traditional RNNs, enabling the learning of long-term dependencies in sequential data. LSTMs incorporate memory cells and gating mechanisms to control the flow of information, allowing them to maintain and update information over extended time intervals. While traditional RNNs struggle with learning long-term dependencies due to vanishing gradients, LSTMs overcome this limitation using gated mechanisms. The input, forget, and output gates regulate the flow of information, enabling LSTMs to retain important patterns over long sequences. 
These gates are mathematically defined as follows:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:\\begin{array}{c}{f}_{t}=\\sigma\\:\\left({W}_{f}\\cdot\\:\\left[{h}_{t-1},{x}_{t}\\right]+{b}_{f}\\right)\\\\\\:{i}_{t}=\\sigma\\:\\left({W}_{i}\\cdot\\:\\left[{h}_{t-1},{x}_{t}\\right]+{b}_{i}\\right)\\\\\\:{\\stackrel{\\prime }{C}}_{t}=tanh\\left({W}_{C}\\cdot\\:\\left[{h}_{t-1},{x}_{t}\\right]+{b}_{C}\\right)\\\\\\:{C}_{t}={f}_{t}\\odot\\:{C}_{t-1}+{i}_{t}\\odot\\:{\\stackrel{\\prime }{C}}_{t}\\\\\\:{o}_{t}=\\sigma\\:\\left({W}_{o}\\cdot\\:\\left[{h}_{t-1},{x}_{t}\\right]+{b}_{o}\\right)\\\\\\:{h}_{t}={o}_{t}\\odot\\:tanh\\left({C}_{t}\\right)\\end{array}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{f}_{t},{i}_{t}\\)\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{o}_{t}\\)\u003c/span\u003e\u003c/span\u003e represent the forget, input, and output gate activations, respectively, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{C}_{t}\\)\u003c/span\u003e\u003c/span\u003e is the cell state, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{h}_{t}\\)\u003c/span\u003e\u003c/span\u003e is the hidden state, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\sigma\\:\\)\u003c/span\u003e\u003c/span\u003e and tanh are the sigmoid and hyperbolic tangent activation functions. These mechanisms allow LSTMs to selectively retain or discard information across long sequences, making them well-suited for capturing temporal patterns in earthquake data. 
By leveraging LSTM networks, this study aims to optimize earthquake magnitude forecasting in Iran, evaluating different hyperparameters to enhance predictive performance. (Hochreiter \u0026amp; Schmidhuber, 1997).\u003c/p\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e1.3.1 Advantages of LSTM Networks in Seismic Forecasting\u003c/h2\u003e \u003cp\u003eLSTM models are effective for earthquake prediction because they capture long-term dependencies and temporal patterns in seismic data, addressing the nonlinear and complex nature of earthquakes. Unlike traditional statistical models, LSTMs retain historical information and mitigate the vanishing gradient problem, improving predictive accuracy. Research has shown that LSTMs outperform autoregressive models in forecasting earthquake magnitudes and occurrences. For example, a study on deep learning for earthquake prediction demonstrates that LSTM-based models reduce forecasting errors and enhance seismic hazard assessments. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/p\u003e \u003cp\u003eAdditionally, the integration of LSTMs with techniques like Variational Mode Decomposition (VMD) has been shown to enhance prediction accuracy for earthquake occurrence parameters, including time, location, and magnitude. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"2- Methodology","content":"\u003cp\u003eFor developing LSTM models to predict earthquake magnitudes in Iran, historical seismic data from 1900 to the present has been collected. This dataset, sourced from the United States Geological Survey (USGS), contains approximately 6,916 earthquake events with magnitudes ranging from 4.0 to 7.7. 
It includes key attributes such as latitude, longitude, depth, magnitude, and other various geophysical parameters.\u003c/p\u003e \u003cp\u003ePrior to model training, the raw data underwent initial cleaning, including conversion of magnitude values to numeric format and removal of entries with missing magnitude data. The cleaned data was then resampled to daily, monthly, and yearly frequencies using the mean as the aggregation function to analyze trends at different temporal scales. Time intervals play a crucial role in earthquake prediction models, especially when utilizing LSTM networks. The choice of interval\u0026mdash;be it daily, monthly, or yearly\u0026mdash;significantly influences the model's performance and the granularity of predictions. Any NaN values generated during resampling were subsequently removed.\u003c/p\u003e \u003cp\u003eLSTM models were trained using three distinct loss functions: Custom Time-Weighted Mean Squared Error (MSE), Huber Loss, and Pseudo-Huber Loss. Optimal model hyperparameters were determined through a grid search approach, utilizing 3-fold cross-validation and negative mean squared error as the scoring metric. The grid search evaluated combinations of LSTM units, learning rates, and batch sizes. For earthquakes of magnitude 6.0 and above, the model was also trained using K-fold cross-validation to classify earthquake magnitudes into high-risk categories.\u003c/p\u003e \u003cp\u003eThe overall model performance was visualized through loss curves, confusion matrices, and earthquake magnitude distribution plots, providing critical insights into the effectiveness of different approaches across varying time intervals.\u003c/p\u003e \u003cp\u003eModel performance was evaluated using the following metrics:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eMean Absolute Error (MAE)\u003c/b\u003e: The average absolute difference between predicted and actual magnitudes. 
Lower values indicate better performance.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRoot Mean Squared Error (RMSE)\u003c/b\u003e: The square root of the average squared difference between predicted and actual magnitudes. Lower values indicate better performance. RMSE is more sensitive to large errors.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCorrelation\u003c/b\u003e: The correlation coefficient between predicted and actual magnitudes, where values closer to 1 indicate stronger positive correlations.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe LSTM networks were trained using four different regression loss functions, each tailored to specific magnitude ranges:\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eTime-Weighted Mean Squared Error (MSE)\u003c/strong\u003e \u003cp\u003eApplied to magnitudes in the range 4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;5.0, emphasizing recent data while penalizing larger errors.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eHuber Loss\u003c/strong\u003e \u003cp\u003eUsed for magnitudes in the range 5.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;6.0, providing robustness against outliers while maintaining MSE-like behavior for small errors.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003ePseudo-Huber Loss\u003c/strong\u003e \u003cp\u003eImplemented for magnitudes in the range 6.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026le;\u0026thinsp;7.7, combining the benefits of Huber Loss with a smooth transition between quadratic and linear loss regions.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003ePseudo-Huber Loss (General Model)\u003c/strong\u003e \u003cp\u003eAlso applied to the entire dataset, covering magnitudes 4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026le;\u0026thinsp;7.7, to evaluate its effectiveness across all 
earthquake magnitudes.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eAlongside selecting appropriate loss functions for developing the LSTM model, choosing the right optimizer is equally important. The Adam optimizer (Adaptive Moment Estimation) was chosen for training due to its proven effectiveness in earthquake signal classification, as supported by comparative studies. Its adaptive learning rate, which adjusts based on past gradients, enables faster convergence and stable training\u0026mdash;critical for handling noisy earthquake time-series data. Additionally, Adam helps mitigate vanishing or exploding gradients, making it a robust choice for LSTM networks in earthquake magnitude prediction. Its default hyperparameters also reduce the need for extensive tuning, improving efficiency [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Computational Setup\u003c/h2\u003e \u003cp\u003eThe LSTM models in this study were implemented in Python 3.9.21 [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] using TensorFlow 2.12.0 on a machine running Windows 10 with an Intel Core CPU (AMD64 Family 23 Model 24 Stepping 1, Authentic AMD) and 15.91GB RAM. The models were trained using the CPU, as GPU acceleration was not available during the experiments. However, for LSTM networks with large datasets, it\u0026rsquo;s recommended to use GPU acceleration to reduce training and prediction times. This is crucial for iterative tasks like hyperparameter tuning or when dealing with real-time applications [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. 
Data preprocessing and visualization were performed using NumPy, Pandas, and Matplotlib.\u003c/p\u003e \u003cp\u003eThe models were trained on different magnitude ranges with varying batch sizes to optimize performance for each specific range:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eModel 1 (4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;5.0) with batch size 64,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eModel 2 (5.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;6.0) with batch size 32,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eModel 3 (6.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;7.7) with batch size 16,\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eModel 4 (4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026le;\u0026thinsp;7.7) with batch size 64.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe varied batch sizes for the LSTM models were determined through grid search and cross-validation to achieve optimal performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Model 1: Magnitude Range of 4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;5.0\u003c/h2\u003e \u003cp\u003eFor earthquakes within the 4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;5.0 range, an LSTM model was developed and trained using historical seismic data from Iran. 
The data was preprocessed, converted to moment magnitude (mb) and resampled into daily, monthly, and yearly magnitudes before being scaled using MinMaxScaler to normalize values.\u003c/p\u003e \u003cp\u003e \u003cb\u003eModel Architecture \u0026amp; Training\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eInput Layer\u003c/b\u003e: Sequences of past earthquake magnitudes, with a length of {3, 7, 14, 30}, determined through hyperparameter tuning.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLSTM Layers\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eFirst LSTM layer with {32, 64, 128} units (selected via tuning), returning sequences to maintain temporal dependencies.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDropout layer (0.1\u0026ndash;0.3) for regularization.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSecond LSTM layer (same unit range), returning a final output sequence.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDropout layer (0.1\u0026ndash;0.3) to prevent overfitting.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDense Output Layer\u003c/b\u003e: A fully connected layer (Dense(1)) that predicts the next earthquake magnitude.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLoss Function\u003c/b\u003e: Custom Time-Weighted Mean Squared Error (MSE), prioritizing recent seismic activity while penalizing larger errors.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eOptimization \u0026amp; Hyperparameter Tuning\u003c/b\u003e:\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eAdam (learning_rate\u0026thinsp;=\u0026thinsp;0.001).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eConducted via GridSearchCV with 3-fold cross-validation, specifically optimizing the number of 
LSTM units.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEvaluation Metrics\u003c/b\u003e: MAE, RMSE and Correlation Coefficient\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe model was trained using sequences of length 3, and its predictive performance was analyzed using statistical metrics. The results, visualized in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e, demonstrated the model\u0026rsquo;s effectiveness, with key performance metrics reported in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTime-Weighted MSE Mathematical Formula for the Loss Function\u003c/b\u003e:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:{\\mathcal{L}}_{\\text{TW-MSE}}=\\frac{1}{N}\\sum_{i=1}^{N}{w}_{i}{\\left({y}_{\\text{true},i}-{y}_{\\text{pred},i}\\right)}^{2}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eWhere:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:N\\)\u003c/span\u003e \u003c/span\u003e is the number of samples.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{y}_{\\text{true},i}\\)\u003c/span\u003e \u003c/span\u003e is the actual (true) value at time step \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:i\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{y}_{\\text{pred},i}\\)\u003c/span\u003e \u003c/span\u003e is the predicted value at time step \u003cspan 
class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:i\\)\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{w}_{i}=\\frac{i}{N}\\)\u003c/span\u003e \u003c/span\u003e is the weight assigned to each data point, increasing over time, with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:i\\)\u003c/span\u003e\u003c/span\u003e representing the index of the time step [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e].\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe given loss function differs from the standard Mean Squared Error (MSE) by incorporating time-dependent weighting for data points. Unlike MSE, which treats all data points equally, this modified loss function assigns greater emphasis to more recent data, reflecting the intuition that recent errors are more critical in time-series forecasting. By prioritizing the minimization of recent discrepancies, this approach enhances predictive accuracy and improves decision-making based on up-to-date information. This method aligns with recent advancements in time-series loss functions, such as the work by Jadon et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], which highlights the advantages of regression-based loss functions that account for time-dependent characteristics. By integrating these insights, the proposed loss function strengthens model performance and ensures robust forecasting across various applications. 
[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Model 2: Magnitude Range of 5.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;6.0\u003c/h2\u003e \u003cp\u003eAs with Model 1, an LSTM model was developed from historical seismic data from Iran to predict earthquake magnitudes in the 5.0 to 6.0 range. The data were preprocessed, converted to moment magnitude (Mw), and resampled into daily, monthly, and yearly intervals.\u003c/p\u003e \u003cp\u003e \u003cb\u003eModel Architecture \u0026amp; Training\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eInput Sequences\u003c/b\u003e: Earthquake magnitudes were segmented into time sequences of length 3.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLSTM Configuration\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA single LSTM layer with {50, 100, 150} units (optimized using GridSearchCV).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eReLU activation and He-normal initialization for stable training.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLoss Function\u003c/b\u003e: A custom Huber loss function was implemented to mitigate the impact of outliers.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eOptimization \u0026amp; Hyperparameter Tuning\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eAdam optimizer with a tuned learning rate (candidates: 0.001, 0.0005, 0.0001).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eGridSearchCV (3-fold cross-validation) was used to optimize LSTM units, batch size (16, 32, 64), and learning rate.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/li\u003e 
\u003cli\u003e \u003cp\u003e \u003cb\u003eEvaluation Metrics\u003c/b\u003e: MAE, RMSE and Correlation Coefficient\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e illustrates the actual vs. predicted earthquake magnitudes, highlighting the model\u0026rsquo;s effectiveness, while the corresponding evaluation metrics are detailed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eHuber Loss Mathematical Formula for the Loss Function\u003c/b\u003e:\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:{L}_{\\delta}\\left(a\\right)=\\left\\{\\begin{array}{c}\\frac{1}{2}{a}^{2}\\quad\\text{for }\\left|a\\right|\u0026lt;\\delta\\\\\\delta\\left(\\left|a\\right|-\\frac{1}{2}\\delta\\right)\\quad\\text{for }\\left|a\\right|\\ge\\delta\\end{array}\\right.$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(a={y}_{\\text{true}}-{y}_{\\text{pred}}\\)\u003c/span\u003e \u003c/span\u003e is the error (difference between the true and predicted values).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\delta\\)\u003c/span\u003e \u003c/span\u003e (set to 1.0 in the code) is the threshold that differentiates between the quadratic and linear loss regions. 
[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThis formulation makes Huber loss less sensitive to outliers than Mean Squared Error (MSE), as it behaves like MSE for small errors (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|a\\right|\u0026lt;\\delta\\)\u003c/span\u003e\u003c/span\u003e) and like Mean Absolute Error (MAE) for larger errors (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|a\\right|\\ge\\delta\\)\u003c/span\u003e\u003c/span\u003e). Huber loss was therefore adopted for magnitude prediction in this range, providing robustness against outliers [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Model 3: Magnitude Range of 6.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;7.7\u003c/h2\u003e \u003cp\u003eBecause LSTM models are known to require substantial amounts of data to achieve optimal performance (Hestness et al., 2017), predicting earthquakes in the higher magnitude range of 6.0 to 7.7 is challenging owing to the inherent scarcity of such events. Our dataset contained only 103 events in this magnitude range, which is a key limitation when assessing the model's performance. To address this issue, we implemented Gaussian Noise Augmentation as a strategy to enhance the dataset and improve model training. 
[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/p\u003e \u003cp\u003eGaussian noise follows a normal distribution, whose probability density is defined as:\u003c/p\u003e \u003cp\u003e \u003cem\u003eN\u003c/em\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mu\\:\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{\\sigma\\:}^{2}\\)\u003c/span\u003e\u003c/span\u003e) = \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\frac{1}{\\sqrt{2\\pi\\:{\\sigma\\:}^{2}}}{e}^{-\\frac{{\\left(x-\\mu\\:\\right)}^{2}}{2{\\sigma\\:}^{2}}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\mu\\:\\)\u003c/span\u003e \u003c/span\u003e is the mean magnitude of earthquakes in the range 6 to 7.7 from the dataset.\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:{\\sigma\\:}^{2\\:}\\)\u003c/span\u003e \u003c/span\u003e is the variance of the magnitudes in this range.\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\sigma\\:\\:\\)\u003c/span\u003e \u003c/span\u003e is the standard deviation of the earthquake magnitudes.\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:x\\:\\)\u003c/span\u003e \u003c/span\u003e is a random variable following the normal distribution.\u003c/p\u003e \u003cp\u003eTo introduce controlled noise into the magnitude 6\u0026ndash;7.7 earthquake data, each value is modified as:\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" 
name=\"EquationSource\"\u003e\n$$\\:{x}^{{\\prime\\:}}=x+\\mathcal{N}\\left(\\mu\\:,{\\sigma\\:}^{2}\\right)$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:{x}^{{\\prime\\:}}\\)\u003c/span\u003e\u003c/span\u003e is the new magnitude value after adding Gaussian noise and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mathcal{N}\\left(\\mu\\:,{\\sigma\\:}^{2}\\right)\\)\u003c/span\u003e\u003c/span\u003e is noise sampled from a normal distribution. Since the magnitude range is relatively narrow (6\u0026ndash;7.7), we set \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\mu\\:=0\\)\u003c/span\u003e\u003c/span\u003e (zero-mean noise) and choose \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\sigma\\:\\)\u003c/span\u003e\u003c/span\u003e as a small fraction of the standard deviation to avoid unrealistic values.\u003c/p\u003e \u003cp\u003eTo predict earthquakes in the 6.0\u0026ndash;7.7 magnitude range, the LSTM model incorporates a custom Pseudo-Huber loss function to mitigate the impact of outliers.\u003c/p\u003e \u003cp\u003e \u003cb\u003eModel Architecture \u0026amp; Training\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eInput Sequences\u003c/b\u003e: Earthquake magnitudes were structured into time sequences of length 3.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLSTM Configuration\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA single LSTM layer with {50, 100, 150} units, optimized via GridSearchCV.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eReLU activation with He-normal initialization for stable weight updates.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/li\u003e \u003cli\u003e 
\u003cp\u003e \u003cb\u003eOptimization and Hyperparameter Tuning\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eAdam optimizer (learning rate\u0026thinsp;=\u0026thinsp;0.001, gradient clipping with clipvalue\u0026thinsp;=\u0026thinsp;1.0).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eGridSearchCV (3-fold cross-validation) for hyperparameter tuning.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eData Scaling\u003c/b\u003e: MinMaxScaler was used for feature normalization.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEvaluation Metrics\u003c/b\u003e: MAE, RMSE and Correlation Coefficient\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe Pseudo-Huber loss function is a smooth approximation of Huber Loss, designed to be less sensitive to outliers while maintaining differentiability. It is defined as:\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$$\\:{L}_{\\delta\\:}\\left(a\\right)={\\delta\\:}^{2}\\left(\\sqrt{1+{\\left(\\frac{a}{\\delta\\:}\\right)}^{2}}-1\\right)$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eWhere:\u003c/p\u003e \u003cp\u003e \u003cem\u003ea =\u003c/em\u003e \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(y-\\hat{y}\\)\u003c/span\u003e\u003c/span\u003e is the difference between the actual and predicted values.\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\:\\delta\\:\\)\u003c/span\u003e \u003c/span\u003e is a parameter that controls the transition between quadratic and linear behavior.\u003c/p\u003e \u003cp\u003eFor small errors (|\u003cspan class=\"InlineEquation\"\u003e\u003cspan 
class=\"mathinline\"\u003e\\(\\:a\\)\u003c/span\u003e\u003c/span\u003e| ≪ \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\delta\\:\\)\u003c/span\u003e\u003c/span\u003e) Pseudo-Huber loss behaves like the Mean Squared Error (MSE), while for large errors (|\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:a\\)\u003c/span\u003e\u003c/span\u003e| ≫ \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\delta\\:\\)\u003c/span\u003e\u003c/span\u003e), it behaves more like the Mean Absolute Error (MAE), reducing the impact of outliers [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Model 4: Magnitude Range of 4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026le;\u0026thinsp;7.7\u003c/h2\u003e \u003cp\u003eUnlike previous models that focused solely on time and magnitude, this model utilizes a broader set of earthquake-related features to enhance predictive accuracy. 
The input consists of a 30-day time-series sequence in which each time step includes six features, including location, depth, root-mean-square (rms) error, azimuthal gap, and minimum distance to the station (dmin).\u003c/p\u003e \u003cp\u003e \u003cb\u003eModel Architecture \u0026amp; Training\u003c/b\u003e:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLSTM Layers\u003c/b\u003e:\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLSTM Layer 1: A recurrent layer with 64 units, configured to return sequences to preserve temporal dependencies across time steps.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLSTM Layer 2: A second LSTM layer with 32 units that extracts deeper temporal patterns.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDropout Layer: A dropout rate of 0.3 is applied to mitigate overfitting.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDense Layer: A fully connected layer with 32 neurons and ReLU activation, refining feature interactions.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eOutput Layer\u003c/b\u003e: A single neuron that predicts the earthquake magnitude.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLoss Function\u003c/b\u003e: The model is trained using the custom Pseudo-Huber loss, balancing robustness against outliers and sensitivity to small errors.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eOptimizer\u003c/b\u003e: The Adam optimizer is used with gradient clipping (clipvalue\u0026thinsp;=\u0026thinsp;1.0) to enhance training stability and prevent exploding gradients.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe model undergoes hyperparameter tuning using a Randomized Search approach, optimizing the following parameters:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eBatch Size\u003c/b\u003e: 
[32, 64]\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEpochs\u003c/b\u003e: [50, 100]\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLSTM Units\u003c/b\u003e: [64, 128]\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLearning Rate\u003c/b\u003e: [1e-3, 5e-4, 1e-4]\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDelta (Pseudo-Huber Loss Parameter)\u003c/b\u003e: [0.5, 1.0, 1.5, 2.0]\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eA 3-fold cross-validation was used to ensure robust model evaluation and prevent overfitting to specific data subsets.\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEvaluation Metrics\u003c/strong\u003e \u003cp\u003eMAE, RMSE and Correlation Coefficient\u003c/p\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3- Models’ Performances and Analysis","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Performance Evaluation of Model 1 (4.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;5.0)\u003c/h2\u003e \u003cp\u003eThe findings indicate a clear trend in predictive performance across different time intervals for earthquakes with magnitudes 4.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;5.0. Yearly data yielded the most accurate predictions, exhibiting the lowest MAE and RMSE values while achieving the highest correlation coefficient. Specifically, yearly predictions had an MAE of 0.0235, an RMSE of 0.0325, and a high correlation of 0.8949. 
These results suggest that yearly aggregated data allows the model to capture long-term seismic trends effectively while reducing short-term noise for this magnitude range (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Performance Evaluation of Model 2 (5.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;6.0)\u003c/h2\u003e \u003cp\u003eDaily predictions were the least reliable within this magnitude range. The daily model exhibited an MAE of 0.1535, RMSE of 0.1927, and a correlation coefficient of 0.4316, indicating a limited ability to capture short-term seismic variations. The relatively low correlation suggests that short-term fluctuations introduce noise, making daily models less effective for precise predictions.\u003c/p\u003e \u003cp\u003eIn contrast, monthly predictions provided a more balanced performance, improving upon the daily model. The monthly model achieved an MAE of 0.1401, RMSE of 0.1781, and a correlation of 0.6073, demonstrating better predictive accuracy while retaining sufficient temporal resolution.\u003c/p\u003e \u003cp\u003eYearly predictions yielded the highest accuracy, with an MAE of 0.0721, RMSE of 0.0871, and a strong correlation of 0.8860, making it the most reliable approach for long-term trend forecasting. 
These results indicate that aggregating data over longer intervals helps mitigate noise and enhance prediction stability, making yearly models more suitable for strategic seismic risk assessment.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eLSTM Model Results for Predicting Earthquakes of Magnitude 4.0-7.7, showing MAE, RMSE, and Correlation at Daily, Monthly, and Yearly Time Intervals, with corresponding Loss Functions.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMagnitude\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTime Interval\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMAE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCorrelation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eLoss Function Used\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e 
\u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e4.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;5.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDaily (D)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.2088\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.2650\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.1817\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eCustom Time-Weighted MSE\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMonthly (M)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.09712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.1259\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5341\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYearly (Y)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.0235\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0325\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8949\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e5.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;6.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDaily 
(D)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.1535\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.1927\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4316\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eHuber Loss\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMonthly (M)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.1401\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.1781\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6073\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYearly (Y)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.0721\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.0871\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8860\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e6.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;7.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDaily (D)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.2345\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.3020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e0.3046\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003ePseudo-Huber loss\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMonthly (M)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.2495\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.3727\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.2071\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYearly (Y)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.1568\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.1790\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.3365\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Performance Evaluation of Model 3 (6.0\u0026thinsp;\u0026le;\u0026thinsp;mag\u0026thinsp;\u0026lt;\u0026thinsp;7.7)\u003c/h2\u003e \u003cp\u003eThe model's performance in predicting large-magnitude earthquakes was assessed across daily, monthly, and yearly time intervals using the Pseudo-Huber loss function, which is effective in handling outliers while maintaining stability. 
The results indicate that yearly predictions yield the best performance, with the lowest error values (MAE\u0026thinsp;=\u0026thinsp;0.1568, RMSE\u0026thinsp;=\u0026thinsp;0.1790) and the highest correlation (0.3365), suggesting that long-term seismic trends are more predictable than short-term fluctuations (Table\u0026nbsp;1). Conversely, daily and monthly predictions exhibit higher errors and weaker correlations, with daily predictions achieving a correlation of 0.3046 and monthly predictions dropping to 0.2071, indicating increased uncertainty in short-term forecasting. This trend highlights the challenges of modeling seismic activity on finer time scales, where localized and unpredictable variations, such as aftershocks, may contribute to deviations in the predicted values. These findings reinforce the importance of selecting appropriate time intervals when forecasting large-magnitude earthquakes and suggest that long-term predictions provide more reliable insights into seismic trends.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo ensure the model\u0026rsquo;s robustness and assess its generalization performance across different subsets of data, we performed 5-fold cross-validation on the earthquake magnitude dataset with the same model. The K-fold cross-validation results for predicting earthquakes of magnitude 6.0\u0026ndash;7.7 show weaker performance than the model without K-fold. Short-term predictions (daily and monthly) remained unreliable, with high MAE and RMSE values and weak or negative correlations, likely due to the sporadic nature of large earthquakes. 
While yearly aggregation improved performance (MAE: 0.2205, RMSE: 0.2638, correlation: 0.2396), it still underperformed compared to the standard model without K-fold, which achieved a lower MAE (0.1568) and RMSE (0.1790) with a higher correlation (0.3365) (Table\u0026nbsp;2).\u003c/p\u003e \u003cp\u003eIt is important to note that K-fold cross-validation assumes that data points are independent and identically distributed (i.i.d.), which is often not valid for time-series data due to its temporal dependencies [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. Time-series-specific validation methods, which preserve temporal order, could provide a more accurate assessment.\u003c/p\u003e\u003cp\u003e \u003cem\u003eTable 2. Results of K-fold cross-validation for the magnitude 6.0\u0026ndash;7.7 LSTM model\u003c/em\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFrequency\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAverage MAE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAverage RMSE\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAverage Correlation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDaily\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.29188997\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.378925037\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e-0.082527095\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMonthly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.285678448\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.35724987\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e-0.021537634\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYearly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.220487958\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.263773797\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.23957514\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Performance Evaluation of Model 4 (4.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026le;\u0026thinsp;7.7)\u003c/h2\u003e \u003cp\u003eWhen the LSTM model was trained and evaluated on the entire magnitude range of earthquakes in Iran, a significant improvement in performance was observed. The scatter plot of predicted vs. true magnitudes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) reveals a strong positive correlation of 0.90.
This high correlation suggests that the model effectively captures the underlying relationship between predicted and actual magnitudes. Furthermore, the error distribution, as depicted in the histogram, is centered around zero and closely resembles a normal distribution, indicating unbiased and randomly distributed errors. This finding aligns with the research of Wang, Guo, Yu, and Li [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], which highlights the role of geographic information in improving LSTM models for earthquake prediction. While the model demonstrates excellent overall performance, some clustering of predictions appears in the lower magnitude range, indicating a potential area for further investigation. These results contrast sharply with the challenges encountered when focusing solely on the 6.0\u0026ndash;7.7 magnitude range, where data scarcity and model instability were significant concerns.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4- Discussion","content":"\u003cp\u003eThe results demonstrate the model\u0026rsquo;s capability to effectively capture long-term seismic trends and achieve high prediction accuracy, particularly when trained on the entire magnitude range of earthquakes. These results suggest that LSTM networks have considerable potential for building earthquake hazard models and warning systems. Our previous research on the seismicity of the Alborz and Zagros regions applied time-series analysis to study earthquake recurrence rates and temporal patterns [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. While such statistical approaches provide valuable insights, they often struggle with capturing non-linearity, complex feature interactions, and long-term dependencies in seismic activity. 
Simple time-series models like moving averages or autoregressive methods assume linear relationships and rely on limited past values, making them insufficient for modeling seismic events influenced by multiple factors such as depth, location, and stress accumulation over time.\u003c/p\u003e \u003cp\u003eIn contrast, LSTM networks overcome these limitations by maintaining memory of long-term dependencies, adapting to non-linear patterns, and integrating multiple seismic parameters into the forecasting process. LSTM models use gated mechanisms to selectively retain or forget information, making them more suitable for handling irregular seismic events and improving predictive accuracy.\u003c/p\u003e \u003cp\u003eThe results also show the impact of implementing different loss functions and feature engineering approaches on model performance, highlighting key insights and areas for improvement. Despite promising results, short-term forecasting (daily predictions) exhibited higher error rates due to seismic variability. Additionally, data scarcity for large-magnitude earthquakes impacted predictive stability. 
Future research should focus on:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eEnhancing daily and monthly prediction accuracy through advanced feature selection and external data integration.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eDeveloping hybrid models combining LSTM with other machine learning techniques to improve robustness.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eExploring real-time seismic data integration for early warning applications.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eBy addressing these challenges, the predictive framework could contribute significantly to seismic risk mitigation efforts.\u003c/p\u003e"},{"header":"5- Conclusion","content":"\u003cp\u003eThe evaluation of LSTM networks for earthquake magnitude prediction in Iran highlights the effectiveness of different loss functions, time intervals, and model architectures:\u003c/p\u003e \u003cp\u003e \u003col style=\"list-style-type:lower-roman;\"\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eYearly earthquake predictions demonstrated the highest accuracy, while daily predictions were less reliable, particularly for magnitudes between 4.0 and 5.0.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eMultivariate models incorporating additional seismic features outperformed univariate models.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePseudo-Huber Loss achieved the best balance of stability and robustness to outliers, enhancing predictive accuracy across magnitudes, while time-weighted MSE worked best for lower magnitudes.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eHyperparameter tuning significantly impacted model performance, with optimized LSTM configurations yielding lower error rates. 
However, tuning hyperparameters for large datasets is computationally intensive, often requiring extensive grid search or random search techniques. The use of GPU acceleration, distributed computing, and model pruning techniques can substantially reduce computational costs and training time, making large-scale seismic forecasting more feasible.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eData augmentation techniques (e.g., Gaussian noise) enhanced model generalization for large-magnitude earthquake predictions.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eFuture studies should focus on refining short-term predictions, integrating external geophysical and seismic data such as peak ground acceleration (\u003cem\u003ePGA\u003c/em\u003e) and peak ground velocity (\u003cem\u003ePGV\u003c/em\u003e) to develop better warning systems, and exploring alternative deep learning architectures to enhance model performance. By improving earthquake forecasting methodologies, this research contributes to seismic hazard assessment and disaster preparedness in Iran.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eDeclaration of interests\u003c/h2\u003e \u003cp\u003eWe have nothing to declare.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eClinical Trial Registration\u003c/strong\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEthics, Consent to Participate, and Consent to Publish declarations\u003c/strong\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding Sources\u003c/h2\u003e \u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eA.G. 
(Alireza Ghotbi) developed the methodology, implemented all coding and modeling in Python, and led the overall research and manuscript preparation. M.R. (Mohammad Rahimi) contributed to the time-series methodology and provided domain expertise on the seismic data. A.Z. (Ahmad Zamani) supervised the research and provided guidance throughout the study. All authors reviewed and approved the final version of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgment\u003c/h2\u003e \u003cp\u003eThe authors thank our co-author, Professor Ahmad Zamani, for supervising this study.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe raw seismic data used in this study were obtained from the USGS Earthquake Catalog (https://earthquake.usgs.gov/earthquakes/search/). The processed datasets and model outputs generated during the current study are not publicly available but can be obtained from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBerhich, A., Belouadha, F.-Z., \u0026amp; Kabbaj, M. I. (2023). \u003cem\u003eAn attention-based LSTM network for large earthquake prediction\u003c/em\u003e. \u003c/li\u003e\n\u003cli\u003eCharbonnier, P., Blanc-F\u0026eacute;raud, L., Aubert, G., \u0026amp; Barlaud, M. (1997). Deterministic edge-preserving regularization in computed imaging. IEEE Transactions on Image Processing, 6(2), 298-311. https://doi.org/10.1109/83.551599\u003c/li\u003e\n\u003cli\u003eDanopoulos, D. et al. (2022). \u003cem\u003eLSTM Acceleration with FPGA and GPU Devices for Edge Computing Applications in B5G MEC\u003c/em\u003e. In: Orailoglu, A., Reichenbach, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2022. Lecture Notes in Computer Science, vol 13511. Springer, Cham. https://doi.org/10.1007/978-3-031-15074-6_26\u003c/li\u003e\n\u003cli\u003eGokcesu, K., \u0026amp; Gokcesu, H. (2021). 
\u003cem\u003eGeneralized Huber Loss for Robust Learning and its Efficient Minimization for Robust Statistics\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eHestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., Patwary, M. M. A., Yang, Y., \u0026amp; Zhou, Y. (2017). \u003cem\u003eDeep learning scaling is predictable, empirically\u003c/em\u003e. arXiv preprint arXiv:1712.00409.\u003c/li\u003e\n\u003cli\u003eHessami, K., \u0026amp; Jamali, F. (n.d.). \u003cem\u003eExplanatory notes to the map of major active faults of Iran\u003c/em\u003e. Seismology Research Center, International Institute of Earthquake Engineering and Seismology (IIEES), 2006\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eHochreiter, S., \u0026amp; Schmidhuber, J.\u003c/strong\u003e (1997). \u003cem\u003eLong short-term memory.\u003c/em\u003e Neural Computation, \u003cstrong\u003e9\u003c/strong\u003e(8), 1735-1780. MIT Press\u003c/li\u003e\n\u003cli\u003eGhafory-Ashtiany, M. (1999). \u003cem\u003eSeismic hazard assessment of Iran. \u003cem\u003eAnnals of Geophysics\u003c/em\u003e\u003c/em\u003e\u003c/li\u003e\n\u003cli\u003eI. W. Mustika, H. N. Adi, and F. Najib, \u003cem\u003eComparison of Keras Optimizers for Earthquake Signal Classification Based on Deep Neural Networks,\u003c/em\u003e in Proceedings of the 2021 4th International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 2021, pp. 304-308. DOI: 10.1109/ICOIACT53268.2021.9563990.\u003c/li\u003e\n\u003cli\u003eJadon, A., Patil, A., \u0026amp; Jadon, S. (2022). \u003cem\u003eA comprehensive survey of regression-based loss functions for time series forecasting.\u003c/em\u003e arXiv. https://arxiv.org/abs/2211.02495\u003c/li\u003e\n\u003cli\u003eKarimpouli, S., Caus, D., Grover, H., Mart\u0026iacute;nez-Garz\u0026oacute;n, P., Bohnhoff, M., Beroza, G. C., Dresen, G., Goebel, T., Weigel, T., \u0026amp; Kwiatek, G. (2023). 
\u003cem\u003eExplainable machine learning for labquake prediction using catalog-driven features\u003c/em\u003e. Earth and Planetary Science Letters, 622, 118383. https://doi.org/10.1016/j.epsl.2023.118383.\u003c/li\u003e\n\u003cli\u003eKhatib, M. M. (2023). \u003cem\u003eSeismic Risk in Alborz: Insights from Geological Moment Rate Estimation and Fault Activity Analysis\u003c/em\u003e. \u003cem\u003eApplied Sciences\u003c/em\u003e, 10(13), 2-15. Switzerland: Typographic.\u003c/li\u003e\n\u003cli\u003eLaurenti, L., Tinti, E., Galasso, F., Franco, L., \u0026amp; Marone, C. (2022). \u003cem\u003eDeep learning for laboratory earthquake prediction and autoregressive forecasting of fault zone stress\u003c/em\u003e. \u003cem\u003earXiv preprint arXiv:2203.13313\u003c/em\u003e. Retrieved from https://arxiv.org/abs/2203.13313.\u003c/li\u003e\n\u003cli\u003eLuxenberg, E., \u0026amp; Boyd, S. (2024). \u003cem\u003eExponentially Weighted Moving Models\u003c/em\u003e. arXiv preprint arXiv:2404.08136.\u003c/li\u003e\n\u003cli\u003eM. Abbasi, M. Kargar, F. Ahmadian, D. NoormohammadZadehMaleki, A. Arandan, and N. S. Hosseini, \u0026quot;GN-CNN-LSTM: \u003cem\u003eFinancial Market Prediction With Gaussian Noise Embedded CNN LSTM\u003c/em\u003e,\u0026quot; 2024 11th International Symposium on Telecommunications (IST), Tehran, Iran, Islamic Republic of, 2024, pp. 287-294, https://doi.org/10.1109/IST64061.2024.10843452\u003c/li\u003e\n\u003cli\u003ePython Software Foundation. (2024). Python (Version 3.9.21) [Computer software]. https://www.python.org\u003c/li\u003e\n\u003cli\u003eRahimi, M, Zamani, Ahmad Ghotbi, Ali Reza, 2022, \u003cem\u003eThe study of seismicity of Alborz (Northern Iran) and Zagros (Southern Iran) regions by using time series analysis\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eShalizi, C. (2015). Lecture 24/25\u003cem\u003e: Weighted and Generalized Least Squares\u003c/em\u003e. stat.cmu.edu\u003c/li\u003e\n\u003cli\u003eUnited States Geological Survey. (n.d.). 
Earthquake Hazards Program. https://earthquake.usgs.gov\u003c/li\u003e\n\u003cli\u003eVamsikrishna, A., Gijo, E.V. \u003cem\u003eNew Techniques to Perform Cross-Validation for Time Series Models\u003c/em\u003e. \u003cem\u003eOper. Res. Forum\u003c/em\u003e 5, 51 (2024). https://doi.org/10.1007/s43069-024-00334-8\u003c/li\u003e\n\u003cli\u003eWang, Q., Zhang, Y., Zhang, J. et al. On the use of VMD-LSTM neural network for approximate earthquake prediction. Nat Hazards 120, 13351\u0026ndash;13367 (2024). https://doi.org/10.1007/s11069-024-06724-9\u003c/li\u003e\n\u003cli\u003eWang, Q., Guo, Y., Yu, L., \u0026amp; Li, P. (2017). Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Iran, LSTM, earthquake, neural network, time-series","lastPublishedDoi":"10.21203/rs.3.rs-6395698/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6395698/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIran, located along an active seismic belt, frequently experiences destructive earthquakes, making accurate forecasting crucial for disaster preparedness and risk mitigation. This study employs Long Short-Term Memory (LSTM) networks to predict earthquake magnitudes using a dataset of 6,916 seismic events recorded in Iran from 1900 to the present, with magnitudes ranging from 4.0 to 7.7. Various loss functions and resampling methods were applied to optimize predictive accuracy, and the performance of four models was compared. Results indicate that LSTM networks achieved a high correlation across the full magnitude range, with yearly resampling yielding the most accurate predictions. For large earthquakes (6.0\u0026thinsp;\u0026le;\u0026thinsp;M\u0026thinsp;\u0026lt;\u0026thinsp;7.7), the Pseudo-Huber loss function improved model stability, though predictions were constrained by data scarcity. While daily and monthly predictions exhibited higher variance, yearly forecasting provided more reliable long-term trends. This study underscores the importance of selecting appropriate time intervals and loss functions in earthquake prediction models. 
The findings contribute to seismic hazard assessment efforts and can aid in developing early warning systems for earthquake-prone regions.\u003c/p\u003e","manuscriptTitle":"Performance Evaluation of LSTM Networks for Earthquake Magnitude Prediction in Iran","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-07 08:39:32","doi":"10.21203/rs.3.rs-6395698/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d842ab78-a285-4938-b99e-708b7bc46043","owner":[],"postedDate":"May 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-09-15T10:38:24+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-07 08:39:32","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6395698","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6395698","identity":"rs-6395698","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}