Application of automated seismic event detection in a low seismicity region of the Baltic States | Research Square

Research Article

Viesturs Zandersons, Jānis Karušs

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8988784/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 6

Abstract

Reliable earthquake catalogues in stable continental regions are difficult to obtain due to sparse station coverage, low signal-to-noise ratios, and the predominance of low-magnitude and anthropogenic events. We evaluated the performance of three deep learning phase picking algorithms – Earthquake Transformer, PhaseNet, and Generalized Phase Detection (GPD) – combined with two phase association methods, Gaussian Mixture Model Association (GaMMA) and PyOcto, using seismic data from the Baltic States between January and October 2021.
Automatic detections are benchmarked against manually compiled observations from the Latvian Environment, Geology, and Meteorology Centre. The results show that PhaseNet and Earthquake Transformer substantially outperform GPD in terms of event recall. The PyOcto associator generally produces higher recall but lower precision than GaMMA. Relocating PyOcto events with HypoInverse significantly reduces recall, highlighting the sensitivity of sparse networks to misassociated or slightly mistimed phase picks. Detection performance strongly depends on the number of available phase observations; events recorded by fewer than five picks are rarely recovered reliably. Our analysis shows that automatic workflows are highly sensitive to the number and spatial distribution of phase observations. Ensemble combinations of multiple pickers and associators improve recovery but also amplify false detections if not carefully constrained. The results demonstrate that parameter tuning, association strategy, and network configuration together govern catalogue quality in low-seismicity intraplate environments.

Keywords: machine learning, Baltica, automatic event detection, seismic event catalogue

Article Highlights

Machine learning and statistical seismic event detection tools are tested in a sparsely monitored intraplate area of the Baltic States.
Sparse station coverage strongly limits fully automatic catalogues, but combining multiple methods can improve recall.
Events recorded with seven or more seismic phase readings are detected much more reliably.

1. Introduction

Seismology and seismological observation rely heavily on the detection and location of earthquakes. Observed earthquakes and other seismic events are crucial for determining seismic risk in any given area. The accuracy of earthquake catalogues is largely determined by the number and precision of arrival time measurements for each observed event.
Typically, phase picking, which involves selecting the arrival times of specific seismic waves, is carried out by experienced seismologists using their professional judgement. However, the rapid expansion of seismometer deployment has made it increasingly difficult to handle the large volume of data. Automatic phase pickers and event detection algorithms are increasingly employed to identify previously unnoticed seismic events, thus reducing the workload of operators at seismological observatories. Various signal processing algorithms and machine learning (ML) techniques are used for phase detection with different degrees of success (Allen 1978; Withers et al. 1998; Zhu and Beroza 2018; Ross et al. 2018; Kong et al. 2019; Mousavi et al. 2020).

In regions with low seismic activity and sparse station coverage, phase picking becomes even more laborious. A low signal-to-noise ratio can make some wave arrivals difficult to discern, particularly in noisy environments. The proficiency of the seismologist is paramount in such settings, as events must be distinguished from background noise and timed precisely. Conventional automatic tools often perform suboptimally in these scenarios, leading to numerous misclassifications and a high rate of false positives (FP) (Vera Rodriguez et al. 2012).

This research uses seismic data from the Baltic region to evaluate several earthquake detection and phase picking algorithms and to determine which are most effective in regions with predominantly microseismic activity. We compare three automated phase picking algorithms based on deep learning and two event association tools against manual event detection and phase picking conducted by seismologists at the Latvian Environment, Geology, and Meteorology Centre (LEGMC).

2. Geological and seismological setting

In this study, we focus on the territory of the Baltic States (Fig. 1).
The Baltic States are part of the Eurasian plate and are considered part of the Baltica palaeocontinent, which formed during the assembly of the Rodinia supercontinent approximately 1.8–1.75 Ga ago (Torsvik and Cocks 2005; Bogdanova et al. 2008). The consensus is that Baltica was constructed from the amalgamation of various microcontinents (Bogdanova et al. 2015). Throughout the Phanerozoic, Baltica underwent several tectonic collisions (Cocks and Torsvik 2005; Nance et al. 2014). However, the Baltic States are positioned centrally within the Baltica landmass, which insulates them from direct involvement in the tectonic activity that typically affects continental margins (Cocks and Torsvik 2005; Nance et al. 2014).

Geologically, all the Baltic States lie on the East European Craton, where the Proterozoic crystalline basement is overlain by predominantly siliciclastic and carbonate Paleozoic and Mesozoic sedimentary rocks up to two kilometres thick. The Paleogene and Neogene periods were marked by terrestrial conditions with negligible rock formation, and the geological strata are capped by Quaternary deposits from the most recent glacial period (Brangulis et al. 1998). Recent tectonic activity in the area has been sparsely studied; few natural earthquakes have been observed in the upper crust, with magnitudes < 2. Intraplate earthquakes in Estonia have been reported to have strike-slip and reverse mechanisms along steep to subvertical fault planes. The main stresses are likely attributable to a remote stress field from the opening of the Atlantic or to glacioisostatic rebound (Soosalu et al. 2022; Hellqvist et al. 2015). With only a few low-magnitude events and no active tectonic faults, the Baltic States are thus characterised by relative seismic inactivity.
The seismicity is monitored separately by the three Baltic States, but here we focus on the seismic observatory in Latvia, operated by the LEGMC, as it uses information from all surrounding seismic stations in the Baltics. The LEGMC seismic observatory was operated from 2009 until 2021. Seismic events were monitored using a virtual seismic network (BAVSEN – Baltic Virtual Seismic Network) of stations surrounding the Baltic States. The stations in question are shown in Figure 1. Initial onset detection was done manually by eye, picking the first arrivals from the waveforms and associating them with seismic events using the Seisan software (Havskov and Ottemoller 1999). As the events were compiled manually, there were no fixed criteria for phase count or minimum signal-to-noise ratio; these were left to the judgement of the working seismologist. However, because the instrumentation is sparse, the parameters of a large majority of events are derived from five or fewer seismic phases. The observatory monitored both anthropogenic and natural seismicity, classifying the events manually. It also focused on teleseismic events detected by the seismic stations in the virtual network.

Between 2008 and 2021, the LEGMC detected 6952 seismic events in the vicinity of the Baltic States (Figure 1), the overwhelming majority believed to be anthropogenic, such as explosions from regional mining operations. The magnitude (Mw) of these events ranged from less than 1 to 3.5 (Ņikuļins 2020). During the first ten months of 2021, the LEGMC observatory recorded 844 events, of which 250 were observed around the territory of the Baltic States (latitude 52°N–60°N, longitude 18°E–30°E). These events are subsequently used in the study.

3. Materials and methods

In this research, we used a suite of automatic phase detection algorithms, coupled with phase association techniques and event location methods, to analyse seismic waveforms collected from various seismic observatories across the Baltic States.
These automated results were then benchmarked against the manual observations recorded by the LEGMC from January to October 2021. The complete methodology, including the processing chain, is illustrated in Fig. 2 and elaborated on in the following sections.

We used data obtained from 10 seismic stations located around the Baltic States. The seismic stations are spaced approximately 100-150 km apart and are unevenly distributed; notably, there is a significant gap in the network within eastern Latvia. Estonia and Lithuania have a denser network, with Estonia operating 3 broadband seismic stations and Lithuania 2. To enhance the phase picking process, we also incorporate data from observatories in Poland, Denmark, and the southern region of Finland (as shown in Fig. 1) (Quinteros et al. 2021; Geological Survey Of Estonia (EGT) 1996; Institute of Geophysics, Polish Academy of Sciences 1990; Institute of Seismology 1980; GEUS Geological Survey of Denmark and Greenland 1976). The selection of stations was informed by the need for consistency with previous seismological work conducted by LEGMC, which primarily used data from these observatories (Ņikuļins 2020).

3.1. Phase detection algorithms

We focus on three phase detection algorithms. The first is Generalized Phase Detection, introduced by Ross et al. (2018). The second, PhaseNet, was created by Zhu and Beroza (2018). Finally, we examine the Earthquake Transformer, introduced by Mousavi et al. (2020).

Generalized Phase Detection (GPD) is a seismic phase extraction algorithm based on a convolutional neural network (CNN) (Ross et al. 2018). CNNs, commonly used in image processing, employ a series of digital filters to convolve data and extract features. These features can then be used for classification within the network. GPD processes 4-second moving snippets of the seismic record.
These snippets are first passed through a feature extraction stage and then fed into a fully connected neural network, which uses an activation function to produce probabilities for the P-pick, S-pick, and noise classes.

PhaseNet (PN) is a U-Net-based deep learning model for P- and S-wave arrival time picking (Ronneberger et al. 2015; Zhu and Beroza 2018). The algorithm processes 30-second-long three-component seismograms, which pass through a sequence of four downsampling stages followed by four upsampling stages. Skip connections at each level pass information directly between layers, bypassing the deeper layers, which improves convergence during training. The output layer uses a softmax function (a normalised exponential) to assign probabilities to noise, P-picks, and S-picks. These probabilities are then presented as curves that can subsequently be used to detect seismic events.

Earthquake Transformer (EQT) is a seismic phase and event detection algorithm that combines CNNs, long short-term memory units, and self-attention layers, techniques usually applied to language modelling and longer time-series prediction tasks (Mousavi et al. 2020). EQT analyses 60-second segments of three-component seismograms to calculate the probabilities of seismic event detection, as well as the precise timing of P-wave and S-wave arrivals.

The ML models employed in this study were originally trained on various datasets, both regional and global, and have demonstrated a significant capacity for generalisation (in this context, a model's ability to detect seismic events beyond the scope of its training data, encompassing different regions or types of events). We took inspiration from the work of Münchmeyer et al. (2022), who refined and retrained these models within the PyTorch framework (Paszke et al. 2017). Considering the generalisation tests in (Münchmeyer et al.
2022) and given the seismological context of the Baltic States, characterised by low-magnitude local events and seismic waves traversing a sedimentary layer atop a platform basement, we opted for models that had been retrained on the INSTANCE benchmark dataset. This dataset comprises over 1.3 million local and regional three-component waveforms from approximately 50,000 earthquakes in and around Italy (Michelini et al. 2021). We used the pre-trained models available in the SeisBench Python package, which also offers a convenient API for phase detection and model implementation (Woollam et al. 2022).

The automatic phase picking algorithms analysed the seismic data in one-hour segments over the study period from 1 January to 31 October 2021. We tuned the phase detection parameters for each algorithm, testing 11 different P and S detection thresholds for each. The thresholds were chosen based on a review of other studies that use these algorithms (Lim et al. 2024; García et al. 2022; Lapins et al. 2021). The tested thresholds are listed in Table 1.

Table 1. Phase detection probability thresholds for ML pickers

Phase  Picker  Thresholds
P      GPD     0.40, 0.50, 0.70, 0.80, 0.85, 0.90, 0.92, 0.94, 0.96, 0.98, 0.99
P      EQT     0.02, 0.03, 0.05, 0.08, 0.12, 0.18, 0.24, 0.30, 0.35, 0.40, 0.45
P      PN      0.03, 0.05, 0.08, 0.12, 0.18, 0.24, 0.30, 0.35, 0.40, 0.45, 0.50
S      GPD     0.65, 0.80, 0.85, 0.88, 0.90, 0.92, 0.94, 0.96, 0.975, 0.985, 0.995
S      EQT     0.04, 0.05, 0.08, 0.12, 0.18, 0.24, 0.30, 0.36, 0.42, 0.50, 0.60
S      PN      0.08, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60

3.2. Phase association

Phase association is an essential component of seismic monitoring, serving to link automatically detected phase arrivals to specific seismic events. Numerous automated methods for phase association have been developed, including iterative search and backprojection within multi-scale association webs (Yeck et al.
2019), cluster search (Ester et al. 1996), and the use of likelihood distributions for arrival data (Yeck et al. 2019). In this study, we implemented the Gaussian Mixture Model Association (GaMMA) method, developed by Zhu et al. (2022), and the PyOcto associator, created by Münchmeyer (2024).

The GaMMA approach treats phase association as an unsupervised clustering problem. It is based on the assumption that each earthquake can be represented by a cluster of P and S phases, with their arrival times following a hyperbolic delay relative to the distance from the earthquake's origin. This hyperbolic pattern is also assumed for the phase amplitudes. To model the arrival times and amplitudes of the detected P and S phases, a multivariate Gaussian distribution is constructed for each seismic event. Subsequently, an expectation-maximisation algorithm applies the maximum likelihood criterion to assign phase picks to their respective seismic events.

One intrinsic limitation of the Gaussian mixture model is that the number of earthquake events must be assumed in advance. GaMMA circumvents this issue by integrating Bayesian statistics, which introduces prior probabilistic knowledge into the parameters of the Gaussian mixture model. This allows the algorithm to start with a large number of potential events and methodically filter out the FP events that deviate markedly from the prior distributions (Zhu et al. 2022).

We employ the GaMMA model in Bayesian mode with an oversampling factor of 4, which controls the initial number of earthquake clusters (a larger factor increases the cluster count). The maximum phase residual time is 10 seconds. The initial earthquake clusters are uniformly initialised over the entire region in both time and space.
Then, by iteratively fitting the earthquake parameters of the clusters to the picks through minimisation of the Huber loss function, GaMMA calculates both the associations of picks and the potential locations of the events. To reduce the computational cost of GaMMA, the DBSCAN clustering algorithm is used to segment the long sequence of picks before association. DBSCAN is configured to distinguish events with at least 4 different phase picks, with a maximum time of 50 seconds between picks assigned to the same cluster. In this study we configure GaMMA to use a homogeneous one-layer earth velocity model with a P-wave velocity of 6.5 km/s and an S-wave velocity of 3.75 km/s. The GaMMA association is geographically limited to 100 km around the Baltic States; this means that global and regional events originating outside this area will be treated as local events.

The PyOcto algorithm treats phase association as an iterative search problem. The 4D space-time volume (three spatial axes and time) is partitioned into nodes, each associated with incoming seismic phases. All the picks are initially assigned to a single node, and the nodes are then either split, used for event location, or discarded if no event can be located. Event location is performed by comparing the arrival times of the picks with a velocity model and minimising the residuals using an equal differential-time loss. Both constant and 1D velocity models can be used (Münchmeyer 2024). For PyOcto to work optimally, we need to provide the parameters used to discard unsuccessful nodes, which relate to the minimum number of picks required to deem a node promising. In this study, we required a minimum of 4 picks, with at least 2 P phase picks and 0 S phase picks. We were forced to use a small number of picks because more than 54% of the events in the LEGMC database contain five or fewer picks.
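As a rough illustration of what the homogeneous velocity model quoted above implies for arrival times, the following Python sketch computes straight-ray P and S travel times and inverts an S-minus-P time for distance. Only the two velocities come from the text; the function names and the 150 km example distance are illustrative assumptions, and neither GaMMA nor PyOcto is called here.

```python
# Straight-ray travel times in a homogeneous half-space (illustrative sketch).
VP_KM_S = 6.5   # P-wave velocity quoted in the text
VS_KM_S = 3.75  # S-wave velocity quoted in the text


def travel_times(distance_km):
    """Return (P, S) travel times in seconds for a ray path of the given length."""
    return distance_km / VP_KM_S, distance_km / VS_KM_S


def distance_from_sp(sp_seconds):
    """Invert an S-minus-P time (s) for the source-receiver distance (km).

    From t_S - t_P = d / v_S - d / v_P it follows that
    d = (t_S - t_P) * v_P * v_S / (v_P - v_S).
    """
    return sp_seconds * VP_KM_S * VS_KM_S / (VP_KM_S - VS_KM_S)


# Hypothetical example: a station 150 km from the source,
# roughly the upper end of the station spacing mentioned in the text.
tp, ts = travel_times(150.0)
print(f"P: {tp:.2f} s, S: {ts:.2f} s, S-P: {ts - tp:.2f} s")
print(f"distance recovered from S-P: {distance_from_sp(ts - tp):.1f} km")
```

With station spacings of 100-150 km, S-minus-P times of this size mean that the picks of a single event are spread over tens of seconds, which is one reason the time windows used for clustering and matching in this workflow are comparatively wide.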
For the velocity model, we use the same values as for GaMMA: a P-wave velocity of 6.5 km/s and an S-wave velocity of 3.75 km/s. The location is again limited to the same area of 100 km around the Baltic States.

3.3. Event relocation

For enhanced precision in determining potential event hypocentres, we employed the HypoInverse software (Klein 2014), using the most recent version 1.4. HypoInverse, developed by the US Geological Survey (USGS), is widely used by seismic networks worldwide for earthquake hypocentre location. Configured for this study, the software initialises earthquake depths at 0 km, suitable for the predominantly shallow or anthropogenic events typical of the region. A minimum of four phases is required for the software to locate an event, and we used the default settings for distance and residual weighting. The relocation used one-dimensional P- and S-wave velocity models derived from the EUROBRIDGE '94 seismic reflection profile, as detailed by Bogdanova et al. (2006).

Within the scope of this paper, the relocation of events also served as an indicator of how many events can be located fully automatically. As the station network is very sparse, associators may attribute erroneous, noisy picks that are unrelated to the seismic event. In this way, HypoInverse could serve as another filter for picks when the difference between calculated and observed arrival times is too large.

3.4. Verification

We evaluated the performance of the GaMMA and PyOcto associations and subsequent relocations with HypoInverse by comparing their results with observations from seismologists at LEGMC. To establish whether the automatically detected events corresponded to those identified by LEGMC,
we conducted automated searches within a 60-second time frame surrounding each event documented by LEGMC. The 60-second window was chosen to account for possible timing offsets caused by inaccurate picks being associated with the events. We observed several such cases and wanted to ensure that no event would be missed for this reason. Using the TauP software (Crotwell et al. 1999), we modelled the maximum time an S phase from a synthetic event at one corner of the study domain takes to travel to the opposite corner, which was 56 seconds, close to our chosen search window.

4. Results

The test results for the phase pickers and associators are shown in Figure 3. We observe a clear negative relationship between recall and precision. Recall increases when the P detection threshold (P_thresh) is lowered, but precision decreases substantially. This is expected: a lower P_thresh allows more phases to be picked and, in turn, more events to be associated, while also increasing the number of false detections.

Comparing the pickers, we observe similar trends in event detections from PN and EQT, while GPD significantly underperforms, with a maximum recall of 0.54 and precision of 0.17 when using the GaMMA associator. In addition, the GPD recall values decrease sharply as the phase detection thresholds are increased, with fewer than 40 events detected when P_thresh exceeds 0.9. While a sharp decrease in recall is also observed for both EQT and PN, we did not observe such decreases in their event counts, which is also why precision remains relatively high.

Comparing the two associators, we observe that PyOcto generally associates more events than GaMMA. Especially for lower P_thresh and S_thresh values, PyOcto can associate up to 10 times more events, while on average it associates twice as many events as GaMMA.
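The verification step of Section 3.4 and the precision and recall values reported in the results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: event origin times are plain numbers in seconds, the 60-second window is treated as a half-width around each reference event, matching is greedy and one-to-one, and all names and example times are hypothetical.

```python
from bisect import bisect_left


def count_matches(reference_times, detected_times, window_s=60.0):
    """Count reference events that have an unused detection within +/- window_s.

    Greedy one-to-one matching: each detection can confirm only one reference event.
    """
    detected = sorted(detected_times)
    used = [False] * len(detected)
    matched = 0
    for ref in sorted(reference_times):
        i = bisect_left(detected, ref - window_s)  # first detection inside the window
        while i < len(detected) and detected[i] <= ref + window_s:
            if not used[i]:
                used[i] = True
                matched += 1
                break
            i += 1
    return matched


def precision_recall(n_matched, n_detected, n_reference):
    """Precision = matched / detected; recall = matched / reference events."""
    precision = n_matched / n_detected if n_detected else 0.0
    recall = n_matched / n_reference if n_reference else 0.0
    return precision, recall


# Hypothetical origin times in seconds since the start of the day.
reference = [100.0, 1000.0, 5000.0]
detected = [130.0, 400.0, 1055.0, 9000.0]

n_matched = count_matches(reference, detected)
precision, recall = precision_recall(n_matched, len(detected), len(reference))
print(n_matched, round(precision, 2), round(recall, 2))
```

A wider window makes matching more tolerant of mistimed origin times but also raises the chance that an unrelated detection is counted as a true positive, which is worth keeping in mind when reading the precision values.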
However, although the larger number of events associated by PyOcto raises recall, it also significantly decreases the precision of the detected events, showing that a large part of the increase consists of false positives.

Using the product of recall and precision, we identify the best-performing P_thresh and S_thresh values. Our decision favours the highest recall that does not significantly decrease event precision, so that the results are not overwhelmed by large numbers of false-positive associations. The best-performing configurations are examined in detail in the remainder of the article.

Table 2. Results of the best-performing algorithms compared with LEGMC. Stage (Associated or Relocated) refers to the data processing stage, Associator (GaMMA or PyOcto) to the event associator used, and Picker to the phase detection algorithm used. P_thresh and S_thresh are the thresholds used in phase detection, n is the total event count, and precision and recall refer to comparisons with the LEGMC event database.

Stage       Associator  Picker  P_thresh  S_thresh  n    Precision  Recall
Associated  GaMMA       EQT     0.03      0.05      147  0.51       0.30
Associated  GaMMA       GPD     0.80      0.88      107  0.67       0.28
Associated  GaMMA       PN      0.03      0.08      301  0.38       0.45
Associated  PyOcto      EQT     0.05      0.08      496  0.28       0.55
Associated  PyOcto      GPD     0.85      0.90      52   0.83       0.17
Associated  PyOcto      PN      0.08      0.15      492  0.26       0.51
Relocated   GaMMA       EQT     0.02      0.04      104  0.48       0.20
Relocated   GaMMA       GPD     0.80      0.88      28   0.50       0.06
Relocated   GaMMA       PN      0.03      0.08      156  0.40       0.24
Relocated   PyOcto      EQT     0.45      0.60      5    0.20       0.00
Relocated   PyOcto      GPD     0.92      0.94      2    1.00       0.01
Relocated   PyOcto      PN      0.18      0.25      4    0.25       0.00

The results of the best-performing algorithms are summarised in Table 2. Using the GaMMA associator, PN shows the highest recall (0.45), while with PyOcto the highest recall is obtained with the EQT phase detector (0.55). This comes at a cost in precision: when using GaMMA, PN detects twice as many events as the second-best detector, EQT (301 versus 147), but its precision is lower (0.38 versus 0.51).
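The associated-stage rows of Table 2 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the number of true positives is approximately n × precision and that each true positive corresponds to a distinct one of the 250 LEGMC reference events; both are assumptions made for this illustration rather than definitions stated in the text. It also prints the recall-precision product that the text describes as the selection score.

```python
N_REFERENCE = 250  # LEGMC events in the study area, January to October 2021

# (associator, picker): (n, precision, reported recall), Table 2, associated stage
TABLE2_ASSOCIATED = {
    ("GaMMA", "EQT"): (147, 0.51, 0.30),
    ("GaMMA", "GPD"): (107, 0.67, 0.28),
    ("GaMMA", "PN"): (301, 0.38, 0.45),
    ("PyOcto", "EQT"): (496, 0.28, 0.55),
    ("PyOcto", "GPD"): (52, 0.83, 0.17),
    ("PyOcto", "PN"): (492, 0.26, 0.51),
}


def implied_recall(n, precision, n_reference=N_REFERENCE):
    """Recall implied by n and precision if every true positive is a distinct reference event."""
    return n * precision / n_reference


rows = []
for (associator, picker), (n, precision, recall) in TABLE2_ASSOCIATED.items():
    rows.append({
        "associator": associator,
        "picker": picker,
        "implied_recall": implied_recall(n, precision),
        "reported_recall": recall,
        "score": precision * recall,  # product of precision and recall
    })

for row in rows:
    print(f"{row['associator']:7s} {row['picker']:4s} "
          f"implied={row['implied_recall']:.3f} "
          f"reported={row['reported_recall']:.2f} "
          f"score={row['score']:.3f}")
```

Under these assumptions the implied and reported recall values agree to within rounding for all six rows, which suggests the table is internally consistent. The scores are shown only for illustration; the text applies the product across thresholds within each picker and associator combination, not as a ranking between combinations.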
The two algorithms give very similar precision and recall values when PyOcto is used for association: recall of 0.55 (EQT) and 0.51 (PN), and precision of 0.28 (EQT) and 0.26 (PN). We again observe that PyOcto tends to identify more events, but at the cost of precision, since most of these events cannot be matched to the LEGMC database. With both associators, the GPD phase picker performs significantly worse: the associated event count is much smaller, and the recall of the LEGMC database is therefore also lower. However, the results have high precision (0.67 for GaMMA and 0.83 for PyOcto), showing that the events detected by the combination of GPD and an associator are, in most cases, correct.

The quality of the associated events can be partially assessed by attempting to relocate them. We observe that the recall of relocated events drops significantly, mainly because many of the events cannot be located. The GaMMA associator works better in this regard: while recall drops, we still observe a significant number of relocated events. EQT performs best here; most of its events (104 of 147) can be relocated successfully. PN-picked events can be relocated about half of the time (156 of 301), while GPD performs the worst, with only 28 of 107 events successfully relocated. Relocation performs particularly poorly with the PyOcto associator: only a few events are successfully relocated for any of the algorithms. We examine the reasons for this in the discussion section.

As shown above, performance is generally poor both before and after event relocation. As other studies generally report good performance of deep learning pickers (Chen et al. 2024; Si et al. 2024; Münchmeyer et al. 2022; Zhu and Beroza 2018) and associators (Puente Huerta et al. 2025), we wanted to investigate why they did not perform well in our case.
To better understand this, we compared the recall of each associator-picker combination with the number of LEGMC picks for each event (Figure 4). We observe that for a pick count of 4, the minimum needed for automatic association, the algorithms do not perform well. The best performance here is obtained with PN, where PyOcto and GaMMA combined reach a recall of 0.25, with events found by both associators accounting for 0.11. Recall increases with the pick count; however, for both PN and EQT, recall of ≥ 0.8 is only reached with a pick count of 7 or above. The GPD algorithm again performs significantly worse than PN or EQT, reaching a recall of 0.78 with 9 picks. We also observe that there is a strong basis for using multiple algorithms together, as they can detect different events. Combining multiple pickers and associators (Figure 4, subplot Combined), we obtain a recall of 0.69 even with 6 phase picks and a recall of 1 with 7 picks. In general, these tests show that the quality of the association increases significantly with the number of stations at which traces of the event can be convincingly observed.

To investigate whether the association or the machine-learning picking quality is to blame, we counted raw phase picks without association in 60-second intervals around each of the 250 manually detected events from LEGMC. The fraction of LEGMC events in relation to the pick count is depicted in Figure 5. The hypothesis is that even unmatched events usually have enough picks from the machine learning pickers, while sporadic erroneous or noisy picks disturb the association process. We observe that, on average, unassociated events tend to have a larger or similar number of ML picks (as indicated by the mean picks in Figure 5), but there are fewer ML picks for most (~70%) of the events.
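The gain from combining pickers and associators (Figure 4, subplot Combined) amounts to taking the union of the reference events recovered by each combination. A minimal sketch follows; the event IDs and the sets recovered by each method are entirely hypothetical and are not taken from the study's data.

```python
def recall(recovered_ids, reference_ids):
    """Fraction of reference events present in the recovered set."""
    reference = set(reference_ids)
    if not reference:
        return 0.0
    return len(set(recovered_ids) & reference) / len(reference)


def ensemble_recall(recovered_by_method, reference_ids):
    """Recall of the union of reference events recovered by any method."""
    combined = set().union(*recovered_by_method.values())
    return recall(combined, reference_ids)


# Hypothetical reference events 1..10 and the subsets recovered by each combination.
reference_ids = range(1, 11)
recovered_by_method = {
    "PN+GaMMA": {1, 2, 3, 4},
    "EQT+PyOcto": {3, 4, 5, 6, 7},
    "GPD+GaMMA": {1, 8},
}

individual = {name: recall(ids, reference_ids) for name, ids in recovered_by_method.items()}
combined = ensemble_recall(recovered_by_method, reference_ids)
print(individual, combined)
```

The same union also pools the false positives of every method, so in practice a combined catalogue would need an additional constraint, for example requiring agreement between at least two combinations, to keep precision from falling.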
We also observe that a significant number of unassociated events, around 50-60% for all ML algorithms, have five or fewer picks, while the EQT and PN pickers usually provide at least 4 different picks (for EQT, N<4 in 23 (9.2% of all events) and 21 (8.4%) cases for GaMMA and PyOcto respectively; for PN, N<4 in 4 (1.6%) and 24 (9.6%) cases respectively). This is consistent with the hypothesis that phase association is the likely bottleneck in event detection.

Finally, we investigate the waveforms of selected seismic events to find the main causes of misdetection. The waveforms are compiled in Figures 6 and 7. Here we observe two contrasting outcomes. In cases where automatic detections coincide well with manual picks (e.g., Figure 6.1), both the ML pickers and the subsequent association algorithms can add information in the form of previously missed picks. For example, we see additional P picks at the overlooked SUW station and an additional S phase at the PBUR station. We also observe that some of the S phases (at the SUW station, for example) are misclassified as P phases, but the GaMMA association algorithm can identify them and remove them from the event. Finally, we observe a time difference between the manual and automatic P and S picks (up to 1.56 s at the SLIT station in Figure 6.1, but on average less than 0.5 s), which in many cases is not very large but can nonetheless be significant.

On the other hand, events that do not coincide with automatic observations (Figure 6.2) illustrate the difficulty of the association problem. Multiple sporadic P and S phases are automatically picked throughout the event, some coinciding with manual observations (VSU and RAF stations) and some missing the manual picks altogether (MTSE and ARBE). The presence of additional noisy picks likely interferes with the association process, preventing both algorithms from reliably identifying the seismic event.
These high-noise scenarios likely contribute to the lower recall of the LEGMC database.

We also observe a significant basis for believing that ML can detect events potentially missed by manual observers. Figures 7.1 and 7.2 demonstrate that, in several cases, the automated workflow successfully detects and associates phase arrivals that were not identified by the seismologist. The phases of these events are seen by both the EQT and the PN pickers and are associated by the PyOcto algorithm. Both events are at the lower limit of associated pick counts, with 4 P and S phases in all cases. This shows that, for complete catalogues in sparsely monitored noisy areas, association algorithms need to be tuned to low association thresholds, as otherwise clearly visible events can be missed by seismological operators.

Interestingly, in all four investigated events, the GPD algorithm has fewer picks or picks of poorer quality than EQT and PN. This is most obvious in Figure 6.1, where all other algorithms detect more picks than, or as many picks as, the seismologist, but GPD detects only three picks. In Figure 7 we see that GPD has not detected any seismic phases. This is consistent with the earlier statistics showing that GPD has significantly lower recall than the other ML pickers.

In general, comparing the pickers, we see that PN and EQT perform significantly better than GPD. The difference can be seen both in the recall of the LEGMC database and in the cases that we investigated in more detail. Overall, the difference between PN and EQT is small, as both pickers obtain comparable results. Comparing the event associators, we observe that GaMMA generally provides a lower detection count with improved precision, while PyOcto, which is tuned to require fewer picks and fewer stations for a successful association, shows higher recall but lower precision. These results are clearly visible in Table 2 and Figure 4.

5. Discussion

Our results offer a unique insight into how multiple state-of-the-art phase detection algorithms perform in a sparsely instrumented low-seismicity area. The results are mixed: on the one hand, there are cases where ML pickers and the subsequent association offer a significant edge and allow us to detect new events (Figure 7.1 and Figure 7.2); on the other hand, we see relatively poor recall of the LEGMC database (0.55 with the combination of EQT and PyOcto) and observe multiple manually detected events that are not automatically detected (Figure 6.2).

First, we would like to address potential drawbacks of our methodology. The selection of the optimal picker-associator combination is based on maximising the balance between false positives and true positives relative to the LEGMC reference database, as quantified by the precision-recall trade-off shown in Figure 4. As expected, lowering the pick-detection threshold systematically increases recall. Although this is a straightforward strategy to improve event recovery, it inevitably reduces precision and inflates the number of false-positive detections. A natural extension of this framework would be an additional post-processing classification stage to identify falsely associated events. This step could exploit event characteristics such as location, magnitude, or waveform spectral information. Comparable approaches have been successfully applied in event classification, including neural network-based discrimination of anthropogenic blasts (Eggertsson et al. 2024) and unsupervised learning strategies (Wang et al. 2023). These methodologies could be adapted to specifically target poorly constrained events arising from automatic association.

It is also important to note that the reference bulletin itself is not exhaustive.
Our automatic workflow identifies seismic events that are absent from the manual catalogue, underscoring both the incompleteness of the ground truth and the ambiguity of the detection problem. This further complicates performance assessment and highlights the need for iterative refinement of both automated and reference datasets.

We also want to address the potential concern that the machine-learning pickers employed in this study were predominantly trained on natural earthquake datasets, while the seismicity in the Baltic States is dominated by anthropogenic sources (e.g., quarry blasts). This domain mismatch may influence the picker performance, and we have not explicitly quantified the resulting error within the present analysis. Indeed, isolated cases of clearly spurious, noise-driven phase picks were observed, such as at the BSD station in Figure 6.2, which illustrates the sensitivity of the models to local noise conditions. However, previous studies have demonstrated that ML pickers trained on tectonic earthquakes can generalise effectively to anthropogenic events (Duan et al. 2025). Our own observations are consistent with these findings: despite the prevalence of human-made sources, the overall quality of the phase picks remains high. Although the Baltic region exhibits specific geological characteristics, most notably stations located above relatively thick sedimentary basins that can amplify surface-wave energy and modify signal character, empirical pick counts (Figure 5) suggest that the main bottleneck in event detection lies in the association stage rather than in phase identification itself. Still, a future study re-training ML pickers on data specifically from the Baltic States would be encouraged.

One surprising result of our study was the poor recall after relocation with the HypoInverse locator. There are two likely reasons for this.
First, in a sparsely instrumented network, even a single erroneous phase pick within the association window can destabilise the location solution. An example is shown in Figure 6.2, where the event was not only poorly relocated but failed to associate altogether, likely because S phases were misclassified as P phases at the MTSE and RAF stations. Second, minor but systematic phase-time offsets were observed between the ML-derived and manually reviewed picks. This discrepancy likely stems from the algorithmic design of sliding-window inference, which can slightly reduce the pick precision. Although the observed offsets are small (generally <0.5 s), their cumulative effect, especially when combined with other mispicks, can degrade relocation stability in sparse networks. We therefore recommend that in regions with limited station coverage, phase picks and associations be manually reviewed prior to relocation to mitigate error propagation and improve solution robustness.

6. Conclusions

This study evaluates the performance of three state-of-the-art deep learning phase pickers (EQT, PhaseNet, and GPD) combined with two association algorithms (GaMMA and PyOcto) in a sparsely instrumented, low-seismicity region of the Baltic States. The results provide insight into the practical limitations and potential of automated seismic workflows in stable cratonic environments dominated by low-magnitude and anthropogenic events.

First, we demonstrate that modern deep learning pickers can identify a substantial fraction of manually detected events. PhaseNet and EQT consistently outperform GPD in both recall and overall robustness, whereas GPD exhibits higher precision but substantially lower detection rates. The difference between PhaseNet and EQT is comparatively small, suggesting that both architectures are suitable for similar tectonic and observational settings.
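As a minimal sketch of how recall and precision against a reference bulletin can be computed, the following Python snippet matches automatic event origin times to reference origin times within a time tolerance. The function names, the greedy one-to-one matching, and the 5 s tolerance are illustrative assumptions, not the matching criterion used in this study.

```python
from bisect import bisect_left


def count_matches(auto_times, ref_times, tol_s=5.0):
    """Count reference events that have an unused automatic event within tol_s.

    Times are origin times in seconds. Matching is greedy and one-to-one:
    each automatic event can confirm at most one reference event.
    The tolerance tol_s is an assumed value chosen for illustration.
    """
    auto_sorted = sorted(auto_times)
    used = [False] * len(auto_sorted)
    matched = 0
    for t_ref in sorted(ref_times):
        # First automatic event that can still fall inside the window.
        i = bisect_left(auto_sorted, t_ref - tol_s)
        best = None
        while i < len(auto_sorted) and auto_sorted[i] <= t_ref + tol_s:
            if not used[i]:
                if best is None or abs(auto_sorted[i] - t_ref) < abs(auto_sorted[best] - t_ref):
                    best = i
            i += 1
        if best is not None:
            used[best] = True
            matched += 1
    return matched


def precision_recall(auto_times, ref_times, tol_s=5.0):
    """Return (precision, recall) of an automatic catalogue vs. a reference one."""
    tp = count_matches(auto_times, ref_times, tol_s)
    precision = tp / len(auto_times) if auto_times else 0.0
    recall = tp / len(ref_times) if ref_times else 0.0
    return precision, recall


# Hypothetical origin times in seconds, for illustration only.
reference = [100.0, 200.0, 300.0, 400.0]
automatic = [101.0, 203.0, 398.5, 650.0, 900.0]
precision, recall = precision_recall(automatic, reference)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Note that in this sketch every unmatched automatic event counts against precision, even though some such events may be real events missing from the manual bulletin.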
Second, the results highlight that the primary limitation of the automated workflow lies not in phase detection itself, but in the association and relocation stages. Raw pick statistics indicate that many unmatched events contain enough phase picks to satisfy minimal association thresholds. However, the presence of noisy or misclassified arrivals, combined with the sparse network geometry, significantly hampers the association process. The marked drop in recall after relocation with HypoInverse underscores the sensitivity of sparse networks to single erroneous picks and small timing offsets.

Third, we observe a strong dependence of the recall on the number of available phases. Events within the Baltic States can be recovered with high reliability when seven or more independent phase picks are available, especially when combining multiple pickers and associators, whereas events recorded by fewer than five picks remain difficult to associate robustly. This emphasises the importance of station density and network geometry for achieving good catalogue completeness in low-seismicity regions.

Fourth, the precision-recall trade-off is particularly pronounced in low signal-to-noise environments. Lowering detection thresholds increases recall but rapidly inflates the number of false positives. For operational monitoring, this implies that threshold selection must balance catalogue completeness with analyst workload. Ensemble approaches combining multiple pickers and associators improve recovery of true events but require additional filtering or validation steps.

Importantly, the study also shows that automated workflows can identify events that are not present in the manual bulletin. This indicates that existing catalogues in sparsely monitored intraplate regions may be incomplete and that automated methods serve as a valuable complement to human analysis rather than a replacement.
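The threshold selection discussed in the conclusions above can be sketched in a few lines of Python. The caption of Figure 3 states that the best-performing iteration was determined by multiplying recall and precision; the snippet applies that criterion to a threshold sweep. The sweep values, the function name, and the optional minimum-precision constraint (a stand-in for limiting analyst workload) are hypothetical and serve only as illustration.

```python
def select_threshold(sweep, min_precision=0.0):
    """Return the sweep entry with the largest precision * recall product.

    Entries whose precision is below min_precision are skipped.
    Returns None if no entry qualifies.
    """
    candidates = [row for row in sweep if row["precision"] >= min_precision]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["precision"] * row["recall"])


# Hypothetical sweep values, for illustration only (not results from this study).
sweep = [
    {"threshold": 0.1, "precision": 0.10, "recall": 0.60},
    {"threshold": 0.3, "precision": 0.25, "recall": 0.50},
    {"threshold": 0.5, "precision": 0.40, "recall": 0.30},
]

best = select_threshold(sweep)
constrained = select_threshold(sweep, min_precision=0.3)
print(best["threshold"], constrained["threshold"])
```

With a precision floor, the selected threshold can shift towards stricter picking, which trades some recall for fewer false detections to review.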
Overall, our findings suggest that, in stable continental regions, improvements in phase association and network density yield better results than improvements in phase detection algorithms. While there is a basis for future studies on region-specific picking model training, we find that the focus should be on improving phase association algorithms, taking into account typical event parameters and regional characteristics, while also potentially re-classifying already associated events as true or false positives.

Declarations

Competing interests

The authors report there are no competing interests to declare.

Data availability statement

Data sets generated during the current study are available from the corresponding author on request. Restrictions apply to the LEGMC seismic observatory data, which were used in this study and are not publicly available.

References

Allen RV (1978) Automatic earthquake recognition and timing from single traces. Bull Seismol Soc Am 68:1521–1532. https://doi.org/10.1785/BSSA0680051521
Bogdanova S, Gorbatschev R, Grad M, Janik T, Guterch A, Kozlovskaya E, Motuza G, Skridlaite G, Starostenko V, Taran L, Groups PW (2006) EUROBRIDGE: new insight into the geodynamic evolution of the East European Craton. Geol Soc Lond Mem 32:599–625. https://doi.org/10.1144/GSL.MEM.2006.032.01.36
Bogdanova S, Gorbatschev R, Skridlaite G, Soesoo A, Taran L, Kurlovich D (2015) Trans-Baltic Palaeoproterozoic correlations towards the reconstruction of supercontinent Columbia/Nuna. Precambrian Res 259:5–33. https://doi.org/10.1016/j.precamres.2014.11.023
Bogdanova SV, Bingen B, Gorbatschev R, Kheraskova TN, Kozlov VI, Puchkov VN, Volozh YA (2008) The East European Craton (Baltica) before and during the assembly of Rodinia. Precambrian Res 160:23–45. https://doi.org/10.1016/j.precamres.2007.04.024
Brangulis A, Kuršs V, Misāns J, Stinkulis Ģ (1998) Latvijas ģeoloģiskā karte, mērogs 1:500 000, Ģeoloģiskās uzbūves apraksts.
Valsts Ģeoloģijas dienests, Rīga
Chen Y, Savvaidis A, Siervo D, Huang D, Saad OM (2024) Near Real‐Time Earthquake Monitoring in Texas Using the Highly Precise Deep Learning Phase Picker. Earth Space Sci 11:e2024EA003890. https://doi.org/10.1029/2024EA003890
Cocks LRM, Torsvik TH (2005) Baltica from the late Precambrian to mid-Palaeozoic times: The gain and loss of a terrane’s identity. Earth-Sci Rev 72:39–66. https://doi.org/10.1016/j.earscirev.2005.04.001
Crotwell HP, Owens TJ, Ritsema J (1999) The TauP Toolkit: Flexible Seismic Travel-time and Ray-path Utilities. Seismol Res Lett 70:154–160. https://doi.org/10.1785/gssrl.70.2.154
Duan C, Schmandt B, Maguire R, Wang R, Kong Q (2025) Differential Seismic Phase Detection Probability as a Potential Discriminant of Explosions and Earthquakes. Seism Rec 5:218–227. https://doi.org/10.1785/0320250015
Eggertsson G, Lund B, Roth M, Schmidt P (2024) Earthquake or blast? Classification of local-distance seismic events in Sweden using fully connected neural networks. Geophys J Int 236:1728–1742. https://doi.org/10.1093/gji/ggae018
García JE, Fernández-Prieto LM, Villaseñor A, Sanz V, Ammirati J-B, Díaz Suárez EA, García C (2022) Performance of Deep Learning Pickers in Routine Network Processing Applications. Seismol Res Lett 93:2529–2542. https://doi.org/10.1785/0220210323
Geological Survey Of Estonia (EGT) (1996) Estonian National Seismic Network (EESN)
GEUS Geological Survey of Denmark and Greenland (1976) Danish Seismological Network
Havskov J, Ottemoller L (1999) SeisAn Earthquake Analysis Software. Seismol Res Lett 70:532–534. https://doi.org/10.1785/gssrl.70.5.532
Hellqvist NM, Koskinen PH, Mäntyniemi PB, Uski MR, Valtonen OS, Airo M-L, Huotari-Halkosaari T, Nironen M, Sutinen R, Grigull S, Stephens M, Karin H, Lund B (2015) Seismotectonic framework and seismic source area models in Fennoscandia, northern Europe.
Institute of Seismology, University of Helsinki, Finland
Institute of Geophysics, Polish Academy of Sciences (1990) Polish Seismological Network
Institute of Seismology, University of Helsinki (1980) The Finnish National Seismic Network. >1000GB
Klein FW (2014) User’s Guide to HYPOINVERSE-2000, a Fortran Program to Solve for Earthquake Locations and Magnitudes
Kong Q, Trugman DT, Ross ZE, Bianco MJ, Meade BJ, Gerstoft P (2019) Machine Learning in Seismology: Turning Data into Insights. Seismol Res Lett 90:3–14. https://doi.org/10.1785/0220180259
Lapins S, Goitom B, Kendall J, Werner MJ, Cashman KV, Hammond JOS (2021) A Little Data Goes a Long Way: Automating Seismic Phase Arrival Picking at Nabro Volcano With Transfer Learning. J Geophys Res Solid Earth 126. https://doi.org/10.1029/2021JB021910
Lim CSY, Lapins S, Segou M, Werner MJ (2024) Deep learning phase pickers: how well can existing models detect hydraulic-fracturing induced microseismicity from a borehole array? Geophys J Int 240:535–549. https://doi.org/10.1093/gji/ggae386
Michelini A, Cianetti S, Gaviano S, Giunchi C, Jozinović D, Lauciani V (2021) INSTANCE – the Italian seismic dataset for machine learning. Earth Syst Sci Data 13:5509–5544. https://doi.org/10.5194/essd-13-5509-2021
Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11:3952. https://doi.org/10.1038/s41467-020-17591-w
Münchmeyer J (2024) PyOcto: A high-throughput seismic phase associator. Seismica 3. https://doi.org/10.26443/seismica.v3i1.1130
Münchmeyer J, Woollam J, Rietbrock A, Tilmann F, Lange D, Bornstein T, Diehl T, Giunchi C, Haslinger F, Jozinović D, Michelini A, Saul J, Soto H (2022) Which Picker Fits My Data? A Quantitative Evaluation of Deep Learning Based Seismic Pickers. J Geophys Res Solid Earth 127:e2021JB023499.
https://doi.org/10.1029/2021JB023499
Nance RD, Murphy JB, Santosh M (2014) The supercontinent cycle: A retrospective essay. Gondwana Res 25:4–29. https://doi.org/10.1016/j.gr.2012.12.026
Ņikuļins VG (2020) Seismological Monitoring in Latvia. Summ Bull Int Seismol Cent 54:50–66. https://doi.org/10.31905/BKETRT2R
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch
Puente Huerta JA, Münchmeyer J, McBrearty I, Sippl C (2025) Benchmarking seismic phase associators: Insights from synthetic scenarios. https://meetingorganizer.copernicus.org/EGU24/EGU24-8913.html. Accessed 9 Dec 2025
Quinteros J, Strollo A, Evans PL, Hanka W, Heinloo A, Hemmleb S, Hillmann L, Jaeckel K-H, Kind R, Saul J, Zieke T, Tilmann F (2021) The GEOFON Program in 2020. Seismol Res Lett 92:1610–1622. https://doi.org/10.1785/0220200415
Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, Cham, pp 234–241
Ross ZE, Meier M-A, Hauksson E, Heaton TH (2018) Generalized Seismic Phase Detection with Deep Learning. Bull Seismol Soc Am 108:2894–2901. https://doi.org/10.1785/0120180080
Sheen D-H, Friberg PA (2021) Seismic Phase Association Based on the Maximum Likelihood Method. Front Earth Sci 9:699281. https://doi.org/10.3389/feart.2021.699281
Si X, Wu X, Li Z, Wang S, Zhu J (2024) An all-in-one seismic phase picking, location, and association network for multi-task multi-station earthquake monitoring. Commun Earth Environ 5:22. https://doi.org/10.1038/s43247-023-01188-4
Soosalu H, Uski M, Komminaho K, Veski A (2022) Recent Intraplate Seismicity in Estonia, East European Platform. Seismol Res Lett 93:1800–1811.
https://doi.org/10.1785/0220210277
Torsvik TH, Cocks LRM (2005) Norway in space and time: A Centennial cavalcade. Nor J Geol 85:73–86
Wang T, Bian Y, Zhang Y, Hou X (2023) Classification of earthquakes, explosions and mining-induced earthquakes based on XGBoost algorithm. Comput Geosci 170:105242. https://doi.org/10.1016/j.cageo.2022.105242
Withers M, Aster R, Young C, Beiriger J, Harris M, Moore S, Trujillo J (1998) A comparison of select trigger algorithms for automated global seismic phase and event detection. Bull Seismol Soc Am 88:95–106. https://doi.org/10.1785/BSSA0880010095
Woollam J, Münchmeyer J, Tilmann F, Rietbrock A, Lange D, Bornstein T, Diehl T, Giunchi C, Haslinger F, Jozinović D, Michelini A, Saul J, Soto H (2022) SeisBench—A Toolbox for Machine Learning in Seismology. Seismol Res Lett 93:1695–1709. https://doi.org/10.1785/0220210324
Yeck WL, Patton JM, Johnson CE, Kragness D, Benz HM, Earle PS, Guy MR, Ambruz NB (2019) GLASS3: A Standalone Multiscale Seismic Detection Associator. Bull Seismol Soc Am 109:1469–1478. https://doi.org/10.1785/0120180308
Zhu W, Beroza GC (2018) PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method
Zhu W, McBrearty IW, Mousavi SM, Ellsworth WL, Beroza GC (2022) Earthquake Phase Association using a Bayesian Gaussian Mixture Model. J Geophys Res Solid Earth 127. https://doi.org/10.1029/2021JB023249

Reviews received at journal: 19 Apr, 2026
Reviewers agreed at journal: 27 Mar, 2026
Reviewers invited by journal: 02 Mar, 2026
Editor assigned by journal: 02 Mar, 2026
Submission checks completed at journal: 02 Mar, 2026
First submitted to journal: 27 Feb, 2026
{"props":{"pageProps":{"initialData":{"identity":"rs-8988784","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":599862664,"identity":"dee0547e-9c1f-439b-9a78-34f7c36b6954","order_by":0,"name":"Viesturs Zandersons","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA/klEQVRIie3PMUvEMBTA8VcCccmRNaXiZ6gIcRHyVV4Wb7lBOCgdHAIH8St40A/h5HxH4W4pN3c6KkKnGyoHjmJUUCkkrg75L3lv+JEEIBb7j5HfC/s6kq4DOA2S1YiQHL9nT2NCRYjkW1IfX+xeAa/77lDuFc/u+lJbYPnKQ2p6LdZ2rg1sLs+rZq6XVSPbEEkXTMLaIkJiZDZxQ97O6CdJjY/w4+CIAnLymk3eUOXttL8JEU4YuIdhYihztxhMHlqU8EG45/ucUCmaHWrLWJFWG9TL+9mFwJ3wEsrr56EsUHG+fRSHWzeI6dMwFFdn1EN+7GgXf4FYLBaLBXoHYHFQPiZwtcQAAAAASUVORK5CYII=","orcid":"","institution":"University of Latvia","correspondingAuthor":true,"prefix":"","firstName":"Viesturs","middleName":"","lastName":"Zandersons","suffix":""},{"id":599862665,"identity":"5c378c3a-5b1a-463d-8399-da5e896b624d","order_by":1,"name":"Jānis Karušs","email":"","orcid":"","institution":"University of Latvia","correspondingAuthor":false,"prefix":"","firstName":"Jānis","middleName":"","lastName":"Karušs","suffix":""}],"badges":[],"createdAt":"2026-02-27
13:54:07","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8988784/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8988784/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":104401681,"identity":"30e49904-83c9-422e-809a-ffdd6dc06dd4","added_by":"auto","created_at":"2026-03-11 12:13:15","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":3445103,"visible":true,"origin":"","legend":"\u003cp\u003eGeographical outline of the study area (dashed polygon) with the locations of the Baltic States (tinted red) and the seismic stations used in this research, including the network and station abbreviation formatted as \u003cem\u003enetwork.station\u003c/em\u003e. The coloured dots represent the LEGMC seismic catalogue from 2009 until 2021, alongside histograms representing the magnitudes of the detected events and the crustal phase picks per event (Ņikuļins 2020).\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/48164ba3de099d454da38c07.png"},{"id":103969438,"identity":"ec42d588-dbe0-427a-95cf-ec578c3a9182","added_by":"auto","created_at":"2026-03-05 07:12:08","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":103065,"visible":true,"origin":"","legend":"\u003cp\u003eSeismic data processing flowchart. 
Input data are delineated blue, operations are highlighted in yellow, and results are highlighted in red\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/768b62f32d7be42e21d1ab26.png"},{"id":104402200,"identity":"cd97a44a-3c02-4bec-aa5f-44abc3915b60","added_by":"auto","created_at":"2026-03-11 12:14:38","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":577774,"visible":true,"origin":"","legend":"\u003cp\u003eMulti-threshold test with PyOcto and GaMMA associations. PyOcto and GaMMA refer to associator algorithms, EQT, GPD, PN to phase detection algorithms, P\u003csub\u003ethresh\u003c/sub\u003e refers to the P thresholds used in P phase detection, n associated event count, precision and recall refer to comparisons with LEGMC data base. The red dot marks the best performing iteration of each phase-detection/association combination, determined by multiplication of recall and precision.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/7ff586d1dc277506cbd9241f.png"},{"id":103969455,"identity":"b8eeca45-c4bd-4b95-91d4-b005034331f1","added_by":"auto","created_at":"2026-03-05 07:12:10","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":932721,"visible":true,"origin":"","legend":"\u003cp\u003eRecall of events from the LEGMC database for each of the phase-detectors and event associators. 
The annotations on the bars represent the recall values for each of the associators; The number of picks shows both both P and S picks in the LEGMC database for the events.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/03682ac60add934b06e29f75.png"},{"id":103969450,"identity":"6b43208f-31f9-4791-9838-7ad8281cef9a","added_by":"auto","created_at":"2026-03-05 07:12:09","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":808318,"visible":true,"origin":"","legend":"\u003cp\u003eML pick count in 60 second intervals around LEGMC seismic events. The blue line represents events that are matched during the association step, and the red line represents events that are unmatched.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/c01c7718eab5984ef507b804.png"},{"id":103969456,"identity":"e8f7f296-78f9-4c2d-bafc-78014883947f","added_by":"auto","created_at":"2026-03-05 07:12:10","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":5627160,"visible":true,"origin":"","legend":"\u003cp\u003eSeismograms of two seismic events. Colour depicts the picking method used, shape - detected phase type, transparency – if automatic picks are associated (fully coloured) or not (transparent), title – association method showcased. The Z axis of the waveforms is shown, with a 2-10 Hz bandpass filter applied. Event 1 on 03.03.2021. depicts where associated event coincides with manual event; event 2 on 02.03.2021. 
shows where no association is made while picks and event are manually observed.\u003c/p\u003e","description":"","filename":"Figure6.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/a83806a422db89a029ebaa9c.png"},{"id":103969444,"identity":"99856d83-531c-4c8e-8787-467e51c2394e","added_by":"auto","created_at":"2026-03-05 07:12:09","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":5830586,"visible":true,"origin":"","legend":"\u003cp\u003eSeismograms of two seismic events. Colour depicts the picking method used, shape – detected phase type, transparency – if automatic picks are associated (fully coloured) or not (transparent), title – association method showcased. The Z axis of the waveforms is shown, with a 2-1 Hz bandpass filter applied. Both events show cases in which automatic picking and the association of a local event is made, but no manual picks are detected.\u003c/p\u003e","description":"","filename":"Figure7.png","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/736e302e648a5f07831c097f.png"},{"id":104410779,"identity":"cdf8cb5a-bf51-4873-9f1e-9039e741f5e1","added_by":"auto","created_at":"2026-03-11 12:53:44","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":17510830,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8988784/v1/5e2dcba2-8629-4fb0-9edb-837e08834b6f.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Application of automated seismic event detection in a low seismicity region of the Baltic States","fulltext":[{"header":"Article Highlights","content":"\u003cp\u003eMachine learning and statistical seismic event detection tools are tested in a sparsely monitored intraplate area of the Baltic states.\u003c/p\u003e\n\u003cp\u003eSparse station coverage strongly limits fully automatic catalogues, 
but combining multiple methods can improve recall.\u003c/p\u003e\n\u003cp\u003eEvents recorded with seven or more seismic phase readings are detected much more reliably.\u003c/p\u003e"},{"header":"1. Introduction","content":"\u003cp\u003eSeismological observation and seismology are heavily based on the detection and location of earthquakes. Observed earthquakes and other seismic events are crucial for determining seismic risk in any given area. The accuracy of earthquake catalogues is largely determined by the number and precision of arrival time measurements for each observed event. Typically, phase picking, which involves the selection of arrival times for specific seismic waves, is carried out by experienced seismologists using their professional judgement. However, the rapid expansion of seismometer deployment has made it increasingly difficult to handle the large volume of data. Automatic phase pickers and event detection algorithms are increasingly employed to identify previously unnoticed seismic events, thus alleviating the workload of operators at seismological observatories. Various signal processing algorithms alongside machine learning (ML) techniques are used for phase detection with different degrees of success (Allen 1978; Withers et al. 1998; Zhu and Beroza 2018; Ross et al. 2018; Kong et al. 2019; Mousavi et al. 2020).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn regions with low seismic activity and sparse station coverage, phase picking becomes even more laborious. A low signal-to-noise ratio can make some wave arrivals difficult to discern, particularly in noisy environments. The proficiency of the seismologist is paramount in such settings, as it is necessary to discern events from background noise and to precisely time them. Conventional automatic tools often perform suboptimally in these scenarios, leading to numerous misclassifications and a high rate of false positives (FP) (Vera Rodriguez et al. 
2012).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThis research leverages seismic data from the Baltic region to evaluate various earthquake detection and phase picking algorithms and determine the most effective for regions with predominantly microseismic activity. We compare the efficacy of three automated phase picking algorithms based on deep learning and two event association tools with manual event detection and phase picking conducted by seismologists at the Latvian Environment, Geology, and Meteorology Centre (LEGMC).\u003c/p\u003e"},{"header":"2. Geological and seismological setting","content":"\u003cp\u003eIn this study, we focus on the territory of the Baltic states (Fig.1). Baltic states are a part of Eurasian plate and are considered as part of Baltica paleocontinent that formed during assembly of the Rodinia supercontinent approximately before 1,8 - 1,75 Ga (Torsvik and Cocks 2005; Bogdanova et al. 2008). The consensus is that Baltica was constructed from the amalgamation of various microcontinents (Bogdanova et al. 2015). Throughout the Phanerozoic era, Baltica underwent several tectonic collisions (Cocks and Torsvik 2005; Nance et al. 2014). However, the Baltic States are positioned centrally within the Baltica landmass, insulating them from direct involvement in the dynamic tectonic activity that typically affects the margins of continents (Cocks and Torsvik 2005; Nance et al. 2014).\u003c/p\u003e\n\u003cp\u003eGeologically, all the Baltic states lie on the East European Craton, where the Proterozoic crystalline basement is overlaid by predominantly siliciclastic and carbonate Paleozoic and Mesozoic sedimentary rocks, with their thickness reaching up to two kilometres. The Paleogene and Neogene periods were marked by terrestrial conditions, with negligible rock formation, and the geological strata are capped with Quaternary deposits from the most recent glacial period (Brangulis et al. 
1998).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRecent tectonic activity in the area has been sparsely studied; few natural earthquakes have been observed in the upper crust, with magnitudes \u0026lt; 2. The intraplate earthquakes in Estonia have been reported of strike-slip and reverse origin, along steep to subvertical fault planes. The main stress fields are likely attributed to a remote stress field from the opening of the Atlantic or the glacioisostatic rebound (Soosalu et al. 2022; Hellqvist et al. 2015).\u003c/p\u003e\n\u003cp\u003eDue to only a few low-magnitude events and no active tectonic faults, the Baltic States are thus characterised by their relative seismological inactivity. The seismicity is monitored separately by the three Baltic States, but here we focus on the seismic observatory in Latvia, operated by the LEGMC, as it uses the information from all surrounding seismic stations in the Baltics.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe LEGMC seismic observatory was operated from 2009 until 2021. Seismic events were monitored using a virtual seismic network (BAVSEN – Baltic Virtual Seismic Network) from stations surrounding the Baltic states. The stations in question are outlined in Figure 1. Initial onset detection is done manually by eye, picking the first arrivals from the waveforms and associating them to seismic events using Seisan (Havskov and Ottemoller 1999) software. As the events are compiled manually, there are no conditions for phase count or minimal signal to noise ratio, only that judged by the working seismologist. However, most of the events are located by sparse instrumentation, thus a large majority of the event parameters are derived from 5 or less seismic phases. The observatory monitored both anthropogenic and natural seismicity, classifying the events manually. 
It also focused on teleseismic events detected from the seismic stations in the virtual network.\u003c/p\u003e\n\u003cp\u003eBetween 2008 and 2021, the LEGMC detected 6952 seismic events in the vicinity of the Baltic States (Figure 1), the overwhelming majority believed to be anthropogenic, such as explosions from regional mining operations. The M\u003csub\u003eW\u003c/sub\u003e of these events ranged from less than 1 to 3.5 (Ņikuļins 2020). During the first ten months of 2021, LEGMC observatory recorded 844 events, of which 250 were observed around the territory of the Baltic States (Latitude 52N°-60N°, Longitude 18E°-30E°). These events are subsequentially used in the study.\u003c/p\u003e"},{"header":"3. Materials and methods","content":"\u003cp\u003eIn this research, we used a suite of automatic phase detection algorithms, coupled with phase association techniques and event localisation methods, to analyse seismic waveforms collected from various SO across the Baltic States. These automated results were then benchmarked against the manual observations recorded by the Latvian Environment, Geology, and Meteorology Centre (LEGMC) from January to October 2021. The complete methodology, including the processing chain, is illustrated in Fig.2 and elaborated on in the following sections.\u003c/p\u003e\n\u003cp\u003eWe used data obtained from 10 seismic stations located around Baltic states. The seismic stations are spaced approximately 100-150 km apart, with an observable uneven distribution; notably, there is a significant gap in the network within eastern Latvia. Estonia and Lithuania have a denser network, with Estonia operating 3 broadband seismic stations and Lithuania 2. To enhance the phase picking process, we also incorporate data from observatories in Poland, Denmark, and the southern region of Finland (as shown in Fig.1) (Quinteros et al. 
2021; Geological Survey Of Estonia (EGT) 1996; Institute of Geophysics, Polish Academy of Sciences 1990; Institute of Seismology 1980; GEUS Geological Survey of Denmark and Greenland 1976). The selection of stations was informed by the need for consistency with previous seismological work conducted by LEGMC, which primarily utilised data from these observatories (Ņikuļins 2020).\u003c/p\u003e\n\u003ch2\u003e3.1. Phase detection algorithms\u003c/h2\u003e\n\u003cp\u003eWe focus on three phase detection algorithms. The first algorithm is the Generalized Phase Detection, introduced by Ross et al. (2018). The second, PhaseNet, was created by Zhu and Beroza (2018). Finally, we examine the Earthquake Transformer, an algorithm introduced by Mousavi et al. (2020).\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eGeneralized Phase Detection (GPD)\u003c/em\u003e is a convolutional neural network (CNN) based seismic phase extraction algorithm (Ross et al. 2018). CNNs, commonly utilised in image processing, employ a series of digital filters to convolve data and extract features. These features can then be used for classification within the network. GPD processes 4-second moving snippets of the seismic record. These snippets are first subjected to a feature extraction system and then fed into a fully connected neural network, which uses an activation function to produce probabilities for P-picks, S-picks, and noise classification.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003ePhaseNet\u003c/em\u003e \u003cem\u003e(PN)\u0026nbsp;\u003c/em\u003eis a U-Net-based deep learning model for P and S wave arrival time picking (Ronneberger et al. 2015; Zhu and Beroza 2018). The algorithm processes 30-second-long three-component seismograms, which are subjected to a sequence of four downsampling stages followed by four upsampling stages. 
Skip connections are utilised at each level to facilitate direct communication between layers, bypassing deeper layers, which enhances convergence during the training phase. The output layer uses a softmax (normalised exponential) function to assign probabilities to noise, P-picks, and S-picks. These probabilities are then presented as curves that can subsequently be used to detect seismic events.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eEarthquake Transformer (EQT)\u003c/em\u003e is a seismic phase and event detection algorithm that uses a combination of CNNs, long short-term memory units, and self-attentive layers, techniques usually applied to language modelling and longer time-series prediction tasks (Mousavi et al. 2020). EQT analyses 60-second segments of three-component seismograms to calculate the probabilities for detecting seismic events, as well as the precise timing for P-wave and S-wave arrivals.\u003c/p\u003e\n\u003cp\u003eThe ML models employed in this study were originally trained on various datasets, both regional and global, and have demonstrated a significant capacity for generalisation (in this context, a model\u0026apos;s ability to detect seismic events beyond the scope of its training data, encompassing different regions or types of events). We took inspiration from the work of M\u0026uuml;nchmeyer et al. (2022), who refined and retrained these models within the PyTorch framework (Paszke et al. 2017). Considering the generalisation tests in M\u0026uuml;nchmeyer et al. (2022) and given the seismological context of the Baltic States, characterised by low-magnitude local events and seismic waves traversing a sedimentary layer atop a platform basement, we opted for models that had been retrained on the INSTANCE benchmark dataset. This dataset comprises over 1.3 million local and regional three-component waveforms from approximately 50,000 earthquakes in and around Italy (Michelini et al. 2021). 
We utilised the pre-trained models available in the SeisBench Python package, which also offers a convenient API for phase detection and model implementation (Woollam et al. 2022).\u003c/p\u003e\n\u003cp\u003eAutomatic phase picking algorithms were set to analyse seismic data in one-hour segments throughout the study from 1\u003csup\u003est\u003c/sup\u003e January until 31\u003csup\u003est\u003c/sup\u003e October 2021. We tuned the phase detection parameters for each of the algorithms, experimenting with 11 different P and S detection thresholds for each of them. The thresholds were chosen considering the literature review of other studies that utilise these algorithms (Lim et al. 2024; Garc\u0026iacute;a et al. 2022; Lapins et al. 2021). The tested thresholds are described in Table 1.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003e1\u003c/strong\u003e\u003cstrong\u003e.\u003c/strong\u003e Phase detection probability thresholds for ML pickers\u003c/p\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"499\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePhase\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eGPD\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eEQT\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePN\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eP\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.40, 0.50, 0.70, 0.80, 0.85, 0.90, 0.92, 0.94, 0.96, 0.98, 0.99\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.02, 0.03, 0.05, 0.08, 0.12, 0.18, 0.24, 0.30, 0.35, 0.40, 0.45\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.03, 0.05, 0.08, 0.12, 0.18, 0.24, 0.30, 
0.35, 0.40, 0.45, 0.50\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.65, 0.80, 0.85, 0.88, 0.90, 0.92, 0.94, 0.96, 0.975, 0.985, 0.995\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.04, 0.05, 0.08, 0.12, 0.18, 0.24, 0.30, 0.36, 0.42, 0.50, 0.60\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.08, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003ch2\u003e3.2. Phase association\u003c/h2\u003e\n\u003cp\u003ePhase association is an essential component of seismic monitoring, serving to link automatically detected phase arrivals with specific seismic events. Numerous automated methods for phase association have been developed, including iterative search and backprojection within multi-scale association webs (Yeck et al. 2019), cluster search (Ester et al., 1996) and the utilisation of likelihood distributions for arrival data (Yeck et al. 2019). In this study, we implemented the Gaussian Mixture Model Association (GaMMA) method, developed by Zhu et al. (2022), and the PyOcto associator, created by M\u0026uuml;nchmeyer (2024).\u003c/p\u003e\n\u003cp\u003eThe GaMMA approach treats phase association as an unsupervised clustering problem. It is based on the assumption that each earthquake can be represented by a cluster of P and S phases, with their arrival times following a hyperbolic delay relative to the distance from the earthquake\u0026apos;s origin. This hyperbolic pattern is also assumed for the phase amplitudes. To model the arrival times and amplitudes of the detected P and S phases, a multivariate Gaussian distribution is constructed for each seismic event. 
Subsequently, an expectation-maximisation algorithm applies the maximum likelihood criterion to align phase picks with their respective seismic events. One intrinsic limitation of the Gaussian Mixture Model is that the number of earthquake events must be assumed in advance. GaMMA circumvents this issue by integrating Bayesian statistics, which introduces prior probabilistic knowledge into the parameters of the Gaussian mixture model. This allows the algorithm to start with a considerable number of potential events and methodically filter out the false-positive events that markedly deviate from the prior distributions (Zhu et al. 2022).\u003c/p\u003e\n\u003cp\u003eWe employ the GaMMA model in Bayesian mode, with an oversampling factor of 4, which controls the initial number of earthquake clusters (a larger factor increases the cluster count). The maximum phase residual time is 10 seconds. The initial earthquake clusters are uniformly initialised over the entire region in both time and space dimensions. Then, by iteratively optimising the earthquake parameters of the clusters against the picks using the Huber loss function, GaMMA calculates both the pick associations and the potential event locations. To reduce the computation cost of GaMMA, the DBSCAN clustering algorithm is used to segment the long sequence of picks before association. DBSCAN is configured to form clusters from at least 4 different phase picks, with a maximum time of 50 seconds between picks assigned to the same cluster. In our case, we configured GaMMA to use a homogeneous one-layer earth velocity model with a P wave velocity of 6.5 km/s and an S wave velocity of 3.75 km/s. The GaMMA association is geographically limited to 100 km around the Baltic States \u0026ndash; this means that global and regional events originating outside this area will be treated as local events.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe PyOcto algorithm treats phase association as an iterative search problem. 
The 4D space-time (three spatial axes and time) is partitioned into nodes, each associated with incoming seismic phases. All the picks are initially associated with a single node, and the nodes are then either split, used for event location, or discarded if no event can be located. Event location is performed by comparing the arrival times of the picks to a velocity model and minimising the residuals using an equal differential-time loss. Both constant and 1D velocity models can be used (M\u0026uuml;nchmeyer 2024).\u003c/p\u003e\n\u003cp\u003eFor PyOcto to work optimally, we need to provide the parameters used to discard unsuccessful nodes, which relate to the minimum number of picks required to deem a node prospective. In this study, we required a minimum of 4 picks, with at least 2 P phase picks and 0 S phase picks. We were forced to use a small number of picks because more than 54% of the events in the LEGMC database contain 5 or fewer picks. For the velocity model, we use the same values as for GaMMA: a P wave velocity of 6.5 km/s and an S wave velocity of 3.75 km/s. The location is again limited to the same area of 100 km around the Baltic States.\u003c/p\u003e\n\u003ch2\u003e3.3. Event relocation\u003c/h2\u003e\n\u003cp\u003eFor enhanced precision in determining potential event hypocenters, we employed the HypoInverse software (Klein 2014), using the most recent version, 1.4. HypoInverse, developed by the US Geological Survey (USGS), is widely used by seismic networks worldwide for earthquake hypocenter location. Configured for this study, the software initialises earthquake depths at 0 km, suitable for the predominantly shallow or anthropogenic events typical of the region. 
A minimum of four phases is required for the software to locate an event, and we utilised the default settings for distance and residual weighting. The relocation process was informed by one-dimensional P and S wave velocity models derived from the EUROBRIDGE \u0026apos;94 seismic reflection profile, as detailed by Bogdanova et al. (2006).\u003c/p\u003e\n\u003cp\u003eThe relocation of events within the scope of the paper also served as an indicator of how many events can be located fully automatically. As the station network is very sparse, associators might often assign erroneous noisy picks that are unrelated to the seismic event. In this way, HypoInverse could technically serve as another filter for picks if the difference between calculated and observed arrival times is too large.\u003c/p\u003e\n\u003ch2\u003e3.4. Verification\u003c/h2\u003e\n\u003cp\u003eWe evaluated the performance of the GaMMA and PyOcto associations and subsequent relocations with HypoInverse by comparing their results with observations from seismologists at LEGMC. To establish whether the automatically detected events corresponded to those identified by LEGMC, we conducted automated searches within a 60-second time window surrounding each event documented by LEGMC.\u003c/p\u003e\n\u003cp\u003eThe 60-second time window was chosen to account for timing errors caused by inaccurate picks being associated with the events. We anecdotally observed several such cases and wanted to ensure that no event would be missed for this reason. Using TauP software (Crotwell et al. 1999), we modelled the maximum time it took an S phase from a synthetic event at one corner of the study domain to travel to the other, which was 56 seconds, approximating our chosen search window.\u0026nbsp;\u003c/p\u003e"},{"header":"4. Results","content":"\u003cp\u003eThe testing results for multiple phase pickers and associators are shown in Figure 3. 
We observe a clear negative relationship between recall and precision. Recall increases when the threshold for P detection (P\u003csub\u003ethresh\u003c/sub\u003e) is lowered, but precision largely decreases. This is, of course, expected \u0026ndash; a lower P\u003csub\u003ethresh\u003c/sub\u003e allows more phases to be picked and, in turn, more events to be associated, while also increasing the number of false detections.\u003c/p\u003e\n\u003cp\u003eComparing different pickers, we observe similar trends in event detections from PN and EQT, while GPD significantly underperforms, with a maximum recall of 0.54 and precision of 0.17 when using the GaMMA associator. In addition, the GPD recall values decrease sharply when the phase detection thresholds are increased, down to \u0026lt;40 events when the P\u003csub\u003ethresh\u003c/sub\u003e is less than 0.9. While a sharp decrease of recall is observed for both EQT and PN as well, we did not observe such decreases in the event counts, which is also the reason why precision remains relatively high.\u003c/p\u003e\n\u003cp\u003eComparing both associators, we observe that PyOcto generally associates more events than GaMMA. Especially for lower P\u003csub\u003ethresh\u003c/sub\u003e and S\u003csub\u003ethresh\u003c/sub\u003e values, PyOcto can associate up to 10 times more events, while on average PyOcto associates twice as many events as GaMMA. This leads to higher recall but also significantly lower precision, showing that a large part of the increase consists of false positives.\u003c/p\u003e\n\u003cp\u003eUsing the product of recall and precision, we determine the best-performing P\u003csub\u003ethresh\u003c/sub\u003e and S\u003csub\u003ethresh\u003c/sub\u003e values. Our decision is based on the highest recall that does not significantly decrease the event precision, so that the results are not overwhelmed by large numbers of false-positive associations. 
The results of the best-performing configurations are examined in detail in the remainder of the article.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003e2\u003c/strong\u003e\u003cstrong\u003e.\u003c/strong\u003e Results of the best performing algorithms and comparisons to LEGMC. Associated and Relocated refer to the data processing stage, GaMMA and PyOcto refer to the event associator used, while the method showcases the phase detector algorithm used. Pthresh and Sthresh refer to thresholds used in phase detection, n is total event count, precision and recall refer to comparisons with LEGMC event data base\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"479\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\" style=\"width: 65px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 61px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMethod\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eP\u003csub\u003ethresh\u003c/sub\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eS\u003csub\u003ethresh\u003c/sub\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003en\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"6\" style=\"width: 65px;\"\u003e\n 
\u003cp\u003eAssociated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\" style=\"width: 61px;\"\u003e\n \u003cp\u003eGaMMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eEQT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e147\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.30\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eGPD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e107\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.28\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003ePN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e301\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.38\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.45\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 61px;\"\u003e\n 
\u003cp\u003ePyOcto\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eEQT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.05\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e496\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.28\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.55\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eGPD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.17\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003ePN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e492\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.26\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.51\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"6\" style=\"width: 65px;\"\u003e\n \u003cp\u003eRelocated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\" style=\"width: 61px;\"\u003e\n 
\u003cp\u003eGaMMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eEQT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e104\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.20\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eGPD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e28\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.06\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003ePN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.08\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e156\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 61px;\"\u003e\n \u003cp\u003ePyOcto\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eEQT\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.45\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.60\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003eGPD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.92\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.94\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e1.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003ePN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 59px;\"\u003e\n \u003cp\u003e0.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe results of the best performing algorithms are analysed in Table 2. Using the GaMMA associator, we observe that PN shows the highest recall (0.45) while using PyOcto, the highest recall can be seen using the EQT phase detector (0.55). 
This comes at a cost in precision \u0026ndash; with GaMMA, PN detects twice as many events as the second-best detector, EQT (301 versus 147), but its precision is lower (0.38 versus 0.51). Very close precision and recall values between the two algorithms were observed when using PyOcto for association: recall values of 0.55 (EQT) and 0.51 (PN) and precision of 0.28 (EQT) and 0.26 (PN). We again observe that PyOcto tends to identify more events, but at the cost of precision; most of the time, the events cannot be matched to the LEGMC database. With both associators, we see significantly poorer performance from the GPD phase picker \u0026ndash; the associated event count is much smaller, and thus recall of the LEGMC database is also lower. However, we see that the results have high precision (0.67 for GaMMA and 0.83 for PyOcto), showing that the events that are detected using the GPD and associator combination are, in most cases, correct.\u003c/p\u003e\n\u003cp\u003eThe quality of the associated events can be partially determined by trying to relocate them. We observe that the recall of relocated events drops significantly, mainly because many of the events cannot be located. The GaMMA associator works better in this regard: while recall drops, we still observe a significant number of relocated events. EQT performs the best here; most of the events (104 of 147) can be relocated successfully. PN-picked events can be relocated about half of the time (156 of 301), while GPD performs the worst, with only 28 out of 107 events being successfully relocated. Relocation works particularly poorly with the PyOcto associator \u0026ndash; only a few of the events are successfully relocated for any of the algorithms. We explore the reasons for this in the discussion section.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAs observed above, performance is generally poor both before and after event relocation. 
As other studies generally showcase good performance of deep learning pickers (Chen et al. 2024; Si et al. 2024; M\u0026uuml;nchmeyer et al. 2022; Zhu and Beroza 2018) and associators (Puente Huerta et al. 2025), we wanted to investigate why they did not perform well in our case. To better understand this, we compared the recall of each associator-picker combination with the number of LEGMC picks for each of the events (Figure 4). We observe that for a pick count of 4, the minimum needed for automatic association, the algorithms do not perform well. The best performance here can be observed from PN, where PyOcto and GaMMA together reach a recall of 0.25, with both associators together representing 0.11. Increasing the pick count, we see an increase in recall as well; however, for both PN and EQT, recall \u0026ge; 0.8 is only observed with a pick count of 7 or above. The GPD algorithm, again, performs significantly worse than PN or EQT, reaching a recall of 0.78 with 9 picks. We also observe that there is a strong basis to use multiple algorithms together, as they can detect different events. Combining multiple pickers and associators (Figure 4, subplot Combined), we obtain a recall of 0.69 even with 6 different phase picks, and a recall of 1 with 7 different picks. In general, these tests show that the quality of the association increases significantly with the number of stations where the traces of the event can be convincingly observed.\u003c/p\u003e\n\u003cp\u003eTo better investigate whether the association or the machine-learning picking quality is to blame, we counted raw phase picks, without association, in 60-second intervals around each of the 250 manually detected events from LEGMC. 
The fraction of LEGMC events in relation to the pick count is depicted in Figure 5.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eOur hypothesis is that even unmatched events usually have enough picks from the machine-learning pickers, but sporadic erroneous or noisy picks disturb the association process. We observe that on average, unassociated events tend to have a larger or a similar number of ML picks (as indicated by the mean picks in Figure 5), but there are fewer ML picks for most (~70%) of the events. We also observe that a significant number of unassociated events \u0026ndash; around 50-60% for all ML algorithms \u0026ndash; have 5 or fewer picks, while the EQT and PN pickers usually produce at least 4 different picks (EQT\u003csub\u003eN\u0026lt;4\u003c/sub\u003e is 23 (9.2% of all events) and 21 (8.4%) for GaMMA and PyOcto, respectively; PN\u003csub\u003eN\u0026lt;4\u003c/sub\u003e is 4 (1.6%) and 24 (9.6%), respectively). This is consistent with the hypothesis that phase association is the likely bottleneck in event detection.\u003c/p\u003e\n\u003cp\u003eFinally, we investigate the waveforms of seismic events to find the main causes of misdetection. The waveforms are compiled in Figures 6 and 7. Here we observe the duality of the results.\u003c/p\u003e\n\u003cp\u003eIn cases when automatic detections coincide well with manual picks (e.g., Figure 6.1), we observe that both ML pickers and subsequent association algorithms can add information in the form of previously missed picks for the events. For example, we see additional P picks at the overlooked SUW station and an additional S phase at the PBUR station. We also observe that some of the S phases (at the SUW station, for example) are misclassified as P phases, but the GaMMA association algorithm can distinguish them and remove them from the event. 
Finally, we observe a time difference between the manual and automatic P and S picks (up to 1.56 s in Figure 6.1 SLIT station, but on average less than 0.5 s), which in many cases is not very large, but can be significant, nonetheless.\u003c/p\u003e\n\u003cp\u003eOn the other hand, investigating events that do not coincide with automatic observations (Figure 6.2), we can see the difficulty of the association problem. Multiple sporadic P and S phases are automatically picked throughout the event, with some of them coinciding with manual observations (VSU and RAF stations) and some of them missing the manual picks altogether (MTSE and ARBE). The presence of additional noisy picks likely interferes with the association process, preventing both algorithms from reliably identifying the seismic event. These high-noise scenarios likely contribute to a lower recall of the LEGMC database.\u003c/p\u003e\n\u003cp\u003eWe also observe that there is a significant basis to believe that ML can detect events potentially missed by manual observers. Figures 7.1 and 7.2 demonstrate that, in several cases, the automated workflow successfully detects and associates phase arrivals that were not identified by the seismologist. The phases of the events are seen by both the EQT and the PN pickers and are associated by the PyOcto algorithm. Both events are on the lower threshold of associated pick counts, counting 4 P and S phases in all cases. This shows that, for complete catalogues in sparsely monitored noisy areas, association algorithms need to be tuned to low association threshold values, as otherwise easily seen events can be missed by seismological operators.\u003c/p\u003e\n\u003cp\u003eInterestingly, in all four investigated events, we observe that the GPD algorithm has fewer picks or picks of poorer quality than EQT and PN. This is most obvious in Figure 6.1, where all other algorithms detect more or as many picks as the seismologist, but GPD only detects three picks. 
In Figure 7 we see that GPD has not detected any seismic phases. This is consistent with the earlier statistics, which show that GPD has significantly lower recall than the other ML pickers.\u003c/p\u003e\n\u003cp\u003eIn general, comparing different pickers, we see that PN and EQT perform significantly better than GPD. The difference can be seen both in the recall of the LEGMC database and in the cases that we investigated in more detail. Overall, the difference between PN and EQT is small, as both pickers obtain comparable results.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eComparing different event associators, we observe that GaMMA generally provides a lower detection count with higher precision, whereas PyOcto, being tuned to require fewer picks and fewer stations for a successful association, shows higher recall but lower precision. These results are clearly visible in Table 2 and Figure 4.\u003c/p\u003e"},{"header":"5. Discussion","content":"\u003cp\u003eOur results provide insight into how multiple state-of-the-art phase detection algorithms perform in a sparsely instrumented low-seismicity area. The results are mixed – on the one hand, there are cases where ML pickers and the subsequent association of their picks offer a significant edge and allow us to detect new events (Figure 7.1 and Figure 7.2); on the other hand, we see relatively poor recall of the LEGMC database (0.55 with the combination of EQT and PyOcto) and observe multiple manually detected events that are not automatically detected (Figure 6.2).\u003c/p\u003e\n\u003cp\u003eFirst, we would like to address potential drawbacks of our methodology. The selection of the optimal picker–association combination is based on maximising the balance between false positives and true positives relative to the LEGMC reference database, as quantified by the precision-recall trade-off shown in Figure 4. As expected, lowering the pick-detection threshold systematically increases recall. 
Although this represents a straightforward strategy to improve event recovery, it inevitably reduces precision and inflates the number of false-positive detections.\u003c/p\u003e\n\u003cp\u003eA natural extension of this framework would be the incorporation of an additional post-processing classification stage to identify falsely associated events. This step could exploit event characteristics such as event location, magnitude, or waveform spectral information. Comparable approaches have been successfully applied in event classification contexts, including neural network-based discrimination of anthropogenic blasts (Eggertsson et al. 2024) and unsupervised learning strategies for (Wang et al. 2023). These methodologies could be adapted to specifically target poorly constrained events arising from automatic association.\u003c/p\u003e\n\u003cp\u003eIt is also important to note that the reference bulletin itself is not exhaustive. Our automatic workflow identifies seismic events that are absent from the manual catalogue, underscoring both the incompleteness of the ground truth and the ambiguity of the detection problem. This further complicates performance assessment and highlights the need for iterative refinement of both automated and reference datasets.\u003c/p\u003e\n\u003cp\u003eWe also want to address the potential concern that the machine-learning pickers employed in this study were predominantly trained on natural earthquake datasets, while the seismicity in the Baltic States is dominated by anthropogenic sources (e.g., quarry blasts). This domain mismatch may influence the picker performance, and we have not explicitly quantified the resulting error within the present analysis. 
Indeed, isolated cases of clearly spurious, noise-driven phase picks were observed, such as at the BSD station in Figure 6.2, which illustrate the sensitivity of the models to local noise conditions.\u003c/p\u003e\n\u003cp\u003eHowever, previous studies have demonstrated that ML pickers trained on tectonic earthquakes can generalise effectively to anthropogenic events (Duan et al. 2025). Our own observations are consistent with these findings: despite the prevalence of human-made sources, the overall quality of the phase picks remains high. Although the Baltic region exhibits specific geological characteristics, most notably stations located above relatively thick sedimentary basins that can amplify surface-wave energy and modify signal character, empirical pick counts (Figure 5) suggest that the main bottleneck in event detection lies in the association stage rather than in phase identification itself. Still, a future study that retrains ML pickers on data specifically from the Baltic States is encouraged.\u003c/p\u003e\n\u003cp\u003eOne surprising result of our study was the poor recall after relocation with the HypoInverse locator. There are two likely reasons for this. First, in a sparsely instrumented network, even a single erroneous phase pick within the association window can destabilise the location solution. An example is shown in Figure 6.2, where the event was not only poorly relocated but failed to associate altogether, likely because S phases were misclassified as P phases at the MTSE and RAF stations.\u003c/p\u003e\n\u003cp\u003eSecond, minor but systematic phase-time offsets were observed between the ML-derived and manually reviewed picks. This discrepancy likely stems from the algorithmic design of sliding-window inference, which can slightly reduce the pick precision. 
Although the observed offsets are small (generally \u0026lt;0.5 s), their cumulative effect, especially when combined with other mispicks, can degrade relocation stability in sparse networks. We therefore recommend that in regions with limited station coverage, phase picks and associations be manually reviewed prior to relocation to mitigate error propagation and improve solution robustness.\u003c/p\u003e"},{"header":"6. Conclusions","content":"\u003cp\u003eThis study evaluates the performance of three state-of-the-art deep learning phase pickers (EQT, PhaseNet, and GPD) combined with two association algorithms (GaMMA and PyOcto) in a sparsely instrumented, low-seismicity region of the Baltic States. The results provide insight into the practical limitations and potential of automated seismic workflows in stable cratonic environments dominated by low-magnitude and anthropogenic events.\u003c/p\u003e\n\u003cp\u003eFirst, we demonstrate that modern deep learning pickers can identify a substantial fraction of manually detected events. PhaseNet and EQT consistently outperform GPD in both recall and overall robustness, whereas GPD exhibits higher precision but substantially lower detection rates. The difference between PhaseNet and EQT is comparatively small, suggesting that both architectures are suitable for similar tectonic and observational settings.\u003c/p\u003e\n\u003cp\u003eSecond, the results highlight that the primary limitation of the automated workflow lies not in phase detection itself, but in the association and relocation stages. Raw pick statistics indicate that many unmatched events contain enough phase picks to accommodate minimal association thresholds. However, the presence of noisy or misclassified arrivals, combined with the sparse network geometry, significantly disrupts the association process. 
The marked drop in recall after relocation with HypoInverse underscores the sensitivity of sparse networks to single erroneous picks and small timing offsets.\u003c/p\u003e\n\u003cp\u003eThird, we observe a strong dependence of the recall on the number of available phases. Events within the Baltic States can be recovered with high reliability with seven or more independent phase picks, especially when combining multiple pickers and associators, whereas events recorded by fewer than five picks remain difficult to associate robustly. This emphasises the importance of station density and network geometry to achieve good catalogue completeness in low-seismicity regions.\u003c/p\u003e\n\u003cp\u003eFourth, the precision-recall trade-off is particularly pronounced in low signal-to-noise environments. Lowering detection thresholds increases recall but rapidly inflates the number of false positives. For operational monitoring, this implies that threshold selection must balance catalogue completeness with analyst workload. Ensemble approaches combining multiple pickers and associators improve recovery of true events but require additional filtering or validation steps.\u003c/p\u003e\n\u003cp\u003eImportantly, the study also shows that automated workflows can identify events that are not present in the manual bulletin. This indicates that existing catalogues in sparsely monitored intraplate regions may be incomplete and that automated methods serve as a valuable complement to human analysis rather than a replacement.\u003c/p\u003e\n\u003cp\u003eOverall, our findings suggest that in stable continental regions, improvements in phase association and network density yield better results than improvements in phase detection algorithms. 
While there is a basis for future studies on region-specific picking model training, we find that the focus should be on improving phase association algorithms by accounting for typical event parameters and regional characteristics, while also potentially re-classifying already associated events as true or false positives.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors report there are no competing interests to declare.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData sets generated during the current study are available from the corresponding author on request. Restrictions apply to the LEGMC seismic observatory data, which were used in this study and are not publicly available.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAllen RV (1978) Automatic earthquake recognition and timing from single traces. Bull Seismol Soc Am 68:1521\u0026ndash;1532. https://doi.org/10.1785/BSSA0680051521\u003c/li\u003e\n\u003cli\u003eBogdanova S, Gorbatschev R, Grad M, Janik T, Guterch A, Kozlovskaya E, Motuza G, Skridlaite G, Starostenko V, Taran L, Groups PW (2006) EUROBRIDGE: new insight into the geodynamic evolution of the East European Craton. Geol Soc Lond Mem 32:599\u0026ndash;625. https://doi.org/10.1144/GSL.MEM.2006.032.01.36\u003c/li\u003e\n\u003cli\u003eBogdanova S, Gorbatschev R, Skridlaite G, Soesoo A, Taran L, Kurlovich D (2015) Trans-Baltic Palaeoproterozoic correlations towards the reconstruction of supercontinent Columbia/Nuna. Precambrian Res 259:5\u0026ndash;33. https://doi.org/10.1016/j.precamres.2014.11.023\u003c/li\u003e\n\u003cli\u003eBogdanova SV, Bingen B, Gorbatschev R, Kheraskova TN, Kozlov VI, Puchkov VN, Volozh YA (2008) The East European Craton (Baltica) before and during the assembly of Rodinia. Precambrian Res 160:23\u0026ndash;45. 
https://doi.org/10.1016/j.precamres.2007.04.024\u003c/li\u003e\n\u003cli\u003eBrangulis A, Kur\u0026scaron;s V, Misāns J, Stinkulis Ģ (1998) Latvijas ģeoloģiskā karte, mērogs 1:500 000, Ģeoloģiskās uzbūves apraksts. Valsts Ģeoloģijas dienests, Rīga\u003c/li\u003e\n\u003cli\u003eChen Y, Savvaidis A, Siervo D, Huang D, Saad OM (2024) Near Real‐Time Earthquake Monitoring in Texas Using the Highly Precise Deep Learning Phase Picker. Earth Space Sci 11:e2024EA003890. https://doi.org/10.1029/2024EA003890\u003c/li\u003e\n\u003cli\u003eCocks LRM, Torsvik TH (2005) Baltica from the late Precambrian to mid-Palaeozoic times: The gain and loss of a terrane\u0026rsquo;s identity. Earth-Sci Rev 72:39\u0026ndash;66. https://doi.org/10.1016/j.earscirev.2005.04.001\u003c/li\u003e\n\u003cli\u003eCrotwell HP, Owens TJ, Ritsema J (1999) The TauP Toolkit: Flexible Seismic Travel-time and Ray-path Utilities. Seismol Res Lett 70:154\u0026ndash;160. https://doi.org/10.1785/gssrl.70.2.154\u003c/li\u003e\n\u003cli\u003eDuan C, Schmandt B, Maguire R, Wang R, Kong Q (2025) Differential Seismic Phase Detection Probability as a Potential Discriminant of Explosions and Earthquakes. Seism Rec 5:218\u0026ndash;227. https://doi.org/10.1785/0320250015\u003c/li\u003e\n\u003cli\u003eEggertsson G, Lund B, Roth M, Schmidt P (2024) Earthquake or blast? Classification of local-distance seismic events in Sweden using fully connected neural networks. Geophys J Int 236:1728\u0026ndash;1742. https://doi.org/10.1093/gji/ggae018\u003c/li\u003e\n\u003cli\u003eGarc\u0026iacute;a JE, Fern\u0026aacute;ndez-Prieto LM, Villase\u0026ntilde;or A, Sanz V, Ammirati J-B, D\u0026iacute;az Su\u0026aacute;rez EA, Garc\u0026iacute;a C (2022) Performance of Deep Learning Pickers in Routine Network Processing Applications. Seismol Res Lett 93:2529\u0026ndash;2542. 
https://doi.org/10.1785/0220210323\u003c/li\u003e\n\u003cli\u003eGeological Survey of Estonia (EGT) (1996) Estonian National Seismic Network (EESN)\u003c/li\u003e\n\u003cli\u003eGEUS Geological Survey of Denmark and Greenland (1976) Danish Seismological Network\u003c/li\u003e\n\u003cli\u003eHavskov J, Ottemoller L (1999) SeisAn Earthquake Analysis Software. Seismol Res Lett 70:532\u0026ndash;534. https://doi.org/10.1785/gssrl.70.5.532\u003c/li\u003e\n\u003cli\u003eHellqvist NM, Koskinen PH, M\u0026auml;ntyniemi PB, Uski MR, Valtonen OS, Airo M-L, Huotari-Halkosaari T, Nironen M, Sutinen R, Grigull S, Stephens M, Karin H, Lund B (2015) Seismotectonic framework and seismic source area models in Fennoscandia, northern Europe. Institute of Seismology, University of Helsinki, Finland\u003c/li\u003e\n\u003cli\u003eInstitute of Geophysics, Polish Academy of Sciences (1990) Polish Seismological Network\u003c/li\u003e\n\u003cli\u003eInstitute of Seismology, University of Helsinki (1980) The Finnish National Seismic Network\u003c/li\u003e\n\u003cli\u003eKlein FW (2014) User\u0026rsquo;s Guide to HYPOINVERSE-2000, a Fortran Program to Solve for Earthquake Locations and Magnitudes\u003c/li\u003e\n\u003cli\u003eKong Q, Trugman DT, Ross ZE, Bianco MJ, Meade BJ, Gerstoft P (2019) Machine Learning in Seismology: Turning Data into Insights. Seismol Res Lett 90:3\u0026ndash;14. https://doi.org/10.1785/0220180259\u003c/li\u003e\n\u003cli\u003eLapins S, Goitom B, Kendall J, Werner MJ, Cashman KV, Hammond JOS (2021) A Little Data Goes a Long Way: Automating Seismic Phase Arrival Picking at Nabro Volcano With Transfer Learning. J Geophys Res Solid Earth 126. https://doi.org/10.1029/2021JB021910\u003c/li\u003e\n\u003cli\u003eLim CSY, Lapins S, Segou M, Werner MJ (2024) Deep learning phase pickers: how well can existing models detect hydraulic-fracturing induced microseismicity from a borehole array? 
Geophys J Int 240:535\u0026ndash;549. https://doi.org/10.1093/gji/ggae386\u003c/li\u003e\n\u003cli\u003eMichelini A, Cianetti S, Gaviano S, Giunchi C, Jozinović D, Lauciani V (2021) INSTANCE \u0026ndash; the Italian seismic dataset for machine learning. Earth Syst Sci Data 13:5509\u0026ndash;5544. https://doi.org/10.5194/essd-13-5509-2021\u003c/li\u003e\n\u003cli\u003eMousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer\u0026mdash;an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11:3952. https://doi.org/10.1038/s41467-020-17591-w\u003c/li\u003e\n\u003cli\u003eM\u0026uuml;nchmeyer J (2024) PyOcto: A high-throughput seismic phase associator. Seismica 3. https://doi.org/10.26443/seismica.v3i1.1130\u003c/li\u003e\n\u003cli\u003eM\u0026uuml;nchmeyer J, Woollam J, Rietbrock A, Tilmann F, Lange D, Bornstein T, Diehl T, Giunchi C, Haslinger F, Jozinović D, Michelini A, Saul J, Soto H (2022) Which Picker Fits My Data? A Quantitative Evaluation of Deep Learning Based Seismic Pickers. J Geophys Res Solid Earth 127:e2021JB023499. https://doi.org/10.1029/2021JB023499\u003c/li\u003e\n\u003cli\u003eNance RD, Murphy JB, Santosh M (2014) The supercontinent cycle: A retrospective essay. Gondwana Res 25:4\u0026ndash;29. https://doi.org/10.1016/j.gr.2012.12.026\u003c/li\u003e\n\u003cli\u003eŅikuļins VG (2020) Seismological Monitoring in Latvia. Summ Bull Int Seismol Cent 54:50\u0026ndash;66. https://doi.org/10.31905/BKETRT2R\u003c/li\u003e\n\u003cli\u003ePaszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch\u003c/li\u003e\n\u003cli\u003ePuente Huerta JA, M\u0026uuml;nchmeyer J, McBrearty I, Sippl C (2025) Benchmarking seismic phase associators: Insights from synthetic scenarios. https://meetingorganizer.copernicus.org/EGU24/EGU24-8913.html. 
Accessed 9 Dec 2025\u003c/li\u003e\n\u003cli\u003eQuinteros J, Strollo A, Evans PL, Hanka W, Heinloo A, Hemmleb S, Hillmann L, Jaeckel K-H, Kind R, Saul J, Zieke T, Tilmann F (2021) The GEOFON Program in 2020. Seismol Res Lett 92:1610\u0026ndash;1622. https://doi.org/10.1785/0220200415\u003c/li\u003e\n\u003cli\u003eRonneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical Image Computing and Computer-Assisted Intervention \u0026ndash; MICCAI 2015. Springer International Publishing, Cham, pp 234\u0026ndash;241\u003c/li\u003e\n\u003cli\u003eRoss ZE, Meier M-A, Hauksson E, Heaton TH (2018) Generalized Seismic Phase Detection with Deep Learning. Bull Seismol Soc Am 108:2894\u0026ndash;2901. https://doi.org/10.1785/0120180080\u003c/li\u003e\n\u003cli\u003eSheen D-H, Friberg PA (2021) Seismic Phase Association Based on the Maximum Likelihood Method. Front Earth Sci 9:699281. https://doi.org/10.3389/feart.2021.699281\u003c/li\u003e\n\u003cli\u003eSi X, Wu X, Li Z, Wang S, Zhu J (2024) An all-in-one seismic phase picking, location, and association network for multi-task multi-station earthquake monitoring. Commun Earth Environ 5:22. https://doi.org/10.1038/s43247-023-01188-4\u003c/li\u003e\n\u003cli\u003eSoosalu H, Uski M, Komminaho K, Veski A (2022) Recent Intraplate Seismicity in Estonia, East European Platform. Seismol Res Lett 93:1800\u0026ndash;1811. https://doi.org/10.1785/0220210277\u003c/li\u003e\n\u003cli\u003eTorsvik TH, Cocks LRM (2005) Norway in space and time: A Centennial cavalcade. Nor J Geol 85:73\u0026ndash;86\u003c/li\u003e\n\u003cli\u003eWang T, Bian Y, Zhang Y, Hou X (2023) Classification of earthquakes, explosions and mining-induced earthquakes based on XGBoost algorithm. Comput Geosci 170:105242. 
https://doi.org/10.1016/j.cageo.2022.105242\u003c/li\u003e\n\u003cli\u003eWithers M, Aster R, Young C, Beiriger J, Harris M, Moore S, Trujillo J (1998) A comparison of select trigger algorithms for automated global seismic phase and event detection. Bull Seismol Soc Am 88:95\u0026ndash;106. https://doi.org/10.1785/BSSA0880010095\u003c/li\u003e\n\u003cli\u003eWoollam J, M\u0026uuml;nchmeyer J, Tilmann F, Rietbrock A, Lange D, Bornstein T, Diehl T, Giunchi C, Haslinger F, Jozinović D, Michelini A, Saul J, Soto H (2022) SeisBench\u0026mdash;A Toolbox for Machine Learning in Seismology. Seismol Res Lett 93:1695\u0026ndash;1709. https://doi.org/10.1785/0220210324\u003c/li\u003e\n\u003cli\u003eYeck WL, Patton JM, Johnson CE, Kragness D, Benz HM, Earle PS, Guy MR, Ambruz NB (2019) GLASS3: A Standalone Multiscale Seismic Detection Associator. Bull Seismol Soc Am 109:1469\u0026ndash;1478. https://doi.org/10.1785/0120180308\u003c/li\u003e\n\u003cli\u003eZhu W, Beroza GC (2018) PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method\u003c/li\u003e\n\u003cli\u003eZhu W, McBrearty IW, Mousavi SM, Ellsworth WL, Beroza GC (2022) Earthquake Phase Association using a Bayesian Gaussian Mixture Model. J Geophys Res Solid Earth 127. https://doi.org/10.1029/2021JB023249\u003c/li\u003e\n\u003cli\u003e\u0026nbsp;\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"journal-of-seismology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"jose","sideBox":"Learn more about [Journal of Seismology](http://link.springer.com/journal/10950)","snPcode":"10950","submissionUrl":"https://submission.nature.com/new-submission/10950/3","title":"Journal of Seismology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"machine learning; Baltica, automatic event detection, seismic event catalogue","lastPublishedDoi":"10.21203/rs.3.rs-8988784/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8988784/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eReliable earthquake catalogues in stable continental regions are difficult to obtain due to sparse station coverage, low signal-to-noise ratios, and the predominance of low-magnitude and anthropogenic events. We evaluated the performance of three deep learning phase picking algorithms – Earthquake Transformer, PhaseNet, and Generalized Phase Detection (GPD) – combined with two phase association methods, Gaussian Mixture Model Association (GaMMA) and PyOcto, using seismic data from the Baltic States between January and October 2021. Automatic detections are benchmarked against manually compiled observations from the Latvian Environment, Geology, and Meteorology Centre.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe results show that PhaseNet and Earthquake Transformer substantially outperform GPD in terms of event recall. PyOcto associator generally produces higher recall but lower precision than the GaMMA. The PyOcto event relocation using HypoInverse significantly reduces recall, highlighting the sensitivity of sparse networks to misassociated or slightly mis-timed phase picks. 
Detection performance strongly depends on the number of available phase observations; events recorded by fewer than five picks are rarely recovered reliably.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eOur analysis shows that automatic workflows are highly sensitive to the number and spatial distribution of phase observations. Ensemble combinations of multiple pickers and associators improve recovery but also amplify false detections if not carefully constrained. The results demonstrate that parameter tuning, association strategy, and network configuration together govern catalogue quality in low-seismicity intraplate environments.\u003c/p\u003e","manuscriptTitle":"Application of automated seismic event detection in a low seismicity region of the Baltic States","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-05 07:12:01","doi":"10.21203/rs.3.rs-8988784/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-04-20T01:54:28+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"64118671465153962568222911758863940236","date":"2026-03-28T00:43:28+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-03-02T19:10:54+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-03-02T05:09:44+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-03-02T05:07:05+00:00","index":"","fulltext":""},{"type":"submitted","content":"Journal of Seismology","date":"2026-02-27T13:38:34+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"journal-of-seismology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"jose","sideBox":"Learn more about [Journal of Seismology](http://link.springer.com/journal/10950)","snPcode":"10950","submissionUrl":"https://submission.nature.com/new-submission/10950/3","title":"Journal of Seismology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"0b03bcb4-3f9a-4408-a5a9-c778179e56d6","owner":[],"postedDate":"March 5th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-03-05T07:12:01+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-05 07:12:01","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8988784","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8988784","identity":"rs-8988784","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}