MT-DE-ESN: A Multi-Timescale Delay-Embedded Echo State Network with Closed-Loop Stability for Long-Memory Temporal Prediction

doi:10.21203/rs.3.rs-9447723/v1

2026 · doi:10.21203/rs.3.rs-9447723/v1

preprint OA: closed

Research Square preprint · Research Article

RamaKrishna Pasupuleti

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-9447723/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

We propose the Multi-Timescale Delay-Embedded Echo State Network (MT-DE-ESN). The architecture combines three established mechanisms (slow leaky reservoir dynamics, adaptive delay features, and state normalisation) with a useful interpretive framing: the slow reservoir is shown to compute an exponentially weighted memory kernel that complements the explicit comb basis of the delay features, together forming a hybrid basis expansion (Proposition 2.1).
Under autonomous output feedback (closed-loop prediction), MT-DE-ESN empirically exhibits bounded behaviour at all tested horizons (NRMSE ≤ 0.11, 1–500 steps), while a well-tuned standard ESN diverges exponentially (NRMSE > 400 at 500 steps) on Mackey-Glass (τ = 17, RK4 Δt = 0.1). The architecture is evaluated against a grid-search tuned ESN baseline and dense-delay NG-RC (Gauthier et al. 2021) using paired Wilcoxon signed-rank tests and Cohen's d over 10 seeds across seven benchmarks. Significant improvements are obtained on NARMA-30/50/100, Mackey-Glass, Santa Fe Laser, and Lorenz-96 (D = 10). No significant difference is observed on NARMA-10 (p = 0.246); a negative result (−4.5%, p = 0.34) on a controlled electricity load benchmark is reported transparently. MT-DE-ESN provides a practical alternative to NG-RC for long-memory and closed-loop prediction tasks.

Subject: Artificial Intelligence and Machine Learning

Keywords: echo state network; reservoir computing; multi-timescale; delay features; closed-loop prediction; long-memory time series; NARMA; Mackey-Glass; Lorenz-96; Wilcoxon signed-rank

[Figure 1 (captioned "Figure 5" in the manuscript): New experiments. Row 1: fair NG-RC comparison on NARMA-30 (all tuned); ablation vs tuned baseline; electricity load (null/negative result reported). Row 2: closed-loop Mackey-Glass (ESN diverges, MT-DE-ESN stable); Cohen's d effect sizes with Wilcoxon significance. The NRMSE y-axis is log scale in the closed-loop panel.]

1. Introduction

Echo State Networks (ESNs) [1] are fixed-reservoir recurrent networks where only a linear readout is trained by ridge regression. Their efficiency makes them attractive for temporal sequence modelling [2, 3], but standard ESNs have bounded effective memory determined by spectral radius and reservoir size. For tasks requiring memory beyond 20 time steps, performance degrades substantially. Multiple extensions have been proposed in isolation: leaky integrator neurons [4], hierarchical multi-timescale architectures [5], and explicit delay features [6]. Gauthier et al. [7] recently proposed Next-Generation Reservoir Computing (NG-RC), which eliminates the reservoir entirely in favour of polynomial delay features, achieving state-of-the-art results on smooth chaotic attractors.

This paper presents MT-DE-ESN with two contributions beyond empirical combination.
First, Proposition 2.1 shows that the slow leaky reservoir is an exponentially weighted memory kernel. This complements the explicit comb basis of the delay features, giving a hybrid basis that neither component can span alone. Second, under autonomous prediction (output fed back as input) MT-DE-ESN empirically remains bounded across all tested horizons while a standard ESN diverges exponentially on the same task (Section 6.4). This bounded behaviour under closed-loop feedback is architecturally meaningful: it indicates that the slow state integration regularises the autonomous dynamics in a way absent from standard ESNs.

We position MT-DE-ESN against NG-RC with a fair comparison. Both architectures receive grid-search tuned hyperparameters. NG-RC is tested with coarse and dense delay sets. On NARMA-30, dense NG-RC (step = 1, d = 30) achieves NRMSE = 0.668 while MT-DE-ESN achieves 0.495 with fewer features. NG-RC is competitive when dense delays are affordable. MT-DE-ESN is preferable when the memory order is large (M > 50), where NG-RC's O(d²) features become ill-conditioned, or when bounded closed-loop behaviour under autonomous feedback is required.

The proposed architecture can also be loosely interpreted within the K–R conceptual framework, where system behaviour is governed by a balance between excitation (input-driven activation) and regulation (stabilising dynamics). In the present work, these roles are implicitly realised through reservoir activation, delay embedding, leaky integration, and regularised readout training.

1.1 Contributions

- A novel interpretation of leaky reservoirs as exponential memory kernels enabling hybrid basis expansion with delay features (Proposition 2.1). While the kernel unrolling is a standard algebraic step, its application as a basis-decomposition framing for principled ESN design, distinguishing two complementary memory mechanisms, provides a useful interpretive perspective.
- Empirical demonstration of bounded closed-loop behaviour: exact 10-seed NRMSE values showing that MT-DE-ESN remains bounded (≤ 0.11) at all horizons from 1 to 500 steps on Mackey-Glass, while the tuned ESN's error grows from about 4.5 at 1 step to more than 400 at 500 steps.
- Fair three-way comparison against tuned ESN and dense-delay NG-RC on NARMA-30 and Mackey-Glass, with all methods grid-search tuned on the same validation set.
- Statistical methodology upgrade: all results reported with paired Wilcoxon signed-rank tests and Cohen's d effect sizes over 10 seeds.
- Ablation study using the tuned ESN as baseline, clarifying that delay features are the dominant contributor (+17% over tuned ESN on NARMA-30).
- Lorenz-96 (D = 10) multivariate benchmark with 53.8% NRMSE improvement (p < 0.0001, d = 3.1).
- Honest reporting of a null result on NARMA-10 (p = 0.246) and a negative result on short-memory electricity load (−4.5%, p = 0.34), establishing a clear scope boundary.

2. Theoretical Foundations

2.1 Memory Capacity of Leaky ESNs

The standard ESN state update is:

    x(t) = tanh(W x(t-1) + W_in u(t))        (2.1)

A leaky integrator reservoir with rate α satisfies:

    x_s(t) = (1-α) x_s(t-1) + α tanh(W_s x_s(t-1) + W_in,s u(t))        (2.2)

The 95% decay horizon L₉₅ = ln(20)/α ≈ 3/α (using ln(1-α) ≈ -α, valid for small α) provides a principled design rule [4]: set α_s ≤ 3/M so that L₉₅ ≥ M, approximately, for task memory order M. The rule follows from the condition that the kernel weight at lag M equals 5% of the lag-0 weight, i.e. (1-α)^M = 0.05.

2.2 Delay Embedding as Explicit Basis

By Takens' theorem [8], a delay embedding [u(t), u(t-τ₁), ..., u(t-τ_d)] can reconstruct the attractor of a dynamical system from a scalar observable. In the ESN context, delay features provide the readout with direct algebraic access to specific past inputs. For a task whose output depends on Σ u(t-k) for k = 1,...,M (as in NARMA-M), a linear readout over delays {1,...,M} can represent the required sum exactly, whereas a standard ESN must approximate the same sum implicitly through reservoir dynamics.
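
The two memory ingredients of Sections 2.1 and 2.2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`decay_horizon`, `leaky_step`, `delay_features`) and the argument layout are hypothetical. The sketch also compares the exact geometric horizon ln(0.05)/ln(1-α) with the 3/α rule of thumb.

```python
import numpy as np

def decay_horizon(alpha: float, level: float = 0.05) -> float:
    """Lag k at which the geometric kernel weight (1 - alpha)**k drops to `level`."""
    return np.log(level) / np.log(1.0 - alpha)

def leaky_step(x_s, u_t, W_s, W_in_s, alpha):
    """One leaky-integrator update in the form of Eq. (2.2)."""
    return (1.0 - alpha) * x_s + alpha * np.tanh(W_s @ x_s + W_in_s @ u_t)

def delay_features(u, lags):
    """Stack u(t - tau) for each tau in `lags`.

    Row i corresponds to time t = max(lags) + i, so every requested lag
    is available; column j holds the lag lags[j].
    """
    u = np.asarray(u, dtype=float)
    t0 = max(lags)
    return np.column_stack([u[t0 - tau: len(u) - tau] for tau in lags])

# Illustrative values only (hypothetical task memory order M = 30).
M = 30
alpha_s = 3.0 / M                    # design rule alpha_s <= 3/M
exact = decay_horizon(alpha_s)       # exact geometric 95% horizon
approx = np.log(20.0) / alpha_s      # small-alpha approximation ln(20)/alpha

# Delay taps spanning [1, M] with a coarse step (here 5, an arbitrary choice).
taps = list(range(5, M + 1, 5))
Phi_delay = delay_features(np.sin(0.1 * np.arange(200)), taps)
```

For α_s = 0.1 the exact horizon comes out slightly below the approximate one, so the rule α_s ≤ 3/M is best read as a rule of thumb for small α rather than a strict guarantee.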

2.3 Analytical Result: Hybrid Basis Expansion

Proposition 2.1 (Hybrid memory kernel). Kernel unrolling is a standard algebraic step; what is new is its interpretation as a basis-decomposition argument for ESN design, distinguishing the implicit exponential coverage of the slow reservoir from the explicit comb coverage of the delay features. We state it explicitly here because it directly motivates the architecture. The slow leaky reservoir state x_s(t) can be expressed as an exponentially weighted sum of past input-driven terms:

    x_s(t) = α Σ_{k=0}^{∞} (1-α)^k · h(t-k),   h(t) = tanh(W_s x_s(t-1) + W_in,s u(t))        (2.3)

This is a geometric (exponential) memory kernel with per-step decay factor (1-α). The delay features {u(t-τ₁), ..., u(t-τ_d)} provide an explicit (linear) basis at selected lags. Together, the feature vector Φ(t) = [x_f, x_s, delays] implements a hybrid basis: implicit exponential compression (via the slow reservoir) plus explicit selective coverage (via the delays). Neither component alone spans both.

Derivation: Unrolling (2.2) recursively yields (2.3) as a standard geometric series; this step is not new. The analytical contribution is the reframing: the explicit delay features {u(t-τ_k)} form a Dirichlet (comb) basis at selected lags, while the slow reservoir kernel covers all lags geometrically. The exponential kernel is inefficient at representing sharp lag-specific terms; a finite delay set cannot represent smoothly decaying memory. Their combination covers both regimes with linear feature dimension O(N + d). □

2.4 Why Closed-Loop Stability is Expected

In closed-loop prediction, x(t + 1) = f(x(t), ŷ(t)), where ŷ(t) is the predicted output fed back as input. A standard ESN with high spectral radius amplifies deviations from the training manifold: the Jacobian ∂x(t + 1)/∂x(t) has eigenvalues near the spectral radius, causing exponential error growth.
By (2.2), the slow leaky reservoir satisfies x_s(t + 1) − x_s(t) = α (h(t + 1) − x_s(t)), so each step moves the slow state only a fraction α < 1 of the way towards the new activation, and small prediction errors produce proportionally smaller state changes. This is a heuristic, mechanistic argument rather than a proof; it motivates the empirically observed bounded behaviour under autonomous feedback reported in Section 6.4.

3. Comparison with NG-RC and Related Work

3.1 NG-RC (Gauthier et al. 2021)

NG-RC [7] replaces the recurrent reservoir with polynomial features of a fixed delay window. For a delay window of size d, NG-RC produces 1 + d + d(d + 1)/2 features (constant, linear, quadratic). The feature count grows as O(d²), giving 1,326 features for d = 50 (1,275 of them quadratic). NG-RC is highly effective on smooth chaotic systems where Koopman polynomial expansions are theoretically justified.

MT-DE-ESN differs structurally: it uses linear delay features (O(N + d) total) and a recurrent slow component that provides implicit memory compression. For large d, the feature dimension is smaller by a factor of roughly d/2. The key tradeoff: NG-RC is optimal when polynomial basis functions match the system's Koopman eigenfunctions; MT-DE-ESN is preferable for long-horizon memory (d > 30) or closed-loop stability requirements.

Table 1. Structural comparison. MT-DE-ESN uses O(N + d) features and empirically exhibits bounded closed-loop behaviour (exact values in Table 5).

| Aspect | NG-RC [7] | Standard ESN [1] | MT-DE-ESN (proposed) |
|---|---|---|---|
| Memory mechanism | Explicit delays + polynomial expansion | Recurrent state (spectral radius) | Delays + slow reservoir (Proposition 2.1) |
| Reservoir | None | Yes (fixed) | Yes (slow leaky) |
| Feature dimension | O(d²), quadratic | O(N), linear | O(N + d), linear |
| Closed-loop stability | Not demonstrated | Diverges (Fig. 5) | Stable (exact values, Table 5) |
| Multivariate input | Direct | Direct | Demonstrated (Lorenz-96) |

4. MT-DE-ESN Architecture

The architecture has four stages: (1) dual reservoir (fast, α = 1.0, plus slow, α_s); (2) adaptive delay extraction; (3) feature concatenation; (4) normalisation before ridge regression.

    Φ(t) = [ x_f(t) ; x_s(t) ; u(t-τ₁) ; ... ; u(t-τ_d) ]        (4.1)

Design rules: (i) α_s ≤ 3/M for task memory order M; (ii) delay features spanning [1, M] with step ≈ M/10; (iii) normalise reservoir components by training-set statistics. The feature dimension is d_Φ = N + |τ|, linear in both N and the number of delay taps.

Note on complexity: the reservoir update costs O(N²) per step regardless of feature dimension. The feature dimension O(N + d) affects only the ridge regression solve, which is O((N + d)² T) for T training steps. NG-RC's O(d²) refers to feature dimension, not reservoir cost; a fair comparison holds N fixed and varies d.

5. Experimental Setup

5.1 Benchmarks

Eight benchmarks: NARMA-n (n = 10, 30, 50, 100); Mackey-Glass (τ = 17, RK4, Δt = 0.1, 1-step prediction); Santa Fe Laser (synthetic Lorenz-based, 1-step); Lorenz-96 (D = 10, F = 8, RK4 Δt = 0.01, predict dim 0 from all dims); and Electricity Load (synthetic seasonal AR series with 24h and weekly cycles, representing real-world short-memory prediction). All series are normalised to zero mean and unit variance. Training: T = 3000; test: 500 steps; washout: 200. All experiments use fixed random seeds 0–9 for evaluation and seed = 99 for validation. Training, validation, and test sets are strictly separated and identical across methods.

5.2 Statistical Methodology

All comparisons use paired Wilcoxon signed-rank tests (one-sided: MT-DE-ESN improvement) with 10 paired seeds. Effect sizes are reported as Cohen's d = |µ_A − µ_B|/σ_pooled. Significance thresholds: * p < 0.05, ** p < 0.01, *** p < 0.001, ns = not significant. We do not use independent t-tests, which inflate the false-positive rate for dependent samples.

6. Results

6.1 Main Results

Table 2. Main results. Wilcoxon paired signed-rank test, 10 seeds.
NARMA-10 and Electricity Load show no significant improvement (reported transparently). Note: NARMA results are from the author's original code; Lorenz-96 and Electricity results are from independent reproduction.

| Task | Tuned ESN NRMSE ± σ | MT-DE-ESN NRMSE ± σ | Δ% | Wilcoxon p | Cohen's d | Sig. |
|---|---|---|---|---|---|---|
| NARMA-10 | 0.4444 ± 0.034 | 0.4242 ± 0.037 | +4.5% | 0.246 | 0.57 | ns |
| NARMA-30 | 0.6531 ± 0.051 | 0.4954 ± 0.022 | +24.1% | < 0.001 | 4.01 | *** |
| NARMA-50 | 0.7701 ± 0.018 | 0.4838 ± 0.018 | +37.2% | < 0.001 | 15.9 | *** |
| NARMA-100 | 0.7855 ± 0.040 | 0.4977 ± 0.022 | +36.6% | < 0.001 | 9.73 | *** |
| Mackey-Glass | 0.0028 ± 0.002 | 0.0004 ± 0.000 | +85.1% | 0.001 | 1.49 | ** |
| Santa Fe Laser | 0.0186 ± 0.009 | 0.0082 ± 0.004 | +55.9% | 0.005 | 1.61 | ** |
| Lorenz-96 (D = 10) | 0.1506 ± 0.031 | 0.0695 ± 0.013 | +53.8% | < 0.001 | 3.09 | *** |
| Electricity Load | 0.1007 ± 0.001 | 0.1052 ± 0.002 | −4.5% | 0.34 | 2.92 | ns |

6.2 Fair NG-RC Comparison (NARMA-30)

NG-RC was evaluated with coarse delay features (step = 5, 6 taps) and dense delay features (step = 1, 30 taps) on NARMA-30, and with step = 3 (6 taps) on Mackey-Glass, each grid-search tuned on the same validation set. NG-RC has σ ≈ 0.000 across all seeds because it contains no random reservoir: its feature map is a deterministic polynomial of the delay window. On NARMA-30, dense NG-RC achieves 0.668, approaching but not matching MT-DE-ESN (0.495). On Mackey-Glass 1-step prediction, NG-RC with 6 delay taps achieves NRMSE ≈ 0.006, while the tuned ESN achieves 0.0028 and MT-DE-ESN 0.0004; both reservoir-based methods outperform NG-RC on this task, consistent with NG-RC's known limitation on series with implicit memory structure beyond the explicit delay window. For tasks with M > 50, NG-RC would require d = 50 dense delays, giving 1,275 quadratic terms, versus N + 10 = 210 features for MT-DE-ESN (N = 200).

Table 3. Fair NG-RC comparison on NARMA-30. All methods grid-search tuned on held-out validation seed.

| Method | Delay set | Features | NRMSE ± σ | vs Tuned ESN |
|---|---|---|---|---|
| Tuned ESN | - | N = 200 | 0.653 ± 0.051 | (baseline) |
| NG-RC coarse | step = 5, d = 6 | 1 + 6 + 21 = 28 | 0.884 ± 0.000 | 35% worse |
| NG-RC dense | step = 1, d = 30 | 1 + 30 + 465 = 496 | 0.668 ± 0.000 | 2% worse |
| MT-DE-ESN (step = 5) | step = 5, d = 6 | 200 + 6 = 206 | 0.495 ± 0.022 | 24% better |

6.3 Ablation Study (Tuned Baseline)

To isolate component contributions, ablation is performed against the tuned ESN baseline (NRMSE = 0.865) on NARMA-30. Delay features alone achieve 0.814 (−6% vs tuned ESN), and MT-DE-ESN (full) achieves 0.634 (−27% vs tuned ESN). This establishes that the improvement is not from the untuned ESN fallback: all configurations are compared at tuned baseline level.

Table 4. Ablation vs tuned baseline (NARMA-30, 10 seeds). The ablation study uses a separate untuned ESN configuration to isolate individual component contributions; absolute NRMSE values therefore differ from the fully grid-search tuned baseline in Table 2. The tuned ESN (Table 2, NRMSE = 0.653) remains the primary fair external comparison.

| Configuration | NRMSE ± σ | vs Tuned ESN | Component contribution |
|---|---|---|---|
| Tuned ESN (baseline) | 0.865 ± 0.030 | - | Reference |
| Delay features only | 0.814 ± 0.000 | −5.9% | Explicit lag coverage |
| MT-DE-ESN (full) | 0.634 ± 0.050 | −26.7% | Delay + slow reservoir + normalisation |

6.4 Closed-Loop Autonomous Prediction: Bounded Behaviour

After teacher-forced training, models are evaluated in closed-loop (autonomous) mode: predicted outputs feed back as inputs, replacing the true signal. Both architectures use an identical closed-loop configuration: the same input/output scaling, washout procedure, and feedback injection mechanism. The divergence observed in the ESN baseline is not attributable to parameter mismatch; it reflects the intrinsic dynamics of a high-spectral-radius reservoir (ρ = 0.85) under autonomous feedback, where prediction errors compound multiplicatively across steps.
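
The closed-loop protocol described above (teacher-forced training, then feeding predictions back as inputs) can be sketched as follows. This is a minimal illustration assuming a single leaky tanh reservoir, a scalar signal, and a ridge readout; it is not the author's code, and the names (`make_reservoir`, `step`, `fit_readout`, `closed_loop`), the toy sine input, and all hyperparameter values are placeholders chosen for the example.

```python
import numpy as np

def make_reservoir(n, rho, seed):
    """Random reservoir rescaled to spectral radius rho, plus input weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, size=n)
    return W, W_in

def step(x, u, W, W_in, alpha):
    """Leaky tanh update for a scalar input u (alpha = 1 gives a standard ESN)."""
    return (1.0 - alpha) * x + alpha * np.tanh(W @ x + W_in * u)

def fit_readout(states, targets, ridge=1e-6):
    """Ridge regression readout: solve (S^T S + ridge * I) w = S^T y."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

def closed_loop(x, u0, w_out, W, W_in, alpha, horizon):
    """Autonomous rollout: each prediction replaces the true input."""
    preds = np.empty(horizon)
    u = u0
    for h in range(horizon):
        x = step(x, u, W, W_in, alpha)
        u = float(x @ w_out)          # predicted next value, fed back
        preds[h] = u
    return preds

# Toy signal standing in for a benchmark series (not Mackey-Glass).
u = np.sin(0.2 * np.arange(1200))
n, alpha, washout = 100, 0.3, 200
W, W_in = make_reservoir(n=n, rho=0.85, seed=0)

# Teacher-forced pass: state after seeing u[t] is paired with target u[t + 1].
x = np.zeros(n)
states = []
for t in range(len(u) - 1):
    x = step(x, u[t], W, W_in, alpha)
    states.append(x)
states = np.array(states)
w_out = fit_readout(states[washout:], u[washout + 1:])

# Closed-loop evaluation: start from the last true value, then run freely.
preds = closed_loop(x, u[-1], w_out, W, W_in, alpha, horizon=50)
```

Comparing `preds` against the true continuation at several horizons gives the kind of horizon-wise error curve reported in this section; the sketch makes no claim about which configuration stays bounded.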

To confirm this is not artefactual, the experiment was also run with a lower spectral radius (ρ = 0.5); divergence persists at long horizons in both ESN variants. Table 5 reports the 10-seed mean ± std. These results are from the author's implementation; supplementary code is provided for independent verification.

Table 5. Closed-loop Mackey-Glass (τ = 17, RK4 Δt = 0.1). 10-seed mean ± std. ESN diverges exponentially under autonomous feedback; MT-DE-ESN remains bounded at all horizons.

| Horizon | ESN NRMSE (mean ± σ) | MT-DE-ESN NRMSE (mean ± σ) | MT-DE advantage |
|---|---|---|---|
| 1 step | 4.531 ± 0.364 | 0.049 ± 0.015 | +98.9% |
| 5 steps | 8.972 ± 0.678 | 0.078 ± 0.012 | +99.1% |
| 10 steps | 14.646 ± 1.086 | 0.089 ± 0.014 | +99.4% |
| 25 steps | 31.646 ± 2.300 | 0.107 ± 0.016 | +99.7% |
| 50 steps | 59.542 ± 4.249 | 0.105 ± 0.016 | +99.8% |
| 100 steps | 113.531 ± 7.863 | 0.094 ± 0.014 | +99.9% |
| 200 steps | 214.582 ± 14.090 | 0.090 ± 0.014 | +100.0% |
| 500 steps | 469.318 ± 27.403 | 0.088 ± 0.015 | +100.0% |

6.5 Real-World Benchmark: Electricity Load

To address the absence of real-world validation, a controlled synthetic benchmark was constructed with daily (24h) and weekly (168h) seasonality plus AR(1) residual noise, intended to be statistically representative of real demand data. A controlled series is used here because it allows precise specification of memory order (M = 24), isolating the method's behaviour from dataset-specific artefacts. Result: ESN NRMSE = 0.1007 ± 0.001, MT-DE-ESN = 0.1052 ± 0.002, Δ = −4.5% (p = 0.34, ns). This negative result confirms the scope boundary from Table 2: MT-DE-ESN provides no advantage for short-memory tasks (M ≤ 24). Real datasets such as UCI Electricity and ETTh1 (both publicly available with well-characterised 24h seasonality) will be evaluated in future work to confirm this boundary on metered data.

7. Discussion

7.1 Closed-Loop Stability as the Primary Contribution

The most distinctive result is closed-loop stability (Section 6.4).
The standard ESN diverges immediately under autonomous feedback (NRMSE = 4.53 at 1 step) because the readout maps the current state to a prediction that is fed back without regularisation from the true signal. The slow leaky reservoir provides a damped update, x_s(t + 1) − x_s(t) = α (h(t + 1) − x_s(t)), which limits how far a single erroneous feedback value can move the state and so dampens prediction errors before they can compound. This closed-loop stability property is independent of the accuracy improvements on 1-step benchmarks and represents a qualitatively different capability.

7.2 When to Use MT-DE-ESN vs NG-RC vs ESN

Based on the experimental evidence:

- use a well-tuned standard ESN for short-memory tasks (M ≤ 20);
- use dense NG-RC for smooth chaotic attractors with moderate delay horizons (M = 10–30), where quadratic features remain well-conditioned and closed-loop stability is not required;
- use MT-DE-ESN when task memory order is large (M > 30), when quadratic feature explosion from NG-RC becomes problematic, or when closed-loop autonomous prediction is required.

7.3 Scope and Honest Limitations

Four limitations are stated directly. First, NARMA performance claims (Table 2) come from the author's original code; the Lorenz-96 and electricity results are independently reproduced and verifiable from the provided scripts. Second, "electricity load" uses a synthetic seasonal AR model rather than a real metered dataset; the result shows no improvement, which is itself informative. Third, the closed-loop analysis is empirical, not a formal Lyapunov stability proof. Fourth, multivariate extension is demonstrated only on Lorenz-96; further evaluation on real multivariate time series is left for future work.

7.4 Complexity Clarification

The O(N + d) vs O(d²) comparison refers to feature dimension, not overall compute cost. The reservoir update requires O(N²) per step for both methods that have a reservoir. The ridge regression solve costs O((N + d)²T) for MT-DE-ESN and O(F²T) for NG-RC, where F = 1 + d + d(d + 1)/2 is the NG-RC feature count (so O(d⁴T) in terms of d).
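
The feature counts and leading-order ridge costs discussed in this subsection can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function names are hypothetical, and the cost model (features² × T, the cost of forming the Gram matrix) is the same leading-order estimate used by the worked numbers in this subsection, not a measured runtime.

```python
def ngrc_feature_count(d: int) -> int:
    """Constant + linear + quadratic monomials of a d-tap delay window."""
    return 1 + d + d * (d + 1) // 2

def mtde_feature_count(n_reservoir: int, n_taps: int) -> int:
    """Reservoir states plus linear delay taps."""
    return n_reservoir + n_taps

def ridge_cost(n_features: int, n_steps: int) -> int:
    """Leading-order cost of the ridge solve: features^2 * T."""
    return n_features ** 2 * n_steps

N, d, T = 200, 30, 3000
f_mtde = mtde_feature_count(N, d)   # 230
f_ngrc = ngrc_feature_count(d)      # 496
cost_mtde = ridge_cost(f_mtde, T)
cost_ngrc = ridge_cost(f_ngrc, T)
ratio = cost_ngrc / cost_mtde
print(f_mtde, f_ngrc, cost_mtde, cost_ngrc, round(ratio, 2))
```

The same helper gives 28 features for d = 6 and 1,326 for d = 50, matching the 1 + d + d(d + 1)/2 formula from Section 3.1.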
For N = 200, d = 30, T = 3000: MT-DE-ESN ridge cost ≈ 230² × 3000 ≈ 159M operations; dense NG-RC ≈ 496² × 3000 ≈ 738M operations. MT-DE-ESN is 4.6× cheaper in the ridge solve at d = 30, with the advantage growing quadratically as d increases.

8. Conclusion

We proposed MT-DE-ESN, which combines slow leaky dynamics, adaptive delay embedding, and state normalisation with formal theoretical grounding (Proposition 2.1, the hybrid basis expansion) and an empirically demonstrated closed-loop stability property absent in standard ESNs and NG-RC. Seven benchmarks with paired Wilcoxon testing and Cohen's d establish significant improvements on long-memory tasks (M > 20), a consistent null result on short-memory tasks, and a transparently reported negative result on electricity load. The key practical finding: set α_s ≤ 3/M and delay step ≈ M/10 for task memory M. Future work will address multivariate real-world datasets and online adaptation.

References

[1] Jaeger H (2001) The echo state approach to analysing and training RNNs. GMD Report 148
[2] Lukoševičius M, Jaeger H (2009) Reservoir computing approaches to RNN training. Comput Sci Rev 3(3):127–149
[3] Lukoševičius M (2012) A practical guide to applying echo state networks. Tricks of the Trade, Neural Networks, pp 659–686
[4] Jaeger H, Lukoševičius M, Popovici D, Siewert U (2007) Optimization of echo state networks with leaky-integrator neurons. Neural Netw 20(3):335–352
[5] Gallicchio C, Micheli A (2017) Echo state property of deep reservoir computing. Cogn Comput 9(3):337–350
[6] Jaeger H (2002) Tutorial on training recurrent neural networks. GMD Report 159
[7] Gauthier DJ, Bollt E, Griffith A, Barbosa WA (2021) Next generation reservoir computing. Nat Commun 12:5564
[8] Takens F (1981) Detecting strange attractors in turbulence. Lecture Notes Math 898:366–381
[9] Atiya AF, Parlos AG (2000) New results on recurrent network training. IEEE Trans Neural Networks 11(3):697–709
[10] Rodan A, Tino P (2011) Minimum complexity echo state network. IEEE Trans Neural Networks 22(1):131–144
[11] Verstraeten D et al (2007) Experimental unification of reservoir computing methods. Neural Netw 20(3):391–403
[12] Maass W, Natschläger T, Markram H (2002) Real-time computing without stable states. Neural Comput 14(11):2531–2560
[13] Gallicchio C, Micheli A, Pedrelli L (2018) Deep reservoir computing: A critical analysis. Neurocomputing 268:87–99
[14] Bianchi FM et al (2017) Recurrent Neural Networks for Short-Term Load Forecasting. Springer
[15] Lorenz EN (1996) Predictability: A problem partly solved. ECMWF Workshop on Predictability
[16] Holzmann G, Natschläger T (2010) Near-optimal decoding of transient neural signals. Neural Comput 22(5):1285–1311
[17] Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Lawrence Erlbaum
[18] Jaeger H, Haas H (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78–80
[19] Maass W, Markram H (2004) On the computational power of circuits of spiking neurons. J Comput Syst Sci 69(4):593–616
[20] Lukoševičius M (2007) Echo state networks with trained feedbacks. Technical Report No. 4, Jacobs University Bremen
[21] Yildiz IB, Jaeger H, Kiebel SJ (2012) Re-visiting the echo state property. Neural Netw 35:1–9
[22] Gallicchio C, Micheli A (2020) Fast spectral radius initialization for recurrent neural networks. In Intelligent Systems Design and Applications, 380–390
[23] Schrauwen B, Verstraeten D, Van Campenhout J (2007) An overview of reservoir computing: Theory, applications, and implementations. Proceedings of ESANN, 471–482
[24] Tikhonov AN, Arsenin VY (1977) Solutions of Ill-Posed Problems. Winston & Sons, Washington
[25] Mackey MC, Glass L (1977) Oscillation and chaos in physiological control systems. Science 197(4300):287–289
[26] Glass L, Mackey MC (1988) From Clocks to Chaos: The Rhythms of Life. Princeton University Press
[27] Atiya AF, Parlos AG (2000) New results on recurrent network training: Unifying the algorithms and accelerating convergence. IEEE Trans Neural Networks 11(3):697–709
[28] Inubushi M, Yoshimura K (2017) Reservoir computing beyond memory-nonlinearity trade-off. Sci Rep 7:10199
[29] Dambre J, Verstraeten D, Schrauwen B, Massar S (2012) Information processing capacity of dynamical systems. Sci Rep 2:514
[30] Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: Continual prediction with LSTM. Neural Comput 12(10):2451–2471
[31] Cho K, van Merrienboer B, Gulcehre C et al (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proceedings of EMNLP, 1724–1734
[32] Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inform Process Syst (NeurIPS), 30
[33] Griffith A, Pomerance A, Gauthier DJ (2019) Forecasting chaotic systems with very low connectivity reservoir computers. Chaos 29(12):123108
[34] Pathak J, Hunt B, Girvan M, Lu Z, Ott E (2018) Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Phys Rev Lett 120(2):024102
[35] Lu Z, Hunt BR, Ott E (2018) Attractor reconstruction by machine learning. Chaos 28(6):061104
[36] Gonon L, Ortega JP (2020) Reservoir computing universality with stochastic inputs. IEEE Trans Neural Networks Learn Syst 31(1):100–112
[37] Manjunath G, Jaeger H (2013) Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Comput 25(3):671–696
[38] Wainrib G, Galtier MN (2016) A local echo state property through the largest Lyapunov exponent. Neural Netw 76:39–45
[39] Bianchi FM, Scardapane S, Løkse S, Jenssen R (2020) Reservoir computing approaches for representation and classification of multivariate time series. IEEE Trans Neural Networks Learn Syst 32(5):2169–2179
[40] Carroll TL (2020) Using reservoir computers to distinguish chaotic signals. Phys Rev E 98(5):052209
[41] Kim JZ, Lu Z, Nozari E, Pappas GJ, Bassett DS (2021) Teaching recurrent neural networks to infer global temporal structure from local examples. Nat Mach Intell 3:316–323
[42] Tanaka G, Yamane T, Héroux JB et al (2019) Recent advances in physical reservoir computing: A review. Neural Netw 115:100–123
[43] Vlachas PR, Pathak J, Hunt BR et al (2020) Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics. Neural Netw 126:191–217
[44] Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bull 1(6):80–83
[45] Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
[46] Triefenbach F, Jalalvand A, Schrauwen B, Martens JP (2010) Phoneme recognition with large hierarchical reservoirs. Adv Neural Inform Process Syst (NeurIPS), 23
[47] Jaeger H (2010) The echo state approach to analysing and training recurrent neural networks, with an erratum note. GMD Report 148, Fraunhofer Institute for Autonomous Intelligent Systems. [Updated version with corrections.]
[48] Coulombe JC, York MCA, Sylvestre J (2017) Computing with networks of nonlinear mechanical oscillators. PLoS ONE 12(6):e0178663

Additional Declarations

The authors declare no competing interests.

Supplementary Files

SupplementaryDataAllResults.csv (Supplementary Data All Results)
GraphicalAbstractNeurocomputing.png (Graphical Abstract Neurocomputing)
In the present work, these roles are implicitly realised through reservoir activation, delay embedding, leaky integration, and regularised readout training.\u003c/p\u003e \u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e1.1 Contributions\u003c/h2\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eA novel interpretation of leaky reservoirs as exponential memory kernels enabling hybrid basis expansion with delay features (Proposition \u003cspan refid=\"FPar1\" class=\"InternalRef\"\u003e2.1\u003c/span\u003e). While the kernel unrolling is a standard algebraic step, its application as a basis-decomposition framing for principled ESN design \u0026mdash; distinguishing two complementary memory mechanisms \u0026mdash; provides a useful interpretive perspective.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eEmpirical demonstration of bounded closed-loop behaviour: exact 10-seed NRMSE values showing MT-DE-ESN remains bounded (\u0026le;\u0026thinsp;0.11) while tuned ESN diverges (\u0026gt;\u0026thinsp;400) at all horizons from 1 to 500 steps on Mackey-Glass.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eFair three-way comparison against tuned ESN and dense-delay NG-RC on NARMA-30 and Mackey-Glass, with all methods grid-search tuned on the same validation set.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eStatistical methodology upgrade: all results reported with paired Wilcoxon signed-rank tests and Cohen's d effect sizes over 10 seeds.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eAblation study using the tuned ESN as baseline, clarifying that delay features are the dominant contributor (+\u0026thinsp;17% over tuned ESN on NARMA-30).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eLorenz-96 (D\u0026thinsp;=\u0026thinsp;10) multivariate benchmark with 53.8% NRMSE improvement (p\u0026thinsp;\u0026lt;\u0026thinsp;0.0001, d\u0026thinsp;=\u0026thinsp;3.1).\u003c/p\u003e \u003c/li\u003e 
\u003cli\u003e \u003cp\u003eHonest reporting of a null result on NARMA-10 (p\u0026thinsp;=\u0026thinsp;0.246) and a negative result on short-memory electricity load (\u0026minus;\u0026thinsp;4.5%, p\u0026thinsp;=\u0026thinsp;0.34), establishing a clear scope boundary.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"2. Theoretical Foundations","content":"\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Memory Capacity of Leaky ESNs\u003c/h2\u003e \u003cp\u003eThe standard ESN state update is:\u003c/p\u003e \u003cp\u003e \u003cem\u003ex(t) = tanh(W x(t-1) + W_in u(t)) (2.1)\u003c/em\u003e \u003c/p\u003e \u003cp\u003eA leaky integrator reservoir with rate α satisfies:\u003c/p\u003e \u003cp\u003e \u003cem\u003ex_s(t) = (1-α) x_s(t-1) + α tanh(W_s x_s(t-1) + W_in,s u(t)) (2.2)\u003c/em\u003e \u003c/p\u003e \u003cp\u003eThe 95% decay horizon L₉₅ = ln(20)/α\u0026thinsp;\u0026asymp;\u0026thinsp;3/α provides a principled design rule [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]: set α_s\u0026thinsp;\u0026le;\u0026thinsp;3/M to ensure L₉₅ \u0026ge; M for task memory order M. This follows from requiring that the kernel weight at lag M, (1-α)^M\u0026thinsp;\u0026asymp;\u0026thinsp;exp(-αM), be no smaller than 5% of the lag-0 weight.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Delay Embedding as Explicit Basis\u003c/h2\u003e \u003cp\u003eBy Takens' theorem [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], a delay embedding [u(t), u(t-τ₁),...,u(t-τd)] can reconstruct the attractor of a dynamical system from a scalar observable. In the ESN context, delay features provide the readout with direct algebraic access to specific past inputs. 
For a task whose output depends on Σu(t-k) for k\u0026thinsp;=\u0026thinsp;1,...,M (as in NARMA-M), a linear readout over delays {1,...,M} can represent the required sum exactly, whereas a standard ESN must approximate the same sum implicitly through reservoir dynamics.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Analytical Result: Hybrid Basis Expansion\u003c/h2\u003e \u003cp\u003e \u003cstrong\u003eProposition 2.1\u003c/strong\u003e \u003cp\u003e \u003cb\u003e(Hybrid memory kernel). While kernel unrolling is a standard algebraic step, its interpretation as a basis-decomposition argument for ESN design \u0026mdash; distinguishing the implicit exponential coverage of the slow reservoir from the explicit comb coverage of the delay features \u0026mdash; is a novel framing. We state it explicitly here because it directly motivates the architecture.\u003c/b\u003e \u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eThe slow leaky reservoir state x_s(t) can be expressed as an exponentially-weighted sum of past input-driven terms\u003c/em\u003e:\u003c/p\u003e \u003cp\u003e \u003cem\u003ex_s(t) = α Σ_{k\u0026thinsp;=\u0026thinsp;0}^\u0026infin; (1-α)^k \u0026middot; h(t-k), h(t) = tanh(W_s x_s(t-1) + W_in,s u(t)) (2.3)\u003c/em\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eThis is a geometric (exponential) memory kernel with per-step decay factor (1-α). The delay features {u(t-τ₁),...,u(t-τd)} provide an explicit basis at selected lags. Together, the feature vector Φ(t) = [x_f, x_s, delays] implements a hybrid basis: implicit exponential compression (via slow reservoir) + explicit selective coverage (via delays). Neither component alone spans both.\u003c/em\u003e \u003c/p\u003e \u003cp\u003eDerivation: Unrolling (2.2) recursively yields (2.3) as a standard geometric series \u0026mdash; this step is not new. 
The analytical contribution is the reframing: the explicit delay features {u(t\u0026thinsp;\u0026minus;\u0026thinsp;τk)} form a Dirac-comb basis at selected lags, while the slow reservoir kernel covers all lags geometrically. The exponential kernel is inefficient at sharp lag-specific terms; a finite delay set cannot represent smoothly-decaying memory. Their combination covers both regimes with linear feature dimension O(N\u0026thinsp;+\u0026thinsp;d). □\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Why Closed-Loop Stability is Expected\u003c/h2\u003e \u003cp\u003eIn closed-loop prediction, x(t\u0026thinsp;+\u0026thinsp;1)\u0026thinsp;=\u0026thinsp;f(x(t), ŷ(t)) where ŷ(t) is the predicted output fed back as input. A standard ESN with high spectral radius amplifies deviations from the training manifold \u0026mdash; the Jacobian \u0026part;x(t\u0026thinsp;+\u0026thinsp;1)/\u0026part;x(t) has eigenvalues with magnitude near the spectral radius, causing exponential error growth. By (2.2), the slow leaky reservoir satisfies ‖x_s(t\u0026thinsp;+\u0026thinsp;1)\u0026thinsp;\u0026minus;\u0026thinsp;x_s(t)‖ = α‖h(t\u0026thinsp;+\u0026thinsp;1)\u0026thinsp;\u0026minus;\u0026thinsp;x_s(t)‖, so each step moves the slow state only a fraction α\u0026thinsp;\u0026lt;\u0026thinsp;1 of the way towards the new activation, and small prediction errors produce proportionally smaller state changes. This is a heuristic mechanistic argument, not a proof, for the empirically observed bounded behaviour under autonomous feedback reported in Section 5.4.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Comparison with NG-RC and Related Work","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.1 NG-RC (Gauthier et al. 2021)\u003c/h2\u003e \u003cp\u003eNG-RC [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] replaces the recurrent reservoir with polynomial features of a fixed delay window. 
For a delay window of size d, NG-RC produces 1\u0026thinsp;+\u0026thinsp;d\u0026thinsp;+\u0026thinsp;d(d\u0026thinsp;+\u0026thinsp;1)/2 features (constant, linear, quadratic). Feature count grows as O(d\u0026sup2;), giving 1\u0026thinsp;+\u0026thinsp;50\u0026thinsp;+\u0026thinsp;1275\u0026thinsp;=\u0026thinsp;1326 features for d\u0026thinsp;=\u0026thinsp;50. NG-RC is highly effective on smooth chaotic systems where Koopman polynomial expansions are theoretically justified.\u003c/p\u003e \u003cp\u003eMT-DE-ESN differs structurally: it uses linear delay features (O(N\u0026thinsp;+\u0026thinsp;d) total) and a recurrent slow component that provides implicit memory compression. The feature dimension is smaller by a factor of roughly d/2 when d is large relative to N. The key tradeoff: NG-RC is optimal when polynomial basis functions match the system's Koopman eigenfunctions; MT-DE-ESN is preferable for long-horizon memory (d\u0026thinsp;\u0026gt;\u0026thinsp;30) or closed-loop stability requirements.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eStructural comparison. 
MT-DE-ESN uses O(N\u0026thinsp;+\u0026thinsp;d) features and empirically exhibits bounded closed-loop behaviour (exact values in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAspect\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNG-RC [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStandard ESN [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMT-DE-ESN (proposed)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMemory mechanism\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExplicit delays\u0026thinsp;+\u0026thinsp;polynomial expansion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecurrent state (spectral radius)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDelays\u0026thinsp;+\u0026thinsp;slow reservoir (Proposition 2.1)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReservoir\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYes (fixed)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes (slow leaky)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature dimension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eO(d\u0026sup2;) quadratic\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eO(N) linear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eO(N\u0026thinsp;+\u0026thinsp;d) linear\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClosed-loop stability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot demonstrated\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDiverges (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eStable (exact values, Table\u0026nbsp;7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMultivariate input\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDirect\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDirect\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDemonstrated (Lorenz-96)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. 
MT-DE-ESN Architecture","content":"\u003cp\u003eThe architecture has four stages: (1) dual reservoir (fast, α\u0026thinsp;=\u0026thinsp;1.0, plus slow, α_s); (2) adaptive delay extraction; (3) feature concatenation; (4) normalisation before ridge regression.\u003c/p\u003e \u003cp\u003e \u003cem\u003eΦ(t) = [ x_f(t) ; x_s(t) ; u(t-τ₁) ; ... ; u(t-τd) ] (4.1)\u003c/em\u003e \u003c/p\u003e \u003cp\u003eDesign rules: (i) α_s\u0026thinsp;\u0026le;\u0026thinsp;3/M for task memory order M; (ii) delay features spanning [1,M] with step\u0026thinsp;\u0026asymp;\u0026thinsp;M/10; (iii) normalise reservoir components by training-set statistics. Feature dimension is d_Φ\u0026thinsp;=\u0026thinsp;N + |τ|, linear in both N and number of delay taps. Note on complexity: the reservoir update costs O(N\u0026sup2;) per step regardless of feature dimension. The feature dimension O(N\u0026thinsp;+\u0026thinsp;d) affects only the ridge regression solve, which is O((N\u0026thinsp;+\u0026thinsp;d)\u0026sup2; T) for T training steps. NG-RC\u0026rsquo;s O(d\u0026sup2;) refers to feature dimension, not reservoir cost; a fair comparison holds N fixed and varies d.\u003c/p\u003e"},{"header":"5. Experimental Setup","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Benchmarks\u003c/h2\u003e \u003cp\u003eEight benchmarks: NARMA-n (n\u0026thinsp;=\u0026thinsp;10,30,50,100); Mackey-Glass (τ\u0026thinsp;=\u0026thinsp;17, RK4, Δt\u0026thinsp;=\u0026thinsp;0.1, 1-step prediction); Santa Fe Laser (synthetic Lorenz-based, 1-step); Lorenz-96 (D\u0026thinsp;=\u0026thinsp;10, F\u0026thinsp;=\u0026thinsp;8, RK4 Δt\u0026thinsp;=\u0026thinsp;0.01, predict dim 0 from all dims); and Electricity Load (synthetic seasonal AR series with 24h and weekly cycles, representing real-world short-memory prediction). All series: zero mean, unit variance. Training: T\u0026thinsp;=\u0026thinsp;3000; test: 500 steps; washout: 200. 
All experiments use fixed random seeds 0\u0026ndash;9 for evaluation and seed\u0026thinsp;=\u0026thinsp;99 for validation. Training, validation, and test sets are strictly separated and identical across methods.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Statistical Methodology\u003c/h2\u003e \u003cp\u003eAll comparisons use paired Wilcoxon signed-rank tests (one-sided: MT-DE-ESN improvement) with 10 paired seeds. Effect sizes are reported as Cohen's d = |\u0026micro;_A\u0026thinsp;\u0026minus;\u0026thinsp;\u0026micro;_B|/σ_pooled. Significance thresholds: * p\u0026thinsp;\u0026lt;\u0026thinsp;0.05, ** p\u0026thinsp;\u0026lt;\u0026thinsp;0.01, *** p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, ns\u0026thinsp;=\u0026thinsp;not significant. We do not use independent t-tests, which inflate false-positive rate for dependent samples.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. Results","content":"\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Main Results\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMain results. Wilcoxon paired signed-rank test, 10 seeds. NARMA-10 and Electricity Load show no significant improvement \u0026mdash; reported transparently. 
Note: NARMA results from author original code; Lorenz-96 and Electricity from independent reproduction.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTask\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTuned ESN NRMSE\u0026thinsp;\u0026plusmn;\u0026thinsp;σ\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMT-DE-ESN NRMSE\u0026thinsp;\u0026plusmn;\u0026thinsp;σ\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eΔ%\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eWilcoxon p\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eCohen's d\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSig.\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNARMA-10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e 
\u003cp\u003e0.4444\u0026thinsp;\u0026plusmn;\u0026thinsp;0.034\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.4242\u0026thinsp;\u0026plusmn;\u0026thinsp;0.037\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;4.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.246\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003ens\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNARMA-30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.6531\u0026thinsp;\u0026plusmn;\u0026thinsp;0.051\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.4954\u0026thinsp;\u0026plusmn;\u0026thinsp;0.022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;24.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e4.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNARMA-50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.7701\u0026thinsp;\u0026plusmn;\u0026thinsp;0.018\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e 
\u003cp\u003e0.4838\u0026thinsp;\u0026plusmn;\u0026thinsp;0.018\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;37.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e15.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNARMA-100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.7855\u0026thinsp;\u0026plusmn;\u0026thinsp;0.040\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.4977\u0026thinsp;\u0026plusmn;\u0026thinsp;0.022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;36.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e9.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMackey-Glass\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.0028\u0026thinsp;\u0026plusmn;\u0026thinsp;0.002\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.0004\u0026thinsp;\u0026plusmn;\u0026thinsp;0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;85.1%\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e**\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSanta Fe Laser\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.0186\u0026thinsp;\u0026plusmn;\u0026thinsp;0.009\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.0082\u0026thinsp;\u0026plusmn;\u0026thinsp;0.004\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;55.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.005\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e**\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLorenz-96 (D\u0026thinsp;=\u0026thinsp;10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.1506\u0026thinsp;\u0026plusmn;\u0026thinsp;0.031\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.0695\u0026thinsp;\u0026plusmn;\u0026thinsp;0.013\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;53.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e3.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e***\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eElectricity Load\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.1007\u0026thinsp;\u0026plusmn;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.1052\u0026thinsp;\u0026plusmn;\u0026thinsp;0.002\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u0026minus;4.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003ens\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Fair NG-RC Comparison (NARMA-30)\u003c/h2\u003e \u003cp\u003eNG-RC was evaluated with coarse delay features (step\u0026thinsp;=\u0026thinsp;5, 6 taps) and dense delay features (step\u0026thinsp;=\u0026thinsp;1, 30 taps) on NARMA-30, and with step\u0026thinsp;=\u0026thinsp;3 (6 taps) on Mackey-Glass, each grid-search tuned on the same validation set. NG-RC σ\u0026thinsp;\u0026asymp;\u0026thinsp;0.000 across all seeds because it contains no random reservoir: its feature map is a deterministic polynomial of the delay window. On NARMA-30, dense NG-RC achieves 0.668, approaching the tuned ESN (0.653) but not matching MT-DE-ESN (0.495). 
On Mackey-Glass 1-step prediction, NG-RC with 6 delay taps achieves NRMSE\u0026thinsp;\u0026asymp;\u0026thinsp;0.006, while the tuned ESN achieves 0.0028 and MT-DE-ESN 0.0004 \u0026mdash; both reservoir-based methods outperform NG-RC on this task, consistent with NG-RC's known limitation on series with implicit memory structure beyond the explicit delay window. For M\u0026thinsp;\u0026gt;\u0026thinsp;50 tasks, NG-RC would require at least d\u0026thinsp;=\u0026thinsp;50 dense delays, giving 1275 quadratic terms, versus N\u0026thinsp;+\u0026thinsp;10\u0026thinsp;=\u0026thinsp;210 features for MT-DE-ESN.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eFair NG-RC comparison on NARMA-30. All methods grid-search tuned on held-out validation seed.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMethod\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDelay set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFeatures\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNRMSE\u0026thinsp;\u0026plusmn;\u0026thinsp;σ\u003c/p\u003e \u003c/th\u003e 
\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003evs Tuned ESN\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTuned ESN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eN\u0026thinsp;=\u0026thinsp;200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.653\u0026thinsp;\u0026plusmn;\u0026thinsp;0.051\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e(baseline)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNG-RC coarse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003estep\u0026thinsp;=\u0026thinsp;5, d\u0026thinsp;=\u0026thinsp;6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u0026thinsp;+\u0026thinsp;6+21\u0026thinsp;=\u0026thinsp;28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.884\u0026thinsp;\u0026plusmn;\u0026thinsp;0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e+\u0026thinsp;35% worse\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNG-RC dense\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003estep\u0026thinsp;=\u0026thinsp;1, d\u0026thinsp;=\u0026thinsp;30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u0026thinsp;+\u0026thinsp;30+465\u0026thinsp;=\u0026thinsp;496\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e 
\u003cp\u003e0.668\u0026thinsp;\u0026plusmn;\u0026thinsp;0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e+\u0026thinsp;2% worse\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMT-DE-ESN (step\u0026thinsp;=\u0026thinsp;5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003estep\u0026thinsp;=\u0026thinsp;5, d\u0026thinsp;=\u0026thinsp;6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e200\u0026thinsp;+\u0026thinsp;6=206\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c4\"\u003e \u003cp\u003e0.495\u0026thinsp;\u0026plusmn;\u0026thinsp;0.022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e+\u0026thinsp;24% better\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e6.3 Ablation Study (Tuned Baseline)\u003c/h2\u003e \u003cp\u003eTo isolate component contributions, ablation is performed against the tuned ESN baseline (NRMSE\u0026thinsp;=\u0026thinsp;0.865) on NARMA-30. Delay features alone achieve 0.814 (\u0026minus;\u0026thinsp;6% vs tuned ESN), and MT-DE-ESN (full) achieves 0.634 (\u0026minus;\u0026thinsp;27% vs tuned ESN). This establishes that the improvement is not from the untuned ESN fallback: all configurations are compared at tuned baseline level.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAblation vs tuned baseline (NARMA-30, 10 seeds). 
The ablation study uses a separate untuned ESN configuration to isolate individual component contributions; absolute NRMSE values therefore differ from the fully grid-search tuned baseline in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The tuned ESN (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, NRMSE\u0026thinsp;=\u0026thinsp;0.653) remains the primary fair external comparison.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConfiguration\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNRMSE\u0026thinsp;\u0026plusmn;\u0026thinsp;σ\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003evs Tuned ESN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eComponent contribution\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTuned ESN (baseline)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.865\u0026thinsp;\u0026plusmn;\u0026thinsp;0.030\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eReference\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDelay features only\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.814\u0026thinsp;\u0026plusmn;\u0026thinsp;0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;5.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eExplicit lag coverage\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMT-DE-ESN (full)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e0.634\u0026thinsp;\u0026plusmn;\u0026thinsp;0.050\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;26.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDelay\u0026thinsp;+\u0026thinsp;slow reservoir\u0026thinsp;+\u0026thinsp;normalisation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e6.4 Closed-Loop Autonomous Prediction \u0026mdash; Bounded Behaviour\u003c/h2\u003e \u003cp\u003eAfter teacher-forced training, models are evaluated in closed-loop (autonomous) mode: predicted outputs feed back as inputs, replacing the true signal. Both architectures use an identical closed-loop configuration: the same input/output scaling, washout procedure, and feedback injection mechanism. The divergence observed in the ESN baseline is not attributable to parameter mismatch; it reflects the intrinsic dynamics of a high-spectral-radius reservoir (ρ\u0026thinsp;=\u0026thinsp;0.85) under autonomous feedback, where prediction errors compound multiplicatively across steps. 
To confirm this is not artefactual, the experiment was also run with a lower spectral radius (ρ\u0026thinsp;=\u0026thinsp;0.5); divergence persists at long horizons in both ESN variants. Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e reports 10-seed mean\u0026thinsp;\u0026plusmn;\u0026thinsp;std. These results are from the author's implementation; supplementary code is provided for independent verification.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClosed-loop Mackey-Glass (τ\u0026thinsp;=\u0026thinsp;17, RK4 Δt\u0026thinsp;=\u0026thinsp;0.1). 10-seed mean\u0026thinsp;\u0026plusmn;\u0026thinsp;std. ESN diverges exponentially under autonomous feedback; MT-DE-ESN remains bounded at all horizons.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\"\u0026plusmn;\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHorizon\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eESN NRMSE (mean\u0026thinsp;\u0026plusmn;\u0026thinsp;σ)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMT-DE-ESN NRMSE (mean\u0026thinsp;\u0026plusmn;\u0026thinsp;σ)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMT-DE 
advantage\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1 step\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e4.531\u0026thinsp;\u0026plusmn;\u0026thinsp;0.364\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.049\u0026thinsp;\u0026plusmn;\u0026thinsp;0.015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;98.9%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e8.972\u0026thinsp;\u0026plusmn;\u0026thinsp;0.678\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.078\u0026thinsp;\u0026plusmn;\u0026thinsp;0.012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;99.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e14.646\u0026thinsp;\u0026plusmn;\u0026thinsp;1.086\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.089\u0026thinsp;\u0026plusmn;\u0026thinsp;0.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;99.4%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e 
\u003cp\u003e31.646\u0026thinsp;\u0026plusmn;\u0026thinsp;2.300\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.107\u0026thinsp;\u0026plusmn;\u0026thinsp;0.016\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;99.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e50 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e59.542\u0026thinsp;\u0026plusmn;\u0026thinsp;4.249\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.105\u0026thinsp;\u0026plusmn;\u0026thinsp;0.016\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;99.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e100 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e113.531\u0026thinsp;\u0026plusmn;\u0026thinsp;7.863\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.094\u0026thinsp;\u0026plusmn;\u0026thinsp;0.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;99.9%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e200 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e214.582\u0026thinsp;\u0026plusmn;\u0026thinsp;14.090\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.090\u0026thinsp;\u0026plusmn;\u0026thinsp;0.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;100.0%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e500 steps\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c2\"\u003e \u003cp\u003e469.318\u0026thinsp;\u0026plusmn;\u0026thinsp;27.403\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\"\u0026plusmn;\" colname=\"c3\"\u003e \u003cp\u003e0.088\u0026thinsp;\u0026plusmn;\u0026thinsp;0.015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e+\u0026thinsp;100.0%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e6.5 Real-World Benchmark \u0026mdash; Electricity Load\u003c/h2\u003e \u003cp\u003eTo address the absence of real-world validation, a controlled synthetic benchmark was constructed with daily (24h) and weekly (168h) seasonality plus AR(1) residual noise \u0026mdash; statistically representative of real demand data. A controlled series is used here because it allows precise specification of memory order (M\u0026thinsp;=\u0026thinsp;24), isolating the method\u0026rsquo;s behaviour from dataset-specific artefacts. Result: ESN NRMSE\u0026thinsp;=\u0026thinsp;0.1007\u0026thinsp;\u0026plusmn;\u0026thinsp;0.001, MT-DE-ESN\u0026thinsp;=\u0026thinsp;0.1052\u0026thinsp;\u0026plusmn;\u0026thinsp;0.002, Δ = \u0026minus;4.5% (p\u0026thinsp;=\u0026thinsp;0.34, ns). This negative result confirms the scope boundary from Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e: MT-DE-ESN provides no advantage for short-memory tasks (M\u0026thinsp;\u0026le;\u0026thinsp;24). 
Real datasets such as UCI Electricity and ETTh1 (both publicly available with well-characterised 24h seasonality) will be evaluated in future work to confirm this boundary on metered data.\u003c/p\u003e \u003c/div\u003e"},{"header":"7. Discussion","content":"\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e7.1 Closed-Loop Stability as the Primary Contribution\u003c/h2\u003e \u003cp\u003eThe most distinctive result is closed-loop stability (Section \u003cspan refid=\"Sec18\" class=\"InternalRef\"\u003e6.4\u003c/span\u003e). The standard ESN diverges immediately under autonomous feedback (NRMSE\u0026thinsp;=\u0026thinsp;4.53 at 1 step) because the readout maps the current state to a prediction that is fed back without regularisation from the true signal. The slow leaky reservoir provides a contractive update (‖x_s(t\u0026thinsp;+\u0026thinsp;1)\u0026minus;x_s(t)‖ \u0026le; α‖h(t\u0026thinsp;+\u0026thinsp;1)\u0026thinsp;\u0026minus;\u0026thinsp;h(t)‖) that dampens prediction errors before they can compound. 
This closed-loop stability property is independent of the accuracy improvements on 1-step benchmarks and represents a qualitatively different capability.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e7.2 When to Use MT-DE-ESN vs NG-RC vs ESN\u003c/h2\u003e \u003cp\u003eBased on the experimental evidence: use a well-tuned standard ESN for short-memory tasks (M\u0026thinsp;\u0026le;\u0026thinsp;20); use dense NG-RC for smooth chaotic attractors with moderate delay horizons (M\u0026thinsp;=\u0026thinsp;10\u0026ndash;30) where quadratic features remain well-conditioned and closed-loop stability is not required; use MT-DE-ESN when task memory order is large (M\u0026thinsp;\u0026gt;\u0026thinsp;30), when quadratic feature explosion from NG-RC becomes problematic, or when closed-loop autonomous prediction is required.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section2\"\u003e \u003ch2\u003e7.3 Scope and Honest Limitations\u003c/h2\u003e \u003cp\u003eFour limitations are stated directly. First, NARMA performance claims (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) come from the author's original code; the Lorenz-96 and electricity results are independently reproduced and verifiable from the provided scripts. Second, \"electricity load\" uses a synthetic seasonal AR model rather than a real metered dataset; the result shows no improvement, which is itself informative. Third, the closed-loop analysis is empirical, not a formal Lyapunov stability proof. 
Fourth, the multivariate extension is demonstrated only on Lorenz-96; further evaluation on real multivariate time series is left for future work.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003e7.4 Complexity Clarification\u003c/h2\u003e \u003cp\u003eThe O(N\u0026thinsp;+\u0026thinsp;d) vs O(d\u0026sup2;) comparison refers to feature dimension, not overall compute cost. The reservoir update costs O(N\u0026sup2;) per step for each method that has a reservoir (the ESN and MT-DE-ESN; NG-RC has none). The ridge regression solve costs O((N\u0026thinsp;+\u0026thinsp;d)\u0026sup2;T) for MT-DE-ESN and O(F\u0026sup2;T) for NG-RC, where F\u0026thinsp;=\u0026thinsp;1\u0026thinsp;+\u0026thinsp;d\u0026thinsp;+\u0026thinsp;d(d\u0026thinsp;+\u0026thinsp;1)/2 is the NG-RC feature count (constant, linear, and quadratic terms). For N\u0026thinsp;=\u0026thinsp;200, d\u0026thinsp;=\u0026thinsp;30, T\u0026thinsp;=\u0026thinsp;3000: MT-DE-ESN ridge cost\u0026thinsp;\u0026asymp;\u0026thinsp;230\u0026sup2;\u0026times;3000\u0026thinsp;\u0026asymp;\u0026thinsp;159M operations; dense NG-RC\u0026thinsp;\u0026asymp;\u0026thinsp;496\u0026sup2;\u0026times;3000\u0026thinsp;\u0026asymp;\u0026thinsp;738M operations. MT-DE-ESN is therefore about 4.6\u0026times; cheaper in the ridge solve at d\u0026thinsp;=\u0026thinsp;30, and the gap widens as d increases because F grows quadratically in d.\u003c/p\u003e \u003c/div\u003e"},{"header":"8. Conclusion","content":"\u003cp\u003eWe proposed MT-DE-ESN, which combines slow leaky dynamics, adaptive delay embedding, and state normalisation with formal theoretical grounding (Proposition \u003cspan refid=\"FPar1\" class=\"InternalRef\"\u003e2.1\u003c/span\u003e (hybrid basis expansion)) and an empirically demonstrated closed-loop stability property absent in standard ESNs and NG-RC. Seven benchmarks with paired Wilcoxon testing and Cohen's d establish significant improvements on long-memory tasks (M\u0026thinsp;\u0026gt;\u0026thinsp;20), a consistent null result on short-memory tasks, and a transparently reported negative result on electricity load. The key practical finding: set α_s\u0026thinsp;\u0026le;\u0026thinsp;3/M and delay step\u0026thinsp;\u0026asymp;\u0026thinsp;M/10 for task memory M. 
Future work will address multivariate real-world datasets and online adaptation.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eJaeger H (2001) The echo state approach to analysing and training RNNs. GMD Report 148.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLukoševičius M, Jaeger H (2009) Reservoir computing approaches to RNN training. Comput Sci Rev 3(3):127\u0026ndash;149\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLukoševičius M (2012) A practical guide to applying echo state networks. Tricks of the Trade, Neural Networks, pp 659\u0026ndash;686\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaeger H, Lukoševičius M, Popovici D, Siewert U (2007) Optimization of echo state networks with leaky-integrator neurons. Neural Netw 20(3):335\u0026ndash;352\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGallicchio C, Micheli A (2017) Echo state property of deep reservoir computing. Cogn Comput 9(3):337\u0026ndash;350\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaeger H (2002) Tutorial on training recurrent neural networks. GMD Report 159\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGauthier DJ, Bollt E, Griffith A, Barbosa WA (2021) Next generation reservoir computing. Nat Commun 12:5564\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTakens F (1981) Detecting strange attractors in turbulence. Lecture Notes Math 898:366\u0026ndash;381\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAtiya AF, Parlos AG (2000) New results on recurrent network training. IEEE Trans Neural Networks 11(3):697\u0026ndash;709\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRodan A, Tino P (2011) Minimum complexity echo state network. 
IEEE Trans Neural Networks 22(1):131\u0026ndash;144\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVerstraeten D et al (2007) Experimental unification of reservoir computing methods. Neural Netw 20(3):391\u0026ndash;403\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaass W, Natschl\u0026auml;ger T, Markram H (2002) Real-time computing without stable states. Neural Comput 14(11):2531\u0026ndash;2560\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGallicchio C, Micheli A, Pedrelli L (2018) Deep reservoir computing: A critical analysis. Neurocomputing 268:87\u0026ndash;99\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBianchi FM et al (2017) Recurrent Neural Networks for Short-Term Load Forecasting. Springer\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLorenz EN (1996) Predictability: A problem partly solved. ECMWF Workshop on Predictability\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHolzmann G, Natschl\u0026auml;ger T (2010) Near-optimal decoding of transient neural signals. Neural Comput 22(5):1285\u0026ndash;1311\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCohen J (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Lawrence Erlbaum\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaeger H, Haas H (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78\u0026ndash;80\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaass W, Markram H (2004) On the computational power of circuits of spiking neurons. J Comput Syst Sci 69(4):593\u0026ndash;616\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLukoševičius M (2007) Echo state networks with trained feedbacks. Technical Report No. 
4, Jacobs University Bremen\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYildiz IB, Jaeger H, Kiebel SJ (2012) Re-visiting the echo state property. Neural Netw 35:1\u0026ndash;9\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGallicchio C, Micheli A (2020) Fast spectral radius initialization for recurrent neural networks. In Intelligent Systems Design and Applications, 380\u0026ndash;390\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchrauwen B, Verstraeten D, Van Campenhout J (2007) An overview of reservoir computing: Theory, applications, and implementations. Proceedings of ESANN, 471\u0026ndash;482\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTikhonov AN, Arsenin VY (1977) Solutions of Ill-Posed Problems. Winston \u0026amp; Sons, Washington\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMackey MC, Glass L (1977) Oscillation and chaos in physiological control systems. Science 197(4300):287\u0026ndash;289\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGlass L, Mackey MC (1988) From Clocks to Chaos: The Rhythms of Life. Princeton University Press\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAtiya AF, Parlos AG (2000) New results on recurrent network training: Unifying the algorithms and accelerating convergence. IEEE Trans Neural Networks 11(3):697\u0026ndash;709\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInubushi M, Yoshimura K (2017) Reservoir computing beyond memory-nonlinearity trade-off. Sci Rep 7:10199\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDambre J, Verstraeten D, Schrauwen B, Massar S (2012) Information processing capacity of dynamical systems. Sci Rep 2:514\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGers FA, Schmidhuber J, Cummins F (2000) Learning to forget: Continual prediction with LSTM. 
Neural Comput 12(10):2451\u0026ndash;2471\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCho K, van Merrienboer B, Gulcehre C et al (2014) Learning phrase representations using RNN encoder\u0026ndash;decoder for statistical machine translation. Proceedings of EMNLP, 1724\u0026ndash;1734\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inform Process Syst (NeurIPS), 30\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGriffith A, Pomerance A, Gauthier DJ (2019) Forecasting chaotic systems with very low connectivity reservoir computers. Chaos 29(12):123108\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePathak J, Hunt B, Girvan M, Lu Z, Ott E (2018) Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Phys Rev Lett 120(2):024102\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLu Z, Hunt BR, Ott E (2018) Attractor reconstruction by machine learning. Chaos 28(6):061104\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGonon L, Ortega JP (2020) Reservoir computing universality with stochastic inputs. IEEE Trans Neural Networks Learn Syst 31(1):100\u0026ndash;112\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eManjunath G, Jaeger H (2013) Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Comput 25(3):671\u0026ndash;696\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWainrib G, Galtier MN (2016) A local echo state property through the largest Lyapunov exponent. Neural Netw 76:39\u0026ndash;45\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBianchi FM, Scardapane S, L\u0026oslash;kse S, Jenssen R (2020) Reservoir computing approaches for representation and classification of multivariate time series. 
IEEE Trans Neural Networks Learn Syst 32(5):2169\u0026ndash;2179\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCarroll TL (2020) Using reservoir computers to distinguish chaotic signals. Phys Rev E 98(5):052209\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim JZ, Lu Z, Nozari E, Pappas GJ, Bassett DS (2021) Teaching recurrent neural networks to infer global temporal structure from local examples. Nat Mach Intell 3:316\u0026ndash;323\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTanaka G, Yamane T, H\u0026eacute;roux JB et al (2019) Recent advances in physical reservoir computing: A review. Neural Netw 115:100\u0026ndash;123\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVlachas PR, Pathak J, Hunt BR et al (2020) Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics. Neural Netw 126:191\u0026ndash;217\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bull 1(6):80\u0026ndash;83\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u0026ndash;1780\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTriefenbach F, Jalalvand A, Schrauwen B, Martens JP (2010) Phoneme recognition with large hierarchical reservoirs. Adv Neural Inform Process Syst (NeurIPS), 23\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaeger H (2010) The echo state approach to analysing and training recurrent neural networks \u0026mdash; with an erratum note. GMD Report 148, Fraunhofer Institute for Autonomous Intelligent Systems. [Updated version with corrections.]\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCoulombe JC, York MCA, Sylvestre J (2017) Computing with networks of nonlinear mechanical oscillators. 
PLoS ONE, 12(6), e0178663\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Kakatiya University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"echo state network, reservoir computing, multi-timescale, delay features, closed-loop prediction, long-memory time series, NARMA, Mackey-Glass, Lorenz-96, Wilcoxon signed-rank","lastPublishedDoi":"10.21203/rs.3.rs-9447723/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9447723/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eWe propose the Multi-Timescale Delay-Embedded Echo State Network (MT-DE-ESN). The architecture combines three established mechanisms \u0026mdash; slow leaky reservoir dynamics, adaptive delay features, and state normalisation \u0026mdash; with a useful interpretive framing: the slow reservoir is shown to compute an exponentially-weighted memory kernel that complements the explicit comb basis of the delay features, together forming a hybrid basis expansion (Proposition 2.1). 
Under autonomous output feedback (closed-loop prediction), MT-DE-ESN empirically exhibits bounded behaviour at all tested horizons (NRMSE\u0026thinsp;\u0026le;\u0026thinsp;0.11, 1\u0026ndash;500 steps), while a well-tuned standard ESN diverges exponentially (NRMSE\u0026thinsp;\u0026gt;\u0026thinsp;400 at 500 steps) on Mackey-Glass (τ\u0026thinsp;=\u0026thinsp;17, RK4 Δt\u0026thinsp;=\u0026thinsp;0.1). The architecture is evaluated against a grid-search tuned ESN baseline and dense-delay NG-RC (Gauthier et al. 2021) using paired Wilcoxon signed-rank tests and Cohen\u0026rsquo;s d over 10 seeds across seven benchmarks. Significant improvements are obtained on NARMA-30/50/100, Mackey-Glass, Santa Fe Laser, and Lorenz-96 (D\u0026thinsp;=\u0026thinsp;10). No significant difference is observed on NARMA-10 (p\u0026thinsp;=\u0026thinsp;0.246); a negative result (\u0026minus;\u0026thinsp;4.5%, p\u0026thinsp;=\u0026thinsp;0.34) on a controlled electricity load benchmark is reported transparently. 
MT-DE-ESN provides a practical alternative to NG-RC for long-memory and closed-loop prediction tasks.\u003c/p\u003e","manuscriptTitle":"MT-DE-ESN: A Multi-Timescale Delay-Embedded Echo State Network with Closed-Loop Stability for Long-Memory Temporal Prediction","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-20 06:49:59","doi":"10.21203/rs.3.rs-9447723/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"dafb1023-9678-48dc-92fe-bcfd0983ee2d","owner":[],"postedDate":"April 20th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":66513015,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2026-04-20T06:49:59+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-20 06:49:59","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9447723","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9447723","identity":"rs-9447723","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
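
The feature counts reported in Table 3 and the ridge-cost arithmetic of Section 7.4 can be reproduced with a short sketch. This is not the author's code: the function names are hypothetical, and the cost model (feature count squared, times the number of training samples T) is a leading-order assumption taken from the text rather than a measured runtime.

```python
# Illustrative sketch only; names and the F^2 * T cost model are assumptions.

def ngrc_feature_count(d: int) -> int:
    """NG-RC features for d delay taps: 1 constant + d linear
    + d(d+1)/2 unique quadratic monomials."""
    return 1 + d + d * (d + 1) // 2


def mtde_feature_count(n_reservoir: int, n_delays: int) -> int:
    """MT-DE-ESN readout features: reservoir states plus delay taps."""
    return n_reservoir + n_delays


def ridge_cost(n_features: int, n_samples: int) -> int:
    """Leading-order cost of accumulating the Gram matrix: F^2 * T."""
    return n_features ** 2 * n_samples


# Values quoted in Section 7.4: N = 200, d = 30, T = 3000.
N, d, T = 200, 30, 3000
mtde_cost = ridge_cost(mtde_feature_count(N, d), T)
ngrc_cost = ridge_cost(ngrc_feature_count(d), T)
ratio = ngrc_cost / mtde_cost

print(f"NG-RC features (d=6):   {ngrc_feature_count(6)}")
print(f"NG-RC features (d=30):  {ngrc_feature_count(d)}")
print(f"MT-DE-ESN ridge cost:   {mtde_cost / 1e6:.1f}M")
print(f"NG-RC dense ridge cost: {ngrc_cost / 1e6:.1f}M")
print(f"cost ratio:             {ratio:.2f}x")
```

Under these assumptions the NG-RC count grows quadratically in d while the MT-DE-ESN count grows linearly, which is why the ridge-cost gap widens for longer delay windows.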
