Breaking the size limit: efficient sampling of large-scale transition pathways and intermediate conformations in sub-mesoscopic protein complexes

preprint OA: closed CC-BY-4.0
Laura Orellana, Domenico Scaramozzino, Byung Ho Lee

Research Square preprint. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6504036/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 02 Mar, 2026 in Nature Communications. Version 1 posted.

Abstract

Protein conformational changes are the cornerstone of biological function. While conformers captured experimentally represent meta-stable states, the pathways connecting them have been elusive for experiments and simulations alike. 
Nowadays, cryogenic Electron Microscopy is providing rich structural data on proteins trapped in different states for increasingly large systems, but these are out of scope for current computational methods, which usually exhibit an N² dependence on size. Based on our previous eBDIMS algorithm, here we present eBDIMS2, an extremely optimized version with quasi-linear size dependence, able to simulate on a desktop computer exceptionally complex transitions for megadalton protein assemblies, like the rotary motion of ATP synthases. Not only do eBDIMS2 pathways spontaneously visit experimental intermediates, but they also overlap with microsecond Molecular Dynamics simulations requiring extensive supercomputing resources. By integrating Elastic Networks with Brownian Dynamics, eBDIMS2 allows an unprecedented exploration of conformational changes in previously inaccessible sub-mesoscopic systems.

Biological sciences/Computational biology and bioinformatics; Biological sciences/Biophysics/Computational biophysics; Biological sciences/Structural biology/Molecular modelling

INTRODUCTION

Protein dynamics is the fundamental link between structure and function 1 – 3 , refined and conserved through evolution 4 . In response to signals such as electrochemical gradients or ligand binding, proteins transition between different states or conformations (open/closed, active/inactive, etc.), enabling biological regulation. Understanding these processes requires bridging static experimental snapshots to uncover transition pathways for conformational changes 5 , often involving transient intermediates critical to function 6 . Despite advances in hardware and software 7 , sampling such transitions with Molecular Dynamics (MD) 8 , the gold standard for biomolecular simulations, remains challenging for the large time and length scales involved 3 . 
To increase the exploration of the conformational space, numerous methods have been proposed 9 . While some rely on tricks to enhance the atomistic sampling 10 , 11 , others employ coarse-grained (CG) techniques to simplify the physical modeling 12 . However, both approaches require complicated setups and often significant computing resources. As a faster alternative, minimalist CG-methods like the Elastic Network Models (ENMs) 13 can predict conformation changes with remarkable precision 14 , 15 , leading to numerous variants 16 – 22 to model protein dynamics 4 , 23 , flexibility 24 , 25 or guide transition pathways. However, their lack of rigorous CG-parameterization limits transferability and general applicability. Building on our carefully database-parameterized edENM force field 19 , we previously developed eBDIMS 26 (elastic network-driven Brownian Dynamics IMportance Sampling) to track transition paths between protein end-states. Following Levitt’s approach 27 , we validated the biological significance of the predicted pathways using a benchmark of proteins trapped experimentally in multiple states, assessing whether intermediates are visited spontaneously by the algorithm without prior knowledge of them. To move beyond pairwise structure comparisons, we also applied Principal Component Analysis (PCA) 28 to extract key motions from structural ensembles, ensuring that sampled pathways aligned with experimental dynamics. When tested against other path-sampling methods, eBDIMS was the only CG method capable of predicting pathways with the same accuracy as MD or atomistic force-field methods like Climber 27 , but at a fraction of the computational cost 26 . The high quality of the predicted intermediates has enabled seeding of full-atom MD simulations to explore the Free Energy Landscape (FEL) of very subtle allosteric transitions, such as ion channel gating 29 , 30 . 
Providing mechanistic insights across diverse systems, from transcription factors 31 to enzymes or fusion proteins 32 , 33 , eBDIMS has proven to be a general and versatile method for studying protein transitions. Another key application has been in cryogenic Electron Microscopy (cryo-EM) studies 34 , 35 , where eBDIMS results enabled to explain intermediates between trapped conformations 36 . However, in the past years the cryo-EM " resolution revolution " 37 has been dramatically expanding the Protein Data Bank (PDB) 38 with large multi-state protein complexes 39 – 41 , creating challenges for computational methods that often scale quadratically with system size. This has made large-scale transitions in protein assemblies > 300 kDa nearly intractable, even for CG methods like eBDIMS, requiring resources that few labs can afford. For instance, observing the opening of the SARS-CoV-2 spike glycoprotein required over a week of enhanced Weighted Ensemble Simulations (WES) running on 80 GPUs 42 . Similarly, capturing the conformational variability of the GroEL chaperone demanded hundreds of microseconds on the special-purpose supercomputer Anton 43 . In large multimers like GroEL, most CG algorithms fail to progress, and the only method capable of handling such systems, MinActionPath2 44 , was found to produce significant structural distortions (see below). To overcome this, here we present eBDIMS2 (Fig. 1 ), an extension of our previous algorithm which leverages smart cutoff and parallelization schemes to achieve quasi-linear scaling with the system size. This enables efficient CG-simulations of large and complex protein transitions (up to ~ 2 MDa) on a standard computer, with virtually no constraints on system size or motion complexity. To assess the accuracy of the method, we built an updated and comprehensive dataset of large protein ensembles and simulated transitions between all relevant end states. 
Our results demonstrate that eBDIMS2 surpasses all existing path-sampling methods, generating realistic and stereochemically accurate transitions that align with experimental intermediates, MD simulation results, as well as experimental observations. Notably, eBDIMS2 is currently the only method capable of handling such large systems and, especially, transitions of extreme complexity such as the rotary motions of ATP synthases. As cryo-EM data for sub-mesoscopic protein complexes continues to grow, eBDIMS2 will provide a powerful and accessible tool for studying the dynamics of these critical yet underexplored systems, far beyond the reach of other methods.

RESULTS

eBDIMS2 large-scale protein benchmark and pathway validation by PCA

To validate the new eBDIMS2 path-sampling algorithm, we expanded our previous benchmark composed of proteins that have well-characterized intermediates between end-states (e.g., RBP, RNaseIII, and SERCA) 26 by performing an exhaustive search for large and conformationally diverse systems in the PDB. This resulted in a total of 47 large proteins of different stoichiometry and motions (Fig. 2 a), for which “conformationally rich” ensembles were retrieved for robust ensemble-PCA (see below). All complexes are larger than ~ 300 kDa, have at least 3 experimental models available, and Root Mean Square Deviations (RMSDs) between two end states of at least ~ 4Å. This resulted in a collection of 872 PDB structures, mostly from cryo-EM, with ensembles containing from a minimum of 3 to nearly 90 structures, and an average ensemble RMSD over ~ 7Å (Supplementary Tables). The benchmark includes protein assemblies ranging from the ~ 300 kDa SARS-CoV spike glycoprotein to the ~ 2.3 MDa S. cerevisiae Fatty-Acid Synthase (FAS, Fig. 2 a). Most of the conformational changes in these proteins are large-scale and collective (Fig. 
2 b; Supplementary Tables), with transition RMSDs ranging from a minimum of ~ 4Å in the gigantic ryanodine receptor 2 (RyR2, ~ 16k residues), to an astonishing maximum of ~ 30Å for Nf1 (~ 4.3k residues, see below). Upon PCA (see Methods), the first two PCs of such “conformationally-rich” ensembles capture > 70% of the structural variance in all ensembles, and > 90% in ~ 70% of the cases (Supplementary Tables), thus providing powerful Collective Variables (CVs) for system analysis, as shown previously by us 26 , 45 and others 46 , 47 . PC1 and PC2 typically describe global motions, such as hinge-bending, twisting, and breathing modes (Supplementary Material), and as user-independent intrinsic CVs, they facilitate the identification of main conformers (e.g. open/closed, inward/outward, etc.) and their interconnecting pathways. Using PC projections as reference, we identified 124 relevant end states and simulated a total of 191 transition pathways with eBDIMS2 (Supplementary Tables), identifying intermediate conformations in > 30% of PC spaces.

eBDIMS2 outperforms existing algorithms achieving a quasi-linear size-time dependence

For a few full-length proteins, we compared eBDIMS2 to other existing path-sampling algorithms (see Methods). The obtained paths were evaluated in terms of computational speed and smoothness/stability of the sampling in PC projections (Supplementary Material). While for small- to medium-size proteins (< 1k residues) eBDIMS2 performs comparably to existing path-sampling methods, its advantage becomes clear for larger systems (> 300 kDa). For example, the ~ 15Å transition of GroEL 7-mer (~ 400 kDa) can be simulated by eBDIMS2 in ~ 1 hour, reaching an astonishing convergence of ~ 0.6 Å from the target state, while none of the other tested path-sampling methods is able to simulate it in reasonable times (Fig. 2 d). To achieve the same convergence, our previous algorithm 45 takes ~ 7 hours, highlighting a dramatic ~ 6-fold performance increase. 
Moreover, other methods struggle to sample the conformational space, undergoing abrupt changes in sampling direction (seen as zig-zag projections in Fig. 2 d), while eBDIMS and eBDIMS2 are always able to trace smoother and stable pathways (Supplementary Material). Apart from speed and sampling stability, eBDIMS2 is also especially versatile, as it can generate paths between conformers with different numbers of residues, thus facilitating, e.g., reconstruction of alternate conformers from full-length ones (see examples in Supplementary Material). Overall, eBDIMS2 computing times for large systems have a median value of ~ 2.5 hours (Fig. 2 b) and range from a minimum of ~ 19 minutes for DNA-dependent protein kinase catalytic subunit (DNA-PKcs, ~3k residues, ~ 6Å RMSD) to a maximum of ~ 49 hours for the gigantic FAS 12-mer (~ 21k residues, ~ 5Å; transition details in Supplementary Tables). The transitions are always able to reach the target state with high convergence, below ~ 1Å in most cases (Fig. 2 b, orange bars). As expected, we observe a clear correlation between computing times and system sizes, with a Pearson Correlation Coefficient (PCC) of ~ 0.9, showing a quasi-linear size-time dependency (Fig. 2 c, left panel). As a matter of fact, if we exclude the three largest systems in our dataset (the two ryanodine receptors and FAS 12-mer, > 1.5 MDa), the linear size-time correlation is still present, with a correlation of ~ 0.7 (Fig. 2 c, right). This confirms the outstanding performance of eBDIMS2 compared with other path-sampling methods that typically follow an N² dependence with size. We also expected computing times not only to depend on the protein size, but also on the extent and complexity of the transition, as can be seen from outliers in the size-time relationship (Fig. 2 c, right). Considering two similarly sized systems, we expected that the larger the extent of a transition, i.e., the larger the RMSD between the end states, the larger the computing time. 
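As a complement to the correlation-based scaling analysis described above, the following minimal sketch shows one way to estimate a scaling exponent from size-time pairs by fitting a straight line in log-log space. It is not the authors' analysis: the function names `scaling_exponent` and `pearson`, and all numerical values, are hypothetical and purely synthetic (they are not benchmark data).

```python
import numpy as np

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) vs. log(size).

    A slope near 1 suggests linear scaling, near 2 quadratic scaling.
    """
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return float(slope)

def pearson(x, y):
    """Pearson correlation coefficient between two 1D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic, illustrative data only (NOT values from the benchmark).
sizes = np.array([3000, 5000, 8000, 12000, 16000, 21000], dtype=float)
linear_times = 2.0e-3 * sizes         # hypothetical O(N) cost model
quadratic_times = 2.0e-7 * sizes**2   # hypothetical O(N^2) cost model

print("exponent, linear model:   ", scaling_exponent(sizes, linear_times))
print("exponent, quadratic model:", scaling_exponent(sizes, quadratic_times))
print("PCC, linear model:        ", pearson(sizes, linear_times))
```

A Pearson coefficient measures how well a straight line describes the data, whereas the log-log slope estimates the exponent directly, so the two quantities can be read side by side when comparing algorithms.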
This is confirmed by a positive PCC (~ 0.5) between the size-normalized computing times vs. RMSD values (Supplementary Material). On the other hand, size-normalized computing times also display a slightly negative correlation versus collectivity degrees (~ -0.2). This suggests that eBDIMS2 is faster for global-collective conformational changes, likely arising from the tendency of ENMs to model large rigid-body motions better than localized ones 3 , 14 , 23 . As an example of two similarly sized systems but with different conformational changes, polyprotein P1234 and Nf1 (both ~ 500 kDa, Supplementary Tables) undergo completely distinct motions, thus leading to different outcomes in terms of computing effort (Fig. 2 c, right). P1234 undergoes a low-complexity uniform expansion-compression of all protomers (breathing mode), relatively medium-scale (~ 4Å RMSD) and highly collective ( κ ~ 0.9), which can be simulated by eBDIMS2 in just ~ 30 mins. On the other hand, Nf1 experiences dramatic conformational changes (> 23 Å RMSD), but rather localized compared to the overall Nf1 structure ( κ ~ 0.2), where the GTPase-activating protein-related domains (GRDs) undergo large-scale roto-translations to facilitate the interaction with its partner Ras 48 , 49 . As a result, this much more challenging transition can require up to several hours to compute (Supplementary Tables and Supplementary movie 1).

eBDIMS2 generates realistic pathways for challenging transitions unfeasible for other methods

As we have seen, complex transitions not only require longer simulation time (see Nf1 and ATP synthases outliers in Fig. 2 c), but they also tend to strain simulating algorithms, especially at intermediate points, generating stereochemical distortions 50 . 
To assess the overall stereochemical quality and ability to sample accurate transitions, we first computed distances between consecutive Cα atoms for all 191 eBDIMS2 mid-point intermediates and compared them to the 124 experimental end-states and all 872 PDB models in our ensembles. In all cases, distance distributions are centered at the known value of ~ 3.8Å (Supplementary Material), with eBDIMS2 frames displaying a slightly larger standard deviation due to CG relaxation of the backbone. Still, outliers in the eBDIMS2 distribution coincide with those already present in experimental models (Supplementary Material). Then, we took a closer look at three large systems (~ 500 kDa) undergoing extremely complex and large-scale conformational changes: M. smegmatis ATP synthase, Nf1 isoform 2, and α-macroglobulin (A2M) (Supplementary movies 1–3), two of which stand out as outliers in the size-time relationships in our benchmark (Fig. 2 c). We also ran their transitions with the MinActionPath2 webserver (see Methods) 44 . While MinActionPath2 was able to compute these transitions remarkably fast (< 30 minutes), the intermediate conformations exhibit extreme distortions of the backbone stereochemistry (Fig. 3 a). In experimental models and eBDIMS2 intermediates, distances between consecutive Cα atoms only show sub-Å deviations from the theoretical value of 3.8Å. Conversely, values as low as ~ 0.3 Å and as high as ~ 34 Å are observed in MinActionPath2 transition points, making these intermediates completely unphysical (details in the Supplementary Material). The rotary motions of ATP synthases are a clear example of exceptional transition complexity. All ATP synthases in our benchmark dataset undergo similar conformational changes, where the rigid rotation of the F 0 rotor is coupled with the opening-closing motions of α- and β -subunits during ATP synthesis. 
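The consecutive Cα distance check used above to assess backbone stereochemistry can be sketched in a few lines. This is an illustrative sketch rather than the authors' script: the function names, the tolerance `tol`, and the toy coordinates are assumptions, and the sketch treats its input as a single unbroken chain (multi-chain complexes or chains with gaps would need to be processed segment by segment).

```python
import numpy as np

IDEAL_CA_CA = 3.8  # Å, reference distance between consecutive Cα atoms

def consecutive_ca_distances(coords):
    """Distances between consecutive Cα atoms for an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[1:] - coords[:-1], axis=1)

def flag_outliers(coords, tol=1.0):
    """Indices i where |d(i, i+1) - 3.8 Å| exceeds tol (hypothetical threshold)."""
    d = consecutive_ca_distances(coords)
    return np.flatnonzero(np.abs(d - IDEAL_CA_CA) > tol)

# Toy single chain: four beads spaced 3.8 Å apart along x, plus a fifth
# bead placed 34 Å away from the fourth to mimic a stretched virtual bond.
chain = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.6, 0.0, 0.0],
    [11.4, 0.0, 0.0],
    [45.4, 0.0, 0.0],
])

print(consecutive_ca_distances(chain))
print(flag_outliers(chain))
```

Running the same function over every frame of a pathway and pooling the distances gives the kind of distribution that can be compared against experimental models.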
By applying rigid constraints to the chains of the F 0 rotor (Supplementary Material), eBDIMS2 is able to guarantee the rotor’s rigidity and generate realistic intermediates along the whole synthesis cycle (Fig. 3 ). The eBDIMS2 transition cycle (Supplementary movie 3) allows us to recapitulate the experimentally known, alternated and coordinated opening/closing motions of the three catalytic β -subunits 51 and in particular how these are coupled to the rotation of the transmembrane domain (Fig. 3 b-c), which was also observed with ns-MD simulations 52 . We also tested the suitability of the eBDIMS2-generated intermediates as seeds for atomistic MD, after all-atom reconstruction via cg2all 53 . For instance, we computed MolProbity 54 quality scores for GroEL 7-mer experimental models as well as for the eBDIMS2 mid-point intermediate along the opening transition (Supplementary Tables). The reconstructed intermediate is found to exhibit good rotameric states, but sub-optimal Ramachandran ϕ - ψ values. However, these are generally fixed after simple energy minimizations or short (1 ns) MD runs (Supplementary Material), thus demonstrating that eBDIMS2 conformers can generate stereochemically realistic models suitable for further MD simulations and atomistic analysis.

eBDIMS2 pathways overlap with experimental motions and with enhanced and µs-long MD simulations with minimal computational cost

To further evaluate the biological significance of eBDIMS2 pathways, we assessed whether they spontaneously approach experimental intermediates and MD sampling, as shown previously for smaller systems 26 , 27 . Identification of potential intermediate states is facilitated by projections of structural ensembles on the low-dimensionality spaces defined by the PCs describing the main ensemble motions 26 . 
We selected six systems for deeper study, where intermediate identification is supported by PC projections as well as the literature (Supplementary Material): DNA-PKcs, the two spike glycoproteins from SARS-CoV and SARS-CoV-2, ATP-citrate synthase (ACLY), H. sapiens T-complex chaperonin 16-mer (TRiC), and inositol 1,4,5-trisphosphate receptor type 3 (ITPR3). Although no information on intermediates is fed to the algorithm, eBDIMS2 is always able to approach the existing experimental intermediates with RMSDs as low as ~ 3–4Å (Figs. 4 a and 5 a, right panels; Supplementary Material), even for these large cryo-EM proteins. We also performed biased simulations with targeted MD (TMD) 55 , 56 for two of the large systems in our dataset, i.e., DNA-PKcs and ACLY, as well as some from our previous benchmark 26 . A significant agreement is generally observed between eBDIMS2 and TMD pathways, which becomes truly outstanding in cases of marked pathway asymmetries 26 , like the opening transition of RNAseIII (see Supplementary Material). An especially interesting case is that of DNA-PKcs (Fig. 4 ). This large monomeric protein is a fundamental component of the DNA-PK complex, which is central to the process of non-homologous end joining (NHEJ) of DNA 57 . Considering an ensemble of 43 experimental models (Supplementary Tables), PC1 is found to cover ~ 53% of the ensemble variance and corresponds to a vertical motion of the N-HEAT domain, which mediates DNA binding, coupled with a horizontal rotation of the FAT and kinase domains (FAT-KINs). On the other hand, PC2 explains ~ 21% of the variance and involves a lateral expansion/contraction of the N- and M-HEAT domains (Fig. 4 a). 
Other than a small cluster of X-ray-based conformations, PC projections allow detection of four main functional clusters 57 , 58 : a cluster of apo-like inactive conformations, where the N-HEAT domain is in the downward position and FAT-KINs are in the inactive inward conformation; a second cluster of intermediate DNA-bound states, where the FAT-KINs are still in the inactive conformation, but the N-HEAT region has moved upwards into the DNA-binding groove to accommodate DNA-binding; a third cluster of active conformations, where the N-HEAT domain remains in the upward DNA-bound position and the FAT-KIN head has risen into the active conformation 57 ; and a fourth cluster of phosphorylated conformers 58 . PC1 correlates with the sequential N-HEAT and FAT-KIN motions required during activation-deactivation of the protein, while PC2 captures the lateral expansions associated with DNA-PKcs phosphorylation. Based on the cryo-EM data, we infer that DNA-PKcs could switch from the inactive conformation (7k19) to the active state (7k0y), and vice versa, directly or via a two-step process visiting the intermediate conformers (7k1n) 57 , 58 . We simulated several transitions between these end states with eBDIMS2 (Supplementary Tables) and verified that the experimental intermediate is approached very closely even when we simulate a one-step pathway between the inactive and active state (Fig. 4 a, right panel). Moreover, Fig. 4 b shows that eBDIMS2 pathways agree with atomistic TMD simulations, as both sample the same area of the conformational space. For the activation, TMD implies that both one- and two-step mechanisms are possible (Fig. 4 b, upper panel; see Supplementary Material), whereas for the inactivation mechanism, all simulations suggest a one-step transition without visiting the intermediate conformation (Fig. 4 b, lower panel). 
While describing similar conformational trajectories, the advantage of eBDIMS2 over TMD is that our CG algorithm does not require a lengthy preparation of the molecular system, and it consumes significantly lower computational resources (~ 30 min on a desktop computer with 16 OpenMP threads for eBDIMS2 vs. ~11 hours on a high-performance computing cluster with 128 parallel cores for TMD; see Supplementary Material). We also carried out unbiased MD from end-state and eBDIMS2-generated conformations to explore whether they spontaneously sample the transition (Supplementary Tables). While some end-state conformers are stuck in low-energy minima of the MD simulations, others lead to a broader sampling of the conformational space, in good agreement with experimental PCs and eBDIMS2 trajectories (see Supplementary Material). For example, in all simulations from the apo-open conformation of ACLY, this large protein exhibits collective motions along PC1 (~ 0.8 overlap; Supplementary Tables), with large-scale twisting of the four acetyl-CoA synthetase homology domains, in agreement with the eBDIMS2 paths from an inhibited conformation to a partially open state (Supplementary Material). On the other hand, MD simulations from the holo conformation in the absence of ligands show opening motions along PC2, highlighting a spontaneous tendency to go back to the apo-open state. These trajectories are again consistent with the eBDIMS2 holo-apo pathway and capture experimental intermediates that are also closely approached by the eBDIMS2 path (see Supplementary Material). Lastly, we compared eBDIMS2 to MD trajectories of the SARS-CoV-2 spike glycoprotein (Fig. 5 a), publicly available from the Amaro lab 42 , 59 . In this case, unbiased simulations 59 provide only limited sampling of the spike receptor-binding domain (RBD) motions compared with eBDIMS2 transitions between the experimental end states (see Supplementary Material). 
Yet, µs-long simulations of a double-mutant (N165A-N234A) spike, which was experimentally shown to reduce binding to angiotensin-converting enzyme 2 (ACE2) receptor as a result of the RBD conformational shift toward the “down” state 59 , were found to provide a FEL consistent with the direction of RBD opening/closing pathways predicted by eBDIMS2 (Fig. 5 b). We also compared a Weighted Ensemble (WE) enhanced sampling trajectory, where one RBD was observed to undergo complete opening 42 , to the eBDIMS2 pathway from the closed spike (6xr8) to the one-RBD-up conformation (7a94) (Supplementary movie 4). From the comparison, we found that the two trajectories sample a similar area of the conformational space, showing a good similarity between the WE and eBDIMS2 intermediates (Fig. 5 c; more details in the Supplementary Material). Both methods are thus able to provide a realistic description of RBD opening, yet eBDIMS2 requires much lower computing resources in comparison (20 CPU-h for eBDIMS2 vs. 17,000 GPU-h for WES; more details in Supplementary Material).

DISCUSSION

Here, we introduce eBDIMS2, an enhanced version of our previous path-sampling ENM-driven Brownian Dynamics (BD) algorithm 26 , which achieves over 6-fold speed improvement for large protein systems. This advance enables realistic CG-simulations of transition pathways in sub-mesoscopic assemblies that were previously infeasible. The method is validated through an expanded benchmark of large-scale protein transitions and comparisons with multiple MD techniques, demonstrating its superior speed, stability, and versatility. To test eBDIMS2, we extended our previous benchmark to include 47 large, conformationally diverse proteins ranging from ~ 300 kDa to 2 MDa (mostly from cryo-EM). These proteins undergo transitions from simple ~ 4Å breathing motions to highly complex ~ 20–30Å rotations/translations, such as Nf1 activation and ATP synthase rotary motion. 
Many of the proteins in our new benchmark dataset have recently been found to play key roles in diseases like cancer 48 , 49 , 60 – 62 , tuberculosis 63 or skeletal muscle disorders 64 . Yet, they have been largely overlooked in simulations due to their extreme size. Using projections on PC1 and PC2 (> 70% variance) essential modes from experimental ensembles as collective variables (CVs), we identified 124 relevant end states and simulated 191 transition pathways, capturing intermediate conformations in ~ 30% of cases, consistent with our previous benchmark 26 . For small- to medium-sized proteins (< 1k residues), eBDIMS2 performs comparably to existing path-sampling methods but with smoother trajectory sampling in PC space. However, its advantage becomes clear for systems larger than ~ 300 kDa, particularly those with high-complexity transitions. For example, eBDIMS2 simulates the 400 kDa GroEL chaperone (15Å transition) in ~ 1 hour, whereas other methods fail to complete the task in reasonable timeframes. Across large systems, eBDIMS2 has a median runtime of ~ 2.5 hours, preponderantly reaching sub-1Å convergence. Its computing times scale quasi-linearly with system size (PCC ~ 0.9 overall, ~ 0.7 excluding the largest systems > 1 MDa), contrasting with the near-quadratic scaling of other path-sampling methods. In MD simulations, O(N²) complexity is reduced to O(NlogN) with Ewald electrostatics, yet long-timescale transitions remain computationally prohibitive. For instance, unbiased MD struggles to capture the SARS-CoV-2 spike glycoprotein opening 42 , 59 , requiring WES enhanced sampling and 7.5 µs of simulation (17,000 GPU-hours over a week) 42 . In contrast, eBDIMS2 can achieve a similar motion description in just ~ 1.2 hours on a standard desktop using 20 CPU-h (Fig. 5 ). 
Likewise, for the activation transition of DNA-PKcs, eBDIMS2 reduced computational times from 11 HPC hours to just ~ 30 minutes on a desktop, while maintaining agreement with Targeted MD (TMD). Notably, TMD further corroborates the asymmetry of forward and reverse pathways (Fig. 4 ), as we previously demonstrated for smaller proteins 26 . Despite the inevitable loss of atomistic detail due to coarse-graining, eBDIMS2 can generate high-quality intermediates suitable for atomistic reconstruction, thus offering a quick route to explore and populate the conformational space for further MD simulations. Even for highly complex transitions where other methods fail, eBDIMS2 produces stereochemically correct pathways within hours. For example, the extreme complexity of the rotary motions of ATP synthase strains MinActionPath2 (the only other algorithm capable of dealing with systems of this size) to the point of generating structures with completely unphysical bond lengths near the transition point. In contrast, the eBDIMS2 transition cycle (Supplementary movie 3 and Fig. 3 ) accurately recapitulates the known, alternated and coordinated opening/closing motions of the three catalytic β -subunits 51 and their allosteric coupling to transmembrane rotation as seen in MD 52 . With minimal preparation, requiring only two sets of coordinates, eBDIMS2 is particularly valuable for the cryo-EM community, which increasingly generates high-resolution structural data 37 but lacks efficient methods for further mechanistic analysis. By leveraging just a few experimental end-state conformers, researchers can use eBDIMS2 on a standard computer to "bridge the gaps" in conformational space at the CG-level, then refine intermediates through atomistic reconstruction and further MD 29 , 30 (Fig. 1 ). This capability can make eBDIMS2 a powerful tool for applications ranging from drug discovery and design of novel binding partners and small molecules 65 to elucidating their mechanisms of action 6 , 66 . 
To our knowledge, eBDIMS2 is the only algorithm currently capable of generating accurate and realistic pathways for protein transitions of this large scale and complexity, with minimal preparation and computational cost, and thus can accelerate the dynamical and biological interpretation of rapidly growing large-scale structural data.

METHODS

Protein dataset

In this work, we built a comprehensive dataset of protein ensembles (Supplementary Tables), with a wide variety of functions, sizes, and shapes (Fig. 2 a), including three smaller systems from our previous benchmark 26 , i.e., ribose-binding protein (RBP, 30 kDa), RNA endonuclease III (RNAseIII, 48 kDa), and sarcoplasmic/endoplasmic reticulum Ca²⁺ ATPase1 (SERCA, 109 kDa). These medium-size proteins and their structural ensembles of X-ray conformations were used as relevant test cases to assess the ability of eBDIMS2 to replicate our previous results 26 , test the performance of the algorithm for different parameters, compare with other path-sampling algorithms, and evaluate the capability of Molecular Dynamics (MD) simulations to explore their conformational space. To retrieve structural data specifically for large proteins (> 300 kDa) that undergo large-scale conformational changes (RMSD > 4Å), we performed a far-reaching bioinformatic search from the PDB 38 and UniProt 67 databases, eventually building structural ensembles for a total of 47 large proteins (Supplementary Tables). Detailed information about the screening protocol and dataset generation is provided in the Supplementary Material.

Structural ensembles and Principal Component Analysis (PCA)

For all proteins in our dataset, we generated structural ensembles based on experimental conformations available in the PDB. We retrieved all PDB models based on UniProt ID codes, and we considered all oligomeric states relevant for the protein biological function (see Supplementary Material). 
Structures with low resolution (worse than 5–6 Å) and/or large missing domains were excluded from further analysis. The final list of all PDB models used for the ensemble generation is reported in the Supplementary Material. We ensured that all structures belonging to the same ensemble were consistent for further quantitative analyses: we checked that the different chains in multi-chain proteins correspond to the same protomers, and we removed regions that are missing in at least one conformation to guarantee that all structures have the same number of residues. For each ensemble, a reference structure was selected, generally corresponding to the resting apo state of the protein. Global structural alignment with respect to the reference structure was applied to all conformations, and PCA was performed on the aligned ensemble.

PCA is a multivariate statistical technique used to reduce the number of dimensions needed to describe protein structures and dynamics 68. It has been widely used to describe the essential motions of proteins from MD simulations 69 and experimental ensembles 26, 47. The input of PCA is an n × 3N coordinate matrix, X, where n is the number of structures in the ensemble and N is the number of residues, usually considering only Cα atoms. From X, the elements of the symmetric covariance matrix, C, are calculated as:

$$ c_{ij} = \left\langle \left( x_i - \langle x_i \rangle \right) \left( x_j - \langle x_j \rangle \right) \right\rangle \tag{1} $$

where the brackets \(\langle \dots \rangle\) indicate the average over the n structures. Eigenvalue-eigenvector decomposition is then used to diagonalize the covariance matrix as:

$$ \mathbf{C} = \mathbf{U} \boldsymbol{\Delta} \mathbf{U}^{T} \tag{2} $$

where the diagonal matrix Δ contains the eigenvalues of C, while the matrix U contains its eigenvectors, representing the Principal Components (PCs). Eigenvalues are sorted in descending order and are directly proportional to the variance captured by each corresponding PC.
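The covariance and diagonalization steps of Eqs. (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the analysis code used in this work: the function name `ensemble_pca` and the synthetic toy ensemble are assumptions introduced purely for illustration, and the input is assumed to be already aligned to the reference structure.

```python
import numpy as np

def ensemble_pca(X):
    """PCA of an aligned ensemble (hypothetical helper, for illustration only).

    X: (n, 3N) array, one row of flattened C-alpha coordinates per structure.
    Returns eigenvalues (descending), eigenvectors as columns, and the
    fraction of the total variance captured by each PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # subtract the ensemble average <x_i>
    C = Xc.T @ Xc / X.shape[0]         # Eq. (1): average over the n structures
    evals, evecs = np.linalg.eigh(C)   # Eq. (2): C = U Delta U^T (C is symmetric)
    order = np.argsort(evals)[::-1]    # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    evals = np.clip(evals, 0.0, None)  # remove tiny negative round-off values
    return evals, evecs, evals / evals.sum()

# Toy ensemble (assumption): 6 "structures" of 4 beads (3N = 12) that differ
# mainly by their amplitude along one collective direction, plus small noise.
rng = np.random.default_rng(0)
mode = rng.normal(size=12)
mode /= np.linalg.norm(mode)
amplitudes = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
X = rng.normal(size=12) + np.outer(amplitudes, mode) + 0.01 * rng.normal(size=(6, 12))

evals, evecs, frac = ensemble_pca(X)
print("variance fraction of PC1:", frac[0])
print("overlap of PC1 with the planted direction:", abs(evecs[:, 0] @ mode))
```

In this toy case the first PC recovers the planted direction and captures nearly all of the variance, which mirrors how a dominant PC separates conformational clusters in an experimental ensemble.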
After calculating the PCs, each structure is projected onto the low-dimensional PC space 26:

$$ p_{l,m} = \left( \mathbf{X}_{l} - \mathbf{X}_{ref} \right) \cdot \frac{\mathbf{PC}_{m}}{\left| \mathbf{PC}_{m} \right|} \tag{3} $$

where \(p_{l,m}\) is the projection of conformation l along the m-th PC, \(\mathbf{X}_{l}\) and \(\mathbf{X}_{ref}\) are the vectors containing the 3D coordinates of the l-th conformation and the reference structure, respectively, and \(\mathbf{PC}_{m}\) is the vector of the apparent motion captured by the m-th PC. As shown in our previous work 26, PCA of structurally rich ensembles allows the identification of clusters of significant conformations, as well as conformational-functional intermediates. In this work, we used PC projections to select relevant end-state conformers of large proteins and simulate their transition pathways with eBDIMS2.

eBDIMS2 for modeling conformational transitions

eBDIMS2 is an optimized version of our previous elastic network-driven Brownian Dynamics IMportance Sampling (eBDIMS) method 26, 45, which can now handle very large proteins and complex conformational transitions in remarkably low computing times. The goal of eBDIMS is to model conformational changes from a starting conformation, \(R_0\), to a target state, \(R_t\). It uses a coarse-grained (CG) representation of the protein, considering one bead per amino acid (Cα atom), and implements the MD-derived essential-dynamics Elastic Network Model (edENM) force-field 19, where the protein is treated as a network of mass particles connected by elastic springs. A Brownian Dynamics (BD) framework 70 is used to simulate the protein dynamics and trace physically acceptable trajectories from \(R_0\) to \(R_t\). The equations of motion follow the Langevin equation 26:

$$ m_i \ddot{r}_i = F_i - \gamma \dot{r}_i + \xi_i(t) \tag{4} $$

where \(m_i\) and \(r_i\) are the mass and the position of the i-th particle, respectively, \(F_i\) is the force acting on the i-th particle due to the particle-particle interactions from the edENM, γ is the friction coefficient related to dispersion forces arising from interactions with the surrounding fluid 70, and \(\xi_i\) is a time-dependent white-noise term that accounts for the thermal motion of the solvent 9, 26, 70.

To bias the trajectory in the direction of the target, eBDIMS uses Dynamics IMportance Sampling (DIMS). Every k BD steps, a progress variable Γ is computed and used to drive the transition. Γ is defined as the sum of squared differences between the pairwise distances in the simulated (\(R_s\)) and target (\(R_t\)) conformations 26:

$$ \Gamma_s = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left( d_{ij}^{s} - d_{ij}^{t} \right)^{2} \tag{5} $$

where \(d_{ij}^{s}\) is the distance between the i-th and j-th particles in the simulated structure \(R_s\) at step s, \(d_{ij}^{t}\) is their distance in the target conformation \(R_t\), and N is the total number of particles in the system. \(\Gamma_s\) is compared every k steps to the previous value, \(\Gamma_{s-1}\), and the current conformation \(R_s\) is accepted if \(\Gamma_s < \Gamma_{s-1}\), or rejected otherwise 26. The iterations proceed until convergence to \(R_t\), e.g., until the sampled conformations reach a Root Mean Square Deviation (RMSD) from the target in the range of thermal oscillations (~1 Å) or until \(\Gamma_s\) is sufficiently close to zero.

Our original version of eBDIMS is currently available as a public webserver and as a stand-alone C++ code 45 and is efficient for proteins up to ~2k residues and ~250 kDa. Larger systems would require an enormous time to drive the transition up to the target conformation. The new eBDIMS2 algorithm, implemented in Fortran, overcomes this limitation by modifying the strategy used to calculate interaction forces during the BD simulation.
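The DIMS acceptance rule built on the progress variable of Eq. (5) can be sketched in a few lines of Python. This is only a schematic illustration under stated assumptions, not the eBDIMS or eBDIMS2 implementation: the callback `propose` is a hypothetical stand-in for k unbiased BD steps with the edENM force-field (here replaced by a small random displacement), and the function names, toy coordinates, and tolerance are all invented for the example.

```python
import numpy as np

def pairwise_distances(R):
    """Distances d_ij for all pairs i < j of an (N, 3) coordinate array."""
    i, j = np.triu_indices(len(R), k=1)
    return np.linalg.norm(R[i] - R[j], axis=1)

def progress_variable(R, d_target):
    """Eq. (5): sum over pairs of squared distance differences to the target."""
    return float(np.sum((pairwise_distances(R) - d_target) ** 2))

def dims_drive(R0, Rt, propose, max_cycles=5000, tol=1e-3):
    """Keep a proposed conformation only if Gamma decreases.

    `propose(R)` is a hypothetical callback standing in for k BD steps.
    """
    d_target = pairwise_distances(Rt)
    R = R0.copy()
    gamma = progress_variable(R, d_target)
    history = [gamma]
    for _ in range(max_cycles):
        if gamma < tol:                # close enough to the target
            break
        trial = propose(R)
        gamma_trial = progress_variable(trial, d_target)
        if gamma_trial < gamma:        # accept: the trial is closer to the target
            R, gamma = trial, gamma_trial
            history.append(gamma)
        # otherwise reject and continue from the current conformation
    return R, history

# Toy system (assumption): 10 beads, start = target plus a random distortion.
rng = np.random.default_rng(1)
Rt = rng.normal(scale=5.0, size=(10, 3))
R0 = Rt + rng.normal(scale=1.0, size=Rt.shape)

def propose(R):
    """Random displacement used in place of real BD dynamics."""
    return R + rng.normal(scale=0.05, size=R.shape)

R_final, history = dims_drive(R0, Rt, propose)
print("Gamma: start", history[0], "-> end", history[-1])
```

Because Γ depends only on internal distances, it is unaffected by rigid-body translation or rotation of the simulated structure, so the comparison with the target requires no superposition.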
In eBDIMS, the \(F_i\) term in Eq. (4) is calculated by iterating over every possible interaction, i.e.:

$$ F_i = \sum_{j=1,\, j \neq i}^{N} F_{ij} \tag{6} $$

where \(F_{ij}\) is the force between two particles i and j based on the edENM force-field. Evaluating the forces on all particles therefore requires visiting every pair of particles, which gives rise to a quadratic increase in the number of interactions with the system size, i.e., \((N^2 - N)/2\) for a set of N particles. For a small protein like RBP (271 residues), this leads to ~36.5k particle-particle interactions, while for a huge system like ryanodine receptor 1 (RyR1, ~17k residues) it leads to almost 140 million interactions to be computed at each step.

Due to the strong power-law decay of edENM interactions between non-bonded particles (see Supplementary Material), eBDIMS2 implements a more efficient strategy based on an adaptive cutoff, similar to the procedures employed in MD simulations. At the beginning of the BD, we generate a list of proximal residues in the reference conformation \(R_0\) based on the cutoff \(r_c\). This list identifies the i-j interactions that are numerically meaningful, and only these are used to compute the non-zero \(F_{ij}\) values. This strategy strongly reduces the number of iterations and thus the computational burden of the algorithm, particularly for larger systems. For example, with a cutoff \(r_c\) of 8 Å, RBP requires ~1.6k interactions in eBDIMS2 (instead of ~36.5k), while only ~80k interactions are required for RyR1 (instead of ~140 million). After a certain number of BD steps, we update this interaction list to account for the new positions of the Cα atoms in the new conformation. The simulation then proceeds until convergence.

We tested the performance of eBDIMS2 for several values of the cutoff \(r_c\) (8, 10, 15, 20 Å) and the biasing frequency k (1, 2, 5, 10), looking at the time required to simulate the transitions and at their projections in the PC space.
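The pair counts quoted above, and the effect of a distance cutoff, can be checked with a short sketch. This is an illustration only, not the Fortran implementation of eBDIMS2: the function names are hypothetical, the list is built by brute force for clarity, N = 17,000 is used as a round-number assumption for a RyR1-sized system, and the test structure is an idealized straight chain of beads spaced 3.8 Å apart rather than a real protein.

```python
import numpy as np

def n_all_pairs(N):
    """Number of distinct i < j pairs among N particles: (N^2 - N) / 2."""
    return N * (N - 1) // 2

def neighbor_pairs(R, r_c):
    """Pairs (i, j), i < j, closer than r_c in the (N, 3) coordinate array R.

    Brute-force O(N^2) construction, meant to be run once and then refreshed
    only occasionally rather than at every dynamics step.
    """
    i, j = np.triu_indices(len(R), k=1)
    d = np.linalg.norm(R[i] - R[j], axis=1)
    keep = d < r_c
    return np.column_stack((i[keep], j[keep]))

pairs_rbp = n_all_pairs(271)       # 36585, i.e. ~36.5k for RBP
pairs_large = n_all_pairs(17000)   # 144491500 for a 17,000-bead system

# Idealized straight chain (assumption): 50 beads spaced 3.8 Angstrom apart.
N = 50
R = np.zeros((N, 3))
R[:, 0] = 3.8 * np.arange(N)
pairs = neighbor_pairs(R, r_c=8.0)

print(pairs_rbp, pairs_large)
print(len(pairs), "pairs within the cutoff out of", n_all_pairs(N))
```

For the straight chain, only first and second neighbors (3.8 and 7.6 Å) fall within 8 Å, so the list grows linearly with N instead of quadratically. Real folded proteins have more contacts per residue, but the same linear scaling is what makes the cutoff strategy pay off for very large systems.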
While these parameters were not found to have a significant effect on the PC projections of the transitions, a cutoff \(r_c\) of 8 Å and a biasing frequency k of 10 steps were found to be optimal to minimize computing time, especially for larger systems (see Supplementary Material).

Another advantage of eBDIMS2 is that it can compute pathways between structures with missing residues. Most of the path-sampling methods available in the literature 71 require that the two end-state protein conformations have no missing residues. However, almost all large systems solved by cryo-EM inevitably contain several regions that are missing in the 3D model due to, e.g., difficulties in fitting density maps, low resolution, or high local flexibility. For this reason, we developed eBDIMS2 so that the two protein end states can have gaps in the sequence, with the overall connectivity ensured by the ENM non-bonded interactions. In addition, eBDIMS2 offers the possibility to model rigid blocks in the protein explicitly, and it no longer requires the end-state conformers to have exactly the same number of residues. Additional information about these extra features can be found in the Supplementary Material.

Application of eBDIMS2 to large proteins

After identifying biologically relevant conformational clusters from the PC spaces, we applied eBDIMS2 to run transition pathways between all end-state conformers, in both the forward and reverse directions. All simulations were carried out on a Linux workstation with an Intel® Core i9-13900K processor and 64 GB of RAM, using OpenMP parallelization with 16 threads. For each transition, we computed RMSD values and collectivity degrees.
The former quantifies the amplitude of the conformational change:

$$ RMSD = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( r_i^{t} - r_i^{0} \right)^{2} } \tag{7} $$

where \(r_i^{t}\) and \(r_i^{0}\) represent the positions (after alignment) of the i-th Cα atom in the target and reference conformation, respectively. The collectivity degree κ provides an estimate of the global-local nature of the transition 14:

$$ \kappa = \frac{1}{N} \exp\left( - \sum_{i=1}^{N} \frac{ \left| r_i^{t} - r_i^{0} \right| }{ \sum_{j=1}^{N} \left| r_j^{t} - r_j^{0} \right| } \log \frac{ \left| r_i^{t} - r_i^{0} \right| }{ \sum_{j=1}^{N} \left| r_j^{t} - r_j^{0} \right| } \right) \tag{8} $$

and its value can range from a minimum of 1/N, when only one atom is involved in the conformational change, to a maximum of 1, when all atoms participate uniformly in the transition. eBDIMS2 pathways were then projected onto the corresponding PC spaces via Eq. (3), to inspect the relationship between generated intermediates and experimental conformations.

To quantify the performance of the method, we recorded the RMSD from the target at convergence and the time required to reach convergence. We also computed RMSD values between eBDIMS2 transitions and on-path experimental intermediates. To assess the stereochemical quality of the eBDIMS2-generated conformers, we computed distances between pairs of consecutive Cα atoms and compared the Cα-Cα distance distributions to those obtained from experimental structures. After performing all-atom reconstruction with cg2all 53, we also used MolProbity 54 to generate Ramachandran plots and check the atomistic quality of our intermediate conformers.

eBDIMS2 transition pathways and comparison with other path-sampling methods

Over the past 20 years, a plethora of methods have been developed to model conformational transitions in proteins 71.
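Before turning to the comparison, the transition descriptors of Eqs. (7) and (8) and the projection of Eq. (3) can be illustrated with a minimal NumPy sketch. The function names and the toy coordinates are assumptions made for this example, the structures are assumed to be already aligned, and the code is not taken from eBDIMS2.

```python
import numpy as np

def rmsd(Rt, R0):
    """Eq. (7): RMSD between two aligned (N, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((Rt - R0) ** 2, axis=1))))

def collectivity(Rt, R0):
    """Eq. (8): collectivity degree from per-residue displacement magnitudes."""
    d = np.linalg.norm(Rt - R0, axis=1)
    w = d / d.sum()                    # normalized displacement of each residue
    w = w[w > 0]                       # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(w * np.log(w))) / len(d))

def project(X, X_ref, pc):
    """Eq. (3): projection of a flattened conformation onto one PC."""
    return float((X - X_ref) @ (pc / np.linalg.norm(pc)))

# Toy reference structure (assumption): 100 beads at random positions.
rng = np.random.default_rng(2)
N = 100
R0 = rng.normal(scale=10.0, size=(N, 3))

# Fully collective change: every bead is displaced by the same 1 Angstrom vector.
Rt_uniform = R0 + np.array([1.0, 0.0, 0.0])
# Fully localized change: only the first bead moves.
Rt_single = R0.copy()
Rt_single[0] += np.array([0.0, 2.0, 0.0])

print(rmsd(Rt_uniform, R0), collectivity(Rt_uniform, R0))
print(rmsd(Rt_single, R0), collectivity(Rt_single, R0))

# Projection: a conformation displaced by 2.5 along a direction projects to 2.5.
pc = rng.normal(size=3 * N)
X_ref = R0.ravel()
X = X_ref + 2.5 * pc / np.linalg.norm(pc)
print(project(X, X_ref, pc))
```

The two toy displacements reproduce the limiting cases stated above: κ equals 1 when all beads move uniformly and 1/N when a single bead moves.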
Here we compared eBDIMS2 with our previous eBDIMS C++ version 26 and eight additional algorithms whose executables are freely available: iMOD 20, GOdMD 72, NGENI 73, ICONGENI 74, Climber 27, ENI 75, aANM 76, and ANMPathway 50. These methodologies differ mainly in: (i) the representation of the protein degrees of freedom (DOFs); (ii) the simulation framework used to model the protein dynamics; (iii) the biasing strategy used to drive the transition; and (iv) the reversibility or irreversibility of the transition in the forward and backward directions. More information about these methods is provided in the Supplementary Material. As mentioned above, the majority of these algorithms cannot analyze transitions in proteins with missing residues. For this reason, we carried out a detailed comparison for four proteins of increasing size in our dataset that have full-length structural ensembles, i.e., RBP (271 residues), RNAseIII (432 residues), SERCA (993 residues), and GroEL in the 7-mer oligomerization state (3,626 residues). For these four systems, we simulated transition pathways between two relevant end-state conformations, and we compared computing times and pathway projections in the PC space. All calculations were performed on the same Linux workstation described above.

While this manuscript was being written, MinActionPath2 was released 44; it is an improvement of the previous MinActionPath algorithm 77 and can now be used to deal with large macromolecular assemblies. Both MinActionPath and MinActionPath2 are available only as webservers, which prevents a thorough comparison of computing times with eBDIMS2. However, we used the MinActionPath2 webserver to assess the quality of the transition points of some of the large proteins in our dataset that undergo complex and large-scale conformational changes, and we compared them with our eBDIMS2 intermediates.
Atomistic MD simulations

We also performed atomistic MD simulations: unbiased MD from end-state and intermediate conformations, to assess the sampling of the conformational landscape, and targeted MD (TMD), to simulate transition pathways 55, 56. For proteins with missing residues, e.g., DNA-dependent protein kinase catalytic subunit (DNA-PKcs) and ATP-citrate synthase (ACLY), unmodelled gaps were filled using the SWISS-MODEL 78 webserver. For simulations starting from eBDIMS2-generated intermediates, cg2all 53 was used to obtain atomistic models suitable for MD (see Supplementary Tables). All molecular systems were prepared with CHARMM-GUI 79, and MD simulations were performed using GROMACS 80 version 2024.1. For TMD, we used GROMACS patched with PLUMED 81. The CHARMM36m force field was used to describe the biomolecular interactions 82, and we added TIP3P water molecules as well as sodium (Na+) and chloride (Cl−) ions at 150 mM concentration, to maintain physiological salt concentration and mimic intracellular conditions.

First, we carried out an energy minimization with the steepest descent algorithm for 5,000 steps. Then, the system was equilibrated for 125 ps at a temperature of 303.15 K, maintained with the Nose–Hoover thermostat 83, 84, and a pressure of 1.0 bar, maintained with the Parrinello–Rahman barostat 85 with isotropic pressure coupling. The LINCS algorithm was used to constrain bonds involving hydrogen atoms 86. Cutoffs for short-range van der Waals and electrostatic interactions were set to 12 Å. Long-range electrostatic interactions were described using the particle mesh Ewald (PME) approach 87, 88 with periodic boundary conditions. For unbiased MD simulations, we carried out production runs using a 2-fs time step and saving coordinate frames every 100 ps. To speed up the simulations for medium-size proteins, we made use of H-mass repartitioning with a longer 4-fs time step 11.
Each system was simulated for 200 ns, with three replicas starting from different random seeds. A total of 0.6 µs of unbiased atomistic dynamics was thus generated for at least two distinct conformations (Supplementary Tables), which we used to build Free Energy Landscapes (FELs). To check how well MD trajectories align with experimental conformations, we computed overlap scores and Root Mean Square Inner Products (RMSIPs) between experimental PCs and Essential Dynamics (ED) eigenvectors (see Supplementary Material) 69.

For TMD, three production runs were also carried out to obtain transition pathways between the selected end-state conformers. For the medium-size systems, TMD runs were carried out for 1 ns, while for DNA-PKcs and ACLY we simulated for 2 ns (Supplementary Tables). In TMD, the RMSD between the two aligned end states was used as the bias and applied every 10 steps with an elastic constant of 100 kcal/mol/Å². TMD trajectories reached convergence in all analyzed systems, with an RMSD from the target of ~1–2 Å.

Since our benchmark dataset included the much-studied SARS-CoV-2 spike glycoprotein, we also made use of MD trajectories publicly available from the Amaro lab 42, 59. We downloaded several simulations of the open, closed, and N165A-N234A double-mutant states of the spike 59, as well as a trajectory showing the opening of one receptor-binding domain (RBD) obtained through a Weighted Ensemble (WE) enhanced sampling approach 42. All trajectories and FELs were projected onto the experimental PC spaces and compared with our eBDIMS2 pathways.

Declarations

Author Contributions

DS developed eBDIMS2, built the protein dataset, generated the structural ensembles, designed and performed all simulations and analyses, prepared figures, and wrote the manuscript draft. BHL contributed MD simulations and the comparison with the ENI, NGENI, and ICONGENI path-sampling algorithms. LO conceived the original idea and contributed to discussions.
All authors participated in data interpretation and manuscript revision. Data availability The eBDIMS2 code is available at https://github.com/domenicoscaramozzino/eBDIMS2. Additional data, such as PDB models of ensembles and transition pathways, as well as eBDIMS2 transition GIFs, is available at figshare (ensemble data: https://doi.org/10.6084/m9.figshare.28334204.v1; GIFs: https://doi.org/10.6084/m9.figshare.28334195.v1). Funding and acknowledgments LO acknowledges financial support from Cancerfonden Junior Investigator Award (CF 21 0305 JIA) and Project Grants (CF 21 1471 Pj, CF 24 3801 Pj) as well as Vetenskapsrådet Starting Grant (VR 2021-02248) and Karolinska Institutet. DS acknowledges financial support from Cancerfonden postdoctoral fellowship (24 0908 PT). Simulations were run using the National Academic Infrastructure for Supercomputing in Sweden (allocations NAISS 2023/5-400 and 2024/1-7 to LO). References Hensen, U. et al. Exploring Protein Dynamics Space: The Dynasome as the Missing Link between Protein Structure and Function. PLOS ONE 7 , e33931 (2012). Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins. Nature 450 , 964–72 (2007). Orellana, L. Large-Scale Conformational Changes and Protein Function: Breaking the in silico Barrier. Front. Mol. Biosci. 6 , 117 (2019). Orellana, L. Are Protein Shape-Encoded Lowest-Frequency Motions a Key Phenotype Selected by Evolution? Appl. Sci. 13 , 6756 (2023). Miller, M. D. & Phillips, G. N. Moving beyond static snapshots: Protein dynamics and the Protein Data Bank. J. Biol. Chem. 296 , 100749 (2021). Orellana, L. et al. Oncogenic mutations at the EGFR ectodomain structurally converge to remove a steric hindrance on a kinase-coupled cryptic epitope. Proc. Natl. Acad. Sci. 116 , 10009–10018 (2019). Dror, R. O., Dirks, R. M., Grossman, J. P., Xu, H. & Shaw, D. E. Biomolecular Simulation: A Computational Microscope for Molecular Biology. Annu. Rev. Biophys. 41 , 429–452 (2012). Hospital, A., Goñi, J. 
R., Orozco, M. & Gelpí, J. L. Molecular dynamics simulations: advances and applications. Adv. Appl. Bioinforma. Chem. AABC 8 , 37–47 (2015). Mhashal, A., Emperador, A. & Orellana, L. Computational techniques to study protein dynamics and conformations. in Advances in Protein Molecular and Structural Biology Methods 199–212 (Elsevier, 2022). doi:10.1016/B978-0-323-90264-9.00013-1. Laio, A. & Parrinello, M. Escaping free-energy minima. Proc. Natl. Acad. Sci. 99 , 12562–12566 (2002). Hopkins, C. W., Le Grand, S., Walker, R. C. & Roitberg, A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comput. 11 , 1864–1874 (2015). Kmiecik, S. et al. Coarse-Grained Protein Models and Their Applications. Chem. Rev. 116 , 7898–7936 (2016). Bahar, I. & Rader, A. Coarse-grained normal mode analysis in structural biology. Curr. Opin. Struct. Biol. 15 , 586–592 (2005). Tama, F. & Sanejouand, Y.-H. Conformational change of proteins arising from normal mode calculations. Protein Eng. Des. Sel. 14 , 1–6 (2001). Yang, L., Song, G. & Jernigan, R. L. How Well Can We Understand Large-Scale Protein Motions Using Normal Modes of Elastic Network Models? Biophys. J. 93 , 920–929 (2007). Tama, F., Gadea, F. X., Marques, O. & Sanejouand, Y.-H. Building-block approach for determining low-frequency normal modes of macromolecules. Proteins Struct. Funct. Bioinforma. 41 , 1–7 (2000). Atilgan, A. R. et al. Anisotropy of Fluctuation Dynamics of Proteins with an Elastic Network Model. Biophys. J. 80 , 505–515 (2001). Yang, L., Song, G. & Jernigan, R. L. Protein elastic network models and the ranges of cooperativity. Proc. Natl. Acad. Sci. 106 , 12347–12352 (2009). Orellana, L. et al. Approaching Elastic Network Models to Molecular Dynamics Flexibility. J. Chem. Theory Comput. 6 , 2910–2923 (2010). Lopéz-Blanco, J. R., Garzón, J. I. & Chacón, P. iMod: multipurpose normal mode analysis in internal coordinates. Bioinformatics 27 , 2843–2850 (2011). Hoffmann, A. 
& Grudinin, S. NOLB: Nonlinear Rigid Block Normal-Mode Analysis Method. J. Chem. Theory Comput. 13 , 2123–2134 (2017). Khade, P. M. et al. hdANM: a new comprehensive dynamics model for protein hinges. Biophys. J. 120 , 4955–4965 (2021). Scaramozzino, D., Piana, G., Lacidogna, G. & Carpinteri, A. Low-Frequency Harmonic Perturbations Drive Protein Conformational Changes. Int. J. Mol. Sci. 22 , 10501 (2021). Scaramozzino, D., Khade, P. M., Jernigan, R. L., Lacidogna, G. & Carpinteri, A. Structural compliance: A new metric for protein flexibility. Proteins Struct. Funct. Bioinforma. 88 , 1482–1492 (2020). Scaramozzino, D., Khade, P. M. & Jernigan, R. L. Protein Fluctuations in Response to Random External Forces. Appl. Sci. 12 , 2344 (2022). Orellana, L., Yoluk, O., Carrillo, O., Orozco, M. & Lindahl, E. Prediction and validation of protein intermediate states from structurally rich ensembles and coarse-grained simulations. Nat. Commun. 7 , 12575 (2016). Weiss, D. R. & Levitt, M. Can Morphing Methods Predict Intermediate Structures? J. Mol. Biol. 385 , 665–674 (2009). Kitao, A. Principal Component Analysis and Related Methods for Investigating the Dynamics of Biological Macromolecules. J 5 , 298–317 (2022). Bergh, C., Heusser, S. A., Howard, R. & Lindahl, E. Markov state models of proton- and pore-dependent activation in a pentameric ligand-gated ion channel. eLife 10 , e68369 (2021). Lycksell, M. et al. Probing solution structure of the pentameric ligand-gated ion channel GLIC by small-angle neutron scattering. Proc. Natl. Acad. Sci. 118 , e2108006118 (2021). Almeida, B. C., Kaczmarek, J. A., Figueiredo, P. R., Prather, K. L. J. & Carvalho, A. T. P. Transcription factor allosteric regulation through substrate coordination to zinc. NAR Genomics Bioinforma. 3 , lqab033 (2021). Chaudhuri, D., Majumder, S., Datta, J. & Giri, K. Elucidating the conformational change of dengue envelope protein using the Markov state model. Mol. Simul. 50 , 1153–1169 (2024). 
Querino Lima Afonso, M., da Fonseca, N. J. Jr., de Oliveira, L. C., Lobo, F. P. & Bleicher, L. Coevolved Positions Represent Key Functional Properties in the Trypsin-Like Serine Proteases Protein Family. J. Chem. Inf. Model. 60 , 1060–1068 (2020). Zhekova, H. R. et al. CryoEM structures of anion exchanger 1 capture multiple states of inward- and outward-facing conformations. Commun. Biol. 5 , 1–13 (2022). Matsuoka, R. et al. Structure, mechanism and lipid-mediated remodeling of the mammalian Na+/H+ exchanger NHA2. Nat. Struct. Mol. Biol. 29 , 108–120 (2022). Kim, J. J. et al. Shared structural mechanisms of general anaesthetics and benzodiazepines. Nature 585 , 303–308 (2020). Carroni, M. & Saibil, H. R. Cryo electron microscopy to determine the structure of macromolecular complexes. Methods 95 , 78–85 (2016). Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28 , 235–242 (2000). Bonomi, M. & Vendruscolo, M. Determination of protein structural ensembles using cryo-electron microscopy. Curr. Opin. Struct. Biol. (2019) doi:10.1016/j.sbi.2018.10.006. Krieger, J. M., Sorzano, C. O. S., Carazo, J. M. & Bahar, I. Protein dynamics developments for the large scale and cryoEM: case study of ProDy 2.0. Acta Crystallogr. Sect. Struct. Biol. 78 , 399–409 (2022). Frank, J. New Opportunities Created by Single-Particle Cryo-EM: The Mapping of Conformational Space. Biochemistry (2018) doi:10.1021/acs.biochem.8b00064. Casalino, L. et al. AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. Int. J. High Perform. Comput. Appl. 35 , 432–451 (2021). Piana, S. & Shaw, D. E. Atomic-Level Description of Protein Folding inside the GroEL Cavity. J. Phys. Chem. B 122 , 11440–11449 (2018). Koehl, P., Navaza, R., Tekpinar, M. & Delarue, M. MinActionPath2: path generation between different conformations of large macromolecular assemblies by action minimization. Nucleic Acids Res. 52 , W256–W263 (2024). 
Orellana, L., Gustavsson, J., Bergh, C., Yoluk, O. & Lindahl, E. eBDIMS server: protein transition pathways with ensemble analysis in 2D-motion spaces. Bioinformatics 35 , 3505–3507 (2019). Yang, L., Song, G., Carriquiry, A. & Jernigan, R. L. Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Struct. Lond. Engl. 1993 16 , 321–30 (2008). Sankar, K., Mishra, S. K. & Jernigan, R. L. Comparisons of Protein Dynamics from Experimental Structure Ensembles, Molecular Dynamics Ensembles, and Coarse-Grained Elastic Network Models. J. Phys. Chem. B 122 , 5409–5417 (2018). Chaker-Margot, M. et al. Structural basis of activation of the tumor suppressor protein neurofibromin. Mol. Cell 82 , 1288-1296.e5 (2022). Naschberger, A., Baradaran, R., Rupp, B. & Carroni, M. The structure of neurofibromin isoform 2 reveals different functional states. Nature 599 , 315–319 (2021). Das, A. et al. Exploring the Conformational Transitions of Biomolecular Systems Using a Simple Two-State Anisotropic Network Model. PLoS Comput. Biol. 10 , e1003521 (2014). Sobti, M., Ueno, H., Noji, H. & Stewart, A. G. The six steps of the complete F1-ATPase rotary catalytic cycle. Nat. Commun. 12 , 4690 (2021). Böckmann, R. A. & Grubmüller, H. Nanoseconds molecular dynamics simulation of primary mechanical energy transfer steps in F1-ATP synthase. Nat. Struct. Biol. (2002) doi:10.1038/nsb760. Heo, L. & Feig, M. One bead per residue can describe all-atom protein structures. Structure 32 , 97-111.e6 (2024). Williams, C. J. et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. Publ. Protein Soc. 27 , 293–315 (2018). Schlitter, J., Engels, M. & Krüger, P. Targeted molecular dynamics: a new approach for searching pathways of conformational transitions. J. Mol. Graph. 12 , 84–89 (1994). Ovchinnikov, V. & Karplus, M. 
Analysis and Elimination of a Bias in Targeted Molecular Dynamics Simulations of Conformational Transitions: Application to Calmodulin. J. Phys. Chem. B 116 , 8584–8603 (2012). Chen, X. et al. Structure of an activated DNA-PK and its implications for NHEJ. Mol. Cell 81 , 801-810.e3 (2021). Liu, L. et al. Autophosphorylation transforms DNA-PK from protecting to processing DNA ends. Mol. Cell 82 , 177-189.e4 (2022). Casalino, L. et al. Beyond Shielding: The Roles of Glycans in the SARS-CoV-2 Spike Protein. ACS Cent. Sci. 6 , 1722–1734 (2020). Chen, Y. et al. Role of PRKDC in cancer initiation, progression, and treatment. Cancer Cell Int. 21 , 563 (2021). Icard, P. et al. ATP citrate lyase: A central metabolic enzyme in cancer. Cancer Lett. 471 , 125–134 (2020). Zhang, M. et al. ITPR3 facilitates tumor growth, metastasis and stemness by inducing the NF-ĸB/CD44 pathway in urinary bladder carcinoma. J. Exp. Clin. Cancer Res. 40 , 65 (2021). Guo, H. et al. Structure of mycobacterial ATP synthase bound to the tuberculosis drug bedaquiline. Nature 589 , 143–147 (2021). McCarthy, T. V., Quane, K. A. & Lynch, P. J. Ryanodine receptor mutations in malignant hyperthermia and central core disease. Hum. Mutat. 15 , 410–417 (2000). Zacharioudakis, E. & Gavathiotis, E. Targeting protein conformations with small molecules to control protein complexes. Trends Biochem. Sci. 47 , 1023–1037 (2022). Binder, Z. A. et al. Epidermal Growth Factor Receptor Extracellular Domain Mutations in Glioblastoma Present Opportunities for Clinical Imaging and Therapeutic Development. Cancer Cell 34 , 163-177.e7 (2018). The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51 , D523–D531 (2023). David, C. C. & Jacobs, D. J. Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins. Methods Mol. Biol. Clifton NJ 1084 , 193–226 (2014). Daidone, I. & Amadei, A. Essential dynamics: foundation and applications. WIREs Comput. Mol. 
Sci. 2 , 762–770 (2012). Emperador, A., Carrillo, O., Rueda, M. & Orozco, M. Exploring the Suitability of Coarse-Grained Techniques for the Representation of Protein Dynamics. Biophys. J. 95 , 2127–2138 (2008). Zheng, W. & Wen, H. A survey of coarse-grained methods for modeling protein conformational transitions. Curr. Opin. Struct. Biol. 42 , 24–30 (2017). Sfriso, P., Hospital, A., Emperador, A. & Orozco, M. Exploration of conformational transition pathways from coarse-grained simulations. Bioinformatics 29 , 1980–1986 (2013). Lee, B. H. et al. Normal mode-guided transition pathway generation in proteins. PLOS ONE 12 , e0185658 (2017). Lee, B. H., Park, S. W., Jo, S. & Kim, M. K. Protein conformational transitions explored by a morphing approach based on normal mode analysis in internal coordinates. PLOS ONE 16 , e0258818 (2021). Kim, M. K., Chirikjian, G. S. & Jernigan, R. L. Elastic models of conformational transitions in macromolecules. J. Mol. Graph. Model. 21 , 151–160 (2002). Yang, Z., Májek, P. & Bahar, I. Allosteric Transitions of Supramolecular Systems Explored by Network Models: Application to Chaperonin GroEL. PLoS Comput. Biol. 5 , e1000360 (2009). Franklin, J., Koehl, P., Doniach, S. & Delarue, M. MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Res. 35 , W477–W482 (2007). Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46 , W296–W303 (2018). Jo, S., Kim, T., Iyer, V. G. & Im, W. CHARMM-GUI: A web-based graphical user interface for CHARMM. J. Comput. Chem. 29 , 1859–1865 (2008). Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2 , 19–25 (2015). Bonomi, M. et al. Promoting transparency and reproducibility in enhanced molecular simulations. Nat. Methods 16 , 670–673 (2019). Huang, J. et al. 
CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods 14 , 71–73 (2017). Nosé, S. A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 52 , 255–268 (1984). Hoover, W. G. Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. Gen. Phys. 31 , 1695–1697 (1985). Parrinello, M. & Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 52 , 7182–7190 (1981). Hess, B., Bekker, H., Berendsen, H. J. C. & Fraaije, J. G. E. M. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 18 , 1463–1472 (1997). Ewald, P. P. Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys. 369 , 253–287 (1921). Essmann, U. et al. A smooth particle mesh Ewald method. J. Chem. Phys. 103 , 8577–8593 (1995).

Additional Declarations

There is NO Competing Interest.

Supplementary Files

Supplementary Movie 1: Supplementarymovie1Nf1.avi
Supplementary Movie 2: Supplementarymovie2A2M.avi
Supplementary Movie 3: Supplementarymovie3ATPsynthase.avi
Supplementary Movie 4: Supplementarymovie4SARSspike.avi
Supplementary Material: ScaramozzinoetalSupplementarymaterial.docx
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6504036","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":452597687,"identity":"17cbdea3-e167-4c71-8a0b-89130aafafaf","order_by":0,"name":"Laura Orellana","email":"","orcid":"https://orcid.org/0000-0003-1927-555X","institution":"Karolinska Institute","correspondingAuthor":true,"prefix":"","firstName":"Laura","middleName":"","lastName":"Orellana","suffix":""},{"id":452597688,"identity":"787396e6-cfe0-49cf-b4ee-ef7266bb63c2","order_by":1,"name":"Domenico Scaramozzino","email":"","orcid":"https://orcid.org/0000-0002-6235-8070","institution":"Karolinska Institute","correspondingAuthor":false,"prefix":"","firstName":"Domenico","middleName":"","lastName":"Scaramozzino","suffix":""},{"id":452597689,"identity":"503b3539-76aa-47c1-9860-9855d7c0d9c5","order_by":2,"name":"Byung Ho Lee","email":"","orcid":"https://orcid.org/0000-0001-6515-7656","institution":"Karolinska Institute","correspondingAuthor":false,"prefix":"","firstName":"Byung","middleName":"Ho","lastName":"Lee","suffix":""}],"badges":[],"createdAt":"2025-04-22
11:55:16","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6504036/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6504036/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41467-026-69809-y","type":"published","date":"2026-03-02T05:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":82168544,"identity":"8d7eaaf7-2a48-4f20-a099-9c64f54bc2ff","added_by":"auto","created_at":"2025-05-07 09:30:51","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1881207,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOverview of\u003c/strong\u003e \u003cstrong\u003ethe eBDIMS2 path-sampling method for the generation of realistic conformational intermediates of large protein systems, cross-validation with Principal Component Analysis (PCA) of experimental ensembles, and applications upon atomistic reconstruction. \u003c/strong\u003eThe first step is to generate an experimental ensemble from available structures in the Protein Data Bank (PDB). PCA is then performed to cluster experimental conformations and assign biological states to each conformational cluster, e.g., apo/holo, inactive/active. After identification of relevant end-state conformers, eBDIMS2 uses a combination of coarse-grained (CG) Elastic Network Modeling (ENM) and Brownian Dynamics (BD) to sample the conformational transition between the two states. The transition pathway is projected back onto the experimental PC space and used to explain experimental intermediates and conformational cycles. 
The CG intermediates can be atomistically reconstructed and used for further applications, e.g., enhanced sampling with Molecular Dynamics (MD), drug design targeting the intermediate conformation, etc.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/54b1b66aa1fb7773f5bb0d77.png"},{"id":82168543,"identity":"aedf643f-0fa9-454b-8d2f-24cd6f4a25e4","added_by":"auto","created_at":"2025-05-07 09:30:51","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1769925,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eeBDIMS2 benchmark and computing times. (a)\u003c/strong\u003e Large proteins investigated here (from ~300 kDa to ~2.3MDa; more details in Supplementary Tables). The three medium-size proteins from our previous eBDIMS benchmark\u003csup\u003e26\u003c/sup\u003e are also shown here for size comparison. \u003cstrong\u003e(b) \u003c/strong\u003eTransition pathways simulated with eBDIMS2: (1) system sizes (number of common residues in the ensemble PDB models); (2) RMSD value between end-state conformers (green bars) and RMSD at the time of convergence (orange), averaged across all analyzed transitions; (3) collectivity degree of the conformational changes; (4) eBDIMS2 computing time to reach convergence. \u003cstrong\u003e(c) \u003c/strong\u003eCorrelation between eBDIMS2 computing time and system size (number of residues), for all 47 large protein systems (left panel, PCC ~0.9) and for proteins smaller than ~1 MDa (right panel, PCC ~0.7). The black line corresponds to the trend line of the dataset, while the dashed lines are related to 95% confidence intervals.\u003cstrong\u003e (d)\u003c/strong\u003e Transition pathways in GroEL 7-mer (~400 kDa) and comparison of computing times between eBDIMS2 and other path-sampling algorithms including non-linear (left panel) and linear (right panel) methods. 
For non-linear methods, both forward and backward transitions, as well as final RMSDs from the target, are reported. Red stop signals indicate simulations that were not able to complete the pathway (see Supplementary Material). The transition projections in the experimental PC space are also reported.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/1acd9afce3faae200dac0209.png"},{"id":82167599,"identity":"5c73c087-a791-494e-8bd9-ae1fb2b44646","added_by":"auto","created_at":"2025-05-07 09:22:51","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":2202053,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eeBDIMS2 generates realistic transitions also for extremely complex and large-scale conformational changes. (a) \u003c/strong\u003eComparison between the quality of experimental end states, eBDIMS2 mid-point intermediates, and MinActionPath2 transition points for: (1) \u003cem\u003eM. Smegmatis\u003c/em\u003e ATP synthase, from rotary state 1 (PDB: 7jg5) to rotary state 2 (7jg6); (2) Nf1 isoform 2, from active conformation (7pgt) to inactive state (7pgr); (3) A2M, from native state (7o7l) to activated state (7o7p) (Supplementary movies 1-3). Distances between consecutive C\u003csup\u003eα\u003c/sup\u003e atoms are reported using boxplot representations. The bottom and top of each box are the 25\u003csup\u003eth\u003c/sup\u003e and 75\u003csup\u003eth\u003c/sup\u003e percentiles of the C\u003csup\u003eα\u003c/sup\u003e-C\u003csup\u003eα\u003c/sup\u003e distance distributions and the red line in the middle represents the median value. Observations beyond the whiskers are marked as outliers using red “+” signs (more details in the Supplementary Material). \u003cstrong\u003e(b)\u003c/strong\u003e eBDIMS2 simulation of the synthesis cycle for \u003cem\u003eM. Smegmatis\u003c/em\u003e ATP synthase. 
The conformational cycle is generated merging three eBDIMS2 trajectories: state 1 (7jg5) to state 2 (7jg6), state 2 (7jg6) to state 3 (7jg7), and state 3 (7jg7) to state 1 (7jg5). A cartoon representation of the synthesis cycle is provided with snapshots of eBDIMS2 intermediates (left), together with its projection in the PC1-PC2 space (middle). Cartoon representations with arrows are also reported to highlight the large-scale PC1 and PC2 motions captured from the experimental ensemble (right). \u003cstrong\u003e(c)\u003c/strong\u003e Opening/closing motions of the three \u003cem\u003eβ\u003c/em\u003e-subunits (chains D, E, F) during the ATP synthase cycle (Supplementary movie 3), elucidated by looking at the evolution of the angle \u003cem\u003eθ\u003c/em\u003e between D59, K142, and L389 residues (right). Angle values are plotted along the synthesis cycle (left) with color representation: dark blue spots (\u003cem\u003eθ\u003c/em\u003e ~ 90°) correspond to closed \u003cem\u003eβ\u003c/em\u003e-subunit conformations, while bright yellow (\u003cem\u003eθ\u003c/em\u003e ~ 110°) correspond to open conformations. 
Snapshots of \u003cem\u003eβ\u003c/em\u003e-subunit motions and the conformation of the F\u003csub\u003e0\u003c/sub\u003e \u003cem\u003ec\u003c/em\u003e-chains are also reported (seen from the cytoplasmic side) for each end- and intermediate-state along the cycle (bottom panel), showing the coordinated opening/closing motions of \u003cem\u003eβ\u003c/em\u003e-chains and their coupling with the rotation of the transmembrane domain.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/f26967590ccd9322c8685afc.png"},{"id":82168545,"identity":"370a417a-2606-462e-b8d2-65bce7b55533","added_by":"auto","created_at":"2025-05-07 09:30:51","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1706078,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eeBDIMS2 pathways capture experimental intermediates and agree with TMD simulations: the case of DNA-PKcs. (a) \u003c/strong\u003eCartoon representation of an ensemble of end-state conformations and eBDIMS2 intermediates of DNA-PKcs (left), together with its projection in the PC1-PC2 space (center), highlighting the cluster of apo-like, intermediate, active, phosphorylated, and X-ray-based conformers. RMSD distance of the eBDIMS2 trajectory between inactive (7k19) and active state (7k0y) from the experimental intermediates (right). Continuous and dashed lines refer to the forward and backward pathway directions, respectively. \u003cstrong\u003e(b)\u003c/strong\u003e Comparison between eBDIMS2 pathways and TMD trajectories from the inactive to active state (upper panel) and vice versa (lower panel). 
eBDIMS2 paths are shown as continuous lines with markers, and the three TMD replicas per direction are represented as points with increasing brightness as the target conformation is approached.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/46bc4116f70e454402458b04.png"},{"id":82167607,"identity":"0bbafd1b-9bce-4e3b-8d35-ccd3a71719e4","added_by":"auto","created_at":"2025-05-07 09:22:51","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":2066781,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eeBDIMS2 pathways capture experimental intermediates and agree with \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eµs\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e-long and enhanced MD simulations: the case of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eSARS-CoV-2\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e spike glycoprotein. (a) \u003c/strong\u003eCartoon representation of an ensemble of end-state conformations and eBDIMS2 intermediates of the \u003cem\u003eSARS-CoV-2\u003c/em\u003e spike (left), together with its projection in the PC1-PC2 space (center), highlighting the four conformational clusters corresponding to different opening/closing states of the RBDs. RMSD distance of the eBDIMS2 trajectory between closed (6xr8) and fully open state (7a9) from an experimental intermediate (7krs, right). Continuous and dashed lines refer to the forward and backward pathway directions, respectively. \u003cstrong\u003e(b)\u003c/strong\u003e Comparison between eBDIMS2 transition pathways and unbiased (center) and adaptive-sampling (right) MD simulations from Amaro’s lab\u003csup\u003e42\u003c/sup\u003e starting from an open conformation with N165A and N234A mutations and glycosylation (6vsb, left). 
Three independent \u003cem\u003eµs\u003c/em\u003e-long MD trajectories have been merged into a single FEL.\u003cstrong\u003e (c)\u003c/strong\u003e RBD full opening motion: comparison between the GPU-based Weighted Ensemble (WE) trajectory from Amaro’s lab\u003csup\u003e42\u003c/sup\u003e and the eBDIMS2 pathway between the closed and fully open state (Supplementary movie 4). Pairwise RMSD comparison (based on minimal RMSD scores, ~4Å; more details in Supplementary Material) and visual comparison of WE and eBDIMS2 intermediates along the transition.\u0026nbsp;\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/0a8935dccb3ffc556c2e29e4.png"},{"id":103973017,"identity":"3b17b7bb-f20e-425e-b900-34a9d4b1b609","added_by":"auto","created_at":"2026-03-05 08:07:55","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":12145490,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/6fa9d32c-8c44-4937-bf02-7a9b1fb702f7.pdf"},{"id":82169337,"identity":"634643a4-a145-4faa-97f8-41ae5cb3dadc","added_by":"auto","created_at":"2025-05-07 09:38:51","extension":"avi","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":341350,"visible":true,"origin":"","legend":"Supplementary Movie 1","description":"","filename":"Supplementarymovie1Nf1.avi","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/dfa58a1a256cbf8a0a4a6d27.avi"},{"id":82167606,"identity":"fdafda5a-b8f3-4ad9-8dc6-8984d60f3179","added_by":"auto","created_at":"2025-05-07 09:22:51","extension":"avi","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":624412,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary Movie 
2\u003c/p\u003e","description":"","filename":"Supplementarymovie2A2M.avi","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/d9b06b95a7f9a3715d0a7268.avi"},{"id":82167601,"identity":"7226875d-739f-4f81-831f-9c0b6b5a6e1a","added_by":"auto","created_at":"2025-05-07 09:22:51","extension":"avi","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":571978,"visible":true,"origin":"","legend":"Supplementary Movie 3","description":"","filename":"Supplementarymovie3ATPsynthase.avi","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/ee4a8b96834d115ed6457271.avi"},{"id":82169339,"identity":"c80e0291-e6a0-404e-89b2-8dd470841c53","added_by":"auto","created_at":"2025-05-07 09:38:51","extension":"avi","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":343896,"visible":true,"origin":"","legend":"Supplementary Movie 4","description":"","filename":"Supplementarymovie4SARSspike.avi","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/104f7deb4cb2624d1eb48b17.avi"},{"id":82167620,"identity":"ab9379b8-b81e-4655-8dbf-590fb700e67b","added_by":"auto","created_at":"2025-05-07 09:22:51","extension":"docx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":31836488,"visible":true,"origin":"","legend":"Supplementary Material","description":"","filename":"ScaramozzinoetalSupplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-6504036/v1/0bad01276fd2e23b35252f43.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Breaking the size limit: efficient sampling of large-scale transition pathways and intermediate conformations in sub-mesoscopic protein complexes","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eProtein dynamics is the fundamental link between structure and function\u003csup\u003e\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" 
class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e, refined and conserved through evolution\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. In response to signals such as electrochemical gradients or ligand binding, proteins transition between different states or conformations \u0026ndash; such as open/closed, active/inactive, etc. \u0026ndash; enabling biological regulation. Understanding these processes requires bridging static experimental snapshots to uncover transition pathways for conformational changes\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e, often involving transient intermediates critical to function\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. Despite advances in hardware and software\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e, sampling such transitions with Molecular Dynamics (MD)\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e \u0026ndash; the gold standard for biomolecular simulations \u0026ndash; remains challenging for the large time and length scales involved\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. To increase the exploration of the conformational space, numerous methods have been proposed\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. 
While some rely on tricks to enhance the atomistic sampling\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, others employ coarse-grained (CG) techniques to simplify the physical modeling\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. However, both approaches require complicated setups and often significant computing resources. As a faster alternative, minimalist CG methods such as Elastic Network Models (ENMs)\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e can predict conformational changes with remarkable precision\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e,\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e, leading to numerous variants\u003csup\u003e\u003cspan additionalcitationids=\"CR17 CR18 CR19 CR20 CR21\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e that model protein dynamics\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e and flexibility\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e,\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e or guide transition pathways. 
However, their lack of rigorous CG-parameterization limits transferability and general applicability.\u003c/p\u003e \u003cp\u003eBuilding on our carefully database-parameterized edENM force field\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, we previously developed eBDIMS\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e (elastic network-driven Brownian Dynamics IMportance Sampling) to track transition paths between protein end-states. Following Levitt\u0026rsquo;s approach\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, we validated the biological significance of the predicted pathways using a benchmark of proteins trapped experimentally in multiple states, assessing whether intermediates are visited spontaneously by the algorithm without prior knowledge of them. To move beyond pairwise structure comparisons, we also applied Principal Component Analysis (PCA)\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e to extract key motions from structural ensembles, ensuring that sampled pathways aligned with experimental dynamics. When tested against other path-sampling methods, eBDIMS was the only CG method capable of predicting pathways with the same accuracy as MD or atomistic force-field methods like \u003cem\u003eClimber\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, but at a fraction of the computational cost\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. 
The high quality of the predicted intermediates has enabled seeding of full-atom MD simulations to explore the Free Energy Landscape (FEL) of very subtle allosteric transitions, such as ion channel gating\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. Providing mechanistic insights across diverse systems, from transcription factors\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e to enzymes or fusion proteins\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e,\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e, eBDIMS has proven to be a general and versatile method for studying protein transitions. Another key application has been in cryogenic Electron Microscopy (cryo-EM) studies\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e,\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e, where eBDIMS results helped explain intermediates between trapped conformations\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eHowever, in recent years the cryo-EM \"\u003cem\u003eresolution revolution\u003c/em\u003e\"\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e has been dramatically expanding the Protein Data Bank (PDB)\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e with large multi-state protein complexes\u003csup\u003e\u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR41\" 
class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e, creating challenges for computational methods that often scale quadratically with system size. This has made large-scale transitions in protein assemblies\u0026thinsp;\u0026gt;\u0026thinsp;300 kDa nearly intractable, even for CG methods like eBDIMS, requiring resources that few labs can afford. For instance, observing the opening of the \u003cem\u003eSARS-CoV-2\u003c/em\u003e spike glycoprotein required over a week of enhanced Weighted Ensemble Simulations (WES) running on 80 GPUs\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Similarly, capturing the conformational variability of the GroEL chaperone demanded hundreds of microseconds on the special-purpose supercomputer Anton\u003csup\u003e\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e. In large multimers like GroEL, most CG algorithms fail to progress, and the only method capable of handling such systems, MinActionPath2\u003csup\u003e44\u003c/sup\u003e, was found to produce significant structural distortions (see below). To overcome this, here we present eBDIMS2 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), an extension of our previous algorithm which leverages smart cutoff and parallelization schemes to achieve quasi-linear scaling with the system size. This enables efficient CG-simulations of large and complex protein transitions (up to ~\u0026thinsp;2 MDa) on a standard computer, with virtually no constraints on system size or motion complexity. To assess the accuracy of the method, we built an updated and comprehensive dataset of large protein ensembles and simulated transitions between all relevant end states. 
Our results demonstrate that eBDIMS2 surpasses all existing path-sampling methods, generating realistic and stereochemically accurate transitions that align with experimental intermediates, MD simulation results, and other experimental observations. Notably, eBDIMS2 is currently the only method capable of handling such large systems and, especially, transitions of extreme complexity such as the rotary motions of ATP synthases. As cryo-EM data for sub-mesoscopic protein complexes continues to grow, eBDIMS2 will provide a powerful and accessible tool for studying the dynamics of these critical yet underexplored systems, far beyond the reach of other methods.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eeBDIMS2 large-scale protein benchmark and pathway validation by PCA\u003c/h2\u003e \u003cp\u003eTo validate the new eBDIMS2 path-sampling algorithm, we expanded our previous benchmark composed of proteins that have well-characterized intermediates between end-states (e.g., RBP, RNaseIII, and SERCA)\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e by performing an exhaustive search for large and conformationally diverse systems in the PDB. This resulted in a total of 47 large proteins of different stoichiometry and motions (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea), for which \u0026ldquo;conformationally rich\u0026rdquo; ensembles were retrieved for robust ensemble-PCA (see below). All complexes are larger than ~\u0026thinsp;300 kDa, have at least 3 experimental models available, and show Root Mean Square Deviations (RMSDs) of at least\u0026thinsp;~\u0026thinsp;4\u0026Aring; between two end states. 
The resulting collection comprises 872 PDB structures, mostly from cryo-EM, with ensembles of 3 to nearly 90 structures and an average ensemble RMSD of over ~\u0026thinsp;7\u0026Aring; (Supplementary Tables). The benchmark includes protein assemblies ranging from the ~\u0026thinsp;300 kDa \u003cem\u003eSARS-CoV\u003c/em\u003e spike glycoprotein to the ~\u0026thinsp;2.3 MDa \u003cem\u003eS. cerevisiae\u003c/em\u003e Fatty-Acid Synthase (FAS, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea).\u003c/p\u003e \u003cp\u003eMost of the conformational changes in these proteins are large-scale and collective (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb; Supplementary Tables), with transition RMSDs ranging from a minimum of ~\u0026thinsp;4\u0026Aring; in the gigantic ryanodine receptor 2 (RyR2, ~\u0026thinsp;16k residues) to an astonishing maximum of ~\u0026thinsp;30\u0026Aring; for Nf1 (~\u0026thinsp;4.3k residues, see below). Upon PCA (see Methods), the first two PCs of such \u0026ldquo;conformationally rich\u0026rdquo; ensembles capture\u0026thinsp;\u0026gt;\u0026thinsp;70% of the structural variance in all ensembles, and \u0026gt;\u0026thinsp;90% in ~\u0026thinsp;70% of the cases (Supplementary Tables), thus providing powerful Collective Variables (CVs) for system analysis, as shown previously by us\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e and others\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e,\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e. 
PC1 and PC2 typically describe global motions, such as hinge-bending, twisting, and breathing modes (Supplementary Material), and as user-independent intrinsic CVs, they facilitate the identification of main conformers (e.g., open/closed, inward/outward) and their interconnecting pathways. Using PC projections as a reference, we identified 124 relevant end states and simulated a total of 191 transition pathways with eBDIMS2 (Supplementary Tables), identifying intermediate conformations in \u0026gt;\u0026thinsp;30% of PC spaces.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eeBDIMS2 outperforms existing algorithms, achieving a quasi-linear size-time dependence\u003c/h3\u003e\n\u003cp\u003eFor a few full-length proteins, we compared eBDIMS2 to other existing path-sampling algorithms (see Methods). The obtained paths were evaluated in terms of computational speed and smoothness/stability of the sampling in PC projections (Supplementary Material). While for small- to medium-size proteins (\u0026lt;\u0026thinsp;1k residues) eBDIMS2 performs similarly to other methods (Supplementary Material), it clearly outperforms all of them for larger systems (\u0026gt;\u0026thinsp;300 kDa). For example, the ~\u0026thinsp;15\u0026Aring; transition of GroEL 7-mer (~\u0026thinsp;400 kDa) can be simulated by eBDIMS2 in ~\u0026thinsp;1 hour, reaching an astonishing convergence of ~\u0026thinsp;0.6 \u0026Aring; from the target state, while none of the other tested path-sampling methods is able to simulate it in a reasonable time (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ed). To achieve the same convergence, our previous algorithm\u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e takes\u0026thinsp;~\u0026thinsp;7 hours, highlighting a dramatic\u0026thinsp;~\u0026thinsp;6-fold performance increase. 
Moreover, other methods struggle to sample the conformational space, undergoing abrupt changes in sampling direction (seen as zig-zag projections in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ed), while eBDIMS and eBDIMS2 are always able to trace smoother and more stable pathways (Supplementary Material). Apart from speed and sampling stability, eBDIMS2 is also particularly versatile, as it can generate paths between conformers with different numbers of residues, thus facilitating, e.g., reconstruction of alternate conformers from full-length ones (see examples in Supplementary Material).\u003c/p\u003e \u003cp\u003eOverall, eBDIMS2 computing times for large systems have a median value of ~\u0026thinsp;2.5 hours (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb) and range from a minimum of ~\u0026thinsp;19 minutes for the DNA-dependent protein kinase catalytic subunit (DNA-PKcs, ~3k residues, ~\u0026thinsp;6\u0026Aring; RMSD) to a maximum of ~\u0026thinsp;49 hours for the gigantic FAS 12-mer (~\u0026thinsp;21k residues, ~\u0026thinsp;5\u0026Aring;; transition details in Supplementary Tables). The simulated transitions always reach the target state with high convergence, below ~\u0026thinsp;1\u0026Aring; in most cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb, orange bars). As expected, we observe a clear correlation between computing times and system sizes, with a Pearson Correlation Coefficient (PCC) of ~\u0026thinsp;0.9, showing a quasi-linear size-time dependency (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec, left panel). 
As a matter of fact, if we exclude the three largest systems in our dataset (the two ryanodine receptors and FAS 12-mer, \u0026gt;\u0026thinsp;1.5 MDa), the linear size-time relationship persists, with a PCC of ~\u0026thinsp;0.7 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec, right). This confirms the favorable scaling of eBDIMS2 compared with other path-sampling methods, which typically follow an \u003cem\u003eN\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e dependence on size. We also expected computing times to depend not only on the protein size but also on the extent and complexity of the transition, as can be seen from the outliers in the size-time relationship (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec, right). Considering two similarly sized systems, we expected that the larger the extent of a transition, i.e., the larger the RMSD between the end states, the longer the computing time. This is confirmed by a positive PCC (~\u0026thinsp;0.5) between size-normalized computing times and RMSD values (Supplementary Material). On the other hand, size-normalized computing times display a slightly negative correlation with collectivity degrees (~-0.2). This suggests that eBDIMS2 is faster for global-collective conformational changes, likely reflecting the tendency of ENMs to model large rigid-body motions better than localized ones\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. 
As an example of two similarly sized systems but with different conformational changes, polyprotein P1234 and Nf1 (both ~\u0026thinsp;500 kDa, Supplementary Tables) undergo completely distinct motions, thus leading to different outcomes in terms of computing effort (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec, right). P1234 undergoes a low-complexity uniform expansion-compression of all protomers (breathing mode), relatively medium-scale (~\u0026thinsp;4\u0026Aring; RMSD) and highly collective (\u003cem\u003eκ\u003c/em\u003e\u0026thinsp;~\u0026thinsp;0.9), which can be simulated by eBDIMS2 in just\u0026thinsp;~\u0026thinsp;30 mins. On the other hand, Nf1 experiences dramatic conformational changes (\u0026gt;\u0026thinsp;23 \u0026Aring; RMSD), but rather localized compared to the overall Nf1 structure (\u003cem\u003eκ\u003c/em\u003e\u0026thinsp;~\u0026thinsp;0.2), where the GTPase-activating protein-related domains (GRDs) undergo large-scale roto-translations to facilitate the interaction with its partner Ras\u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e,\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e. 
As a result, this much more challenging transition can require up to several hours to compute (Supplementary Tables and Supplementary movie 1).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eeBDIMS2 generates realistic pathways for challenging transitions unfeasible for other methods\u003c/h3\u003e\n\u003cp\u003eAs we have seen, complex transitions not only require longer simulation times (see the Nf1 and ATP synthase outliers in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec), but they also tend to strain simulation algorithms, especially at intermediate points, generating stereochemical distortions\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. To assess the overall stereochemical quality and ability to sample accurate transitions, we first computed distances between consecutive C\u003csup\u003eα\u003c/sup\u003e atoms for all 191 eBDIMS2 mid-point intermediates and compared them to the 124 experimental end-states and all 872 PDB models in our ensembles. In all cases, distance distributions are centered at the known value of ~\u0026thinsp;3.8\u0026Aring; (Supplementary Material), with eBDIMS2 frames displaying a slightly larger standard deviation due to CG relaxation of the backbone. Still, outliers in the eBDIMS2 distribution coincide with those already present in experimental models (Supplementary Material).\u003c/p\u003e \u003cp\u003eThen, we took a closer look at three large systems (~\u0026thinsp;500 kDa) undergoing extremely complex and large-scale conformational changes, two of which stand out as outliers in the size-time relationships in our benchmark (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec), i.e., \u003cem\u003eM. 
smegmatis\u003c/em\u003e ATP synthase, Nf1 isoform 2, and α-macroglobulin (A2M) (Supplementary movies 1\u0026ndash;3), and we also ran their transitions with the MinActionPath2 webserver (see Methods)\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e. While MinActionPath2 was able to compute these transitions remarkably fast (\u0026lt;\u0026thinsp;30 minutes), the intermediate conformations exhibit extreme distortions of the backbone stereochemistry (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea). In experimental models and eBDIMS2 intermediates, distances between consecutive C\u003csup\u003eα\u003c/sup\u003e atoms only show sub-\u0026Aring; deviations from the theoretical value of 3.8\u0026Aring;. Conversely, values as low as ~\u0026thinsp;0.3 \u0026Aring; and as high as ~\u0026thinsp;34 \u0026Aring; are observed in MinActionPath2 transition points, making these intermediates completely unphysical (details in the Supplementary Material).\u003c/p\u003e \u003cp\u003eThe rotary motions of ATP synthases are a clear example of exceptional transition complexity. All ATP synthases in our benchmark dataset undergo similar conformational changes, where the rigid rotation of the F\u003csub\u003e0\u003c/sub\u003e rotor is coupled with the opening-closing motions of \u003cem\u003eα-\u003c/em\u003e and \u003cem\u003eβ\u003c/em\u003e-subunits during ATP synthesis. By applying rigid constraints to the chains of the F\u003csub\u003e0\u003c/sub\u003e rotor (Supplementary Material), eBDIMS2 is able to guarantee the rotor\u0026rsquo;s rigidity and generate realistic intermediates along the whole synthesis cycle (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). 
The eBDIMS2 transition cycle (Supplementary movie 3) allows us to recapitulate the experimentally known, alternated and coordinated opening/closing motions of the three catalytic \u003cem\u003eβ\u003c/em\u003e-subunits\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e and in particular how these are coupled to the rotation of the transmembrane domain (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb-c), which was also observed with ns-MD simulations\u003csup\u003e\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWe also tested the suitability of the eBDIMS2-generated intermediates as seeds for atomistic MD, after all-atom reconstruction via \u003cem\u003ecg2all\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. For instance, we computed MolProbity\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e quality scores for GroEL 7-mer experimental models as well as for the eBDIMS2 mid-point intermediate along the opening transition (Supplementary Tables). The reconstructed intermediate is found to exhibit good rotameric states, but sub-optimal Ramachandran \u003cem\u003eϕ\u003c/em\u003e-\u003cem\u003eψ\u003c/em\u003e values. 
However, these are generally fixed after simple energy minimizations or short (1 ns) MD runs (Supplementary Material), thus demonstrating that eBDIMS2 conformers yield stereochemically realistic models suitable for further MD simulations and atomistic analysis.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eeBDIMS2 pathways overlap with experimental motions and with enhanced and\u003c/b\u003e \u003cb\u003e\u0026micro;s\u003c/b\u003e\u003cb\u003e-long MD simulations with minimal computational cost\u003c/b\u003e\u003c/p\u003e \u003cp\u003eTo further evaluate the biological significance of eBDIMS2 pathways, we assessed whether they spontaneously approach experimental intermediates and MD sampling, as shown previously for smaller systems\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. Identification of potential intermediate states is facilitated by projections of structural ensembles on the low-dimensionality spaces defined by the PCs describing the main ensemble motions\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. We selected six systems for deeper study, where intermediate identification is supported by PC projections as well as the literature (Supplementary Material): DNA-PKcs, the two spike glycoproteins from \u003cem\u003eSARS-CoV\u003c/em\u003e and \u003cem\u003eSARS-CoV-2\u003c/em\u003e, ATP-citrate synthase (ACLY), \u003cem\u003eH. sapiens\u003c/em\u003e T-complex chaperonin 16-mer (TRiC), and inositol 1,4,5-trisphosphate receptor type 3 (ITPR3). 
Although no information on intermediates is fed to the algorithm, eBDIMS2 is always able to approach the existing experimental intermediates with RMSDs as low as ~\u0026thinsp;3\u0026ndash;4\u0026Aring; (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea and \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea, right panels; Supplementary Material), even for these large cryo-EM proteins.\u003c/p\u003e \u003cp\u003eWe also performed biased simulations with targeted MD (TMD)\u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e,\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e for two of the large systems in our dataset, i.e., DNA-PKcs and ACLY, as well as some from our previous benchmark\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. Good agreement is generally observed between eBDIMS2 and TMD pathways, and it is especially close in cases of marked pathway asymmetries\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, like the opening transition of RNAseIII (see Supplementary Material). An especially interesting case is that of DNA-PKcs (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). This large monomeric protein is a fundamental component of the DNA-PK complex, which is central to the process of non-homologous end joining (NHEJ) of DNA\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. Considering an ensemble of 43 experimental models (Supplementary Tables), PC1 is found to cover\u0026thinsp;~\u0026thinsp;53% of the ensemble variance and corresponds to a vertical motion of the N-HEAT domain, which mediates DNA binding, coupled with a horizontal rotation of the FAT and kinase domains (FAT-KINs). 
On the other hand, PC2 explains\u0026thinsp;~\u0026thinsp;21% of the variance and involves a lateral expansion/contraction of the N- and M-HEAT domains (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea). Other than a small cluster of X-ray-based conformations, PC projections reveal four main functional clusters\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e,\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e: a cluster of apo-like inactive conformations, where the N-HEAT domain is in the downward position and FAT-KINs are in the inactive inward conformation; a second cluster of intermediate DNA-bound states, where the FAT-KINs are still in the inactive conformation, but the N-HEAT region has moved upwards into the DNA-binding groove to accommodate DNA binding; a third cluster of active conformations, where the N-HEAT domain remains in the upward DNA-bound position and the FAT-KIN head has risen into the active conformation\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e, and a fourth cluster of phosphorylated conformers\u003csup\u003e\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e. PC1 correlates with the sequential N-HEAT and FAT-KIN motions required during activation-deactivation of the protein, while PC2 captures the lateral expansions associated with DNA-PKcs phosphorylation. Based on the cryo-EM data, we infer that DNA-PKcs could switch from the inactive conformation (7k19) to the active state (7k0y), and vice versa, directly or via a two-step process visiting the intermediate conformers (7k1n)\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e,\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e. 
We simulated several transitions between these end states with eBDIMS2 (Supplementary Tables) and verified that the experimental intermediate is approached very closely even when we simulate a one-step pathway between the inactive and active states (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea, right panel). Moreover, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb shows that eBDIMS2 pathways agree with atomistic TMD simulations, as both sample the same area of the conformational space. For the activation, TMD implies that both one- and two-step mechanisms are possible (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb, upper panel; see Supplementary Material), whereas for the inactivation mechanism, all simulations suggest a one-step transition without visiting the intermediate conformation (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb, lower panel). Although both methods describe similar conformational trajectories, eBDIMS2 has the advantage over TMD that our CG algorithm does not require lengthy preparation of the molecular system and consumes significantly fewer computational resources (~\u0026thinsp;30 min on a desktop computer with 16 \u003cem\u003eOpenMP\u003c/em\u003e threads for eBDIMS2 vs. ~11 hours on a high-performance computing cluster with 128 parallel cores for TMD; see Supplementary Material).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe also carried out unbiased MD from end-state and eBDIMS2-generated conformations to explore whether they spontaneously sample the transition (Supplementary Tables). While some end-state conformers are stuck in low-energy minima of the MD simulations, others lead to a broader sampling of the conformational space, in good agreement with experimental PCs and eBDIMS2 trajectories (see Supplementary Material). 
For example, in all simulations from the apo-open conformation of ACLY, this large protein exhibits collective motions along PC1 (~\u0026thinsp;0.8 overlap; Supplementary Tables), with large-scale twisting of the four acetyl-CoA synthetase homology domains, in agreement with the eBDIMS2 paths from an inhibited conformation to a partially open state (Supplementary Material). On the other hand, MD simulations from the holo conformation in the absence of ligands show opening motions along PC2, highlighting a spontaneous tendency to go back to the apo-open state. These trajectories are again consistent with the eBDIMS2 holo-apo pathway and capture experimental intermediates that are also closely approached by the eBDIMS2 path (see Supplementary Material).\u003c/p\u003e \u003cp\u003eLastly, we compared eBDIMS2 to MD trajectories of the \u003cem\u003eSARS-CoV-2\u003c/em\u003e spike glycoprotein (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea), publicly available from the Amaro lab\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e,\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e. In this case, unbiased simulations\u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e provide only limited sampling of the spike receptor-binding domain (RBD) motions compared with eBDIMS2 transitions between the experimental end states (see Supplementary Material). 
Yet, \u003cem\u003e\u0026micro;s\u003c/em\u003e-long simulations of a double-mutant (N165A-N234A) spike, which was experimentally shown to reduce binding to angiotensin-converting enzyme 2 (ACE2) receptor as a result of the RBD conformational shift toward the \u0026ldquo;down\u0026rdquo; state\u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e, were found to provide a FEL consistent with the direction of RBD opening/closing pathways predicted by eBDIMS2 (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eb). We also compared a Weighted Ensemble (WE) enhanced sampling trajectory, where one RBD was observed to undergo complete opening\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e, to the eBDIMS2 pathway from the closed spike (6xr8) to the one-RBD-up conformation (7a94) (Supplementary movie 4). From the comparison, we found that the two trajectories sample a similar area of the conformational space, showing a good similarity between the WE and eBDIMS2 intermediates (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec; more details in the Supplementary Material). Both methods are thus able to provide a realistic description of RBD opening, yet eBDIMS2 requires much lower computing resources in comparison (20 CPU-h for eBDIMS2 vs. 17,000 GPU-h for WES; more details in Supplementary Material).\u003c/p\u003e\n"},{"header":"DISCUSSION","content":"\u003cp\u003eHere, we introduce eBDIMS2, an enhanced version of our previous path-sampling ENM-driven Brownian Dynamics (BD) algorithm\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, which achieves over 6-fold speed improvement for large protein systems. This advance enables realistic CG-simulations of transition pathways in sub-mesoscopic assemblies that were previously infeasible. 
The method is validated through an expanded benchmark of large-scale protein transitions and comparisons with multiple MD techniques, demonstrating its superior speed, stability, and versatility. To test eBDIMS2, we extended our previous benchmark to include 47 large, conformationally diverse proteins ranging from ~\u0026thinsp;300 kDa to 2 MDa (mostly from cryo-EM). These proteins undergo transitions from simple\u0026thinsp;~\u0026thinsp;4\u0026Aring; breathing motions to highly complex\u0026thinsp;~\u0026thinsp;20\u0026ndash;30\u0026Aring; rotations/translations, such as Nf1 activation and ATP synthase rotary motion. Many of the proteins in our new benchmark dataset have recently been found to play key roles in diseases like cancer\u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e,\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e,\u003cspan additionalcitationids=\"CR61\" citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e\u003c/sup\u003e, tuberculosis\u003csup\u003e\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e or skeletal muscle disorders\u003csup\u003e\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e. Yet, they have been largely overlooked in simulations due to their extreme size. 
Using projections on the PC1 and PC2 essential modes (\u0026gt;\u0026thinsp;70% variance) from experimental ensembles as collective variables (CVs), we identified 124 relevant end states and simulated 191 transition pathways, capturing intermediate conformations in ~\u0026thinsp;30% of cases, consistent with our previous benchmark\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eFor small- to medium-sized proteins (\u0026lt;\u0026thinsp;1k residues), eBDIMS2 performs comparably to existing path-sampling methods but with smoother trajectory sampling in PC space. However, its advantage becomes clear for systems larger than ~\u0026thinsp;300 kDa, particularly those with high-complexity transitions. For example, eBDIMS2 simulates the 400 kDa GroEL chaperone (15\u0026Aring; transition) in ~\u0026thinsp;1 hour, whereas other methods fail to complete the task in reasonable timeframes. Across large systems, eBDIMS2 has a median runtime of ~\u0026thinsp;2.5 hours, predominantly reaching sub-1\u0026Aring; convergence. Its computing times scale quasi-linearly with system size (PCC\u0026thinsp;~\u0026thinsp;0.9 overall, ~\u0026thinsp;0.7 excluding the largest systems\u0026thinsp;\u0026gt;\u0026thinsp;1 MDa), contrasting with the near-quadratic scaling of other path-sampling methods. In MD simulations, \u003cem\u003eO(N\u0026sup2;)\u003c/em\u003e complexity is reduced to \u003cem\u003eO(N log N)\u003c/em\u003e with particle-mesh Ewald electrostatics, yet long-timescale transitions remain computationally prohibitive. 
For instance, unbiased MD struggles to capture the \u003cem\u003eSARS-CoV-2\u003c/em\u003e spike glycoprotein opening\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e,\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e, requiring WES enhanced sampling and 7.5 \u0026micro;s of simulation (17,000 GPU-hours over a week)\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. In contrast, eBDIMS2 can achieve a similar motion description in just\u0026thinsp;~\u0026thinsp;1.2 hours on a standard desktop using 20 CPU-h (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). Likewise, for the activation transition of DNA-PKcs, eBDIMS2 reduced computing times from ~\u0026thinsp;11 hours on a 128-core HPC cluster to just\u0026thinsp;~\u0026thinsp;30 minutes on a desktop, while maintaining agreement with Targeted MD (TMD). Notably, TMD further corroborates the asymmetry of forward and reverse pathways (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), as we previously demonstrated for smaller proteins\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. Despite the inevitable loss of atomistic detail due to coarse-graining, eBDIMS2 can generate high-quality intermediates suitable for atomistic reconstruction, thus offering a quick route to explore and populate the conformational space for further MD simulations. Even for highly complex transitions where other methods fail, eBDIMS2 produces stereochemically correct pathways within hours. For example, the extreme complexity of the rotary motions of ATP synthase strains MinActionPath2 (the only other algorithm capable of dealing with systems of this size) to the point of generating structures with completely unphysical bond lengths near the transition point. 
In contrast, the eBDIMS2 transition cycle (Supplementary movie 3 and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) accurately recapitulates the known, alternated and coordinated opening/closing motions of the three catalytic \u003cem\u003eβ\u003c/em\u003e-subunits\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e and their allosteric coupling to transmembrane rotation as seen in MD\u003csup\u003e\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWith minimal preparation\u0026mdash;requiring only two sets of coordinates\u0026mdash;eBDIMS2 is particularly valuable for the cryo-EM community, which increasingly generates high-resolution structural data\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e but lacks efficient methods for further mechanistic analysis. By leveraging just a few experimental end-state conformers, researchers can use eBDIMS2 on a standard computer to \"bridge the gaps\" in conformational space at the CG-level, then refine intermediates through atomistic reconstruction and further MD\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e,\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
This capability can make eBDIMS2 a powerful tool for applications ranging from drug discovery and design of novel binding partners and small molecules\u003csup\u003e\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e to elucidating their mechanisms of action\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e\u003c/sup\u003e. To our knowledge, eBDIMS2 is the only algorithm currently capable of generating accurate and realistic pathways for protein transitions of this large scale and complexity, with minimal preparation and computational cost, and thus can accelerate the dynamical and biological interpretation of rapidly growing large-scale structural data.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eProtein dataset\u003c/h2\u003e \u003cp\u003eIn this work, we built a comprehensive dataset of protein ensembles (Supplementary Tables), with a wide variety of functions, sizes, and shapes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea), including three smaller systems from our previous benchmark\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, i.e., ribose-binding protein (RBP, 30 kDa), RNA endonuclease III (RNAseIII, 48 kDa), sarcoplasmic/endoplasmic reticulum Ca\u003csup\u003e2+\u003c/sup\u003e ATPase1 (SERCA, 109 kDa). 
These medium-size proteins and their structural ensembles of X-ray conformations were used as relevant test cases to assess the ability of eBDIMS2 to replicate our previous results\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, test the performance of the algorithm for different parameters, compare with other path-sampling algorithms, and evaluate the capability of Molecular Dynamics (MD) simulations to explore their conformational space. To retrieve structural data specifically for large proteins (\u0026gt;\u0026thinsp;300 kDa) that undergo large-scale conformational changes (RMSD\u0026thinsp;\u0026gt;\u0026thinsp;4\u0026Aring;), we performed a far-reaching bioinformatic search from the PDB\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e and UniProt\u003csup\u003e\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u003c/sup\u003e databases, eventually building structural ensembles for a total of 47 large proteins (Supplementary Tables). Detailed information about the screening protocol and dataset generation is provided in the Supplementary Material.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eStructural ensembles and Principal Component Analysis (PCA)\u003c/h3\u003e\n\u003cp\u003eFor all proteins in our dataset, we generated structural ensembles based on experimental conformations available in the PDB. We retrieved all PDB models based on UniProt ID codes, and we considered all oligomeric states relevant for the protein biological function (see Supplementary Material). Structures with low resolution (\u0026gt;\u0026thinsp;5\u0026ndash;6\u0026Aring;) and/or large missing domains were excluded from further analysis. The final list of all PDB models used for the ensemble generation is reported in the Supplementary Material. 
We made sure that all the structures belonging to the same ensemble are consistent for further quantitative analyses. We checked that the different chains in multi-chain proteins correspond to the same protomers, and we removed regions that are missing in at least one conformation to guarantee that all structures have the same number of residues. For each ensemble, a reference structure was selected, generally corresponding to the resting-apo state of the protein. Global structural alignment with respect to the reference structure was applied to all conformations, and PCA was performed on the aligned ensemble.\u003c/p\u003e \u003cp\u003ePCA is a multivariate statistical technique applied to reduce the number of dimensions to describe protein structures and dynamics\u003csup\u003e\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e\u003c/sup\u003e. PCA has been widely used to describe the essential motions of proteins from MD simulations\u003csup\u003e\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e\u003c/sup\u003e and experimental ensembles\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e. The input of PCA is an \u003cem\u003en\u003c/em\u003e \u0026times; \u003cem\u003e3N\u003c/em\u003e coordinate matrix, \u003cb\u003eX\u003c/b\u003e, \u003cem\u003en\u003c/em\u003e being the number of structures in the ensemble and \u003cem\u003eN\u003c/em\u003e the number of residues, usually considering only C\u003csup\u003eα\u003c/sup\u003e atoms. 
From \u003cb\u003eX\u003c/b\u003e, the elements of the symmetric covariance matrix, \u003cb\u003eC\u003c/b\u003e, are calculated as:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{c}_{ij}=\u0026lang;\\left({x}_{i}-\u0026lang;{x}_{i}\u0026rang;\\right)\\left({x}_{j}-\u0026lang;{x}_{j}\u0026rang;\\right)\u0026rang;$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere the brackets \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\u0026lang;\\dots\\:\u0026rang;\\)\u003c/span\u003e\u003c/span\u003e indicate the average over the \u003cem\u003en\u003c/em\u003e structures. Eigenvalue-eigenvector decomposition is then used to diagonalize the covariance matrix as:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:\\varvec{C}=\\varvec{U}\\varvec{\\varDelta\\:}{\\varvec{U}}^{T}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere the diagonal matrix \u003cb\u003eΔ\u003c/b\u003e contains the eigenvalues of \u003cb\u003eC\u003c/b\u003e, while the matrix \u003cb\u003eU\u003c/b\u003e contains its eigenvectors, representing the Principal Components (PCs). Eigenvalues are sorted in descending order and are directly proportional to the variance captured from each corresponding PC. 
After calculating the PCs, each structure is projected onto the low-dimensional PC space\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:{p}_{l,m}=\\left({\\varvec{X}}_{\\varvec{l}}-{\\varvec{X}}_{\\varvec{r}\\varvec{e}\\varvec{f}}\\right)\\bullet\\:\\frac{{\\varvec{P}\\varvec{C}}_{\\varvec{m}}}{\\left|{\\varvec{P}\\varvec{C}}_{\\varvec{m}}\\right|}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003ep\u003c/em\u003e\u003csub\u003e\u003cem\u003el,m\u003c/em\u003e\u003c/sub\u003e is the projection of conformation \u003cem\u003el\u003c/em\u003e along the \u003cem\u003em\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e PC, \u003cb\u003eX\u003c/b\u003e\u003csub\u003e\u003cb\u003el\u003c/b\u003e\u003c/sub\u003e and \u003cb\u003eX\u003c/b\u003e\u003csub\u003e\u003cb\u003eref\u003c/b\u003e\u003c/sub\u003e are the vectors containing the 3D coordinates of the \u003cem\u003el\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e conformation and the reference structure, respectively, and \u003cb\u003ePC\u003c/b\u003e\u003csub\u003e\u003cb\u003em\u003c/b\u003e\u003c/sub\u003e is the vector of the apparent motion captured by the \u003cem\u003em\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e PC. As shown in our previous work\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e, PCA of structurally rich ensembles allows the identification of clusters of significant conformations, as well as conformational-functional intermediates. 
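The PCA steps in Eqs. (1) to (3) can be illustrated with a minimal, dependency-free Python sketch. It is not the analysis code used in this work: the function names, the power-iteration solver (which returns only the leading PC rather than a full eigendecomposition), and the toy two-coordinate "ensemble" are assumptions made purely for illustration.

```python
import math

def covariance(X):
    """Covariance matrix of Eq. (1): averages are taken over the n rows (structures)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[i] for row in X) / n for i in range(d)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / n
             for j in range(d)] for i in range(d)]

def leading_pc(C, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix by power iteration.

    Illustrative stand-in for the full diagonalization of Eq. (2). The start
    vector is an arbitrary choice and must not be orthogonal to the leading PC.
    """
    d = len(C)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval

def project(x, x_ref, pc):
    """Projection of Eq. (3): (x - x_ref) dotted with the normalized PC."""
    norm = math.sqrt(sum(c * c for c in pc))
    return sum((a - b) * c for a, b, c in zip(x, x_ref, pc)) / norm

# Toy "ensemble": three structures, two coordinates each, lying on a line.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
C = covariance(X)
pc1, variance1 = leading_pc(C)
projections = [project(x, X[0], pc1) for x in X]  # X[0] plays the role of the reference
```

For real ensembles the coordinate vectors have 3N entries and a library eigensolver would normally replace the power iteration.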
In this work, we used PC projections to select relevant end-state conformers of large proteins and simulate their transition pathways with eBDIMS2.\u003c/p\u003e\n\u003ch3\u003eeBDIMS2 for modeling conformational transitions\u003c/h3\u003e\n\u003cp\u003eeBDIMS2 is an optimized version of our previous elastic network-driven Brownian Dynamics IMportance Sampling (eBDIMS) method\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e, which is now able to deal with very large proteins and complex conformational transitions in remarkably low computing times. The goal of eBDIMS is to model conformational changes from a starting conformation, \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003e0\u003c/em\u003e\u003c/sub\u003e, to a target state, \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003et\u003c/em\u003e\u003c/sub\u003e. It uses a coarse-grained (CG) representation of the protein, considering one bead per amino acid (C\u003csup\u003eα\u003c/sup\u003e atom), and implements the MD-derived essential-dynamics Elastic Network Model (edENM) force-field\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, where the protein is treated as a network of mass particles connected by elastic springs. A Brownian Dynamics (BD) framework\u003csup\u003e\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e\u003c/sup\u003e is used to simulate the protein dynamics and trace physically acceptable trajectories from \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003e0\u003c/em\u003e\u003c/sub\u003e to \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003et\u003c/em\u003e\u003c/sub\u003e. 
The equations of motion follow the Langevin equation\u003csup\u003e26\u003c/sup\u003e:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:{m}_{i}{\\ddot{r}}_{i}={F}_{i}-\\gamma\\:{\\dot{r}}_{i}+{\\xi\\:}_{i}\\left(t\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003em\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e and \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e are the mass and the position of the \u003cem\u003ei\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e particle, respectively, \u003cem\u003eF\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e is the force acting on the \u003cem\u003ei\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e particle due to the particle-particle interactions from the edENM, \u003cem\u003eγ\u003c/em\u003e is the friction coefficient related to dispersion forces arising from the interactions with the surrounding fluid\u003csup\u003e\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e\u003c/sup\u003e, and \u003cem\u003eξ\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e is a time-dependent white-noise term that accounts for the thermal motion of the solvent\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e\u003c/sup\u003e. To bias the trajectory toward the target, eBDIMS uses Dynamics IMportance Sampling (DIMS). Every \u003cem\u003ek\u003c/em\u003e BD steps, a progress variable \u003cem\u003eΓ\u003c/em\u003e is computed and used to drive the transition. 
\u003cem\u003eΓ\u003c/em\u003e is defined as the sum of squared differences between the pairwise distances of the simulated (\u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e) and target (\u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003et\u003c/em\u003e\u003c/sub\u003e) conformations\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e:\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:{\\varGamma\\:}_{s}=\\sum\\:_{i=1}^{N-1}\\sum\\:_{j=i+1}^{N}{\\left({{d}_{ij}}^{s}-{{d}_{ij}}^{t}\\right)}^{2}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003ed\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e\u003csup\u003e\u003cem\u003es\u003c/em\u003e\u003c/sup\u003e is the distance between the \u003cem\u003ei\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e and \u003cem\u003ej\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e particles in the simulated structure \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e at step \u003cem\u003es\u003c/em\u003e, \u003cem\u003ed\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e\u003csup\u003e\u003cem\u003et\u003c/em\u003e\u003c/sup\u003e is their distance in the target conformation \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003et\u003c/em\u003e\u003c/sub\u003e, and \u003cem\u003eN\u003c/em\u003e is the total number of particles in the system. 
\u003cem\u003eΓ\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e is compared every \u003cem\u003ek\u003c/em\u003e steps to the previous value, \u003cem\u003eΓ\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u0026minus;1\u003c/em\u003e\u003c/sub\u003e, and the current conformation \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e is accepted if \u003cem\u003eΓ\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e\u0026thinsp;\u0026lt;\u0026thinsp;\u003cem\u003eΓ\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u0026minus;1\u003c/em\u003e\u003c/sub\u003e, or rejected otherwise\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. The iterations proceed until convergence to \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003et\u003c/em\u003e\u003c/sub\u003e, e.g., until the sampled conformations reach a Root Mean Square Deviation (RMSD) from the target in the range of thermal oscillations (~\u0026thinsp;1\u0026Aring;) or when \u003cem\u003eΓ\u003c/em\u003e\u003csub\u003e\u003cem\u003es\u003c/em\u003e\u003c/sub\u003e is sufficiently close to zero.\u003c/p\u003e \u003cp\u003eOur original version of eBDIMS is currently available as a public webserver and as a stand-alone \u003cem\u003eC\u003c/em\u003e\u003csup\u003e\u003cem\u003e++\u003c/em\u003e\u003c/sup\u003e code\u003csup\u003e45\u003c/sup\u003e and is efficient for proteins up to ~\u0026thinsp;2k residues and ~\u0026thinsp;250 kDa. Larger systems would require an enormous time to drive the transition up to the target conformation. The new eBDIMS2 algorithm, implemented in \u003cem\u003eFortran\u003c/em\u003e, overcomes this limitation by modifying the strategy to calculate interaction forces during the BD simulation. 
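Before the force calculation is discussed, the Γ-based acceptance rule described above (Eq. 5 and the comparison of Γ at step s with its previous value) can be illustrated with a minimal Python sketch. It is not the eBDIMS or eBDIMS2 implementation: the block of k Brownian Dynamics steps is replaced here by a random Gaussian displacement, and the function names, step size, trial count, and toy coordinates are all assumptions made for illustration.

```python
import math
import random

def gamma(R, R_target):
    """Progress variable of Eq. (5): sum over pairs i<j of (d_ij^s - d_ij^t)^2."""
    n = len(R)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d_s = math.dist(R[i], R[j])
            d_t = math.dist(R_target[i], R_target[j])
            total += (d_s - d_t) ** 2
    return total

def biased_walk(R0, R_target, n_trials=2000, step=0.05, seed=0):
    """Accept a trial conformation only if it lowers gamma, reject it otherwise.

    The Gaussian trial move is a hypothetical stand-in for k BD steps.
    """
    rng = random.Random(seed)
    R = [list(p) for p in R0]
    g_prev = gamma(R, R_target)
    history = [g_prev]
    for _ in range(n_trials):
        trial = [[c + rng.gauss(0.0, step) for c in p] for p in R]
        g_new = gamma(trial, R_target)
        if g_new < g_prev:          # accepted: the trial becomes the current state
            R, g_prev = trial, g_new
        history.append(g_prev)      # rejected trials leave gamma unchanged
    return R, history

# Toy three-bead system whose target is a uniformly expanded copy of the start.
R0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
Rt = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
final, history = biased_walk(R0, Rt)
```

Because Γ depends only on internal distances, it needs no structural superposition, and under this rule the recorded Γ values can never increase.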
In eBDIMS, the \u003cem\u003eF\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e term reported in Eq.\u0026nbsp;(\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) is calculated by iterating over every possible interaction, i.e.:\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:{F}_{i}=\\sum\\:_{i=1}^{N-1}\\sum\\:_{j=i+1}^{N}{F}_{ij}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003eF\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e is the force between two particles \u003cem\u003ei\u003c/em\u003e and \u003cem\u003ej\u003c/em\u003e based on the edENM force-field. This gives rise to a quadratic increase in the number of interactions with the system size, i.e., (\u003cem\u003eN\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e \u003cem\u003e\u0026ndash; N\u003c/em\u003e)/2 for a set of \u003cem\u003eN\u003c/em\u003e particles. For a small protein like RBP (271 residues), this would lead to considering\u0026thinsp;~\u0026thinsp;36.5k particle-particle interactions, while for a huge system like ryanodine receptor 1 (RyR1, ~\u0026thinsp;17k residues) this leads to almost 140\u0026nbsp;million interactions to be computed at each step. Due to the strong power-law decay of edENM interactions between non-bonded particles (see Supplementary Material), eBDIMS2 implements a more efficient strategy based on an adaptive cutoff, similar to the procedures employed in MD simulations. 
At the beginning of the BD, we generate a list of proximal residues in the reference conformation \u003cem\u003eR\u003c/em\u003e\u003csub\u003e\u003cem\u003e0\u003c/em\u003e\u003c/sub\u003e based on the cutoff \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e. This list is used to evaluate all \u003cem\u003ei\u003c/em\u003e-\u003cem\u003ej\u003c/em\u003e interactions that are numerically meaningful and therefore to compute all non-zero \u003cem\u003eF\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e values. This strategy strongly reduces the number of iterations, decreasing the computational burden of the algorithm, particularly for larger systems. For example, considering a cutoff \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e of 8 \u0026Aring;, RBP requires\u0026thinsp;~\u0026thinsp;1.6k interactions in eBDIMS2 (instead of ~\u0026thinsp;36.5k), while only\u0026thinsp;~\u0026thinsp;80k interactions are required for RyR1 (instead of ~\u0026thinsp;140\u0026nbsp;million). After a certain number of BD steps, we update this force interaction list to account for the new positions of C\u003csup\u003eα\u003c/sup\u003e atoms in the new conformation. Then, the simulation proceeds until convergence. We tested the performance of eBDIMS2 for several values of the cutoff \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e (8, 10, 15, 20 \u0026Aring;) and the biasing frequency \u003cem\u003ek\u003c/em\u003e (1, 2, 5, 10), looking at the time required to simulate the transitions and at their projections in the PC space. 
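The difference between the all-pairs count and a cutoff-based interaction list can be illustrated with a short Python sketch. It is a simplified illustration rather than the eBDIMS2 Fortran code: the function names and the straight toy chain with 3.8 Å spacing are assumptions, and the list is built here by brute force instead of with a cell or grid search.

```python
import math

def n_pairs_all(n):
    """Number of unique i<j pairs for n particles: (n^2 - n) / 2."""
    return (n * n - n) // 2

def build_pair_list(coords, r_c):
    """List of pairs (i, j), i<j, whose distance is within the cutoff r_c.

    In a cutoff scheme, only these pairs would be visited when accumulating
    forces, and the list would be rebuilt periodically as the beads move.
    """
    pairs = []
    for i in range(len(coords) - 1):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= r_c:
                pairs.append((i, j))
    return pairs

# Toy system: ten beads on a straight line, 3.8 Angstrom apart (hypothetical).
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
near_pairs = build_pair_list(chain, r_c=8.0)

print(n_pairs_all(len(chain)), len(near_pairs))
print(n_pairs_all(271))  # all-pairs count for a 271-residue protein such as RBP
```

In a compact folded protein each bead has more neighbors than in this linear toy chain, but the cutoff list still grows roughly linearly with the number of residues, whereas the all-pairs count grows quadratically.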
While these parameters were not found to play a significant role on the PC projections of the transitions, a cutoff \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sub\u003e of 8 \u0026Aring; and a biasing frequency \u003cem\u003ek\u003c/em\u003e of 10 steps were found to be optimal to minimize computing time, especially for larger systems (see Supplementary Material).\u003c/p\u003e \u003cp\u003eAnother advantage of eBDIMS2 is that it can now compute pathways between structures with missing residues. Most of the path-sampling methods available in the literature\u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u003c/sup\u003e require the two end-state protein conformations not to have missing residues. However, almost all large systems from cryo-EM inevitably present several regions that are missing in the 3D model due to, e.g., difficulties in fitting density maps, low resolution, high local flexibilities, etc. For this reason, we have developed eBDIMS2 in such a way that the two protein end states can have gaps in the sequence, the overall connectivity being ensured by the ENM non-bonded interactions. An additional feature that we included in eBDIMS2 is also the possibility to model rigid blocks in the protein explicitly, and we removed the need to have exactly the same number of residues in the end-state conformers. Additional information about these extra features can be found in the Supplementary Material.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eApplication of eBDIMS2 to large proteins\u003c/h2\u003e \u003cp\u003eAfter identifying biologically relevant conformational clusters from the PC spaces, we applied eBDIMS2 to run transition pathways between all end-state conformers, both in the forward and reverse directions. 
All simulations were carried out on a Linux workstation with an Intel\u0026reg; Core i9-13900K processor, 64 GB of RAM, and using \u003cem\u003eOpenMP\u003c/em\u003e parallelization with 16 threads. For each transition, we computed RMSD values and collectivity degrees. The former quantifies the amplitude of the conformational change:\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$\\:RMSD=\\sqrt{\\frac{1}{N}\\sum\\:_{i=1}^{N}{\\left({{r}_{i}}^{t}-{{r}_{i}}^{0}\\right)}^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e\u003csup\u003e\u003cem\u003et\u003c/em\u003e\u003c/sup\u003e and \u003cem\u003er\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e\u003csup\u003e\u003cem\u003e0\u003c/em\u003e\u003c/sup\u003e represent the positions (after alignment) of the \u003cem\u003ei\u003c/em\u003e\u003csup\u003eth\u003c/sup\u003e C\u003csup\u003eα\u003c/sup\u003e atom in the target and reference conformation, respectively. 
The collectivity degree \u003cem\u003eκ\u003c/em\u003e provides an estimate of the global-local nature of the transition\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e:\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$$\\:\\kappa\\:=\\frac{1}{N}\\text{exp}\\left(-\\sum\\:_{i=1}^{N}\\frac{\\left|{{r}_{i}}^{t}-{{r}_{i}}^{0}\\right|}{\\sum\\:_{i=1}^{N}\\left|{{r}_{i}}^{t}-{{r}_{i}}^{0}\\right|}\\text{log}\\frac{\\left|{{r}_{i}}^{t}-{{r}_{i}}^{0}\\right|}{\\sum\\:_{i=1}^{N}\\left|{{r}_{i}}^{t}-{{r}_{i}}^{0}\\right|}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eand its value can range from a minimum of 1\u003cem\u003e/N\u003c/em\u003e, when only one atom is involved in the conformational change, to a maximum of 1, when all atoms participate uniformly in the transition. eBDIMS2 pathways were then projected on the corresponding PC spaces via Eq.\u0026nbsp;(\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), to inspect the relationship between generated intermediates and experimental conformations. To quantify the performance of the method, we recorded RMSD values from the target at the moment of convergence and the time required to reach convergence. We also computed RMSD values between eBDIMS2 transitions and on-path experimental intermediates.\u003c/p\u003e \u003cp\u003eTo assess the stereochemical quality of the eBDIMS2-generated conformers, we computed distances between pairs of consecutive C\u003csup\u003eα\u003c/sup\u003e atoms and compared the C\u003csup\u003eα\u003c/sup\u003e-C\u003csup\u003eα\u003c/sup\u003e distance distributions to those obtained from experimental structures. 
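For reference, the RMSD and collectivity definitions of Eqs. (7) and (8) can be written as a short Python sketch. It follows the formulas as stated above and assumes that the two coordinate sets have already been superimposed; the function names and the four-atom toy systems are assumptions for illustration only.

```python
import math

def displacements(R_t, R_0):
    """Per-atom displacement magnitudes |r_i^t - r_i^0| for pre-aligned structures."""
    return [math.dist(a, b) for a, b in zip(R_t, R_0)]

def rmsd(R_t, R_0):
    """Eq. (7): square root of the mean squared displacement."""
    d = displacements(R_t, R_0)
    return math.sqrt(sum(x * x for x in d) / len(d))

def collectivity(R_t, R_0):
    """Eq. (8): exp of the entropy of the normalized displacements, divided by N."""
    d = displacements(R_t, R_0)
    total = sum(d)
    if total == 0.0:
        raise ValueError("identical structures: collectivity is undefined")
    entropy = 0.0
    for x in d:
        if x > 0.0:                 # the limit of w*log(w) for w -> 0 is 0
            w = x / total
            entropy -= w * math.log(w)
    return math.exp(entropy) / len(d)

# Toy four-atom reference and two hypothetical targets.
R_ref = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
R_uniform = [(x + 1.0, y, z) for x, y, z in R_ref]   # every atom moves by 1
R_local = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 3.0, 0.0)]  # one atom moves

kappa_uniform = collectivity(R_uniform, R_ref)
kappa_local = collectivity(R_local, R_ref)
```

The two toy targets reproduce the limiting cases quoted in the text: a uniform displacement gives a collectivity of 1, while moving a single atom out of four gives 1/N = 0.25.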
After performing all-atom reconstruction with \u003cem\u003ecg2all\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e, we also made use of MolProbity\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e to generate Ramachandran plots and check the atomistic quality of our intermediate conformers.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eeBDIMS2 transition pathways and comparison with other path-sampling methods\u003c/h2\u003e \u003cp\u003eOver the past 20 years a plethora of methods have been developed to model conformational transitions in proteins\u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u003c/sup\u003e. Here we compared eBDIMS2 with our previous eBDIMS \u003cem\u003eC\u003c/em\u003e\u003csup\u003e\u003cem\u003e++\u003c/em\u003e\u003c/sup\u003e version\u003csup\u003e26\u003c/sup\u003e and eight additional algorithms whose executables are freely available: iMOD\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, GOdMD\u003csup\u003e\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e\u003c/sup\u003e, NGENI\u003csup\u003e\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e\u003c/sup\u003e, ICONGENI\u003csup\u003e\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e\u003c/sup\u003e, Climber\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, ENI\u003csup\u003e\u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e\u003c/sup\u003e, aANM\u003csup\u003e\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e\u003c/sup\u003e, and ANMPathway\u003csup\u003e\u003cspan citationid=\"CR50\" 
class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e. These methodologies mainly differ in: (i) the representation of the protein degrees of freedom (DOFs); (ii) the simulation framework to model the protein dynamics; (iii) the biasing strategy used to drive the transition; and (iv) the reversibility-irreversibility of the transition in the forward-backward direction. More information about these methods is provided in the Supplementary Material. As mentioned above, the majority of these algorithms cannot analyze transitions in proteins with missing residues. For this reason, we carried out a detailed comparison for four proteins of increasing size in our dataset that have full-length structural ensembles, i.e., RBP (271 residues), RNAseIII (432 residues), SERCA (993 residues), and GroEL in the 7-mer oligomerization state (3,626 residues). For these four systems, we simulated transition pathways between two relevant end-state conformations, and we compared computing times and pathway projections in the PC space. All calculations were performed on the same Linux workstation described above.\u003c/p\u003e \u003cp\u003eAt the time of this writing, MinActionPath2 was released\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e, which represents an improvement of the previous MinActionPath algorithm\u003csup\u003e\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e\u003c/sup\u003e, and can now be used to deal with large macromolecular assemblies. Both MinActionPath and MinActionPath2 are only available as webservers, which prevents a thorough comparison of computing times with eBDIMS2. 
However, we used the MinActionPath2 webserver to assess the quality of the transition points of some of the large proteins in our dataset that undergo complex and large-scale conformational changes and compared them with our eBDIMS2 intermediates.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eAtomistic MD simulations\u003c/h2\u003e \u003cp\u003eWe also performed atomistic MD simulations, both unbiased MD from end-state and intermediate conformations to assess the sampling of the conformational landscape, and targeted MD (TMD) to simulate transition pathways\u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e,\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e. For proteins with missing residues, e.g., DNA-dependent protein kinase catalytic subunit (DNA-PKcs) and ATP-citrate synthase (ACLY), unmodelled gaps were filled by using the SWISS-MODEL\u003csup\u003e\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e\u003c/sup\u003e webserver. For simulations starting from eBDIMS2-generated intermediates, \u003cem\u003ecg2all\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e was used to obtain atomistic models suitable for MD (see Supplementary Tables).\u003c/p\u003e \u003cp\u003eAll molecular systems were prepared with CHARMM-GUI\u003csup\u003e\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e\u003c/sup\u003e, and MD simulations were performed using Gromacs\u003csup\u003e\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e\u003c/sup\u003e version 2024.1. For TMD, we used Gromacs patched with Plumed\u003csup\u003e\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e\u003c/sup\u003e. 
The CHARMM36m force field was used to describe the biomolecular interactions\u003csup\u003e\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e\u003c/sup\u003e, and we added TIP3P water molecules as well as sodium (Na\u003csup\u003e+\u003c/sup\u003e) and chloride (Cl\u003csup\u003e\u0026minus;\u003c/sup\u003e) ions at 150 mM concentration, to maintain physiological salt concentration and mimic intracellular conditions. First, we carried out an energy minimization with the steepest descent algorithm for 5,000 steps. Then, the system underwent a 125-ps equilibration in order to maintain a temperature of 303.15 K, with the Nose\u0026ndash;Hoover thermostat\u003csup\u003e\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e,\u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e84\u003c/span\u003e\u003c/sup\u003e, and a pressure of 1.0 bar, using the Parrinello\u0026ndash;Rahman barostat\u003csup\u003e\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e\u003c/sup\u003e with isotropic pressure coupling. The LINCS algorithm was used to constrain bonds involving hydrogen atoms\u003csup\u003e\u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e\u003c/sup\u003e. The cutoffs for short-range van der Waals and electrostatic interactions were set to 12 \u0026Aring;. Long-range electrostatic interactions were described using the particle mesh Ewald (PME) approach\u003csup\u003e\u003cspan citationid=\"CR87\" class=\"CitationRef\"\u003e87\u003c/span\u003e,\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e88\u003c/span\u003e\u003c/sup\u003e with periodic boundary conditions.\u003c/p\u003e \u003cp\u003eFor unbiased MD simulations, we carried out production runs using a 2-fs time step and saving coordinate frames every 100 ps. 
To speed up the simulations for medium-size proteins, we made use of H-mass repartitioning with a longer 4-fs time step\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e. Each system was simulated for 200 ns, with three replicas starting from different random seeds. A total of 0.6 \u0026micro;s unbiased atomistic dynamics was thus generated for at least two distinct conformations (Supplementary Tables), which we used to build Free Energy Landscapes (FELs). To check how well MD trajectories align with experimental conformations, we computed overlap scores and Root Mean Square Inner Products (RMSIPs) between experimental PCs and Essential Dynamics (ED) eigenvectors (see Supplementary Material)\u003csup\u003e\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e\u003c/sup\u003e. For TMD, three production runs were also carried out to obtain transition pathways between the selected end-state conformers. For the medium-size systems, TMD runs were carried out for 1 ns, while for DNA-PKcs and ACLY we simulated for 2 ns (Supplementary Tables). In TMD, the RMSD between the two aligned end states was used as bias and applied every 10 steps with an elastic constant of 100 kcal/mol/\u0026Aring;\u003csup\u003e2\u003c/sup\u003e. TMD trajectories reached convergence in all analyzed systems, with an RMSD from the target of ~\u0026thinsp;1\u0026ndash;2 \u0026Aring;. Since our benchmark dataset included the much-studied \u003cem\u003eSARS-CoV-2\u003c/em\u003e spike glycoprotein, we also made use of MD trajectories publicly available from the Amaro\u0026rsquo;s lab\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e,\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e. 
We downloaded several simulations of the open, closed, and N165A-N234A double-mutant state of the spike\u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e, as well as a trajectory showing the opening of one receptor-binding domain (RBD) obtained through a Weighted Ensemble (WE) enhanced sampling approach\u003csup\u003e\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. All trajectories and FELs were projected on the experimental PC spaces and compared with our eBDIMS2 pathways.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDS developed eBDIMS2, built the protein dataset, generated the structural ensembles, designed and performed all simulations and analyses, prepared figures, and wrote the manuscript draft. BHL contributed with MD simulations and with comparison with ENI, NGENI and ICONGENI path-sampling algorithms. LO conceived the original idea and contributed to discussions. All authors participated in data interpretation and manuscript revision.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe eBDIMS2 code is available at https://github.com/domenicoscaramozzino/eBDIMS2. 
Additional data, such as PDB models of ensembles and transition pathways, as well as eBDIMS2 transition GIFs, are available at figshare (ensemble data: https://doi.org/10.6084/m9.figshare.28334204.v1; GIFs: https://doi.org/10.6084/m9.figshare.28334195.v1).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding and acknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLO acknowledges financial support from Cancerfonden Junior Investigator Award (CF 21 0305 JIA) and Project Grants (CF 21 1471 Pj, CF 24 3801 Pj) as well as Vetenskapsr\u0026aring;det Starting Grant (VR 2021-02248) and Karolinska Institutet. DS acknowledges financial support from Cancerfonden postdoctoral fellowship (24 0908 PT). Simulations were run using the National Academic Infrastructure for Supercomputing in Sweden (allocations NAISS 2023/5-400 and 2024/1-7 to LO).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eHensen, U. \u003cem\u003eet al.\u003c/em\u003e Exploring Protein Dynamics Space: The Dynasome as the Missing Link between Protein Structure and Function. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, e33931 (2012).\u003c/li\u003e\n\u003cli\u003eHenzler-Wildman, K. \u0026amp; Kern, D. Dynamic personalities of proteins. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e450\u003c/strong\u003e, 964\u0026ndash;72 (2007).\u003c/li\u003e\n\u003cli\u003eOrellana, L. Large-Scale Conformational Changes and Protein Function: Breaking the in silico Barrier. \u003cem\u003eFront. Mol. Biosci.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 117 (2019).\u003c/li\u003e\n\u003cli\u003eOrellana, L. Are Protein Shape-Encoded Lowest-Frequency Motions a Key Phenotype Selected by Evolution? \u003cem\u003eAppl. Sci.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 6756 (2023).\u003c/li\u003e\n\u003cli\u003eMiller, M. D. \u0026amp; Phillips, G. N. 
Moving beyond static snapshots: Protein dynamics and the Protein Data Bank. \u003cem\u003eJ. Biol. Chem.\u003c/em\u003e \u003cstrong\u003e296\u003c/strong\u003e, 100749 (2021).\u003c/li\u003e\n\u003cli\u003eOrellana, L. \u003cem\u003eet al.\u003c/em\u003e Oncogenic mutations at the EGFR ectodomain structurally converge to remove a steric hindrance on a kinase-coupled cryptic epitope. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e116\u003c/strong\u003e, 10009\u0026ndash;10018 (2019).\u003c/li\u003e\n\u003cli\u003eDror, R. O., Dirks, R. M., Grossman, J. P., Xu, H. \u0026amp; Shaw, D. E. Biomolecular Simulation: A Computational Microscope for Molecular Biology. \u003cem\u003eAnnu. Rev. Biophys.\u003c/em\u003e \u003cstrong\u003e41\u003c/strong\u003e, 429\u0026ndash;452 (2012).\u003c/li\u003e\n\u003cli\u003eHospital, A., Go\u0026ntilde;i, J. R., Orozco, M. \u0026amp; Gelp\u0026iacute;, J. L. Molecular dynamics simulations: advances and applications. \u003cem\u003eAdv. Appl. Bioinforma. Chem. AABC\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 37\u0026ndash;47 (2015).\u003c/li\u003e\n\u003cli\u003eMhashal, A., Emperador, A. \u0026amp; Orellana, L. Computational techniques to study protein dynamics and conformations. in \u003cem\u003eAdvances in Protein Molecular and Structural Biology Methods\u003c/em\u003e 199\u0026ndash;212 (Elsevier, 2022). doi:10.1016/B978-0-323-90264-9.00013-1.\u003c/li\u003e\n\u003cli\u003eLaio, A. \u0026amp; Parrinello, M. Escaping free-energy minima. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e99\u003c/strong\u003e, 12562\u0026ndash;12566 (2002).\u003c/li\u003e\n\u003cli\u003eHopkins, C. W., Le Grand, S., Walker, R. C. \u0026amp; Roitberg, A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. \u003cem\u003eJ. Chem. Theory Comput.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1864\u0026ndash;1874 (2015).\u003c/li\u003e\n\u003cli\u003eKmiecik, S. 
\u003cem\u003eet al.\u003c/em\u003e Coarse-Grained Protein Models and Their Applications. \u003cem\u003eChem. Rev.\u003c/em\u003e \u003cstrong\u003e116\u003c/strong\u003e, 7898\u0026ndash;7936 (2016).\u003c/li\u003e\n\u003cli\u003eBahar, I. \u0026amp; Rader, A. Coarse-grained normal mode analysis in structural biology. \u003cem\u003eCurr. Opin. Struct. Biol.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 586\u0026ndash;592 (2005).\u003c/li\u003e\n\u003cli\u003eTama, F. \u0026amp; Sanejouand, Y.-H. Conformational change of proteins arising from normal mode calculations. \u003cem\u003eProtein Eng. Des. Sel.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 1\u0026ndash;6 (2001).\u003c/li\u003e\n\u003cli\u003eYang, L., Song, G. \u0026amp; Jernigan, R. L. How Well Can We Understand Large-Scale Protein Motions Using Normal Modes of Elastic Network Models? \u003cem\u003eBiophys. J.\u003c/em\u003e \u003cstrong\u003e93\u003c/strong\u003e, 920\u0026ndash;929 (2007).\u003c/li\u003e\n\u003cli\u003eTama, F., Gadea, F. X., Marques, O. \u0026amp; Sanejouand, Y.-H. Building-block approach for determining low-frequency normal modes of macromolecules. \u003cem\u003eProteins Struct. Funct. Bioinforma.\u003c/em\u003e \u003cstrong\u003e41\u003c/strong\u003e, 1\u0026ndash;7 (2000).\u003c/li\u003e\n\u003cli\u003eAtilgan, A. R. \u003cem\u003eet al.\u003c/em\u003e Anisotropy of Fluctuation Dynamics of Proteins with an Elastic Network Model. \u003cem\u003eBiophys. J.\u003c/em\u003e \u003cstrong\u003e80\u003c/strong\u003e, 505\u0026ndash;515 (2001).\u003c/li\u003e\n\u003cli\u003eYang, L., Song, G. \u0026amp; Jernigan, R. L. Protein elastic network models and the ranges of cooperativity. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e106\u003c/strong\u003e, 12347\u0026ndash;12352 (2009).\u003c/li\u003e\n\u003cli\u003eOrellana, L. \u003cem\u003eet al.\u003c/em\u003e Approaching Elastic Network Models to Molecular Dynamics Flexibility. \u003cem\u003eJ. 
Chem. Theory Comput.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 2910\u0026ndash;2923 (2010).\u003c/li\u003e\n\u003cli\u003eL\u0026oacute;pez-Blanco, J. R., Garz\u0026oacute;n, J. I. \u0026amp; Chac\u0026oacute;n, P. iMod: multipurpose normal mode analysis in internal coordinates. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 2843\u0026ndash;2850 (2011).\u003c/li\u003e\n\u003cli\u003eHoffmann, A. \u0026amp; Grudinin, S. NOLB: Nonlinear Rigid Block Normal-Mode Analysis Method. \u003cem\u003eJ. Chem. Theory Comput.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 2123\u0026ndash;2134 (2017).\u003c/li\u003e\n\u003cli\u003eKhade, P. M. \u003cem\u003eet al.\u003c/em\u003e hdANM: a new comprehensive dynamics model for protein hinges. \u003cem\u003eBiophys. J.\u003c/em\u003e \u003cstrong\u003e120\u003c/strong\u003e, 4955\u0026ndash;4965 (2021).\u003c/li\u003e\n\u003cli\u003eScaramozzino, D., Piana, G., Lacidogna, G. \u0026amp; Carpinteri, A. Low-Frequency Harmonic Perturbations Drive Protein Conformational Changes. \u003cem\u003eInt. J. Mol. Sci.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 10501 (2021).\u003c/li\u003e\n\u003cli\u003eScaramozzino, D., Khade, P. M., Jernigan, R. L., Lacidogna, G. \u0026amp; Carpinteri, A. Structural compliance: A new metric for protein flexibility. \u003cem\u003eProteins Struct. Funct. Bioinforma.\u003c/em\u003e \u003cstrong\u003e88\u003c/strong\u003e, 1482\u0026ndash;1492 (2020).\u003c/li\u003e\n\u003cli\u003eScaramozzino, D., Khade, P. M. \u0026amp; Jernigan, R. L. Protein Fluctuations in Response to Random External Forces. \u003cem\u003eAppl. Sci.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 2344 (2022).\u003c/li\u003e\n\u003cli\u003eOrellana, L., Yoluk, O., Carrillo, O., Orozco, M. \u0026amp; Lindahl, E. Prediction and validation of protein intermediate states from structurally rich ensembles and coarse-grained simulations. \u003cem\u003eNat.
Commun.\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 12575 (2016).\u003c/li\u003e\n\u003cli\u003eWeiss, D. R. \u0026amp; Levitt, M. Can Morphing Methods Predict Intermediate Structures? \u003cem\u003eJ. Mol. Biol.\u003c/em\u003e \u003cstrong\u003e385\u003c/strong\u003e, 665\u0026ndash;674 (2009).\u003c/li\u003e\n\u003cli\u003eKitao, A. Principal Component Analysis and Related Methods for Investigating the Dynamics of Biological Macromolecules. \u003cem\u003eJ\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 298\u0026ndash;317 (2022).\u003c/li\u003e\n\u003cli\u003eBergh, C., Heusser, S. A., Howard, R. \u0026amp; Lindahl, E. Markov state models of proton- and pore-dependent activation in a pentameric ligand-gated ion channel. \u003cem\u003eeLife\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e68369 (2021).\u003c/li\u003e\n\u003cli\u003eLycksell, M. \u003cem\u003eet al.\u003c/em\u003e Probing solution structure of the pentameric ligand-gated ion channel GLIC by small-angle neutron scattering. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e118\u003c/strong\u003e, e2108006118 (2021).\u003c/li\u003e\n\u003cli\u003eAlmeida, B. C., Kaczmarek, J. A., Figueiredo, P. R., Prather, K. L. J. \u0026amp; Carvalho, A. T. P. Transcription factor allosteric regulation through substrate coordination to zinc. \u003cem\u003eNAR Genomics Bioinforma.\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, lqab033 (2021).\u003c/li\u003e\n\u003cli\u003eChaudhuri, D., Majumder, S., Datta, J. \u0026amp; Giri, K. Elucidating the conformational change of dengue envelope protein using the Markov state model. \u003cem\u003eMol. Simul.\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, 1153\u0026ndash;1169 (2024).\u003c/li\u003e\n\u003cli\u003eQuerino Lima Afonso, M., da Fonseca, N. J. Jr., de Oliveira, L. C., Lobo, F. P. \u0026amp; Bleicher, L. 
Coevolved Positions Represent Key Functional Properties in the Trypsin-Like Serine Proteases Protein Family. \u003cem\u003eJ. Chem. Inf. Model.\u003c/em\u003e \u003cstrong\u003e60\u003c/strong\u003e, 1060\u0026ndash;1068 (2020).\u003c/li\u003e\n\u003cli\u003eZhekova, H. R. \u003cem\u003eet al.\u003c/em\u003e CryoEM structures of anion exchanger 1 capture multiple states of inward- and outward-facing conformations. \u003cem\u003eCommun. Biol.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1\u0026ndash;13 (2022).\u003c/li\u003e\n\u003cli\u003eMatsuoka, R. \u003cem\u003eet al.\u003c/em\u003e Structure, mechanism and lipid-mediated remodeling of the mammalian Na+/H+ exchanger NHA2. \u003cem\u003eNat. Struct. Mol. Biol.\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 108\u0026ndash;120 (2022).\u003c/li\u003e\n\u003cli\u003eKim, J. J. \u003cem\u003eet al.\u003c/em\u003e Shared structural mechanisms of general anaesthetics and benzodiazepines. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e585\u003c/strong\u003e, 303\u0026ndash;308 (2020).\u003c/li\u003e\n\u003cli\u003eCarroni, M. \u0026amp; Saibil, H. R. Cryo electron microscopy to determine the structure of macromolecular complexes. \u003cem\u003eMethods\u003c/em\u003e \u003cstrong\u003e95\u003c/strong\u003e, 78\u0026ndash;85 (2016).\u003c/li\u003e\n\u003cli\u003eBerman, H. M. \u003cem\u003eet al.\u003c/em\u003e The Protein Data Bank. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 235\u0026ndash;242 (2000).\u003c/li\u003e\n\u003cli\u003eBonomi, M. \u0026amp; Vendruscolo, M. Determination of protein structural ensembles using cryo-electron microscopy. \u003cem\u003eCurr. Opin. Struct. Biol.\u003c/em\u003e (2019) doi:10.1016/j.sbi.2018.10.006.\u003c/li\u003e\n\u003cli\u003eKrieger, J. M., Sorzano, C. O. S., Carazo, J. M. \u0026amp; Bahar, I. Protein dynamics developments for the large scale and cryoEM: case study of ProDy 2.0. \u003cem\u003eActa Crystallogr. 
Sect. Struct. Biol.\u003c/em\u003e \u003cstrong\u003e78\u003c/strong\u003e, 399\u0026ndash;409 (2022).\u003c/li\u003e\n\u003cli\u003eFrank, J. New Opportunities Created by Single-Particle Cryo-EM: The Mapping of Conformational Space. \u003cem\u003eBiochemistry\u003c/em\u003e (2018) doi:10.1021/acs.biochem.8b00064.\u003c/li\u003e\n\u003cli\u003eCasalino, L. \u003cem\u003eet al.\u003c/em\u003e AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. \u003cem\u003eInt. J. High Perform. Comput. Appl.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 432\u0026ndash;451 (2021).\u003c/li\u003e\n\u003cli\u003ePiana, S. \u0026amp; Shaw, D. E. Atomic-Level Description of Protein Folding inside the GroEL Cavity. \u003cem\u003eJ. Phys. Chem. B\u003c/em\u003e \u003cstrong\u003e122\u003c/strong\u003e, 11440\u0026ndash;11449 (2018).\u003c/li\u003e\n\u003cli\u003eKoehl, P., Navaza, R., Tekpinar, M. \u0026amp; Delarue, M. MinActionPath2: path generation between different conformations of large macromolecular assemblies by action minimization. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, W256\u0026ndash;W263 (2024).\u003c/li\u003e\n\u003cli\u003eOrellana, L., Gustavsson, J., Bergh, C., Yoluk, O. \u0026amp; Lindahl, E. eBDIMS server: protein transition pathways with ensemble analysis in 2D-motion spaces. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, 3505\u0026ndash;3507 (2019).\u003c/li\u003e\n\u003cli\u003eYang, L., Song, G., Carriquiry, A. \u0026amp; Jernigan, R. L. Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. \u003cem\u003eStruct. Lond. Engl. 1993\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 321\u0026ndash;30 (2008).\u003c/li\u003e\n\u003cli\u003eSankar, K., Mishra, S. K. \u0026amp; Jernigan, R. L. 
Comparisons of Protein Dynamics from Experimental Structure Ensembles, Molecular Dynamics Ensembles, and Coarse-Grained Elastic Network Models. \u003cem\u003eJ. Phys. Chem. B\u003c/em\u003e \u003cstrong\u003e122\u003c/strong\u003e, 5409\u0026ndash;5417 (2018).\u003c/li\u003e\n\u003cli\u003eChaker-Margot, M. \u003cem\u003eet al.\u003c/em\u003e Structural basis of activation of the tumor suppressor protein neurofibromin. \u003cem\u003eMol. Cell\u003c/em\u003e \u003cstrong\u003e82\u003c/strong\u003e, 1288-1296.e5 (2022).\u003c/li\u003e\n\u003cli\u003eNaschberger, A., Baradaran, R., Rupp, B. \u0026amp; Carroni, M. The structure of neurofibromin isoform 2 reveals different functional states. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e599\u003c/strong\u003e, 315\u0026ndash;319 (2021).\u003c/li\u003e\n\u003cli\u003eDas, A. \u003cem\u003eet al.\u003c/em\u003e Exploring the Conformational Transitions of Biomolecular Systems Using a Simple Two-State Anisotropic Network Model. \u003cem\u003ePLoS Comput. Biol.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e1003521 (2014).\u003c/li\u003e\n\u003cli\u003eSobti, M., Ueno, H., Noji, H. \u0026amp; Stewart, A. G. The six steps of the complete F1-ATPase rotary catalytic cycle. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 4690 (2021).\u003c/li\u003e\n\u003cli\u003eB\u0026ouml;ckmann, R. A. \u0026amp; Grubm\u0026uuml;ller, H. Nanoseconds molecular dynamics simulation of primary mechanical energy transfer steps in F1-ATP synthase. \u003cem\u003eNat. Struct. Biol.\u003c/em\u003e (2002) doi:10.1038/nsb760.\u003c/li\u003e\n\u003cli\u003eHeo, L. \u0026amp; Feig, M. One bead per residue can describe all-atom protein structures. \u003cem\u003eStructure\u003c/em\u003e \u003cstrong\u003e32\u003c/strong\u003e, 97-111.e6 (2024).\u003c/li\u003e\n\u003cli\u003eWilliams, C. J. 
\u003cem\u003eet al.\u003c/em\u003e MolProbity: More and better reference data for improved all-atom structure validation. \u003cem\u003eProtein Sci. Publ. Protein Soc.\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 293\u0026ndash;315 (2018).\u003c/li\u003e\n\u003cli\u003eSchlitter, J., Engels, M. \u0026amp; Kr\u0026uuml;ger, P. Targeted molecular dynamics: a new approach for searching pathways of conformational transitions. \u003cem\u003eJ. Mol. Graph.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 84\u0026ndash;89 (1994).\u003c/li\u003e\n\u003cli\u003eOvchinnikov, V. \u0026amp; Karplus, M. Analysis and Elimination of a Bias in Targeted Molecular Dynamics Simulations of Conformational Transitions: Application to Calmodulin. \u003cem\u003eJ. Phys. Chem. B\u003c/em\u003e \u003cstrong\u003e116\u003c/strong\u003e, 8584\u0026ndash;8603 (2012).\u003c/li\u003e\n\u003cli\u003eChen, X. \u003cem\u003eet al.\u003c/em\u003e Structure of an activated DNA-PK and its implications for NHEJ. \u003cem\u003eMol. Cell\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 801-810.e3 (2021).\u003c/li\u003e\n\u003cli\u003eLiu, L. \u003cem\u003eet al.\u003c/em\u003e Autophosphorylation transforms DNA-PK from protecting to processing DNA ends. \u003cem\u003eMol. Cell\u003c/em\u003e \u003cstrong\u003e82\u003c/strong\u003e, 177-189.e4 (2022).\u003c/li\u003e\n\u003cli\u003eCasalino, L. \u003cem\u003eet al.\u003c/em\u003e Beyond Shielding: The Roles of Glycans in the SARS-CoV-2 Spike Protein. \u003cem\u003eACS Cent. Sci.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 1722\u0026ndash;1734 (2020).\u003c/li\u003e\n\u003cli\u003eChen, Y. \u003cem\u003eet al.\u003c/em\u003e Role of PRKDC in cancer initiation, progression, and treatment. \u003cem\u003eCancer Cell Int.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 563 (2021).\u003c/li\u003e\n\u003cli\u003eIcard, P. \u003cem\u003eet al.\u003c/em\u003e ATP citrate lyase: A central metabolic enzyme in cancer. 
\u003cem\u003eCancer Lett.\u003c/em\u003e \u003cstrong\u003e471\u003c/strong\u003e, 125\u0026ndash;134 (2020).\u003c/li\u003e\n\u003cli\u003eZhang, M. \u003cem\u003eet al.\u003c/em\u003e ITPR3 facilitates tumor growth, metastasis and stemness by inducing the NF-ĸB/CD44 pathway in urinary bladder carcinoma. \u003cem\u003eJ. Exp. Clin. Cancer Res.\u003c/em\u003e \u003cstrong\u003e40\u003c/strong\u003e, 65 (2021).\u003c/li\u003e\n\u003cli\u003eGuo, H. \u003cem\u003eet al.\u003c/em\u003e Structure of mycobacterial ATP synthase bound to the tuberculosis drug bedaquiline. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e589\u003c/strong\u003e, 143\u0026ndash;147 (2021).\u003c/li\u003e\n\u003cli\u003eMcCarthy, T. V., Quane, K. A. \u0026amp; Lynch, P. J. Ryanodine receptor mutations in malignant hyperthermia and central core disease. \u003cem\u003eHum. Mutat.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 410\u0026ndash;417 (2000).\u003c/li\u003e\n\u003cli\u003eZacharioudakis, E. \u0026amp; Gavathiotis, E. Targeting protein conformations with small molecules to control protein complexes. \u003cem\u003eTrends Biochem. Sci.\u003c/em\u003e \u003cstrong\u003e47\u003c/strong\u003e, 1023\u0026ndash;1037 (2022).\u003c/li\u003e\n\u003cli\u003eBinder, Z. A. \u003cem\u003eet al.\u003c/em\u003e Epidermal Growth Factor Receptor Extracellular Domain Mutations in Glioblastoma Present Opportunities for Clinical Imaging and Therapeutic Development. \u003cem\u003eCancer Cell\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 163-177.e7 (2018).\u003c/li\u003e\n\u003cli\u003eThe UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, D523\u0026ndash;D531 (2023).\u003c/li\u003e\n\u003cli\u003eDavid, C. C. \u0026amp; Jacobs, D. J. Principal Component Analysis: A Method for Determining the Essential Dynamics of Proteins. \u003cem\u003eMethods Mol. Biol. 
Clifton NJ\u003c/em\u003e \u003cstrong\u003e1084\u003c/strong\u003e, 193\u0026ndash;226 (2014).\u003c/li\u003e\n\u003cli\u003eDaidone, I. \u0026amp; Amadei, A. Essential dynamics: foundation and applications. \u003cem\u003eWIREs Comput. Mol. Sci.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 762\u0026ndash;770 (2012).\u003c/li\u003e\n\u003cli\u003eEmperador, A., Carrillo, O., Rueda, M. \u0026amp; Orozco, M. Exploring the Suitability of Coarse-Grained Techniques for the Representation of Protein Dynamics. \u003cem\u003eBiophys. J.\u003c/em\u003e \u003cstrong\u003e95\u003c/strong\u003e, 2127\u0026ndash;2138 (2008).\u003c/li\u003e\n\u003cli\u003eZheng, W. \u0026amp; Wen, H. A survey of coarse-grained methods for modeling protein conformational transitions. \u003cem\u003eCurr. Opin. Struct. Biol.\u003c/em\u003e \u003cstrong\u003e42\u003c/strong\u003e, 24\u0026ndash;30 (2017).\u003c/li\u003e\n\u003cli\u003eSfriso, P., Hospital, A., Emperador, A. \u0026amp; Orozco, M. Exploration of conformational transition pathways from coarse-grained simulations. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 1980\u0026ndash;1986 (2013).\u003c/li\u003e\n\u003cli\u003eLee, B. H. \u003cem\u003eet al.\u003c/em\u003e Normal mode-guided transition pathway generation in proteins. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e0185658 (2017).\u003c/li\u003e\n\u003cli\u003eLee, B. H., Park, S. W., Jo, S. \u0026amp; Kim, M. K. Protein conformational transitions explored by a morphing approach based on normal mode analysis in internal coordinates. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, e0258818 (2021).\u003c/li\u003e\n\u003cli\u003eKim, M. K., Chirikjian, G. S. \u0026amp; Jernigan, R. L. Elastic models of conformational transitions in macromolecules. \u003cem\u003eJ. Mol. Graph. 
Model.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 151\u0026ndash;160 (2002).\u003c/li\u003e\n\u003cli\u003eYang, Z., M\u0026aacute;jek, P. \u0026amp; Bahar, I. Allosteric Transitions of Supramolecular Systems Explored by Network Models: Application to Chaperonin GroEL. \u003cem\u003ePLoS Comput. Biol.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, e1000360 (2009).\u003c/li\u003e\n\u003cli\u003eFranklin, J., Koehl, P., Doniach, S. \u0026amp; Delarue, M. MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e35\u003c/strong\u003e, W477\u0026ndash;W482 (2007).\u003c/li\u003e\n\u003cli\u003eWaterhouse, A. \u003cem\u003eet al.\u003c/em\u003e SWISS-MODEL: homology modelling of protein structures and complexes. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e46\u003c/strong\u003e, W296\u0026ndash;W303 (2018).\u003c/li\u003e\n\u003cli\u003eJo, S., Kim, T., Iyer, V. G. \u0026amp; Im, W. CHARMM-GUI: A web-based graphical user interface for CHARMM. \u003cem\u003eJ. Comput. Chem.\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 1859\u0026ndash;1865 (2008).\u003c/li\u003e\n\u003cli\u003eAbraham, M. J. \u003cem\u003eet al.\u003c/em\u003e GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. \u003cem\u003eSoftwareX\u003c/em\u003e \u003cstrong\u003e1\u0026ndash;2\u003c/strong\u003e, 19\u0026ndash;25 (2015).\u003c/li\u003e\n\u003cli\u003eBonomi, M. \u003cem\u003eet al.\u003c/em\u003e Promoting transparency and reproducibility in enhanced molecular simulations. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 670\u0026ndash;673 (2019).\u003c/li\u003e\n\u003cli\u003eHuang, J. \u003cem\u003eet al.\u003c/em\u003e CHARMM36m: an improved force field for folded and intrinsically disordered proteins. 
\u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 71\u0026ndash;73 (2017).\u003c/li\u003e\n\u003cli\u003eNos\u0026eacute;, S. A molecular dynamics method for simulations in the canonical ensemble. \u003cem\u003eMol. Phys.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 255\u0026ndash;268 (1984).\u003c/li\u003e\n\u003cli\u003eHoover, W. G. Canonical dynamics: Equilibrium phase-space distributions. \u003cem\u003ePhys. Rev. Gen. Phys.\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 1695\u0026ndash;1697 (1985).\u003c/li\u003e\n\u003cli\u003eParrinello, M. \u0026amp; Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. \u003cem\u003eJ. Appl. Phys.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, 7182\u0026ndash;7190 (1981).\u003c/li\u003e\n\u003cli\u003eHess, B., Bekker, H., Berendsen, H. J. C. \u0026amp; Fraaije, J. G. E. M. LINCS: A linear constraint solver for molecular simulations. \u003cem\u003eJ. Comput. Chem.\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1463\u0026ndash;1472 (1997).\u003c/li\u003e\n\u003cli\u003eEwald, P. P. Die Berechnung optischer und elektrostatischer Gitterpotentiale. \u003cem\u003eAnn. Phys.\u003c/em\u003e \u003cstrong\u003e369\u003c/strong\u003e, 253\u0026ndash;287 (1921).\u003c/li\u003e\n\u003cli\u003eEssmann, U. \u003cem\u003eet al.\u003c/em\u003e A smooth particle mesh Ewald method. \u003cem\u003eJ. Chem. 
Phys.\u003c/em\u003e \u003cstrong\u003e103\u003c/strong\u003e, 8577\u0026ndash;8593 (1995).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6504036/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6504036/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eProtein conformational changes are the cornerstone of biological function. While conformers captured experimentally represent meta-stable states, the pathways connecting them have been elusive for experiments and simulations alike. Nowadays, cryogenic Electron Microscopy is providing rich structural data on proteins trapped in different states for increasingly large systems, but these are out of scope for current computational methods, which usually exhibit an \u003cem\u003eN\u003c/em\u003e\u003csup\u003e\u003cem\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/em\u003e\u003c/sup\u003e dependence on size. 
Based on our previous eBDIMS algorithm, here we present eBDIMS2, an extremely optimized version with quasi-linear size dependence, able to simulate on a desktop computer exceptionally complex transitions for megadalton protein assemblies, like the rotary motion of ATP synthases. Not only do eBDIMS2 pathways spontaneously visit experimental intermediates, but they also overlap with microsecond Molecular Dynamics simulations that require extensive supercomputing resources. By integrating Elastic Networks with Brownian Dynamics, eBDIMS2 allows an unprecedented exploration of conformational changes in previously inaccessible sub-mesoscopic systems.\u003c/p\u003e","manuscriptTitle":"Breaking the size limit: efficient sampling of large-scale transition pathways and intermediate conformations in sub-mesoscopic protein complexes","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-07 09:22:46","doi":"10.21203/rs.3.rs-6504036/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"5fa0eafb-9577-4d06-b37c-f2f1c8666da2","owner":[],"postedDate":"May 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":48131819,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":48131820,"name":"Biological sciences/Biophysics/Computational biophysics"},{"id":48131821,"name":"Biological sciences/Structural
biology/Molecular modelling"}],"tags":[],"updatedAt":"2026-03-05T08:07:44+00:00","versionOfRecord":{"articleIdentity":"rs-6504036","link":"https://doi.org/10.1038/s41467-026-69809-y","journal":{"identity":"nature-communications","isVorOnly":false,"title":"Nature Communications"},"publishedOn":"2026-03-02 05:00:00","publishedOnDateReadable":"March 2nd, 2026"},"versionCreatedAt":"2025-05-07 09:22:46","video":"","vorDoi":"10.1038/s41467-026-69809-y","vorDoiUrl":"https://doi.org/10.1038/s41467-026-69809-y","workflowStages":[]},"version":"v1","identity":"rs-6504036","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6504036","identity":"rs-6504036","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}


License: CC-BY-4.0