Generic residue numbering of the GAIN domain of adhesion GPCRs

Research Article

Florian Seufert, Guillermo Pérez-Hernández, Gáspár Pándy-Szekeres, and 4 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4761600/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication: published 01 Jan, 2025 in Nature Communications.
Version 1 posted.

Abstract

The GPCR autoproteolysis inducing (GAIN) domain is an ancient protein fold ubiquitous in adhesion G protein-coupled receptors (aGPCR). It contains a concealed tethered agonist element, which is necessary and sufficient for receptor activation. The GAIN domain is a hotspot for pathological mutations. However, the low primary sequence conservation of GAIN domains has thus far hindered the knowledge transfer across different GAIN domains in human receptors as well as species orthologs.
Here, we present a scheme for generic residue numbering of GAIN domains based on structural alignments of six experimental and more than 14,000 modeled GAIN domain structures. This scheme is implemented in the GPCR database (GPCRdb) and elucidates the domain topology across different aGPCRs and their homologs in a large panel of species. We identify conservation hotspots and cancer-enriched positions in human aGPCRs and show the transferability of positional and structural information between GAIN domain homologs. The GAIN-GRN scheme also provides a robust strategy to allocate structural homologies at the primary and secondary levels to the GAIN domains of polycystic kidney disease 1/PKD1-like proteins, which now renders positions in both GAIN domain types comparable to one another.

Subject areas: Structural Biology, General Biochemistry, Bioinformatics
Keywords: aGPCR, generic residue numbering, databases, GAIN domain, PKD

Introduction

Adhesion / class B2 G protein-coupled receptors (aGPCRs), the second-largest class of GPCRs, have garnered substantial research and medical interest due to their involvement in neural development, hereditary disorders and cancers, among others 1–4. aGPCRs are characterized by a very large extracellular region containing the conserved GPCR autoproteolysis inducing (GAIN) domain. The GAIN domain is positioned directly N-terminal of the seven-transmembrane domain (7TM, Fig. 1a), which transduces an extracellular signal to intracellular effector proteins 5. The GAIN domain serves several functions. First, at its GPCR proteolysis site (GPS) an autoproteolytic cleavage event occurs adjacent to the 7TM domain, which yields a bipartite structure stabilized by non-covalent interactions 6–9. The two resulting elements, called N-terminal and C-terminal fragment (NTF/CTF), remain attached to one another even at the cell surface.
Second, the GAIN domain contains a tethered agonist element (TA, Stachel, Fig. 1 b), which corresponds to the N-terminus of the CTF that arises through GAIN domain cleavage 10 – 13 . The TA activates the receptor upon dissociation of the NTF/CTF complex 10 , 14 – 17 , the biophysical intricacies of which are yet to be uncovered 18 , 19 . Third, several aGPCRs act as metabotropic mechanosensors 20 – 24 , where the GAIN domain is proposed to serve as a molecular integrator of mechanical forces through its partial unfolding or eventual dissociation of the NTF/CTF complex upon force stimulation 25 – 29 . X-ray and cryo-EM structures provided first insights into GAIN domain structures and TA-7TM complexes 8 , 9 , 30 – 36 . The available set of GAIN domain structures indicates a common architecture with two, structurally variable subdomains: The more variable subdomain A is comprised of up to six helices, and the more conserved subdomain B adopts a β-sandwich with the TA as its most C-terminal strand (Fig. 1 b-c) 8 , 9 , 19 , 35 – 37 . The low sequence identity of GAIN domains and variable number of constituting segments, however, led to inadequate annotations of the GAIN domain in protein databases, hampering inter-species comparison of GAIN domains and limiting a holistic understanding of GAIN domain function. Generic residue numbers (GRNs) provide a common index to corresponding amino acids across the different members of a protein family. GRNs have a great utility as they enable comparison and inference of a multitude of residue data spanning pharmacology (e.g., in vitro mutations), structural biology (e.g., ligand, domain or protein interactions) and genetics (natural variants). For GPCRs, the first GRN scheme was that of Ballesteros-Weinstein and assigned residue indices in the 7TM domains of class A GPCRs 38 . 
The number 50 is assigned to the most conserved residue in each of the seven helices; this residue serves as the reference from which upstream and downstream residues are numbered consecutively. This system has since been adapted to other GPCR classes 39, including the Wootten numbering scheme for the class B1 (Secretin) and B2 (Adhesion) receptor families 40. As GPCR structures became available, these sequence-based schemes were found to yield non-generic numbers when some receptors have helix bulges or constrictions, which cause a one-position residue gap in the structural alignment and an offset of the following residues in the sequence alignment 39. To mitigate this numbering issue, the GPCR database (GPCRdb) provided structure-based GRN schemes for each GPCR class, wherein structural residue gaps are also present as single gaps in the sequence alignment 39. The GPCRdb schemes also added helix 8 (H8) and structurally conserved stretches of the first extra- and intracellular loops. Highly flexible and variable protein regions such as loops remain unannotated in currently established schemes. GRNs have seen wide adoption among researchers, becoming a frequently used tool in communicating GPCR research. Other GRN schemes, such as the kinase-ligand interaction fingerprints and structure database (KLIFS) 41–43, the common G protein Gα numbering (CGN) 44 or the common arrestin numbering (CAN) 45, serve to map functional protein networks or to support drug development in additional protein families.

Here, we introduce a GRN scheme for aGPCR GAIN domains based on the superposition of more than 14,000 structural GAIN domain models generated with ColabFold/AlphaFold 2 46, 47. We highlight structural variability and common features of all aGPCR GAIN domains and demonstrate position-specific data transfer by finding cancer-enriched positions in humans.
The GAIN-GRN was implemented into the GPCRdb 48, 49 to allow for intuitive use in a highly accessible and widely adopted resource, as well as programmatic access to the data. Additionally, we provide a web-based notebook enabling the ad-hoc indexing of any GAIN domain-containing protein. These results promote future experiments that focus on the central role of GAIN domains in the signal transduction and physiological functions of aGPCRs, and will aid analyses on how structural anomalies contribute to aGPCR dysfunction under disease conditions.

Results

The heterogeneity of the GAIN domain necessitates structure-based residue numbering

A comprehensive analysis of GAIN domains by means of multiple sequence alignments fails due to their low sequence identity and variable number of segments (α-helices and β-strands) 8, 9, 35, 36. Thus, to enable a comprehensive description of the GAIN domain, we opted for a structure-based approach, for which we generated a set of 14,435 GAIN domain models encompassing orthologs of the 33 mammalian aGPCRs and 916 non-ortholog proteins with ColabFold/AlphaFold 2 46, 47. To assess the composition of both GAIN subdomains, we used structural alignments with GESAMT 50 for indexing segments. With this approach, the position of a segment in space, rather than its order in the sequence, determines its index, which allows equivalent positions to be assigned for GRN indexing despite variable domain composition. Based on the complete set of GAIN domain models, we determined that subdomains A and B are composed of two to six helices and 12 to 14 strands, respectively, which we indexed using the identifiers H1-6 and S1-14 (Fig. 2a). Subdomain B exhibits generally high segment conservation, with only strand 4 specific to subfamilies A and G (Fig. 2b). The composition of subdomain A is more variable.
While the A, B, C, F and L subfamilies all have six helices, the D and G subfamilies show heterogeneity ranging from two to six helices (Fig. 2 b). Structures reflecting subdomain A variability are for example the rat ADGRL1 with a six-helix bundle (Fig. 1 c) 8 , and the human ADGRG1 GAIN domain with only Helix 4 and 6 36 , the two most conserved helices in the dataset. When looking at individual residue positions (to which GRN labels are assigned, Fig. 2 c), notably the center regions of helix 3, 4 and 6 are more frequently occupied (occupancy referring to the fraction of models containing a segment in the dataset) than the extreme positions, highlighting varying helix lengths in the model dataset, with especially the L subfamily exhibiting longer helices. Aside from the residues in the less conserved strands 4 and 7, there is high occupancy in subdomain B. The unindexed GAIN domain loops connecting the structured elements show very different lengths, frequently exceeding 50 residues (Supplementary Fig. 6). Notably, a total of 84 homologs of ADGRA1 in 47 species have a GAIN domain, which is not found in human ADGRA1 8 , whereas 78 GAIN domains were identified for ADGRE4, which is a pseudogene in human (Supplementary Table 1). Generic residue numbering denotes corresponding GAIN domain amino acids across receptors Based on the GAIN-GRN indexing of all 14,435 structural models, we created comprehensive alignments of the GAIN domain in structure and in sequence (Supplementary Fig. 1). These provide a novel utility to map data across all adhesion GPCRs and cross-map sequence-position specific data between homologs. A schematic of the GRN assignment process is outlined in Fig. 3 . Each GRN consists of the segment identifier (e. g. Helix 6 = “H6”) and the respective index relative to the most conserved residue in the segment, separated by a dot (e.g. “H6.50”). 
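The label format described above (segment identifier, dot, index relative to the most conserved residue) can be handled with a few lines of Python. The following is a minimal sketch, not the GPCRdb or GAIN-GRN implementation: the names `GRN` and `parse_grn` are hypothetical, and the pattern assumes only the ordered segments H1-H6 and S1-S14 named in the text.

```python
import re
from typing import NamedTuple

# Assumed label pattern, derived from the examples in the text:
# a segment identifier (H1-H6 or S1-S14), a dot, and an integer index.
_GRN_PATTERN = re.compile(r"^(H[1-6]|S(?:1[0-4]|[1-9]))\.(\d+)$")


class GRN(NamedTuple):
    """Hypothetical container for one generic residue number."""
    segment: str  # e.g. "H6"
    index: int    # e.g. 50 for the most conserved residue of the segment

    @property
    def offset(self) -> int:
        """Signed distance from the segment's .50 reference position."""
        return self.index - 50


def parse_grn(label: str) -> GRN:
    """Split a label such as 'H6.50' into its segment and index."""
    match = _GRN_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a GRN label: {label!r}")
    return GRN(match.group(1), int(match.group(2)))
```

With this sketch, `parse_grn("S14.47").offset` evaluates to `-3`, that is, three residues N-terminal of the S14.50 reference position.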
In the example shown in Fig. 3, representative GAIN domains from each aGPCR subfamily are structurally aligned, with the Cα atoms corresponding to GRN positions (Fig. 3a). Using the backbone alignment positions as the basis for a sequence alignment, the most conserved position is identified here as the acidic E/D and is assigned the “center” .50 index (Fig. 3b). A residue table of the aligned segments highlights variation in segment lengths and reveals positions with similar physicochemical properties despite low sequence identity, e.g. H6.50 as acidic and H6.45, H6.46 and H6.52 as aliphatic (Fig. 3c).

GPCRdb resources aiding use of GAIN GRNs

As part of the GPCRdb integration, we assigned the 6 helices (H1-6), 14 strands (S1-14), and 21 loops connecting segments (h1h2, h2h3, h3h4, h4h5, h5h6, h6s1, s1s2, s2s3, s3s4, s4s5, s5s6, s6s7, s7s8, s8s9, s9s10, s10s11, s11s12, s12s13, s13gps, gpss14, s14tm1) to all Class B2 (Adhesion) sequences. In addition, the GPS motif is assigned a separate segment with the conventional GPS-2, GPS-1 and GPS+1 notation for the three residues closest to the catalytic site. With the introduction of these new segments, researchers can apply the GPCRdb toolkit to the whole GAIN domain or to selected parts of it. We updated the snake plots of the Class B2 (Adhesion) GPCRs to contain the segments of the GAIN domain (Fig. 4b). The snake plots can be found on the Receptor page (https://gpcrdb.org/protein/) with custom coloring options 51. Alongside the data already provided, GAIN domain data are also accessible programmatically via the REST API.

Consensus contacts stabilizing the GAIN domain fold

Addressing GAIN domain positions via GRNs enables the mapping of any GRN-label-dependent information across aGPCR homologs. Corroborating the analysis of tertiary structures, we exploit the GRN indexing to consolidate pairwise residue-residue contacts, which are particular to each structure, into unified consensus contacts.
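The consolidation of per-structure contacts into consensus contacts can be illustrated as a frequency count over GRN-label pairs. The sketch below is an assumption-laden illustration rather than the mdciao-based workflow used for the analysis: the function name `consensus_contacts`, the `min_frequency` parameter and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Contact = Tuple[str, str]  # a pair of GRN labels, e.g. ("H6.50", "S6.48")


def consensus_contacts(
    per_structure: Iterable[Set[Contact]],
    min_frequency: float,
) -> Dict[Contact, float]:
    """Return GRN-label pairs whose contact frequency reaches min_frequency.

    Each element of per_structure holds the contacts of one model, already
    translated from residue numbers to GRN labels. The frequency of a pair
    is the fraction of models in which it occurs.
    """
    counts: Counter = Counter()
    n_structures = 0
    for contacts in per_structure:
        n_structures += 1
        # Sort each pair so that (a, b) and (b, a) are counted as one contact.
        counts.update({tuple(sorted(pair)) for pair in contacts})
    if n_structures == 0:
        return {}
    return {
        pair: count / n_structures
        for pair, count in counts.items()
        if count / n_structures >= min_frequency
    }
```

Because contacts are keyed by GRN labels rather than by receptor-specific residue numbers, models of different receptors and species contribute to the same counts.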
The entirety of GRN-label pairs occurring over the complete dataset with a given frequency yields the GAIN domain contactome, shown in Supplementary Fig. 7 as a flareplot. This plot represents a contact matrix, individually resolving consensus contacts at the residue-level while highlighting contact relationships between the different GAIN segments. The importance of H6 as a “hub” connecting the subdomains A and B is clearly seen with highly conserved contacts to primarily S6 and also S2, S8, S10, and S14. Furthermore, H4 partially tethers subdomains A and B via highly conserved contacts to S1 and S2. In Supplementary Tables 3 and 4, the most frequent inter-domain and GPS contacts are listed individually, respectively. Additionally, we coarse-grained the contactome into the GAIN segments (Supplementary Fig. 8) reproducing the tethering structure of the segments regardless of individual contacts, further highlighting H4 and H6 as the segments mediating contacts between both GAIN subdomains. A map of cancer-enriched mutations in the GAIN domain The GAIN domain, present in 31 of 32 human aGPCRs, is a mutational hotspot affected in various pathologies 6 , 8 , 52 , 53 . To find cancer-enriched positions and differentiate them from variance-enriched positions, we adopted the cancer-enrichment score from Wright et al. 54 for all 31 human aGPCR GAIN domains (Fig. 5 a-b) indexed by the GAIN-GRN. While generally, enriched positions of both types are distributed throughout the GAIN domain (Fig. 5 c), eight of the ten most cancer-enriched GRN positions are found in subdomain B carrying the TA (Fig. 5 a, Supplementary Fig. 2). We identify a “VWWL” motif consisting of four conserved top-ten cancer-enriched residues S7.50, S10.50, S11.50 and S14.50. This buried motif is located in direct vicinity of the GPS cleavage site (Fig. 5 d-e, Supplementary Fig. 
1), with mutations known to affect GAIN domain autoproteolysis and TA function: the conserved leucine at the S14.50 position (Supplementary Fig. 1) is a TA residue deeply buried in the orthosteric binding site of the 7TM domain in active aGPCR-7TM structures 19, 30–34, 55, and its mutation led to altered receptor activity 10, 22, 56, 57. Strikingly, the mutation of any tryptophan within the “VWWL” motif causes loss of function in rat ADGRL1 8. With a comprehensive analysis of human aGPCR GAIN domains, we find a total of 46 cancer-enriched positions (Supplementary Fig. 2). By using the GAIN-GRN, homologous cancer-enriched residue positions can now be assigned to any GAIN domain. This allows the transfer of positional information between GAIN domains in different species, particularly from human to model organisms such as D. melanogaster, C. elegans or D. rerio 58–61. Functional analyses in these model systems can now provide valuable insights into the molecular causes of cancer mutations in humans in future studies.

GAIN domains of PKD1/PKD1-like proteins possess an extended topology

The only other protein family known to contain GAIN domains is that of the PKD (polycystic kidney disease) 1/PKD1-like proteins (in short here PKD1; also referred to as polycystin-1 [PC1]) 8. Mutations in PKD1 are responsible for the majority of autosomal dominant polycystic kidney disease, a devastating disorder that entails the development of cysts in the kidney and other organs leading to their eventual failure 62. PKD1 GAIN domains display molecular properties similar to those of aGPCR GAIN domains, with autoproteolytic cleavage resulting in a bipartite NTF-CTF protein layout 63.
Enabling the comparison and transferal of experimental and mutational knowledge between aGPCR and PKD1 GAIN folds is the basis for the understanding of similarities and differences between the two, and can offer valuable insights into the cell biological and physiological consequences of GAIN domain functions. However, thus far such transfer has been obstructed by the lack of clear homology assignments of primary and secondary structural positions between aGPCR and PKD1 GAIN domains. Thus, we next employed the GAIN-GRN scheme to allocate positional labels in PKD1 GAIN domains and compare them to those of aGPCR GAIN folds. Since no experimental structure of PKD1 GAIN domains is available yet, we prepared 2,738 structural models analogously to the aGPCR dataset 63 – 65 . We applied the GAIN-GRN scheme to the models, which on average resulted in four subdomain A α-helices and twelve subdomain B β-strands recognized by the GAIN-GRN method (Fig. 2 b), thus structural elements homologous to aGPCR GAIN domains. Interestingly, we also observed differences to aGPCR GAIN domain layouts as the PKD1 GAIN domains showed an additional β-sandwich fold, which contains an extension of S10 and C-terminally elongated TA. Finally, we also observed up to a total of eight subdomain A helices with additional, unindexed helices (Supplementary Fig. 3). In sum, the GAIN-GRN scheme provides a robust strategy to allocate structural homologies at the primary and secondary levels also to GAIN folds of PKD1 molecules, which now renders positions in both GAIN domain types comparable to one another. Discussion The GAIN domain is an ancient extracellular protein domain of the large adhesion GPCR family, involved in neural development, hereditary disorders and cancer 1 – 4 . 
Despite recent insights obtained from high-resolution structures of GAIN domains in complex with the 7TM 18 , the GAIN domain function in autoproteolysis, mechanosensing and TA-dependent receptor activation is still poorly understood. To overcome the limitations imposed by the structural heterogeneity of GAIN domains in aGPCR homologs - the variable number of secondary structure segments and overall low sequence identity - we developed the GAIN-GRN as a generic residue numbering scheme for aGPCR GAIN domains. We used spatial alignments of structural models, generating multiple sequence alignments to define the reference residue position as the most conserved residue in each segment 39 , 48 , 51 . The GAIN-GRN is based on GAIN domain models predicted by AlphaFold 2/Colabfold to include most GAIN domains in proteins present in the Uniprot database 41 , 42 . To aid users in employing GAIN GRNs for data analysis and hypothesis-generation, we implemented the GAIN-GRN in the GPCRdb serving as an accessible and established resource (Fig. 4 ). We also show that the GAIN-GRN is a robust tool to assign structure-homologous residues across molecule families as we have retrieved GRN also for PKD1 GAIN domains. Statistical evaluation of the dataset of aGPCR GAIN domains enables us to assess their composition as well as the spatial and positional conservation of information as reflected by the GAIN contactome (Supplementary Fig. 7–8). The evolutionarily conserved two-subdomain architecture of the GAIN domain is present in humans as well as distantly related organisms such as Trichoplax adhaerens 66 – 70 . Subdomain B, containing the autoproteolytic cleavage site and the tethered agonist, is structurally less variable consisting of 12–14 β-strands, in agreement with its implied function in NTF-CTF association, force-dependent GAIN domain separation and mechanosensing 26 , 27 , 29 . 
Notably, our analysis underlines the notion that the known "GPS motif" is not an individual protein domain, as initially anticipated, but rather the C-terminal section of Subdomain B 7, 8. By contrast, subdomain A shows high structural heterogeneity with only two critically conserved helices (H4 and H6, Fig. 2), whose core regions form an interface with subdomain B and presumably stabilize it (Supplementary Fig. 7–8). Despite structural heterogeneity, our structure-based alignment reveals highly conserved stretches of residues and segments with low overall sequence identity but similar physicochemical properties (Supplementary Fig. 1), thus corroborating the notion that structural conservation outweighs sequence conservation 71. Creating structure-based alignments of larger protein sets with representatives in humans enables us to structurally map benign and malign mutations for testing in homologous positions of distantly related proteins. For example, the mutations within the newly coined “VWWL” motif (Fig. 5) close to the GPS may now be tested in any model system based on their GRN index. Analogously, we can now assess the location of known pathological mutations: Avila-Zozaya et al. have investigated cancer-related mutations in ADGRL3, with impacts on G13 signaling for K561N (H1.51), D798H (S9.47), S810L (s9s10) and E811Q (s9s10), where the latter two residues correspond to the interaction region of the GAIN domain with the seven-transmembrane domain 19, 53. Two mutations responsible for loss of surface expression in GPR56, causing bilateral frontoparietal polymicrogyria (BFPP), are the highly conserved C346S (S10.47) and W349S (S10.50) 6, 72, 73. More generally, our approach promotes future experiments focusing on the central role of GAIN domains in physiological functions of aGPCRs and PKD1 molecules, and will aid analyses on how structural anomalies contribute to their dysfunction under disease conditions.
Methods

All computational pipelines were implemented in Python 3.9.

Generation of the GAIN domain model dataset

Sequences were retrieved from the UniProtKB database with two queries, one for adhesion GPCR and one for CELSR, yielding 22,946 and 2,179 sequences, respectively (Supplementary Fig. 5). Sequences were filtered for a minimum length of 50 residues and the presence of a “GPS” domain annotation in their domain records. The C-terminal sequence boundary was read from the “GPS” Domain record, whereas for the N-terminal boundary, lengths exceeding 800 residues were truncated, resulting in 16,537 sequences. The structures of all processed sequences containing potential GAIN domains were predicted with ColabFold 46, 47 by using batches of 30 length-sorted sequences with a pre-defined padding to account for sequence length differences per batch. A multiple sequence alignment was constructed from the initial 15,957 successfully folded and non-doublet aGPCR/CELSR sequences using MAFFT 74 for localizing the GPS motif (SI Methods). The secondary structure information of the resulting folded structures was read out with STRIDE 75. The data from the resulting files were used to apply two criteria for a valid GAIN domain: the presence of both the helical Subdomain A and the β-sandwich Subdomain B, and the existence of the GPS or a homologously aligned sequence. The filtered dataset consists of 14,435 valid GAIN domains (Supplementary Fig. 5). The human dataset consists of 31 aGPCR GAIN domains.

GAIN Domain Detection

The presence of both subdomains was detected by using a numerical transformation of the sequence, assigning 1 to helical and −1 to beta-strand residues. By using linear convolution, a signal was generated, whose sign changes were detected as boundaries between helical and sheet-like protein segments.
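The detection step above can be sketched in plain Python. This is an illustration under stated assumptions, not the published pipeline: the secondary structure is given as a string in which "H" denotes helix and "E" denotes strand (all other states are encoded as 0), the kernel is a uniform box filter, and the window size of 5 and all function names are hypothetical.

```python
from typing import List


def encode(states: str) -> List[int]:
    """Encode helix as +1, strand as -1 and every other state as 0."""
    return [1 if s == "H" else -1 if s == "E" else 0 for s in states]


def convolve_same(values: List[int], kernel: List[int]) -> List[int]:
    """Linear convolution, cropped to the input length and centered."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        total = 0
        for k, weight in enumerate(kernel):
            j = i + half - k  # input index paired with kernel element k
            if 0 <= j < len(values):
                total += weight * values[j]
        out.append(total)
    return out


def segment_boundaries(states: str, window: int = 5) -> List[int]:
    """Return the indices at which the smoothed signal changes sign."""
    signal = convolve_same(encode(states), [1] * window)
    boundaries = []
    previous_sign = 0
    for i, value in enumerate(signal):
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # a zero carries no helix/strand preference
        if previous_sign != 0 and sign != previous_sign:
            boundaries.append(i)
        previous_sign = sign
    return boundaries


print(segment_boundaries("HHHHHHHHCCEEEEEEEE"))  # prints [9]
```

The smoothing suppresses isolated residues of the opposite type, so that only sustained changes from helical to sheet-like character register as a boundary.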
The presence of both subdomains was confirmed by identifying the largest helical segment adjacent to a C-terminal sheet-like segment corresponding to Subdomain B in each respective structure. The signal decay N-terminal of Subdomain A, caused by the presence of non-helical residues, was used to determine the GAIN domain boundary for each generated model. The column index of the GPS-1 residue (corresponding to leucine in the conserved HL|S/T triad) was set as reference for detecting the presence of a GPS motif or homologous aligned sequence elements. Any structure showing a residue at the corresponding column in the MSA was considered to possess the GPS, therefore satisfying the second criterion.

Template Model Selection

Template candidates were extracted by selecting 400 random structures from the GAIN domain models of each subfamily and generating a root-mean-square deviation (RMSD) matrix by pairwise alignment of the respective subdomain using GESAMT in the CCP4.0 package 50, 76. The matrix was clustered and sorted using agglomerative clustering via the scikit-learn Python package 77, and the lowest-RMSD model of the largest cluster was selected as a candidate template. Candidate templates were checked against each receptor sub-selection of the dataset via occupancy (fraction of structures matching the template anchor) and distance (pairwise Cα-Cα distance). After filtering out badly matched receptors from the initial set, additional templates selected from individual receptor selections were added, reaching a total of 15 subdomain A and two subdomain B templates for the complete indexing. Segment center residues for each element were generated by pairwise aligning all GAIN domains against each candidate template using GESAMT and collecting all pairwise residue matches into a multiple sequence alignment, finding the position of highest occupancy and residue identity.
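Finding a segment center from such an alignment can be sketched as a scan over alignment columns. The text does not state how occupancy and residue identity are combined, so the sketch below assumes their product as the score (which equals the fraction of all sequences carrying the most common residue); the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def column_stats(rows: List[str], col: int) -> Tuple[float, float]:
    """Return (occupancy, identity) of one alignment column.

    Occupancy is the fraction of sequences with a residue (not a gap "-")
    in the column; identity is the share of the most common residue among
    the occupied positions.
    """
    residues = [row[col] for row in rows if row[col] != "-"]
    if not residues:
        return 0.0, 0.0
    occupancy = len(residues) / len(rows)
    identity = Counter(residues).most_common(1)[0][1] / len(residues)
    return occupancy, identity


def segment_center(rows: List[str]) -> int:
    """Return the column index with the highest occupancy * identity.

    Assumption: the product is used as the combined score; ties resolve
    to the first (most N-terminal) column.
    """
    def score(col: int) -> float:
        occupancy, identity = column_stats(rows, col)
        return occupancy * identity

    return max(range(len(rows[0])), key=score)


# Toy alignment of one segment from four models (hypothetical data).
alignment = ["AELV", "-DLI", "GEMV", "AEM-"]
print(segment_center(alignment))  # prints 1
```

In the toy alignment, the second column is fully occupied and dominated by glutamate, so it would receive the .50 index of the segment.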
Segment centers were validated and manually curated by 3D alignment of all candidate templates and verification that the anchor occupies an identical position in space. The position of the H4 segment center was manually adjusted to avoid ambiguities with the H5 residue center. The unique orientation of the most N-terminal helix of ADGRD1, ADGRE1 and ADGRF4 yielded three individual segment centers. Each receptor GAIN domain was assigned one template per subdomain to be matched to by default.

Segment Overlap and ambiguity cases

In some cases of low-quality proteins, secondary structure elements (SSE) of template and GAIN domain overlapped without a pairwise match of the template anchor. In these cases, the match closest to the template anchor was set as the reference position considering the offset (i.e. “S14.47” when the residue is three residues N-terminal of the template S14.50) and enumerated analogously from there. Anchor ambiguity cases arose when two elements were detected as one by STRIDE with two template center residues matched, although the spatial orientation of the two SSE was distinguishable. These cases were handled by a hierarchical segment splitting routine that assesses the segment between both matched segment centers by the following criteria in decreasing priority: presence of a coiled residue; a residue with backbone angles outside of five standard deviations of the element's total distribution in the dataset; presence of a proline or glycine; and a manually defined truncation element for common occurrences.

Creating the template set

Templates are defined as consensus structural models used for structural alignment of other GAIN domain models for segment identification and indexing. Templates were defined separately for GAIN Subdomain A and B. The definition of the template set consisted of three steps: identifying candidate template structures, finding their center positions, and assessing their coverage and quality for integration into the final template set (Supplementary Fig. 4).
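The offset rule described under "Segment Overlap and ambiguity cases" amounts to numbering a segment from a reference residue whose index need not be 50. A minimal sketch, assuming residues are addressed by integer sequence numbers; the function name `label_segment`, its parameters and the residue numbers in the example are hypothetical.

```python
from typing import Dict


def label_segment(
    segment: str,
    start: int,
    end: int,
    reference_residue: int,
    reference_index: int = 50,
) -> Dict[int, str]:
    """Label residues start..end (inclusive) of one segment.

    reference_residue is the sequence number of the residue matched to the
    template, and reference_index is the index it receives: 50 for a direct
    match of the template center, or an offset value such as 47 when the
    closest match lies three residues N-terminal of the template center.
    Indices decrease toward the N-terminus and increase toward the
    C-terminus.
    """
    if not start <= reference_residue <= end:
        raise ValueError("reference residue lies outside the segment")
    return {
        residue: f"{segment}.{reference_index + residue - reference_residue}"
        for residue in range(start, end + 1)
    }


# Direct match: residue 102 is the segment center.
print(label_segment("H6", 100, 104, 102))
# Offset case: residue 200 is the closest match, three residues
# N-terminal of the template center, so it is labeled S14.47.
print(label_segment("S14", 200, 203, 200, reference_index=47))
```

In the offset case the residue three positions downstream (203) still receives S14.50, so labels remain comparable with models that matched the template center directly.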
Templates have the center residues of each segment already assigned based on structural alignments of all template structures (Fig. 3b).

Indexing via GAIN-GRN

Each GAIN domain was pairwise aligned to its assigned subdomain A and subdomain B template. GAIN domains not assigned a receptor were structurally aligned to all templates using GESAMT 50, selecting the lowest-RMSD template for each subdomain. For each SSE, the residue matching the template center was labeled “##.50” with the corresponding element name (H1-H6, S1-S14), and all residues in the SSE were enumerated with numbers decreasing in the N-terminal and increasing in the C-terminal direction. Each ordered residue in the GAIN domain was assigned a label, and the labels were exported in tabular form. An additional workflow was created in an interactive notebook enabling the assignment of the GAIN-GRN for any protein with an associated model in the AlphaFold DB 78, either by retrieving the information about the GPS from the UniProt database or by manually defining the C-terminal GAIN boundary.

Mutation Mapping onto GRN positions

Mutations were retrieved from the Cancer Genome Atlas (TCGA, within the Genomic Data Commons, https://portal.gdc.cancer.gov/) for each of the 31 human aGPCRs, yielding a total of 6,874 individual mutations. A routine was implemented to correct the residue indices of the GAIN domain residues to match the UniProtKB indices. By matching each position, we assigned the GRN to each mutation occurring within the indexed GAIN domain space, a total of 861 mutations, of which 769 were within ordered segments with individual labels. Additionally, we implemented a routine to parse the mapped mutations, map the number of mutations and their occurrence onto any GRN-mapped GAIN domain, and filter mutations by the impact metrics SIFT and PolyPhen 79, 80 to tailor the query routine to the individual purpose.
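The position-matching step can be sketched as a dictionary lookup from UniProtKB residue numbers to GRN labels, followed by a count per label. This is a simplified illustration rather than the published routine: the function name `map_mutations`, the input structures and the example numbers are hypothetical, and filtering by SIFT or PolyPhen is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def map_mutations(
    grn_by_position: Dict[int, str],
    mutated_positions: Iterable[int],
) -> Tuple[Counter, int]:
    """Count mutations per GRN label for one receptor.

    grn_by_position maps UniProtKB residue numbers to GRN labels for the
    ordered segments of the GAIN domain. Mutations at positions without an
    individual label (for example in loops or outside the GAIN domain) are
    tallied separately.
    """
    per_label: Counter = Counter()
    unlabeled = 0
    for position in mutated_positions:
        label = grn_by_position.get(position)
        if label is None:
            unlabeled += 1
        else:
            per_label[label] += 1
    return per_label, unlabeled


# Hypothetical labels and mutation positions for a single receptor.
labels = {100: "H6.49", 101: "H6.50", 102: "H6.51"}
counts, unlabeled = map_mutations(labels, [101, 101, 102, 150])
print(dict(counts), unlabeled)  # prints {'H6.50': 2, 'H6.51': 1} 1
```

Summing the per-label counters of several receptors then gives the number of mutations observed at each GRN position across the family.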
In our example, cancer-enriched positions were extracted by calculating the number of cancer-associated mutations against the number of natural variants extracted from dbSNP ( www.nbci.nlm.nih.gov/snp/ ) analogous to Wright et al., 2019 54 . Contact Frequencies For each of the 14,435 GAIN domain structures in the dataset, heavy-atom residue-residue contacts were computed using a distance cutoff of 4 Å. All pairs of residues sharing a contact are aggregated into a single contact matrix which is indexed with GRN labels. Some elements of this matrix are shown partially as well as in full in Supplementary Table 2–4. The computation of contacts, GRN label-handling and plotting (flareplot and contact-matrix) was done using mdciao 81 . Declarations Acknowledgments We like to thank Peter Stadler and Franziska Reinhardt of Leipzig University for providing initial GAIN domain multiple sequence alignments, and Albert J. Kooistra for his consultations. This work was funded by the Deutsche Forschungsgemeinschaft (DFG), project number 421152132, SFB 1423, subproject A06, C01, Z04. This work was funded by grants from the Lundbeck Foundation (R383-2022-306) and Novo Nordisk Foundation (NNF23OC0082561) to D.E.G. Author Contributions F.S. Conceptualization; Methodology; Validation; Investigation; Software; Formal Analysis; Visualization; Writing – original draft; Writing – review and editing; Data Curation P.W.H. Conceptualization; Supervision; Funding Acquisition; Validation; Writing – original draft (supporting); Writing – review and editing G.P.-H. Conceptualization, Methodology, Validation, Writing – review and editing; Visualization G.P.-S. Software; Resources; Validation; Visualization; Data Curation R.G.-G. Conceptualization (supporting); Validation (supporting) T.L. Conceptualization (supporting); Writing – original draft (supporting); Writing – review and editing D.E.G. 
Conceptualization; Supervision; Funding Acquisition; Validation; Writing – original draft (part and supporting); Writing – review and editing

The authors declare the following competing interests: D.E.G. is a part-time employee and warrant holder at Kvantify.

Data availability

The generated GAIN domain models have been deposited in the online repository Zenodo (DOI 10.5281/zenodo.12515545). The generated code and interactive notebooks are available at https://github.com/FloSeu/GAIN-GRN.

References

1. Chiang N-Y et al (2017) GPR56/ADGRG1 Activation Promotes Melanoma Cell Migration via NTF Dissociation and CTF-Mediated Gα12/13/RhoA Signaling. J Invest Dermatology 137:727–736
2. Scholz N (2018) Cancer Cell Mechanics: Adhesion G Protein-coupled Receptors in Action? Front Oncol 8
3. Kan Z et al (2010) Diverse somatic mutation patterns and pathway alterations in human cancers. Nature 466:869–873
4. Langenhan T, Piao X, Monk KR (2016) Adhesion G protein-coupled receptors in nervous system development and disease. Nat Rev Neurosci 17:550–561
5. Batebi H et al (2024) Mechanistic insights into G-protein coupling with an agonist-bound G-protein-coupled receptor. Nat Struct Mol Biol 1–10. doi:10.1038/s41594-024-01334-2
6. Prömel S, Langenhan T, Araç D (2013) Matching structure with function: The GAIN domain of Adhesion-GPCR and PKD1-like proteins. Trends Pharmacol Sci 34:470–478
7. Liao Y, Pei J, Cheng H, Grishin NV (2014) An ancient autoproteolytic domain found in GAIN, ZU5 and Nucleoporin98. J Mol Biol 426:3935–3945
8. Araç D et al (2012) A novel evolutionarily conserved domain of cell-adhesion GPCRs mediates autoproteolysis. EMBO J 31:1364–1378
9. Pohl F et al (2023) Structural basis of GAIN domain autoproteolysis and cleavage-resistance in the adhesion G-protein coupled receptors. bioRxiv. doi:10.1101/2023.03.12.532270
10. Liebscher I et al (2014) A Tethered Agonist within the Ectodomain Activates the Adhesion G Protein-Coupled Receptors GPR126 and GPR133. Cell Rep 9:2018–2026
11. Stoveken HM, Hajduczok AG, Xu L, Tall GG (2015) Adhesion G protein-coupled receptors are activated by exposure of a cryptic tethered agonist. Proc Natl Acad Sci 112:6194–6199
12. Mathiasen S et al (2020) G12/13 is activated by acute tethered agonist exposure in the adhesion GPCR ADGRL3. Nat Chem Biol 16:1343–1350
13. Zhu B et al (2019) GAIN domain-mediated cleavage is required for activation of G protein-coupled receptor 56 (GPR56) by its natural ligands and a small-molecule agonist. J Biol Chem 294:19246–19254
14. Paavola KJ, Stephenson JR, Ritter SL, Alter SP, Hall RA (2011) The N Terminus of the Adhesion G Protein-coupled Receptor GPR56 Controls Receptor Signaling Activity. J Biol Chem 286:28914–28921
15. Frenster JD et al (2021) Functional impact of intramolecular cleavage and dissociation of adhesion G protein–coupled receptor GPR133 (ADGRD1) on canonical signaling. J Biol Chem 296:100798
16. Yang L et al (2011) GPR56 Regulates VEGF Production and Angiogenesis during Melanoma Progression. Cancer Res 71:5558–5568
17. Stoveken HM, Hajduczok AG, Xu L, Tall GG (2015) Adhesion G protein-coupled receptors are activated by exposure of a cryptic tethered agonist. Proc Natl Acad Sci 112:6194–6199
18. Seufert F, Chung YK, Hildebrand PW, Langenhan T (2023) 7TM domain structures of adhesion GPCRs: what’s new and what’s missing? Trends Biochem Sci 48:726–739
19. Mao C et al (2024) Conformational transitions and activation of the adhesion receptor CD97. Mol Cell. doi:10.1016/j.molcel.2023.12.020
20. Scholz N et al (2015) The Adhesion GPCR Latrophilin/CIRL Shapes Mechanosensation. Cell Rep 11:866–874
21. Petersen SC et al (2015) The Adhesion GPCR GPR126 Has Distinct, Domain-Dependent Functions in Schwann Cell Development Mediated by Interaction with Laminin-211. Neuron 85:755–769
22. Wilde C et al (2016) The constitutive activity of the adhesion GPCR GPR114/ADGRG5 is mediated by its tethered agonist. FASEB J 30:666–673
23. Liu D et al (2022) CD97 promotes spleen dendritic cell homeostasis through the mechanosensing of red blood cells. Science 375:eabi5965
24. Boyden SE et al (2016) Vibratory Urticaria Associated with a Missense Variant in ADGRE2. N Engl J Med 374:656–663
25. Scholz N et al (2023) Molecular sensing of mechano- and ligand-dependent adhesion GPCR dissociation. Nature 615:945–953
26. Fu C et al (2023) Step-wise mechanical unfolding and dissociation of the GAIN domains of ADGRG1/GPR56, ADGRL1/Latrophilin-1 and ADGRB3/BAI3: insights into the mechanical activation hypothesis of adhesion G protein-coupled receptors. bioRxiv 2023.03.14.532526. doi:10.1101/2023.03.14.532526
27. Dumas L et al (2023) Uncovering and engineering the mechanical properties of the adhesion GPCR ADGRG1 GAIN domain. bioRxiv
28. Beliu G et al (2021) Tethered agonist exposure in intact adhesion/class B2 GPCRs through intrinsic structural flexibility of the GAIN domain. Mol Cell 81:905–921.e5
29. Zhong BL et al (2023) Piconewton Forces Mediate GAIN Domain Dissociation of the Latrophilin-3 Adhesion GPCR. Nano Lett 23:9187–9194
30. Xiao P et al (2022) Tethered peptide activation mechanism of the adhesion GPCRs ADGRG2 and ADGRG4. Nature 604:771–778
31. Ping YQ et al (2021) Structures of the glucocorticoid-bound adhesion receptor GPR97–Go complex. Nature 589:620–626
32. Barros-Álvarez X et al (2022) The tethered peptide activation mechanism of adhesion GPCRs. Nature 604:757–762
33. Ping YQ et al (2022) Structural basis for the tethered peptide activation of adhesion GPCRs. Nature 604:763–770
34. Sun Y et al (2021) Optimization of a peptide ligand for the adhesion GPCR ADGRG2 provides a potent tool to explore receptor biology. J Biol Chem 296:100174
35. Leon K et al (2020) Structural basis for adhesion G protein-coupled receptor Gpr126 function. Nat Commun 11:194
36. Salzman GS et al (2016) Structural Basis for Regulation of GPR56/ADGRG1 by Its Alternatively Spliced Extracellular Domains. Neuron 91:1292–1304
37. Chu TY et al (2022) GPR97 triggers inflammatory processes in human neutrophils via a macromolecular complex upstream of PAR2 activation. Nat Commun 13:6385
38. Ballesteros JA, Weinstein H (1995) Integrated methods for the construction of three-dimensional models and computational probing of structure-function relations in G protein-coupled receptors. Methods Neurosciences 25:366–428
39. Isberg V et al (2015) Generic GPCR residue numbers - Aligning topology maps while minding the gaps. Trends Pharmacol Sci 36:22–31
40. Wootten D, Simms J, Miller LJ, Christopoulos A, Sexton PM (2013) Polar transmembrane interactions drive formation of ligand-specific and signal pathway-biased family B G protein-coupled receptor conformations. Proc Natl Acad Sci 110:5211–5216
41. van Linden OPJ, Kooistra AJ, Leurs R, de Esch IJP, de Graaf C (2014) KLIFS: A Knowledge-Based Structural Database To Navigate Kinase–Ligand Interaction Space. J Med Chem 57:249–277
42. Kanev GK, de Graaf C, Westerman BA, de Esch IJP, Kooistra AJ (2020) KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Res 49:gkaa895
43. Kanev GK et al (2019) The Landscape of Atypical and Eukaryotic Protein Kinases. Trends Pharmacol Sci 40:818–832
44. Flock T et al (2015) Universal allosteric mechanism for Gα activation by GPCRs. Nature 524:173–179
45. Sente A et al (2018) Molecular mechanism of modulating arrestin conformation by GPCR phosphorylation. Nat Struct Mol Biol 25:538–545
46. Mirdita M et al (2022) ColabFold: making protein folding accessible to all. Nat Methods 19:679–682
47. Jumper J et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589
48. Kooistra AJ et al (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function. Nucleic Acids Res 49:D335–D343
49. Pándy-Szekeres G et al (2018) GPCRdb in 2018: Adding GPCR structure models and ligands. Nucleic Acids Res 46:D440–D446
50. Krissinel E (2012) Enhanced fold recognition using efficient short fragment clustering. J Mol Biochem 1:76
51. Isberg V et al (2014) GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 42:D422–D425
52. Moreno-Salinas AL et al (2022) Convergent selective signaling impairment exposes the pathogenicity of latrophilin-3 missense variants linked to inheritable ADHD susceptibility. Mol Psychiatry 27:2425–2438
53. Avila-Zozaya M, Rodríguez-Hernández B, Monterrubio-Ledezma F, Cisneros B, Boucard AA (2022) Thwarting of Lphn3 Functions in Cell Motility and Signaling by Cancer-Related GAIN Domain Somatic Mutations. Cells 11:1913
54. Wright SC et al (2019) A conserved molecular switch in Class F receptors regulates receptor activation and pathway selection. Nat Commun 10
55. Lin H et al (2022) Structures of the ADGRG2–Gs complex in apo and ligand-bound forms. Nat Chem Biol. doi:10.1038/s41589-022-01084-6
56. Bernadyn TF, Vizurraga A, Adhikari R, Kwarcinski F, Tall GG (2023) GPR114/ADGRG5 is activated by its tethered-peptide-agonist because it is a cleaved adhesion GPCR. J Biol Chem 105223. doi:10.1016/j.jbc.2023.105223
57. Kishore A, Purcell RH, Nassiri-Toosi Z, Hall RA (2016) Stalk-dependent and stalk-independent signaling by the adhesion G protein-coupled receptors GPR56 (ADGRG1) and BAI1 (ADGRB1). J Biol Chem 291:3385–3394
58. Müller A et al (2015) Oriented Cell Division in the C. elegans Embryo Is Coordinated by G-Protein Signaling Dependent on the Adhesion GPCR LAT-1. PLoS Genet 11
59. Scholz N et al (2017) Mechano-dependent signaling by latrophilin/CIRL quenches cAMP in proprioceptive neurons. eLife 6
60. Monk KR et al (2009) A G Protein–Coupled Receptor Is Essential for Schwann Cells to Initiate Myelination. Science 325:1402–1405
61. Langenhan T et al (2015) Model Organisms in G Protein–Coupled Receptor Research. Mol Pharmacol 88:596–603
62. Bergmann C et al (2018) Polycystic kidney disease. Nat Rev Dis Prim 4:50
63. Qian F et al (2002) Cleavage of polycystin-1 requires the receptor for egg jelly domain and is disrupted by human autosomal-dominant polycystic kidney disease 1-associated mutations. Proc Natl Acad Sci 99:16981–16986
64. Yu S et al (2007) Essential role of cleavage of Polycystin-1 at G protein-coupled receptor proteolytic site for kidney tubular structure. Proc Natl Acad Sci 104:18688–18693
65. Wei W, Hackmann K, Xu H, Germino G, Qian F (2007) Characterization of cis-autoproteolysis of polycystin-1, the product of human polycystic kidney disease 1 gene. J Biol Chem 282:21729–21737
66. Scholz N, Langenhan T, Schöneberg T (2019) Revisiting the classification of adhesion GPCRs. Ann N York Acad Sci 1456:80–95
67. Nordström KJV, Lagerström MC, Wallér LMJ, Fredriksson R, Schiöth HB (2009) The Secretin GPCRs Descended from the Family of Adhesion GPCRs. Mol Biol Evol 26:71–84
68. Dohrmann M, Wörheide G (2017) Dating early animal evolution using phylogenomic data. Sci Rep 7:3599
69. Wittlake A, Prömel S, Schöneberg T (2021) The Evolutionary History of Vertebrate Adhesion GPCRs and Its Implication on Their Classification. Int J Mol Sci 22:11803
70. Krishnan A et al (2014) The GPCR repertoire in the demosponge Amphimedon queenslandica: insights into the GPCR system at the early divergence of animals. BMC Evol Biol 14:270
71. Illergård K, Ardell DH, Elofsson A (2009) Structure is three to ten times more conserved than sequence – A study of structural response in protein cores. Proteins Struct Funct Bioinform 77:499–508
72. Piao X et al (2005) Genotype–phenotype analysis of human frontoparietal polymicrogyria syndromes. Ann Neurol 58:680–687
73. Chang G-W et al (2016) The Adhesion G Protein-Coupled Receptor GPR56/ADGRG1 Is an Inhibitory Receptor on Human NK Cells. Cell Rep 15:1757–1770
74. Russell RB, Barton GJ (1992) Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins Struct Funct Bioinform 14:309–323
75. Frishman D, Argos P (1995) Knowledge-based protein secondary structure assignment. Proteins Struct Funct Bioinform 23:566–579
76. Agirre J et al (2023) The CCP4 suite: integrative software for macromolecular crystallography. Acta Crystallogr Sect D Struct Biol 79:449–461
77. Pedregosa F et al (2011) Scikit-Learn: Machine Learning in Python. J Mach Learn Res 12:2825–2830
78. Varadi M et al (2022) AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50:D439–D444
79. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11:863–874
80. Adzhubei I, Jordan DM, Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protocols Hum Genet, Chapter 7, Unit 7.20
81. Pérez-Hernández G, Hildebrand PW (2022) mdciao: Accessible Analysis and Visualization of Molecular Dynamics Simulation Data. bioRxiv 2022.07.15.500163. doi:10.1101/2022.07.15.500163
82. Munk C, Harpsøe K, Hauser AS, Isberg V, Gloriam DE (2016) Integrating structural and mutagenesis data to elucidate GPCR ligand binding. Curr Opin Pharmacol 30:51–58
83. Collins RL et al (2020) A structural variation reference for medical and population genetics. Nature 581:444–451
84. Vincent F et al (2016) Toward a Shared Vision for Cancer Genomic Data. N Engl J Med 375:1109–1112
Supplementary Files

SupplementaryFig.docx, SupplementaryTable1.xlsx, SupplementaryTable2.xlsx, SupplementaryTable3.docx, SupplementaryTable4.docx
Published version (Nature Communications): https://doi.org/10.1038/s41467-024-55466-6

Figure 1. The variable topology of the GPCR autoproteolysis inducing (GAIN) domain. a, aGPCR topology with the N-terminal extracellular region composed of various extracellular domains (ECDs) and the GAIN domain directly N-terminal of the seven-transmembrane domain (TM). The GAIN domain is composed of subdomains A and B, with the tethered agonist (TA) as the most C-terminal strand. The GAIN domain is frequently preceded by a hormone receptor motif (HRM) domain of unknown function. b, the GAIN domain is composed of two subdomains, with subdomain A (blue) composed of 2-6 helices with conservation decreasing towards the N-terminal boundary. The beta-sandwich subdomain B (orange) is composed of 13-14 strands with a conserved autoproteolytic cleavage triad (GPS) of sequence HL|T/S (red triangle: cleavage site), followed by the tethered agonist (TA, yellow). c, the GAIN domain of rat ADGRL1 (PDB ID: 4DLQ) shows all hallmarks of the GAIN domain.

Figure 2. GAIN domain helix and strand architecture of human aGPCRs and conservation across orthologs. a, Example chimeric GAIN domain structure showing all six helices, 14 strands and the GPS on its topology; the red asterisk marks the autoproteolytic cleavage site N-terminal of the tethered agonist (S14). b, each row represents all orthologs in the UniProtKB database that have a GAIN domain for each receptor, where individual elements are highlighted by occupancy (blue: subdomain A helices, orange: subdomain B strands); higher color intensity represents a higher conservation of the element within the group of orthologs. White circles denote elements that are not present in the corresponding human aGPCR GAIN domain (ADGRA1 without GAIN; ADGRE4 is a pseudogene in human). Other receptors (red label) are aGPCRs without a receptor ortholog in humans. A set of 2,872 polycystic kidney disease-type proteins (PKD, green label) have GAIN domains, which were matched against the set of aGPCR templates, matching well with an additional beta-sandwich subdomain between extended S9 and S10. c, residue conservation for residues indexed with the GAIN-GRN for subdomain A (shades of blue) and subdomain B (shades of orange), based on 14,435 GAIN domains. Unindexed residues are colored green.

Figure 3. Generic residue numbering denotes corresponding residues across receptors. GRNs are equivalent residues in structure and sequence across receptors, enabling comparisons of e.g., mutations 39,51,82, sequence conservation 39, structural contacts and ligand interactions 48,82. a, a GRN is composed of the segment identifier and the numeric residue index. The structural alignment of the GAIN helix 6 segment of nine human adhesion GPCRs indicates the GRN at the respective aligned residue Cα (spheres). b, sequence alignment of the nine receptors with the receptor subfamily indicated in bold in front of the receptor name. The sequence alignment is based on the structural alignment, with the H6.50 GRN denoting the most conserved residue in the segment (bold). All other segment residues are indexed relative to the .50 position. c, residue table of aligned segments colored by chemical properties.

Figure 4. Online tools enabling use of GAIN GRNs. a, Interactive notebook for assigning GRNs to any GAIN-containing, possibly unknown protein via the AlphaFoldDB model based on its UniProtKB identifier. The model is retrieved, the GAIN domain extracted and its boundaries detected, then the GAIN domain is matched against the available templates to assign the GRNs based on the best template matches. b-e, GRN data and tools made available in the GPCR database, GPCRdb. b, the snake plot of the GAIN domain provides a simple 2D representation with the option of custom coloring. c, Generic residue numbering tables show GRNs followed by receptor-specific residue numbers and amino acids. d, Sequence alignments allow swift sequence comparison across all GAIN segments (helices and strands) as well as conservation (% identity and consensus sequence) and physicochemical properties (residue polarity, size, helical propensity and z scales). e, The Sequence signature tool identifies structural determinants – uniquely conserved residues – upon contrasting of two sets of sequence alignments of receptors that have and lack the given function, respectively 48.

Figure 5. Cancer Genome Atlas mutations mapped onto the GAIN domain via GRNs show mutational hotspots. Enrichment scores were calculated via a relation between the number of naturally occurring variants (retrieved from the Genome Aggregation Database (gnomAD), accessed on Jan 16th, 2023 from https://registry.opendata.aws/broad-gnomad) and cancer-associated mutations (retrieved from The Cancer Genome Atlas (TCGA) Genomic Data Commons, GDC, portal.gdc.cancer.gov) at a GRN according to the formula from Wright et al., 2019. a, the ten most cancer-enriched positions in humans, with the S4 element excluded because the number of mutations and variants was too low. b, the ten most natural variant enriched positions, showing negative association with cancer, analogous to the mutation enrichment score. c, enriched positions mapped onto the human ADGRD1 GAIN domain model (UniProt ID: Q6QNK2), with values above 0.1 of maximum intensity colored for cancer enriched (red sticks) and variant enriched (lavender) positions. d, a cluster of cancer enriched positions shows the most conserved residues of four strand segments (S7.50, S10.50, S11.50 and S14.50) contacting each other. All positions are part of the ten most cancer-enriched positions. e, logoplots of residue conservation (fraction of total structures, with 1.0 meaning that the position is conserved in all 14,435 GAIN domains in the dataset) for the enriched cluster show strong residue conservation for the VWWL motif composed of V(S7.50), W(S10.50), W(S11.50) and L(S14.50).
The available set of GAIN domain structures indicates a common architecture with two structurally variable subdomains: the more variable subdomain A comprises up to six helices, and the more conserved subdomain B adopts a β-sandwich with the TA as its most C-terminal strand (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eb-c)\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan additionalcitationids=\"CR36\" citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e\u003c/sup\u003e. The low sequence identity of GAIN domains and variable number of constituting segments, however, led to inadequate annotations of the GAIN domain in protein databases, hampering inter-species comparison of GAIN domains and limiting a holistic understanding of GAIN domain function.\u003c/p\u003e \u003cp\u003eGeneric residue numbers (GRNs) provide a common index for corresponding amino acids across the different members of a protein family. GRNs have great utility as they enable comparison and inference of a multitude of residue data spanning pharmacology (e.g., \u003cem\u003ein vitro\u003c/em\u003e mutations), structural biology (e.g., ligand, domain or protein interactions) and genetics (natural variants). For GPCRs, the first GRN scheme was that of \u003cem\u003eBallesteros-Weinstein\u003c/em\u003e and assigned residue indices in the 7TM domains of class A GPCRs\u003csup\u003e\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. The number 50 is given to the most conserved residue in each of the seven helices, which serves as a reference when assigning consecutive numbers to upstream and downstream residues. 
This system has since been adapted to other GPCR classes\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e, including the Wootten numbering scheme for the class B1 (Secretin) and B2 (Adhesion) receptor families\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e. As GPCR structures became available, these sequence-based schemes were found to suffer from non-generic numbers when some receptors have helix bulges or constrictions, which cause a one-position residue gap in the structural alignment and an offset of the following residues in the sequence alignment\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. To mitigate this numbering issue, the GPCR database (GPCRdb) provided structure-based GRN schemes for each GPCR class wherein structural residue gaps are also present as single gaps in the sequence alignment\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. The GPCRdb schemes also added helix 8 (H8) and structurally conserved stretches of the first extra- and intracellular loops. Highly flexible and variable protein regions such as loops remain unannotated in currently established schemes. GRNs have seen wide adoption among researchers, becoming a frequently used tool in communicating GPCR research. 
Other GRN schemes like the kinase\u0026thinsp;\u0026minus;\u0026thinsp;ligand interaction fingerprints and structure database (KLIFS)\u003csup\u003e\u003cspan additionalcitationids=\"CR42\" citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u003c/sup\u003e, the common G protein Gα numbering (CGN)\u003csup\u003e\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e\u003c/sup\u003e or the common arrestin numbering (CAN)\u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e serve to map functional protein networks or to support drug development in additional protein families.\u003c/p\u003e \u003cp\u003eHere, we introduce a GRN scheme for aGPCR GAIN domains based on the superposition of more than 14,000 structural GAIN domain models generated with ColabFold/AlphaFold 2\u003csup\u003e46,47\u003c/sup\u003e. We highlight structural variability and common features of all aGPCR GAIN domains and demonstrate position-specific data transfer by finding cancer-enriched positions in humans. The GAIN-GRN was implemented into the GPCRdb\u003csup\u003e\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e,\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e to allow for intuitive use in a highly accessible and widely adopted resource, as well as programmatic access to the data. Additionally, we provide a web-based notebook enabling the \u003cem\u003ead-hoc\u003c/em\u003e indexing of any GAIN domain-containing protein. 
These results promote future experiments that focus on the central role of GAIN domains in the signal transduction and physiological functions of aGPCRs, and will aid analyses on how structural anomalies contribute to aGPCR dysfunction under disease conditions.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eThe heterogeneity of the GAIN domain necessitates structure-based residue numbering\u003c/h2\u003e \u003cp\u003eA comprehensive analysis of GAIN domains by means of multiple sequence alignments fails due to their low sequence identity and variable number of segments (α-helices and β-strands)\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e,\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e,\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. Thus, to enable a comprehensive description of the GAIN domain, we opted for a structure-based approach, for which we generated a set of 14,435 GAIN domain models encompassing orthologs of the 33 mammalian aGPCR and 916 non-ortholog proteins with ColabFold/AlphaFold 2\u003csup\u003e46,47\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eIn order to assess the composition of both GAIN subdomains, we used structural alignments with GESAMT\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e for indexing segments. Using this approach, the segment position in space determines its index instead of their sequence-based order, allowing the assignment of equivalent positions for GRN indexing in the context of variable domain composition. 
Based on the complete set of GAIN domain models, we asserted that subdomains A and B are composed of two to six helices and 12 to 14 strands, respectively, which we indexed using the identifiers H1-6 and S1-14 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). Subdomain B exhibits generally high segment conservation, with only strand 4 specific to subfamilies A and G (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). The composition of subdomain A is more variable. While the A, B, C, F and L subfamilies all have six helices, the D and G subfamilies show heterogeneity ranging from two to six helices (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). Structures reflecting subdomain A variability are for example the rat ADGRL1 with a six-helix bundle (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003ec)\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e, and the human ADGRG1 GAIN domain with only Helix 4 and 6\u003csup\u003e36\u003c/sup\u003e, the two most conserved helices in the dataset.\u003c/p\u003e \u003cp\u003eWhen looking at individual residue positions (to which GRN labels are assigned, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec), notably the center regions of helix 3, 4 and 6 are more frequently occupied (occupancy referring to the fraction of models containing a segment in the dataset) than the extreme positions, highlighting varying helix lengths in the model dataset, with especially the L subfamily exhibiting longer helices. Aside from the residues in the less conserved strands 4 and 7, there is high occupancy in subdomain B. The unindexed GAIN domain loops connecting the structured elements show very different lengths, frequently exceeding 50 residues (Supplementary Fig.\u0026nbsp;6). 
Notably, a total of 84 homologs of ADGRA1 in 47 species have a GAIN domain, which is not found in human ADGRA1\u003csup\u003e8\u003c/sup\u003e, whereas 78 GAIN domains were identified for ADGRE4, which is a pseudogene in human (Supplementary Table\u0026nbsp;1).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eGeneric residue numbering denotes corresponding GAIN domain amino acids across receptors\u003c/h2\u003e \u003cp\u003eBased on the GAIN-GRN indexing of all 14,435 structural models, we created comprehensive alignments of the GAIN domain in structure and in sequence (Supplementary Fig.\u0026nbsp;1). These provide a novel utility to map data across all adhesion GPCRs and cross-map sequence-position specific data between homologs. A schematic of the GRN assignment process is outlined in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. Each GRN consists of the segment identifier (e. g. Helix 6 = \u0026ldquo;H6\u0026rdquo;) and the respective index relative to the most conserved residue in the segment, separated by a dot (e.g. \u0026ldquo;H6.50\u0026rdquo;). In this example, representative GAIN domains from each aGPCR subfamily are structurally aligned, with the Cα-atoms corresponding to GRN positions (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea). Using the backbone alignment positions as the basis for a sequence alignment, the most conserved position is identified here as the acidic E/D and gets assigned the \u0026ldquo;center\u0026rdquo; .50 index (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb). A residue table of the aligned segments highlights variation in segment lengths and reveals positions with similar physicochemical properties, e.g. 
the H6.50 as acidic, H6.45, H6.46 and H6.52 as aliphatic, despite low sequence identity (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eGPCRdb resources aiding use of GAIN GRNs\u003c/h2\u003e \u003cp\u003eAs part of the GPCRdb integration, we assigned the 6 helices (H1-6), 14 strands (S1-14), and 21 loops connecting segments (h1h2, h2h3, h3h4, h4h5, h5h6, h6s1, s1s2, s2s3, s3s4, s4s5, s5s6, s6s7, s7s8, s8s9, s9s10, s10s11, s11s12, s12s13, s13gps, gpss14, s14tm1) to all Class B2 (Adhesion) sequences. In addition, the GPS motif has a separate assigned segment with the conventional GPS-2, GPS-1 and GPS\u0026thinsp;+\u0026thinsp;1 notation for the three residues closest to the catalytic site. With the introduction of these new segments, researchers can apply the GPCRdb toolkit to the whole GAIN domain or to selected parts of it. We updated the snake plots of the Class B2 (Adhesion) GPCRs to contain the segments of the GAIN domain (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb). The snake plots can be found on the Receptor page (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://gpcrdb.org/protein/\u003c/span\u003e\u003cspan address=\"https://gpcrdb.org/protein/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) with custom coloring options\u003csup\u003e\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e. 
Along with the already provided data, GAIN domain data is also accessible programmatically via the REST API.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eConsensus contacts stabilizing the GAIN domain fold\u003c/h2\u003e \u003cp\u003eAddressing GAIN domain positions via GRNs enables mapping of any GRN-label-dependent information across aGPCR homologs. 
Corroborating the analysis of tertiary structures, we exploit the GRN indexing to consolidate pairwise residue-residue contacts - particular to each structure - into unified consensus contacts. The entirety of GRN-label pairs occurring over the complete dataset with a given frequency yields the GAIN domain contactome, shown in Supplementary Fig.\u0026nbsp;7 as a flareplot. This plot represents a contact matrix, individually resolving consensus contacts at the residue-level while highlighting contact relationships between the different GAIN segments. The importance of H6 as a \u0026ldquo;hub\u0026rdquo; connecting the subdomains A and B is clearly seen with highly conserved contacts to primarily S6 and also S2, S8, S10, and S14. Furthermore, H4 partially tethers subdomains A and B via highly conserved contacts to S1 and S2. In Supplementary Tables\u0026nbsp;3 and 4, the most frequent inter-domain and GPS contacts are listed individually, respectively. Additionally, we coarse-grained the contactome into the GAIN segments (Supplementary Fig.\u0026nbsp;8) reproducing the tethering structure of the segments regardless of individual contacts, further highlighting H4 and H6 as the segments mediating contacts between both GAIN subdomains.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eA map of cancer-enriched mutations in the GAIN domain\u003c/h2\u003e \u003cp\u003eThe GAIN domain, present in 31 of 32 human aGPCRs, is a mutational hotspot affected in various pathologies\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e,\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. 
To find cancer-enriched positions and differentiate them from variance-enriched positions, we adopted the cancer-enrichment score from Wright \u003cem\u003eet al.\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e for all 31 human aGPCR GAIN domains (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea-b) indexed by the GAIN-GRN. While generally, enriched positions of both types are distributed throughout the GAIN domain (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ec), eight of the ten most cancer-enriched GRN positions are found in subdomain B carrying the TA (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ea, Supplementary Fig.\u0026nbsp;2). We identify a \u0026ldquo;VWWL\u0026rdquo; motif consisting of four conserved top-ten cancer-enriched residues S7.50, S10.50, S11.50 and S14.50. This buried motif is located in direct vicinity of the GPS cleavage site (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003ed-e, Supplementary Fig.\u0026nbsp;1), with mutations known to affect GAIN domain autoproteolysis and TA function: the conserved leucine at the S14.50 position (Supplementary Fig.\u0026nbsp;1) is a TA residue deeply buried into the orthosteric binding site of the 7TM domain in active aGPCR-7TM structures\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan additionalcitationids=\"CR31 CR32 CR33\" citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e,\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e\u003c/sup\u003e, and its mutation led to altered receptor activity\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan 
citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e,\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. Strikingly, the mutation of any tryptophan within the \u0026ldquo;VWWL\u0026rdquo; motif causes loss-of-function in rat ADGRL1\u003csup\u003e8\u003c/sup\u003e. With a comprehensive analysis of human aGPCR GAIN domains, we find a total of 46 cancer-enriched positions (Supplementary Fig.\u0026nbsp;2). By using the GAIN-GRN, homologous cancer-enriched residue positions can now be assigned to any GAIN domain. This allows the transferal of positional information between GAIN domains in different species, particularly from human to model organisms, such as \u003cem\u003eD. melanogaster\u003c/em\u003e, \u003cem\u003eC. elegans\u003c/em\u003e or \u003cem\u003eD. rerio\u003c/em\u003e\u003csup\u003e\u003cspan additionalcitationids=\"CR59 CR60\" citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u003c/sup\u003e. Functional analyses in these model systems can thus provide valuable insights into the molecular causes of cancer mutations in humans in future studies.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eGAIN domains of PKD1/PKD1-like proteins possess an extended topology\u003c/h2\u003e \u003cp\u003eThe only other protein family known to contain GAIN domains are PKD (polycystic kidney disease)1/PKD1-like proteins (here PKD1 for short; also referred to as polycystin-1 [PC1])\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. 
Mutations in PKD1 are responsible for the majority of autosomal dominant kidney PKD, a devastating disorder that entails the development of cysts in the kidney and other organs leading to their eventual failure\u003csup\u003e\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e\u003c/sup\u003e. PKD1 GAIN domains display similar molecular properties as aGPCR GAIN domains with autoproteolytic cleavage resulting in a bipartite NTF-CTF protein layout after proteolysis\u003csup\u003e\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e\u003c/sup\u003e. Enabling the comparison and transferal of experimental and mutational knowledge between aGPCR and PKD1 GAIN folds is the basis for the understanding of similarities and differences between the two, and can offer valuable insights into the cell biological and physiological consequences of GAIN domain functions. However, thus far such transfer has been obstructed by the lack of clear homology assignments of primary and secondary structural positions between aGPCR and PKD1 GAIN domains. Thus, we next employed the GAIN-GRN scheme to allocate positional labels in PKD1 GAIN domains and compare them to those of aGPCR GAIN folds.\u003c/p\u003e \u003cp\u003eSince no experimental structure of PKD1 GAIN domains is available yet, we prepared 2,738 structural models analogously to the aGPCR dataset\u003csup\u003e\u003cspan additionalcitationids=\"CR64\" citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e. We applied the GAIN-GRN scheme to the models, which on average resulted in four subdomain A α-helices and twelve subdomain B β-strands recognized by the GAIN-GRN method (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb), thus structural elements homologous to aGPCR GAIN domains. 
Interestingly, we also observed differences to aGPCR GAIN domain layouts as the PKD1 GAIN domains showed an additional β-sandwich fold, which contains an extension of S10 and C-terminally elongated TA. Finally, we also observed up to a total of eight subdomain A helices with additional, unindexed helices (Supplementary Fig.\u0026nbsp;3).\u003c/p\u003e \u003cp\u003eIn sum, the GAIN-GRN scheme provides a robust strategy to allocate structural homologies at the primary and secondary levels also to GAIN folds of PKD1 molecules, which now renders positions in both GAIN domain types comparable to one another.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe GAIN domain is an ancient extracellular protein domain of the large adhesion GPCR family, involved in neural development, hereditary disorders and cancer\u003csup\u003e\u003cspan additionalcitationids=\"CR2 CR3\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e. Despite recent insights obtained from high-resolution structures of GAIN domains in complex with the 7TM\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e, the GAIN domain function in autoproteolysis, mechanosensing and TA-dependent receptor activation is still poorly understood. To overcome the limitations imposed by the structural heterogeneity of GAIN domains in aGPCR homologs - the variable number of secondary structure segments and overall low sequence identity - we developed the GAIN-GRN as a generic residue numbering scheme for aGPCR GAIN domains. 
We used spatial alignments of structural models, generating multiple sequence alignments to define the reference residue position as the most conserved residue in each segment\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e,\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e,\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e. The GAIN-GRN is based on GAIN domain models predicted by AlphaFold 2/ColabFold to include most GAIN domains in proteins present in the UniProt database\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. To aid users in employing GAIN GRNs for data analysis and hypothesis generation, we implemented the GAIN-GRN in the GPCRdb, which serves as an accessible and established resource (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). We also show that the GAIN-GRN is a robust tool to assign structure-homologous residues across molecule families, as we have also retrieved GRNs for PKD1 GAIN domains.\u003c/p\u003e \u003cp\u003eStatistical evaluation of the dataset of aGPCR GAIN domains enables us to assess their composition as well as the spatial and positional conservation of information as reflected by the GAIN \u003cem\u003econtactome\u003c/em\u003e (Supplementary Fig.\u0026nbsp;7\u0026ndash;8). The evolutionarily conserved two-subdomain architecture of the GAIN domain is present in humans as well as distantly related organisms such as \u003cem\u003eTrichoplax adhaerens\u003c/em\u003e\u003csup\u003e\u003cspan additionalcitationids=\"CR67 CR68 CR69\" citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e\u003c/sup\u003e. 
Subdomain B, containing the autoproteolytic cleavage site and the tethered agonist, is structurally less variable consisting of 12\u0026ndash;14 β-strands, in agreement with its implied function in NTF-CTF association, force-dependent GAIN domain separation and mechanosensing\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e,\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. Notably, our analysis underlines the notion that the known \"GPS motif\" is not an individual protein domain, as initially anticipated, but rather the C-terminal section of Subdomain B\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. By contrast, subdomain A, shows high structural heterogeneity with only two critically conserved helices (H4 and H6, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), with their core regions forming an interface with and presumably stabilizing subdomain B (Supplementary Fig.\u0026nbsp;7\u0026ndash;8). Despite structural heterogeneity, our structure-based alignment reveals highly conserved stretches of residues and segments with low overall sequence identity but similar physicochemical properties (Supplementary Fig.\u0026nbsp;1), thus corroborating the notion that structural conservation outweighs sequence conservation\u003csup\u003e\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eCreating structure-based alignments of larger protein sets with representatives in humans enables us to structurally map benign and malign mutations for testing in homologous positions of distantly related proteins. 
For example, the mutations within the newly coined \u0026ldquo;VWWL\u0026rdquo; motif (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) close to the GPS may now be tested in any model system based on their GRN index. Analogously, we can now assess the location of known pathological mutations: Avila-Zozaya \u003cem\u003eet al.\u003c/em\u003e have investigated cancer-related mutations in ADGRL3, with impacts on G\u003csub\u003e13\u003c/sub\u003e-signaling for K561N\u003csup\u003eH1.51\u003c/sup\u003e, D798H\u003csup\u003eS9.47\u003c/sup\u003e, S810L\u003csup\u003es9s10\u003c/sup\u003e and E811Q\u003csup\u003es9s10\u003c/sup\u003e, where the latter two residues correspond to the interaction region of the GAIN domain with the seven-transmembrane domain\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. 
Two mutations responsible for loss of surface expression in GPR56, causing bilateral frontoparietal polymicrogyria (BFPP), are the highly conserved C346S\u003csup\u003eS10.47\u003c/sup\u003e and W349S\u003csup\u003eS10.50\u003c/sup\u003e ref\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e,\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e\u003c/sup\u003e. More generally, our approach promotes future experiments focusing on the central role of GAIN domains in physiological functions of aGPCRs and PKD1 molecules, and will aid analyses on how structural anomalies contribute to their dysfunction under disease conditions.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eAll computational pipelines were implemented in Python 3.9.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eGeneration of the GAIN domain model dataset\u003c/h2\u003e \u003cp\u003eSequences were retrieved from the UniProtKB database with two queries for adhesion GPCR and CELSR, yielding 22,946 and 2,179 sequences, respectively (Supplementary Fig.\u0026nbsp;5). Sequences were filtered for a minimum length of 50 residues and the presence of a \u0026ldquo;GPS\u0026rdquo; domain annotation in their domain records. The C-terminal sequence boundary was read from the \u0026ldquo;GPS\u0026rdquo; Domain record, whereas for the N-terminal boundary, lengths exceeding 800 residues were truncated, resulting in 16,537 sequences. 
The structures of all processed sequences containing potential GAIN domains were predicted with ColabFold\u003csup\u003e\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e,\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u003c/sup\u003e by using batches of 30 length-sorted sequences with a pre-defined padding to account for sequence length differences per batch. A multiple sequence alignment was constructed from the initial 15,957 successfully folded and non-doublet aGPCR/CELSR sequences using MAFFT\u003csup\u003e\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e\u003c/sup\u003e for localizing the GPS motif (SI Methods).\u003c/p\u003e \u003cp\u003eThe secondary structure information of the resulting folded structures was read out with STRIDE\u003csup\u003e\u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e\u003c/sup\u003e. The data from the resulting files was used to apply two criteria for a valid GAIN domain: the presence of both the helical Subdomain A and the β-sandwich Subdomain B, and the existence of the GPS or a homologously aligned sequence. The filtered dataset consists of 14,435 valid GAIN domains (Supplementary Fig.\u0026nbsp;5). The human dataset consists of 31 aGPCR GAIN domains.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eGAIN Domain Detection\u003c/h2\u003e \u003cp\u003eThe presence of both subdomains was detected by using a numerical transformation of the sequence, assigning a 1 to helical and \u0026minus;\u0026thinsp;1 to beta-strand residues. By using linear convolution, a signal was generated, whose sign changes were detected as boundaries between helical and sheet-like protein segments. 
The presence of both subdomains was confirmed by identifying the largest helical segment adjacent to a C-terminal sheet-like segment corresponding to Subdomain in each respective structure. 
The signal decay N-terminal of Subdomain A, caused by the presence of non-helical residues, was used to determine the GAIN domain boundary for each generated model. The column index of the GPS-1 residue (corresponding to leucine in the conserved HL|S/T triad) was set as reference for detecting the presence of a GPS motif or homologous aligned sequence elements. Any structure showing a residue at the corresponding column in the MSA was considered to possess the GPS, thereby satisfying the second criterion.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eTemplate Model Selection\u003c/h2\u003e \u003cp\u003eTemplate candidates were extracted by randomly selecting 400 structures from the GAIN domain models of each subfamily and generating a root-mean-square deviation (RMSD) matrix by pairwise alignment using GESAMT in the CCP4.0 package\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e,\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e\u003c/sup\u003e on the respective subdomain. The matrix was clustered and sorted using agglomerative clustering via the scikit-learn Python package\u003csup\u003e\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e\u003c/sup\u003e, and the lowest-RMSD model of the largest cluster was selected as a candidate template. Candidate templates were checked against each receptor sub-selection of the dataset via occupancy (fraction of structures matching the template anchor) and distance (pairwise Cα-Cα distance). 
After filtering out poorly matched receptors from the initial set, additional templates were selected from individual receptor selections and added, reaching a total of 15 subdomain A and two subdomain B templates for the complete indexing.\u003c/p\u003e \u003cp\u003eSegment center residues for each element were generated by pairwise aligning all GAIN domains against each candidate template using GESAMT and collecting all pairwise residue matches into a multiple sequence alignment, then finding the position of highest occupancy and residue identity. Segment centers were validated and manually curated by 3-D alignment of all candidate templates and verification that the anchor occupies an identical position in space. The position of the H4 segment center was manually adjusted to avoid ambiguities with the H5 residue center. The unique orientation of the most N-terminal helix of ADGRD1, ADGRE1 and ADGRF4 yielded three individual segment centers. Each receptor GAIN was assigned a default template per subdomain to be matched against.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eSegment Overlap and ambiguity cases\u003c/h2\u003e \u003cp\u003eFor some cases of low-quality proteins, the SSEs of template and GAIN overlapped without a pairwise match of the template anchor. In these cases, the match closest to the template anchor was set as the reference position, taking the offset into account (i.e. \u0026ldquo;S14.47\u0026rdquo; when the residue is three residues N-terminal of the template S14.50), and the remaining residues were enumerated analogously from there.\u003c/p\u003e \u003cp\u003eAnchor ambiguity cases arose when two elements were detected as one by STRIDE with two template center residues matched, although the spatial orientations of the two SSEs were distinguishable. 
These cases were handled by a hierarchical segment splitting routine that assesses the segment between both matched segment centers in decreasing priority: presence of a coiled residue, a residue with backbone angles outside of five standard deviations of the element total distribution in the dataset, presence of a proline or glycine, and a manually defined truncation element for common occurrences.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eCreating the template set\u003c/h2\u003e \u003cp\u003eTemplates are defined as consensus structural models used for structural alignment of other GAIN domain models for segment identification and indexing. Templates were defined separately for GAIN Subdomain A and B. The definition of the template set consisted of three steps: identifying candidate template structures, finding their center positions, and assessing their coverage and quality for integration into the final template set (Supplementary Fig.\u0026nbsp;4). Templates have the center residues of each segment already assigned based on structural alignments of all template structures (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eIndexing via GAIN-GRN\u003c/h2\u003e \u003cp\u003eEach GAIN domain was pairwise aligned to its assigned subdomain A and subdomain B templates. GAIN domains not assigned a receptor were structurally aligned to all templates using GESAMT\u003csup\u003e\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e\u003c/sup\u003e, selecting the lowest-RMSD template for each subdomain. For each SSE, the residue matching the template center was labeled \u0026ldquo;##.50\u0026rdquo; with the corresponding element name (H1-H6, S1-S14), and all other residues in the SSE were enumerated with numbers decreasing in the N-terminal and increasing in the C-terminal direction. 
Each ordered residue in the GAIN was assigned a label, and the labels were exported in tabular form. An additional workflow was created in an interactive notebook enabling the assignment of the GAIN-GRN for any protein with an associated model in the AlphaFold DB\u003csup\u003e\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e\u003c/sup\u003e by either retrieving the information about the GPS from the UniProt database or manually defining the C-terminal GAIN boundary.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eMutation Mapping onto GRN positions\u003c/h2\u003e \u003cp\u003eMutations were retrieved from The Cancer Genome Atlas (TCGA, within the Genomic Data Commons \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://portal.gdc.cancer.gov/\u003c/span\u003e\u003cspan address=\"https://portal.gdc.cancer.gov/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) for each of the 31 human aGPCRs, yielding a total of 6,874 individual mutations. A routine was implemented to correct the residue indices of the GAIN domain residues to match the UniProtKB indices. By matching each position, we assigned the GRN to each mutation occurring within the indexed GAIN domain space, a total of 861 mutations, of which 769 were within ordered segments with individual labels. Additionally, we implemented a routine to parse the mapped mutations, map the number of mutations and their occurrence onto any GRN-mapped GAIN domain, and filter mutations by the impact metrics SIFT and PolyPhen\u003csup\u003e\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e,\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e\u003c/sup\u003e, so that queries can be tailored to the individual purpose. 
In our example, cancer-enriched positions were extracted by comparing the number of cancer-associated mutations with the number of natural variants extracted from dbSNP (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e\u003ca href=\"https://www.ncbi.nlm.nih.gov/snp/\" target=\"_blank\"\u003ewww.ncbi.nlm.nih.gov/snp/\u003c/a\u003e\u003c/span\u003e\u003cspan address=\"http://www.ncbi.nlm.nih.gov/snp/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), analogous to Wright et al., 2019\u003csup\u003e54\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eContact Frequencies\u003c/h2\u003e \u003cp\u003eFor each of the 14,435 GAIN domain structures in the dataset, heavy-atom residue-residue contacts were computed using a distance cutoff of 4 \u0026Aring;. All pairs of residues sharing a contact were aggregated into a single contact matrix, which is indexed with GRN labels. Parts of this matrix, as well as the full matrix, are shown in Supplementary Tables\u0026nbsp;2\u0026ndash;4. The computation of contacts, GRN label-handling and plotting (flareplot and contact-matrix) were done using mdciao\u003csup\u003e\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to thank Peter Stadler and Franziska Reinhardt of Leipzig University for providing initial GAIN domain multiple sequence alignments, and Albert J. Kooistra for his consultations. This work was funded by the Deutsche Forschungsgemeinschaft (DFG), project number 421152132, SFB 1423, subprojects A06, C01 and Z04. 
This work was funded by grants from the Lundbeck Foundation (R383-2022-306) and Novo Nordisk Foundation (NNF23OC0082561) to D.E.G.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eF.S. Conceptualization; Methodology; Validation; Investigation; Software; Formal Analysis; Visualization; Writing \u0026ndash; original draft; Writing \u0026ndash; review and editing; Data Curation\u003c/p\u003e\n\u003cp\u003eP.W.H. Conceptualization; Supervision; Funding Acquisition; Validation; Writing \u0026ndash; original draft (supporting); Writing \u0026ndash; review and editing\u003c/p\u003e\n\u003cp\u003eG.P.-H. Conceptualization; Methodology; Validation; Writing \u0026ndash; review and editing; Visualization\u003c/p\u003e\n\u003cp\u003eG.P.-S. Software; Resources; Validation; Visualization; Data Curation\u003c/p\u003e\n\u003cp\u003eR.G.-G. Conceptualization (supporting); Validation (supporting)\u003c/p\u003e\n\u003cp\u003eT.L. Conceptualization (supporting); Writing \u0026ndash; original draft (supporting); Writing \u0026ndash; review and editing\u003c/p\u003e\n\u003cp\u003eD.E.G. Conceptualization; Supervision; Funding Acquisition; Validation; Writing \u0026ndash; original draft (part and supporting); Writing \u0026ndash; review and editing\u003c/p\u003e\n\u003cp\u003eThe authors declare the following competing interests: D.E.G. is a part-time employee and warrant holder at Kvantify.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe generated GAIN domain models have been deposited in the online repository Zenodo (DOI 10.5281/zenodo.12515545). The generated code and interactive notebooks are available at https://github.com/FloSeu/GAIN-GRN.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eChiang N-Y et al (2017) GPR56/ADGRG1 Activation Promotes Melanoma Cell Migration via NTF Dissociation and CTF-Mediated Gα12/13/RhoA Signaling. J Invest Dermatology 137:727\u0026ndash;736\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScholz N (2018) Cancer Cell Mechanics: Adhesion G Protein-coupled Receptors in Action? Front Oncol 8\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKan Z et al (2010) Diverse somatic mutation patterns and pathway alterations in human cancers. Nature 466:869\u0026ndash;873\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangenhan T, Piao X, Monk KR (2016) Adhesion G protein-coupled receptors in nervous system development and disease. Nat Rev Neurosci 17:550\u0026ndash;561\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBatebi H et al (2024) Mechanistic insights into G-protein coupling with an agonist-bound G-protein-coupled receptor. Nat Struct Mol Biol 1\u0026ndash;10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41594-024-01334-2\u003c/span\u003e\u003cspan address=\"10.1038/s41594-024-01334-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePr\u0026ouml;mel S, Langenhan T, Ara\u0026ccedil; D (2013) Matching structure with function: The GAIN domain of Adhesion-GPCR and PKD1-like proteins. Trends Pharmacol Sci 34:470\u0026ndash;478\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiao Y, Pei J, Cheng H, Grishin NV (2014) An ancient autoproteolytic domain found in GAIN, ZU5 and Nucleoporin98. 
J Mol Biol 426:3935\u0026ndash;3945\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAra\u0026ccedil; D et al (2012) A novel evolutionarily conserved domain of cell-adhesion GPCRs mediates autoproteolysis. EMBO J 31:1364\u0026ndash;1378\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePohl F et al (2023) Structural basis of GAIN domain autoproteolysis and cleavage-resistance in the adhesion G-protein coupled receptors. bioRxiv. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2023.03.12.532270\u003c/span\u003e\u003cspan address=\"10.1101/2023.03.12.532270\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiebscher I et al (2014) A Tethered Agonist within the Ectodomain Activates the Adhesion G Protein-Coupled Receptors GPR126 and GPR133. Cell Rep 9:2018\u0026ndash;2026\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStoveken HM, Hajduczok AG, Xu L, Tall GG (2015) Adhesion G protein-coupled receptors are activated by exposure of a cryptic tethered agonist. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e 112, 6194\u0026ndash;6199\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMathiasen S et al (2020) G12/13 is activated by acute tethered agonist exposure in the adhesion GPCR ADGRL3. Nat Chem Biol 16:1343\u0026ndash;1350\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu B et al (2019) GAIN domain-mediated cleavage is required for activation of G protein-coupled receptor 56 (GPR56) by its natural ligands and a small-molecule agonist. J Biol Chem 294:19246\u0026ndash;19254\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaavola KJ, Stephenson JR, Ritter SL, Alter SP, Hall RA (2011) The N Terminus of the Adhesion G Protein-coupled Receptor GPR56 Controls Receptor Signaling Activity. 
J Biol Chem 286:28914\u0026ndash;28921\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFrenster JD et al (2021) Functional impact of intramolecular cleavage and dissociation of adhesion G protein\u0026ndash;coupled receptor GPR133 (ADGRD1) on canonical signaling. J Biol Chem 296:100798\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang L et al (2011) GPR56 Regulates VEGF Production and Angiogenesis during Melanoma Progression. Cancer Res 71:5558\u0026ndash;5568\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStoveken HM, Hajduczok AG, Xu L, Tall GG (2015) Adhesion G protein-coupled receptors are activated by exposure of a cryptic tethered agonist. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 112, 6194\u0026ndash;6199\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSeufert F, Chung YK, Hildebrand PW, Langenhan T (2023) 7TM domain structures of adhesion GPCRs: what\u0026rsquo;s new and what\u0026rsquo;s missing? Trends Biochem Sci 48:726\u0026ndash;739\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMao C et al (2024) Conformational transitions and activation of the adhesion receptor CD97. Mol Cell. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.molcel.2023.12.020\u003c/span\u003e\u003cspan address=\"10.1016/j.molcel.2023.12.020\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScholz N et al (2015) The Adhesion GPCR Latrophilin/CIRL Shapes Mechanosensation. Cell Rep 11:866\u0026ndash;874\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePetersen SC et al (2015) The Adhesion GPCR GPR126 Has Distinct, Domain-Dependent Functions in Schwann Cell Development Mediated by Interaction with Laminin-211. 
Neuron 85:755\u0026ndash;769\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilde C et al (2016) The constitutive activity of the adhesion GPCR GPR114/ADGRG5 is mediated by its tethered agonist. FASEB J 30:666\u0026ndash;673\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu D et al (2022) CD97 promotes spleen dendritic cell homeostasis through the mechanosensing of red blood cells. Science 375:eabi5965\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoyden SE et al (2016) Vibratory Urticaria Associated with a Missense Variant in ADGRE2. N Engl J Med 374:656\u0026ndash;663\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScholz N et al (2023) Molecular sensing of mechano- and ligand-dependent adhesion GPCR dissociation. Nature 615:945\u0026ndash;953\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFu C et al (2023) Step-wise mechanical unfolding and dissociation of the GAIN domains of ADGRG1/GPR56, ADGRL1/Latrophilin-1 and ADGRB3/BAI3: insights into the mechanical activation hypothesis of adhesion G protein-coupled receptors. \u003cem\u003ebioRxiv\u003c/em\u003e 2023.03.14.532526 \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2023.03.14.532526\u003c/span\u003e\u003cspan address=\"10.1101/2023.03.14.532526\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDumas L et al (2023) Uncovering and engineering the mechanical properties of the adhesion GPCR ADGRG1 GAIN domain. \u003cem\u003ebioRxiv\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBeliu G et al (2021) Tethered agonist exposure in intact adhesion/class B2 GPCRs through intrinsic structural flexibility of the GAIN domain. 
Mol Cell 81:905\u0026ndash;921e5\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhong BL et al (2023) Piconewton Forces Mediate GAIN Domain Dissociation of the Latrophilin\u0026ndash;3 Adhesion GPCR. Nano Lett 23:9187\u0026ndash;9194\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiao P et al (2022) Tethered peptide activation mechanism of the adhesion GPCRs ADGRG2 and ADGRG4. Nature 604:771\u0026ndash;778\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePing YQ et al (2021) Structures of the glucocorticoid-bound adhesion receptor GPR97\u0026ndash;Go complex. Nature 589:620\u0026ndash;626\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarros-\u0026Aacute;lvarez X et al (2022) The tethered peptide activation mechanism of adhesion GPCRs. Nature 604:757\u0026ndash;762\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePing YQ et al (2022) Structural basis for the tethered peptide activation of adhesion GPCRs. Nature 604:763\u0026ndash;770\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun Y et al (2021) Optimization of a peptide ligand for the adhesion GPCR ADGRG2 provides a potent tool to explore receptor biology. J Biol Chem 296:100174\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeon K et al (2020) Structural basis for adhesion G protein-coupled receptor Gpr126 function. Nat Commun 11:194\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSalzman GS et al (2016) Structural Basis for Regulation of GPR56/ADGRG1 by Its Alternatively Spliced Extracellular Domains. Neuron 91:1292\u0026ndash;1304\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChu TY et al (2022) GPR97 triggers inflammatory processes in human neutrophils via a macromolecular complex upstream of PAR2 activation. 
Nat Commun 13:6385\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBallesteros JA, Weinstein H (1995) Integrated methods for the construction of three-dimensional models and computational probing of structure-function relations in G protein-coupled receptors. Methods Neurosciences 25:366\u0026ndash;428\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIsberg V et al (2015) Generic GPCR residue numbers - Aligning topology maps while minding the gaps. Trends Pharmacol Sci 36:22\u0026ndash;31\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWootten D, Simms J, Miller LJ, Christopoulos A, Sexton PM (2013) Polar transmembrane interactions drive formation of ligand-specific and signal pathway-biased family B G protein-coupled receptor conformations. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 110, 5211\u0026ndash;5216\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Linden OPJ, Kooistra AJ, Leurs R, de Esch IJP, de Graaf C (2014) KLIFS: A Knowledge-Based Structural Database To Navigate Kinase\u0026ndash;Ligand Interaction Space. J Med Chem 57:249\u0026ndash;277\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKanev GK, de Graaf C, Westerman BA, de Esch IJP, Kooistra AJ (2020) KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Res 49:gkaa895\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKanev GK et al (2019) The Landscape of Atypical and Eukaryotic Protein Kinases. Trends Pharmacol Sci 40:818\u0026ndash;832\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFlock T et al (2015) Universal allosteric mechanism for Gα activation by GPCRs. Nature 524:173\u0026ndash;179\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSente A et al (2018) Molecular mechanism of modulating arrestin conformation by GPCR phosphorylation. 
Nat Struct Mol Biol 25:538\u0026ndash;545\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMirdita M et al (2022) ColabFold: making protein folding accessible to all. Nat Methods 19:679\u0026ndash;682\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJumper J et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583\u0026ndash;589\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKooistra AJ et al (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function. Nucleic Acids Res 49:D335\u0026ndash;D343\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eP\u0026aacute;ndy-Szekeres G et al (2018) GPCRdb in 2018: Adding GPCR structure models and ligands. Nucleic Acids Res 46:D440\u0026ndash;D446\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrissinel E (2012) Enhanced fold recognition using efficient short fragment clustering. J Mol Biochem 1:76\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIsberg V et al (2014) GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 42:D422\u0026ndash;D425\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoreno-Salinas AL et al (2022) Convergent selective signaling impairment exposes the pathogenicity of latrophilin-3 missense variants linked to inheritable ADHD susceptibility. Mol Psychiatry 27:2425\u0026ndash;2438\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAvila-Zozaya M, Rodr\u0026iacute;guez-Hern\u0026aacute;ndez B, Monterrubio-Ledezma F, Cisneros B, Boucard AA (2022) Thwarting of Lphn3 Functions in Cell Motility and Signaling by Cancer-Related GAIN Domain Somatic Mutations. \u003cem\u003eCells\u003c/em\u003e 11, 1913\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWright SC et al (2019) A conserved molecular switch in Class F receptors regulates receptor activation and pathway selection. 
Nat Commun 10\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin H et al (2022) Structures of the ADGRG2\u0026ndash;Gs complex in apo and ligand-bound forms. Nat Chem Biol. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41589-022-01084-6\u003c/span\u003e\u003cspan address=\"10.1038/s41589-022-01084-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBernadyn TF, Vizurraga A, Adhikari R, Kwarcinski F, Tall GG (2023) GPR114/ADGRG5 is activated by its tethered-peptide-agonist because it is a cleaved adhesion GPCR. J Biol Chem 105223. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jbc.2023.105223\u003c/span\u003e\u003cspan address=\"10.1016/j.jbc.2023.105223\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKishore A, Purcell RH, Nassiri-Toosi Z, Hall RA (2016) Stalk-dependent and stalk-independent signaling by the adhesion G protein-coupled receptors GPR56 (ADGRG1) and BAI1 (ADGRB1). J Biol Chem 291:3385\u0026ndash;3394\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eM\u0026uuml;ller A et al (2015) Oriented Cell Division in the C. elegans Embryo Is Coordinated by G-Protein Signaling Dependent on the Adhesion GPCR LAT-1. PLoS Genet 11\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScholz N et al (2017) Mechano-dependent signaling by latrophilin/CIRL quenches cAMP in proprioceptive neurons. \u003cem\u003eeLife\u003c/em\u003e 6\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMonk KR et al (2009) A G Protein\u0026ndash;Coupled Receptor Is Essential for Schwann Cells to Initiate Myelination. 
Science 325:1402\u0026ndash;1405\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangenhan T et al (2015) Model Organisms in G Protein\u0026ndash;Coupled Receptor Research. Mol Pharmacol 88:596\u0026ndash;603\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBergmann C et al (2018) Polycystic kidney disease. Nat Rev Dis Prim 4:50\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQian F et al (2002) Cleavage of polycystin-1 requires the receptor for egg jelly domain and is disrupted by human autosomal-dominant polycystic kidney disease 1-associated mutations. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e 99, 16981\u0026ndash;16986\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu S et al (2007) Essential role of cleavage of Polycystin-1 at G protein-coupled receptor proteolytic site for kidney tubular structure. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 104, 18688\u0026ndash;18693\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWei W, Hackmann K, Xu H, Germino G, Qian F (2007) Characterization of cis-autoproteolysis of polycystin-1, the product of human polycystic kidney disease 1 gene. J Biol Chem 282:21729\u0026ndash;21737\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScholz N, Langenhan T, Sch\u0026ouml;neberg T (2019) Revisiting the classification of adhesion GPCRs. Ann N York Acad Sci 1456:80\u0026ndash;95\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNordstr\u0026ouml;m KJV, Lagerstr\u0026ouml;m MC, Wall\u0026eacute;r LMJ, Fredriksson R, Schi\u0026ouml;th HB (2009) The Secretin GPCRs Descended from the Family of Adhesion GPCRs. Mol Biol Evol 26:71\u0026ndash;84\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDohrmann M, W\u0026ouml;rheide G (2017) Dating early animal evolution using phylogenomic data. 
Sci Rep 7:3599\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWittlake A, Pr\u0026ouml;mel S, Sch\u0026ouml;neberg T (2021) The Evolutionary History of Vertebrate Adhesion GPCRs and Its Implication on Their Classification. Int J Mol Sci 22:11803\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrishnan A et al (2014) The GPCR repertoire in the demosponge Amphimedon queenslandica: insights into the GPCR system at the early divergence of animals. BMC Evol Biol 14:270\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIllerg\u0026aring;rd K, Ardell DH, Elofsson A (2009) Structure is three to ten times more conserved than sequence\u0026mdash;A study of structural response in protein cores. Proteins: Struct Funct Bioinform 77:499\u0026ndash;508\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePiao X et al (2005) Genotype\u0026ndash;phenotype analysis of human frontoparietal polymicrogyria syndromes. Ann Neurol 58:680\u0026ndash;687\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang G-W et al (2016) The Adhesion G Protein-Coupled Receptor GPR56/ADGRG1 Is an Inhibitory Receptor on Human NK Cells. Cell Rep 15:1757\u0026ndash;1770\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRussell RB, Barton GJ (1992) Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins Struct Funct Bioinform 14:309\u0026ndash;323\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFrishman D, Argos P (1995) Knowledge-based protein secondary structure assignment. Proteins Struct Funct Bioinform 23:566\u0026ndash;579\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAgirre J et al (2023) The CCP4 suite: integrative software for macromolecular crystallography. 
Acta Crystallogr Sect D Struct biology 79:449–461
Pedregosa F et al (2011) Scikit-Learn: Machine Learning in Python. J Mach Learn Res 12:2825–2830
Varadi M et al (2022) AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50:D439–D444
Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11:863–874
Adzhubei I, Jordan DM, Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protocols Hum Genet Chapter 7, Unit7.20
Pérez-Hernández G, Hildebrand PW (2022) mdciao: Accessible Analysis and Visualization of Molecular Dynamics Simulation Data. bioRxiv 2022.07.15.500163. https://doi.org/10.1101/2022.07.15.500163
Munk C, Harpsøe K, Hauser AS, Isberg V, Gloriam DE (2016) Integrating structural and mutagenesis data to elucidate GPCR ligand binding. Curr Opin Pharmacol 30:51–58
Collins RL et al (2020) A structural variation reference for medical and population genetics. Nature 581:444–451
Vincent F et al (2016) Toward a Shared Vision for Cancer Genomic Data. N Engl J Med 375:1109–1112

Funding: Deutsche Forschungsgemeinschaft (award 421152132); Lundbeckfonden (award R383-2022-306); Novo Nordisk (award NNF23OC0082561)
Institution: Leipzig University
Keywords: aGPCR, generic residue numbering, databases, GAIN domain, PKD
Subject areas: Structural Biology, General Biochemistry, Bioinformatics
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Preprint version 1 posted 19 July 2024: https://doi.org/10.21203/rs.3.rs-4761600/v1
Version of record published in Nature Communications on 2 January 2025: https://doi.org/10.1038/s41467-024-55466-6

