Microsatellite expansions hidden within the human dark genome are translated in novel and toxic proteins causing muscle and neurodegenerative diseases

preprint OA: closed
Biological Sciences - Article

Nicolas Charlet-Berguerand, Manon Boivin, Jiaxi Yu, Nobuyuki Eura, and 17 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6122917/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 17 Feb, 2026 in Nature Genetics. This is preprint version 1.

Abstract

The vast majority of the human genome is non-coding with one-half composed of repeated DNA elements, including microsatellites that are short repeated sequences of 1 to 6 nucleotides. Expansion of a subset of these microsatellites is the leading cause of over 60 neurological diseases.
However, most of these short tandem repeat expansions are located in sequences annotated as non-coding, thus questioning how these mutations are pathogenic. Here, we found that GGC repeat expansions causing various neurological diseases, including oculopharyngodistal myopathy with or without leukoencephalopathy (OPDM/OPML) and neuronal intranuclear inclusion disease (NIID), while embedded in sequences considered as non-coding, are in reality located within small and previously unrecognized ORFs, resulting in their translation into novel and diverse polyglycine-containing proteins. Antibodies developed against these proteins stain the p62-positive inclusions typical of these diseases. Importantly, the sole expression of these polyglycine-containing proteins recapitulates key features of OPDM/OPML/NIID, namely the formation of p62-positive protein aggregates and locomotor and skeletal muscle alterations associated with neurodegeneration in cell, fly and mouse models. Moreover, these polyglycine proteins show unexpected variations in their interactants, half-life, aggregation and toxicity. These results stress a key role of the specific ORF sequences hosting the GGC repeats to modulate the aggregation and toxic properties of their central polyglycine core. Finally, we identified a pharmacological compound targeting expression of these polyglycine proteins, raising hope to develop a common therapy for these neuromuscular and neurodegenerative diseases. Overall, these results uncover a common and unified pathogenic mechanism for diverse neurological diseases where expansions of GGC repeats are translated in novel and toxic polyglycine-containing proteins driving formation of aggregates, as well as neuronal and muscle cell dysfunctions. Moreover, this work highlights the complexity and richness of the human “dark” proteome and the importance of mutations in yet unrecognized small ORFs resulting in expression of novel and pathogenic proteins in human pathologies. 
Subject areas: Health sciences/Medical research/Experimental models of disease; Biological sciences/Genetics/Mutation/Genomic instability/Microsatellite instability; Biological sciences/Molecular biology/Non-coding RNAs/Long non-coding RNAs; Biological sciences/Neuroscience/Diseases of the nervous system/Neurodegeneration

Keywords: Trinucleotide repeat disorder, non-coding sequences, non-canonical translation, genetic diseases, muscle disorders, neurodegeneration.

INTRODUCTION

About 98% of the human genome consists of sequences annotated as non-coding, half of which are composed of repetitive DNA elements, including microsatellites, which are DNA motifs of 1 to 6 nucleotides repeated in tandem (Venter et al., 2001; Lander et al., 2001; Nurk et al., 2022). These microsatellites, estimated at ~1.5 to 2 million in the human genome, occupy 3 to 5% of our genome and are an essential source of genetic variation as they are highly heterogeneous in size and sequence. Indeed, microsatellites have a very high mutation rate with frequent gain or loss of repeat units, resulting in significant variability in their length and thus contributing substantially to allelic variability between individuals and generations. Consequently, microsatellite variability has important roles in genome evolution, gene regulation and human phenotypic trait diversity (Gymrek et al., 2016; Fotsing et al., 2019; Shi et al., 2023; Verbiest et al., 2023; reviewed in Wright and Todd 2023). However, expansion of a subset of these microsatellites over a threshold size is also the leading cause of various human pathologies, including cancer and inherited diseases (Erwin et al., 2023; reviewed in Malik et al., 2021; Depienne and Mandel 2021). In that respect, > 60 neurodevelopmental, neuromuscular and neurodegenerative monogenic disorders are presently known to be caused by expansions of tri-, tetra-, penta- or hexa-nucleotide repeats.
Remarkably, this number is rapidly increasing as advances in long-read and whole human genome sequencing have revealed ~ 20 novel pathogenic microsatellite expansions causing human genetic diseases in the recent years (Ishiura et al., 2018 ; Corbett et al., 2019 ; Florian et al., 2019 ; Yeetong et al., 2019 ; Cortese et al., 2019 ; Rafehi et al., 2019 ; Ishiura et al., 2019 ; Sone et al., 2019 ; Deng et al., 2019 ; Deng et al., 2020 ; Xi et al., 2021 ; Zeng et al., 2022 ; Pellerin et al., 2023 ; Rafehi et al., 2023; Figueroa et al., 2024 ; Cortese et al., 2024 ). When embedded within an exonic coding sequence, these repeat expansions are consequently translated, resulting in expression of a mutant protein containing a stretch of repeated amino acids. Archetype of this mechanism is the polyglutamine (polyQ) group of diseases, where expansions of CAG repeats, embedded within ORFs of diverse genes, are translated in toxic polyglutamine-containing proteins, ultimately resulting in neuronal cell dysfunctions and neuronal cell death (review in Paulson et al., 2017 ; Stoyas and La Spada, 2018 ; Lieberman et al., 2019 ). However, a majority of microsatellite expansions, notably most of the recently discovered ones, are located in genomic sequences ill-defined and annotated by default as non-coding (5’- and 3’-untranslated regions, introns, antisense RNAs, long non-coding RNAs, etc.; review in Vegezzi et al., 2024 ); thus, questioning how these mutations are pathogenic. Oculopharyngodistal myopathy with or without leukoencephalopathy (OPDM, OMIM #164310; OPML, OMIM #618637) are rare adult-onset and slowly progressive neuromuscular diseases firstly described in 1977 (Satoyoshi and Kinoshita, 1977 ). 
Clinical features of OPDM and OPML comprise ptosis, external ophthalmoplegia, dysphagia and dysarthria associated with facial and distal limb muscle weakness (Uyama et al., 1998 ; van der Sluijs et al., 2004 ; Lu et al., 2008 ; Durmus et al., 2011 ; Zhao et al., 2015 ; Ishiura et al., 2019 ; Gu et al., 2024 ; Shi et al., 2024 ; Pongpakdee et al., 2024 ). Histopathological changes in OPDM and OPML show increased variations in muscle fibre sizes with occurrence of small angular fibres, splitting fibres and increased internalized nuclei, associated with moderate and variable fibrosis and fatty replacement. Besides these classical myopathic signs, OPDM/OPML histopathology is also characterized by the presence of large cytoplasmic rimmed vacuoles and rare, but typical, eosinophilic intranuclear inclusions, which are p62- and ubiquitin-positives (Zhao et al., 2015 ; Deng et al., 2020 ; Saito et al., 2020 ; Ogasawara et al., 2020 ; Kumutpongpanich et al., 2021 ; Ogasawara et al., 2022 a; Ogasawara et al., 2022 b). These inclusions are reminiscent of typical protein aggregates observed in other neurological disorders, but are currently of unknown origin. Importantly, the genetic causes of OPDM and OPML were uncovered only recently as identical expansions of ~ 50 to 200–300 repeats of the tri-nucleotide GGC sequence, however located within diverse genomic regions, transcribed but annotated as non-coding and embedded in at least six different genes: LOC642361, LRP12 , GIPC1, NOTCH2NLC, RILPL1 and ABCD3 (Ishiura et al., 2019 ; Deng et al., 2020 ; Xi et al., 2021 ; Yu et al., 2021 ; Yu et al., 2022 ; Zeng et al., 2022 ; Cortese et al., 2024 ). Consequently, these pathologies are now classified in at least six subtypes according to the gene hosting the pathogenic GGC repeat expansion ( LOC642361 : OPML, LRP12 : OPDM1, GIPC1 : OPDM2, NOTCH2NLC : OPDM3, RILPL1 : OPDM4 and ABCD3 : OPDM5). 
Of interest, recent clinical studies indicate that OPDM and OPML have a much wider clinical spectrum than previously thought, with evidence of neurological manifestations and reports of variable tremor, ataxia, visual disturbance, peripheral neuropathy and/or association with movement disorders, amyotrophic lateral sclerosis or Parkinson disease (Saito et al., 2020; Fan et al., 2022; Pan et al., 2022; Gu et al., 2023; Kume et al., 2023; Hobara et al., 2024; Murayama et al., 2024). Finally, it is striking that OPDM3 shares the same genetic cause as neuronal intranuclear inclusion disease (NIID) (Sone et al., 2019; Ishiura et al., 2019; Tian et al., 2019; Deng et al., 2019). NIID is a neurological disease characterized by variable muscle weakness associated with heterogeneous dysfunctions of the central and peripheral nervous system (Sone et al., 2016; Takahashi-Fujigasaki et al., 2016; Tai et al., 2023; reviewed in Fan et al., 2022; Bao et al., 2023). These genetic similarities and clinical overlaps suggest that OPDM, OPML and NIID belong to a new continuum of neuromuscular and neurodegenerative diseases, which probably share a common pathophysiological mechanism (reviewed in Liufu et al., 2022; Boivin et al., 2022; Ishiura et al., 2023). Moreover, these observations highlight that GGC repeat expansions in OPDM, OPML and NIID are widely pathogenic for both muscle and neuronal cells. However, it remains to be determined how these mutations, located within genomic regions annotated as non-coding, can lead to the formation of protein inclusions and cause muscle and neuronal dysfunctions. Here, we found that the OPDM/OPML GGC repeats located in the long “non”-coding LOC642361 RNA, as well as in the “non-coding” sequences of the GIPC1, NOTCH2NLC and RILPL1 genes, are located within small and previously unrecognized ORFs, resulting in expression of novel proteins where each GGC repeat encodes a glycine amino acid.
Consequently, these GGC repeat expansions are translated into novel polyglycine-containing proteins. Of interest, we found that the GIPC1 small ORF is translated in absence of any ATG start codon, but instead translation initiation takes place at a CTG near-cognate start codon located upstream of the GGC repeats. Near-cognate start codons (CTG, GTG, ACG, TTG) are codons differing from the cognate AUG start codon by one nucleotide, but that can nonetheless initiate translation through mispairing with the initiator methionine-tRNA. Antibodies developed against these diverse proteins confirmed their expression in patients, notably their localization in p62-positive inclusions in muscle sections of individuals with OPDM and OPML. Moreover, expression of these polyglycine proteins in muscle cells and animal models is sufficient to induce formation of the characteristic OPDM/NIID/OPML p62-positive inclusions. Importantly, both mouse and fly models expressing these diverse polyglycine proteins show locomotor alterations associated with muscle fiber atrophy and muscle weakness, as well as tremor and ataxia associated with neurodegeneration and neuroinflammation, thus recapitulating key clinical features of OPDM, OPML and NIID. Of interest, side-by-side comparison of these diverse polyglycine proteins in cell and animal models reveals unexpected variations in their expression, aggregation, half-life, interactants and toxicity, highlighting a key contribution of the specific amino acid sequences originating from their hosting small ORFs. Finally, we tested various pharmacological compounds known to target GGC repeats and/or modulate protein aggregates degradation, and identified the cationic porphyrin TMPyP4 as a potential therapeutic option for these diverse neuromuscular and neurodegenerative disorders. 
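Returning to the definition of near-cognate start codons given earlier in this paragraph, the following Python sketch enumerates every codon that differs from the cognate AUG at exactly one position. The function names are hypothetical and not taken from the study. The enumeration yields nine codons, four of which (CTG, GTG, ACG, TTG) are the examples cited in the text; the sketch says nothing about how efficiently any of them initiates translation.

```python
# Enumerate codons at Hamming distance 1 from the cognate AUG start codon.
# Names below are illustrative assumptions, not part of the original work.
BASES = "ACGU"
COGNATE = "AUG"


def near_cognate_codons(cognate: str = COGNATE) -> list[str]:
    """Return all codons differing from `cognate` at exactly one position."""
    variants = []
    for position, original in enumerate(cognate):
        for base in BASES:
            if base != original:
                variants.append(cognate[:position] + base + cognate[position + 1:])
    return sorted(variants)


def to_dna(codon: str) -> str:
    """Write an RNA codon in DNA notation (U -> T), as used in the text."""
    return codon.replace("U", "T")


if __name__ == "__main__":
    codons = near_cognate_codons()
    print(len(codons), [to_dna(c) for c in codons])
```

Three substitutions are possible at each of the three codon positions, hence nine near-cognate codons in total.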
Overall, these data provide a common and unified pathogenic mechanism for the skeletal muscle and central nervous system dysfunctions observed in individuals with OPDM, NIID and OPML, where expansions of GGC repeats are translated into novel and toxic polyglycine-containing proteins driving formation of p62-positive inclusions and muscle and neuronal cell dysfunctions. Moreover, this study highlights the richness and complexity of the human “dark” genome, notably the existence of numerous uncharted small ORFs in genomic sequences originally annotated as non-coding, resulting in translation of their embedded microsatellite mutations into novel and toxic proteins.

RESULTS

OPDM and OPML GGC repeats are translated into novel polyglycine proteins.

Oculopharyngodistal myopathy with or without leukoencephalopathy (OPDM & OPML) and neuronal intranuclear inclusion disease (NIID) are neurological diseases caused by identical expansions of 50 to 200 GGC repeats located in diverse sequences annotated as non-coding (Fig. 1A and supplementary figures S1A to S1C). The pathogenic mechanism at play in these pathologies is yet to be identified, but a loss of function is unlikely as expression of the genes hosting these GGC mutations is consistently found unaltered in tissue samples from individuals with OPDM, NIID or OPML (Deng et al., 2020; Yu et al., 2022; Zeng et al., 2022; Shi et al., 2024). These observations exclude a classical promoter silencing mechanism but raise the question of how these GGC mutations are pathogenic. As non-canonical translation of repeat expansions is an established mechanism of pathogenicity in microsatellite diseases, notably in NIID (Zu et al., 2011; Boivin et al., 2021; Zhong et al., 2021; reviewed in Gao et al., 2017; Kearse and Wilusz 2017), we investigated the potential translation of the OPDM and OPML mutations.
Three representative non-coding sequences, namely the 5’-untranslated region (5’UTR) of GIPC1, the antisense transcript of RILPL1 and the LOC642361 long non-coding RNA (lncRNA), which respectively cause OPDM2, OPDM4 and OPML, were cloned with their GGC repeats and fused to GFP in the three possible frames potentially encoded by these repeats (Fig. 1A and supplementary figures S1A to S1C). Briefly, (GGC)n-GFP would produce a potential polyglycine-GFP, (GCG)n-GFP may express a putative polyalanine-GFP and (CGG)n-GFP might encode a tentative polyarginine-GFP protein (Fig. 1B, sequences in supplementary figures S1D to S1F). Of technical interest, the natural ATG start codon of the GFP sequence is deleted, so that GFP expression depends on translation initiation occurring within the repeats or inside their hosting sequences. Moreover, this plasmid also contains an independent expression cassette producing the Cherry protein under its own promoter and ATG start codon, making it possible to assess cell transfection efficiency and cell viability independently of GFP expression (Fig. 1B). Importantly, cell transfection followed by direct observation of the GFP fluorescence and FACS analysis indicates that the OPDM and OPML GGC repeats are predominantly translated in the glycine frame, while GFP expression in the alanine or arginine frames is negligible (Figs. 1C, 1D and 1E, supplementary figures S1G to S1L). In contrast, analysis of the sense RILPL1 RNA with a CCG expansion shows no detectable translation of these repeats in any frame (proline, alanine or arginine) (supplementary figures S1M to S1P). Of interest, RT-qPCR quantification shows similar RNA expression levels, excluding a bias of transcription or RNA stability between these constructs (Fig. 1F and supplementary figures S1Q and S1R). Western blotting analyses confirmed translation of the OPDM and OPML expanded GGC repeats in the glycine frame (Fig. 1G).
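The three-frame logic behind these reporter constructs can be sketched in a few lines of Python. This is only an illustration of why a GGC repeat can in principle encode polyglycine, polyalanine or polyarginine depending on the reading frame; the function name and the reduced codon table are assumptions made for this example, and the sketch models neither the flanking sequences nor the GFP fusion.

```python
# Translate a pure GGC repeat in each of its three reading frames.
# Only the three codons that can occur in such a repeat are tabulated
# (standard genetic code: GGC = Gly, GCG = Ala, CGG = Arg).
CODON_TO_AA = {"GGC": "G", "GCG": "A", "CGG": "R"}


def translate_repeat(motif: str, n_repeats: int, frame: int) -> str:
    """Translate `motif` repeated `n_repeats` times, starting at offset `frame` (0, 1 or 2)."""
    seq = motif * n_repeats
    # Keep only complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return "".join(CODON_TO_AA[codon] for codon in codons)


if __name__ == "__main__":
    for frame, label in enumerate(("glycine", "alanine", "arginine")):
        print(label, translate_repeat("GGC", 5, frame))
```

Shifting the frame by one or two nucleotides loses one complete codon at the end of the repeat, which is why the alanine and arginine products are one residue shorter than the glycine product in this toy example.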
Furthermore, treatment of cell extracts with lysostaphin, a glycyl-glycine endopeptidase, cleaves these proteins into smaller products, thus confirming the presence of a polyglycine stretch within these proteins (supplementary figures S1S, S1T and S1U). As additional controls, examination of Cherry expression by fluorescence observation, FACS analysis or immunoblotting indicates that these diverse GGC repeat-GFP constructs have similar transfection efficiency, whatever their glycine, alanine or arginine frame (Figs. 1C, 1D and 1G, supplementary figures S1G to S1L). These controls ensure that the lack of GGC repeat translation in the polyalanine (polyA) or polyarginine (polyR) frames is not caused by a difference in construct expression, a potential toxicity leading to cell loss or another bias impairing observation of GGC repeat translation in the alanine or arginine frames. Overall, these results indicate that the OPDM and OPML GGC repeat expansions, while located in sequences annotated as non-coding, are nonetheless translated into novel polyglycine-containing proteins, yet to be characterized.

OPDM and OPML GGC repeats are translated through initiation at start codons.

To uncover how these GGC repeats are translated, we immunoprecipitated the OPDM and OPML polyglycine proteins and determined their N-terminal sequences by mass spectrometry analysis (Fig. 2A). Of technical interest, peptide identification was carried out by mining custom databases compiling human non-coding sequences putatively translated in their three possible frames. Importantly, for all three archetypes of non-coding sequences tested, namely the 5’UTR of GIPC1 causing OPDM2, the antisense transcript of RILPL1 causing OPDM4, and the LOC642361 lncRNA causing OPML, mass spectrometry analyses reveal the presence of N-terminal peptides starting with a typical acetylated methionine (M^ac), which corresponds to initiation at standard start codons located upstream of the GGC repeats (Figs.
2B, 2C and 2D, supplementary figures S2A, S2B, S2C and S2D). Indeed, translation of the RILPL1 antisense RNA and of the LOC642361 lncRNA initiates at classical ATG start codons located upstream of the repeats, while translation of the GIPC1 5’UTR occurs in the absence of any ATG start codon and instead initiates at a CTG near-cognate start codon also located upstream of the repeats (Figs. 2B to 2D, supplementary figures S2A to S2D). Translation initiation at near-cognate start codons is typically less efficient than at ATG start codons and is thus conditioned by the bordering Kozak sequence (consensus: CCRCC AUG G). In that respect, the GIPC1 CTG near-cognate start codon is embedded in a favorable Kozak environment (CCGGT CUG G) (Fig. 2B). Finally, immunoblotting, fluorescence observation and FACS analyses consistently show that deletion of these ATG or CTG start codons abolishes expression of the OPDM/OPML polyglycine proteins, demonstrating their importance for translation of these GGC repeats (Fig. 2E and supplementary figures S2E to S2P). As controls, investigation of GFP RNA levels or of the Cherry fluorescence, expressed from an independent cassette, indicates that deletion of these start codons does not alter GFP RNA expression or cell viability (Fig. 2E and supplementary figures S2E to S2P). Overall, these data reveal that the 5’-untranslated region of the GIPC1 gene, the antisense transcript of RILPL1 and the LOC642361 long non-coding RNA all contain previously unrecognized open reading frames (ORFs), whose translation initiates at start codons located upstream of the GGC repeats, resulting in expression of novel proteins where each GGC repeat encodes a glycine amino acid. Consequently, these OPDM and OPML GGC repeat expansions are translated into novel polyglycine-containing proteins, which were named uGIPpolyG, asRILpolyG and LOC6polyG for upstream of GIPC1, antisense of RILPL1 and LOC642361-encoded polyglycine proteins, respectively (Fig.
2F, sequences in supplementary figures S2Q, S2R and S2S). Of interest, analysis of databases indicates that the 5’UTR of GIPC1 is subject to alternative splicing with its exon 1, containing the GGC repeats, joined to either its exon 2 or 4 (supplementary figure S1A), thus resulting in two small ORFs and a polyglycine protein with two different C-terminal sequences (supplementary figure S2Q). RT-qPCR performed on muscle tissue of individuals with OPDM2 showed no significant difference in GIPC1 mRNA expression compared to control individuals (supplementary figure S2T), while isoform-specific RT-PCR indicates a slight increase in alternative splicing of GIPC1 exon 1 toward exon 2 in OPDM2, with a concomitant decrease in splicing of exon 1 to exon 4 (supplementary figure S2U). These data suggest that the GGC repeat expansion, which is located only 7 nucleotides away from the 5’ splice site of GIPC1 exon 1, may change its alternative splicing, resulting in increased inclusion of GIPC1 exon 2. Moreover, RT-qPCR quantification of the RILPL1 antisense transcript and of the LOC642361 lncRNA showed that these RNAs are expressed in human skeletal muscles, which is consistent with the tissue clinically affected in OPDM/OPML (supplementary figures S2V and S2W). Finally, the RILPL1 antisense ORF is conserved among primates, with presence of a conserved polyglycine stretch in chimpanzee, gorilla and marmoset, but with a predicted shorter protein with no polyglycine in macaque and other mammals including mouse (supplementary figure S2X). In contrast, the LOC642361 small ORF is absent from most mammals, but strikingly identical to the C-terminal part of a longer protein of unknown function found in gibbons (lesser apes) and tufted capuchin (supplementary figure S2Y).

Polyglycine proteins are present in the typical OPDM/OPML p62-positive inclusions.
To confirm that GGC repeat expansions are translated into novel polyglycine-containing proteins in individuals with OPDM/OPML, we developed antibodies directed against these proteins. However, we failed to obtain an antibody specific to their common polyglycine stretch. As an alternative, we developed various antibodies directed against their specific N- and/or C-terminal sequences, hence specific to each GGC-repeat hosting novel ORF (Fig. 3 and supplementary figure S3). Antibodies specificities were confirmed by immunoblot and immunofluorescence on transfected cells (supplementary figures S3A to S3L). Importantly, immunofluorescence staining performed on skeletal muscle sections of individuals with OPDM2, OPDM4 and OPML revealed presence of their respective polyglycine proteins (uGIPpolyG, asRILpolyG and LOC6polyG) within the p62-positive cytoplasmic rimmed vacuoles and intranuclear inclusions typical of these diseases (Figs. 3 A, 3 B and 3 C, supplementary figure S3M). To confirm these results, we developed another set of antibodies directed against different sequences of the asRILpolyG and LOC6polyG proteins and observed identical results with staining of the typical OPDM/OPML p62-positive inclusions (supplementary figures S3N and S3O). Moreover, as OPDM3 and NIID have an identical genetic cause, namely an expansion of GGC repeats in the 5’UTR of the NOTCH2NLC gene, and as this expansion was recently found to belong to a small ORF translated in a polyglycine protein, uN2CpolyG (Boivin et al., 2021 ; Zhong et al., 2021 ), we developed novel antibodies against this protein and uncovered its presence within the typical p62-positive inclusions in muscle sections of individuals with OPDM3 (Fig. 3 D and supplementary figure S3P). Another antibody directed against a different sequence of the uN2CpolyG protein similarly stains the typical p62-positive inclusions, confirming expression of this polyglycine protein in OPDM3 (supplementary figure S3P). 
Of interest, no or only faint staining was observed in non-OPDM individuals (Figs. 3A, 3B, 3C and 3D), as without a GGC repeat expansion, and thus without a polyglycine stretch, these microproteins do not aggregate and their very small size prevents their detection by immunoblotting or immunofluorescence. Moreover, as each of these antibodies is directed against a specific ORF sequence, they are specific to each OPDM subtype and indeed do not stain p62-positive inclusions in other OPDM/OPML subtypes (supplementary figures S3Q, S3R, S3S and S3T). These results further support the existence of specific polyglycine proteins expressed in each OPDM subtype and confirm the specificity of these antibodies. Finally, as various microsatellite expansions have been reported to be RAN translated in their three potential frames (Zu et al., 2011; reviewed in Guo et al., 2022), and as a short expansion of GCN repeats in PABPN1 is translated into a protein with an extended polyalanine stretch that causes oculopharyngeal muscular dystrophy (OPMD) (Brais et al., 1998; reviewed in Banerjee et al., 2013), we also investigated a potential translation of the OPDM GGC repeats in the alanine frame. However, two independent antibodies developed against a putative GIPC1 polyalanine protein do not stain intranuclear inclusions or rimmed vacuoles in muscle sections of individuals with OPDM2, arguing against translation of GGC repeats in the alanine frame (supplementary figures S3U, S3V and S3W). Overall, this work highlights that GGC repeat expansions causing the neurological OPDM, OPML and NIID disorders, while located in transcripts initially annotated as non-coding, are embedded in previously unrecognized ORFs and consequently translated into novel polyglycine-containing proteins. These results raise the question of whether these diverse proteins are pathogenic.

Expression of polyglycine proteins forms inclusions and is pathogenic in muscle cells.
To further study the various OPDM, NIID and OPML polyglycine-containing proteins, we cloned the GIPC1, NOTCH2NLC, asRILPL1 and LOC642361 ORFs, with either a control (8 to 12x) or an expanded (100x) size of polyglycine, from their ATG or near-cognate GIPC1 CTG start codon to their last coding codon, fused to GFP. To exclude any bias of repeat instability, the GGC repeats were modified to include alternative GGN codons, which still encode glycine but ensure that an identical and stable size of polyglycine is studied. This strategy also avoids expression of a pure GGC RNA hairpin, ruling out interference from a putative toxic RNA gain-of-function mechanism. Moreover, to take into consideration GIPC1 exon 1 alternative splicing to either its exon 2 or 4, we also cloned the uGIPpolyG protein with its two possible C-termini, ending with either 8 amino acids (uGIPpolyG ex2) or 28 amino acids (uGIPpolyG ex4) (sequences in supplementary figure S4A). We first assessed the localization of these diverse polyglycine-containing proteins. Importantly, their expression in human LHCN-M2 differentiated muscle cells followed by immunofluorescence revealed that they form cytoplasmic and intranuclear inclusions, which are p62-positive and thus reminiscent of the histopathological features typical of OPDM, OPML and NIID (Figs. 4A and 4B, supplementary figure S4B). Identical results were obtained in immortalized U2OS cell lines (supplementary videos 1 and 2). Inclusion formation is likely driven by their polyglycine expansion, as expression of an artificial protein mainly composed of a pure polyglycine stretch (ATG polyG-GFP) also forms cytoplasmic and nuclear inclusions (Fig. 4A), while expression of these proteins with a control length of glycine (8 to 12 GGC repeats) does not promote the formation of protein aggregates.
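The GGN codon strategy described above can be illustrated with a short Python sketch. It builds a 100-codon glycine-encoding sequence by cycling through a mixed GGN pattern, then checks that every codon still encodes glycine and that no long pure GGC run remains. The interruption pattern and the function names are hypothetical choices made for this example; the actual codon arrangement used in the constructs is not given in the text.

```python
from itertools import cycle, islice

# All four GGN codons encode glycine in the standard genetic code.
GLYCINE_CODONS = ("GGC", "GGA", "GGT", "GGG")

# Hypothetical interruption pattern, chosen only for illustration.
DEFAULT_PATTERN = ("GGC", "GGC", "GGA", "GGC", "GGT")


def interrupted_polyglycine(n_codons: int, pattern=DEFAULT_PATTERN) -> str:
    """Return a DNA sequence of `n_codons` glycine codons following `pattern` cyclically."""
    return "".join(islice(cycle(pattern), n_codons))


def split_codons(seq: str) -> list[str]:
    """Split an in-frame sequence into consecutive codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def longest_pure_run(seq: str, codon: str = "GGC") -> int:
    """Length (in codons) of the longest uninterrupted in-frame run of `codon`."""
    best = run = 0
    for current in split_codons(seq):
        run = run + 1 if current == codon else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    seq = interrupted_polyglycine(100)
    print(all(c in GLYCINE_CODONS for c in split_codons(seq)), longest_pure_run(seq))
```

The encoded protein is unchanged (100 glycines), while the nucleotide sequence no longer contains the long uninterrupted GGC tract that is prone to instability and hairpin formation.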
Further analysis by correlative light and electron microscopy (CLEM) shows that these polyglycine inclusions appear as round-shaped electron-dense deposits composed of filamentous structures without membrane boundaries (Fig. 4C), which is fully consistent with observations in OPDM individuals (Zhao et al., 2015; Saito et al., 2020; Kumutpongpanich et al., 2021). Of interest, we noted that these different polyglycine proteins present some differences in their localization, with the OPDM4 asRILpolyG protein being systematically more nuclear than the others (Fig. 4D). These data reveal that despite a common polyglycine central core, these diverse proteins are not strictly identical, suggesting a potential modulation by their specific N- and C-terminal ORF sequences. Thus, we further investigated these proteins, notably their expression, their potential interactants and their toxicity. Concerning the expression levels of these diverse polyglycine-containing proteins, as expected from their cloning in an identical vector backbone, in which transcription is driven by a heterologous viral minimal CMV promoter, their RNA levels assessed by RT-qPCR are similar (supplementary figure S4C). In contrast, their protein expression assessed by immunoblotting against GFP revealed unexpected variations, with the artificial ATG polyG and the uN2CpolyG (OPDM3/NIID) proteins consistently detected at lower levels (Fig. 4D). Further immunoblot analysis upon cycloheximide (CHX) chase uncovered different protein half-lives, with the OPDM2 uGIPpolyG protein being the most stable, while the OPDM3/NIID uN2CpolyG protein shows a rapid turnover rate (supplementary figure S4D).
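For readers unfamiliar with how a half-life is commonly derived from a cycloheximide chase, the Python sketch below fits a first-order decay to a time course by least squares on the log-transformed signal. It assumes single-exponential decay, which is a simplifying assumption and not necessarily the analysis used in this study. The time points and signal values are synthetic, generated from a known half-life purely to check the arithmetic; they are not data from the paper.

```python
import math


def half_life_from_chase(times_h: list[float], signal: list[float]) -> float:
    """Estimate a half-life assuming first-order decay.

    Fits ln(signal) = ln(S0) - k * t by ordinary least squares and
    returns ln(2) / k, in the same unit as `times_h`.
    """
    logs = [math.log(value) for value in signal]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    slope = sxy / sxx  # equals -k for a decaying signal
    return -math.log(2) / slope


if __name__ == "__main__":
    # Synthetic time course generated from a 4 h half-life (not study data).
    times = [0.0, 2.0, 4.0, 8.0, 12.0]
    synthetic = [100.0 * 0.5 ** (t / 4.0) for t in times]
    print(half_life_from_chase(times, synthetic))
```

With real band intensities, the signal would first be normalized to a loading control, and the fit quality would indicate whether a single-exponential model is adequate.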
As these polyglycine-containing proteins accumulate in cellular inclusions that may correspond to insoluble protein aggregates, which classically escape immunoblot detection performed on the soluble cell fraction, we also performed in parallel a dot blot analysis of the cell lysate pellet sonicated in urea and SDS loading dye (Fig. 4E). This assay exposed further disparities between these polyglycine proteins, with the uN2CpolyG, asRILpolyG and LOC6polyG proteins notably more present in the insoluble protein fraction (Fig. 4E). These results were confirmed by quantification of the localization of these diverse polyglycine proteins in LHCN-M2 muscle cells, notably their presence in inclusions versus a diffuse localization (supplementary figure S4E). Next, we searched for potential interactants of these polyglycine-containing proteins. Interestingly, muscle cell transfection followed by GFP immunoprecipitation and mass spectrometry analysis did not identify a common partner, but unveiled diverse interactants specific to each polyglycine protein (Fig. 4F and supplementary table 1). In that respect, the uN2CpolyG protein interacts with the KU70/80 dimer involved in DNA repair, while the LOC6polyG protein interacts with ribosomal proteins (Fig. 4G). Of interest, immunoprecipitation of these proteins with a control length of glycine recapitulates these interactions (Fig. 4G), indicating that the interactants of these polyglycine-containing proteins are independent of their central polyglycine core and are instead determined by their distinct N- and C-terminal sequences, which originate from their hosting small ORFs. These data suggest that these newly identified small ORFs may encode potentially functional microproteins with relevant physiological roles. Finally, we investigated the pathogenicity of these diverse polyglycine proteins.
Importantly, their expression in human LHCN-M2 differentiated muscle cells is toxic and causes cell death, while expression of the GFP control construct is not (Fig. 4H). Of interest, while all these polyglycine proteins induce cell death, we noted a higher toxicity of the uN2CpolyG, asRILpolyG and LOC6polyG proteins compared to the uGIPpolyG protein or the artificial ATG polyG-GFP protein (Fig. 4H). These results suggest that the toxicity of the central polyglycine core of these proteins is modulated by their specific ORF hosting sequences. Overall, these results indicate that these diverse polyglycine proteins share the ability to form p62-positive cytoplasmic and intranuclear inclusions and to induce muscle cell death, which recapitulates key features of OPDM and OPML. Of interest, the localization, half-life, aggregation and interactants of these polyglycine-containing proteins vary, unveiling an unexpected modulation by their specific N- and C-terminal hosting ORF sequences. However, these data were obtained in muscle cell cultures, leaving open the question of the toxicity of these diverse polyG proteins in animals.
Polyglycine proteins form inclusions and are pathogenic for muscles in animals
To determine the physiological impact of expressing these OPDM polyglycine proteins in skeletal muscles of animals, we cloned, produced and injected into wild-type adult C57BL/6 mice recombinant adeno-associated viral (rAAV) particles expressing either the GFP-tagged uGIPpolyG, uN2CpolyG, asRILpolyG or LOC6polyG protein (Fig. 5A). Of technical interest, we used a novel capsid variant, MyoAAV 4A, which specifically targets rodent muscles upon a single intravenous injection (Tabebordbar et al., 2021). As controls, we employed a similar rAAV strategy to express GFP alone or an artificial protein, ATG polyG-GFP, composed of a polyglycine stretch deprived of its natural N- and C-terminal sequences (Fig. 5A).
Importantly, histological analysis of mouse tibialis anterior (TA) muscles 5 months after rAAV injection shows that expressing OPDM polyglycine proteins is toxic and promotes muscle fiber size variations with the presence of internalized or centralized nuclei (Fig. 5B and supplementary figure S5A). Moreover, quantification of muscle fiber areas revealed that expression of these polyglycine proteins induces skeletal muscle atrophy (Fig. 5C), with some striking differences, however: the OPDM4 asRILpolyG, OPML LOC6polyG and OPDM3 uN2CpolyG proteins cause muscle fiber atrophy and histological changes as early as 4 to 5 months after rAAV injection (Fig. 5B and upper panel of Fig. 5C), while the OPDM2 uGIPpolyG protein shows a lesser toxicity, with some muscle atrophy detected 9 months after rAAV injection (Fig. 5C, lower panel). Similarly, expression of ATG polyG, a protein deprived of any natural OPDM bordering sequences, shows a limited and delayed pathogenicity (Figs. 5B and 5C, and supplementary figure S5A). Analysis of the gastrocnemius skeletal muscle shows identical results. Quantification of GFP RNA levels indicates that all rAAV are expressed at similar RNA levels (supplementary figure S5B). Further analyses, notably staining for p62, a classical marker of OPDM, revealed numerous p62-positive cytoplasmic and intranuclear inclusions (Figs. 5B, 5D and supplementary figure S5C). As observed in OPDM patients, these inclusions are eosinophilic, which is especially apparent in the uN2CpolyG-expressing mice (Fig. 5B). Of interest, all OPDM polyglycine proteins form inclusions in mouse skeletal muscle, but with some notable differences: OPML LOC6polyG and OPDM3 uN2CpolyG aggregates are frequent, while ATG polyG, OPDM2 uGIPpolyG and OPDM4 asRILpolyG inclusions are less frequent (Figs. 5D and 5E, supplementary figure S5C).
Similarly, the localization of these polyglycine proteins, notably their cytoplasmic versus nuclear distribution, varies, with the OPDM4 asRILpolyG protein more frequently observed in nuclei than the other polyGly proteins (Figs. 5D and 5E, supplementary figure S5C). Of interest, single-nucleus RNA sequencing revealed an increase in macrophages and B-cells, as well as in regenerative muscle fibers, in OPDM versus control animals (Fig. 5F). These results indicate signs of inflammation and muscle regeneration consistent with myopathic changes in OPDM mice. However, these alterations were mild, with no global or massive variations in cell populations and only limited transcriptomic changes (Fig. 5F and supplementary table 2). Correspondingly, we observed only minor changes in muscle fiber types and no overt muscle regeneration by histological staining and quantitative RT-PCR (supplementary figures S5D and S5E). Similarly, animal performances were only slightly altered in rotarod and open field locomotor tests (supplementary figures S5F and S5G). These data indicate that expression of polyglycine proteins in mice drives progressive muscle fiber atrophy and histological changes reminiscent of OPDM, but with specific and limited myopathic alterations, at least in the time frame analyzed. Finally, expression of the asRILpolyG and LOC6polyG proteins is remarkably deleterious, as these mice die suddenly around 5–6 months or 8–9 months post rAAV injection, respectively (Fig. 5G). Further analysis revealed that asRILpolyG- and LOC6polyG-expressing animals present dilated cardiomyopathy with numerous p62-positive inclusions in cardiomyocytes (supplementary figure S5H). The abundance of these aggregates mirrors their toxicity, with rare ATG polyG and uGIPpolyG inclusions, an intermediate situation for uN2CpolyG, and numerous large asRILpolyG and LOC6polyG aggregates associated with notable myopathic changes (supplementary figure S5H).
Of clinical interest, these data are reminiscent of the cardiac dysfunctions reported in individuals with OPDM (Oyer et al., 1991; Chen et al., 2020; Kumutpongpanich et al., 2021; Gu et al., 2023; Pan et al., 2023). These observations led us to investigate the toxicity of these polyglycine proteins in other tissues, notably the central nervous system, especially in light of the neurological manifestations, notably tremor and ataxia, recently reported in individuals with OPDM2, OPDM3/NIID and OPML (Ishiura et al., 2019; Fan et al., 2022; Pan et al., 2022; Kume et al., 2023; Gu et al., 2023; Hobara et al., 2024; Murayama et al., 2024).
Polyglycine proteins form inclusions and are pathogenic for the CNS in animals
To specifically target the mouse central nervous system (CNS), we used a similar rAAV strategy to express either GFP-tagged uGIPpolyG (OPDM2), uN2CpolyG (OPDM3) or LOC6polyG (OPML) (Fig. 6A), taking advantage of the PHP.eB rAAV serotype, which crosses the C57BL/6 mouse blood-brain barrier and efficiently targets neurons upon a single intravenous injection (Chan et al., 2017). As controls, we developed PHP.eB rAAV expressing the GFP protein alone or an artificial ATG polyG-GFP protein (Fig. 6A). In the absence of any clinical reports of neurological symptoms in individuals with OPDM4, asRILpolyG was not included in this CNS study. Interestingly, longitudinal follow-up of these animals indicates that expression of these polyglycine proteins is toxic for the nervous system, with a progressive alteration of motor performance and coordination (Figs. 6B and 6C, supplementary Fig. 6A). However, we noted some differences between these diverse polyG proteins, with mice expressing the OPML LOC6polyG and the OPDM3 uN2CpolyG proteins showing evident difficulties in sustaining the rotarod test (Fig. 6B) and a largely increased number of errors and slips on the notched bar test (Fig. 6C and supplementary Fig. 6A) as early as 3 months post rAAV injection. In contrast, mice expressing the OPDM2 uGIPpolyG or the artificial ATG polyG proteins show milder changes that appear at later time points, 6 and 9 months post rAAV injection, respectively (Figs. 6B and 6C). These changes in locomotor coordination likely originate from specific neuronal dysfunctions, as these animals present normal performance in the open field test (supplementary Figs. 6B and 6C). Finally, expression of these diverse OPDM proteins is deleterious, but with some striking differences: mice expressing the OPDM2 uGIPpolyG protein show a milder pathogenicity and longer lifespan compared to animals expressing the OPML LOC6polyG or the OPDM3 uN2CpolyG protein (Fig. 6D). Moreover, mice expressing an artificial ATG polyG protein show a normal lifespan up to 15 months of age (Fig. 6D). These results highlight the importance of the specific ORF sequences bordering the common and identical polyglycine stretch in modulating the toxicity of these proteins. Next, p62 staining revealed that all polyG proteins form numerous cytoplasmic and intranuclear inclusions, recapitulating a key histopathological feature of OPDM, OPML and NIID. Importantly, the localization and abundance of these polyglycine proteins faithfully mirror their toxicity, with scarce ATG polyG and uGIPpolyG protein aggregates, while the uN2CpolyG and LOC6polyG proteins form numerous nuclear inclusions in the cerebellum, brainstem and thalamus of these animals at 3 months post AAV injection (Fig. 6E). Of interest, accumulation of these polyG inclusions increased with animal age (supplementary figure S6D), and polyG inclusions are also evident in tyrosine hydroxylase (TH)-positive neurons of the substantia nigra (supplementary figure S6E). In contrast, the cortex and hippocampus of these animals are relatively spared of aggregates (Fig. 6E).
As expected from their expression from a common rAAV backbone with a heterologous CMV-based promoter, their abundance is independent of their expression at the transcription level, with similar RNA expression quantified by RT-qPCR (supplementary figure S6F). These results underline the intrinsic differences between these polyG proteins in their expression and their ability to form and accumulate in protein inclusions despite having an identical central polyglycine core. Further analysis revealed extensive neuronal cell death, notably loss of Purkinje cells in the cerebellum, especially in mice expressing the LOC6polyG and uN2CpolyG proteins (Fig. 6F), which is consistent with the progressive loss of motor balance and coordination observed in these animals. Moreover, Gfap staining and RT-qPCR indicated increased neuroinflammation in polyG-expressing animals, especially in mice expressing the uN2CpolyG and LOC6polyG proteins (supplementary figures S6G and S6H). In contrast, the uGIPpolyG- and ATG polyG-expressing mice show milder neuronal cell loss and fewer signs of neuroinflammation (Fig. 6F and supplementary figures S6G and S6H), which is consistent with the reduced number of p62-positive inclusions and milder phenotype observed in these animals. Overall, these data confirm that expression of these diverse polyglycine-containing proteins is toxic for both muscle and neuronal cells and recapitulates key clinical features of OPDM, OPML and NIID, notably myopathic changes and muscle fiber atrophy associated with neurological signs and neurodegeneration, as well as their typical histological presentation with characteristic p62-positive protein inclusions.
Moreover, side-by-side analysis of these diverse polyglycine proteins also revealed some notable and unexpected differences in their expression, their localization and their toxicity, highlighting the importance of their specific N- and C-terminal sequences in modulating the toxic properties of their central polyglycine core.
The porphyrin TMPyP4 alleviates aggregation and toxicity of polyglycine proteins
Altogether, these data support a pathogenic model in which expression of toxic polyglycine proteins drives muscle cell and neuron dysfunction in OPDM, NIID and OPML. Hence, the search for compounds inhibiting translation and/or accumulation of these proteins may represent an attractive therapeutic option. In this respect, various pharmacological molecules, including SRPIN340, H89 and TMPyP4, have been identified in the past years as preventing the nuclear export of GC-rich RNA or its translation into toxic proteins (Green et al., 2019; Mori et al., 2021; Malik et al., 2021; Licata et al., 2022). Similarly, various compounds have been identified that promote the autophagic degradation of toxic aggregation-prone proteins in cell and animal models of neurodegeneration (reviewed in Menzies et al., 2017; Palmer et al., 2025). Thus, we tested a selection of these compounds in our OPDM muscle cell model and found one, the cationic porphyrin TMPyP4, which efficiently prevents expression of uGIPpolyG, uN2CpolyG, asRILpolyG and LOC6polyG at the protein level (Fig. 7A). In contrast, RT-qPCR analysis revealed that their RNA expression was largely unaffected (supplementary figure S7A). Importantly, TMPyP4 corrects polyG protein toxicity, restoring normal cell viability to LHCN-M2 muscle cells (Fig. 7B). Furthermore, RNA sequencing and mass spectrometry revealed that TMPyP4 induces only limited changes and no global transcriptomic or proteomic alterations (supplementary figures S7B to S7E and supplementary tables 3 and 4).
Of interest, pathway analysis revealed that TMPyP4 acts principally on translation (supplementary table 5), which is fully consistent with its known inhibitory function on the translation of GC-rich microsatellites (Green et al., 2019; Mori et al., 2021). In addition, testing of various TMPyP4 analogs revealed no other functional molecule able to reduce polyglycine protein toxicity (supplementary figure S7F). Next, to investigate TMPyP4 effects in animals, we developed Drosophila expressing the OPDM polyglycine proteins. Flies were preferred over mice as Drosophila is an established animal model to study polyG toxicity (Todd et al., 2013; Kong et al., 2022; Yu et al., 2022), and drug testing is considerably faster and less complex in flies than in mammals. Thus, the OPDM2 uGIPpolyG and OPDM4 asRILpolyG GFP-tagged proteins, representing respectively the less and the more toxic polyG proteins found in cell and mouse models, were cloned and expressed under the upstream activating sequence-Galactose-regulated promoter element (UAS-Gal4). The eyes being the most accessible part of the nervous system, we first used a glass multiple reporter (GMR) Gal4 driver and found that expression of either uGIPpolyG or asRILpolyG leads to a rough eye phenotype with ommatidial degeneration and loss of rhabdomeres (Fig. 7C). As controls, flies expressing GFP or the small upstream GIPC1 or RILPL1 antisense ORFs with a normal length of glycine stretch show intact ommatidial structures (supplementary figure S7G). Importantly, TMPyP4 corrects polyG protein toxicity, restoring normal eye structure and rhabdomeres in both uGIPpolyG- and asRILpolyG-expressing Drosophila (Fig. 7D). To investigate the toxicity of these polyglycine proteins further, they were ubiquitously expressed using an Actin5C-Gal4 driver and we examined adult fly locomotor abilities.
Interestingly, expression of uGIPpolyG leads to a progressively reduced mobility and shortened lifespan (supplementary figures S7H and S7I), while expression of asRILpolyG was particularly toxic, with no or very few animals surviving to the adult stage. These results are consistent with the higher toxicity of asRILpolyG over uGIPpolyG observed in cells and mice, demonstrating in a third model that these polyG proteins, despite harboring a common and identical polyglycine tract, are not strictly identical. Overall, these data further highlight the importance of the specific ORF sequences flanking the polyglycine stretch in modulating the toxicity and biological properties of these proteins (Fig. 8). Moreover, these data further support the notion that expression of polyglycine proteins reproduces the locomotor and neurodegenerative clinical features observed in the OPDM, OPML and NIID disorders. Of clinical interest, these results also suggest that modulating polyglycine expression could be of therapeutic interest in these neuromuscular and neurodegenerative diseases.
DISCUSSION
Oculopharyngodistal myopathy (OPDM), neuronal intranuclear inclusion disease (NIID) and oculopharyngeal myopathy with leukoencephalopathy (OPML) are inherited neurological diseases caused by identical GGC repeat expansions that are, however, embedded in sequences annotated as non-coding in diverse genes (LOC642361, LRP12, GIPC1, NOTCH2NLC and RILPL1). Here, we found that these GGC repeats are located in previously uncharted small ORFs and are thus translated into novel polyglycine-containing proteins, which form p62-positive protein inclusions and are toxic in cell and animal models.
These data are reminiscent of the fragile X-associated tremor/ataxia syndrome (FXTAS) and the recently uncovered spinocerebellar ataxia 4 (SCA4), where GGC repeat expansions, respectively located in a small upstream ORF of the FMR1 gene or within the main ORF of the ZFHX3 protein, are translated into polyglycine-containing proteins, which are toxic and accumulate in p62-positive inclusions (Todd et al., 2013; Sellier et al., 2017; Wallenius et al., 2023; Figueroa et al., 2024; Chen et al., 2024) (Fig. 8). Thus, these data support the existence of a novel group of human disorders, the polyG (or polyGly) diseases, where identical expansions of GGC repeats are located in diverse, previously ill-charted ORFs and consequently translated into various polyglycine-containing proteins, which form protein inclusions and are toxic for muscle and neuronal cells (Fig. 8). Moreover, this work reinforces the proposition that OPDM, OPML, NIID, FXTAS and SCA4 are all part of a novel polyGly-caused continuum of neuromuscular and neurodegenerative diseases with overlapping clinical and histopathological presentations (reviewed in Liufu et al., 2022; Boivin et al., 2022; Ishiura et al., 2023). However, this work also raises several questions. Notably, we found that OPDM/OPML GGC repeats are essentially expressed in the glycine frame through canonical translation initiation at upstream AUG or near-cognate start codons, but our assays may not be sensitive enough to detect low levels of RAN translation, with non-canonical initiation starting directly within the repeats and in the three frames.
Alternatively, considering that an expanded stretch of GCN repeats in PABPN1 is translated into a protein with a short (7 to 13) run of polyalanine, which causes oculopharyngeal muscular dystrophy (OPMD) (Brais et al., 1998; reviewed in Banerjee et al., 2013), and that extended but relatively short (~30) stretches of polyalanine in various transcription factors lead to severe developmental diseases (Brown and Brown, 2004; Messaed and Rouleau, 2009), it is also possible that longer expansions (>50) of GCG repeats in the alanine frame could be especially deleterious and are thus not represented in late-onset inherited neurological diseases such as OPDM and OPML. Another topic of discussion is whether other polyglycine-containing proteins causing additional GGC-repeat expansion disorders remain to be discovered. Similarly, this study raises the questions of how these polyGly proteins are pathogenic and, conversely, how to prevent their toxicity. Concerning additional microsatellite expansions that are translated into novel and toxic proteins and are yet to be identified, this work highlights the complexity and diversity of the human “dark” genome, notably the existence of numerous and yet uncharted small ORFs hidden in ill-described genomic sequences, annotated by default as non-coding. In this respect, recent advances in bioinformatics, ribosome footprint profiling and high-resolution mass spectrometry have unveiled thousands of novel and non-canonical ORFs, the vast majority of which are smaller than 100 amino acids and yet to be studied (Ji et al., 2015; Raj et al., 2016; Chen et al., 2020; Chothani et al., 2022; Duffy et al., 2022; Mudge et al., 2022; reviewed in Wright et al., 2022; Dong et al., 2023; Deutsch et al., 2024).
These large-scale analyses also revealed that most mammalian genes, including long non-coding RNAs and pseudogenes, contain small and/or upstream ORFs, the majority of which initiate at near-cognate start codons (Ingolia et al., 2011; Lee et al., 2012; Fields et al., 2015; Johnstone et al., 2016). Near-cognate initiation codons differ from the cognate AUG start codon by one nucleotide, but can still initiate translation through mispairing with the initiator methionine-tRNA. In vitro experiments and ribosome profiling revealed that predominantly four near-cognate start codons (CUG, GUG, UUG and ACG) are tolerated and can initiate translation, albeit with a lower efficiency than an AUG start codon (Kozak, 1989; Peabody, 1989; Ingolia et al., 2011; Lee et al., 2012; reviewed in Kearse and Wilusz, 2017). This imperfect initiation mechanism enables leaky ribosomal scanning, which ultimately multiplies the number of open reading frames encoded by mammalian genes and adds to their complexity. In parallel, recent advances in whole genome and long read sequencing revealed that microsatellite expansions are much more frequent than previously expected, with hundreds of GGC microsatellites now identified in the human genome (Annear et al., 2021; Ziaei Jam et al., 2023; Shi et al., 2024; Ibañez et al., 2024; Jadhav et al., 2024). As the human genome contains up to 2 million microsatellites, which populate 3 to 5% of our DNA, it is thus foreseeable that some microsatellites will inevitably fall in one of these numerous, small and ill-described ORFs. In short, the present report of novel polyglycine-containing proteins embedded in previously uncharted ORFs may represent only the tip of the iceberg.
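To make the definition of near-cognate start codons concrete, the following minimal sketch enumerates every codon that differs from AUG by exactly one nucleotide and flags the four reported above as predominantly tolerated. The function and variable names are hypothetical and chosen for illustration; the code makes no claim about relative initiation efficiencies.

```python
def near_cognate_codons(cognate="AUG", alphabet="ACGU"):
    """Return all codons differing from `cognate` at exactly one position."""
    variants = []
    for position, original_base in enumerate(cognate):
        for base in alphabet:
            if base != original_base:
                # Substitute a single nucleotide, leaving the other two intact.
                variants.append(cognate[:position] + base + cognate[position + 1:])
    return variants


# The four near-cognate codons named in the text as predominantly tolerated.
PREDOMINANT = {"CUG", "GUG", "UUG", "ACG"}

if __name__ == "__main__":
    for codon in near_cognate_codons():
        label = "predominantly tolerated" if codon in PREDOMINANT else "other"
        print(codon, label)
```

Three positions with three alternative bases each give nine single-nucleotide variants of AUG, of which the four listed in the text form a subset.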
In this respect, obvious candidates would be the recently identified GGC repeat expansions located in sequences annotated as non-coding in the LRP12, ABCD3 and FAM193B genes, which cause OPDM1, 5 and 6, respectively (Ishiura et al., 2019; Cortese et al., 2024; Danzi et al., 2025). Whether these GGC repeat expansions located in sequences annotated as non-coding are nonetheless translated into novel and toxic proteins is an exciting question for future studies. Regarding the potential mechanisms of toxicity of these polyglycine-containing proteins, how they cause muscle and neuronal cell dysfunctions is unclear. These proteins form large cytoplasmic and intranuclear inclusions, which is consistent with the known self-aggregation properties of glycine homopolypeptides that form amyloid-like fibrils (Lorusso et al., 2011; Plumley et al., 2011). However, it remains to be determined whether these polyglycine proteins are pathogenic in their aggregated form or in their soluble monomeric form. Similarly, it is unclear whether their nuclear localization is important for their toxicity. In this respect, it remains to be determined how these polyglycine proteins travel toward the nucleus in the absence of any evident nuclear localization signal (NLS). A tentative hypothesis would be that, owing to their relatively small size, soluble polyglycine proteins may diffuse freely through the nuclear pore and then accumulate in the nucleus, where they would aggregate away from the cytoplasmic autophagic clearance pathway. In the absence of cell division and nuclear membrane breakdown, notably in neurons and muscle cells, these polyglycine proteins would keep accumulating and promote toxicity.
Also, the observation of heart defects in mice expressing these polyglycine proteins raises the question of whether cardiac changes have been underestimated in individuals with OPDM and, conversely, whether yet unidentified GGC repeat mutations remain to be discovered in individuals with cardiomyopathy of unknown genetic cause. Another point of interest is the side-by-side comparison of these diverse polyglycine proteins in diverse and complementary cell and animal models. Assessment of these different polyG proteins revealed that their deleterious abilities to form inclusions and promote cellular death originate from their central and common polyglycine core. However, analysis of these proteins beside each other also revealed that their expression, half-life, aggregating properties and interactions with other proteins are modulated by their N- and C-terminal sequences, which are specific to their hosting ORFs. In this respect, the microproteins expressed from the NOTCH2NLC upstream ORF and from the LOC642361 small ORF specifically interact with proteins involved in DNA double-strand break repair and translation, respectively. These data suggest that these newly identified human microproteins may have relevant physiological functions, which remain to be thoroughly investigated. Lastly, the toxicity of these various polyglycine proteins, encoded by diverse genes, is likely to be conditioned by their tissue distribution and expression levels, two key parameters that depend on the strength of their translation start codons and Kozak sequences, notably canonical ATG versus near-cognate start codons, as well as on the strength and tissue specificity of their respective promoters. In this respect, it is notable that expansion of these GGC repeats over a threshold limit (~200–300 repeats) induces DNA methylation changes, ultimately resulting in silencing of their promoter.
In light of our present findings, this inhibition may be regarded as a protective mechanism against expression of proteins with a polyglycine stretch extended over 200 repeats, which would likely present a higher toxicity and/or be deleterious at a younger age. However, such a putative protective mechanism would have evolved at the cost of favoring a deleterious loss-of-function mechanism for alleles present in a single copy, such as CGG expansions over 200 repeats in the FMR1 gene located on the X chromosome, which cause the fragile X syndrome in males, or when loss-of-function mutations occur in the second allele, such as in the XYLT1 gene associated with the Baratela-Scott syndrome (BSS). In conclusion, our work provides a unified pathogenic mechanism for the skeletal muscle and central nervous system dysfunctions observed in individuals with OPDM, NIID and OPML, where GGC repeat expansions, identified in sequences originally annotated as non-coding, are in reality embedded in small open reading frames and consequently translated into novel and toxic polyglycine-containing proteins. Consistent with a common mechanism of pathogenicity, our data also provide a proof of concept that a single therapeutic approach may be of interest for OPDM, OPML and NIID. In this respect, treatment of human muscle cells and of a Drosophila animal model with the porphyrin TMPyP4 (5,10,15,20-tetra(N-methyl-4-pyridyl)porphyrin) alleviates the expression and toxicity of these polyG proteins. Importantly, TMPyP4 has no apparent deleterious effect on global cellular transcription and translation. Furthermore, this compound binds to G-quadruplex structures and to GC-rich RNA, notably the FMR1 GGC repeats causing FXTAS or the C9ORF72 GGGGCC repeats causing ALS, preventing their translation into toxic polyglycine-rich proteins (Ofer et al., 2009; Zamiri et al., 2014; Green et al., 2019; Mori et al., 2021).
Thus, these data raise hope of identifying a common therapeutic approach for these various neuromuscular and neurodegenerative diseases caused by microsatellite mutations of similar GC-rich sequences.
MATERIAL & METHODS
RESOURCE AVAILABILITY
Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Nicolas Charlet-Berguerand ([email protected]).
Materials Availability
All unique reagents (e.g., polyG DNA constructs, antibodies, etc.) generated in this study are available from the Lead Contact under an MTA, subject to restrictions from commercial sources.
Data and Code Availability
RNA sequencing and mass spectrometry datasets are available in supplementary Tables S1 to S5. Complete transcriptomic and proteomic source data are available from the corresponding author upon reasonable request.
Human samples
Human muscle samples were collected with the informed consent of individuals and families, and their collection was approved by the Institutional Review Boards of the Peking University First Hospital, First Affiliated Hospital of Fujian Medical University and National Center of Neurology and Psychiatry. This study was approved by the Ethics Committees of Peking University First Hospital, First Affiliated Hospital of Fujian Medical University and National Center of Neurology and Psychiatry, and all procedures were conducted in accordance with relevant guidelines and regulations. Muscle biopsy samples from patients with OPDM, OPML and NIID and from age-matched control subjects were examined. All clinical materials were obtained for diagnostic purposes after informed consent was provided. Prior to this study, all samples had been analyzed using routine histology techniques and electron microscopy. Fresh frozen samples were stored at −80°C until use.
Mice
All animal work was performed with approval from the IGBMC/ICS Animal Care Committee and from the French agency for research on animals (DGRI), authorization number APAFIS#33864-2021111217327782. C57BL/6 wild-type male mice were retro-orbitally AAV-injected at 2 months and then housed for 6 to 8 months in a temperature-controlled room (19–22°C) with a 12:12-hour light/dark cycle and free access to food and water. Mice were sacrificed by carbon dioxide (CO2) inhalation, and the different skeletal muscles, heart and brain were dissected and subsequently either frozen for molecular biology, frozen in pre-chilled isopentane, or PFA-fixed and embedded in paraffin for histology.
Cell cultures
U2OS and HEK293 cells were grown in DMEM 1 g/L glucose with 10% FCS and gentamycin at 37°C in 5% CO2. LHCN-M2 cells were grown in DMEM 4.5 g/L with 20% FCS, w/o PyrNa/M199, 25 µg/mL Fetuin, 5 mg/mL hEGF, 0.5 mg/mL human bFGF, 5 µg/mL human insulin, 0.2 µg/mL dexamethasone and gentamycin at 37°C in 5% CO2. Differentiation of the LHCN-M2 cells was induced by serum removal. U2OS T-Rex cells (ThermoFisher) stably expressing Nup50-mCherry were obtained by Lipofectamine transfection of Pci1-linearized pcDNA3-TetOn expressing Nup50 fused to mCherry, followed by selection for neomycin resistance for two weeks.
Constructs
Human GIPC1 exon 1, antisense RILPL1 and LOC642361 lncRNA sequences upstream of their GGC repeats were cloned into pcDNA3.1, in all three frames, fused to a GFP deleted of its ATG. Mutations of the ATG or CTG start codons, or within ORFs, were achieved by inverse PCR or by oligonucleotide ligations. The GIPC1 upstream ORF and the antisense RILPL1 and LOC642361 small ORFs with either 12 or 100 optimized GGN repeats were synthesized by GenScript and fused to GFP in a pAAV2-CAG vector. To ensure repeat expansion stability, all GGC repeat-containing plasmids were transformed into the STBL3 bacterial strain (Invitrogen) and all constructs were confirmed by sequencing.
Cell transfection and treatments
For transient transfection, cells were plated in DMEM with 0.1% fetal bovine serum, or without serum for LHCN-M2 cells, and transfected for 5 hours using Lipofectamine 2000 (Fisher Scientific). One to 4 days after transient transfection, cells were analyzed by live imaging, immunofluorescence, RT-qPCR, cell viability assay, dot blotting or western blotting. For treatments, LHCN-M2 cells were incubated overnight with the indicated concentration of SRPIN340, H-89, fluphenazine, TMPyP4, 5,10,15,20-Tetra(4-pyridyl)-21H,23H-porphine, 5,10,15,20-Tetraphenyl-21H,23H-porphine (Sigma), 5,10,15,20-Tetrakis(4-aminophenyl)-21H,23H-porphine, 5,10,15,20-Tetrakis(4-ethynylphenyl)-21H,23H-porphine, 5,10,15,20-Tetra(pyridin-2-yl)porphyrin, (Porphyrin-5,10,15,20-tetrayltetrakis(benzene-4,1-diyl))tetraboronic acid, or 4,4',4'',4'''-(21H,23H-porphine-5,10,15,20-tetrayl)tetrakis-Phenol (BLD Pharm). For cycloheximide treatment, HEK293 cells were treated 1 day post-transfection with 50 µg/mL of cycloheximide for 1, 3, 8 or 24 hours.
Cell viability assay
LHCN-M2 cells were transiently transfected for 48 to 72 h with the different polyglycine-expressing constructs and treated overnight with the indicated drug concentration. After addition of 0.5 µM of TO-PRO-3 (ThermoFisher), live cells were imaged using the CX7 Cellular Imaging System (25 fields per well at 10x magnification), followed by a cell-by-cell analysis using the Cellomics HCS Studio software (CellHealth Bioapplication). Transfected cells were detected using the GFP signal and dead cells were identified using the TO-PRO-3 intensity within the cell mask.
FACS analysis
HEK293 cells transfected for 24 hours with the different frame constructs were trypsinized, centrifuged for 5 min at 700 rpm and resuspended in 500 µL of PBS. Cells were analyzed on a BD LSRFortessa X-20 and results were analyzed with FlowJo.
Co-immunoprecipitation assay
Twenty-four hours after transfection of HEK293 or LHCN-M2 cells with 3 µg of the different plasmid constructs using Lipofectamine 2000 (Invitrogen), cells were lysed in RIPA buffer (50 mM Tris-HCl pH 7.5, 0.15 M NaCl, 0.5% Triton X-100) supplemented with protease inhibitor cocktail (Roche) and clarified by centrifugation at 14000 rpm for 10 min. Immunoprecipitations were performed at 4°C for 1 h using pre-washed anti-GFP magnetic beads (Abcam, ab193983) in RIPA buffer. Beads were washed three times, and bound proteins were eluted by a 3 min denaturation step at 95°C in Laemmli buffer, followed by mass spectrometry or western blot.

Mass spectrometry analysis
HEK293 cells were transfected with GIP(GGC) Gly-frame -GFP, asRILPL1(GGC) Gly-frame -GFP, LOC642361(GGC) Gly-frame -GFP plasmids for N-terminus determination or with GFP, ATG polyG-GFP, uGIPpolyG-GFP (exon 2 or 4), uN2CpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP plasmids for interactome identification, using Lipofectamine 2000 (Fisher Scientific) for 24 hours, and proteins were purified by GFP-trap immunoprecipitation (Abcam). LHCN-M2 cells were treated or not with 3 µM of TMPyP4 overnight. Protein mixtures were TCA-precipitated overnight at 4°C; pellets were washed twice with 1 mL cold acetone, dried and dissolved in 2 M urea in 0.1 mM Tris-HCl pH 8.5 for reduction (5 mM TCEP, 30 min) and alkylation (10 mM iodoacetamide, 30 min). Trypsin digestion was carried out at 37°C overnight. Extracted peptides were then analyzed using an Ultimate 3000 nano-RSLC (Thermo Scientific) coupled in line with an Orbitrap ELITE or an Orbitrap Exploris 480 via a nano-electrospray ionization source (Thermo Scientific, San Jose, California) and the FAIMS Pro interface.
Peptides were separated on a C18 Acclaim PepMap nano-column (75 µm ID x 25 cm, 2.6 µm, 150 Å, Thermo Fisher Scientific) with a 20-minute linear gradient from 8% to 35% buffer B (A: 0.1% FA in H2O; B: 0.1% FA in 80% ACN; 400 nl/min, 50°C), followed by a regeneration step at 90% B and an equilibration at 8% B. The total chromatography run time was 30 minutes. The mass spectrometer was operated in positive ionization mode in Data-Dependent Acquisition (DDA) with the FAIMS compensation voltage set to CV = -45 V. The DDA cycle consisted of one survey scan (350-1400 m/z, 90,000 FWHM) followed by MS² spectra (HCD; 30% normalized energy; 2 m/z window; 22,500 FWHM) within a limit of 1 s. Unassigned and singly charged states were rejected. Exclusion duration was set to 40 s with a mass width of ± 10 ppm. Proteins were identified with Proteome Discoverer 2.5 software (Thermo Fisher Scientific) against the Homo sapiens proteome database or against a homemade database of all potential three-frame translated proteins or peptides from the human GIPC1 5'UTR, RILPL1 antisense or LOC642361 sequences. Precursor and fragment mass tolerances were set at 10 ppm and 0.02 Da, respectively, and up to 2 missed cleavages were allowed. Oxidation (M) and N-terminal acetylation were set as variable modifications, and carbamidomethylation (C) as a fixed modification. Peptides were filtered with a false discovery rate (FDR) of 1%, rank 1. Proteins were quantified with a minimum of 1 unique peptide based on the XIC (sum of the Extracted Ion Chromatogram). The quantification values were exported to Perseus (1.6.15.0) for statistical analysis involving a log2 transform, imputation and normalization.
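To make the 10 ppm precursor tolerance concrete, the Python sketch below converts a mass deviation into parts per million and checks it against a tolerance. It is a generic illustration of the ppm definition, not the Proteome Discoverer implementation. The function names and the example m/z values are assumptions chosen for this example.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def ppm_to_mz_window(mz, tol_ppm=10.0):
    """Half-width of the tolerance window, in m/z units, at a given m/z."""
    return mz * tol_ppm / 1e6


# Hypothetical precursor at m/z 500.000.
accepted = within_tolerance(500.004, 500.000)  # about 8 ppm off
rejected = within_tolerance(500.006, 500.000)  # about 12 ppm off
window_at_1400 = ppm_to_mz_window(1400.0)      # window at the top of the survey scan range
print(accepted, rejected, round(window_at_1400, 4))
```

Because the tolerance is relative, the absolute window widens with m/z: 10 ppm corresponds to roughly 0.0035 m/z at 350 and 0.014 m/z at 1400.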
Western blotting
Proteins were denatured for 3 min at 95°C, separated on 4-12% bis-Tris gels (NuPAGE), transferred onto nitrocellulose membranes (Amersham Protran), blocked with 5% non-fat dry milk in Tris-buffered saline plus 0.1% Tween-20 (TBS-T), and incubated with anti-GFP (Abcam, ab290, 1/10000), anti-GFP (Abcam, ab1218, 1/10000), mCherry (Abcam, ab167453, 1/5000), GAPDH (Abcam, ab8245, 1/10000), Ku70 (Santa Cruz, sc-56129, 1/5000), Ku80 (Abcam, ab119935, 1/10000), RPL10A (Thermofisher, MA5-27171, 1/3000), RPL36 (Abcam, ab241584, 1/10000), HA (Abcam, ab130275, 1/5000), uGIP pAb or uN2C pAb (rabbit polyclonal homemade, 1/1000), uN2C 4D12, asRIL 2D8 or 4B9 or LOC6 2E8 (mouse monoclonal homemade, 1/1000) in TBS-T plus 5% non-fat dry milk overnight at 4°C. The membranes were washed 3 times and incubated with anti-rabbit or anti-mouse peroxidase-conjugated antibody (CST, 7074S or 7076S, 1/10000) for 1 hour in TBS, followed by washing and detection with the ECL Prime chemiluminescence kit (Millipore).

Dot blotting
LHCN-M2 cells transfected with ATG polyG-GFP, uGIPex2polyG-GFP, uGIPex4polyG-GFP, uN2CpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP constructs for 48 h were scraped and centrifuged for 10 min at 3000 rpm at 4°C. The cell pellet was frozen, resuspended in 200 µL of RIPA, frozen again and centrifuged for 10 min at 14000 rpm at 4°C. The pellet was resuspended in 200 µL of 2X Laemmli blue, sonicated for 5 sec at 20% amplitude and heated for 3 min at 95°C. Proteins were directly loaded onto nitrocellulose membranes (Amersham Protran), which were washed twice with Towbin buffer, blocked with 5% non-fat dry milk in TBS-T, and incubated with anti-GFP (Abcam, ab290, 1/10000) in TBS-T plus 5% non-fat dry milk overnight at 4°C. The membranes were washed 3 times and incubated with anti-rabbit peroxidase-conjugated antibody (CST, 7074S, 1/10000) for 1 hour in TBS, followed by washing and detection with the ECL Prime chemiluminescence kit (Millipore).
Lysostaphin treatment
HEK293 cells transfected with uGIPpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP constructs were scraped and centrifuged for 10 min at 3000 rpm at 4°C. The cell pellet was resuspended in 300 µL of RIPA and centrifuged for 10 min at 14000 rpm at 4°C. 30 µL of supernatant extract was incubated with 10 ng/µL of lysostaphin (Prospec, ENZ-269) for 1 to 20 minutes at 37°C. Laemmli buffer was added to the mix and proteins were analyzed by western blot.

Antibody production
To generate monoclonal antibodies directed against uGIPpolyAla, asRILpolyG or LOC6polyG, two-month-old female BALB/c mice were injected intraperitoneally with KLH-conjugated peptides (1A7 and 3G4: RRAEPGAHGEAEAA for uGIPpolyAla antibodies production, 2D8: GPGVWAPGSARSC and 4B9: GGSGEGARVRRPAAPPKLGSELRS for asRILpolyG antibodies generation or 2E8: CAWVGAPERSWPAGGPDALRGRDGAKEAGR for LOC6polyG antibodies generation) with 200 µg of poly(I/C) as adjuvant. Three injections were performed at 2-week intervals and, four days prior to hybridoma fusion, mice with positively reacting sera were re-injected. Spleen cells were fused with Sp2/0-Ag14 myeloma cells. Supernatants of hybridoma cultures were tested at day 10 by ELISA for cross-reaction with peptides. Positive supernatants were then tested by immunofluorescence and western blot on transfected HEK293 cells. Specific cultures were cloned twice on soft agar. Specific hybridomas were established and ascites fluid was prepared by injection of 2 × 10⁶ hybridoma cells into Freund adjuvant-primed BALB/c mice. All animal experimental procedures were performed according to the French and European authority guidelines. Rabbit polyclonal antibodies directed against uGIPpolyG or uN2CpolyG were generated by Eurogentec with the following KLH-conjugated peptides: uGIP pAb: MEFAEGRAGC, uN2C pAb: GGGDREDARPAPLC.
AAV production and retro-orbital injection
Recombinant AAV were generated by triple transfection of the HEK293T/17 cell line with the pAAV expression plasmids (expressing GFP, ATG polyG-GFP, uGIPex2polyG-GFP, uGIPex4polyG-GFP, uN2CpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP), the auxiliary plasmid pHelper (Agilent) encoding the adenovirus helper functions, and the capsid plasmid pUCmini-iCAP-PHP.eB (Addgene #103005) or pMyoAAV-4A. The pMyoAAV-4A was previously generated by the IGBMC facility using available literature (Tabebordbar et al., 2021). The rAAV were harvested from cell lysate and treated with Benzonase (Merck) at 100 U/mL. Recombinant vectors were purified by iodixanol gradient ultracentrifugation (OptiPrep, Axis Shield), followed by dialysis and concentration (Amicon Ultra-15 Centrifugal Filter Device 100 K) against sterile PBS (Dulbecco's PBS containing 0.5 mM MgCl2). Particles were quantified by real-time PCR and vector titers were expressed as viral genomes per ml (vg/ml). Two-month-old C57BL/6 male mice were injected retro-orbitally with 100 µL of sterile NaCl containing 1.5 × 10¹³ vg/kg of AAV.

Mouse phenotyping
Rotarod test: the rotarod test (Bioseb, Chaville, France) was performed with three testing trials during which the rotation speed accelerated from 4 to 40 rpm in 5 min. Trials were separated by 10-15 min intervals. The average latency was used as an index of motor coordination performance. Notched bar test: mice were tested under 100-lux lighting on a 2 cm-wide and 50 cm-long natural wood notched bar comprising 12 platforms of 2 cm spaced by 13 gaps of 2 cm and bearing a 6 cm² terminal platform. Animals had to cross the notched bar twice for training and 3 times for the test. Every instance of a back paw going through a gap was considered an error, and the global error percentage was calculated. Open field test: mice were tested in automated open fields (Panlab, Barcelona, Spain), each virtually divided into central and peripheral regions.
The open fields were placed in a room homogeneously illuminated at 120 lux. Each mouse was placed in the periphery of the open field and allowed to freely explore the apparatus for 30 min, with the experimenter out of the animal's sight. The distance traveled, the number of rears, and the time spent in the central and peripheral regions were recorded over the test session. The number of entries and the percent time spent in the center area were used as an index of emotionality/anxiety.

Immunofluorescence on PFA-fixed cells
Glass coverslips containing plated cells were fixed for 15 min in PBS with 4% paraformaldehyde, washed with PBS and incubated in PBS plus 0.5% Triton X-100 for 5 min. The coverslips were incubated for one hour with primary antibody against p62 (Abcam, ab56416, 1/1000), p62 (Abcam, ab109012, 1/1000), desmin (Abcam, ab32362, 1/500), Lamin A/C (Abcam, ab238303, 1/1000), uGIP pAb or uN2C pAb (rabbit polyclonal homemade, 1/100), uN2C 4D12, asRIL 2D8 or 4B9 or LOC6 2E8 (mouse monoclonal homemade, 1/100). After washing with PBS, the coverslips were incubated with donkey anti-mouse or donkey anti-rabbit secondary antibodies conjugated with Alexa 488, CY3 or CY5 (Jackson Immunoresearch, 1/500) for one hour, washed twice with PBS and incubated for 3 min in PBS/DAPI (1/10000 dilution). Coverslips were rinsed twice before mounting in Pro-Long media (Molecular Probes).

Immunofluorescence or immunochemistry on PFA-fixed tissue sections
For immunochemistry followed by cresyl violet counterstaining, buffers were DEPC-treated and autoclaved as described in Kádár et al., 2009. Brain sections were deparaffinized for 10 min in Sub-X (Leica) and dehydrated as follows: ethanol 100% (10 min), ethanol 90% (5 min), ethanol 70% (5 min), and rinsed in water. Antigen retrieval was performed in a pressure cooker in 10 mM Tris pH 9, 1 mM EDTA or in 10 mM sodium citrate pH 6. For immunochemistry, endogenous peroxidase activity was blocked for 15 min with 3% H2O2.
Slides were blocked for 1 h with PBS, 0.5% Triton X-100 and 5% horse serum for immunofluorescence, or with PBS, 0.1% Tween-20 and 5% BSA for immunochemistry, followed by overnight incubation at 4°C with primary antibody against Calbindin (CST, 13176S, 1/800), GFAP (Abcam, ab68428, 1/10000), p62 (Abcam, ab56416, 1/1000), p62 (CST, 23214S, 1/500) or Tyrosine Hydroxylase (Abcam, ab112, 1/2000). For immunofluorescence, slides were washed with PBS plus 0.1% Triton X-100, incubated with donkey anti-mouse or donkey anti-rabbit secondary antibodies conjugated with Alexa 488 or CY3 (Jackson Immunoresearch, 1/500) for one hour, washed twice with PBS-0.1% Triton X-100 and incubated for 3 min in PBS/DAPI (1/1000 dilution). Slides were rinsed twice in PBS before mounting in Pro-Long media (Molecular Probes). For immunochemistry, slides were washed with PBS plus 0.1% Tween-20, incubated with horse anti-mouse or anti-rabbit antibody coupled to peroxidase (Vector, MP-7402 or MP-7401) for 30 min, washed with PBS-0.1% Tween-20 and then revealed with DAB EqV substrate (Vector, SK-4103) under a binocular magnifier. The reaction was stopped by immersing the slide in PBS. The slides were then washed for 15 min in water and stained in 1% cresyl violet solution at 55°C for 10 min. Slides were washed in water, quickly dehydrated in 100% ethanol, immersed in Sub-X and mounted in CV Ultra mounting medium (Leica).

Immunofluorescence or immunochemistry on isopentane-frozen sections
For immunochemistry, endogenous peroxidase activity was blocked for 15 min with 3% H2O2.
Muscle sections were blocked for 1 h with PBS and 3% BSA and incubated overnight at 4°C with primary antibody directed against p62 (Abcam, ab109012, 1/1000), p62 (CST, 23214S, 1/500), Lamin A/C (Abcam, ab238303, 1/1000), Lamin B1 (Proteintech, 12987-1-AP, 1/500), type I fibers (DSHB, BA-D5, 1/50), type IIa fibers (DSHB, SC-71, 1/50), type IIb fibers (DSHB, BF-F3, 1/50), uGIPpolyGly pAb or uN2C pAb (rabbit polyclonal homemade, 1/100), uGIPpolyAla 1A7 or 3G4, uN2C 4D12, asRIL 2D8 or 4B9 or LOC6 2E8 (mouse monoclonal homemade, 1/100). For immunofluorescence, after washing with PBS, the slides were incubated with donkey anti-mouse or donkey anti-rabbit secondary antibodies conjugated with Alexa 488, CY3 or CY5 (Jackson Immunoresearch, 1/500), goat anti-mouse IgM DyLight 405, IgG2b CY3 and IgG1 CY5 (Jackson Immunoresearch, 1/100) and Wheat Germ Agglutinin (WGA) conjugated to Alexa 555 (1/300) for one hour, washed twice with PBS and incubated for 3 min in PBS/DAPI (1/1000 dilution). For immunochemistry, slides were washed in PBS, incubated with horse anti-rabbit antibody coupled to peroxidase (Vector, MP-7401) for 30 min, washed in PBS and revealed with DAB EqV substrate (Vector, SK-4103). The reaction was stopped by immersing the slide in PBS. The slides were then washed for 15 min in water and stained with H&E following the manufacturer's instructions (Abcam, ab245880). For SDH activity, tissue sections were incubated in an SDH reaction mixture (1.5 mM nitroblue tetrazolium (NBT), 5 mM EDTA, 48 mM succinic acid, 750 µM sodium azide, 30 mM methyl-phenylmethyl sulfate, phosphate buffered to pH 7.6). Slides were immersed in Sub-X and mounted in CV Ultra mounting medium (Leica).

Image acquisition
Slides were imaged on a Yokogawa CSU W1 spinning disk mounted on a Leica DMi8 microscope with x20 or x63 objectives, or on a Zeiss Axioscan 7 scanner with a x20 objective for immunohistochemistry or a x40 objective for immunofluorescence.
Correlative light and electron microscopy (CLEM)
Cells were grown to near confluency directly on carbon-coated sapphire disks (3x0.05mm; engineering office M. Wohlwend GMBH). The sapphire disks were then transferred to 300 µm deep flat carriers and subjected to high-pressure freezing with the HPM10 BALTEC apparatus. Automated freeze substitution (FS) was performed in the chamber of an AFS2 device (Leica Microsystems GmbH). The samples were kept at -90°C in dry acetone containing 0.1% uranyl acetate over a period of 24 h. The temperature was gradually increased to -45°C at a rate of 5°C/h, followed by 5 h at -45°C. The samples were washed with pure acetone and infiltrated with graded concentrations of Lowicryl HM20. Polymerization was achieved by UV light exposure at -25°C for 48 h, followed by an additional 9 h at room temperature (20°C). Ultrathin sections were cut at 90 nm with a Leica Ultracut and picked up on 200 mesh copper grids coated with a carbon film. Sections were viewed on a Yokogawa X1 spinning disk mounted on a Nikon TI2 microscope to locate the fluorescent signal. Then, after post-staining for 10 min in 2% aqueous UA and 5 min in lead citrate, sections were viewed on a Tecnai G2-20 transmission electron microscope operated at 120 kV, and images were acquired on a TVIPS TemCam F416 camera.

Muscle fiber segmentation
H&E staining or the WGA channel for fluorescent images was used to segment muscle fibers. The muscle fibers were segmented in QuPath (Bankhead et al., 2017) with Cellpose (Stringer et al., 2021). The fiber classification was then calculated with a Python script and reimported into QuPath to display the results.

Quantitative real time RT-PCR
Total RNAs from mouse tissues or cells were isolated with TriReagent (Merck). DNA was removed by treating samples following the instructions of the Turbo DNA-free Kit (Thermofisher). cDNAs were generated using the Transcriptor High Fidelity cDNA synthesis kit (Roche) for quantification of mRNAs.
qPCR was performed using the LightCycler 480 SYBR Green I Master (Roche) in a Lightcycler 480 (Roche) with 15 min at 94°C followed by 50 cycles of 15 sec at 94°C, 20 sec at 58°C and 20 sec at 72°C. RPLP0 mRNA for human samples and Ubiquitin mRNA for mouse samples were used as standards, and data were analyzed using the Lightcycler 480 analysis software (2^ΔCt method).

RNAseq on LHCN-M2 cells
Total RNAs from LHCN-M2 cells treated for 3 h with 1 µM of TMPyP4 were isolated with TriReagent (Merck). DNA was removed by treating samples following the instructions of the Turbo DNA-free Kit (Thermofisher). Library preparation was performed at the GenomEast platform at the Institute of Genetics and Molecular and Cellular Biology using Illumina Stranded mRNA Prep Ligation - Reference Guide - PN 1000000124518. RNA-Seq libraries were generated from 500 ng of total RNA using the Illumina Stranded mRNA Prep, Ligation kit and IDT for Illumina RNA UD Indexes Ligation (Illumina, San Diego, USA) according to the manufacturer's instructions. Following polyA selection, mRNAs were fragmented at 94°C for 8 minutes. DNA libraries were amplified using 13 cycles of PCR. Surplus PCR primers were further removed by two successive purifications using SPRIselect beads (Beckman-Coulter, Villepinte, France). The final libraries were checked for quality and quantified using the Bioanalyzer 2100 system (Agilent Technologies, Les Ulis, France). Libraries were sequenced on an Illumina NextSeq 2000 sequencer as paired-end 50 base reads. Image analysis and base calling were performed using RTA version 2.7.7 and BCL Convert version 3.8.4. The GSEA analysis was based on enrichment analysis of the Kyoto Encyclopedia of Genes and Genomes (KEGG).

RNAseq on muscle nuclei
Single nuclei RNA-seq analysis was performed using 10X Genomics FLEX multiplex technology, which uses barcoded probe sets targeting the whole transcriptome and enables sample multiplexing.
Tibialis anterior (TA), gastrocnemius and quadriceps muscles from two individuals per genotype (NaCl-, GFP-, ATG polyG-GFP-, uGIPpolyG-GFP-, uN2CpolyG-GFP- or LOC6polyG-GFP-injected mice), collected 7 months post-AAV injection, were isolated and dissociated with scissors in hypotonic buffer (250 mM sucrose, 10 mM KCl, 5 mM MgCl2, 10 mM Tris-HCl pH 8, 25 mM HEPES, 0.2 mM PMSF, 0.1 mM DTT, 0.3% Triton X-100, 0.2 U/µL RNase inhibitor). The muscles were homogenized with a Precellys 24 Touch (Bertin) for 25 sec at 5000 rpm. Samples were filtered through 100 µm and 20 µm filters and centrifuged for 10 min at 400 g at 4°C. Pellets were washed in PBS, 2% BSA, 0.2 U/µL RNase inhibitor, centrifuged for 10 min at 400 g at 4°C, and resuspended in PBS, 2% BSA, 0.2 U/µL RNase inhibitor, 200 nM DAPI to sort nuclei with a FACSAria Fusion or FACSAria II. More than 3 million isolated nuclei were centrifuged for 12 min at 400 g at 4°C and fixed with formaldehyde using the Chromium Next GEM Single Cell Fixed RNA Sample Preparation Kit (PN 1000414) following the 10x Genomics procedure (CG000478). A single 8-plex library was generated using the Chromium Next GEM Single Cell Fixed RNA reagent set (10x Genomics, Leiden, The Netherlands), according to the manufacturer's recommendations. During this process, nuclei counting was performed by a Trypan Blue exclusion assay on a Bürker chamber. For each of the 8 samples, from 133000 to 1060000 starting nuclei were hybridized with a unique Mouse WTA Probe Barcode (10x Genomics, PN-1000492). Hybridizations were completed in 80 µL of hybridization mixture plus 20 µL of WTA probes at 42°C for 21 h. Following nuclei re-counting, an equal number of nuclei from each hybridization reaction was combined in a single pool before washing. After re-counting, nuclei were then loaded onto the Chromium iX using a Next GEM Chip Q, targeting 5000 nuclei per hybridization reaction for a total of 96000 nuclei loaded per library (8-plex).
Full-length cDNA amplification was completed using 8 cycles of PCR. Dual-index library construction was performed with 10 cycles during sample index PCR. Final library quantification and quality control were performed using the Bioanalyzer 2100 (Agilent Technologies, Les Ulis, France). Libraries were sequenced on an Illumina NextSeq 2000 sequencer as paired-end 28 and 85 base reads. Image analysis and base calling were performed using RTA (v.4.12.2) and BCL Convert (v.4.2.7). Alignment, filtering, UMI counting and assignment of reads to samples were carried out using the cellranger multi pipeline (v.8.0.1). The output filtered feature matrix of each sample was input to the Seurat R package (v.5.1.0). The standard workflow for scRNA-seq with default parameters was followed. Integration of datasets was performed using the anchor-based RPCA integration method. The SNN graph was built from the first 20 principal components, and the resolution of the clustering was set to 1.2. Marker genes were identified using the FindConservedMarkers function, the literature and the scCATCH R package (v.3.2.2). Differential expression analysis across conditions was performed using DESeq2 as a parameter of the FindMarkers function after pseudobulking (AggregateExpression function).

Transgenic Constructs and Drosophila Strains
The upstream open reading frames (uORFs) of human GIPC1 (including wild-type uGIP-GFP and mutant uGIPpolyG-GFP), and human asRILPL1 (including wild-type asRIL-GFP and mutant asRILpolyG-GFP) were subcloned into the attB-pUAST vector. This vector features a UAS sequence in its promoter region (ref: A. H. Brand, N. Perrimon, Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118, 401-415, 1993). Following cDNA sequence verification, these constructs were integrated into the attP2 site of phiC31 stocks through standard microinjection procedures, as outlined in prior research (ref: Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, et al.
CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119). Transgenic Drosophila lines expressing either GFP vector control, uGIP-GFP, uGIPpolyG-GFP, asRIL-GFP or asRILpolyG-GFP were successfully generated. All Gal4 driver lines were acquired from the Bloomington Drosophila Stock Center. The fly strains were maintained under standard conditions at 25°C on cornmeal agar medium, with a regulated 12-hour light-dark cycle.

Fly Electron Microscopy Analysis
Electron microscopy was performed following established protocols (ref: Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, et al. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119). Briefly, samples were collected and fixed in 2.5% glutaraldehyde at 4°C overnight. Subsequently, the samples were sectioned using a Leica EM UC6/FC6 Ultramicrotome. To verify the proper orientation and quality of the sections, they were stained with toluidine blue. The selected sections were then transferred to copper grids and subjected to counterstaining with uranyl acetate and lead citrate to enhance contrast. Finally, the prepared samples were imaged using an electron microscope.

Fly Climbing Assay and Lifespan Assay
Sex-specific climbing and lifespan assays were conducted following established protocols (ref: Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, et al. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119). Briefly, vials containing 20–30 flies of the same genotype were used. Flies were separated by sex within 24 hours of eclosion and transferred to fresh vials every 4 days throughout the experimental period.
For climbing assays, flies were gently tapped to the vial bottom, and the number crossing the 5 cm mark within 15 seconds was recorded. Each trial was repeated five times, with mean and standard error calculated. Lifespan assays used two sex- and age-matched groups. Both climbing ability and lifespan were assessed and recorded every 5 days.

Fly Porphyrin TMPyP4 Administration
Porphyrin TMPyP4 was obtained from Selleck Chemicals and stored as a 1 mM stock solution at −20°C. For Drosophila studies, the compound was dissolved in sterile water to achieve final concentrations of 30 μM, 100 μM, and 200 μM. Before each experiment, fresh dilutions were prepared in Drosophila cornmeal agar medium. The drug was administered starting from the egg stage, and adult flies were transferred to fresh vials containing Porphyrin TMPyP4 every three days.

Quantification and statistical analysis
To eliminate bias, image or animal analyses were either completely automated or blinded. All statistical analyses were performed using Excel (Microsoft) and an online statistical calculator (Astatsa). Experiments are represented as either mean value ± Standard Error of the Mean (SEM) or box-and-whisker plots, with the lower and upper limits of the box representing the 25th and 75th percentiles, respectively, the whiskers depicting the lowest and highest data points, and the horizontal line through the box representing the median. The statistical tests used are the two-tailed paired Student's t-test or one-way ANOVA with post-hoc Tukey HSD test. Significance was set at *p < 0.05, **p < 0.01 and ***p < 0.001 for Student's t-test, and at *p < 0.01, **p < 0.001 and ***p < 0.0001 for ANOVA with post-hoc Tukey HSD test. Sample sizes were determined based on past experiments and to minimize the number of mice used. No statistical method was used to determine whether data meet assumptions of the statistical approach.
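As a compact illustration of the reporting conventions above, the following Python sketch computes a mean ± SEM and maps a p-value to the star notation, using the two sets of thresholds stated for Student's t-test and for ANOVA with Tukey HSD. The function names, the "ns" label for non-significant results and the example values are assumptions introduced here. The analyses themselves were run in Excel and Astatsa, not with this code.

```python
import math
import statistics

# Star thresholds as stated in the text: (one star, two stars, three stars).
THRESHOLDS = {
    "t-test": (0.05, 0.01, 0.001),
    "anova-tukey": (0.01, 0.001, 0.0001),
}


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD (n - 1)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def significance_stars(p_value, test):
    """Map a p-value to '*', '**', '***' or 'ns' for the given test."""
    one, two, three = THRESHOLDS[test]
    if p_value < three:
        return "***"
    if p_value < two:
        return "**"
    if p_value < one:
        return "*"
    return "ns"  # hypothetical label; the text does not define one


# Hypothetical measurements from five trials.
mean, sem = mean_sem([4, 5, 6, 5, 5])
print(f"{mean:.2f} ± {sem:.2f}")
print(significance_stars(0.03, "t-test"), significance_stars(0.03, "anova-tukey"))
```

The same p-value can therefore earn a different number of stars depending on the test, which is worth keeping in mind when comparing panels analyzed with different tests.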
Detailed statistical information, including the statistical test, measures, and number "n" of animals, cells and/or experiments, is indicated in the figures and their respective legends.

Declarations

DECLARATION OF INTERESTS
The authors declare no conflicts of interest.

AUTHOR CONTRIBUTIONS
Experiments were performed by M.B., J.Y., N.E., L.S., E.G., C.N., A.M., B.M., C.G., P.G-R., E.L., A.P., and M.O-A. Control and OPDM cases originate from I.N., Z.W., K.Y., N.W., and J.D. Data were collected and analyzed by M.B., J.Y., N.E., E.G., D.P., A.M., B.M., C.T., J.D. and N.C-B. The study was designed, coordinated and written by J.D., M.B. and N.C-B. with input from all authors.

ACKNOWLEDGMENTS
This work was supported by the following grants: National Natural Science Foundation of China (82071409, U20A20356, 82171846, 82422025, 82430059 and 82401635); Beijing Nova Program (20220484017, 20230484403); Beijing Natural Science Foundation (7244421); the National High-Level Hospital Clinical Research Funding (2023HQ03); Association Française contre les Myopathies AFM-Téléthon 28811 (Manon Boivin); Fondation pour la Recherche sur le Cerveau FRC 2023, Fondation pour la Recherche Médicale FRM PMT202306017578 and FRM EQU202103012936 (Nicolas Charlet-Berguerand). Manon Boivin was supported by a 1-year post-doctoral ITI IMCBio funding. The GenomEast Sequencing platform at IGBMC is a member of the national France Génomique consortium supported by the French National Research Agency (ANR-10-INBS-0009). The Light Microscopy Facility at the IGBMC imaging center is a member of the national infrastructure France-BioImaging and is supported by the French National Research Agency (ANR-10-INBS-04).
Finally, this work of the Interdisciplinary Thematic Institute IMCBio+, as part of the ITI 2021–2028 program of the University of Strasbourg, CNRS and Inserm, was supported by IdEx Unistra (ANR-10-IDEX-0002), and by SFRI-STRAT’US project (ANR-20-SFRI-0012) and EUR IMCBio (ANR-17-EURE-0023) under the framework of the France 2030 Program (IGBMC). We extend our gratitude to the patients and their families for their invaluable participation in this study. Special thanks to Jin Xu (Peking University First Hospital) for expert acquisition of electron microscopy images and to Jing Liu and Qingqing Wang (Peking University First Hospital) for their preparation of histopathological sections. Data and Code Availability RNA sequencing and mass spectrometry datasets are available in supplementary Tables S1 to S5. Complete transcriptomic and proteomics source data are available from the corresponding author upon reasonable request. References Annear DJ, Vandeweyer G, Elinck E, Sanchis-Juan A, French CE, Raymond L, Kooy RF. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci Rep. 2021 Jan 28;11(1):2515. doi: 10.1038/s41598-021-82050-5 Banerjee A, Apponi LH, Pavlath GK, Corbett AH. PABPN1: molecular function and muscle disease. FEBS J. 2013 Sep;280(17):4230-50. doi: 10.1111/febs.12294. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, McQuaid S, Gray RT, Murray LJ, Coleman HG, James JA, Salto-Tellez M, Hamilton PW. QuPath: Open-source software for digital pathology image analysis. Sci Rep. 2017 Dec 4;7(1):16878. doi: 10.1038/s41598-017-17204-5 Bao L, Zuo D, Li Q, Chen H, Cui G. Current advances in neuronal intranuclear inclusion disease. Neurol Sci. 2023 Jun;44(6):1881-1889. doi: 10.1007/s10072-023-06677-0. 
Boivin M, Deng J, Pfister V, Grandgirard E, Oulad-Abdelghani M, Morlet B, Ruffenach F, Negroni L, Koebel P, Jacob H, Riet F, Dijkstra AA, McFadden K, Clayton WA, Hong D, Miyahara H, Iwasaki Y, Sone J, Wang Z, Charlet-Berguerand N. Translation of GGC repeat expansions into a toxic polyglycine protein in NIID defines a novel class of human genetic disorders: The polyG diseases. Neuron. 2021 Jun 2;109(11):1825-1835.e5. doi: 10.1016/j.neuron.2021.03.038. Boivin M, Charlet-Berguerand N. Trinucleotide CGG Repeat Diseases: An Expanding Field of Polyglycine Proteins? Front Genet. 2022 Feb 28;13:843014. doi: 10.3389/fgene.2022.843014. Brais B, Bouchard JP, Xie YG, Rochefort DL, Chretien N, Tome FM, Lafreniere RG, Rommens JM, Uyama E, Nohira O et al. Short GCG expansions in the PABP2 gene cause oculopharyngeal muscular dystrophy. Nat Genet (1998) 18, 164–167. Brown LY, Brown SA. Alanine tracts: the expanding story of human illness and trinucleotide repeats. Trends Genet. 2004 Jan;20(1):51-8. doi: 10.1016/j.tig.2003.11.002. Chan KY, Jang MJ, Yoo BB, Greenbaum A, Ravi N, Wu WL, Sánchez-Guardado L, Lois C, Mazmanian SK, Deverman BE, Gradinaru V. Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nat Neurosci. 2017 Aug;20(8):1172-1179. doi: 10.1038/nn.4593. Chen J, Brunner AD, Cogan JZ, Nuñez JK, Fields AP, Adamson B, Itzhak DN, Li JY, Mann M, Leonetti MD, Weissman JS. Pervasive functional translation of noncanonical human open reading frames. Science. 2020 Mar 6;367(6482):1140-1146. Chen H, Lu L, Wang B, Cui G, Wang X, Wang Y, et al. Re-defining the clinicopathological spectrum of neuronal intranuclear inclusion disease. Ann Clin Transl Neurol. 2020;7(10):1930–41. https://doi.org/10.1002/acn3.51189.
Chen Z, Gustavsson EK, Macpherson H, Anderson C, Clarkson C, Rocca C, Self E, Alvarez Jerez P, Scardamaglia A, Pellerin D, Montgomery K, Lee J, Gagliardi D, Luo H; Genomics England Research Consortium; Hardy J, Polke J, Singleton AB, Blauwendraat C, Mathews KD, Tucci A, Fu YH, Houlden H, Ryten M, Ptáček LJ. Adaptive Long-Read Sequencing Reveals GGC Repeat Expansion in ZFHX3 Associated with Spinocerebellar Ataxia Type 4. Mov Disord. 2024 Mar;39(3):486-497. doi: 10.1002/mds.29704. Chothani SP, Adami E, Widjaja AA, Langley SR, Viswanathan S, Pua CJ, Zhihao NT, Harmston N, D'Agostino G, Whiffin N, Mao W, Ouyang JF, Lim WW, Lim S, Lee CQE, Grubman A, Chen J, Kovalik JP, Tryggvason K, Polo JM, Ho L, Cook SA, Rackham OJL, Schafer S. A high-resolution map of human RNA translation. Mol Cell. 2022 Aug 4;82(15):2885-2899.e8. Corbett MA, Kroes T, Veneziano L, Bennett MF, Florian R, Schneider AL, Coppola A, Licchetta L, Franceschetti S, Suppa A, Wenger A, Mei D, Pendziwiat M, Kaya S, Delledonne M, Straussberg R, Xumerle L, Regan B, Crompton D, van Rootselaar AF, Correll A, Catford R, Bisulli F, Chakraborty S, Baldassari S, Tinuper P, Barton K, Carswell S, Smith M, Berardelli A, Carroll R, Gardner A, Friend KL, Blatt I, Iacomino M, Di Bonaventura C, Striano S, Buratti J, Keren B, Nava C, Forlani S, Rudolf G, Hirsch E, Leguern E, Labauge P, Balestrini S, Sander JW, Afawi Z, Helbig I, Ishiura H, Tsuji S, Sisodiya SM, Casari G, Sadleir LG, van Coller R, Tijssen MAJ, Klein KM, van den Maagdenberg AMJM, Zara F, Guerrini R, Berkovic SF, Pippucci T, Canafoglia L, Bahlo M, Striano P, Scheffer IE, Brancati F, Depienne C, Gecz J. Intronic ATTTC repeat expansions in STARD7 in familial adult myoclonic epilepsy linked to chromosome 2. Nat Commun. 2019 Oct 29;10(1):4920. doi: 10.1038/s41467-019-12671-y. 
Cortese A, Beecroft SJ, Facchini S, Curro R, Cabrera-Serrano M, Stevanovski I, Chintalaphani SR, Gamaarachchi H, Weisburd B, Folland C, Monahan G, Scriba CK, Dofash L, Johari M, Grosz BR, Ellis M, Fearnley LG, Tankard R, Read J, Merve A, Dominik N, Vegezzi E, Schnekenberg RP, Fernandez-Eulate G, Masingue M, Giovannini D, Delatycki MB, Storey E, Gardner M, Amor DJ, Nicholson G, Vucic S, Henderson RD, Robertson T, Dyke J, Fabian V, Mastaglia F, Davis MR, Kennerson M; OPDM study group; Quinlivan R, Hammans S, Tucci A, Bahlo M, McLean CA, Laing NG, Stojkovic T, Houlden H, Hanna MG, Deveson IW, Lockhart PJ, Lamont PJ, Fahey MC, Bugiardini E, Ravenscroft G. A CCG expansion in ABCD3 causes oculopharyngodistal myopathy in individuals of European ancestry. Nat Commun. 2024 Jul 27;15(1):6327. Cortese A, Simone R, Sullivan R, Vandrovcova J, Tariq H, Yau WY, Humphrey J, Jaunmuktane Z, Sivakumar P, Polke J, Ilyas M, Tribollet E, Tomaselli PJ, Devigili G, Callegari I, Versino M, Salpietro V, Efthymiou S, Kaski D, Wood NW, Andrade NS, Buglo E, Rebelo A, Rossor AM, Bronstein A, Fratta P, Marques WJ, Züchner S, Reilly MM, Houlden H. Biallelic expansion of an intronic repeat in RFC1 is a common cause of late-onset ataxia. Nat Genet. 2019 Apr;51(4):649-658. Danzi MC, Xu IRL, Fazal S, Dolzhenko E, Pellerin D, Weisburd B, Reuter C, Sampson J, Folland C, Wheeler M, O'Donnell-Luria A, Wuchty S, Ravenscroft G, Eberle MA; All of Us Research Program Long Read Working Group; Zuchner S. Detailed tandem repeat allele profiling in 1,027 long-read genomes reveals genome-wide patterns of pathogenicity. bioRxiv [Preprint]. 2025 Jan 20:2025.01.06.631535. doi: 10.1101/2025.01.06.631535. Deng J, Gu M, Miao Y, Yao S, Zhu M, Fang P, Yu X, Li P, Su Y, Huang J, Zhang J, Yu J, Li F, Bai J, Sun W, Huang Y, Yuan Y, Hong D, Wang Z. Long-read sequencing identified repeat expansions in the 5'UTR of the NOTCH2NLC gene from Chinese patients with neuronal intranuclear inclusion disease. J Med Genet. 
2019 Nov;56(11):758-764. Deng J, Yu J, Li P, Luan X, Cao L, Zhao J, Yu M, Zhang W, Lv H, Xie Z, Meng L, Zheng Y, Zhao Y, Gang Q, Wang Q, Liu J, Zhu M, Guo X, Su Y, Liang Y, Liang F, Hayashi T, Maeda MH, Sato T, Ura S, Oya Y, Ogasawara M, Iida A, Nishino I, Zhou C, Yan C, Yuan Y, Hong D, Wang Z. Expansion of GGC Repeat in GIPC1 Is Associated with Oculopharyngodistal Myopathy. Am J Hum Genet. 2020 Jun 4;106(6):793-804. Deutsch EW, Kok LW, Mudge JM, Ruiz-Orera J, Fierro-Monti I, Sun Z, Abelin JG, Alba MM, Aspden JL, Bazzini AA, Bruford EA, Brunet MA, Calviello L, Carr SA, Carvunis AR, Chothani S, Clauwaert J, Dean K, Faridi P, Frankish A, Hubner N, Ingolia NT, Magrane M, Martin MJ, Martinez TF, Menschaert G, Ohler U, Orchard S, Rackham O, Roucou X, Slavoff SA, Valen E, Wacholder A, Weissman JS, Wu W, Xie Z, Choudhary J, Bassani-Sternberg M, Vizcaíno JA, Ternette N, Moritz RL, Prensner JR, van Heesch S. High-quality peptide evidence for annotating non-canonical open reading frames as human proteins. bioRxiv [Preprint]. 2024 Sep 9:2024.09.09.612016. Depienne C, Mandel JL. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am J Hum Genet. 2021 May 6;108(5):764-785. Dong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci. 2023 Jun 23;24(13):10562. Duffy EE, Finander B, Choi G, Carter AC, Pritisanac I, Alam A, Luria V, Karger A, Phu W, Sherman MA, Assad EG, Pajarillo N, Khitun A, Crouch EE, Ganesh S, Chen J, Berger B, Sestan N, O'Donnell-Luria A, Huang EJ, Griffith EC, Forman-Kay JD, Moses AM, Kalish BT, Greenberg ME. Developmental dynamics of RNA translation in the human brain. Nat Neurosci. 2022 Oct;25(10):1353-1365. Durmus H, Laval SH, Deymeer F, Parman Y, Kiyan E, Gokyigiti M, Ertekin C, Ercan I, Solakoglu S, Karcagi V, Straub V, Bushby K, Lochmüller H, Serdaroglu-Oflazer P. 
Oculopharyngodistal myopathy is a distinct entity: clinical and genetic features of 47 patients. Neurology. 2011 Jan 18;76(3):227-35. doi: 10.1212/WNL.0b013e318207b043. Erwin GS, Gürsoy G, Al-Abri R, Suriyaprakash A, Dolzhenko E, Zhu K, Hoerner CR, White SM, Ramirez L, Vadlakonda A, Vadlakonda A, von Kraut K, Park J, Brannon CM, Sumano DA, Kirtikar RA, Erwin AA, Metzner TJ, Yuen RKC, Fan AC, Leppert JT, Eberle MA, Gerstein M, Snyder MP. Recurrent repeat expansions in human cancer genomes. Nature. 2023 Jan;613(7942):96-102. doi: 10.1038/s41586-022-05515-1. Fan Y, Shen S, Yang J, Yao D, Li M, Mao C, Wang Y, Hao X, Ma D, Li J, Shi J, Guo M, Li S, Yuan Y, Liu F, Yang Z, Zhang S, Hu Z, Fan L, Liu H, Zhang C, Wang Y, Wang Q, Zheng H, He Y, Song B, Xu Y, Shi C. GIPC1 CGG Repeat Expansion Is Associated with Movement Disorders. Ann Neurol. 2022 May;91(5):704-715. Fan Y, Xu Y, Shi C. NOTCH2NLC-related disorders: the widening spectrum and genotype-phenotype correlation. J Med Genet. 2022 Jan;59(1):1-9. doi: 10.1136/jmedgenet-2021-107883. Fields AP, Rodriguez EH, Jovanovic M, Stern-Ginossar N, Haas BJ, Mertins P, Raychowdhury R, Hacohen N, Carr SA, Ingolia NT, Regev A, Weissman JS. A Regression-Based Analysis of Ribosome-Profiling Data Reveals a Conserved Complexity to Mammalian Translation. Mol Cell. 2015 Dec 3;60(5):816-827. doi: 10.1016/j.molcel.2015.11.013. Figueroa KP, Gross C, Buena-Atienza E, Paul S, Gandelman M, Kakar N, Sturm M, Casadei N, Admard J, Park J, Zühlke C, Hellenbroich Y, Pozojevic J, Balachandran S, Händler K, Zittel S, Timmann D, Erdlenbruch F, Herrmann L, Feindt T, Zenker M, Klopstock T, Dufke C, Scoles DR, Koeppen A, Spielmann M, Riess O, Ossowski S, Haack TB, Pulst SM. A GGC-repeat expansion in ZFHX3 encoding polyglycine causes spinocerebellar ataxia type 4 and impairs autophagy. Nat Genet. 2024 Jun;56(6):1080-1089. doi: 10.1038/s41588-024-01719-5. 
Florian RT, Kraft F, Leitão E, Kaya S, Klebe S, Magnin E, van Rootselaar AF, Buratti J, Kühnel T, Schröder C, Giesselmann S, Tschernoster N, Altmueller J, Lamiral A, Keren B, Nava C, Bouteiller D, Forlani S, Jornea L, Kubica R, Ye T, Plassard D, Jost B, Meyer V, Deleuze JF, Delpu Y, Avarello MDM, Vijfhuizen LS, Rudolf G, Hirsch E, Kroes T, Reif PS, Rosenow F, Ganos C, Vidailhet M, Thivard L, Mathieu A, Bourgeron T, Kurth I, Rafehi H, Steenpass L, Horsthemke B; FAME consortium; LeGuern E, Klein KM, Labauge P, Bennett MF, Bahlo M, Gecz J, Corbett MA, Tijssen MAJ, van den Maagdenberg AMJM, Depienne C. Unstable TTTTA/TTTCA expansions in MARCH6 are associated with Familial Adult Myoclonic Epilepsy type 3. Nat Commun. 2019 Oct 29;10(1):4919. doi: 10.1038/s41467-019-12763-9. Fotsing SF, Margoliash J, Wang C, Saini S, Yanicky R, Shleizer-Burko S, Goren A, Gymrek M. The impact of short tandem repeat variation on gene expression. Nat Genet. 2019 Nov;51(11):1652-1659. doi: 10.1038/s41588-019-0521-9. Gao FB, Richter JD, Cleveland DW. Rethinking Unconventional Translation in Neurodegeneration. Cell. 2017 Nov 16;171(5):994-1000. doi: 10.1016/j.cell.2017.10.042. Green KM, Sheth UJ, Flores BN, Wright SE, Sutter AB, Kearse MG, Barmada SJ, Ivanova MI, Todd PK. High-throughput screening yields several small-molecule inhibitors of repeat-associated non-AUG translation. J Biol Chem. 2019 Dec 6;294(49):18624-18638. doi: 10.1074/jbc.RA119.009951. Gu X, Yue D, Qiao K, Huang G, Zhu W, Xi J, et al. NOTCH2NLC-related oculopharyngodistal myopathy type 3 with cardiomyopathy and nephropathy. Muscle Nerve. 2023;67(5):E18–21. https://doi.org/10.1002/mus.27808. Gu X, Jiao K, Yue D, Wang X, Qiao K, Gao M, Lin J, Sun C, Zhao C, Zhu W, Xi J. Intrafamilial phenotypic heterogeneity in GIPC1-related oculopharyngodistal myopathy type 2: a case report. Neuromuscul Disord. 2023 Sep;33(9):93-97. 
Gu X, Yu J, Jiao K, Deng J, Xia X, Qiao K, Yue D, Gao M, Zhao C, Dong J, Huang G, Shan J, Yan C, Di L, Da Y, Zhu W, Xi J, Wang Z. Non-coding CGG repeat expansion in LOC642361/NUTM2B-AS1 is associated with a phenotype of oculopharyngodistal myopathy. J Med Genet. 2024 Mar 21;61(4):340-346. Guo S, Nguyen L, Ranum LPW. RAN proteins in neurodegenerative disease: Repeating themes and unifying therapeutic strategies. Curr Opin Neurobiol. 2022 Feb;72:160-170. doi: 10.1016/j.conb.2021.11.001. Gymrek M, Willems T, Guilmatre A, Zeng H, Markus B, Georgiev S, Daly MJ, Price AL, Pritchard JK, Sharp AJ, Erlich Y. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat Genet. 2016 Jan;48(1):22-9. doi: 10.1038/ng.3461. Hobara T, Ando M, Higuchi Y, Yuan JH, Yoshimura A, Kojima F, Noguchi Y, Takei J, Hiramatsu Y, Nozuma S, Nakamura T, Adachi T, Toyooka K, Yamashita T, Sakiyama Y, Hashiguchi A, Matsuura E, Okamoto Y, Takashima H. Linking LRP12 CGG repeat expansion to inherited peripheral neuropathy. J Neurol Neurosurg Psychiatry. 2024 Jul 16:jnnp-2024-333403. Ibañez K, Jadhav B, Zanovello M, Gagliardi D, Clarkson C, Facchini S, Garg P, Martin-Trujillo A, Gies SJ, Galassi Deforie V, Dalmia A, Hensman Moss DJ, Vandrovcova J, Rocca C, Moutsianas L, Marini-Bettolo C, Walker H, Turner C, Shoai M, Long JD, Fratta P, Langbehn DR, Tabrizi SJ, Caulfield MJ, Cortese A, Escott-Price V, Hardy J, Houlden H, Sharp AJ, Tucci A. Increased frequency of repeat expansion mutations across different populations. Nat Med. 2024 Oct 1. doi: 10.1038/s41591-024-03190-5. Ingolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011 Nov 11;147(4):789-802. doi: 10.1016/j.cell.2011.10.002. 
Ishiura H, Doi K, Mitsui J, Yoshimura J, Matsukawa MK, Fujiyama A, Toyoshima Y, Kakita A, Takahashi H, Suzuki Y, Sugano S, Qu W, Ichikawa K, Yurino H, Higasa K, Shibata S, Mitsue A, Tanaka M, Ichikawa Y, Takahashi Y, Date H, Matsukawa T, Kanda J, Nakamoto FK, Higashihara M, Abe K, Koike R, Sasagawa M, Kuroha Y, Hasegawa N, Kanesawa N, Kondo T, Hitomi T, Tada M, Takano H, Saito Y, Sanpei K, Onodera O, Nishizawa M, Nakamura M, Yasuda T, Sakiyama Y, Otsuka M, Ueki A, Kaida KI, Shimizu J, Hanajima R, Hayashi T, Terao Y, Inomata-Terada S, Hamada M, Shirota Y, Kubota A, Ugawa Y, Koh K, Takiyama Y, Ohsawa-Yoshida N, Ishiura S, Yamasaki R, Tamaoka A, Akiyama H, Otsuki T, Sano A, Ikeda A, Goto J, Morishita S, Tsuji S. Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat Genet. 2018 Apr;50(4):581-590. doi: 10.1038/s41588-018-0067-2. Ishiura H, Shibata S, Yoshimura J, Suzuki Y, Qu W, Doi K, Almansour MA, Kikuchi JK, Taira M, Mitsui J, Takahashi Y, Ichikawa Y, Mano T, Iwata A, Harigaya Y, Matsukawa MK, Matsukawa T, Tanaka M, Shirota Y, Ohtomo R, Kowa H, Date H, Mitsue A, Hatsuta H, Morimoto S, Murayama S, Shiio Y, Saito Y, Mitsutake A, Kawai M, Sasaki T, Sugiyama Y, Hamada M, Ohtomo G, Terao Y, Nakazato Y, Takeda A, Sakiyama Y, Umeda-Kameyama Y, Shinmi J, Ogata K, Kohno Y, Lim SY, Tan AH, Shimizu J, Goto J, Nishino I, Toda T, Morishita S, Tsuji S. Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease. Nat Genet. 2019 Aug;51(8):1222-1232. Ishiura H, Tsuji S, Toda T. Recent advances in CGG repeat diseases and a proposal of fragile X-associated tremor/ataxia syndrome, neuronal intranuclear inclusion disease, and oculophryngodistal myopathy (FNOP) spectrum disorder. J Hum Genet. 2023 Mar;68(3):169-174. 
Jadhav B, Garg P, van Vugt JJFA, Ibanez K, Gagliardi D, Lee W, Shadrina M, Mokveld T, Dolzhenko E, Martin-Trujillo A, Gies SJ, Altman G, Rocca C, Barbosa M, Jain M, Lahiri N, Lachlan K, Houlden H, Paten B; Genomics England Research Consortium; Project MinE ALS Sequencing Consortium; Veldink J, Tucci A, Sharp AJ. A phenome-wide association study of methylated GC-rich repeats identifies a GCC repeat expansion in AFF3 associated with intellectual disability. Nat Genet. 2024 Sep 23. doi: 10.1038/s41588-024-01917-1. Ji Z, Song R, Regev A, Struhl K. Many lncRNAs, 5'UTRs, and pseudogenes are translated and some are likely to express functional proteins. Elife. 2015 Dec 19;4:e08890. Johnstone TG, Bazzini AA, Giraldez AJ. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 2016 Apr 1;35(7):706-23. doi: 10.15252/embj.201592759. Epub 2016 Feb 19. Katoh M. Functional proteomics, human genetics and cancer biology of GIPC family members. Exp Mol Med. 2013 Jun 7;45(6):e26. doi: 10.1038/emm.2013.49. Kearse MG, Wilusz JE. Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes Dev. 2017 Sep 1;31(17):1717-1731. doi: 10.1101/gad.305250.117. Kong HE, Lim J, Linsalata A, Kang Y, Malik I, Allen EG, Cao Y, Shubeck L, Johnston R, Huang Y, Gu Y, Guo X, Zwick ME, Qin Z, Wingo TS, Juncos J, Nelson DL, Epstein MP, Cutler DJ, Todd PK, Sherman SL, Warren ST, Jin P. Identification of PSMB5 as a genetic modifier of fragile X-associated tremor/ataxia syndrome. Proc Natl Acad Sci U S A. 2022 May 31;119(22):e2118124119. doi: 10.1073/pnas.2118124119. Kume K, Kurashige T, Muguruma K, Morino H, Tada Y, Kikumoto M, Miyamoto T, Akutsu SN, Matsuda Y, Matsuura S, Nakamori M, Nishiyama A, Izumi R, Niihori T, Ogasawara M, Eura N, Kato T, Yokomura M, Nakayama Y, Ito H, Nakamura M, Saito K, Riku Y, Iwasaki Y, Maruyama H, Aoki Y, Nishino I, Izumi Y, Aoki M, Kawakami H. CGG repeat expansion in LRP12 in amyotrophic lateral sclerosis. Am J Hum Genet. 
2023 Jul 6;110(7):1086-1097. Kumutpongpanich, T., Ogasawara, M., Ozaki, A., Ishiura, H., Tsuji, S., Minami, N., et al. Clinicopathologic Features of Oculopharyngodistal Myopathy with LRP12 CGG Repeat Expansions Compared with Other Oculopharyngodistal Myopathy Subtypes. JAMA Neurol. (2021) 78 (7), 853–863. Kozak M. Context effects and inefficient initiation at non-AUG codons in eucaryotic cell-free translation systems. Mol Cell Biol. 1989 Nov;9(11):5073-80. doi: 10.1128/mcb.9.11.5073-5080.1989. Lander ES et al., Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. doi: 10.1038/35057062. Lee S, Liu B, Lee S, Huang SX, Shen B, Qian SB. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc Natl Acad Sci U S A. 2012 Sep 11;109(37):E2424-32. doi: 10.1073/pnas.1207846109. Li Z, Liu L, Feng C, Qin Y, Xiao J, Zhang Z, Ma L. LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations. Nucleic Acids Res. 2023 Jan 6;51(D1):D186-D191. Lieberman AP, Shakkottai VG, Albin RL. Polyglutamine Repeats in Neurodegenerative Diseases. Annu Rev Pathol. 2019 Jan 24;14:1-27. doi: 10.1146/annurev-pathmechdis-012418-012857. Licata NV, Cristofani R, Salomonsson S, Wilson KM, Kempthorne L, Vaizoglu D, D'Agostino VG, Pollini D, Loffredo R, Pancher M, Adami V, Bellosta P, Ratti A, Viero G, Quattrone A, Isaacs AM, Poletti A, Provenzani A. C9orf72 ALS/FTD dipeptide repeat protein levels are reduced by small molecules that inhibit PKA or enhance protein degradation. EMBO J. 2022 Jan 4;41(1):e105026. doi: 10.15252/embj.2020105026. Liu Q, Zhang K, Kang Y, Li Y, Deng P, Li Y, Tian Y, Sun Q, Tang Y, Xu K, Zhou Y, Wang JL, Guo J, Li JD, Xia K, Meng Q, Allen EG, Wen Z, Li Z, Jiang H, Shen L, Duan R, Yao B, Tang B, Jin P, Pan Y. Expression of expanded GGC repeats within NOTCH2NLC causes behavioral deficits and neurodegeneration in a mouse model of neuronal intranuclear inclusion disease. Sci Adv. 
2022 Nov 25;8(47):eadd6391. doi: 10.1126/sciadv.add6391. Liufu T, Zheng Y, Yu J, Yuan Y, Wang Z, Deng J, Hong D. The polyG diseases: a new disease entity. Acta Neuropathol Commun. 2022 May 31;10(1):79. Lorusso M, Pepe A, Ibris N, Bochicchio B. Molecular and supramolecular studies on polyglycine and poly-l-proline. Soft Matter. 2011;7(13):6327. Lu H, Luan X, Yuan Y, Dong M, Sun W, Yan C. The clinical and myopathological features of oculopharyngodistal myopathy in a Chinese family. Neuropathology. 2008 Dec;28(6):599-603. doi: 10.1111/j.1440-1789.2008.00924.x. Malik I, Kelley CP, Wang ET, Todd PK. Molecular mechanisms underlying nucleotide repeat expansion disorders. Nat Rev Mol Cell Biol. 2021 Sep;22(9):589-607. Malik I, Tseng YJ, Wright SE, Zheng K, Ramaiyer P, Green KM, Todd PK. SRSF protein kinase 1 modulates RAN translation and suppresses CGG repeat toxicity. EMBO Mol Med. 2021 Nov 8;13(11):e14163. doi: 10.15252/emmm.202114163. Matsui T, Ohbayashi N, Fukuda M. The Rab interacting lysosomal protein (RILP) homology domain functions as a novel effector domain for small GTPase Rab36: Rab36 regulates retrograde melanosome transport in melanocytes. J Biol Chem. 2012 Aug 17;287(34):28619-31. Menzies FM, Fleming A, Caricasole A, Bento CF, Andrews SP, Ashkenazi A, Füllgrabe J, Jackson A, Jimenez Sanchez M, Karabiyik C, Licitra F, Lopez Ramirez A, Pavel M, Puri C, Renna M, Ricketts T, Schlotawa L, Vicinanza M, Won H, Zhu Y, Skidmore J, Rubinsztein DC. Autophagy and Neurodegeneration: Pathogenic Mechanisms and Therapeutic Opportunities. Neuron. 2017 Mar 8;93(5):1015-1034. doi: 10.1016/j.neuron.2017.01.022. Messaed C, Rouleau GA. Molecular mechanisms underlying polyalanine diseases. Neurobiol Dis. 2009 Jun;34(3):397-405. doi: 10.1016/j.nbd.2009.02.013. Mori K, Gotoh S, Yamashita T, Uozumi R, Kawabe Y, Tagami S, Kamp F, Nuscher B, Edbauer D, Haass C, Nagai Y, Ikeda M. 
The porphyrin TMPyP4 inhibits elongation during the noncanonical translation of the FTLD/ALS-associated GGGGCC repeat in the C9orf72 gene. J Biol Chem. 2021 Oct;297(4):101120. doi: 10.1016/j.jbc.2021.101120. Mudge JM, Ruiz-Orera J, Prensner JR, Brunet MA, Calvet F, Jungreis I, Gonzalez JM, Magrane M, Martinez TF, Schulz JF, Yang YT, Albà MM, Aspden JL, Baranov PV, Bazzini AA, Bruford E, Martin MJ, Calviello L, Carvunis AR, Chen J, Couso JP, Deutsch EW, Flicek P, Frankish A, Gerstein M, Hubner N, Ingolia NT, Kellis M, Menschaert G, Moritz RL, Ohler U, Roucou X, Saghatelian A, Weissman JS, van Heesch S. Standardized annotation of translated open reading frames. Nat Biotechnol. 2022 Jul;40(7):994-999. Murayama A, Nagaoka U, Sugaya K, Shimazaki R, Miyamoto K, Matsubara S, Ogasawara M, Iida A, Nishino I, Takahashi K. Sequential development of parkinsonism in two patients with oculopharyngodistal type myopathy in GIPC1-related repeat expansion disorder. Neuromuscul Disord. 2024 Nov;44:104465. doi: 10.1016/j.nmd.2024.104465. Nassar LR, Barber GP, Benet-Pagès A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee BT, Lee CM, Muthuraman P, Nguy B, Pereira T, Nejad P, Perez G, Raney BJ, Schmelter D, Speir ML, Wick BD, Zweig AS, Haussler D, Kuhn RM, Haeussler M, Kent WJ. The UCSC Genome Browser database: 2023 update. Nucleic Acids Res. 2023 Jan 6;51(D1):D1188-D1195. Nurk et al., The complete sequence of a human genome. Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987. Ofer N, Weisman-Shomer P, Shklover J, Fry M. The quadruplex r(CGG)n destabilizing cationic porphyrin TMPyP4 cooperates with hnRNPs to increase the translation efficiency of fragile X premutation mRNA. Nucleic Acids Res. 2009 May;37(8):2712-22. doi: 10.1093/nar/gkp130. Ogasawara M, Iida A, Kumutpongpanich T, Ozaki A, Oya Y, Konishi H, Nakamura A, Abe R, Takai H, Hanajima R, Doi H, Tanaka F, Nakamura H, Nonaka I, Wang Z, Hayashi S, Noguchi S, Nishino I. 
CGG expansion in NOTCH2NLC is associated with oculopharyngodistal myopathy with neurological manifestations. Acta Neuropathol Commun. 2020 Nov 25;8(1):204. Ogasawara M, Eura N, Nagaoka U, Sato T, Arahata H, Hayashi T, Okamoto T, Takahashi Y, Mori-Yoshimura M, Oya Y, Nakamura A, Shimazaki R, Sano T, Kumutpongpanich T, Minami N, Hayashi S, Noguchi S, Iida A, Takao M, Nishino I. Intranuclear inclusions in skin biopsies are not limited to neuronal intranuclear inclusion disease but can also be seen in oculopharyngodistal myopathy. Neuropathol Appl Neurobiol. 2022 Apr;48(3):e12787. doi: 10.1111/nan.12787. Ogasawara M, Eura N, Iida A, Kumutpongpanich T, Minami N, Nonaka I, Hayashi S, Noguchi S, Nishino I. Intranuclear inclusions in muscle biopsy can differentiate oculopharyngodistal myopathy and oculopharyngeal muscular dystrophy. Acta Neuropathol Commun. 2022 Dec 7;10(1):176. doi: 10.1186/s40478-022-01482-w. Oyer CE, Cortez S, O’Shea P, Popovic M. Cardiomyopathy and myocyte intranuclear inclusions in neuronal intranuclear inclusion disease: a case report. Hum Pathol. 1991;22(7):722–4. doi.org/10.1016/0046-8177(91)90296-2. Palmer JE, Wilson N, Son SM, Obrocki P, Wrobel L, Rob M, Takla M, Korolchuk VI, Rubinsztein DC. Autophagy, aging, and age-related neurodegeneration. Neuron. 2025 Jan 8;113(1):29-48. doi: 10.1016/j.neuron.2024.09.015. Pan Y, Xue J, Chen J, Zhang X, Tu T, Xiao Q, Huang W, Liu Q, Zhu L, Li J, Zhou X, Xu Q, Sun Q, Tan J, Yan X, Li J, Guo J, Tang B, Duan R, Liu Z. Assessment of GGC Repeat Expansion in GIPC1 in Patients with Parkinson's Disease. Mov Disord. 2022 Jul;37(7):1557-1559. Pan Y, Jiang Y, Wan J, Hu Z, Jiang H, Shen L, Tang B, Tian Y, Liu Q. Expression of expanded GGC repeats within NOTCH2NLC causes cardiac dysfunction in mouse models. Cell Biosci. 2023 Aug 29;13(1):157. doi: 10.1186/s13578-023-01111-6. Paulson HL, Shakkottai VG, Clark HB, Orr HT. Polyglutamine spinocerebellar ataxias - from genes to potential treatments. Nat Rev Neurosci. 
2017 Oct;18(10):613-626. doi: 10.1038/nrn.2017.92. Peabody DS. Translation initiation at non-AUG triplets in mammalian cells. J Biol Chem. 1989 Mar 25;264(9):5031-5. Pellerin D, Danzi MC, Wilke C, Renaud M, Fazal S, Dicaire MJ, Scriba CK, Ashton C, Yanick C, Beijer D, Rebelo A, Rocca C, Jaunmuktane Z, Sonnen JA, Larivière R, Genís D, Molina Porcel L, Choquet K, Sakalla R, Provost S, Robertson R, Allard-Chamard X, Tétreault M, Reiling SJ, Nagy S, Nishadham V, Purushottam M, Vengalil S, Bardhan M, Nalini A, Chen Z, Mathieu J, Massie R, Chalk CH, Lafontaine AL, Evoy F, Rioux MF, Ragoussis J, Boycott KM, Dubé MP, Duquette A, Houlden H, Ravenscroft G, Laing NG, Lamont PJ, Saporta MA, Schüle R, Schöls L, La Piana R, Synofzik M, Zuchner S, Brais B. Deep Intronic FGF14 GAA Repeat Expansion in Late-Onset Cerebellar Ataxia. N Engl J Med. 2023 Jan 12;388(2):128-141. doi: 10.1056/NEJMoa2207406. Plumley JA, Tsai MI, Dannenberg JJ. Aggregation of capped hexaglycine strands into hydrogen-bonding motifs representative of pleated and rippled β-sheets, collagen, and polyglycine I and II crystal structures. A density functional theory study. J Phys Chem B. 2011 Feb 17;115(6):1562-70. Pongpakdee S, Apiwattanakul M, Termglinchan T, Witoonpanich R, Dejthevaporn C, Lee T, Wansophonkul S, Yamanaka A, Funaguma S, Iida A, Nishino I. CGG/CCG Repeat Expansions in LOC642361/NUTM2B-AS1 in Thai Patients With Oculopharyngodistal Myopathy. Neurol Genet. 2024 Jul 8;10(4):e200170. doi: 10.1212/NXG.0000000000200170. Rafehi H, Szmulewicz DJ, Bennett MF, Sobreira NLM, Pope K, Smith KR, Gillies G, Diakumis P, Dolzhenko E, Eberle MA, Barcina MG, Breen DP, Chancellor AM, Cremer PD, Delatycki MB, Fogel BL, Hackett A, Halmagyi GM, Kapetanovic S, Lang A, Mossman S, Mu W, Patrikios P, Perlman SL, Rosemergy I, Storey E, Watson SRD, Wilson MA, Zee DS, Valle D, Amor DJ, Bahlo M, Lockhart PJ. 
Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS. Am J Hum Genet. 2019 Jul 3;105(1):151-165. Raj A, Wang SH, Shim H, Harpak A, Li YI, Engelmann B, Stephens M, Gilad Y, Pritchard JK. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife. 2016 May 27;5:e13328. Saito R, Shimizu H, Miura T, Hara N, Mezaki N, Higuchi Y, Miyashita A, Kawachi I, Sanpei K, Honma Y, Onodera O, Ikeuchi T, Kakita A. Oculopharyngodistal myopathy with coexisting histology of systemic neuronal intranuclear inclusion disease: Clinicopathologic features of an autopsied patient harboring CGG repeat expansions in LRP12. Acta Neuropathol Commun. 2020 Jun 3;8(1):75. Satoyoshi E, Kinoshita M. Oculopharyngodistal myopathy. Arch Neurol. 1977 Feb;34(2):89-92. doi: 10.1001/archneur.1977.00500140043007. Sellier C, Buijsen RAM, He F, Natla S, Jung L, Tropel P, Gaucherot A, Jacobs H, Meziane H, Vincent A, Champy MF, Sorg T, Pavlovic G, Wattenhofer-Donze M, Birling MC, Oulad-Abdelghani M, Eberling P, Ruffenach F, Joint M, Anheim M, Martinez-Cerdeno V, Tassone F, Willemsen R, Hukema RK, Viville S, Martinat C, Todd PK, Charlet-Berguerand N. Translation of Expanded CGG Repeats into FMRpolyG Is Pathogenic and May Contribute to Fragile X Tremor Ataxia Syndrome. Neuron. 2017 Jan 18;93(2):331-347. Shi Y, Cao C, Zeng Y, Ding Y, Chen L, Zheng F, Chen X, Zhou F, Yang X, Li J, Xu L, Xu G, Lin M, Ishiura H, Tsuji S, Wang N, Wang Z, Chen WJ, Yang K. CGG repeat expansion in LOC642361/NUTM2B-AS1 typically presents as oculopharyngodistal myopathy. J Genet Genomics. 2024 Feb;51(2):184-196. Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun. 2023 Apr 12;14(1):2092. doi: 10.1038/s41467-023-37690-8. 
Sone J, Mori K, Inagaki T, Katsumata R, Takagi S, Yokoi S, Araki K, Kato T, Nakamura T, Koike H, Takashima H, Hashiguchi A, Kohno Y, Kurashige T, Kuriyama M, Takiyama Y, Tsuchiya M, Kitagawa N, Kawamoto M, Yoshimura H, Suto Y, Nakayasu H, Uehara N, Sugiyama H, Takahashi M, Kokubun N, Konno T, Katsuno M, Tanaka F, Iwasaki Y, Yoshida M, Sobue G. Clinicopathological features of adult-onset neuronal intranuclear inclusion disease. Brain. 2016 Dec;139(Pt 12):3170-3186. Sone J, Mitsuhashi S, Fujita A, Mizuguchi T, Hamanaka K, Mori K, Koike H, Hashiguchi A, Takashima H, Sugiyama H, Kohno Y, Takiyama Y, Maeda K, Doi H, Koyano S, Takeuchi H, Kawamoto M, Kohara N, Ando T, Ieda T, Kita Y, Kokubun N, Tsuboi Y, Katoh K, Kino Y, Katsuno M, Iwasaki Y, Yoshida M, Tanaka F, Suzuki IK, Frith MC, Matsumoto N, Sobue G. Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease. Nat Genet. 2019 Aug;51(8):1215-1221. Steger M, Diez F, Dhekne HS, Lis P, Nirujogi RS, Karayel O, Tonelli F, Martinez TN, Lorentzen E, Pfeffer SR, Alessi DR, Mann M. Systematic proteomic analysis of LRRK2-mediated Rab GTPase phosphorylation establishes a connection to ciliogenesis. Elife. 2017 Nov 10;6:e31012. Stringer C, Wang T, Michaelos M, Pachitariu M. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods. 2021 Jan;18(1):100-106. doi: 10.1038/s41592-020-01018-x. Stoyas CA, La Spada AR. The CAG-polyglutamine repeat diseases: a clinical, molecular, genetic, and pathophysiologic nosology. Handb Clin Neurol. 2018;147:143-170. doi: 10.1016/B978-0-444-63233-3.00011-7. Tabebordbar M, Lagerborg KA, Stanton A, King EM, Ye S, Tellez L, Krunnfusz A, Tavakoli S, Widrick JJ, Messemer KA, Troiano EC, Moghadaszadeh B, Peacker BL, Leacock KA, Horwitz N, Beggs AH, Wagers AJ, Sabeti PC. Directed evolution of a family of AAV capsid variants enabling potent muscle-directed gene delivery across species. Cell. 2021 Sep 16;184(19):4919-4938.e22. 
doi: 10.1016/j.cell.2021.08.028. Takahashi-Fujigasaki J, Nakano Y, Uchino A, Murayama S. Adult-onset neuronal intranuclear hyaline inclusion disease is not rare in older adults. Geriatr Gerontol Int. 2016 Mar;16 Suppl 1:51-6. Tang H, Xiong Y, Jiang K, Shen Y, Yu Y, Huang P, Zhu M, Li X, Zheng Y, Zhou M, Yu J, Deng J, Wang Z, Hong D, Qiu Y, Tan D. Clinical and pathological characteristics of OPDM4 patients in advanced disease. Muscle Nerve. 2024 Jul 23. doi: 10.1002/mus.28200. Online ahead of print. Tian Y, Wang JL, Huang W, Zeng S, Jiao B, Liu Z, Chen Z, Li Y, Wang Y, Min HX, Wang XJ, You Y, Zhang RX, Chen XY, Yi F, Zhou YF, Long HY, Zhou CJ, Hou X, Wang JP, Xie B, Liang F, Yang ZY, Sun QY, Allen EG, Shafik AM, Kong HE, Guo JF, Yan XX, Hu ZM, Xia K, Jiang H, Xu HW, Duan RH, Jin P, Tang BS, Shen L. Expansion of Human-Specific GGC Repeat in Neuronal Intranuclear Inclusion Disease-Related Disorders. Am J Hum Genet. 2019 Jul 3;105(1):166-176. Todd PK, Oh SY, Krans A, He F, Sellier C, Frazer M, Renoux AJ, Chen KC, Scaglione KM, Basrur V, Elenitoba-Johnson K, Vonsattel JP, Louis ED, Sutton MA, Taylor JP, Mills RE, Charlet-Berguerand N, Paulson HL. CGG repeat-associated translation mediates neurodegeneration in fragile X tremor ataxia syndrome. Neuron. 2013 May 8;78(3):440-55. Uyama E, Uchino M, Chateau D, Tomé FM. Autosomal recessive oculopharyngodistal myopathy in light of distal myopathy with rimmed vacuoles and oculopharyngeal muscular dystrophy. Neuromuscul Disord. 1998 Apr;8(2):119-25. doi: 10.1016/s0960-8966(98)00002-9 van der Sluijs BM, ter Laak HJ, Scheffer H, van der Maarel SM, van Engelen BG. Autosomal recessive oculopharyngodistal myopathy: a distinct phenotypical, histological, and genetic entity. J Neurol Neurosurg Psychiatry. 2004 Oct;75(10):1499-501. doi: 10.1136/jnnp.2003.025072. Vegezzi E, Ishiura H, Bragg DC, Pellerin D, Magrinelli F, Currò R, Facchini S, Tucci A, Hardy J, Sharma N, Danzi MC, Zuchner S, Brais B, Reilly MM, Tsuji S, Houlden H, Cortese A. 
Neurological disorders caused by novel non-coding repeat expansions: clinical features and differential diagnosis. Lancet Neurol. 2024 Jul;23(7):725-739. doi: 10.1016/S1474-4422(24)00167-4. Venter et al., The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304-51. doi: 10.1126/science.1058040. Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol. 2023 Feb;36(2):321-336. doi: 10.1111/jeb.14106. Wallenius J, Kafantari E, Jhaveri E, Gorcenco S, Ameur A, Karremo C, Dobloug S, Karrman K, de Koning T, Ilinca A, Landqvist Waldö M, Arvidsson A, Persson S, Englund E, Ehrencrona H, Puschmann A. Exonic trinucleotide repeat expansions in ZFHX3 cause spinocerebellar ataxia type 4: A poly-glycine disease. Am J Hum Genet. 2023 Nov 28:S0002-9297(23)00403-2. Wright BW, Yi Z, Weissman JS, Chen J. The dark proteome: translation from noncanonical open reading frames. Trends Cell Biol. 2022 Mar;32(3):243-258. Wright SE, Todd PK. Native functions of short tandem repeats. Elife. 2023 Mar 20;12:e84043. doi: 10.7554/eLife.84043. Xi J, Wang X, Yue D, Dou T, Wu Q, Lu J, Liu Y, Yu W, Qiao K, Lin J, Luo S, Li J, Du A, Dong J, Chen Y, Luo L, Yang J, Niu Z, Liang Z, Zhao C, Lu J, Zhu W, Zhou Y. 5' UTR CGG repeat expansion in GIPC1 is associated with oculopharyngodistal myopathy. Brain. 2021 Mar 3;144(2):601-614. Yang X, Zhang D, Shen S, Li P, Li M, Niu J, Ma D, Xu D, Li S, Guo X, Wang Z, Zhao Y, Ren H, Ling C, Wang Y, Fan Y, Shen J, Zhu Y, Wang D, Cui L, Chen L, Shi C, Dai Y. A large pedigree study confirmed the CGG repeat expansion of RILPL1 Is associated with oculopharyngodistal myopathy. BMC Med Genomics. 2023 Oct 20;16(1):253. doi: 10.1186/s12920-023-01586-9. Yeetong P, Pongpanich M, Srichomthong C, Assawapitaksakul A, Shotelersuk V, Tantirukdham N, Chunharas C, Suphapeetiporn K, Shotelersuk V. 
TTTCA repeat insertions in an intron of YEATS2 in benign adult familial myoclonic epilepsy type 4. Brain. 2019 Nov 1;142(11):3360-3366. doi: 10.1093/brain/awz267. Yu J, Deng J, Guo X, Shan J, Luan X, Cao L, Zhao J, Yu M, Zhang W, Lv H, Xie Z, Meng L, Zheng Y, Zhao Y, Gang Q, Wang Q, Liu J, Zhu M, Zhou B, Li P, Liu Y, Wang Y, Yan C, Hong D, Yuan Y, Wang Z. The GGC repeat expansion in NOTCH2NLC is associated with oculopharyngodistal myopathy type 3. Brain. 2021 Mar 9:awab077. doi: 10.1093/brain/awab077. Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, Yuan Y, Hong D, Charlet-Berguerand N, Wang Z, Deng J. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119. doi: 10.1073/pnas.2208649119. Yu J, Shan J, Yu M, Di L, Xie Z, Zhang W, Lv H, Meng L, Zheng Y, Zhao Y, Gang Q, Guo X, Wang Y, Xi J, Zhu W, Da Y, Hong D, Yuan Y, Yan C, Wang Z, Deng J. The CGG repeat expansion in RILPL1 is associated with oculopharyngodistal myopathy type 4. Am J Hum Genet. 2022 Mar 3;109(3):533-541. doi: 10.1016/j.ajhg.2022.01.012. Zamiri B, Reddy K, Macgregor RB Jr, Pearson CE. TMPyP4 porphyrin distorts RNA G-quadruplex structures of the disease-associated r(GGGGCC)n repeat of the C9orf72 gene and blocks interaction of RNA-binding proteins. J Biol Chem. 2014 Feb 21;289(8):4653-9. doi: 10.1074/jbc.C113.502336. Zeng YH, Yang K, Du GQ, Chen YK, Cao CY, Qiu YS, He J, Lv HD, Qu QQ, Chen JN, Xu GR, Chen L, Zheng FZ, Zhao M, Lin MT, Chen WJ, Hu J, Wang ZQ, Wang N. GGC Repeat Expansion of RILPL1 is Associated with Oculopharyngodistal Myopathy. Ann Neurol. 2022 Sep;92(3):512-526. doi: 10.1002/ana.26436. Epub 2022 Jul 2. PMID: 35700120 Zhao J, Liu J, Xiao J, Du J, Que C, Shi X, Liang W, Sun W, Zhang W, Lv H, Yuan Y, Wang Z. Clinical and muscle imaging findings in 14 mainland chinese patients with oculopharyngodistal myopathy. PLoS One. 2015 Jun 3;10(6):e0128629. 
Zhong S, Lian Y, Luo W, Luo R, Wu X, Ji J, Ji Y, Ding J, Wang X. Upstream open reading frame with NOTCH2NLC GGC expansion generates polyglycine aggregates and disrupts nucleocytoplasmic transport: implications for polyglycine diseases. Acta Neuropathol. 2021 Dec;142(6):1003-1023. doi: 10.1007/s00401-021-02375-3. Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun. 2023 Oct 23;14(1):6711. doi: 10.1038/s41467-023-42278-3. Zu, T., Gibbens, B., Doty, N. S., Gomes-Pereira, M., Huguet, A., Stone, M. D., Margolis, J., Peterson, M., Markowski, T. W., Ingram, M. A., et al. (2011). Non-ATG-initiated translation directed by microsatellite expansions. Proc Natl Acad Sci U S A 108(1):260-5. Additional Declarations There is NO Competing Interest. Supplementary Files SupTable1MassSpecInteractants.xlsx Supplemental Table 1. Related to Figure 4. Mass spectrometry analysis of proteins interacting with OPDM/OPML polyG proteins SupTable2snRNAeqtranscriptomicchanges.xlsx Supplemental Table 2. Related to Figure 5. Single nuclei RNA sequencing analysis of OPDM/OPML vs. control mouse skeletal muscle. SupTable3TMPRNAseq.xlsx Supplemental Table 3. Related to Figure 7. RNA sequencing analysis of TMPyP4 treated vs. control OPDM4 LHCN-M2 muscle cells. SupTable4TMPMassSpec.xlsx Supplemental Table 4. Related to Figure 7. Mass spectrometry analysis of TMPyP4 treated vs. control OPDM4 LHCN-M2 muscle cells. SupTable5PathwayTMPRNAseq.xls Supplemental Table 5. Related to Figure 7. Mis-regulated Pathways analysis of TMPyP4 treated vs. control OPDM4 LHCN-M2 cells. SupVideo1.avi Supplemental Videos. Related to Figure 4. Live imaging of U2OS cells expressing OPDM4 asRILpolyG-GFP and Cherry-NUP50. 
SupVideo2.avi Live cell nuclear accumulation of OPDM4 asRILpolyG-GFP video 2
DarkGenomeSupFigs.pdf Supplementary figures
{"props":{"pageProps":{"initialData":{"identity":"rs-6122917","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Biological Sciences - Article","associatedPublications":[],"authors":[{"id":422364486,"identity":"a79d14ba-8f7e-4459-8925-9e1a73258daf","order_by":0,"name":"Nicolas 
Charlet-Berguerand","email":"","orcid":"https://orcid.org/0000-0002-4423-4920","institution":"Institute of Genetics and Molecular and Cellular Biology","correspondingAuthor":true,"prefix":"","firstName":"Nicolas","middleName":"","lastName":"Charlet-Berguerand","suffix":""},{"id":422364487,"identity":"c621f08d-c17c-4580-b5ec-6e9e45b11724","order_by":1,"name":"Manon Boivin","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Manon","middleName":"","lastName":"Boivin","suffix":""},{"id":422364488,"identity":"3b184636-de42-4936-8ffe-c8d8012a40e6","order_by":2,"name":"Jiaxi Yu","email":"","orcid":"","institution":"Peking University First Hospital","correspondingAuthor":false,"prefix":"","firstName":"Jiaxi","middleName":"","lastName":"Yu","suffix":""},{"id":422364489,"identity":"03e24f07-53bf-42c9-a56a-05d9852c00d1","order_by":3,"name":"Nobuyuki Eura","email":"","orcid":"","institution":"National Centre of Neurology and Psychiatry","correspondingAuthor":false,"prefix":"","firstName":"Nobuyuki","middleName":"","lastName":"Eura","suffix":""},{"id":422364490,"identity":"7f1a9a6a-f0c9-4b14-9306-b57c26771ef7","order_by":4,"name":"Léa Schmitt","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Léa","middleName":"","lastName":"Schmitt","suffix":""},{"id":422364491,"identity":"c99a10f9-e009-4db3-8149-9677313ab7af","order_by":5,"name":"Erwan 
Grandgirard","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Erwan","middleName":"","lastName":"Grandgirard","suffix":""},{"id":422364492,"identity":"22478090-7690-4f9f-a48a-3ef28800a3d4","order_by":6,"name":"Damien Plassard","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Damien","middleName":"","lastName":"Plassard","suffix":""},{"id":422364493,"identity":"cb24d625-f366-41e3-9603-42bb48155bcc","order_by":7,"name":"Chadia Nahy","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Chadia","middleName":"","lastName":"Nahy","suffix":""},{"id":422364494,"identity":"c0afb35e-feb8-444f-a90c-df96baf0f9f4","order_by":8,"name":"Anne Maglott","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Anne","middleName":"","lastName":"Maglott","suffix":""},{"id":422364495,"identity":"c2e86d56-7e47-497d-a2bd-2c05a54ac8ea","order_by":9,"name":"Bastien Morlet","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Bastien","middleName":"","lastName":"Morlet","suffix":""},{"id":422364496,"identity":"bc04e119-8927-4c28-8bf5-d2c2960d4f10","order_by":10,"name":"Patrice Goetz","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Patrice","middleName":"","lastName":"Goetz","suffix":""},{"id":422364497,"identity":"46213576-8f88-4ab8-abd6-a311211ae1f5","order_by":11,"name":"Chao Gao","email":"","orcid":"","institution":"Peking University First Hospital","correspondingAuthor":false,"prefix":"","firstName":"Chao","middleName":"","lastName":"Gao","suffix":""},{"id":422364498,"identity":"6ea9ac93-06cf-4ddc-9ead-ce4f1f0ec9b6","order_by":12,"name":"Elise 
Lefebvre","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Elise","middleName":"","lastName":"Lefebvre","suffix":""},{"id":422364499,"identity":"e72cfa61-6fa7-46cc-b036-f20efd6f5d02","order_by":13,"name":"Angelique Pichot","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Angelique","middleName":"","lastName":"Pichot","suffix":""},{"id":422364500,"identity":"f6ff702e-a131-4130-b0c6-aa85584feea7","order_by":14,"name":"Christelle Thibault","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Christelle","middleName":"","lastName":"Thibault","suffix":""},{"id":422364501,"identity":"b568424b-df90-4e03-a3c5-1f475902633a","order_by":15,"name":"Mustapha Oulad","email":"","orcid":"","institution":"IGBMC","correspondingAuthor":false,"prefix":"","firstName":"Mustapha","middleName":"","lastName":"Oulad","suffix":""},{"id":422364502,"identity":"457f17d1-17cf-4615-8f24-7fd51f8b0130","order_by":16,"name":"Ichizo Nishino","email":"","orcid":"https://orcid.org/0000-0001-9452-112X","institution":"National Center of Neurology and Psychiatry","correspondingAuthor":false,"prefix":"","firstName":"Ichizo","middleName":"","lastName":"Nishino","suffix":""},{"id":422364503,"identity":"6daa67ab-e1a0-4f5d-babc-1db2b891cfa1","order_by":17,"name":"Kang Yang","email":"","orcid":"","institution":"Fujian Medical University","correspondingAuthor":false,"prefix":"","firstName":"Kang","middleName":"","lastName":"Yang","suffix":""},{"id":422364504,"identity":"9b7d78b8-d4cc-486b-93c2-c0960eac8632","order_by":18,"name":"Ning Wang","email":"","orcid":"","institution":"Fujian Medical University","correspondingAuthor":false,"prefix":"","firstName":"Ning","middleName":"","lastName":"Wang","suffix":""},{"id":422364505,"identity":"be301bef-e19c-4341-8282-f285c193947b","order_by":19,"name":"Zhaoxia Wang","email":"","orcid":"","institution":"Institute of Biophysics, Chinese 
Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Zhaoxia","middleName":"","lastName":"Wang","suffix":""},{"id":422364506,"identity":"bbbdacc0-a02d-402b-b059-3eded3026d42","order_by":20,"name":"Jianwen Deng","email":"","orcid":"","institution":"Peking University First Hospital","correspondingAuthor":false,"prefix":"","firstName":"Jianwen","middleName":"","lastName":"Deng","suffix":""}],"badges":[],"createdAt":"2025-02-27 16:57:18","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6122917/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6122917/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41588-026-02507-z","type":"published","date":"2026-02-17T05:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":80825410,"identity":"5e529636-b908-4ea6-9b58-b9e539cb17b1","added_by":"auto","created_at":"2025-04-17 13:07:47","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":628165,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOPDM and OPML GGC repeats are translated into novel polyglycine proteins.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Scheme of the OPDM/OPML GGC repeat expansions located within the \u003cem\u003eLRP12\u003c/em\u003e, \u003cem\u003eGIPC1, NOTCH2NLC, asRILPL1 \u003c/em\u003eand\u003cem\u003e LOC642361 \u003c/em\u003enon-coding sequences. (\u003cstrong\u003eB\u003c/strong\u003e)\u003cstrong\u003e \u003c/strong\u003eScheme of the construct expressing OPDM/OPML GGC repeats embedded in their host sequences and fused to the GFP in their 3 potential encoded frames. This plasmid also contains an independent Cherry expression cassette. 
(\u003cstrong\u003eC-D\u003c/strong\u003e) GFP and Cherry direct fluorescence and FACS analysis of HEK293 cells transfected for 24 hours with a plasmid expressing GGC repeats embedded in the OPML \u003cem\u003eLOC642361 \u003c/em\u003esequence\u003cem\u003e \u003c/em\u003efused to the GFP in the glycine, alanine or arginine frames, while the Cherry is expressed independently.\u003cem\u003e \u003c/em\u003e(\u003cstrong\u003eE\u003c/strong\u003e) Quantification of the FACS analysis shown in (D). (\u003cstrong\u003eF\u003c/strong\u003e) RT-qPCR analysis of GFP RNA expression of HEK293 cells transfected as in (C). (\u003cstrong\u003eG\u003c/strong\u003e) Immunoblot against the GFP, Cherry or GAPDH of proteins extracted from HEK293 cells transfected for 24 hours with a plasmid expressing GGC repeats embedded in either the OPDM2 \u003cem\u003eGIPC1 \u003c/em\u003e5’UTR, OPDM4\u003cem\u003e RILPL1 \u003c/em\u003eantisense transcript or OPML \u003cem\u003eLOC642361 \u003c/em\u003elncRNA fused to the GFP in the glycine, alanine or arginine frames, while the Cherry is expressed independently. See also Figure S1.\u003c/p\u003e","description":"","filename":"DarkGenomeFig1.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/7700d0d074bc41edaa132586.png"},{"id":80825016,"identity":"9e38c9da-2114-4b1a-bf92-69d4611a76f6","added_by":"auto","created_at":"2025-04-17 12:59:47","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":273719,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOPDM and OPML GGC repeats are translated through initiation at start codons.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Scheme of the GFP-immunoprecipitation and mass spectrometry analysis of OPDM/OPML proteins expressed from GGC repeats located within their host sequences and fused to the GFP in the glycine frame. 
(\u003cstrong\u003eB-D\u003c/strong\u003e) LC-MS/MS spectra and their corresponding translated N-terminal parts aligned to the nucleotide sequences of either OPDM2 \u003cem\u003eGIPC1 \u003c/em\u003e5’UTR (B), OPDM4\u003cem\u003e RILPL1 \u003c/em\u003eantisense transcript (C) or OPML \u003cem\u003eLOC642361\u003c/em\u003e (D) obtained from mass spectrometry analysis of GFP-immunoprecipitated proteins from 24 hours transfected HEK293 cells. (\u003cstrong\u003eE\u003c/strong\u003e) Immunoblot against the GFP, Cherry or GAPDH of proteins extracted from HEK293 cells transfected for 24 hours with wild type or mutant (∆CTG/ATG start codons) plasmids expressing GGC repeats embedded in either the OPDM2 \u003cem\u003eGIPC1 \u003c/em\u003e5’UTR, OPDM4\u003cem\u003e RILPL1 \u003c/em\u003eantisense transcript or OPML \u003cem\u003eLOC642361 \u003c/em\u003elncRNA fused to the GFP in the glycine frame, while the Cherry is expressed independently. (\u003cstrong\u003eF\u003c/strong\u003e) Schemes of the novel ORFs identified in the OPDM2 \u003cem\u003eGIPC1 \u003c/em\u003e5’UTR, OPDM4\u003cem\u003e RILPL1 \u003c/em\u003eantisense transcript or OPML \u003cem\u003eLOC642361\u003c/em\u003e lncRNA with their corresponding polyglycine-encoded proteins. uORF: upstream ORF, sORF: short ORF. 
See also Figure S2.\u003c/p\u003e","description":"","filename":"DarkGenomeFig2.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/1eb3a211ba6a79dc92ad90ca.png"},{"id":80824162,"identity":"670a7477-fb66-4e01-95af-1db4aa9409da","added_by":"auto","created_at":"2025-04-17 12:51:47","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1016806,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePolyglycine proteins are present in the typical OPDM/OPML p62-positive inclusions.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA-D\u003c/strong\u003e) Upper panels, amino acid sequences of the uGIPpolyG, asRILpolyG, LOC6polyG and uN2CpolyG ORFs embedded respectively within the OPDM2 \u003cem\u003eGIPC1 \u003c/em\u003e5’UTR (A), OPDM4\u003cem\u003e RILPL1 \u003c/em\u003eantisense transcript (B), OPML \u003cem\u003eLOC642361\u003c/em\u003e lncRNA (C) and \u003cem\u003eNOTCH2NLC\u003c/em\u003e 5’UTR (D). The peptide sequence against which each specific antibody is directed is indicated in bold with an underlined bracket. Lower panels, immunofluorescence against p62 and either uGIPpolyG, asRILpolyG, LOC6polyG or uN2CpolyG using custom antibodies on skeletal muscle sections of individuals with OPDM/OPML or age-matched control individuals. 
See also Figure S3.\u003c/p\u003e","description":"","filename":"DarkGenomeFig3.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/4dccbc035a0fc070b8581bd2.png"},{"id":80824165,"identity":"99e22180-dbf8-4729-aa3c-21db4d915ab9","added_by":"auto","created_at":"2025-04-17 12:51:47","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1289512,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExpression of polyglycine proteins forms inclusions and is pathogenic in muscle cells.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) GFP fluorescence and immunofluorescence against the desmin and lamin A/C proteins of LHCN-M2 muscle cells differentiated for 4 days and expressing either the GFP or GFP-tagged ATG polyG, OPDM2 uGIPpolyG, OPDM3 uN2CpolyG, OPDM4 asRILpolyG or OPML LOC6polyG.\u003cstrong\u003e \u003c/strong\u003e(\u003cstrong\u003eB-C\u003c/strong\u003e) GFP fluorescence with immunofluorescence against p62 and lamin A/C (B) or electron microscopy (C) of respectively 4 and 3 days-differentiated LHCN-M2 muscle cells expressing the OPDM4 asRILpolyG-GFP tagged protein. (\u003cstrong\u003eD\u003c/strong\u003e) Quantification of nuclear vs. cytoplasmic localization of the OPDM/OPML polyglycine proteins shown in (A). (\u003cstrong\u003eE\u003c/strong\u003e) SDS-PAGE gel and immunoblot against the GFP or the GAPDH of soluble proteins (upper panel), and dot blot against the GFP or Ponceau staining of the insoluble proteins (lower panel) extracted from 48 hours differentiated LHCN-M2 muscle cells expressing either the GFP or GFP-tagged ATG polyG, OPDM2 uGIPpolyG, OPDM3 uN2CpolyG, OPDM4 asRILpolyG or OPML LOC6polyG. (\u003cstrong\u003eF-G\u003c/strong\u003e) Upper panel, scheme and mass spectrometry heat map of the interactants of GFP immunoprecipitated OPDM/OPML polyglycine proteins expressed in 48 hours-differentiated LHCN-M2 muscle cells. 
Lower panel, immunoblotting of the KU70/80 and RPL10/36 proteins from protein lysates (input) or GFP-immunoprecipitated proteins from 24 hours-transfected HEK293 cells expressing either the GFP, GFP-tagged ATG polyG, OPDM2 \u003cem\u003eGIPC1\u003c/em\u003e 5’UTR, OPDM3\u003cem\u003e NOTCH2NLC\u003c/em\u003e 5’UTR, OPDM4\u003cem\u003e RILPL1 \u003c/em\u003eantisense transcript or OPML \u003cem\u003eLOC642361\u003c/em\u003e lncRNA ORFs with or without a polyglycine stretch. (\u003cstrong\u003eH\u003c/strong\u003e) Cell viability of LHCN-M2 muscle cells differentiated for 3 days and expressing GFP, ATG polyG-GFP or GFP-tagged OPDM/OPML polyglycine proteins (OPDM2 uGIPpolyG, OPDM3 uN2CpolyG, OPDM4 asRILpolyG or OPML LOC6polyG). N=6. Error bars indicate SEM. Student t-test compared to the GFP control condition, * p\u0026lt;0.01, ** p\u0026lt;0.001, *** p\u0026lt;0.0001. See also Figure S4.\u003c/p\u003e","description":"","filename":"DarkGenomeFig4.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/edbca59a282bb2ea710f77e1.png"},{"id":80825018,"identity":"7a11ca17-12a0-4bee-aee0-593cdc9c6c0d","added_by":"auto","created_at":"2025-04-17 12:59:47","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1692193,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExpression of polyG proteins forms inclusions and is pathogenic in animal muscles\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Scheme of the AAV strategy to study OPDM/OPML polyglycine toxicity in mouse skeletal muscles. (\u003cstrong\u003eB\u003c/strong\u003e) Hematoxylin and eosin (H\u0026amp;E) staining of \u003cem\u003eTibialis Anterior\u003c/em\u003e (TA) frozen sections of 5-months AAV-injected male mice expressing GFP, ATG polyG-GFP or GFP-tagged OPDM2 uGIPpolyG, OPDM3 uN2CpolyG, OPDM4 asRILpolyG or OPML LOC6polyG. 
The last image shows a representative p62 immunohistochemistry, which reveals numerous protein inclusions. (\u003cstrong\u003eC\u003c/strong\u003e) Quantification of mouse TA muscle fiber area 5- (upper panel) or 9- (lower panel) months post-injection of AAV expressing the OPDM/OPML polyglycine proteins and controls shown in (A). N=4 AAV injected male mice. Error bars represent SEM. One-way ANOVA with Tukey post hoc test, * p\u0026lt;0.05, ** p\u0026lt;0.01, *** p\u0026lt;0.01. (\u003cstrong\u003eD\u003c/strong\u003e) GFP fluorescence and immunofluorescence against p62 with counterstaining of membranes by fluorescent-conjugated wheat germ agglutinin (WGA) and nuclear DNA by DAPI on frozen TA muscle sections 5-months post-injection of AAV expressing the OPDM/OPML polyglycine proteins and controls described in (A). (\u003cstrong\u003eE\u003c/strong\u003e) Quantification of GFP-positive inclusions in TA frozen sections of controls and OPDM/OPML polyglycine-expressing mice. N=3 mice, at least 1000 muscle fibers were counted per animal. (\u003cstrong\u003eF\u003c/strong\u003e) Single nuclei RNA sequencing of leg muscles (TA, gastrocnemius, quadriceps) of 2 different individuals at 7-months NaCl-injected or AAV-injected male mice expressing GFP, ATG polyG-GFP or GFP-tagged OPDM2 uGIPpolyG, OPDM3 uN2CpolyG or OPML LOC6polyG. Results are represented by UMAP (upper panel) and abundance of cell types (lower panel) of merged control (NaCl- and GFP-injected mice) vs merged polyglycine-expressing mice (ATG polyG-GFP-, uGIPpolyG-GFP-, uN2CpolyG-GFP- and LOC6polyG-GFP-injected mice). (\u003cstrong\u003eG\u003c/strong\u003e) Kaplan-Meier survival curve of controls and OPDM/OPML polyglycine-expressing mice. N=8 AAV injected male mice. 
See also Figure S5.\u003c/p\u003e","description":"","filename":"DarkGenomeFig5.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/a0a6bd3f1ddb933a873dc4b2.png"},{"id":80824166,"identity":"7c484658-697a-4885-8eb3-3908e2e9e0dc","added_by":"auto","created_at":"2025-04-17 12:51:47","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":1968610,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExpression of polyglycine proteins forms inclusions and is pathogenic in the animal CNS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Scheme of the AAV strategy to study OPDM/OPML polyglycine toxicity in the mouse central nervous system. (\u003cstrong\u003eB-C\u003c/strong\u003e) Time before falling from a rotating rod (B) and numbers of paw slips and errors on a notched bar (C) of male mice 1-, 3-, 6- and 9-months post-injection of AAV expressing GFP, ATG polyG-GFP or GFP-tagged OPDM2 uGIPpolyG, OPDM3 uN2CpolyG or OPML LOC6polyG. N=10 AAV injected male mice. Box-and-whisker plot, box lower and upper limits represent the 25th and 75th percentiles, whiskers represent minimum and maximum values and the horizontal line across the box represents the median. In bar graphs, error bars indicate the standard error of the mean (SEM). One-way ANOVA with Tukey post hoc test, * p\u0026lt;0.05, ** p\u0026lt;0.01, *** p\u0026lt;0.01. (\u003cstrong\u003eD\u003c/strong\u003e) Kaplan-Meier survival curve of controls and OPDM/OPML polyglycine-expressing mice. N=10 AAV injected male mice. (\u003cstrong\u003eE\u003c/strong\u003e) Upper panel, immunohistochemistry against p62 with cresyl violet (Nissl) counterstaining of various mouse brain areas 3-months post-injection of AAV expressing GFP, ATG polyG-GFP or GFP-tagged OPDM2 uGIPpolyG, OPDM3 uN2CpolyG or OPML LOC6polyG. Lower panel, quantification of p62-positive inclusions. N=3 mice, \u0026gt;200 nuclei were counted per brain region and per animal. 
(\u003cstrong\u003eF\u003c/strong\u003e) Upper panel, immunofluorescence against p62 and calbindin on the cerebellum 3-months post-injection of AAV expressing controls or OPDM/OPML GFP-tagged polyG proteins. Lower panel, quantification of Purkinje cell (PC) number. N=3 mice, an area of 4 mm\u003csup\u003e2\u003c/sup\u003e was counted per animal. See also Figure S6.\u003c/p\u003e","description":"","filename":"DarkGenomeFig6.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/4b4917e5cbc77fbc2393e9ad.png"},{"id":80824168,"identity":"2498ef36-c93f-44ba-a137-bb54db39bc7c","added_by":"auto","created_at":"2025-04-17 12:51:48","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":1555267,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe porphyrin TMPyP4 alleviates aggregation and toxicity of polyglycine proteins.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Dot blot against the GFP or Ponceau staining of the insoluble proteins extracted from 48 hours differentiated LHCN-M2 muscle cells expressing GFP-tagged ATG polyG, OPDM2 uGIPpolyG, OPDM3 uN2CpolyG, OPDM4 asRILpolyG or OPML LOC6polyG and treated overnight with the indicated drug concentration. (\u003cstrong\u003eB\u003c/strong\u003e) Cell viability of LHCN-M2 muscle cells differentiated for 3 days and expressing GFP, ATG polyG-GFP or GFP-tagged OPDM2 uGIPpolyG, OPDM3 uN2CpolyG, OPDM4 asRILpolyG or OPML LOC6polyG and treated overnight with 0 (untreated), 0.3, 1 or 3 µM TMPyP4. N=6. Error bars indicate SEM. Student t-test compared to each non-treated control condition, * p\u0026lt;0.01, ** p\u0026lt;0.001, *** p\u0026lt;0.0001. (\u003cstrong\u003eC\u003c/strong\u003e) Left panels, representative light microscopy (upper panel) and electron microscopy (lower panel) images of fly eyes expressing GFP (control), uGIPpolyG-GFP or asRILpolyG-GFP. Right panel, quantification of intact rhabdomeres per ommatidium. 
Error bars represent SEM. One-way ANOVA with Bonferroni post hoc test, * p\u0026lt;0.05, ****p \u0026lt; 0.0001. (\u003cstrong\u003eD\u003c/strong\u003e) Porphyrin TMPyP4 ameliorates ommatidial degeneration in \u003cem\u003eDrosophila\u003c/em\u003e models of polyG-expanded proteins. Left panels, representative electron microscopy images of fly eyes from 20-day-old OPDM2 uGIPpolyG or OPDM4 asRILpolyG-expressing \u003cem\u003eDrosophila\u003c/em\u003e fed with 0, 30, 100 or 200 μM TMPyP4. Scale bars, 5 μm in columns 1 and 3, and 2 μm in columns 2 and 4. Right panel, quantification showing that TMPyP4 significantly preserved ommatidial integrity in OPDM polyglycine-expressing flies. One-way ANOVA with Bonferroni post hoc test. ** p\u0026lt;0.01, *** p\u0026lt;0.001, ****p \u0026lt; 0.0001. See also Figure S7.\u003c/p\u003e","description":"","filename":"DarkGenomeFig7.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/364055976872c5fddfe9ff82.png"},{"id":80825019,"identity":"ed23ccbd-2110-47fa-8a59-5b76a4db4515","added_by":"auto","created_at":"2025-04-17 12:59:47","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":103205,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eModel of polyG toxicity in OPDM/OPML neurological diseases.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eExpanded GGC repeats located within sequences originally annotated as non-coding are embedded in previously undescribed open reading frames, resulting in expression of novel polyglycine-containing proteins, which form protein inclusions and are toxic for neuronal and muscle cells. Of interest, the toxicity and biological properties of their central polyglycine core are modulated by their bordering sequences, which are specific to each host ORF. 
Moreover, these data recall the neurodegenerative NIID, FXTAS and SCA4 disorders, suggesting the existence of a wider neurological spectrum of diseases caused by polyGly proteins.\u003c/p\u003e","description":"","filename":"DarkGenomeFig8.png","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/bdec58c1d98d64cf30c48008.png"},{"id":102901179,"identity":"bf2b43e3-8f58-4208-a19d-b32afa0eb463","added_by":"auto","created_at":"2026-02-18 08:12:39","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":10718846,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/3d1a5c1e-11d3-420f-8f96-dde62715f69c.pdf"},{"id":80825020,"identity":"b95bcb96-57f3-4681-b659-77fde2cea087","added_by":"auto","created_at":"2025-04-17 12:59:48","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":3222317,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Table 1. Related to Figure 4.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMass spectrometry analysis of proteins interacting with OPDM/OPML polyG proteins\u003c/p\u003e","description":"","filename":"SupTable1MassSpecInteractants.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/094375e4f8561b0fffbfa146.xlsx"},{"id":80824159,"identity":"807ce0e8-e033-4292-af3f-35cbb65bad40","added_by":"auto","created_at":"2025-04-17 12:51:47","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":14312,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Table 2. Related to Figure 5.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSingle nuclei RNA sequencing analysis of OPDM/OPML vs. 
control mouse skeletal muscle.\u003c/p\u003e","description":"","filename":"SupTable2snRNAeqtranscriptomicchanges.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/4748617c7f5d3bedb10bf23d.xlsx"},{"id":80824175,"identity":"1ce39498-7000-41f6-88d6-dfd04b87c911","added_by":"auto","created_at":"2025-04-17 12:51:48","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":15888942,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Table 3. Related to Figure 7.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRNA sequencing analysis of TMPyP4 treated vs. control OPDM4 LHCN-M2 muscle cells.\u003c/p\u003e","description":"","filename":"SupTable3TMPRNAseq.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/fe6652959f7326d0fec9428e.xlsx"},{"id":80824173,"identity":"2d852573-8250-480b-85bb-45385795b919","added_by":"auto","created_at":"2025-04-17 12:51:48","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":4465196,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Table 4. Related to Figure 7.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMass spectrometry analysis of TMPyP4 treated vs. control OPDM4 LHCN-M2 muscle cells.\u003c/p\u003e","description":"","filename":"SupTable4TMPMassSpec.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/f78fdcf0870643ce1d88743b.xlsx"},{"id":80825021,"identity":"5403831b-d82e-4e49-a0be-16defbc7af20","added_by":"auto","created_at":"2025-04-17 12:59:49","extension":"xls","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":1089536,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Table 5. Related to Figure 7.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eMis-regulated Pathways analysis of TMPyP4 treated vs. 
control OPDM4 LHCN-M2 cells.\u003c/p\u003e","description":"","filename":"SupTable5PathwayTMPRNAseq.xls","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/9e433b0af2103ad96e7db124.xls"},{"id":80824171,"identity":"af3ffa30-b5cf-4972-a62f-4dcf64c91914","added_by":"auto","created_at":"2025-04-17 12:51:48","extension":"avi","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":2243286,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplemental Videos. Related to Figure 4.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLive imaging of U2OS cells expressing OPDM4 asRILpolyG-GFP and Cherry-NUP50.\u003c/p\u003e","description":"","filename":"SupVideo1.avi","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/9a815ff54c8c529e143cbb4a.avi"},{"id":80824170,"identity":"26837f7f-4145-404f-86b9-2c6479b9d2bf","added_by":"auto","created_at":"2025-04-17 12:51:48","extension":"avi","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":1832802,"visible":true,"origin":"","legend":"\u003cp\u003eLive cell nuclear accumulation of OPDM4 asRILpolyG-GFP video 2\u003c/p\u003e","description":"","filename":"SupVideo2.avi","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/e1ce9c8e8af67db8b02b3ff4.avi"},{"id":80824174,"identity":"f86690ee-5654-4aec-b670-e2d95874c87b","added_by":"auto","created_at":"2025-04-17 12:51:48","extension":"pdf","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":7411295,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary figures\u003c/p\u003e","description":"","filename":"DarkGenomeSupFigs.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6122917/v1/36877056af5f71d78c12013a.pdf"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Microsatellite expansions hidden within the human dark genome are translated in novel and toxic proteins causing muscle and 
neurodegenerative diseases","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003e~\u0026thinsp;98% of the human genome consists of sequences annotated as non-coding, with half of them composed of repetitive DNA elements, including microsatellites, which are 1- to 6-nucleotide-long DNA motifs repeated in tandem (Venter et al., 2001; Lander et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Nurk et al., 2022). These microsatellites, estimated to number between ~\u0026thinsp;1.5 and 2\u0026nbsp;million in humans, occupy 3 to 5% of our genome and are an essential source of genetic variation as they are highly heterogeneous in size and sequence. Indeed, microsatellites have a very high mutation rate with frequent gain or loss of repeat units, resulting in significant variability in their length and thus contributing substantially to allelic variability between individuals and generations. Consequently, microsatellite variability has important roles in genome evolution, gene regulation and human phenotypic trait diversity (Gymrek et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Fotsing et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Shi et al., \u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Verbiest et al., \u003cspan citationid=\"CR109\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; review in Wright and Todd \u003cspan citationid=\"CR112\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eHowever, expansion of a subset of these microsatellites over a threshold size is also the leading cause of various human pathologies, including cancer and inherited diseases (Erwin et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; review in Malik et al., \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Depienne and Mandel 
\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). In this respect, \u0026gt;\u0026thinsp;60 neurodevelopmental, neuromuscular and neurodegenerative monogenic disorders are presently known to be caused by expansions of tri-, tetra-, penta- or hexanucleotide repeats. Remarkably, this number is rapidly increasing as advances in long-read and whole human genome sequencing have revealed\u0026thinsp;~\u0026thinsp;20 novel pathogenic microsatellite expansions causing human genetic diseases in recent years (Ishiura et al., \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Corbett et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Florian et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Yeetong et al., \u003cspan citationid=\"CR115\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Cortese et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Rafehi et al., \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Ishiura et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Sone et al., \u003cspan citationid=\"CR96\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Deng et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Deng et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Xi et al., \u003cspan citationid=\"CR113\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Zeng et al., \u003cspan citationid=\"CR120\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Pellerin et al., \u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Rafehi et al., 2023; Figueroa et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Cortese et al., \u003cspan citationid=\"CR16\" 
class=\"CitationRef\"\u003e2024\u003c/span\u003e). When embedded within an exonic coding sequence, these repeat expansions are consequently translated, resulting in expression of a mutant protein containing a stretch of repeated amino acids. The archetype of this mechanism is the polyglutamine (polyQ) group of diseases, where expansions of CAG repeats, embedded within ORFs of diverse genes, are translated into toxic polyglutamine-containing proteins, ultimately resulting in neuronal cell dysfunction and neuronal cell death (review in Paulson et al., \u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Stoyas and La Spada, \u003cspan citationid=\"CR99\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Lieberman et al., \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). However, a majority of microsatellite expansions, notably most of the recently discovered ones, are located in ill-defined genomic sequences annotated by default as non-coding (5\u0026rsquo;- and 3\u0026rsquo;-untranslated regions, introns, antisense RNAs, long non-coding RNAs, etc.; review in Vegezzi et al., \u003cspan citationid=\"CR107\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), raising the question of how these mutations are pathogenic.\u003c/p\u003e \u003cp\u003eOculopharyngodistal myopathy with or without leukoencephalopathy (OPDM, OMIM #164310; OPML, OMIM #618637) are rare adult-onset and slowly progressive neuromuscular diseases first described in 1977 (Satoyoshi and Kinoshita, \u003cspan citationid=\"CR91\" class=\"CitationRef\"\u003e1977\u003c/span\u003e). 
Clinical features of OPDM and OPML comprise ptosis, external ophthalmoplegia, dysphagia and dysarthria associated with facial and distal limb muscle weakness (Uyama et al., \u003cspan citationid=\"CR105\" class=\"CitationRef\"\u003e1998\u003c/span\u003e; van der Sluijs et al., \u003cspan citationid=\"CR106\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Lu et al., \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2008\u003c/span\u003e; Durmus et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Zhao et al., \u003cspan citationid=\"CR122\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Ishiura et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Gu et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Shi et al., \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Pongpakdee et al., \u003cspan citationid=\"CR87\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Histopathological changes in OPDM and OPML show increased variations in muscle fibre sizes with occurrence of small angular fibres, splitting fibres and increased internalized nuclei, associated with moderate and variable fibrosis and fatty replacement. 
Besides these classical myopathic signs, OPDM/OPML histopathology is also characterized by the presence of large cytoplasmic rimmed vacuoles and rare, but typical, eosinophilic intranuclear inclusions, which are p62- and ubiquitin-positive (Zhao et al., \u003cspan citationid=\"CR122\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Deng et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Saito et al., \u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Ogasawara et al., \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Kumutpongpanich et al., \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Ogasawara et al., \u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e2022\u003c/span\u003ea; Ogasawara et al., \u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e2022\u003c/span\u003eb). These inclusions are reminiscent of typical protein aggregates observed in other neurological disorders, but their origin is currently unknown. 
Importantly, the genetic causes of OPDM and OPML were uncovered only recently as identical expansions of ~\u0026thinsp;50 to 200\u0026ndash;300 repeats of the tri-nucleotide GGC sequence, however located within diverse genomic regions, transcribed but annotated as non-coding and embedded in at least six different genes: \u003cem\u003eLOC642361, LRP12\u003c/em\u003e, \u003cem\u003eGIPC1, NOTCH2NLC, RILPL1\u003c/em\u003e and \u003cem\u003eABCD3\u003c/em\u003e (Ishiura et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Deng et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Xi et al., \u003cspan citationid=\"CR113\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Yu et al., \u003cspan citationid=\"CR116\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Yu et al., \u003cspan citationid=\"CR118\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Zeng et al., \u003cspan citationid=\"CR120\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Cortese et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Consequently, these pathologies are now classified in at least six subtypes according to the gene hosting the pathogenic GGC repeat expansion (\u003cem\u003eLOC642361\u003c/em\u003e: OPML, \u003cem\u003eLRP12\u003c/em\u003e: OPDM1, \u003cem\u003eGIPC1\u003c/em\u003e: OPDM2, \u003cem\u003eNOTCH2NLC\u003c/em\u003e: OPDM3, \u003cem\u003eRILPL1\u003c/em\u003e: OPDM4 and \u003cem\u003eABCD3\u003c/em\u003e: OPDM5). 
Of interest, recent clinical studies indicate that OPDM and OPML have a much wider clinical spectrum than previously thought, with evidence of neurological manifestations and reports of variable tremor, ataxia, visual disturbance, peripheral neuropathy and/or association with movement disorders, amyotrophic lateral sclerosis or Parkinson disease (Saito et al., \u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Fan et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Pan et al., \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Gu et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Kume et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Hobara et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Murayama et al., \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Finally, it is striking to note that OPDM3 shares the same genetic cause as neuronal intranuclear inclusion disease (NIID) (Sone et al., \u003cspan citationid=\"CR96\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Ishiura et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Tian et al., \u003cspan citationid=\"CR103\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Deng et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). 
NIID is a neurological disease characterized by variable muscle weakness associated with heterogeneous dysfunctions of the central and peripheral nervous system (Sone et al., \u003cspan citationid=\"CR95\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Takahashi-Fujigasaki et al., \u003cspan citationid=\"CR101\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Tai et al., 2023; review in Fan et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Bao et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). These genetic similarities and clinical overlaps suggest that OPDM, OPML and NIID belong to a new continuum of neuromuscular and neurodegenerative diseases, which probably share a common pathophysiological mechanism (review in Liufu et al., \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Boivin et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ishiura et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Moreover, these observations highlight that GGC repeat expansions in OPDM, OPML and NIID are widely pathogenic for both muscle and neuronal cells. However, it remains to be determined how these mutations, located within genomic regions annotated as non-coding, can lead to the formation of protein inclusions and cause muscle and neuronal dysfunctions.\u003c/p\u003e \u003cp\u003eHere, we found that the OPDM/OPML GGC repeats located in the long \u0026ldquo;non-coding\u0026rdquo; LOC642361 RNA, as well as in the \u0026ldquo;non-coding\u0026rdquo; sequences of the \u003cem\u003eGIPC1\u003c/em\u003e, \u003cem\u003eNOTCH2NLC\u003c/em\u003e and \u003cem\u003eRILPL1\u003c/em\u003e genes, are located within small and previously unrecognized ORFs, resulting in expression of novel proteins where each GGC repeat encodes a glycine amino acid. 
Consequently, these GGC repeat expansions are translated into novel polyglycine-containing proteins. Of interest, we found that the \u003cem\u003eGIPC1\u003c/em\u003e small ORF is translated in absence of any ATG start codon, but instead translation initiation takes place at a CTG near-cognate start codon located upstream of the GGC repeats. Near-cognate start codons (CTG, GTG, ACG, TTG) are codons differing from the cognate AUG start codon by one nucleotide, but that can nonetheless initiate translation through mispairing with the initiator methionine-tRNA. Antibodies developed against these diverse proteins confirmed their expression in patients, notably their localization in p62-positive inclusions in muscle sections of individuals with OPDM and OPML. Moreover, expression of these polyglycine proteins in muscle cells and animal models is sufficient to induce formation of the characteristic OPDM/NIID/OPML p62-positive inclusions. Importantly, both mouse and fly models expressing these diverse polyglycine proteins show locomotor alterations associated with muscle fiber atrophy and muscle weakness, as well as tremor and ataxia associated with neurodegeneration and neuroinflammation, thus recapitulating key clinical features of OPDM, OPML and NIID. Of interest, side-by-side comparison of these diverse polyglycine proteins in cell and animal models reveals unexpected variations in their expression, aggregation, half-life, interactants and toxicity, highlighting a key contribution of the specific amino acid sequences originating from their hosting small ORFs. 
Finally, we tested various pharmacological compounds known to target GGC repeats and/or modulate protein aggregates degradation, and identified the cationic porphyrin TMPyP4 as a potential therapeutic option for these diverse neuromuscular and neurodegenerative disorders.\u003c/p\u003e \u003cp\u003eOverall, these data provide a common and unified pathogenic mechanism for the skeletal muscle and central nervous system dysfunctions observed in individuals with OPDM, NIID and OPML, where expansions of GGC repeats are translated in novel and toxic polyglycine-containing proteins driving formation of p62-positive inclusions and muscle and neuronal cell dysfunctions. Moreover, this study highlights the richness and complexity of the human \u0026ldquo;dark\u0026rdquo; genome, notably the existence of numerous uncharted small ORFs in genomic sequences originally annotated as non-coding, resulting in translation of their embedded microsatellite mutations in novel and toxic proteins.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cp\u003e \u003cb\u003eOPDM and OPML GGC repeats are translated into novel polyglycine proteins.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eOculopharyngodistal myopathy with or without leukoencephalopathy (OPDM \u0026amp; OPML) and neuronal intranuclear inclusion disease (NIID) are neurological diseases caused by identical expansions of 50 to 200 GGC repeats located in diverse sequences annotated as non-coding (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA and supplementary figures S1A to S1C). 
The pathogenic mechanism at play in these pathologies is yet to be identified, but a loss of function is unlikely as expression of the genes hosting these GGC mutations is consistently found unaltered in tissue samples from individuals with OPDM, NIID or OPML (Deng et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Yu et al., \u003cspan citationid=\"CR118\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Zeng et al., \u003cspan citationid=\"CR120\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Shi et al., \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). These observations exclude a classical promoter silencing mechanism but leave open how these GGC mutations are pathogenic. As non-canonical translation of repeat expansions is an established mechanism of pathogenicity in microsatellite diseases, notably in NIID (Zu et al., \u003cspan citationid=\"CR125\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Boivin et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Zhong et al., \u003cspan citationid=\"CR123\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; review in Gao et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Kearse and Wilusz \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2017\u003c/span\u003e), we investigated the potential translation of the OPDM and OPML mutations. 
Three representative non-coding sequences, namely the 5\u0026rsquo;-untranslated region (5\u0026rsquo;UTR) of \u003cem\u003eGIPC1\u003c/em\u003e, the antisense transcript of \u003cem\u003eRILPL1\u003c/em\u003e and the \u003cem\u003eLOC642361\u003c/em\u003e long non-coding RNA (lncRNA), which are respectively the cause of OPDM2, OPDM4 and OPML, were cloned with their GGC repeats and fused to GFP in the three possible frames potentially encoded by these repeats (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA and supplementary figures S1A to S1C). Briefly, (\u003cb\u003eGG\u003c/b\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eC\u003c/span\u003e)n-GFP would produce a potential polyglycine-GFP, (\u003cb\u003eG\u003c/b\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eC\u003c/span\u003e\u003cb\u003eG\u003c/b\u003e)n-GFP may express a putative polyalanine-GFP and (\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eC\u003c/span\u003e\u003cb\u003eGG\u003c/b\u003e)n-GFP might encode a tentative polyarginine-GFP protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB, sequences in supplementary figures S1D to S1F). Of technical interest, the GFP sequence lacks its natural ATG start codon, so that GFP expression depends on translation initiation occurring within the repeats or inside their hosting sequences. Moreover, this plasmid also contains an independent expression cassette producing the Cherry protein under its own promoter and ATG start codon, allowing cell transfection efficiency and cell viability to be assessed independently of GFP expression (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). 
Importantly, cell transfection followed by direct observation of the GFP fluorescence and FACS analysis indicate that the OPDM and OPML GGC repeats are predominantly translated in the glycine frame, while GFP expression in the alanine or arginine frames is negligible (Figs.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD and \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE, supplementary figures S1G to S1L). In contrast, analysis of the sense \u003cem\u003eRILPL1\u003c/em\u003e RNA with a CCG expansion shows no detectable translation of these repeats in any frame (proline, alanine and arginine) (supplementary figures S1M to S1P). Of interest, RT-qPCR quantification shows similar RNA expression levels, excluding a bias of transcription or RNA stability between these constructs (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eF and supplementary figures S1Q and S1R). Western blotting analyses confirmed translation of the OPDM and OPML expanded GGC repeats in the glycine frame (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eG). Furthermore, treatment of cell extracts with lysostaphin, a glycyl-glycine endopeptidase, cleaves these proteins into smaller products, thus confirming the presence of a polyglycine stretch within these proteins (supplementary figures S1S, S1T and S1U). 
As additional controls, examination of the Cherry expression by fluorescence observation, FACS analysis or immunoblotting indicates that these diverse GGC repeat-GFP constructs have similar transfection efficiency, regardless of their glycine, alanine or arginine frame (Figs.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD and \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eG, supplementary figures S1G to S1L). These controls ensure that the lack of GGC repeat translation in the polyalanine (polyA) or polyarginine (polyR) frames is not caused by a difference in construct expression, a potential toxicity leading to cell loss or another bias impairing observation of GGC repeat translation in the alanine or arginine frames. Overall, these results indicate that the OPDM and OPML GGC repeat expansions, while located in sequences annotated as non-coding, are nonetheless translated into novel polyglycine-containing proteins, yet to be characterized.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eOPDM and OPML GGC repeats are translated through initiation at start codons.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo uncover how these GGC repeats are translated, we immunoprecipitated the OPDM and OPML polyglycine proteins and determined their N-terminal sequences by mass spectrometry analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). Of technical interest, peptide identification was carried out by mining custom databases compiling human non-coding sequences putatively translated in their three possible frames. 
Importantly, for all three archetypes of non-coding sequences tested, namely the 5\u0026rsquo;UTR of \u003cem\u003eGIPC1\u003c/em\u003e causing OPDM2, the antisense transcript of \u003cem\u003eRILPL1\u003c/em\u003e causing OPDM4, and the \u003cem\u003eLOC642361\u003c/em\u003e lncRNA causing OPML, mass spectrometry analyses reveal presence of N-terminal peptides starting with a typical acetylated methionine (M\u003csup\u003eac\u003c/sup\u003e), which correspond to initiation at standard start codons located upstream of the GGC repeats (Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB, \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC and \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD, supplementary figures S2A, S2B, S2C and S2D). Indeed, translation initiations of the \u003cem\u003eRILPL1\u003c/em\u003e antisense RNA and of the \u003cem\u003eLOC642361\u003c/em\u003e lncRNA occur at classical ATG start codons located upstream of the repeats, while translation of the \u003cem\u003eGIPC1\u003c/em\u003e 5\u0026rsquo;UTR occurs in absence of any ATG start codon, but instead initiates at a CTG near-cognate start codon also located ahead of the repeats (Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB to \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD, supplementary figures S2A to S2D). Translation initiation at near-cognate start codons is typically less efficient compared to ATG start codons and thus, conditioned by the bordering Kozak sequence (consensus: CCRCC\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eAUG\u003c/span\u003eG). 
In this respect, the \u003cem\u003eGIPC1\u003c/em\u003e CTG near-cognate start codon is embedded in a correct Kozak environment (CCGGT\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eCUG\u003c/span\u003eG) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB). Finally, immunoblotting, fluorescence observation and FACS analyses consistently show that deletion of these ATG or CTG start codons abolishes expression of the OPDM/OPML polyglycine proteins, demonstrating their importance for translation of these GGC repeats (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE and supplementary figures S2E to S2P). As controls, investigation of GFP RNA levels or of the Cherry fluorescence, expressed from an independent cassette, indicates that deletion of these start codons does not alter GFP RNA expression or cell viability (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE and supplementary figures S2E to S2P).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eOverall, these data reveal that the 5\u0026rsquo;-untranslated region of the \u003cem\u003eGIPC1\u003c/em\u003e gene, the antisense transcript of \u003cem\u003eRILPL1\u003c/em\u003e and the \u003cem\u003eLOC642361\u003c/em\u003e long non-coding RNA all contain previously unrecognized open reading frames (ORFs), whose translation initiates at start codons located ahead of the GGC repeats, resulting in expression of novel proteins where each GGC repeat encodes a glycine amino acid. 
Consequently, these OPDM and OPML GGC repeat expansions are translated into novel polyglycine-containing proteins, which were named uGIPpolyG, asRILpolyG and LOC6polyG for upstream of GIPC1, antisense of RILPL1 and LOC642361-encoded polyglycine proteins, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eF, sequences in supplementary figures S2Q, S2R and S2S). Of interest, analysis of databases indicates that the 5\u0026rsquo;UTR of \u003cem\u003eGIPC1\u003c/em\u003e is subject to alternative splicing with its exon 1, containing the GGC repeats, bridged to either its exon 2 or 4 (supplementary figure S1A), thus resulting in two small ORFs and a polyglycine protein with two different C-terminal sequences (supplementary figure S2Q). RT-qPCR performed on muscle tissue of individuals with OPDM2 confirmed no significant difference in \u003cem\u003eGIPC1\u003c/em\u003e mRNA expression compared to control individuals (supplementary figure S2T), while isoform-specific RT-PCR indicates a slight increase in alternative splicing of \u003cem\u003eGIPC1\u003c/em\u003e exon 1 toward exon 2 in OPDM2, with a concomitant decrease in splicing of exon 1 to exon 4 (supplementary figure S2U). These data suggest that the GGC repeat expansion, which is located only 7 nucleotides away from the 5\u0026rsquo; splice site of \u003cem\u003eGIPC1\u003c/em\u003e exon 1, may change its alternative splicing, resulting in increased inclusion of \u003cem\u003eGIPC1\u003c/em\u003e exon 2. Moreover, RT-qPCR quantification of the \u003cem\u003eRILPL1\u003c/em\u003e antisense transcript and of the \u003cem\u003eLOC642361\u003c/em\u003e lncRNA uncovered that these RNAs are expressed in human skeletal muscles, which is consistent with the tissue clinically affected in OPDM/OPML (supplementary figures S2V and S2W). 
Finally, the \u003cem\u003eRILPL1\u003c/em\u003e antisense ORF is conserved among primates, with presence of a conserved polyglycine stretch in chimpanzee, gorilla and marmoset, but with a predicted shorter protein with no polyglycine in macaque and other mammals including mouse (supplementary figure S2X). In contrast, the LOC642361 small ORF is absent from most mammals, but strikingly identical to the C-terminal part of a longer protein of unknown function found in Gibbon Lesser Apes and Tufted capuchin (supplementary figure S2Y).\u003c/p\u003e \u003cp\u003e \u003cb\u003ePolyglycine proteins are present in the typical OPDM/OPML p62-positive inclusions.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo confirm that GGC repeat expansions are translated into novel polyglycine-containing proteins in individuals with OPDM/OPML, we developed antibodies directed against these proteins. However, we failed to obtain an antibody specific to their common polyglycine stretch. As an alternative, we developed various antibodies directed against their specific N- and/or C-terminal sequences, hence specific to each GGC-repeat hosting novel ORF (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and supplementary figure S3). Antibody specificities were confirmed by immunoblot and immunofluorescence on transfected cells (supplementary figures S3A to S3L). Importantly, immunofluorescence staining performed on skeletal muscle sections of individuals with OPDM2, OPDM4 and OPML revealed the presence of their respective polyglycine proteins (uGIPpolyG, asRILpolyG and LOC6polyG) within the p62-positive cytoplasmic rimmed vacuoles and intranuclear inclusions typical of these diseases (Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, supplementary figure S3M). 
To confirm these results, we developed another set of antibodies directed against different sequences of the asRILpolyG and LOC6polyG proteins and observed identical results with staining of the typical OPDM/OPML p62-positive inclusions (supplementary figures S3N and S3O). Moreover, as OPDM3 and NIID have an identical genetic cause, namely an expansion of GGC repeats in the 5\u0026rsquo;UTR of the \u003cem\u003eNOTCH2NLC\u003c/em\u003e gene, and as this expansion was recently found to belong to a small ORF translated in a polyglycine protein, uN2CpolyG (Boivin et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Zhong et al., \u003cspan citationid=\"CR123\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), we developed novel antibodies against this protein and uncovered its presence within the typical p62-positive inclusions in muscle sections of individuals with OPDM3 (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD and supplementary figure S3P). Another antibody directed against a different sequence of the uN2CpolyG protein similarly stains the typical p62-positive inclusions, confirming expression of this polyglycine protein in OPDM3 (supplementary figure S3P). Of interest, no or only faint staining was observed in non-OPDM individuals (Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD), as without a GGC repeat expansion and thus without a polyglycine stretch, these microproteins do not aggregate and their very small sizes prevent their detection by immunoblotting or immunofluorescence. 
Moreover, as each of these antibodies is directed against a specific ORF sequence, they are thus specific to each OPDM subtype and indeed, do not stain p62-positive inclusions in other OPDM/OPML subtypes (supplementary figures S3Q, S3R, S3S and S3T). These results further support the existence of specific polyglycine-proteins expressed in each OPDM subtype, as well as confirm the specificity of these antibodies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFinally, as various microsatellite expansions have been reported to be RAN translated in their three potential frames (Zu et al., \u003cspan citationid=\"CR125\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; review in Guo et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), and as a short expansion of GCN repeats in \u003cem\u003ePABPN1\u003c/em\u003e is translated into a protein with an extended polyalanine stretch that causes oculopharyngeal muscular dystrophy (OPMD) (Brais et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1998\u003c/span\u003e; review in Banerjee et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), we also investigated a potential translation of the OPDM GGC repeats in the alanine frame. However, two independent antibodies developed against a putative GIPC1 polyalanine protein do not stain intranuclear inclusions or rimmed vacuoles in muscle sections of individuals with OPDM2, arguing against translation of GGC repeats in the alanine frame (supplementary figures S3U, S3V and S3W).\u003c/p\u003e \u003cp\u003eOverall, this work highlights that GGC repeat expansions causing the neurological OPDM, OPML and NIID disorders, while located in transcripts initially annotated as non-coding, are embedded in previously unrecognized ORFs and consequently translated into novel polyglycine-containing proteins. 
These results raise the question of whether these diverse proteins are pathogenic.\u003c/p\u003e \u003cp\u003e \u003cb\u003eExpression of polyglycine proteins forms inclusions and is pathogenic in muscle cells.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo further study the various OPDM, NIID and OPML polyglycine-containing proteins, we cloned the \u003cem\u003eGIPC1, NOTCH2NLC\u003c/em\u003e, \u003cem\u003easRILPL1\u003c/em\u003e and \u003cem\u003eLOC642361\u003c/em\u003e ORFs, with either a control (8 to 12x) or an expanded (100x) size of polyglycine, from their ATG or near-cognate \u003cem\u003eGIPC1\u003c/em\u003e CTG start codon to their last coding codon, fused to GFP. To exclude any bias from repeat instability, the GGC repeats were modified to include alternative GGN codons, which still encode glycine but ensure that an identical and stable size of polyglycine is studied. This strategy also avoids expression of a pure GGC RNA hairpin, excluding interference from a putative toxic RNA gain-of-function mechanism. Moreover, to take into account \u003cem\u003eGIPC1\u003c/em\u003e exon 1 alternative splicing to either its exon 2 or 4, we also cloned the uGIPpolyG protein with its two possible C-termini, thus either ending with 8 amino acids (uGIPpolyG ex2) or 28 amino acids (uGIPpolyG ex4) (sequences in supplementary figure S4A).\u003c/p\u003e \u003cp\u003eWe first assessed the localization of these diverse polyglycine-containing proteins. Importantly, their expression in human LHCN-M2 differentiated muscle cells followed by immunofluorescence revealed that they form cytoplasmic and intranuclear inclusions, which are p62-positive and thus reminiscent of the histopathological features typical of OPDM, OPML and NIID (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB, supplementary figure S4B). 
Identical results were obtained in immortalized U2OS cell lines (supplementary videos 1 and 2). Inclusion formation is likely driven by their polyglycine expansion, as expression of an artificial protein mainly composed of a pure polyglycine stretch (ATG polyG-GFP) also forms cytoplasmic and nuclear inclusions (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA), while expression of these proteins with a control length of glycine (8 to 12 GGC repeats) does not promote the formation of protein aggregates. Further analysis by correlative light and electron microscopy (CLEM) shows that these polyglycine inclusions appear as round-shaped electron-dense deposits composed of filamentous structures without membrane boundaries (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC), which is fully consistent with observations in OPDM individuals (Zhao et al., \u003cspan citationid=\"CR122\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Saito et al., \u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Kumutpongpanich et al., \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Of interest, we noted that these different polyglycine proteins present some differences in their localization, with the OPDM4 asRILpolyG protein being systematically more nuclear than the others (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD). These data reveal that despite a common polyglycine central core, these diverse proteins are not strictly identical, suggesting a potential modulation by their specific N- and C-terminal ORF sequences. 
Thus, we further investigated these proteins, notably their expression, their potential interactants, as well as their toxicity.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eConcerning the expression levels of these diverse polyglycine-containing proteins, as expected from their cloning in an identical vector backbone, whose transcription is driven by a heterologous viral minimal CMV promoter, their RNA levels assessed by RT-qPCR show similar expression levels (supplementary figure S4C). In contrast, their protein expression assessed by immunoblotting against GFP revealed unexpected variations, with the artificial ATG polyG and the uN2CpolyG (OPDM3/NIID) proteins consistently less abundant (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD). Further immunoblot analysis upon cycloheximide (CHX) chase uncovered different protein half-lives, with the OPDM2 uGIPpolyG protein being the most stable, while the OPDM3/NIID uN2CpolyG protein shows a rapid turnover rate (supplementary figure S4D). As these polyglycine-containing proteins accumulate in cellular inclusions that may correspond to insoluble protein aggregates, which classically escape immunoblot detection performed on the soluble cell fraction, we also performed in parallel dot blot analysis of the cell lysate pellet sonicated in urea and SDS loading dye (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE). This assay exposed further disparities between these polyglycine proteins, with the uN2CpolyG, asRILpolyG and LOC6polyG proteins notably more present in the insoluble protein fraction (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE). 
These results were confirmed by quantification of the localization of these diverse polyglycine proteins in LHCN-M2 muscle cells, notably their presence in inclusions versus a diffuse localization (supplementary figure S4E).\u003c/p\u003e \u003cp\u003eNext, we searched for potential interactants of these polyglycine-containing proteins. Interestingly, muscle cell transfection followed by GFP immunoprecipitation and mass spectrometry analysis did not identify a common partner, but unveiled diverse interactants specific to each polyglycine protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eF and supplementary table 1). In that respect, the uN2CpolyG protein interacts with the KU70/80 dimer involved in DNA repair, while the LOC6polyG protein interacts with ribosomal proteins (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eG). Of interest, immunoprecipitation of these proteins with a control length of glycine recapitulates these interactions (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eG), indicating that interactants of these polyglycine-containing proteins are independent of their central polyglycine core, but instead are determined by their distinct N- and C-terminal sequences, which originate from their hosting small ORFs. These data suggest that these newly identified small ORFs may encode potentially functional microproteins with relevant physiological roles.\u003c/p\u003e \u003cp\u003eFinally, we investigated the pathogenicity of these diverse polyglycine proteins. Importantly, their expression in human LHCN-M2 differentiated muscle cells is toxic and causes cell death, while expression of the GFP control construct is not (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eH). 
Of interest, while all these polyglycine proteins induce cell death, we noted some differences, with a higher toxicity of the uN2CpolyG, asRILpolyG and LOC6polyG proteins compared to the uGIPpolyG protein or the artificial ATG polyG-GFP protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eH). These results suggest that toxicity of the central polyglycine core of these proteins is modulated by their specific ORF hosting sequences.\u003c/p\u003e \u003cp\u003eOverall, these results indicate that these diverse polyglycine proteins share the ability to form p62-positive cytoplasmic and intranuclear inclusions, as well as to induce muscle cell death, which recapitulates key features of OPDM and OPML. Of interest, the localization, half-life, aggregation and interactants of these polyglycine-containing proteins vary, unveiling an unexpected modulation by their N- and C-terminal specific hosting ORF sequences. However, these data were obtained in muscle cell cultures, leaving open the question of the toxicity of these diverse polyG proteins in animals.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePolyglycine proteins form inclusions and are pathogenic for muscles in animals\u003c/h2\u003e \u003cp\u003eTo determine the physiological impact of expressing these OPDM polyglycine proteins in skeletal muscles of animals, we cloned, produced and injected in wild-type adult C57BL/6 mice recombinant adeno-associated viral (rAAV) particles expressing either the GFP-tagged uGIPpolyG, uN2CpolyG, asRILpolyG or LOC6polyG protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). Of technical interest, we used a novel capsid variant, MyoAAV 4A, which specifically targets rodent muscles upon a single intravenous injection (Tabebordbar et al., \u003cspan citationid=\"CR100\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). 
As controls, we employed a similar rAAV strategy to express GFP alone or an artificial protein, ATG polyG-GFP, composed of a polyglycine stretch deprived of its natural N- and C-terminal sequences (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). Importantly, histological analysis of mouse tibialis anterior (TA) muscles 5 months after rAAV injection shows that expressing OPDM polyglycine proteins is toxic and promotes muscle fiber size variations with presence of internalized or centralized nuclei (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB and supplementary figure S5A). Moreover, quantification of muscle fiber areas revealed that expression of these polyglycine proteins induces skeletal muscle atrophy (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC), although with some striking differences: the OPDM4 asRILpolyG, OPML LOC6polyG and OPDM3 uN2CpolyG proteins cause muscle fiber atrophy and histological changes as early as 4 to 5 months after rAAV injection (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB and upper panel of Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC), while the OPDM2 uGIPpolyG protein shows a lesser toxicity, with some muscle atrophy detected 9 months after rAAV injection (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC, lower panel). Similarly, expression of ATG polyG, a protein deprived of any OPDM natural bordering sequences, shows a limited and delayed pathogenicity (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB and \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC, and supplementary figure S5A). Analysis of the gastrocnemius skeletal muscle shows identical results. 
Quantification of GFP RNA levels indicates that all rAAV are expressed at similar RNA levels (supplementary figure S5B). Further analyses, notably p62 staining, a classical marker of OPDM, revealed numerous p62-positive cytoplasmic and intranuclear inclusions (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB, \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eD and supplementary figure S5C). As observed in OPDM patients, these inclusions are eosinophilic, which is especially apparent in the uN2CpolyG expressing mice (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB). Of interest, all OPDM polyglycine proteins form inclusions in mouse skeletal muscle, but with some notable differences: OPML LOC6polyG and OPDM3 uN2CpolyG aggregates are frequent, while ATG-polyG, OPDM2 uGIPpolyG and OPDM4 asRILpolyG inclusions are less frequent (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eD and \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eE, supplementary figure S5C). Similarly, the localization of these polyglycine proteins, notably their cytoplasmic versus nuclear distribution, varies, with the OPDM4 asRILpolyG protein more frequently observed in nuclei compared to the other polyGly proteins (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eD and \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eE, supplementary figure S5C). Of interest, single-nucleus RNA sequencing revealed an increase in macrophages and B-cells, as well as in regenerative muscle fibers in OPDM versus control animals (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eF). These results indicate signs of inflammation and muscle regeneration consistent with myopathic changes in OPDM mice. 
However, these alterations were mild, with no global or massive variations in cell populations and only limited transcriptomic changes (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eF and supplementary table 2). Correspondingly, we observed only minor changes in muscle fibre types and no overt muscle regeneration by histology staining and quantitative RT-PCR (supplementary figures S5D and S5E). Similarly, animal performances were only slightly altered in rotarod and open field locomotor tests (supplementary figures S5F and S5G). These data indicate that expression of polyglycine proteins in mice drives progressive muscle fiber atrophy and histological changes reminiscent of OPDM, but with specific and limited myopathic alterations, at least in the time frame analyzed. Finally, expression of the asRILpolyG and LOC6polyG proteins is remarkably deleterious as these mice die suddenly around 5\u0026ndash;6 months or 8 to 9 months post rAAV injection, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eG). Further analysis revealed that asRILpolyG and LOC6polyG-expressing animals present dilated cardiomyopathy with presence of numerous p62-positive inclusions in cardiomyocytes (supplementary figure S5H). Abundance of these aggregates mirrors their toxicity with rare ATG polyG and uGIPpolyG inclusions, an intermediate situation for uN2CpolyG, while the asRILpolyG and LOC6polyG proteins form numerous large aggregates associated with notable myopathic changes (supplementary figure S5H). 
Of clinical interest, these data are reminiscent of the cardiac dysfunctions reported in individuals with OPDM (Oyer et al., \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e1991\u003c/span\u003e; Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Kumutpongpanich et al., \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Gu et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Pan et al., \u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). These observations led us to investigate the toxicity of these polyglycine proteins in other tissues, notably the central nervous system, especially in light of the neurological manifestations, notably tremor and ataxia, recently reported in individuals with OPDM2, OPDM3/NIID and OPML (Ishiura et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Fan et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Pan et al., \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Kume et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Gu et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Hobara et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Murayama et al., \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003ePolyglycine proteins form inclusions and are pathogenic for the CNS in animals\u003c/h3\u003e\n\u003cp\u003eTo specifically target the mouse central nervous system (CNS), we used a similar rAAV strategy to express either GFP-tagged uGIPpolyG (OPDM2), uN2CpolyG (OPDM3) or LOC6polyG (OPML) (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA), 
taking advantage of the PHP.eB rAAV serotype that crosses the C57BL/6 mouse blood-brain barrier and efficiently targets neurons upon a single intravenous injection (Chan et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). As controls, we developed PHP.eB rAAV expressing the GFP protein alone, or an artificial ATG polyG-GFP protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). Also, in the absence of any clinical reports of neurological symptoms in individuals with OPDM4, asRILpolyG was not included in this CNS study. Interestingly, longitudinal follow-up of these animals indicates that expression of these polyglycine proteins is toxic for the nervous system, with a progressive alteration of their motor performances and coordination (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB and \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, supplementary Fig.\u0026nbsp;6A). However, we observed notable differences between these diverse polyG proteins, with mice expressing the OPML LOC6polyG and the OPDM3 uN2CpolyG proteins showing evident difficulty sustaining the rotarod test (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB), and a markedly increased number of errors and slips on the notched bar test (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC and supplementary Fig.\u0026nbsp;6A) as early as 3 months post rAAV injection. In contrast, mice expressing the OPDM2 uGIPpolyG or the artificial ATG polyG proteins show milder changes at later time points, 6 and 9 months post rAAV injection, respectively (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB and \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC). 
These changes in locomotor coordination likely originate from specific neuronal dysfunctions, as these animals present normal performance in the open field test (supplementary Figs.\u0026nbsp;6B and 6C). Finally, expression of these diverse OPDM proteins is deleterious, but with some striking differences, with mice expressing the OPDM2 uGIPpolyG protein showing a milder pathogenicity and longer lifespan compared to animals expressing OPML LOC6polyG or the OPDM3 uN2CpolyG protein (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eD). Moreover, mice expressing an artificial ATG polyG protein show a normal lifespan up to 15 months of age (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eD). These results highlight the importance of the specific ORF sequences bordering their common and identical polyglycine stretch in modulating the toxicity of these proteins. Next, p62 staining revealed that all polyG proteins form numerous cytoplasmic and intranuclear inclusions, recapitulating a key histopathological feature of OPDM, OPML and NIID. Importantly, localization and abundance of these polyglycine proteins faithfully mirror their toxicity, with scarce ATG polyG and uGIPpolyG protein aggregates, while the uN2CpolyG and LOC6polyG proteins form numerous nuclear inclusions in the cerebellum, brainstem and thalamus of these animals at 3 months post AAV injection (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eE). Of interest, accumulation of these polyG inclusions increased with animal age (supplementary figure S6D), and polyG inclusions are also evident in tyrosine hydroxylase (TH)-positive neurons of the substantia nigra (supplementary figure S6E). In contrast, the cortex and hippocampus of these animals are relatively spared of aggregates (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eE). 
As expected from their expression from a common rAAV backbone with a heterologous CMV-based promoter, their abundance is independent of their expression at the transcription level with similar RNA expression quantified by RT-qPCR (supplementary figure S6F). These results underline the intrinsic differences between these polyG proteins in their expression and abilities to form and accumulate in protein inclusions despite having an identical central polyglycine core. Further analysis revealed extensive neuronal cell death, notably loss of Purkinje cells in the cerebellum, especially in mice expressing the LOC6polyG and uN2CpolyG proteins (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eF), which is consistent with the progressive loss of motor balance and coordination observed in these animals. Moreover, Gfap staining and RT-qPCR indicated increased neuroinflammation in polyG-expressing animals, especially in mice with the uN2CpolyG and LOC6polyG proteins (supplementary figures S6G and S6H). In contrast, the uGIPpolyG and ATG polyG expressing mice show milder neuronal cell loss and lesser signs of neuroinflammation (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eF and supplementary figures S6G and S6H), which is consistent with the reduced number of p62-positive inclusions and milder phenotype observed in these animals.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eOverall, these data confirm that expression of these diverse polyglycine-containing proteins is toxic for both muscle and neuronal cells and recapitulate key clinical features of OPDM, OPML and NIID, notably myopathic changes and muscle fiber atrophy associated with neurological signs and neurodegeneration; as well as their typical histological presentation with presence of characteristic p62-positive protein inclusions. 
Moreover, side-by-side analysis of these diverse polyglycine proteins also revealed some notable and unexpected differences in their expression, their localization and their toxicity, highlighting the importance of their specific N- and C-terminal sequences in modulating the toxic properties of their central polyglycine core.\u003c/p\u003e \u003cp\u003e \u003cb\u003eThe porphyrin TMPyP4 alleviates aggregation and toxicity of polyglycine proteins.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAltogether, these data support a pathogenic model where expression of toxic polyglycine proteins drives muscle cell and neuron dysfunction in OPDM, NIID and OPML. Hence, the search for compounds inhibiting translation and/or accumulation of these proteins may represent an attractive therapeutic option. In that respect, various pharmacological molecules, including SRPIN340, H89 and TMPyP4, have been identified in the past years to prevent the nuclear export or the translation of GC-rich RNA into toxic proteins (Green et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Mori et al., \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Malik et al., \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Licata et al., \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Similarly, various compounds have been identified to promote the autophagic degradation of toxic proteins prone to aggregation in cell and animal models of neurodegeneration (review in Menzies et al., \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Palmer et al., \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
Thus, we tested a selection of these compounds in our OPDM muscle cell model and found one, the cationic porphyrin TMPyP4, which efficiently prevents expression of uGIPpolyG, uN2CpolyG, asRILpolyG and LOC6polyG at the protein level (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eA). In contrast, RT-qPCR analysis revealed that their RNA expression was largely unaffected (supplementary figure S7A). Importantly, TMPyP4 corrects polyG protein toxicity, restoring normal cell viability to LHCN-M2 muscle cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eB). Furthermore, RNA sequencing and mass spectrometry revealed that TMPyP4 induces only limited changes and no global transcriptomic or proteome alterations (supplementary figures S7B to S7E and supplementary tables 3 and 4). Of interest, pathway analysis revealed that TMPyP4 acts principally on translation (supplementary table 5), which is fully consistent with its known inhibitory function on the translation of GC-rich microsatellites (Green et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Mori et al., \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). In addition, testing of various analogs of TMPyP4 revealed no other functional molecules able to reduce polyglycine protein toxicity (supplementary figure S7F).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eNext, to investigate TMPyP4 effects in animals, we developed \u003cem\u003eDrosophila\u003c/em\u003e expressing the OPDM polyglycine proteins. 
Flies were chosen over mice as \u003cem\u003eDrosophila\u003c/em\u003e is an established animal model to study polyG toxicity (Todd et al., \u003cspan citationid=\"CR104\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Kong et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Yu et al., \u003cspan citationid=\"CR118\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), and drug testing is incomparably faster and less complex in flies than in mammals. Thus, OPDM2 uGIPpolyG and OPDM4 asRILpolyG GFP-tagged proteins, representing respectively the least and most toxic polyG proteins found in cell and mouse models, were cloned and expressed under the upstream activating sequence-Galactose-regulated promoter element (UAS-Gal4). Eyes being the most accessible part of the nervous system, we first used a glass multiple reporter (GMR) Gal4 driver and found that expression of either uGIPpolyG or asRILpolyG leads to a rough eye phenotype with ommatidial degeneration and loss of rhabdomeres (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eC). As controls, flies expressing GFP or the small upstream GIPC1 or RILPL1 antisense ORFs with a normal length of glycine stretch show intact ommatidia structures (supplementary figure S7G). Importantly, TMPyP4 corrects polyG protein toxicity, restoring normal eye structure and rhabdomeres in both uGIPpolyG and asRILpolyG expressing \u003cem\u003eDrosophila\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eD). To investigate the toxicity of these polyglycine proteins further, they were ubiquitously expressed using an Actin5C-Gal4 driver and we examined adult fly locomotor abilities. 
Interestingly, expression of uGIPpolyG leads to a progressively reduced mobility and shortened lifespan (supplementary figures S7H and S7I), while expression of asRILpolyG was particularly toxic, with no or very few animals surviving to the adult stage. These results are consistent with the higher toxicity of asRILpolyG over uGIPpolyG observed in cells and mice, demonstrating in a third model that these polyG proteins, despite harboring a common and identical polyglycine tract, are not strictly identical. Overall, these data further highlight the importance of the specific ORF sequences flanking the polyglycine stretch in modulating the toxicity and biological properties of these proteins (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). Moreover, these data further support the conclusion that expression of polyglycine proteins reproduces the locomotor and neurodegenerative clinical features observed in the OPDM, OPML and NIID disorders. Of clinical interest, these results also suggest that modulating polyglycine expression could be of therapeutic interest in these neuromuscular and neurodegenerative diseases.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eOculopharyngodistal myopathy (OPDM), neuronal intranuclear inclusion disease (NIID) and oculopharyngeal myopathy with leukoencephalopathy (OPML) are inherited neurological diseases caused by identical GGC repeat expansions, which are, however, embedded in sequences annotated as non-coding in diverse genes (\u003cem\u003eLOC642361, LRP12\u003c/em\u003e, \u003cem\u003eGIPC1\u003c/em\u003e, \u003cem\u003eNOTCH2NLC\u003c/em\u003e and \u003cem\u003eRILPL1\u003c/em\u003e). Here, we found that these GGC repeats are located in previously uncharted small ORFs and thus are translated into novel polyglycine-containing proteins, which form p62-positive protein inclusions and are toxic in cell and animal models. 
These data are reminiscent of the fragile X-associated tremor/ataxia syndrome (FXTAS) and the recently uncovered spinocerebellar ataxia 4 (SCA4), where GGC repeat expansions, respectively located in a small upstream ORF of the \u003cem\u003eFMR1\u003c/em\u003e gene or within the main ORF of the ZFHX3 protein, are translated into polyglycine-containing proteins, which are toxic and accumulate in p62-positive inclusions (Todd et al., \u003cspan citationid=\"CR104\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Sellier et al., \u003cspan citationid=\"CR92\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Wallenius et al., 2023; Figueroa et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Chen et al., 2024) (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). Thus, these data support the existence of a novel group of human disorders, the polyG (or polyGly) diseases, where identical expansions of GGC repeats are located in diverse, previously ill-charted, ORFs and consequently translated into various polyglycine-containing proteins, which form protein inclusions and are toxic for muscle and neuronal cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). Moreover, this work reinforces the proposition that OPDM, OPML, NIID, FXTAS and SCA4, are all part of a novel polyGly-caused continuum of neuromuscular and neurodegenerative diseases with overlapping clinical and histopathological presentations (review in Liufu et al., \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Boivin et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ishiura et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). However, this work also raises several questions. 
Notably, we found that OPDM/OPML GGC repeats are essentially expressed in the glycine frame through canonical translation initiation at upstream AUG or near-cognate start codons, but our assays may not be sensitive enough to detect low levels of RAN translation with non-canonical initiation starting directly within the repeats and in the three frames. Alternatively, considering that an expanded stretch of GCN repeats in \u003cem\u003ePABPN1\u003c/em\u003e is translated into a protein with a short (7 to 13) run of polyalanine, which causes oculopharyngeal muscular dystrophy (OPMD) (Brais et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1998\u003c/span\u003e; review in Banerjee et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), and that extended but relatively short (~\u0026thinsp;30) stretches of polyalanine in various transcription factors lead to severe developmental diseases (Brown and Brown, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Messaed and Rouleau, \u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e2009\u003c/span\u003e), it is also possible that longer expansions (\u0026gt;\u0026thinsp;50) of GCG repeats in the alanine frame could be especially deleterious and thus not represented in late-onset inherited neurological diseases such as OPDM and OPML. Another topic of discussion is whether other polyglycine-containing proteins causing additional GGC-repeat expansion disorders remain to be discovered. 
Similarly, this study questions how these polyGly proteins are pathogenic, and conversely, how to prevent their toxicity.\u003c/p\u003e \u003cp\u003eConcerning additional microsatellite expansions translated in novel and toxic proteins, yet to be identified; this work highlights the complexity and diversity of the human \u0026ldquo;dark\u0026rdquo; genome, notably the existence of numerous and yet uncharted small ORFs hidden in ill-described genomic sequences, annotated by default as non-coding. In that aspect, recent advances in bioinformatics, ribosome footprint profiling and high-resolution mass spectrometry have unveiled thousands of novel and non-canonical ORFs, with their vast majority having a size below 100 amino acids and yet to be studied (Ji et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Raj et al., \u003cspan citationid=\"CR89\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Chothani et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Duffy et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Mudge et al., \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; review in Wright et al., \u003cspan citationid=\"CR111\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Dong et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Deutsch et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
These large-scale analyses also revealed that most mammalian genes, including long non-coding RNAs and pseudogenes, contain small and/or upstream ORFs, most of which initiate at near-cognate start codons (Ingolia et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Lee et al., \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Fields et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Johnstone et al., \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Near-cognate initiation codons differ from the cognate AUG start codon by one nucleotide, but can still initiate translation through mispairing with the initiator methionine-tRNA. In vitro experiments and ribosome profiling revealed that predominantly four near-cognate start codons (CUG, GUG, UUG, and ACG) are tolerated and can initiate translation, albeit with lower efficiency than an AUG start codon (Kozak, \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e1989\u003c/span\u003e; Peabody, \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e1989\u003c/span\u003e; Ingolia et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Lee et al., \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; review in Kearse and Wilusz, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). This imperfect initiation mechanism enables leaky ribosomal scanning, which ultimately increases the number and complexity of open reading frames encoded by mammalian genes. 
In parallel, recent advances in whole genome and long read sequencing revealed that microsatellite expansions are much more frequent than previously expected, with hundreds of GGC microsatellites now identified in the human genome (Annear et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Ziaei Jam et al., \u003cspan citationid=\"CR124\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Shi et al., \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Iba\u0026ntilde;ez et al., \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Jadhav et al., \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Thus, as the human genome contains up to 2\u0026nbsp;million microsatellites, which populate 3 to 5% of our DNA, it is foreseeable that some microsatellites will inevitably fall in one of these numerous, small and ill-described ORFs. In short, the present report of novel polyglycine-containing proteins embedded in previously uncharted ORFs may represent only the tip of the iceberg. In this respect, prime candidates would be the recently identified GGC repeat expansions located in sequences annotated as non-coding in the \u003cem\u003eLRP12, ABCD3\u003c/em\u003e and \u003cem\u003eFAM193B\u003c/em\u003e genes, which cause OPDM1, 5 and 6, respectively (Ishiura et al., \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Cortese et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Danzi et al., 2025). Whether these GGC repeat expansions located in sequences annotated as non-coding are nonetheless translated into novel and toxic proteins is an exciting question for future studies.\u003c/p\u003e \u003cp\u003eRegarding the potential mechanisms of toxicity of these polyglycine-containing proteins, how they cause muscle and neuronal cell dysfunctions is unclear. 
These proteins form large cytoplasmic and intranuclear inclusions, which is consistent with the known self-aggregation properties of glycine homopolypeptides that form amyloid-like fibrils (Lorusso et al., 2011; Plumley et al., \u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). However, it remains to be determined whether these polyglycine proteins are pathogenic in their aggregated form or in their soluble monomeric form. Similarly, it is unclear whether their nuclear localization is important for their toxicity. In that respect, it remains to be determined how these polyglycine proteins travel toward the nucleus in the absence of any evident nuclear localization signal (NLS). A tentative hypothesis is that, owing to their relatively small size, soluble polyglycine proteins may diffuse freely through the nuclear pore and then accumulate in the nucleus, where they would aggregate away from the cytoplasmic autophagic clearance pathway. In the absence of cell division and nuclear membrane collapse, notably in neurons and muscle cells, these polyglycine proteins would keep accumulating and promote toxicity. Also, the observation of heart defects in mice expressing these polyglycine proteins raises the question of whether cardiac changes have been underestimated in individuals with OPDM, and conversely whether yet unidentified GGC repeat mutations remain to be discovered in individuals with cardiomyopathy of unknown genetic cause. Another point of interest is the side-by-side comparison of these diverse polyglycine proteins in diverse and complementary cell and animal models. Assessment of these different polyG proteins revealed that their ability to form inclusions and promote cell death originates from their central and common polyglycine core. 
However, analysis of these proteins side by side also revealed that their expression, half-life, aggregation properties and interactions with other proteins are modulated by their N- and C-terminal sequences, which are specific to their host ORFs. In that respect, the microproteins expressed from the \u003cem\u003eNOTCH2NLC\u003c/em\u003e upstream ORF and from the \u003cem\u003eLOC642361\u003c/em\u003e small ORF specifically interact with proteins involved in double-strand DNA break repair and translation, respectively. These data suggest that these newly identified human microproteins may have relevant physiological functions, which remain to be thoroughly investigated. Lastly, toxicity of these various polyglycine proteins, encoded by diverse genes, is likely to be conditioned by their tissue distribution and expression levels, two key parameters depending on the strength of their translation start codons and Kozak sequences, notably canonical ATG vs near-cognate start codons, as well as the strength and tissue specificity of their respective promoters. In that respect, it is notable that expansion of these GGC repeats over a threshold limit (~\u0026thinsp;200\u0026ndash;300 repeats) induces DNA methylation changes, ultimately resulting in silencing of their promoter. In light of our present findings, this inhibition may be viewed as a protective mechanism against expression of proteins with a polyglycine stretch extended over 200 repeats, which would likely present a higher toxicity and/or be deleterious at a younger age. 
However, such a putative protective mechanism would have evolved at the cost of favoring a deleterious loss-of-function mechanism for alleles present in single copy, such as CGG expansions over 200 repeats in the \u003cem\u003eFMR1\u003c/em\u003e gene located on the X chromosome that cause the fragile-X syndrome in males, or when loss-of-function mutations occur in the second allele, such as in the \u003cem\u003eXYLT1\u003c/em\u003e gene associated with the Baratela-Scott syndrome (BSS).\u003c/p\u003e \u003cp\u003eIn conclusion, our work provides a unified pathogenic mechanism for the skeletal muscle and central nervous system dysfunctions observed in individuals with OPDM, NIID and OPML, where GGC repeat expansions, identified in sequences originally annotated as non-coding, are in reality embedded in small open reading frames and consequently translated into novel and toxic polyglycine-containing proteins. Consistent with a common mechanism of pathogenicity, our data also provide a proof of concept that a single therapeutic approach may be of interest for OPDM, OPML and NIID. In that respect, treatment of human muscle cells and of a Drosophila animal model with the porphyrin TMPyP4 (5,10,15,20-tetra(N-methyl-4-pyridyl)porphyrin) alleviates the expression and toxicity of these polyG proteins. Importantly, TMPyP4 has no apparent deleterious effect on global cellular transcription and translation. 
Furthermore, this compound binds to G-quadruplex structure and to GC-rich RNA, notably the FMR1 GGC repeats causing FXTAS or the C9ORF72 GGGGCC repeats causing ALS, preventing their translation in toxic polyglycine-rich proteins (Ofer et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Zamiri et al., \u003cspan citationid=\"CR119\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Green et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Mori et al., \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Thus, these data raise hope to identify a common therapeutic approach for these various neuromuscular and neurodegenerative diseases caused by microsatellite mutations of similar GC-rich sequences.\u003c/p\u003e"},{"header":"MATERIAL \u0026 METHODS","content":"\u003cp\u003e\u003cstrong\u003eRESOURCE AVAILABILITY\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLead Contact\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFurther information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Nicolas Charlet-Berguerand ([email protected]).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll unique reagents (e.g., polyG DNA constructs, antibodies, etc.) generated in this study are available from the Lead Contact under MTA subject to restrictions from commercial source.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData and Code Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRNA sequencing and mass spectrometry datasets are available in supplementary Tables S1 to S5. 
Complete transcriptomic and proteomics source data are available from the corresponding author upon reasonable request.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHuman samples\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHuman muscle samples were sampled with the informed consent of individuals and families and approved by the Institutional Review Board of the Peking University First Hospital, First Affiliated Hospital of Fujian Medical University\u0026nbsp;and\u0026nbsp;National Center of Neurology and Psychiatry.\u0026nbsp;This study was approved by the Ethics Committees of Peking University First Hospital,\u0026nbsp;First Affiliated Hospital of Fujian Medical University\u0026nbsp;and\u0026nbsp;National Center of Neurology and Psychiatry, and all procedures were conducted in accordance with relevant guidelines and regulations. Muscle biopsy samples from patients with OPDM, OPML and NIID and age-matched control subjects were examined. All clinical materials were obtained for diagnostic purposes after informed consent was provided. Prior to this study, all samples had been analyzed using routine histology techniques and electron microscopy. Fresh frozen samples were stored at −80°C until use.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMice\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll animal work was performed with approval from the IGBMC/ICS Animal Care Committee and of the French agency for research on animal (DGRI) authorization number APAFIS#33864-2021111217327782. C57BL/6 wild-type male mice were retro-orbitally AAV-injected at 2 months and then housed for 6 to 8 months in a temperature-controlled room (19–22°C) with a 12:12-hours light/dark cycle and free access to food and water. 
Mice were sacrificed by carbon dioxide (CO2) inhalation, and the different skeletal muscles, heart and brain were dissected and subsequently frozen for molecular biology, frozen in pre-chilled isopentane, or PFA-fixed and embedded in paraffin for histology.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCell cultures\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eU2OS and HEK293 cells were grown in DMEM 1 g/L glucose with 10% FCS and gentamycin at 37°C in 5% CO2. LHCN-M2 cells were grown in DMEM 4.5 g/L with 20% FCS, w/o PyrNa/M199, 25 µg/mL Fetuin, 5 mg/mL hEGF, 0.5 mg/mL human bFGF, 5 µg/mL human insulin, 0.2 µg/mL dexamethasone and gentamycin at 37°C in 5% CO2. Differentiation of the LHCN-M2 cells was induced by serum removal. U2OS T-Rex cells (ThermoFisher) stably expressing Nup50-cherry were Lipofectamine-transfected with Pci1-linearized pcDNA3-TetOn expressing Nup50 fused to mCherry and selected for neomycin resistance for two weeks.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConstructs\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHuman GIPC1 exon 1, antisense RILPL1 and\u0026nbsp;LOC642361\u0026nbsp;lncRNA sequences upstream of their GGC repeats were cloned into pcDNA3.1 fused to a GFP deleted of its ATG and in all three frames. Mutations of the ATG or CTG start codons, or within ORFs, were achieved by inverse PCR or by oligonucleotide ligations. GIPC1 upstream ORF, antisense RILPL1 and\u0026nbsp;LOC642361 small ORFs\u0026nbsp;with either 12 or 100 optimized GGN repeats\u0026nbsp;were\u0026nbsp;synthesized by GenScript and fused to GFP in a pAAV2-CAG vector. 
To ensure repeat expansion stability, all GGC repeat-containing plasmids were transformed into the STBL3 bacterial strain (Invitrogen) and all constructs were confirmed by sequencing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCell transfection and treatments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor transient transfection, cells were plated in DMEM and 0.1% fetal bovine serum or without serum for LHCN-M2 cells and transfected for 5 hours using Lipofectamine 2000 (Fisher Scientific). One to four days after transient transfection, cells were analyzed by live imaging, immunofluorescence, RT-qPCR, cell viability assay, dot blotting or western blotting. For treatments, LHCN-M2 cells were incubated overnight with the indicated concentration of SRPIN340, H-89, fluphenazine, TMPyP4, 5,10,15,20-Tetra(4-pyridyl)-21H,23H-porphine, 5,10,15,20-Tetraphenyl-21H,23H-porphine (Sigma), 5,10,15,20-Tetrakis(4-aminophenyl)-21H,23H-porphine, 5,10,15,20-Tetrakis(4-ethynylphenyl)-21H,23H-porphine, 5,10,15,20-Tetra(pyridin-2-yl)porphyrin, (Porphyrin-5,10,15,20-tetrayltetrakis(benzene-4,1-diyl))tetraboronic acid, 4,4',4'',4'''-(21H,23H-porphine-5,10,15,20-tetrayl)tetrakis-Phenol (BLD Pharm). For cycloheximide treatment, HEK293 cells were treated 1 day post-transfection with 50 µg/mL of cycloheximide for 1, 3, 8 or 24 hours.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCell viability assay\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLHCN-M2 cells were transiently transfected for 48 to 72 h with the different polyGlycine-expressing constructs and treated overnight with the indicated drug concentration. After addition of 0.5 µM of TO-PRO-3 (Thermofisher), live cells were imaged using the CX7 Cellular Imaging System (25 fields per well at 10x magnification) followed by a cell-to-cell analysis using Cellomics HCS Studio software (CellHealth Bioapplication). 
Transfected cells were detected using GFP staining and dead cells were identified using TO-PRO-3 intensity within the cell mask.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFACS analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHEK293 cells transfected for 24 hours with the different frame constructs were trypsinized, centrifuged 5 min at 700 rpm and resuspended in 500 µL of PBS. Cells were analyzed on a BD LSRFortessa X-20 and results were analyzed with FlowJo.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCo-immunoprecipitation assay\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e24 h after transfection of HEK293 or LHCN-M2 cells with 3 µg of the different plasmid constructs in Lipofectamine 2000 (Invitrogen), cells were lysed in RIPA buffer (50 mM Tris-HCl pH 7.5, 0.15 M NaCl, 0.5% Triton X-100) supplemented with protease inhibitor cocktail (Roche) and clarified by centrifugation at 14000 rpm for 10 min. Immunoprecipitations were performed at 4°C for 1 h using pre-washed Anti-GFP (Abcam, ab193983) Magnetic Beads in RIPA buffer, washed three times, then bound proteins were eluted by a 3 min denaturation step at 95°C with Laemmli buffer followed by mass spectrometry or western blot.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMass spectrometry analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHEK293 cells were transfected with GIP(GGC)\u003csub\u003eGly-frame\u003c/sub\u003e-GFP, asRILPL1(GGC)\u003csub\u003eGly-frame\u003c/sub\u003e-GFP,\u0026nbsp;LOC642361(GGC)\u003csub\u003eGly-frame\u003c/sub\u003e-GFP plasmids for N-terminus determination or with GFP, ATG polyG-GFP, uGIPpolyG-GFP (exon 2 or 4), uN2CpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP plasmids for interactome identification, using Lipofectamine 2000 (Fisher Scientific) for 24 hours and proteins were purified by GFP-trap immunoprecipitation (Abcam). LHCN-M2 cells were treated or not with 3 µM of TMPyP4 overnight. 
Protein mixtures were TCA-precipitated overnight at 4°C, pellets\u0026nbsp;were washed twice with\u0026nbsp;1 mL cold acetone, dried and dissolved in\u0026nbsp;2 M urea in\u0026nbsp;0.1 mM\u0026nbsp;Tris-HCl\u0026nbsp;pH 8.5 for\u0026nbsp;reduction\u0026nbsp;(5 mM TCEP,\u0026nbsp;30 min.) and alkylation (10 mM iodoacetamide, 30 min.). Trypsin digestion was carried out at 37°C overnight. Extracted peptides were then analyzed using an Ultimate 3000 nano-RSLC (Thermo Scientific) coupled in line with an Orbitrap ELITE or an Orbitrap Exploris 480 via a nano-electrospray ionization source (Thermo Scientific, San Jose, California) and the FAIMS pro interface. Peptides were separated on a C18 Acclaim PepMap nano-column (75 µm ID x 25 cm, 2.6 µm, 150Å, Thermo Fisher Scientific) with a 20-minute linear gradient from 8% to 35% buffer B (A: 0.1% FA in H\u003csub\u003e2\u003c/sub\u003eO; B: 0.1% FA in 80% ACN, 400 nl/min, 50°C) followed by a regeneration step at 90% B and an equilibration at 8% B. The total chromatography time was 30 minutes. The mass spectrometer was operated in positive ionization mode in Data-Dependent Acquisition (DDA) with FAIMS compensation voltages set to CV = -45V. The DDA cycle consisted of one survey scan (350-1400 m/z, 90,000 FWHM) followed by MS² spectra (HCD; 30% normalized energy; 2 m/z window; 22,500 FWHM) within a limit of 1 sec. Unassigned and singly charged states were rejected. Exclusion duration was set to 40 s with a mass width of ± 10 ppm. Proteins were identified with Proteome Discoverer 2.5 software (Thermo Fisher Scientific) against the \u003cem\u003eHomo sapiens\u003c/em\u003e proteome database or against a homemade database of all potential three-frame translated proteins or peptides from the human GIPC1 5′UTR, RILPL1 antisense or LOC642361 sequences. Precursor and fragment mass tolerances were set at 10 ppm and 0.02 Da, respectively, and up to 2 missed cleavages were allowed. 
Oxidation (M) and N-terminal Acetylation were set as variable modifications, and Carbamidomethylation (C) as fixed modification. Peptides were filtered with a false discovery rate (FDR) at 1%, rank 1. Proteins were quantified with a minimum of 1 unique peptide based on the XIC (sum of the Extracted Ion Chromatogram). The quantification values were exported to Perseus (1.6.15.0) for statistical analysis involving log2 transformation, imputation and normalization.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWestern blotting\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eProteins were denatured 3 min at 95°C, separated on 4-12% bis-Tris Gel (NuPAGE), transferred onto nitrocellulose membranes (Amersham Protran), blocked with 5% non-fat dry milk in Tris-buffered saline plus 0.1% Tween-20 (TBS-T), incubated with anti-GFP (Abcam, ab290, 1/10000), anti-GFP (Abcam, ab1218, 1/10000), mCherry (Abcam, ab167453, 1/5000), GAPDH (Abcam, ab8245, 1/10000), Ku70 (SantaCruz, sc-56129, 1/5000), Ku80 (Abcam, ab119935, 1/10000), RPL10A (Thermofisher, MA5-27171, 1/3000), RPL36 (Abcam, ab241584, 1/10000), HA (Abcam, ab130275, 1/5000), uGIP pAb or uN2C pAb (rabbit polyclonal homemade, 1/1000), uN2C 4D12, asRIL 2D8 or 4B9 or LOC6 2E8 (mouse monoclonal homemade, 1/1000) in TBS-T plus 5% non-fat dry milk overnight at 4°C. The membranes were washed 3 times and incubated with anti-rabbit or mouse Peroxidase antibody (CST, 7074S or 7076S, 1/10000) for 1 hour in TBS, followed by washing and detection with the ECL Prime chemiluminescence kit (Millipore).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDot blotting\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eLHCN-M2 cells transfected with ATG polyG-GFP, uGIPex2polyG-GFP, uGIPex4polyG-GFP, uN2CpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP constructs for 48 h were scraped and centrifuged for 10 min at 3000 rpm at 4°C. The cell pellet was frozen, resuspended in 200 µL of RIPA, frozen and centrifuged for 10 min at 14000 rpm at 4°C. 
The pellet was resuspended in 200 µL of 2X Laemmli blue, sonicated 5 sec at 20% amplitude and warmed 3 min at 95°C. Proteins were directly loaded on nitrocellulose membranes (Amersham Protran), washed 2 times with Towbin buffer, blocked with 5% non-fat dry milk in Tris-buffered saline plus 0.1% Tween-20 (TBS-T), incubated with anti-GFP (Abcam, ab290, 1/10000) in TBS-T plus 5% non-fat dry milk overnight at 4°C. The membranes were washed 3 times and incubated with anti-rabbit Peroxidase antibody (CST, 7074S, 1/10000) for 1 hour in TBS, followed by washing and detection with the ECL Prime chemiluminescence kit (Millipore).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eLysostaphin treatment\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHEK293 cells transfected with uGIPpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP constructs were scraped and centrifuged for 10 min at 3000 rpm at 4°C. The cell pellet was resuspended in 300 µL of RIPA and centrifuged for 10 min at 14000 rpm at 4°C. 30 µL of supernatant extract was incubated with 10 ng/µL of lysostaphin (Prospec, ENZ-269) for 1 to 20 minutes at 37°C. Laemmli buffer was added to the mix and proteins were analyzed by western blot.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAntibody production\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo generate monoclonal antibodies directed against uGIPpolyAla, asRILpolyG or LOC6polyG, two-month-old female BALB/c mice were injected intraperitoneally with KLH conjugated peptides (1A7 and 3G4: RRAEPGAHGEAEAA for uGIPpolyAla antibodies production, 2D8: GPGVWAPGSARSC and 4B9: GGSGEGARVRRPAAPPKLGSELRS for asRILpolyG antibodies generation or 2E8: CAWVGAPERSWPAGGPDALRGRDGAKEAGR for LOC6polyG antibodies generation) with 200 µg of poly(I/C) as adjuvant. Three injections were performed at 2-week intervals and four days prior to hybridoma fusion, mice with positively reacting sera were re-injected. Spleen cells were fused with Sp2/0.Ag14 myeloma cells. 
Supernatants of hybridoma cultures were tested at day 10 by ELISA for cross-reaction with peptides. Positive supernatants were then tested by immunofluorescence and western blot on transfected HEK293 cells. Specific cultures were cloned twice on soft agar. Specific hybridomas were established and ascites fluid was prepared by injection of 2x10\u003csup\u003e6\u003c/sup\u003e hybridoma cells into Freund adjuvant-primed BALB/c mice. All animal experimental procedures were performed according to the French and European authority guidelines.\u003c/p\u003e\n\u003cp\u003eRabbit polyclonal antibodies directed against uGIPpolyG or uN2CpolyG were generated by Eurogentec with the following KLH conjugated peptides: uGIP pAb: MEFAEGRAGC, uN2C pAb: GGGDREDARPAPLC.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAAV production and retro-orbital injection\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRecombinant AAV were generated by triple-transfection of the HEK293T/17 cell line with the pAAV expression plasmids (expressing: GFP, ATG polyG-GFP, uGIPex2polyG-GFP, uGIPex4polyG-GFP, uN2CpolyG-GFP, asRILpolyG-GFP or LOC6polyG-GFP), the auxiliary plasmid pHelper (Agilent) encoding the adenovirus helper functions and the capsid plasmid pUCmini-iCAP-PHP.eB (Addgene #103005) or pMyoAAV-4A. The pMyoAAV-4A was previously generated by the IGBMC facility using available literature (Tabebordbar et al., 2021). The rAAV were harvested from cell lysate and treated with Benzonase (Merck) at 100 U/mL. Recombinant vectors were purified by Iodixanol gradient ultracentrifugation (Optiprep™, Axis Shield), followed by dialysis and concentration (Amicon Ultra-15 Centrifugal Filter Device 100 K) against sterile PBS (Dulbecco’s PBS containing 0.5 mM MgCl2). Particles were quantified by real time PCR and vector titers were expressed as viral genomes per ml (vg/ml). 
Two-month-old C57BL/6 male mice were injected retro-orbitally with 100 µL of sterile NaCl containing 1.5x10\u003csup\u003e13\u003c/sup\u003e vg/kg of AAV.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMouse phenotyping\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRotarod test (Bioseb, Chaville, France) was performed with three testing trials during which the rotation speed accelerated from 4 to 40 rpm in 5 min. Trials were separated by a 10-15 min interval. The average latency was used as an index of motor coordination performance. Notched bar test: mice were tested under 100-lux lighting on a 2 cm-wide and 50 cm-long natural wooden notched bar comprising 12 platforms of 2 cm spaced by 13 gaps of 2 cm and bearing a 6 cm\u003csup\u003e2\u003c/sup\u003e terminal platform. Animals had to cross the notched bar twice for training and 3 times for the test. Every instance of a back paw going through the gap was considered an error, and the global error percentage was calculated. Open field test: mice were tested in automated open fields (Panlab, Barcelona, Spain), each virtually divided into central and peripheral regions. The open fields were placed in a room homogeneously illuminated at 120 Lux. Each mouse was placed in the periphery of the open field and allowed to freely explore the apparatus for 30 min, with the experimenter out of the animal’s sight. The distance traveled, the number of rears, and time spent in the central and peripheral regions were recorded over the test session. The number of entries and the percent time spent in the center area were used as indices of emotionality/anxiety.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImmunofluorescence on PFA-fixed cells\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGlass coverslips containing plated cells were fixed for 15 min in PBS with 4% paraformaldehyde, washed with PBS and incubated in PBS plus 0.5% Triton X-100 for 5 min. 
The coverslips were incubated for one hour with primary antibody against p62 (Abcam, ab56416, 1/1000), p62 (Abcam, ab109012, 1/1000), desmin (Abcam, ab32362, 1/500), Lamin A/C (Abcam, ab238303, 1/1000), uGIP pAb or uN2C pAb (rabbit polyclonal homemade, 1/100), uN2C 4D12, asRIL 2D8 or 4B9 or LOC6 2E8 (mouse monoclonal homemade, 1/100). After washing with PBS, the coverslips were incubated with donkey anti-mouse or donkey anti-rabbit secondary antibodies conjugated with Alexa 488, CY3 or CY5 (Jackson Immunoresearch, 1/500) for one hour, washed twice with PBS and incubated for 3 min in PBS/DAPI (1/10000 dilution). Coverslips were rinsed twice before mounting in Pro-Long media (Molecular Probes).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImmunofluorescence or immunochemistry on PFA-fixed tissue sections\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor immunochemistry followed by cresyl violet counterstaining, buffers were DEPC-treated and autoclaved as described in Kádár et al., 2009. Brain sections were deparaffinized for 10 min in Sub-X (Leica) and dehydrated as follows: ethanol 100% (10 min), ethanol 90% (5 min), ethanol 70% (5 min), and rinsed in water. Antigen retrieval was performed in a pressure cooker in 10 mM Tris pH9, 1 mM EDTA or in 10 mM sodium citrate pH6. For immunochemistry, endogenous peroxidase activity was blocked 15 min with 3% H2O2. Slides were blocked 1 h with PBS, 0.5% Triton X-100 and 5% Horse Serum for immunofluorescence or with PBS, 0.1% Tween-20 and 5% BSA for immunochemistry, followed by overnight incubation at 4°C with primary antibody against Calbindin (CST, 13176S, 1/800), GFAP (Abcam, ab68428, 1/10000), p62 (Abcam, ab56416, 1/1000), p62 (CST, 23214S, 1/500) or Tyrosine Hydroxylase (Abcam, ab112, 1/2000). 
For immunofluorescence, slides were washed with PBS plus 0.1% Triton X-100, incubated with donkey anti-mouse or donkey anti-rabbit secondary antibodies conjugated with Alexa 488 or CY3 (Jackson Immunoresearch, 1/500) for one hour, washed twice with PBS-0.1% Triton X-100 and incubated for 3 min in PBS/DAPI (1/1000 dilution). Slides were rinsed twice in PBS before mounting in Pro-Long media (Molecular Probes). For immunochemistry, slides were washed with PBS plus 0.1% Tween-20, incubated with horse anti-mouse or anti-rabbit coupled to peroxidase (Vector, MP-7402 or MP-7401) for 30 min, washed with PBS-0.1% Tween-20 and then revealed by DAB EqV substrate (Vector, SK-4103) under a binocular magnifier. The reaction was stopped by immersing the slide in PBS. Then, the slides were washed 15 min in water and stained in 1% cresyl violet solution at 55°C for 10 min. Slides were washed in water, quickly dehydrated in 100% ethanol, immersed in Sub-X and mounted in CV Ultra mounting medium (Leica).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImmunofluorescence or immunochemistry on isopentane-frozen sections\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor immunochemistry, endogenous peroxidase activity was blocked 15 min with 3% H2O2. Muscle sections were blocked 1 h with PBS and 3% BSA and incubated overnight at 4°C with primary antibody directed against p62 (Abcam, ab109012, 1/1000), p62 (CST, 23214S, 1/500), Lamin A/C (Abcam, ab238303, 1/1000), Lamin B1 (Proteintech, 12987-1-AP, 1/500), type I fibers (DSHB, BA-D5, 1/50), type IIa fibers (DSHB, SC-71, 1/50), type IIb fibers (DSHB, BF-F3, 1/50), uGIPpolyGly pAb or uN2C pAb (rabbit polyclonal homemade, 1/100), uGIPpolyAla 1A7 or 3G4, uN2C 4D12, asRIL 2D8 or 4B9 or LOC6 2E8 (mouse monoclonal homemade, 1/100). 
After washing with PBS, the slides were incubated with donkey anti-mouse or donkey anti-rabbit secondary antibodies conjugated with Alexa 488, CY3 or CY5 (Jackson Immunoresearch, 1/500), goat anti-mouse IgM DyLight 405, IgG2b CY3 and IgG1 CY5 (Jackson Immunoresearch, 1/100) and Wheat Germ Agglutinin (WGA) conjugated to Alexa 555 (1/300) for one hour, washed twice with PBS and incubated for 3 min in PBS/DAPI (1/1000 dilution) for immunofluorescence. For immunochemistry, slides were washed in PBS, incubated with horse anti-rabbit coupled to peroxidase (Vector, MP-7401) for 30 min, washed in PBS and revealed with DAB EqV substrate (Vector, SK-4103). The reaction was stopped by immersing the slide in PBS. Then, the slides were washed for 15 min in water and stained with H\u0026amp;E following the manufacturer's instructions (Abcam, ab245880). For SDH activity, tissue sections were incubated in an SDH reaction mixture (1.5 mM nitroblue tetrazolium (NBT), 5 mM EDTA, 48 mM succinic acid, 750 µM sodium azide, 30 mM methyl-phenylmethyl sulfate, phosphate buffered to pH 7.6). Slides were immersed in Sub-X and mounted in CV Ultra mounting medium (Leica).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImage acquisition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSlides were imaged with a Yokogawa CSU W1 spinning disk mounted on a Leica DMi8 microscope with x20 or x63 objectives or with a Zeiss Axioscan 7 scanner with a x20 objective for immunohistochemistry or a x40 objective for immunofluorescence.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCorrelative light and electron microscopy (CLEM)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCells were grown to near confluency directly on carbon-coated sapphire disks (3x0.05mm; engineering office M. Wohlwend GMBH). The sapphire disks were then transferred to 300µm deep flat carriers and subjected to high-pressure freezing with the HPM10 BALTEC apparatus. 
Automated freeze substitution (FS) was performed in the chamber of an AFS2 device (Leica Microsystems GmbH). The samples were kept at -90°C in dry acetone containing 0.1% uranyl acetate over a period of 24h. The temperature was gradually increased to -45°C at a rate of 5°C/h, followed by 5h at -45°C. The samples were washed with pure acetone and infiltrated with graded concentrations of Lowicryl HM20. Polymerization was achieved by UV light exposure at -25°C for 48h, followed by an additional 9h at room temperature (20°C). Ultrathin sections were cut at 90 nm with a Leica Ultracut and picked up on 200 mesh copper grids coated with a carbon film. Sections were viewed on a Yokogawa X1 spinning disk mounted on a Nikon TI2 microscope to locate the fluorescent signal. Then, after post-staining for 10min in 2% aqueous uranyl acetate and 5min in lead citrate, sections were viewed on a Tecnai G2-20 transmission electron microscope operated at 120kV, and images were acquired on a TVIPS TemCam F416 camera.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMuscle fiber segmentation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eH\u0026amp;E staining or the WGA channel for fluorescent images was used to segment muscle fibers. The muscle fibers were segmented in QuPath (Bankhead et al., 2017) with Cellpose (Stringer et al., 2021). Then, the fiber classification was calculated with a Python script and reimported into QuPath to display the result.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eQuantitative real time RT-PCR\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTotal RNAs from mouse tissues or cells were isolated with TriReagent (Merck). DNA was removed by treating samples following the instructions of the Turbo DNA-free Kit (Thermofisher). cDNAs were generated using the Transcriptor High Fidelity cDNA synthesis kit (Roche) for quantification of mRNAs. 
qPCR was performed using the LightCycler 480 SYBR Green I Master (Roche) in a Lightcycler 480 (Roche) with 15 min at 94°C followed by 50 cycles of 15 sec at 94°C, 20 sec at 58°C and 20 sec at 72°C. RPLP0 mRNA for human samples and Ubiquitin mRNA for mouse samples were used as standards and data were analyzed using the Lightcycler 480 analysis software (2^ΔCt method).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRNAseq on LHCN-M2 cells\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTotal RNAs from LHCN-M2 cells treated for 3h with 1 µM of TMPyP4 were isolated with TriReagent (Merck). DNA was removed by treating samples following the instructions of the Turbo DNA-free Kit (Thermofisher). Library preparation was performed at the GenomEast platform at the Institute of Genetics and Molecular and Cellular Biology using Illumina Stranded mRNA Prep Ligation - Reference Guide - PN 1000000124518. RNA-Seq libraries were generated from 500 ng of total RNA using the Illumina Stranded mRNA Prep, Ligation kit and IDT for Illumina RNA UD Indexes Ligation (Illumina, San Diego, USA) according to the manufacturer’s instructions. Following polyA selection, mRNAs were fragmented at 94°C for 8 minutes. DNA libraries were amplified using 13 cycles of PCR. Surplus PCR primers were removed by two successive purifications using SPRIselect beads (Beckman-Coulter, Villepinte, France). The final libraries were checked for quality and quantified using a Bioanalyzer 2100 system (Agilent technologies, Les Ulis, France). Libraries were sequenced on an Illumina NextSeq 2000 sequencer as paired-end 50 base reads. Image analysis and base calling were performed using RTA version 2.7.7 and BCL Convert version 3.8.4. 
Gene set enrichment analysis (GSEA) was based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRNAseq on muscle nuclei\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSingle nuclei RNA-seq analysis was performed using 10X Genomics FLEX multiplex technology, which uses barcoded probe sets targeting the whole transcriptome and enables sample multiplexing. Tibialis anterior (TA), gastrocnemius and quadriceps muscles from two individuals per genotype (NaCl-, GFP-, ATG polyG-GFP-, uGIPpolyG-GFP-, uN2CpolyG-GFP- or LOC6polyG-GFP-injected mice) were isolated 7 months post-AAV injection and dissociated with scissors in hypotonic buffer (250 mM sucrose, 10 mM KCl, 5 mM MgCl2, 10 mM Tris-HCl pH8, 25 mM HEPES, 0.2 mM PMSF, 0.1 mM DTT, 0.3% Triton X-100, 0.2 U/µL RNase inhibitor). The muscles were homogenized with a Precellys 24 Touch (Bertin) for 25 sec at 5000 rpm. Samples were filtered through 100 µm and 20 µm filters and centrifuged for 10 min at 400 g at 4°C. Pellets were washed in PBS, 2% BSA, 0.2 U/µL RNase inhibitor, centrifuged for 10 min at 400 g at 4°C, and resuspended in PBS, 2% BSA, 0.2 U/µL RNase inhibitor, 200 nM DAPI to sort nuclei with a FACSAria Fusion or FACSAria II. More than 3 million isolated nuclei were centrifuged for 12 min at 400 g at 4°C and fixed with formaldehyde using the Chromium Next GEM Single Cell Fixed RNA Sample Preparation Kit (PN 1000414) following the 10x Genomics procedure (CG000478). A single 8-plex library was generated using the Chromium Next GEM Single Cell Fixed RNA reagent set (10x Genomics, Leiden, The Netherlands), according to the manufacturer's recommendations. During this process, nuclei counting was performed by a Trypan Blue exclusion assay on a Buerker Chamber. For each of the 8 samples, from 133000 to 1060000 starting nuclei were hybridized with a unique Mouse WTA Probe Barcode (10x Genomics, PN-1000492). 
Hybridizations were completed in 80 µL of hybridization mixture plus 20 µL of WTA probes at 42°C for 21h. Following nuclei re-counting, an equal number of nuclei from each hybridization reaction was combined in a single pool before washing. After re-counting, nuclei were loaded onto the Chromium iX using a Next GEM Chip Q, targeting 5000 nuclei per hybridization reaction for a total of 96000 nuclei loaded per library (8-plex). Full-length cDNA amplification was completed using 8 cycles of PCR. Dual-index library construction was performed with 10 cycles during sample index PCR. Final library quantification and quality control were performed using the Bioanalyzer 2100 (Agilent Technologies, Les Ulis, France). Libraries were sequenced on an Illumina NextSeq 2000 sequencer as paired-end 28- and 85-base reads. Image analysis and base calling were performed using RTA (v.4.12.2) and BCL Convert (v.4.2.7). Alignment, filtering, UMI counting and assignment of reads to samples were carried out using the cellranger multi pipeline (v.8.0.1). The output filtered feature matrix of each sample was used as input to the Seurat R package (v.5.1.0). The standard Seurat workflow for scRNA-seq with default parameters was followed. Integration of datasets was performed using the anchor-based RPCA integration method. A shared nearest neighbor (SNN) graph was built from the first 20 principal components, and the clustering resolution was set to 1.2. Marker genes were identified using the FindConservedMarkers function, the literature and the scCATCH R package (v.3.2.2). 
Differential expression analysis across conditions was performed with the FindMarkers function using DESeq2 as the test after pseudobulking (AggregateExpression function).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTransgenic Constructs and Drosophila Strains\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe upstream open reading frames (uORFs) of human GIPC1 (including wild-type uGIP-GFP and mutant uGIPpolyG-GFP), and human asRILPL1 (including wild-type asRIL-GFP and mutant asRILpolyG-GFP) were subcloned into the attB-pUAST vector. This vector features a UAS sequence in its promoter region (ref: A. H. Brand, N. Perrimon, Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118, 401-415, 1993). Following cDNA sequence verification, these constructs were integrated into the attP2 site of phiC31 stocks through standard microinjection procedures, as outlined in prior research (ref: Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, et al. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119). Transgenic Drosophila lines expressing either GFP vector control, uGIP-GFP, uGIPpolyG-GFP, asRIL-GFP or asRILpolyG-GFP were successfully generated. All Gal4 driver lines were acquired from the Bloomington Drosophila Stock Center. The fly strains were maintained under standard conditions at 25°C on cornmeal agar medium, with a regulated 12-hour light-dark cycle.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFly Electron Microscopy Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eElectron microscopy was performed following established protocols (ref: Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, et al. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119). 
Briefly, samples were collected and fixed in 2.5% glutaraldehyde at 4°C overnight. Subsequently, the samples were sectioned using a Leica EM UC6/FC6 Ultramicrotome. To verify the proper orientation and quality of the sections, they were stained with toluidine blue. The selected sections were then transferred to copper grids and subjected to counterstaining with uranyl acetate and lead citrate to enhance contrast. Finally, the prepared samples were imaged using an electron microscope.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFly Climbing Assay and Lifespan Assay\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSex-specific climbing and lifespan assays were conducted following established protocols (ref. Yu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, et al. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119). Briefly, vials containing 20–30 flies of the same genotype were used. Flies were separated by sex within 24 hours of eclosion and transferred to fresh vials every 4 days throughout the experimental period. For climbing assays, flies were gently tapped to the vial bottom, and the number crossing the 5 cm mark within 15 seconds was recorded. Each trial was repeated five times, with mean and standard error calculated. Lifespan assays used two gender- and age-matched groups. Both climbing ability and lifespan were assessed and recorded every 5 days.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFly Porphyrin TMPyP4 Administration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePorphyrin TMPyP4 was obtained from Selleck Chemicals and stored as a 1 mM stock solution at −20°C. For \u003cem\u003eDrosophila\u003c/em\u003e studies, the compound was dissolved in sterile water to achieve final concentrations of 30 μM, 100 μM, and 200 μM. 
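As a rough illustration of the dilution arithmetic implied by the 1 mM stock and the three final concentrations, the sketch below applies C1·V1 = C2·V2. The 10 mL final volume and the function name `stock_volume_ul` are illustrative assumptions and are not taken from the protocol.

```python
def stock_volume_ul(stock_mm: float, final_um: float, final_volume_ml: float) -> float:
    """Volume of stock (in µL) that satisfies C1*V1 = C2*V2."""
    stock_um = stock_mm * 1000.0                # convert mM to µM
    final_volume_ul = final_volume_ml * 1000.0  # convert mL to µL
    return final_um * final_volume_ul / stock_um


# Final concentrations as reported in the text; the 10 mL volume is hypothetical.
for conc_um in (30, 100, 200):
    volume = stock_volume_ul(stock_mm=1.0, final_um=conc_um, final_volume_ml=10.0)
    print(f"{conc_um} µM: {volume:.0f} µL of 1 mM stock per 10 mL")
```

The same function can be reused for any other final volume by changing `final_volume_ml`.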
Before each experiment, fresh dilutions were prepared in \u003cem\u003eDrosophila\u003c/em\u003e cornmeal agar medium. The drug was administered starting from the egg stage, and adult flies were transferred to fresh vials containing Porphyrin TMPyP4 every three days.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eQuantification and statistical analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo eliminate bias, image or animal analyses were either completely automated or blinded. All statistical analyses were performed using Excel (Microsoft) and an online statistical calculator (Astatsa). Data are represented as either mean value ± Standard Error of the Mean (SEM) or box-and-whisker plots, with box upper and lower limits representing the 75th and 25th percentiles, respectively, the whiskers depicting the lowest and highest data points and the horizontal line through the box representing the median. The statistical tests used are two-tailed paired Student's t-test or one-way ANOVA with post-hoc Tukey HSD test. Significance was set as *p \u0026lt; 0.05; **p \u0026lt; 0.01 and ***p \u0026lt; 0.001 for Student's t-test and as *p \u0026lt; 0.01; **p \u0026lt; 0.001 and ***p \u0026lt; 0.0001 for ANOVA with post-hoc Tukey HSD test. Sample sizes were determined based on past experiments and to minimize the number of mice used. No statistical method was used to determine whether data meet assumptions of the statistical approach. 
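The summary statistic and the significance notation described here can be sketched in a few lines of Python. This is only an illustration, not the authors' Excel or Astatsa workflow: the function names, the "ns" label for non-significant results and the reading of the first t-test threshold as a single asterisk are assumptions.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Cutoffs as stated in the text, ordered from most to least stringent.
THRESHOLDS = {
    "t_test": [(0.001, "***"), (0.01, "**"), (0.05, "*")],
    "anova_tukey": [(0.0001, "***"), (0.001, "**"), (0.01, "*")],
}


def significance_stars(p_value, test):
    """Map a p-value to the asterisk notation used for the given test."""
    for cutoff, stars in THRESHOLDS[test]:
        if p_value < cutoff:
            return stars
    return "ns"  # hypothetical label for non-significant results
```

Because the cutoffs differ between the two tests, the same p-value can receive different labels; for example, p = 0.02 earns one asterisk under the t-test thresholds but none under the ANOVA thresholds.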
Detailed statistical information, including the statistical test, measures, number “n” of animals, cells and/ or experiments are indicated in the figures and their respective legends.\u003c/p\u003e"},{"header":"Declarations","content":"\n\u003cp\u003e\u003cstrong\u003eDECLARATION OF INTERESTS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflicts of interest.\u003c/p\u003e\n\u003ch2\u003eAUTHOR CONTRIBUTIONS\u003c/h2\u003e \u003cp\u003eExperiments were performed by M.B., J.Y., N.E., L.S., E.G., C.N., A.M., B.M., C.G., P.G-R., E.L., A.P., and M.O-A. Control and OPDM cases originate from I.N., Z.W., K.Y., N.W., and J.D. Data were collected and analyzed by M.B., J.Y., N.E., E.G., D.P., A.M., B.M., C.T., J.D. and N.C-B. The study was designed, coordinated and written by J.D., M.B. and N.C-B. with input from all authors.\u003c/p\u003e\u003ch2\u003eACKNOWLEDGMENTS\u003c/h2\u003e \u003cp\u003eThis work was supported by the following grants: National Natural Science Foundation of China (82071409, U20A20356, 82171846, 82422025, 82430059 and 82401635); Beijing Nova Program (20220484017, 20230484403); Beijing Natural Science Foundation (7244421); the National High-Level Hospital Clinical Research Funding (2023HQ03); Association Fran\u0026ccedil;aise contre les Myopathies AFM-T\u0026eacute;l\u0026eacute;thon 28811 (Manon Boivin) ; Fondation pour la Recherche sur le Cerveau FRC 2023, Fondation pour la Recherche M\u0026eacute;dicale FRM PMT202306017578 and FRM EQU202103012936 (Nicolas Charlet-Berguerand). Manon Boivin was supported by a 1-year post-doctoral ITI IMCBio funding. The GenomEast Sequencing platform at IGBMC is a member of the national France G\u0026eacute;nomique consortium supported by the French National Research Agency (ANR-10-INBS-0009). The Light Microscopy Facility at the IGBMC imaging center is member of the national infrastructure France-BioImaging and is supported by the French National Research Agency (ANR-10-INBS-04). 
Finally, this work of the Interdisciplinary Thematic Institute IMCBio+, as part of the ITI 2021\u0026ndash;2028 program of the University of Strasbourg, CNRS and Inserm, was supported by IdEx Unistra (ANR-10-IDEX-0002), and by SFRI-STRAT\u0026rsquo;US project (ANR-20-SFRI-0012) and EUR IMCBio (ANR-17-EURE-0023) under the framework of the France 2030 Program (IGBMC).\u003c/p\u003e \u003cp\u003eWe extend our gratitude to the patients and their families for their invaluable participation in this study. Special thanks to Jin Xu (Peking University First Hospital) for expert acquisition of electron microscopy images and to Jing Liu and Qingqing Wang (Peking University First Hospital) for their preparation of histopathological sections.\u003c/p\u003e\u003ch2\u003eData and Code Availability\u003c/h2\u003e \u003cp\u003eRNA sequencing and mass spectrometry datasets are available in supplementary Tables S1 to S5. Complete transcriptomic and proteomics source data are available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAnnear DJ, Vandeweyer G, Elinck E, Sanchis-Juan A, French CE, Raymond L, Kooy RF. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci Rep. 2021 Jan 28;11(1):2515. doi: 10.1038/s41598-021-82050-5\u003c/li\u003e\n\u003cli\u003eBanerjee A, Apponi LH, Pavlath GK, Corbett AH. PABPN1: molecular function and muscle disease. FEBS J. 2013 Sep;280(17):4230-50. doi: 10.1111/febs.12294.\u003c/li\u003e\n\u003cli\u003eBankhead P, Loughrey MB, Fern\u0026aacute;ndez JA, Dombrowski Y, McArt DG, Dunne PD, McQuaid S, Gray RT, Murray LJ, Coleman HG, James JA, Salto-Tellez M, Hamilton PW. QuPath: Open-source software for digital pathology image analysis. Sci Rep. 2017 Dec 4;7(1):16878. doi: 10.1038/s41598-017-17204-5\u003c/li\u003e\n\u003cli\u003eBao L, Zuo D, Li Q, Chen H, Cui G. 
Current advances in neuronal intranuclear inclusion disease. Neurol Sci. 2023 Jun;44(6):1881-1889. doi: 10.1007/s10072-023-06677-0.\u003c/li\u003e\n\u003cli\u003eBoivin M, Deng J, Pfister V, Grandgirard E, Oulad-Abdelghani M, Morlet B, Ruffenach F, Negroni L, Koebel P, Jacob H, Riet F, Dijkstra AA, McFadden K, Clayton WA, Hong D, Miyahara H, Iwasaki Y, Sone J, Wang Z, Charlet-Berguerand N. Translation of GGC repeat expansions into a toxic polyglycine protein in NIID defines a novel class of human genetic disorders: The polyG diseases. Neuron. 2021 Jun 2;109(11):1825-1835.e5. doi: 10.1016/j.neuron.2021.03.038.\u003c/li\u003e\n\u003cli\u003eBoivin M, Charlet-Berguerand N. Trinucleotide CGG Repeat Diseases: An Expanding Field of Polyglycine Proteins? Front Genet. 2022 Feb 28;13:843014. doi: 10.3389/fgene.2022.843014.\u003c/li\u003e\n\u003cli\u003eBrais B, Bouchard JP, Xie YG, Rochefort DL, Chretien N, Tome FM, Lafreniere RG, Rommens JM, Uyama E, Nohira O et al. Short GCG expansions in the PABP2 gene cause oculopharyngeal muscular dystrophy. Nat Genet (1998) 18, 164\u0026ndash;167.\u003c/li\u003e\n\u003cli\u003eBrown LY, Brown SA. Alanine tracts: the expanding story of human illness and trinucleotide repeats. Trends Genet. 2004 Jan;20(1):51-8. doi: 10.1016/j.tig.2003.11.002.\u003c/li\u003e\n\u003cli\u003eChan KY, Jang MJ, Yoo BB, Greenbaum A, Ravi N, Wu WL, S\u0026aacute;nchez-Guardado L, Lois C, Mazmanian SK, Deverman BE, Gradinaru V. Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nat Neurosci. 2017 Aug;20(8):1172-1179. doi: 10.1038/nn.4593.\u003c/li\u003e\n\u003cli\u003eChen J, Brunner AD, Cogan JZ, Nu\u0026ntilde;ez JK, Fields AP, Adamson B, Itzhak DN, Li JY, Mann M, Leonetti MD, Weissman JS. Pervasive functional translation of noncanonical human open reading frames. Science. 2020 Mar 6;367(6482):1140-1146.\u003c/li\u003e\n\u003cli\u003eChen H, Lu L, Wang B, Cui G, Wang X, Wang Y, et al. 
Re-defining the clinicopathological spectrum of neuronal intranuclear inclusion disease. Ann Clin Transl Neurol. 2020;7(10):1930\u0026ndash;41. https://doi.org/10.1002/acn3.51189.\u003c/li\u003e\n\u003cli\u003eChen Z, Gustavsson EK, Macpherson H, Anderson C, Clarkson C, Rocca C, Self E, Alvarez Jerez P, Scardamaglia A, Pellerin D, Montgomery K, Lee J, Gagliardi D, Luo H; Genomics England Research Consortium; Hardy J, Polke J, Singleton AB, Blauwendraat C, Mathews KD, Tucci A, Fu YH, Houlden H, Ryten M, Pt\u0026aacute;ček LJ. Adaptive Long-Read Sequencing Reveals GGC Repeat Expansion in ZFHX3 Associated with Spinocerebellar Ataxia Type 4. Mov Disord. 2024 Mar;39(3):486-497. doi: 10.1002/mds.29704.\u003c/li\u003e\n\u003cli\u003eChothani SP, Adami E, Widjaja AA, Langley SR, Viswanathan S, Pua CJ, Zhihao NT, Harmston N, D\u0026apos;Agostino G, Whiffin N, Mao W, Ouyang JF, Lim WW, Lim S, Lee CQE, Grubman A, Chen J, Kovalik JP, Tryggvason K, Polo JM, Ho L, Cook SA, Rackham OJL, Schafer S. A high-resolution map of human RNA translation. Mol Cell. 2022 Aug 4;82(15):2885-2899.e8.\u003c/li\u003e\n\u003cli\u003eCorbett MA, Kroes T, Veneziano L, Bennett MF, Florian R, Schneider AL, Coppola A, Licchetta L, Franceschetti S, Suppa A, Wenger A, Mei D, Pendziwiat M, Kaya S, Delledonne M, Straussberg R, Xumerle L, Regan B, Crompton D, van Rootselaar AF, Correll A, Catford R, Bisulli F, Chakraborty S, Baldassari S, Tinuper P, Barton K, Carswell S, Smith M, Berardelli A, Carroll R, Gardner A, Friend KL, Blatt I, Iacomino M, Di Bonaventura C, Striano S, Buratti J, Keren B, Nava C, Forlani S, Rudolf G, Hirsch E, Leguern E, Labauge P, Balestrini S, Sander JW, Afawi Z, Helbig I, Ishiura H, Tsuji S, Sisodiya SM, Casari G, Sadleir LG, van Coller R, Tijssen MAJ, Klein KM, van den Maagdenberg AMJM, Zara F, Guerrini R, Berkovic SF, Pippucci T, Canafoglia L, Bahlo M, Striano P, Scheffer IE, Brancati F, Depienne C, Gecz J. 
Intronic ATTTC repeat expansions in STARD7 in familial adult myoclonic epilepsy linked to chromosome 2. Nat Commun. 2019 Oct 29;10(1):4920. doi: 10.1038/s41467-019-12671-y.\u003c/li\u003e\n\u003cli\u003eCortese A, Beecroft SJ, Facchini S, Curro R, Cabrera-Serrano M, Stevanovski I, Chintalaphani SR, Gamaarachchi H, Weisburd B, Folland C, Monahan G, Scriba CK, Dofash L, Johari M, Grosz BR, Ellis M, Fearnley LG, Tankard R, Read J, Merve A, Dominik N, Vegezzi E, Schnekenberg RP, Fernandez-Eulate G, Masingue M, Giovannini D, Delatycki MB, Storey E, Gardner M, Amor DJ, Nicholson G, Vucic S, Henderson RD, Robertson T, Dyke J, Fabian V, Mastaglia F, Davis MR, Kennerson M; OPDM study group; Quinlivan R, Hammans S, Tucci A, Bahlo M, McLean CA, Laing NG, Stojkovic T, Houlden H, Hanna MG, Deveson IW, Lockhart PJ, Lamont PJ, Fahey MC, Bugiardini E, Ravenscroft G. A CCG expansion in ABCD3 causes oculopharyngodistal myopathy in individuals of European ancestry. Nat Commun. 2024 Jul 27;15(1):6327.\u003c/li\u003e\n\u003cli\u003eCortese A, Simone R, Sullivan R, Vandrovcova J, Tariq H, Yau WY, Humphrey J, Jaunmuktane Z, Sivakumar P, Polke J, Ilyas M, Tribollet E, Tomaselli PJ, Devigili G, Callegari I, Versino M, Salpietro V, Efthymiou S, Kaski D, Wood NW, Andrade NS, Buglo E, Rebelo A, Rossor AM, Bronstein A, Fratta P, Marques WJ, Z\u0026uuml;chner S, Reilly MM, Houlden H. Biallelic expansion of an intronic repeat in RFC1 is a common cause of late-onset ataxia. Nat Genet. 2019 Apr;51(4):649-658.\u003c/li\u003e\n\u003cli\u003eDanzi MC, Xu IRL, Fazal S, Dolzhenko E, Pellerin D, Weisburd B, Reuter C, Sampson J, Folland C, Wheeler M, O\u0026apos;Donnell-Luria A, Wuchty S, Ravenscroft G, Eberle MA; All of Us Research Program Long Read Working Group; Zuchner S. Detailed tandem repeat allele profiling in 1,027 long-read genomes reveals genome-wide patterns of pathogenicity. bioRxiv [Preprint]. 2025 Jan 20:2025.01.06.631535. 
doi: 10.1101/2025.01.06.631535.\u003c/li\u003e\n\u003cli\u003eDeng J, Gu M, Miao Y, Yao S, Zhu M, Fang P, Yu X, Li P, Su Y, Huang J, Zhang J, Yu J, Li F, Bai J, Sun W, Huang Y, Yuan Y, Hong D, Wang Z. Long-read sequencing identified repeat expansions in the 5\u0026apos;UTR of the NOTCH2NLC gene from Chinese patients with neuronal intranuclear inclusion disease. J Med Genet. 2019 Nov;56(11):758-764.\u003c/li\u003e\n\u003cli\u003eDeng J, Yu J, Li P, Luan X, Cao L, Zhao J, Yu M, Zhang W, Lv H, Xie Z, Meng L, Zheng Y, Zhao Y, Gang Q, Wang Q, Liu J, Zhu M, Guo X, Su Y, Liang Y, Liang F, Hayashi T, Maeda MH, Sato T, Ura S, Oya Y, Ogasawara M, Iida A, Nishino I, Zhou C, Yan C, Yuan Y, Hong D, Wang Z. Expansion of GGC Repeat in GIPC1 Is Associated with Oculopharyngodistal Myopathy. Am J Hum Genet. 2020 Jun 4;106(6):793-804.\u003c/li\u003e\n\u003cli\u003eDeutsch EW, Kok LW, Mudge JM, Ruiz-Orera J, Fierro-Monti I, Sun Z, Abelin JG, Alba MM, Aspden JL, Bazzini AA, Bruford EA, Brunet MA, Calviello L, Carr SA, Carvunis AR, Chothani S, Clauwaert J, Dean K, Faridi P, Frankish A, Hubner N, Ingolia NT, Magrane M, Martin MJ, Martinez TF, Menschaert G, Ohler U, Orchard S, Rackham O, Roucou X, Slavoff SA, Valen E, Wacholder A, Weissman JS, Wu W, Xie Z, Choudhary J, Bassani-Sternberg M, Vizca\u0026iacute;no JA, Ternette N, Moritz RL, Prensner JR, van Heesch S. High-quality peptide evidence for annotating non-canonical open reading frames as human proteins. bioRxiv [Preprint]. 2024 Sep 9:2024.09.09.612016.\u003c/li\u003e\n\u003cli\u003eDepienne C, Mandel JL. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am J Hum Genet. 2021 May 6;108(5):764-785.\u003c/li\u003e\n\u003cli\u003eDong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci. 
2023 Jun 23;24(13):10562.\u003c/li\u003e\n\u003cli\u003eDuffy EE, Finander B, Choi G, Carter AC, Pritisanac I, Alam A, Luria V, Karger A, Phu W, Sherman MA, Assad EG, Pajarillo N, Khitun A, Crouch EE, Ganesh S, Chen J, Berger B, Sestan N, O\u0026apos;Donnell-Luria A, Huang EJ, Griffith EC, Forman-Kay JD, Moses AM, Kalish BT, Greenberg ME. Developmental dynamics of RNA translation in the human brain. Nat Neurosci. 2022 Oct;25(10):1353-1365.\u003c/li\u003e\n\u003cli\u003eDurmus H, Laval SH, Deymeer F, Parman Y, Kiyan E, Gokyigiti M, Ertekin C, Ercan I, Solakoglu S, Karcagi V, Straub V, Bushby K, Lochm\u0026uuml;ller H, Serdaroglu-Oflazer P. Oculopharyngodistal myopathy is a distinct entity: clinical and genetic features of 47 patients. Neurology. 2011 Jan 18;76(3):227-35. doi: 10.1212/WNL.0b013e318207b043.\u003c/li\u003e\n\u003cli\u003eErwin GS, G\u0026uuml;rsoy G, Al-Abri R, Suriyaprakash A, Dolzhenko E, Zhu K, Hoerner CR, White SM, Ramirez L, Vadlakonda A, Vadlakonda A, von Kraut K, Park J, Brannon CM, Sumano DA, Kirtikar RA, Erwin AA, Metzner TJ, Yuen RKC, Fan AC, Leppert JT, Eberle MA, Gerstein M, Snyder MP. Recurrent repeat expansions in human cancer genomes. Nature. 2023 Jan;613(7942):96-102. doi: 10.1038/s41586-022-05515-1.\u003c/li\u003e\n\u003cli\u003eFan Y, Shen S, Yang J, Yao D, Li M, Mao C, Wang Y, Hao X, Ma D, Li J, Shi J, Guo M, Li S, Yuan Y, Liu F, Yang Z, Zhang S, Hu Z, Fan L, Liu H, Zhang C, Wang Y, Wang Q, Zheng H, He Y, Song B, Xu Y, Shi C. GIPC1 CGG Repeat Expansion Is Associated with Movement Disorders. Ann Neurol. 2022 May;91(5):704-715.\u003c/li\u003e\n\u003cli\u003eFan Y, Xu Y, Shi C. NOTCH2NLC-related disorders: the widening spectrum and genotype-phenotype correlation. J Med Genet. 2022 Jan;59(1):1-9. doi: 10.1136/jmedgenet-2021-107883.\u003c/li\u003e\n\u003cli\u003eFields AP, Rodriguez EH, Jovanovic M, Stern-Ginossar N, Haas BJ, Mertins P, Raychowdhury R, Hacohen N, Carr SA, Ingolia NT, Regev A, Weissman JS. 
A Regression-Based Analysis of Ribosome-Profiling Data Reveals a Conserved Complexity to Mammalian Translation. Mol Cell. 2015 Dec 3;60(5):816-827. doi: 10.1016/j.molcel.2015.11.013.\u003c/li\u003e\n\u003cli\u003eFigueroa KP, Gross C, Buena-Atienza E, Paul S, Gandelman M, Kakar N, Sturm M, Casadei N, Admard J, Park J, Z\u0026uuml;hlke C, Hellenbroich Y, Pozojevic J, Balachandran S, H\u0026auml;ndler K, Zittel S, Timmann D, Erdlenbruch F, Herrmann L, Feindt T, Zenker M, Klopstock T, Dufke C, Scoles DR, Koeppen A, Spielmann M, Riess O, Ossowski S, Haack TB, Pulst SM. A GGC-repeat expansion in ZFHX3 encoding polyglycine causes spinocerebellar ataxia type 4 and impairs autophagy. Nat Genet. 2024 Jun;56(6):1080-1089. doi: 10.1038/s41588-024-01719-5.\u003c/li\u003e\n\u003cli\u003eFlorian RT, Kraft F, Leit\u0026atilde;o E, Kaya S, Klebe S, Magnin E, van Rootselaar AF, Buratti J, K\u0026uuml;hnel T, Schr\u0026ouml;der C, Giesselmann S, Tschernoster N, Altmueller J, Lamiral A, Keren B, Nava C, Bouteiller D, Forlani S, Jornea L, Kubica R, Ye T, Plassard D, Jost B, Meyer V, Deleuze JF, Delpu Y, Avarello MDM, Vijfhuizen LS, Rudolf G, Hirsch E, Kroes T, Reif PS, Rosenow F, Ganos C, Vidailhet M, Thivard L, Mathieu A, Bourgeron T, Kurth I, Rafehi H, Steenpass L, Horsthemke B; FAME consortium; LeGuern E, Klein KM, Labauge P, Bennett MF, Bahlo M, Gecz J, Corbett MA, Tijssen MAJ, van den Maagdenberg AMJM, Depienne C. Unstable TTTTA/TTTCA expansions in MARCH6 are associated with Familial Adult Myoclonic Epilepsy type 3. Nat Commun. 2019 Oct 29;10(1):4919. doi: 10.1038/s41467-019-12763-9.\u003c/li\u003e\n\u003cli\u003eFotsing SF, Margoliash J, Wang C, Saini S, Yanicky R, Shleizer-Burko S, Goren A, Gymrek M. The impact of short tandem repeat variation on gene expression. Nat Genet. 2019 Nov;51(11):1652-1659. doi: 10.1038/s41588-019-0521-9.\u003c/li\u003e\n\u003cli\u003eGao FB, Richter JD, Cleveland DW. Rethinking Unconventional Translation in Neurodegeneration. Cell. 
2017 Nov 16;171(5):994-1000. doi: 10.1016/j.cell.2017.10.042.\u003c/li\u003e\n\u003cli\u003eGreen KM, Sheth UJ, Flores BN, Wright SE, Sutter AB, Kearse MG, Barmada SJ, Ivanova MI, Todd PK. High-throughput screening yields several small-molecule inhibitors of repeat-associated non-AUG translation. J Biol Chem. 2019 Dec 6;294(49):18624-18638. doi: 10.1074/jbc.RA119.009951.\u003c/li\u003e\n\u003cli\u003eGu X, Yue D, Qiao K, Huang G, Zhu W, Xi J, et al. NOTCH2NLC-related oculopharyngodistal myopathy type 3 with cardiomyopathy and nephropathy. Muscle Nerve. 2023;67(5):E18\u0026ndash;21. https://doi.org/10.1002/mus.27808\u003c/li\u003e\n\u003cli\u003eGu X, Jiao K, Yue D, Wang X, Qiao K, Gao M, Lin J, Sun C, Zhao C, Zhu W, Xi J. Intrafamilial phenotypic heterogeneity in GIPC1-related oculopharyngodistal myopathy type 2: a case report. Neuromuscul Disord. 2023 Sep;33(9):93-97.\u003c/li\u003e\n\u003cli\u003eGu X, Yu J, Jiao K, Deng J, Xia X, Qiao K, Yue D, Gao M, Zhao C, Dong J, Huang G, Shan J, Yan C, Di L, Da Y, Zhu W, Xi J, Wang Z. Non-coding CGG repeat expansion in LOC642361/NUTM2B-AS1 is associated with a phenotype of oculopharyngodistal myopathy. J Med Genet. 2024 Mar 21;61(4):340-346.\u003c/li\u003e\n\u003cli\u003eGuo S, Nguyen L, Ranum LPW. RAN proteins in neurodegenerative disease: Repeating themes and unifying therapeutic strategies. Curr Opin Neurobiol. 2022 Feb;72:160-170. doi: 10.1016/j.conb.2021.11.001.\u003c/li\u003e\n\u003cli\u003eGymrek M, Willems T, Guilmatre A, Zeng H, Markus B, Georgiev S, Daly MJ, Price AL, Pritchard JK, Sharp AJ, Erlich Y. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat Genet. 2016 Jan;48(1):22-9. doi: 10.1038/ng.3461.\u003c/li\u003e\n\u003cli\u003eHobara T, Ando M, Higuchi Y, Yuan JH, Yoshimura A, Kojima F, Noguchi Y, Takei J, Hiramatsu Y, Nozuma S, Nakamura T, Adachi T, Toyooka K, Yamashita T, Sakiyama Y, Hashiguchi A, Matsuura E, Okamoto Y, Takashima H. 
Linking LRP12 CGG repeat expansion to inherited peripheral neuropathy. J Neurol Neurosurg Psychiatry. 2024 Jul 16:jnnp-2024-333403.\u003c/li\u003e\n\u003cli\u003eIba\u0026ntilde;ez K, Jadhav B, Zanovello M, Gagliardi D, Clarkson C, Facchini S, Garg P, Martin-Trujillo A, Gies SJ, Galassi Deforie V, Dalmia A, Hensman Moss DJ, Vandrovcova J, Rocca C, Moutsianas L, Marini-Bettolo C, Walker H, Turner C, Shoai M, Long JD, Fratta P, Langbehn DR, Tabrizi SJ, Caulfield MJ, Cortese A, Escott-Price V, Hardy J, Houlden H, Sharp AJ, Tucci A. Increased frequency of repeat expansion mutations across different populations. Nat Med. 2024 Oct 1. doi: 10.1038/s41591-024-03190-5.\u003c/li\u003e\n\u003cli\u003eIngolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011 Nov 11;147(4):789-802. doi: 10.1016/j.cell.2011.10.002.\u003c/li\u003e\n\u003cli\u003eIshiura H, Doi K, Mitsui J, Yoshimura J, Matsukawa MK, Fujiyama A, Toyoshima Y, Kakita A, Takahashi H, Suzuki Y, Sugano S, Qu W, Ichikawa K, Yurino H, Higasa K, Shibata S, Mitsue A, Tanaka M, Ichikawa Y, Takahashi Y, Date H, Matsukawa T, Kanda J, Nakamoto FK, Higashihara M, Abe K, Koike R, Sasagawa M, Kuroha Y, Hasegawa N, Kanesawa N, Kondo T, Hitomi T, Tada M, Takano H, Saito Y, Sanpei K, Onodera O, Nishizawa M, Nakamura M, Yasuda T, Sakiyama Y, Otsuka M, Ueki A, Kaida KI, Shimizu J, Hanajima R, Hayashi T, Terao Y, Inomata-Terada S, Hamada M, Shirota Y, Kubota A, Ugawa Y, Koh K, Takiyama Y, Ohsawa-Yoshida N, Ishiura S, Yamasaki R, Tamaoka A, Akiyama H, Otsuki T, Sano A, Ikeda A, Goto J, Morishita S, Tsuji S. Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat Genet. 2018 Apr;50(4):581-590. 
doi: 10.1038/s41588-018-0067-2.\u003c/li\u003e\n\u003cli\u003eIshiura H, Shibata S, Yoshimura J, Suzuki Y, Qu W, Doi K, Almansour MA, Kikuchi JK, Taira M, Mitsui J, Takahashi Y, Ichikawa Y, Mano T, Iwata A, Harigaya Y, Matsukawa MK, Matsukawa T, Tanaka M, Shirota Y, Ohtomo R, Kowa H, Date H, Mitsue A, Hatsuta H, Morimoto S, Murayama S, Shiio Y, Saito Y, Mitsutake A, Kawai M, Sasaki T, Sugiyama Y, Hamada M, Ohtomo G, Terao Y, Nakazato Y, Takeda A, Sakiyama Y, Umeda-Kameyama Y, Shinmi J, Ogata K, Kohno Y, Lim SY, Tan AH, Shimizu J, Goto J, Nishino I, Toda T, Morishita S, Tsuji S. Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease. Nat Genet. 2019 Aug;51(8):1222-1232.\u003c/li\u003e\n\u003cli\u003eIshiura H, Tsuji S, Toda T. Recent advances in CGG repeat diseases and a proposal of fragile X-associated tremor/ataxia syndrome, neuronal intranuclear inclusion disease, and oculophryngodistal myopathy (FNOP) spectrum disorder. J Hum Genet. 2023 Mar;68(3):169-174.\u003c/li\u003e\n\u003cli\u003eJadhav B, Garg P, van Vugt JJFA, Ibanez K, Gagliardi D, Lee W, Shadrina M, Mokveld T, Dolzhenko E, Martin-Trujillo A, Gies SJ, Altman G, Rocca C, Barbosa M, Jain M, Lahiri N, Lachlan K, Houlden H, Paten B; Genomics England Research Consortium; Project MinE ALS Sequencing Consortium; Veldink J, Tucci A, Sharp AJ. A phenome-wide association study of methylated GC-rich repeats identifies a GCC repeat expansion in AFF3 associated with intellectual disability. Nat Genet. 2024 Sep 23. doi: 10.1038/s41588-024-01917-1.\u003c/li\u003e\n\u003cli\u003eJi Z, Song R, Regev A, Struhl K. Many lncRNAs, 5\u0026apos;UTRs, and pseudogenes are translated and some are likely to express functional proteins. Elife. 2015 Dec 19;4:e08890.\u003c/li\u003e\n\u003cli\u003eJohnstone TG, Bazzini AA, Giraldez AJ. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 2016 Apr 1;35(7):706-23. 
doi: 10.15252/embj.201592759. Epub 2016 Feb 19.\u003c/li\u003e\n\u003cli\u003eKatoh M. Functional proteomics, human genetics and cancer biology of GIPC family members. Exp Mol Med. 2013 Jun 7;45(6):e26. doi: 10.1038/emm.2013.49.\u003c/li\u003e\n\u003cli\u003eKearse MG, Wilusz JE. Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes Dev. 2017 Sep 1;31(17):1717-1731. doi: 10.1101/gad.305250.117.\u003c/li\u003e\n\u003cli\u003eKong HE, Lim J, Linsalata A, Kang Y, Malik I, Allen EG, Cao Y, Shubeck L, Johnston R, Huang Y, Gu Y, Guo X, Zwick ME, Qin Z, Wingo TS, Juncos J, Nelson DL, Epstein MP, Cutler DJ, Todd PK, Sherman SL, Warren ST, Jin P. Identification of PSMB5 as a genetic modifier of fragile X-associated tremor/ataxia syndrome. Proc Natl Acad Sci U S A. 2022 May 31;119(22):e2118124119. doi: 10.1073/pnas.2118124119.\u003c/li\u003e\n\u003cli\u003eKume K, Kurashige T, Muguruma K, Morino H, Tada Y, Kikumoto M, Miyamoto T, Akutsu SN, Matsuda Y, Matsuura S, Nakamori M, Nishiyama A, Izumi R, Niihori T, Ogasawara M, Eura N, Kato T, Yokomura M, Nakayama Y, Ito H, Nakamura M, Saito K, Riku Y, Iwasaki Y, Maruyama H, Aoki Y, Nishino I, Izumi Y, Aoki M, Kawakami H. CGG repeat expansion in LRP12 in amyotrophic lateral sclerosis. Am J Hum Genet. 2023 Jul 6;110(7):1086-1097.\u003c/li\u003e\n\u003cli\u003eKumutpongpanich T, Ogasawara M, Ozaki A, Ishiura H, Tsuji S, Minami N, et al. Clinicopathologic Features of Oculopharyngodistal Myopathy with LRP12 CGG Repeat Expansions Compared with Other Oculopharyngodistal Myopathy Subtypes. JAMA Neurol. 2021;78(7):853\u0026ndash;863.\u003c/li\u003e\n\u003cli\u003eKozak M. Context effects and inefficient initiation at non-AUG codons in eucaryotic cell-free translation systems. Mol Cell Biol. 1989 Nov;9(11):5073-80. doi: 10.1128/mcb.9.11.5073-5080.1989.\u003c/li\u003e\n\u003cli\u003eLander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. 
doi: 10.1038/35057062.\u003c/li\u003e\n\u003cli\u003eLee S, Liu B, Lee S, Huang SX, Shen B, Qian SB. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc Natl Acad Sci U S A. 2012 Sep 11;109(37):E2424-32. doi: 10.1073/pnas.1207846109.\u003c/li\u003e\n\u003cli\u003eLi Z, Liu L, Feng C, Qin Y, Xiao J, Zhang Z, Ma L. LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations. Nucleic Acids Res. 2023 Jan 6;51(D1):D186-D191.\u003c/li\u003e\n\u003cli\u003eLieberman AP, Shakkottai VG, Albin RL. Polyglutamine Repeats in Neurodegenerative Diseases. Annu Rev Pathol. 2019 Jan 24;14:1-27. doi: 10.1146/annurev-pathmechdis-012418-012857.\u003c/li\u003e\n\u003cli\u003eLicata NV, Cristofani R, Salomonsson S, Wilson KM, Kempthorne L, Vaizoglu D, D\u0026apos;Agostino VG, Pollini D, Loffredo R, Pancher M, Adami V, Bellosta P, Ratti A, Viero G, Quattrone A, Isaacs AM, Poletti A, Provenzani A. C9orf72 ALS/FTD dipeptide repeat protein levels are reduced by small molecules that inhibit PKA or enhance protein degradation. EMBO J. 2022 Jan 4;41(1):e105026. doi: 10.15252/embj.2020105026\u003c/li\u003e\n\u003cli\u003eLiu Q, Zhang K, Kang Y, Li Y, Deng P, Li Y, Tian Y, Sun Q, Tang Y, Xu K, Zhou Y, Wang JL, Guo J, Li JD, Xia K, Meng Q, Allen EG, Wen Z, Li Z, Jiang H, Shen L, Duan R, Yao B, Tang B, Jin P, Pan Y. Expression of expanded GGC repeats within NOTCH2NLC causes behavioral deficits and neurodegeneration in a mouse model of neuronal intranuclear inclusion disease. Sci Adv. 2022 Nov 25;8(47):eadd6391. doi: 10.1126/sciadv.add6391.\u003c/li\u003e\n\u003cli\u003eLiufu T, Zheng Y, Yu J, Yuan Y, Wang Z, Deng J, Hong D. The polyG diseases: a new disease entity. Acta Neuropathol Commun. 2022 May 31;10(1):79.\u003c/li\u003e\n\u003cli\u003eLorusso Marina, Pepe Antonietta, Ibris Neluta, Bochicchio Brigida. Molecular and supramolecular studies on polyglycine and poly-l-proline. 
Soft Matter 2011, 7 (13) , 6327.\u003c/li\u003e\n\u003cli\u003eLu H, Luan X, Yuan Y, Dong M, Sun W, Yan C. The clinical and myopathological features of oculopharyngodistal myopathy in a Chinese family. Neuropathology. 2008 Dec;28(6):599-603. doi: 10.1111/j.1440-1789.2008.00924.x.\u003c/li\u003e\n\u003cli\u003eMalik I, Kelley CP, Wang ET, Todd PK. Molecular mechanisms underlying nucleotide repeat expansion disorders. Nat Rev Mol Cell Biol. 2021 Sep;22(9):589-607.\u003c/li\u003e\n\u003cli\u003eMalik I, Tseng YJ, Wright SE, Zheng K, Ramaiyer P, Green KM, Todd PK. SRSF protein kinase 1 modulates RAN translation and suppresses CGG repeat toxicity. EMBO Mol Med. 2021 Nov 8;13(11):e14163. doi: 10.15252/emmm.202114163.\u003c/li\u003e\n\u003cli\u003eMatsui T, Ohbayashi N, Fukuda M. The Rab interacting lysosomal protein (RILP) homology domain functions as a novel effector domain for small GTPase Rab36: Rab36 regulates retrograde melanosome transport in melanocytes. J Biol Chem. 2012 Aug 17;287(34):28619-31.\u003c/li\u003e\n\u003cli\u003eMenzies FM, Fleming A, Caricasole A, Bento CF, Andrews SP, Ashkenazi A, F\u0026uuml;llgrabe J, Jackson A, Jimenez Sanchez M, Karabiyik C, Licitra F, Lopez Ramirez A, Pavel M, Puri C, Renna M, Ricketts T, Schlotawa L, Vicinanza M, Won H, Zhu Y, Skidmore J, Rubinsztein DC. Autophagy and Neurodegeneration: Pathogenic Mechanisms and Therapeutic Opportunities. Neuron. 2017 Mar 8;93(5):1015-1034. doi: 10.1016/j.neuron.2017.01.022\u003c/li\u003e\n\u003cli\u003eMessaed C, Rouleau GA. Molecular mechanisms underlying polyalanine diseases. Neurobiol Dis. 2009 Jun;34(3):397-405. doi: 10.1016/j.nbd.2009.02.013.\u003c/li\u003e\n\u003cli\u003eMori K, Gotoh S, Yamashita T, Uozumi R, Kawabe Y, Tagami S, Kamp F, Nuscher B, Edbauer D, Haass C, Nagai Y, Ikeda M. The porphyrin TMPyP4 inhibits elongation during the noncanonical translation of the FTLD/ALS-associated GGGGCC repeat in the C9orf72 gene. J Biol Chem. 2021 Oct;297(4):101120. 
doi: 10.1016/j.jbc.2021.101120.\u003c/li\u003e\n\u003cli\u003eMudge JM, Ruiz-Orera J, Prensner JR, Brunet MA, Calvet F, Jungreis I, Gonzalez JM, Magrane M, Martinez TF, Schulz JF, Yang YT, Alb\u0026agrave; MM, Aspden JL, Baranov PV, Bazzini AA, Bruford E, Martin MJ, Calviello L, Carvunis AR, Chen J, Couso JP, Deutsch EW, Flicek P, Frankish A, Gerstein M, Hubner N, Ingolia NT, Kellis M, Menschaert G, Moritz RL, Ohler U, Roucou X, Saghatelian A, Weissman JS, van Heesch S. Standardized annotation of translated open reading frames. Nat Biotechnol. 2022 Jul;40(7):994-999.\u003c/li\u003e\n\u003cli\u003eMurayama A, Nagaoka U, Sugaya K, Shimazaki R, Miyamoto K, Matsubara S, Ogasawara M, Iida A, Nishino I, Takahashi K. Sequential development of parkinsonism in two patients with oculopharyngodistal type myopathy in GIPC1-related repeat expansion disorder. Neuromuscul Disord. 2024 Nov;44:104465. doi: 10.1016/j.nmd.2024.104465.\u003c/li\u003e\n\u003cli\u003eNassar LR, Barber GP, Benet-Pag\u0026egrave;s A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee BT, Lee CM, Muthuraman P, Nguy B, Pereira T, Nejad P, Perez G, Raney BJ, Schmelter D, Speir ML, Wick BD, Zweig AS, Haussler D, Kuhn RM, Haeussler M, Kent WJ. The UCSC Genome Browser database: 2023 update. Nucleic Acids Res. 2023 Jan 6;51(D1):D1188-D1195.\u003c/li\u003e\n\u003cli\u003eNurk S, et al. The complete sequence of a human genome. Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987.\u003c/li\u003e\n\u003cli\u003eOfer N, Weisman-Shomer P, Shklover J, Fry M. The quadruplex r(CGG)n destabilizing cationic porphyrin TMPyP4 cooperates with hnRNPs to increase the translation efficiency of fragile X premutation mRNA. Nucleic Acids Res. 2009 May;37(8):2712-22. 
doi: 10.1093/nar/gkp130.\u003c/li\u003e\n\u003cli\u003eOgasawara M, Iida A, Kumutpongpanich T, Ozaki A, Oya Y, Konishi H, Nakamura A, Abe R, Takai H, Hanajima R, Doi H, Tanaka F, Nakamura H, Nonaka I, Wang Z, Hayashi S, Noguchi S, Nishino I. CGG expansion in NOTCH2NLC is associated with oculopharyngodistal myopathy with neurological manifestations. Acta Neuropathol Commun. 2020 Nov 25;8(1):204.\u003c/li\u003e\n\u003cli\u003eOgasawara M, Eura N, Nagaoka U, Sato T, Arahata H, Hayashi T, Okamoto T, Takahashi Y, Mori-Yoshimura M, Oya Y, Nakamura A, Shimazaki R, Sano T, Kumutpongpanich T, Minami N, Hayashi S, Noguchi S, Iida A, Takao M, Nishino I. Intranuclear inclusions in skin biopsies are not limited to neuronal intranuclear inclusion disease but can also be seen in oculopharyngodistal myopathy. Neuropathol Appl Neurobiol. 2022 Apr;48(3):e12787. doi: 10.1111/nan.12787.\u003c/li\u003e\n\u003cli\u003eOgasawara M, Eura N, Iida A, Kumutpongpanich T, Minami N, Nonaka I, Hayashi S, Noguchi S, Nishino I. Intranuclear inclusions in muscle biopsy can differentiate oculopharyngodistal myopathy and oculopharyngeal muscular dystrophy. Acta Neuropathol Commun. 2022 Dec 7;10(1):176. doi: 10.1186/s40478-022-01482-w\u003c/li\u003e\n\u003cli\u003eOyer CE, Cortez S, O\u0026rsquo;Shea P, Popovic M. Cardiomyopathy and myocyte intranuclear inclusions in neuronal intranuclear inclusion disease: a case report. Hum Pathol. 1991;22(7):722\u0026ndash;4. doi: 10.1016/0046-8177(91)90296-2.\u003c/li\u003e\n\u003cli\u003ePalmer JE, Wilson N, Son SM, Obrocki P, Wrobel L, Rob M, Takla M, Korolchuk VI, Rubinsztein DC. Autophagy, aging, and age-related neurodegeneration. Neuron. 2025 Jan 8;113(1):29-48. doi: 10.1016/j.neuron.2024.09.015.\u003c/li\u003e\n\u003cli\u003ePan Y, Xue J, Chen J, Zhang X, Tu T, Xiao Q, Huang W, Liu Q, Zhu L, Li J, Zhou X, Xu Q, Sun Q, Tan J, Yan X, Li J, Guo J, Tang B, Duan R, Liu Z. 
Assessment of GGC Repeat Expansion in GIPC1 in Patients with Parkinson\u0026apos;s Disease. Mov Disord. 2022 Jul;37(7):1557-1559.\u003c/li\u003e\n\u003cli\u003ePan Y, Jiang Y, Wan J, Hu Z, Jiang H, Shen L, Tang B, Tian Y, Liu Q. Expression of expanded GGC repeats within NOTCH2NLC causes cardiac dysfunction in mouse models. Cell Biosci. 2023 Aug 29;13(1):157. doi: 10.1186/s13578-023-01111-6.\u003c/li\u003e\n\u003cli\u003ePaulson HL, Shakkottai VG, Clark HB, Orr HT. Polyglutamine spinocerebellar ataxias - from genes to potential treatments. Nat Rev Neurosci. 2017 Oct;18(10):613-626. doi: 10.1038/nrn.2017.92.\u003c/li\u003e\n\u003cli\u003ePeabody DS. Translation initiation at non-AUG triplets in mammalian cells. J Biol Chem. 1989 Mar 25;264(9):5031-5.\u003c/li\u003e\n\u003cli\u003ePellerin D, Danzi MC, Wilke C, Renaud M, Fazal S, Dicaire MJ, Scriba CK, Ashton C, Yanick C, Beijer D, Rebelo A, Rocca C, Jaunmuktane Z, Sonnen JA, Larivi\u0026egrave;re R, Gen\u0026iacute;s D, Molina Porcel L, Choquet K, Sakalla R, Provost S, Robertson R, Allard-Chamard X, T\u0026eacute;treault M, Reiling SJ, Nagy S, Nishadham V, Purushottam M, Vengalil S, Bardhan M, Nalini A, Chen Z, Mathieu J, Massie R, Chalk CH, Lafontaine AL, Evoy F, Rioux MF, Ragoussis J, Boycott KM, Dub\u0026eacute; MP, Duquette A, Houlden H, Ravenscroft G, Laing NG, Lamont PJ, Saporta MA, Sch\u0026uuml;le R, Sch\u0026ouml;ls L, La Piana R, Synofzik M, Zuchner S, Brais B. Deep Intronic FGF14 GAA Repeat Expansion in Late-Onset Cerebellar Ataxia. N Engl J Med. 2023 Jan 12;388(2):128-141. doi: 10.1056/NEJMoa2207406.\u003c/li\u003e\n\u003cli\u003ePlumley JA, Tsai MI, Dannenberg JJ. Aggregation of capped hexaglycine strands into hydrogen-bonding motifs representative of pleated and rippled \u0026beta;-sheets, collagen, and polyglycine I and II crystal structures. A density functional theory study. J Phys Chem B. 
2011 Feb 17;115(6):1562-70.\u003c/li\u003e\n\u003cli\u003ePongpakdee S, Apiwattanakul M, Termglinchan T, Witoonpanich R, Dejthevaporn C, Lee T, Wansophonkul S, Yamanaka A, Funaguma S, Iida A, Nishino I. CGG/CCG Repeat Expansions in LOC642361/NUTM2B-AS1 in Thai Patients With Oculopharyngodistal Myopathy. Neurol Genet. 2024 Jul 8;10(4):e200170. doi: 10.1212/NXG.0000000000200170.\u003c/li\u003e\n\u003cli\u003eRafehi H, Szmulewicz DJ, Bennett MF, Sobreira NLM, Pope K, Smith KR, Gillies G, Diakumis P, Dolzhenko E, Eberle MA, Barcina MG, Breen DP, Chancellor AM, Cremer PD, Delatycki MB, Fogel BL, Hackett A, Halmagyi GM, Kapetanovic S, Lang A, Mossman S, Mu W, Patrikios P, Perlman SL, Rosemergy I, Storey E, Watson SRD, Wilson MA, Zee DS, Valle D, Amor DJ, Bahlo M, Lockhart PJ. Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS. Am J Hum Genet. 2019 Jul 3;105(1):151-165.\u003c/li\u003e\n\u003cli\u003eRaj A, Wang SH, Shim H, Harpak A, Li YI, Engelmann B, Stephens M, Gilad Y, Pritchard JK. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife. 2016 May 27;5:e13328.\u003c/li\u003e\n\u003cli\u003eSaito R, Shimizu H, Miura T, Hara N, Mezaki N, Higuchi Y, Miyashita A, Kawachi I, Sanpei K, Honma Y, Onodera O, Ikeuchi T, Kakita A. Oculopharyngodistal myopathy with coexisting histology of systemic neuronal intranuclear inclusion disease: Clinicopathologic features of an autopsied patient harboring CGG repeat expansions in LRP12. Acta Neuropathol Commun. 2020 Jun 3;8(1):75.\u003c/li\u003e\n\u003cli\u003eSatoyoshi E, Kinoshita M. Oculopharyngodistal myopathy. Arch Neurol. 1977 Feb;34(2):89-92. 
doi: 10.1001/archneur.1977.00500140043007.\u003c/li\u003e\n\u003cli\u003eSellier C, Buijsen RAM, He F, Natla S, Jung L, Tropel P, Gaucherot A, Jacobs H, Meziane H, Vincent A, Champy MF, Sorg T, Pavlovic G, Wattenhofer-Donze M, Birling MC, Oulad-Abdelghani M, Eberling P, Ruffenach F, Joint M, Anheim M, Martinez-Cerdeno V, Tassone F, Willemsen R, Hukema RK, Viville S, Martinat C, Todd PK, Charlet-Berguerand N. Translation of Expanded CGG Repeats into FMRpolyG Is Pathogenic and May Contribute to Fragile X Tremor Ataxia Syndrome. Neuron. 2017 Jan 18;93(2):331-347.\u003c/li\u003e\n\u003cli\u003eShi Y, Cao C, Zeng Y, Ding Y, Chen L, Zheng F, Chen X, Zhou F, Yang X, Li J, Xu L, Xu G, Lin M, Ishiura H, Tsuji S, Wang N, Wang Z, Chen WJ, Yang K. CGG repeat expansion in LOC642361/NUTM2B-AS1 typically presents as oculopharyngodistal myopathy. J Genet Genomics. 2024 Feb;51(2):184-196.\u003c/li\u003e\n\u003cli\u003eShi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun. 2023 Apr 12;14(1):2092. doi: 10.1038/s41467-023-37690-8.\u003c/li\u003e\n\u003cli\u003eSone J, Mori K, Inagaki T, Katsumata R, Takagi S, Yokoi S, Araki K, Kato T, Nakamura T, Koike H, Takashima H, Hashiguchi A, Kohno Y, Kurashige T, Kuriyama M, Takiyama Y, Tsuchiya M, Kitagawa N, Kawamoto M, Yoshimura H, Suto Y, Nakayasu H, Uehara N, Sugiyama H, Takahashi M, Kokubun N, Konno T, Katsuno M, Tanaka F, Iwasaki Y, Yoshida M, Sobue G. Clinicopathological features of adult-onset neuronal intranuclear inclusion disease. Brain. 
2016 Dec;139(Pt 12):3170-3186.\u003c/li\u003e\n\u003cli\u003eSone J, Mitsuhashi S, Fujita A, Mizuguchi T, Hamanaka K, Mori K, Koike H, Hashiguchi A, Takashima H, Sugiyama H, Kohno Y, Takiyama Y, Maeda K, Doi H, Koyano S, Takeuchi H, Kawamoto M, Kohara N, Ando T, Ieda T, Kita Y, Kokubun N, Tsuboi Y, Katoh K, Kino Y, Katsuno M, Iwasaki Y, Yoshida M, Tanaka F, Suzuki IK, Frith MC, Matsumoto N, Sobue G. Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease. Nat Genet. 2019 Aug;51(8):1215-1221. \u003c/li\u003e\n\u003cli\u003eSteger M, Diez F, Dhekne HS, Lis P, Nirujogi RS, Karayel O, Tonelli F, Martinez TN, Lorentzen E, Pfeffer SR, Alessi DR, Mann M. Systematic proteomic analysis of LRRK2-mediated Rab GTPase phosphorylation establishes a connection to ciliogenesis. Elife. 2017 Nov 10;6:e31012.\u003c/li\u003e\n\u003cli\u003eStringer C, Wang T, Michaelos M, Pachitariu M. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods. 2021 Jan;18(1):100-106. doi: 10.1038/s41592-020-01018-x.\u003c/li\u003e\n\u003cli\u003eStoyas CA, La Spada AR. The CAG-polyglutamine repeat diseases: a clinical, molecular, genetic, and pathophysiologic nosology. Handb Clin Neurol. 2018;147:143-170. doi: 10.1016/B978-0-444-63233-3.00011-7.\u003c/li\u003e\n\u003cli\u003eTabebordbar M, Lagerborg KA, Stanton A, King EM, Ye S, Tellez L, Krunnfusz A, Tavakoli S, Widrick JJ, Messemer KA, Troiano EC, Moghadaszadeh B, Peacker BL, Leacock KA, Horwitz N, Beggs AH, Wagers AJ, Sabeti PC. Directed evolution of a family of AAV capsid variants enabling potent muscle-directed gene delivery across species. Cell. 2021 Sep 16;184(19):4919-4938.e22. doi: 10.1016/j.cell.2021.08.028.\u003c/li\u003e\n\u003cli\u003eTakahashi-Fujigasaki J, Nakano Y, Uchino A, Murayama S. Adult-onset neuronal intranuclear hyaline inclusion disease is not rare in older adults. Geriatr Gerontol Int. 
2016 Mar;16 Suppl 1:51-6.\u003c/li\u003e\n\u003cli\u003eTang H, Xiong Y, Jiang K, Shen Y, Yu Y, Huang P, Zhu M, Li X, Zheng Y, Zhou M, Yu J, Deng J, Wang Z, Hong D, Qiu Y, Tan D. Clinical and pathological characteristics of OPDM4 patients in advanced disease. Muscle Nerve. 2024 Jul 23. doi: 10.1002/mus.28200. Online ahead of print.\u003c/li\u003e\n\u003cli\u003eTian Y, Wang JL, Huang W, Zeng S, Jiao B, Liu Z, Chen Z, Li Y, Wang Y, Min HX, Wang XJ, You Y, Zhang RX, Chen XY, Yi F, Zhou YF, Long HY, Zhou CJ, Hou X, Wang JP, Xie B, Liang F, Yang ZY, Sun QY, Allen EG, Shafik AM, Kong HE, Guo JF, Yan XX, Hu ZM, Xia K, Jiang H, Xu HW, Duan RH, Jin P, Tang BS, Shen L. Expansion of Human-Specific GGC Repeat in Neuronal Intranuclear Inclusion Disease-Related Disorders. Am J Hum Genet. 2019 Jul 3;105(1):166-176.\u003c/li\u003e\n\u003cli\u003eTodd PK, Oh SY, Krans A, He F, Sellier C, Frazer M, Renoux AJ, Chen KC, Scaglione KM, Basrur V, Elenitoba-Johnson K, Vonsattel JP, Louis ED, Sutton MA, Taylor JP, Mills RE, Charlet-Berguerand N, Paulson HL. CGG repeat-associated translation mediates neurodegeneration in fragile X tremor ataxia syndrome. Neuron. 2013 May 8;78(3):440-55.\u003c/li\u003e\n\u003cli\u003eUyama E, Uchino M, Chateau D, Tom\u0026eacute; FM. Autosomal recessive oculopharyngodistal myopathy in light of distal myopathy with rimmed vacuoles and oculopharyngeal muscular dystrophy. Neuromuscul Disord. 1998 Apr;8(2):119-25. doi: 10.1016/s0960-8966(98)00002-9\u003c/li\u003e\n\u003cli\u003evan der Sluijs BM, ter Laak HJ, Scheffer H, van der Maarel SM, van Engelen BG. Autosomal recessive oculopharyngodistal myopathy: a distinct phenotypical, histological, and genetic entity. J Neurol Neurosurg Psychiatry. 2004 Oct;75(10):1499-501. 
doi: 10.1136/jnnp.2003.025072.\u003c/li\u003e\n\u003cli\u003eVegezzi E, Ishiura H, Bragg DC, Pellerin D, Magrinelli F, Curr\u0026ograve; R, Facchini S, Tucci A, Hardy J, Sharma N, Danzi MC, Zuchner S, Brais B, Reilly MM, Tsuji S, Houlden H, Cortese A. Neurological disorders caused by novel non-coding repeat expansions: clinical features and differential diagnosis. Lancet Neurol. 2024 Jul;23(7):725-739. doi: 10.1016/S1474-4422(24)00167-4.\u003c/li\u003e\n\u003cli\u003eVenter JC, et al. The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304-51. doi: 10.1126/science.1058040.\u003c/li\u003e\n\u003cli\u003eVerbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol. 2023 Feb;36(2):321-336. doi: 10.1111/jeb.14106.\u003c/li\u003e\n\u003cli\u003eWallenius J, Kafantari E, Jhaveri E, Gorcenco S, Ameur A, Karremo C, Dobloug S, Karrman K, de Koning T, Ilinca A, Landqvist Wald\u0026ouml; M, Arvidsson A, Persson S, Englund E, Ehrencrona H, Puschmann A. Exonic trinucleotide repeat expansions in ZFHX3 cause spinocerebellar ataxia type 4: A poly-glycine disease. Am J Hum Genet. 2023 Nov 28:S0002-9297(23)00403-2.\u003c/li\u003e\n\u003cli\u003eWright BW, Yi Z, Weissman JS, Chen J. The dark proteome: translation from noncanonical open reading frames. Trends Cell Biol. 2022 Mar;32(3):243-258.\u003c/li\u003e\n\u003cli\u003eWright SE, Todd PK. Native functions of short tandem repeats. Elife. 2023 Mar 20;12:e84043. doi: 10.7554/eLife.84043.\u003c/li\u003e\n\u003cli\u003eXi J, Wang X, Yue D, Dou T, Wu Q, Lu J, Liu Y, Yu W, Qiao K, Lin J, Luo S, Li J, Du A, Dong J, Chen Y, Luo L, Yang J, Niu Z, Liang Z, Zhao C, Lu J, Zhu W, Zhou Y. 5\u0026apos; UTR CGG repeat expansion in GIPC1 is associated with oculopharyngodistal myopathy. Brain. 
2021 Mar 3;144(2):601-614.\u003c/li\u003e\n\u003cli\u003eYang X, Zhang D, Shen S, Li P, Li M, Niu J, Ma D, Xu D, Li S, Guo X, Wang Z, Zhao Y, Ren H, Ling C, Wang Y, Fan Y, Shen J, Zhu Y, Wang D, Cui L, Chen L, Shi C, Dai Y. A large pedigree study confirmed the CGG repeat expansion of RILPL1 Is associated with oculopharyngodistal myopathy. BMC Med Genomics. 2023 Oct 20;16(1):253. doi: 10.1186/s12920-023-01586-9.\u003c/li\u003e\n\u003cli\u003eYeetong P, Pongpanich M, Srichomthong C, Assawapitaksakul A, Shotelersuk V, Tantirukdham N, Chunharas C, Suphapeetiporn K, Shotelersuk V. TTTCA repeat insertions in an intron of YEATS2 in benign adult familial myoclonic epilepsy type 4. Brain. 2019 Nov 1;142(11):3360-3366. doi: 10.1093/brain/awz267.\u003c/li\u003e\n\u003cli\u003eYu J, Deng J, Guo X, Shan J, Luan X, Cao L, Zhao J, Yu M, Zhang W, Lv H, Xie Z, Meng L, Zheng Y, Zhao Y, Gang Q, Wang Q, Liu J, Zhu M, Zhou B, Li P, Liu Y, Wang Y, Yan C, Hong D, Yuan Y, Wang Z. The GGC repeat expansion in NOTCH2NLC is associated with oculopharyngodistal myopathy type 3. Brain. 2021 Mar 9:awab077. doi: 10.1093/brain/awab077. \u003c/li\u003e\n\u003cli\u003eYu J, Liufu T, Zheng Y, Xu J, Meng L, Zhang W, Yuan Y, Hong D, Charlet-Berguerand N, Wang Z, Deng J. CGG repeat expansion in NOTCH2NLC causes mitochondrial dysfunction and progressive neurodegeneration in Drosophila model. Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2208649119. doi: 10.1073/pnas.2208649119.\u003c/li\u003e\n\u003cli\u003eYu J, Shan J, Yu M, Di L, Xie Z, Zhang W, Lv H, Meng L, Zheng Y, Zhao Y, Gang Q, Guo X, Wang Y, Xi J, Zhu W, Da Y, Hong D, Yuan Y, Yan C, Wang Z, Deng J. The CGG repeat expansion in RILPL1 is associated with oculopharyngodistal myopathy type 4. Am J Hum Genet. 2022 Mar 3;109(3):533-541. doi: 10.1016/j.ajhg.2022.01.012.\u003c/li\u003e\n\u003cli\u003eZamiri B, Reddy K, Macgregor RB Jr, Pearson CE. 
TMPyP4 porphyrin distorts RNA G-quadruplex structures of the disease-associated r(GGGGCC)n repeat of the C9orf72 gene and blocks interaction of RNA-binding proteins. J Biol Chem. 2014 Feb 21;289(8):4653-9. doi: 10.1074/jbc.C113.502336.\u003c/li\u003e\n\u003cli\u003eZeng YH, Yang K, Du GQ, Chen YK, Cao CY, Qiu YS, He J, Lv HD, Qu QQ, Chen JN, Xu GR, Chen L, Zheng FZ, Zhao M, Lin MT, Chen WJ, Hu J, Wang ZQ, Wang N. GGC Repeat Expansion of RILPL1 is Associated with Oculopharyngodistal Myopathy. Ann Neurol. 2022 Sep;92(3):512-526. doi: 10.1002/ana.26436. Epub 2022 Jul 2. PMID: 35700120\u003c/li\u003e\n\u003cli\u003eZhao J, Liu J, Xiao J, Du J, Que C, Shi X, Liang W, Sun W, Zhang W, Lv H, Yuan Y, Wang Z. Clinical and muscle imaging findings in 14 mainland chinese patients with oculopharyngodistal myopathy. PLoS One. 2015 Jun 3;10(6):e0128629.\u003c/li\u003e\n\u003cli\u003eZhong S, Lian Y, Luo W, Luo R, Wu X, Ji J, Ji Y, Ding J, Wang X. Upstream open reading frame with NOTCH2NLC GGC expansion generates polyglycine aggregates and disrupts nucleocytoplasmic transport: implications for polyglycine diseases. Acta Neuropathol. 2021 Dec;142(6):1003-1023. doi: 10.1007/s00401-021-02375-3.\u003c/li\u003e\n\u003cli\u003eZiaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E, Qiu Y, Kakembo FE, Joseph H, Onyido B, Adeyemi J, Bakhtiari M, Park J, Javadzadeh S, Jjingo D, Adebiyi E, Bafna V, Gymrek M. A deep population reference panel of tandem repeat variation. Nat Commun. 2023 Oct 23;14(1):6711. doi: 10.1038/s41467-023-42278-3.\u003c/li\u003e\n\u003cli\u003eZu, T., Gibbens, B., Doty, N. S., Gomes-Pereira, M., Huguet, A., Stone, M. D., Margolis, J., Peterson, M., Markowski, T. W., Ingram, M. A., et al. (2011). Non-ATG-initiated translation directed by microsatellite expansions. 
Proc Natl Acad Sci U S A 108(1):260-5.\u003c/li\u003e\n\u003c/ol\u003e"}],"keywords":"Trinucleotide repeat disorder, non-coding sequences, non-canonical translation, genetic diseases, muscle disorders, neurodegeneration.","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"postedDate":"April 17th, 2025","status":"published-in-journal","versionOfRecord":{"articleIdentity":"rs-6122917","link":"https://doi.org/10.1038/s41588-026-02507-z","journal":{"identity":"nature-genetics","isVorOnly":false,"title":"Nature Genetics"},"publishedOn":"2026-02-17 05:00:00","publishedOnDateReadable":"February 17th, 2026"},"vorDoi":"10.1038/s41588-026-02507-z"}}}}
