Optimizing Single-Cell Long-Read Sequencing for Enhanced Isoform Detection in Pancreatic Islets

doi:10.1101/2025.04.30.651101

2025 · doi:10.1101/2025.04.30.651101

preprint OA: closed CC-BY-4.0

Keywords

single cell long read RNA-sequencing, transcriptomics, islet biology, RNA isoforms, RNA splicing

Word count: 2,614 · Figures: 3 · Tables: 0

bioRxiv preprint, doi: https://doi.org/10.1101/2025.04.30.651101; this version posted May 7, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Abstract

Alternative splicing is an essential mechanism for generating protein diversity by producing distinct isoforms from a single gene. Dysregulation of splicing that affects pancreatic function and immune tolerance has been linked to both type 1 and type 2 diabetes. Next-generation sequencing technologies, with their short read lengths, are limited in their ability to accurately detect splice variants. Long-read sequencing technologies offer the potential to overcome these limitations by providing full-length transcript information; however, their application in single-cell RNA sequencing has been hindered by technical challenges, including insufficient read lengths and higher error rates. Furthermore, cell types that produce high levels of a single transcript, such as islet endocrine cells, can obscure identification of lower-abundance transcripts. In this study, we optimized a protocol for single-cell long-read sequencing in pancreatic islets to improve read length and transcript detection. Our findings demonstrate that 5’ library preparation protocols outperform 3’ protocols, resulting in better transcript identification. Furthermore, we show that targeted depletion of insulin transcripts enhances the detection of informative reads, highlighting the utility of transcript depletion strategies. This optimized protocol enables isoform-specific gene expression analysis and reveals differential transcript usage across the various cell types in pancreatic islets. By leveraging this approach, we gain deeper insights into the transcriptomic complexity and cellular heterogeneity within pancreatic islets.

Article Highlights

• This study addresses the limitations of current single-cell long-read RNA-sequencing (sclrRNA-seq) technologies in detecting full-length transcripts and isoform diversity, particularly in pancreatic islets.
• We sought to determine whether optimizing single-cell library preparation protocols could enhance read length and transcript identification in pancreatic islets.
• We found that 5’ capture methods, combined with targeted insulin depletion and extended reverse transcription, significantly improved read length and isoform detection compared to standard protocols, while maximizing the number of informative reads.
• These improvements yield longer reads in single-cell experiments, substantially enhancing transcript identification and enabling more accurate analysis of isoform diversity.

Introduction

Alternative splicing (AS) plays a critical role in generating protein diversity from the ~22,000 known protein-coding genes, leading to the production of over 140,000 distinct transcripts [1]. This process generates proteins with different amino acid sequences, which affects their function and localization within the cell and allows cells to respond readily to changes in the environment [2, 3]. Splicing dysregulation is a key factor in many diseases, including diabetes, arising either from inherent mutations in splice sites or RNA-binding proteins (RBPs), or in response to changes in environmental conditions, such as inflammatory stress or hyperglycemia [4, 5]. In the context of type 1 diabetes (T1D), diversity in isoform expression has significant implications for pancreatic function and immune tolerance. For example, differential isoform expression of the autoantigens IA-2 and G6pc2 between the pancreas and thymus has been proposed to contribute to the generation of autoreactive T cells in T1D [6, 7]. Furthermore, dysregulated splicing events have been observed in islets from individuals with type 2 diabetes (T2D), underscoring the importance of splicing regulation in maintaining proper cellular function and immune homeostasis [5]. As one specific example, SNAP-25, a component of the SNARE complex responsible for vesicle fusion and exocytosis, exists as two isoforms (SNAP-25a and SNAP-25b). In SNAP-25b-deficient mice, [Ca²⁺] elevations are prematurely activated and delayed in termination, and insulin secretion is increased [8].
Despite the critical need to detect splice variants in the context of diabetes, next-generation sequencing (NGS) technologies remain insufficient for this task. Identifying isoform-specific gene expression requires sequencing reads that span multiple exons of the mRNA transcript. In the human genome, transcript lengths are estimated to average between 1,800 and 4,900 bp, with the mode of the distribution around 2,000 bp [9]. NGS technologies typically have read lengths of about 150 base pairs, so a single read rarely covers more than one or two exon junctions, making it difficult to identify isoforms. In contrast, long-read sequencing technologies, such as PacBio and Oxford Nanopore Technologies, can generate full-length reads that capture the entire RNA molecule, thereby providing a clearer picture of isoform diversity. However, published single-cell long-read RNA sequencing (sclrRNA-seq) libraries often report shorter read lengths than expected, which may limit transcript coverage and isoform detection. For instance, a recent study reported a median read length of 900 bp for sclrRNA-seq of two cancer cell lines [10].
Advances in sequencing technologies, especially single-cell approaches, have revealed the complex heterogeneity within the pancreas, uncovering distinct functional and transcriptomic subpopulations across different cell types. In pancreatic islets, single-cell genomics and patch-seq have identified transcriptionally and functionally distinct beta-cell subpopulations, directly linking gene expression to key physiological processes such as vesicle exocytosis [11]. This heterogeneity underscores the importance of characterizing splicing events and their resulting isoforms at the single-cell level.
However, sclrRNA-seq technologies come with inherent limitations. Nanopore flow cells produce fewer reads than Illumina, with around 20,000 reads per cell for a 5,000-cell experiment, well below the 30,000-50,000 reads per cell common in NGS. Moreover, Nanopore’s higher error rate (1%) increases the likelihood of incorrect barcode and UMI assignments. To overcome these challenges, we have optimized a protocol for pancreatic islets that improves read length, advancing the utility of long-read sequencing for single-cell transcriptomics.

Research Design and Methods

Dissociation of pancreatic islets
Ten-week-old female C57BL/6 mice were obtained from The Jackson Laboratory. Pancreatic islets were isolated from mice under ketamine/xylazine/acepromazine anesthesia by collagenase delivery into the pancreas via injection into the bile duct. The collagenase-inflated pancreas was surgically removed and digested. After isolation, islets were dissociated using Accutase in a 37°C bead bath for 25-30 minutes. The single-cell suspension was filtered through a 40 µm filter and quenched in RPMI media + 10% FBS. Cells were washed again with RPMI + 10% FBS and then with PBS + 0.1% BSA. Single-cell suspensions were loaded into a 10x Genomics Chromium targeting 4,000 cells per sample.

Dissociation of spleens
Spleens were isolated from mice under ketamine/xylazine/acepromazine anesthesia. Spleens were dissociated through a 70 µm strainer in cIMDM using a 3 mL syringe plunger. Cells were washed, centrifuged, and treated with 1 mL Ammonium-Chloride-Potassium (ACK) lysis buffer for 30 seconds, followed by dilution in cIMDM and a second spin.
After one additional cIMDM wash, cells were resuspended in PBS + 0.1% BSA. Single-cell suspensions were loaded into a 10x Genomics Chromium targeting 4,000 cells per sample.

scRNA-seq library preparation and insulin depletion
Single-cell libraries were prepared using either the Chromium Next GEM Single Cell 3ʹ Kit v3.1 or the Chromium Next GEM Single Cell 5' Kit v2, following the protocol up to and including step 2.4 and stopping just before fragmentation. For the optimized libraries, the following modifications were applied to the 5’ library preparation: 1 µL of 10 mM dNTP solution (Thermo Scientific FERR0191) was added to the reaction in step 1.1, and the extension time in step 1.5 was increased from 45 minutes to 2 hours. In step 2.2, 1 µL of 10 mM dNTP solution was added to the reaction and the extension time was increased from 1 minute to 3 minutes.
Insulin depletion was performed on cDNA from step 2.4 of the 10X Genomics Chromium library preparation using the DepleteX™ RNA Depletion Panel (Insulin) kit from Jumpcode Genomics. We followed the PacBio MAS-IsoSeq protocol (December 2022, Version 1.0) with the following modifications: during RNP Complex Formation (Step A), we used 0.9 µL Cas9 instead of 2.3 µL, and 1.6 µL Insulin Guide RNA instead of 4.0 µL Single Cell Boost Guide RNA. During Bead Cleanup (Step D), we used 50 µL (1X) AMPure XP Beads instead of 75 µL 1.5X SMRTbell Cleanup Beads.
Following insulin depletion, long-read libraries were prepared from the cDNA using the Ligation Sequencing Kit V14 (Oxford Nanopore SQK-LSK114) and the PCR Expansion (Oxford Nanopore EXP-PCA001).
For 3’ libraries, the Ligation sequencing V14 - Single-cell transcriptomics with 3' cDNA prepared using 10X Genomics on PromethION (SQK-LSK114) protocol was used. For 5’ libraries, the Ligation sequencing V14 - Single-cell transcriptomics with 5' cDNA prepared using 10X Genomics on PromethION (SQK-LSK114) protocol was used. Short Fragment Buffer (SFB) was used for library preparation instead of Long Fragment Buffer (LFB). Library Beads (LIB) were used for the flow cell priming mix instead of Library Solution (LIS). Libraries were sequenced on R10.4.1 flow cells on either a PromethION 2 Solo (P2S) or PromethION 2 Integrated (P2i).

Data availability
All data will be made available on GEO at time of publication.
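
The read-length comparisons reported in this study rest on summary statistics of the read-length distribution (mean and mode). The following is a minimal Python sketch of how such statistics could be computed; it is not the authors' pipeline, which the text does not describe. It assumes read lengths have already been extracted from a FASTQ or BAM file, and the function name `read_length_summary`, the 50 bp bin width used to estimate the mode, and the toy lengths are all hypothetical.

```python
from collections import Counter
from statistics import mean, median


def read_length_summary(lengths, bin_width=50):
    """Return the mean, median, and binned mode of read lengths (in bp).

    The mode is estimated as the midpoint of the most populated
    fixed-width bin; the bin width is an illustrative assumption.
    """
    if not lengths:
        raise ValueError("no read lengths supplied")
    # Count reads per bin; ties are broken in favour of the shorter bin.
    bins = Counter(length // bin_width for length in lengths)
    top_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "mode_bin_mid": top_bin * bin_width + bin_width / 2,
    }


# Hypothetical toy read lengths (bp), not data from the study.
toy_lengths = [520, 560, 590, 610, 580, 900, 1500, 2100]
summary = read_length_summary(toy_lengths)
print(summary)
```

In this toy example a few long reads pull the mean well above the mode, which is why the mean and the mode of a read-length distribution are worth reporting separately.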

Results

Evaluation of read lengths and isoform detection in published single cell long read datasets
We aimed to identify isoform differences between islet cell types and subtypes using single cell RNA-sequencing. To evaluate the ability of single cell sequencing technologies to generate full-length reads, we reanalyzed previously published sclrRNA-seq datasets generated using 10x Genomics and Oxford Nanopore Technologies, focusing on their ability to capture full-length transcripts and detect isoform-specific transcript expression. Our analysis included seven sclrRNA-seq libraries from five different studies [10, 12-15]. The reanalysis revealed an average read length of 794 bp and an average mode of 582 bp, compared to the expected mode of ~2,000 bp for the transcript length distribution in the human genome [9] (Figure 1A). This discrepancy between the average read length and the expected transcript length underscores the ongoing challenge of capturing full-length transcripts. The shortfall in read length matters because it limits transcript detection: whereas a gene can be assigned to 60-75% of total reads, a transcript can be assigned to only 30-60% of total reads (Figure 1B). These findings highlight the limitations of current sclrRNA-seq technologies in achieving comprehensive transcript-level resolution.

Efficient and specific depletion of insulin from islet sequencing libraries generates enhanced read diversity
Analyzing transcript expression requires a higher overall read depth than gene expression analysis, as each gene is associated with multiple transcripts.
Initial analysis of our sclrRNA-seq libraries of mouse pancreatic islets led to the discovery that the two mouse insulin genes, Ins1 and Ins2, made up 25% of the total reads, impeding our ability to achieve optimal read depth (Figure 1C). To overcome this issue, we incorporated an insulin depletion step into the protocol and validated the specificity and efficiency of insulin depletion in a bulk short-read RNA-sequencing library of mouse pancreatic islets. The depletion was remarkably efficient and highly specific: insulin transcripts were uniquely depleted, while all other genes remained completely unaffected (Figure 1D). The same insulin depletion was then applied to a single cell pancreatic islet library followed by long-read Nanopore sequencing. Importantly, the insulin depletion was as efficient as in the bulk sample (Figure 1C). This strategy was applied to all subsequent pancreatic islet libraries generated for this study.

Protocol modifications enhance read length and transcript identification in islet single cell long read libraries
Most high-throughput sclrRNA-seq methods rely on 10x Genomics single-cell capture and library preparation, which was originally optimized to generate and amplify shorter sequences, raising the question of whether it can effectively amplify full-length transcripts. 10x Genomics offers two types of transcriptomic profiling for single-cell RNA-seq: one that captures the 3’ end of transcripts and another that captures the 5’ end. Studies have shown that 3' RNA libraries frequently contain internal priming artifacts [16] that would prevent the amplification of full-length reads.
To test for internal priming in 3’ vs 5’ libraries, we downloaded libraries generated using each technology in human melanoma samples from the datasets created by 10x Genomics and analyzed the genomic coverage [17]. 3’ libraries exhibited a notably higher degree of internal priming compared to 5’ libraries, as evidenced by an increased number of reads mapping to the central regions of transcripts in genomic coverage plots (Figure 1E). Because reads generated through internal priming cannot span the full length of a transcript, this phenomenon likely contributes to the shorter read lengths observed in these libraries. To test this, sclrRNA-seq libraries were prepared from mouse pancreatic islets in parallel using both 3’ and 5’ capture technologies (Figure 1F). Remarkably, the library prepared with 5’ technology generated longer reads than the 3’ library (p < 2 × 10⁻¹⁶) (Figure 1G-H) and provided substantially improved transcript identification (Figure 1B). To further improve the read length, several additional optimization steps were introduced into the islet 5’ library preparation protocol (Chromium Next GEM Single Cell 5' Reagent Kits v2): the extension time was increased from 45 minutes to 2 hours during GEM-RT Incubation and from 1 to 3 minutes during cDNA amplification, based on the approach outlined by Lebrigand et al. [12], and the amount of dNTPs was increased. Remarkably, these modifications resulted in longer reads than those from the 5’ library without modifications (p < 2 × 10⁻¹⁶) (Figure 1I), and better transcript identification than any of the published datasets (Figure 1B).
Overall, this emphasizes the preference for 5’ capture over 3’ capture and highlights the necessity for library prep optimizations to enhance the amplification of full-length reads.
Isolating high-quality RNA from pancreatic islets is notoriously difficult, primarily due to the presence of digestive enzymes, including RNases, that are secreted by the exocrine pancreas. To explore whether a different cell type might yield still longer reads, we applied 3’, 5’, and 5’ optimized library preparations, as described above, to lymphocytes isolated from dissociated mouse spleens. The 5’ lymphocyte sample demonstrated better transcript identification compared to the 3’ pancreatic islet sample (Supplemental figure 1). However, the 10x Genomics Chromium library prep modifications for the 5’ sample did not yield the same improvements in the lymphocyte sample as observed with the pancreatic islet sample (Supplemental figure 1). This suggests that the benefits of these optimizations might be tissue-specific and highlights the need for further refinements tailored to different tissue types.

Isoform variants identified between alpha and beta cells and within beta cell subpopulations
With the improved library preparation, the optimized 5’ sclrRNA-seq dataset from mouse pancreatic islets was used to explore whether splicing changes could be detected between different cell types and cell states. Importantly, the sclrRNA-seq dataset allowed clear identification of all expected cell populations (Figure 2A-B).
Furthermore, the analysis revealed that cell type identification remains robust whether using gene-level or transcript-level expression data for dimensionality reduction and clustering, with over 90% concordance between the two approaches (Figure 2D-G). This stability in broad cell type classification aligns with the understanding that major cell types are defined by distinct gene expression patterns. However, when examining substructure within these cell types, substantial differences emerged between gene-level and transcript-level analyses, with consistency ranging from 12% to 92% across subclusters (Figure 2I, Supplemental figure 2). These findings suggest that while gene-level expression is sufficient for identifying major cell types, transcript-level analysis provides crucial insights into subtle variations within cell populations. Such variations may reflect different cell states, functions, or responses that are not captured by gene-level analysis alone.
The primary strength of sclrRNA-seq lies in its ability to capture cell-specific isoform expression. To assess differential splicing, differential transcript usage (DTU) analysis was conducted alongside differential gene expression (DGE) analysis. DTU analysis [18] identifies proportional differences in the transcript composition of a gene, comparing how much each transcript contributes to the total gene expression across conditions. Using this analysis, 342 DTU events were identified between alpha and beta cells, and 57 DTU events across subpopulations of beta cells (Supplemental table).
Specifically, when comparing alpha and beta cells, we identified isoform-specific differences in Atp5a1, a gene involved in ATP production and in insulin and glucagon secretion (Figure 3A). Similarly, G6pc2, a known autoantigen in T1D, displays distinct isoform expression between two beta cell subpopulations (0_beta and 3_beta), despite similar overall gene expression levels (Figure 3B). Interestingly, alternative splicing of G6pc2 has been shown to drive differential expression of G6PC2 transcripts between the pancreas and thymus, highlighting its potential as a critical target for isoform-specific studies [7]. Neither of these genes was identified by DGE, underscoring the importance of long-read sequencing for identifying previously unidentified RNA differences between cell types and cell states.
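
To make the distinction between DGE and DTU concrete, the following minimal Python sketch computes per-transcript usage proportions for one gene in two cell groups. It only illustrates the quantity that a DTU analysis compares; it is not DTUrtle [18] and performs no statistical testing. The function names `transcript_usage` and `usage_shift`, the transcript identifiers, and the counts are hypothetical.

```python
def transcript_usage(counts):
    """Convert {transcript_id: count} for one gene into usage proportions."""
    total = sum(counts.values())
    if total == 0:
        # A gene with no reads has no defined usage; report zeros.
        return {tx: 0.0 for tx in counts}
    return {tx: c / total for tx, c in counts.items()}


def usage_shift(group_a, group_b):
    """Per-transcript change in usage proportion (group B minus group A)."""
    usage_a = transcript_usage(group_a)
    usage_b = transcript_usage(group_b)
    transcripts = sorted(set(usage_a) | set(usage_b))
    return {tx: usage_b.get(tx, 0.0) - usage_a.get(tx, 0.0) for tx in transcripts}


# Hypothetical counts for one gene with two isoforms in two cell groups.
group_a = {"tx1": 75, "tx2": 25}
group_b = {"tx1": 25, "tx2": 75}

gene_total_a = sum(group_a.values())
gene_total_b = sum(group_b.values())
shift = usage_shift(group_a, group_b)

print(gene_total_a, gene_total_b)  # identical gene-level totals
print(shift)                       # opposite-signed usage shifts
```

Here both groups have the same gene-level total, so a gene-level comparison sees no difference, while the isoform proportions are reversed. This is the kind of pattern described above for genes that show DTU but are not detected by DGE.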

Discussion

This study demonstrates how an improved sclrRNA-seq library preparation protocol for isolated islets produces longer reads and increases the proportion of reads that can be confidently assigned to specific transcripts, improving the utility of long-read sequencing data for identifying splice variants and cellular heterogeneity in pancreatic endocrine cell populations. Specifically, this study demonstrates that islet sclrRNA-seq libraries prepared with 5’ protocols outperform those prepared with 3’ protocols for long-read sequencing. Enhancements to the 5’ library preparation further improve read length and transcript tagging efficiency in pancreatic islets. Furthermore, depleting insulin transcripts from the pancreatic islet libraries proved to be a highly effective strategy for maximizing informative reads, demonstrating the broader potential of targeted transcript depletion in single-cell RNA-sequencing experiments.
While the modified 5' protocol significantly improved read length in islet samples, lymphocyte samples showed significant improvement only with the unmodified 5' protocol, with no additional benefit from the modifications. This indicates that individual cell types will require unique modifications and further optimizations. Despite the significant improvements in read length achieved with the modified protocol, it did not meet expectations for full-length transcript coverage. Achieving this goal will require further modifications to the 10x chemistry, including adjustments to the master mix and reverse transcriptase.
Although full-length coverage was not achieved for all transcripts, we successfully analyzed transcript expression and identified differential transcript usage across cell types and cell subpopulations. These advancements are critical for uncovering the full complexity of transcriptomes and hold immense potential for broad application across tissues, enabling deeper insights into cellular heterogeneity, isoform regulation, and functional diversity. Furthermore, investigating these variations at the single-cell level enables us to uncover the intricate heterogeneity within tissues, offering a deeper understanding of the functional and transcriptional diversity that would otherwise go unnoticed. Understanding splicing dysregulation in pancreatic islets is particularly important, as it may reveal how alternative splicing shapes beta cell function, immune tolerance, and beta cell susceptibility in diabetes.

Acknowledgments
We thank Laura White, PhD, and Jay Hesselberth, PhD, for their guidance and support with Nanopore long-read sequencing technologies. We also thank Scott Beard, BDC Cytometer Core Manager, for islet and spleen isolations.
Funding. This work was supported by grants from the National Institutes of Health (P30 DK116073 [Lori Sussel], R01 DK082590 [Lori Sussel], and U01 DK127505 [Lori Sussel]).
Duality of interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. M.S.H. was responsible for data acquisition and prepared the original manuscript. K.L.W. and L.S. reviewed and edited the manuscript. C.J.H. and K.L.W. developed the computational pipelines. M.S.H., C.J.H., and K.L.W. contributed to data analysis and the graphical presentation of results. All authors contributed to the study’s methodology and conceptualization. K.L.W. is the guarantor of this work and had full access to all data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

References

1. Gonzalez-Porta, M., et al., Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol, 2013. 14(7): p. R70.
2. Black, D.L., Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem, 2003. 72: p. 291-336.
3. Piazzi, M., et al., Alternative Splicing, RNA Editing, and the Current Limits of Next Generation Sequencing. Genes (Basel), 2023. 14(7).
4. Juan-Mateu, J., O. Villate, and D.L. Eizirik, MECHANISMS IN ENDOCRINOLOGY: Alternative splicing: the new frontier in diabetes research. Eur J Endocrinol, 2016. 174(5): p. R225-38.
5. Jeffery, N., et al., Cellular stressors may alter islet hormone cell proportions by moderation of alternative splicing patterns. Hum Mol Genet, 2019. 28(16): p. 2763-2774.
6. Diez, J., et al., Differential splicing of the IA-2 mRNA in pancreas and lymphoid organs as a permissive genetic mechanism for autoimmunity against the IA-2 type 1 diabetes autoantigen. Diabetes, 2001. 50(4): p. 895-900.
7. Dogra, R.S., et al., Alternative splicing of G6PC2, the gene coding for the islet-specific glucose-6-phosphatase catalytic subunit-related protein (IGRP), results in differential expression in human thymus and spleen compared with pancreas. Diabetologia, 2006. 49(5): p. 953-7.
8. Daraio, T., et al., SNAP-25b-deficiency increases insulin secretion and changes spatiotemporal profile of Ca(2+) oscillations in beta cell networks. Sci Rep, 2017. 7(1): p. 7744.
9. Lopes, I., et al., Gene Size Matters: An Analysis of Gene Length in the Human Genome. Front Genet, 2021. 12: p. 559998.
10. Shiau, C.K., et al., High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nat Commun, 2023. 14(1): p. 4124.
11. Camunas-Soler, J., et al., Patch-Seq Links Single-Cell Transcriptomes to Human Islet Dysfunction in Diabetes. Cell Metab, 2020. 31(5): p. 1017-1031 e4.
12. Lebrigand, K., et al., High throughput error corrected Nanopore single cell transcriptome sequencing. Nat Commun, 2020. 11(1): p. 4025.
13. Tian, L., et al., Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing. Genome Biol, 2021. 22(1): p. 310.
14. Wang, Q., et al., Single cell transcriptome sequencing on the Nanopore platform with ScNapBar. RNA, 2021. 27(7): p. 763-70.
15. You, Y., et al., Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol, 2023. 24(1): p. 66.
16. Svoboda, M., H.R. Frost, and G. Bosco, Internal oligo(dT) priming introduces systematic bias in bulk and single-cell RNA sequencing count data. NAR Genom Bioinform, 2022. 4(2): p. lqac035.
17. Shen, L., et al., ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics, 2014. 15: p. 284.
18. Tekath, T. and M. Dugas, Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle. Bioinformatics, 2021. 37(21): p. 3781-3787.

Figure legends

Figure 1
Read length and transcript identification comparison between single-cell long-read RNA-sequencing (sclrRNA-seq) libraries. (A) Read length distribution of published sclrRNA-seq libraries prepared with 10x Genomics and Nanopore technology. Biological replicates are included for datasets from Lebrigand et al., 2020 and Wang et al., 2021; other datasets are shown as single samples. (B) Proportion of reads across datasets where the gene is identified, the transcript is identified, or neither is identified.
Shown are published reanalyzed datasets and three mouse pancreatic islet samples: one prepared with 3′ 10x Genomics technology, one with 5′ 10x Genomics technology, and one with 5′ 10x Genomics technology incorporating library preparation optimizations. (C) Proportion of reads aligned to Ins1 or Ins2 in a single-cell RNA-seq analysis of mouse pancreatic islets pre- and post-insulin depletion. (D) Volcano plot depicting differential gene expression between non-depleted and insulin-depleted bulk RNA-seq libraries from mouse pancreatic islets. (E) NGS coverage plot indicating read start sites across the genomic region. Libraries are single cell 10x Genomics preparations derived from human DTC melanoma cells. (F) Overview of the experimental workflow. (G) Read length distribution comparing a mouse pancreatic islet sclrRNA-seq library prepared using 3′ 10x Genomics technology to published sclrRNA-seq datasets. (H) Read length distribution comparing a mouse pancreatic islet sclrRNA-seq library prepared with 5′ 10x Genomics technology to published sclrRNA-seq datasets. (I) Read length distribution comparing a mouse islet sclrRNA-seq library prepared using 5′ 10x Genomics technology with protocol optimizations to published sclrRNA-seq datasets.

Supplemental Figure 1
(A) Proportion of reads across datasets where the gene is identified, the transcript is identified, or neither is identified.
Shown are reanalyzed published datasets, three mouse pancreatic islet samples (prepared with 3′ 10x Genomics technology, 5′ 10x Genomics technology, and 5′ 10x Genomics technology with library preparation optimizations), and three mouse lymphocyte/spleen samples prepared using the same methods as the pancreatic islet libraries.

Figure 2
Comparison of single-cell clustering based on gene expression versus transcript-level expression. (A) UMAP projection of single cells based on gene-level expression, grouped by gene expression profiles reflecting major pancreatic cell types. (B) Heatmap showing expression of cell type-specific markers across single cells from mouse pancreatic islets. (C) UMAP projection of single cells based on gene-level expression, grouped by gene expression profiles reflecting cell subpopulations. (D) UMAP projection of single cells based on transcript-level (isoform) expression, grouped by transcript expression profiles reflecting major pancreatic cell types. (E) UMAP projection of single cells based on transcript-level expression, grouped by transcript expression profiles reflecting cell subpopulations. (F) Confusion matrix showing concordance in cell type identification between gene-based and transcript-based clustering. (G) Bar plot quantifying cell type concordance between clustering methods. (H) Confusion matrix showing low concordance in beta cell subpopulation identification between clustering methods. (I) Bar plot quantifying beta cell subpopulation concordance between clustering methods.
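As an illustration of the concordance panels described in the Figure 2 legend, the following minimal Python sketch builds a row-normalized confusion matrix from two label assignments for the same cells and reports a percentage overlap per label. It is not the authors' code: the function name `concordance`, the toy labels, and the definition of overlap (the share of cells in a gene-based group that receive the same label in the transcript-based clustering) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def concordance(gene_labels, transcript_labels):
    """Compare two clusterings of the same cells (hypothetical helper).

    Returns (matrix, overlap):
      matrix[g][t] = fraction of cells labeled g by gene-based clustering
                     that are labeled t by transcript-based clustering
      overlap[g]   = percentage of cells labeled g by both methods
    Assumes both methods use the same label names.
    """
    if len(gene_labels) != len(transcript_labels):
        raise ValueError("both label lists must describe the same cells")

    # Count cells for every (gene label, transcript label) pair.
    counts = defaultdict(Counter)
    for g, t in zip(gene_labels, transcript_labels):
        counts[g][t] += 1

    matrix, overlap = {}, {}
    for g, row in counts.items():
        total = sum(row.values())
        # Normalize each row so its fractions sum to 1.
        matrix[g] = {t: n / total for t, n in row.items()}
        overlap[g] = 100.0 * row.get(g, 0) / total
    return matrix, overlap


# Toy labels, not data from the paper.
gene = ["alpha", "alpha", "beta", "beta", "beta", "delta"]
transcript = ["alpha", "beta", "beta", "beta", "beta", "delta"]
matrix, overlap = concordance(gene, transcript)
```

For subpopulations, cluster names generally differ between the two clusterings, so the overlap would have to be defined against a chosen matching of clusters rather than against identical names.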
Supplemental Figure 2
(A) Confusion matrix showing concordance in alpha cell subpopulation identification between gene-based and transcript-based clustering. (B) Bar plot quantifying alpha cell subpopulation concordance between clustering methods. (C) Confusion matrix showing low concordance in delta cell subpopulation identification between clustering methods. (D) Bar plot quantifying delta cell subpopulation concordance between clustering methods.

Figure 3
Differential transcript usage between cell types and cell subpopulations. (A) Differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU) analysis of Atp5a1. (B) DGE, DTE, and DTU analysis of G6pc2.
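The distinction between gene-level expression and transcript usage drawn in the Figure 3 legend can be made concrete with a small arithmetic sketch. It shows only how usage proportions are derived from per-transcript counts; it is not the DTUrtle procedure cited in the reference list and includes no statistical testing. The function name `transcript_usage`, the group names, the transcript names, and the counts are all hypothetical.

```python
def transcript_usage(counts):
    """Convert per-transcript counts of one gene into usage proportions.

    counts: {group: {transcript_id: summed read count}} (hypothetical input)
    Returns {group: {transcript_id: share of the gene's reads in that group}}.
    """
    usage = {}
    for group, per_tx in counts.items():
        total = sum(per_tx.values())
        if total == 0:
            # Gene not detected in this group: report zero usage.
            usage[group] = {tx: 0.0 for tx in per_tx}
        else:
            usage[group] = {tx: n / total for tx, n in per_tx.items()}
    return usage


# Toy counts, not data from the paper. Both groups have the same gene-level
# total (100 reads), yet the dominant transcript differs between them.
toy = {
    "group_1": {"tx_A": 80, "tx_B": 20},
    "group_2": {"tx_A": 30, "tx_B": 70},
}
usage = transcript_usage(toy)
gene_totals = {group: sum(per_tx.values()) for group, per_tx in toy.items()}
```

In this toy case the gene-level totals are equal, so a gene-level comparison would see no difference, while the usage proportions shift from 0.8 to 0.3 for `tx_A`.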


License: CC-BY-4.0