{"paper_id":"24237ec9-c772-4e74-b668-9d6f2fefb003","body_text":"Optimizing Single-Cell Long-Read Sequencing for Enhanced Isoform Detection \nin Pancreatic Islets \nMaria S. Hansen, Christopher J. Hill, Lori Sussel, Kristen L. Wells* \nBarbara Davis Center, University of Colorado Anschutz Medical Campus, Aurora CO 80045 \n \n*Corresponding Author: Kristen.wells-wrasman@cuanschutz.edu \n \nKeywords: single cell long read RNA-sequencing, transcriptomics, islet biology, RNA \nisoforms, RNA splicing \nWord count: 2,614 \n# Figures: 3 \n# Tables: 0 \n \nbioRxiv preprint doi: https://doi.org/10.1101/2025.04.30.651101; this version posted May 7, 2025. \nThe copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. \nIt is made available under a CC-BY 4.0 International license. \n \nAbstract \nAlternative splicing is an essential mechanism for generating protein diversity by producing \ndistinct isoforms from a single gene. Dysregulation of splicing that affects pancreatic \nfunction and immune tolerance has been linked to both type 1 and type 2 diabetes. Next-\ngeneration sequencing technologies, with their short read lengths, are limited in their ability \nto accurately detect splice variants. Long-read sequencing technologies offer the potential \nto overcome these limitations by providing full-length transcript information; however, their \napplication in single-cell RNA sequencing has been hindered by technical challenges, \nincluding insufficient read lengths and higher error rates. Furthermore, cell types that \nproduce high levels of a single transcript, such as islet endocrine cells, can obscure \nidentification of lower abundance transcripts. In this study, we optimized a protocol for \nsingle-cell long-read sequencing in pancreatic islets to improve read length and transcript \ndetection. 
Our findings demonstrate that 5’ library preparation protocols outperform 3’ \nprotocols, resulting in better transcript identification. Furthermore, we show that targeted \ndepletion of insulin transcripts enhances the detection of informative reads, highlighting the \nutility of transcript depletion strategies. This optimized protocol enables isoform-specific \ngene expression analysis and reveals differential transcript usage across the various cell \ntypes in pancreatic islets. By leveraging this approach, we gain deeper insights into the \ntranscriptomic complexity and cellular heterogeneity within pancreatic islets. \n \nArticle Highlights \n• This study addresses the limitations of current single-cell long-read RNA-sequencing \n(sclrRNA-seq) technologies in detecting full-length transcripts and isoform diversity, \nparticularly in pancreatic islets. \n• We sought to determine whether optimizing single-cell library preparation protocols \ncould enhance read length and transcript identification in pancreatic islets. \n• We found that 5’ capture methods, combined with targeted insulin depletion and \nextended reverse transcription, significantly improved read length and isoform \ndetection compared to standard protocols, while maximizing the number of \ninformative reads. \n• These improvements yield longer reads in single-cell experiments, substantially \nenhancing transcript identification and enabling more accurate analysis of isoform \ndiversity. 
\nIntroduction \nAlternative splicing (AS) plays a critical role in generating protein diversity from the \n~22,000 known protein-coding genes, leading to the production of over 140,000 distinct \ntranscripts [1]. This process allows for the generation of proteins with different amino acid \nsequences, impacting their functions and localization within the cell, and allowing them to \nrespond readily to changes in the environment [2, 3]. Splicing dysregulation is a key factor in \nmany diseases, including diabetes, either due to inherent mutations in splice sites or RNA-\nbinding proteins (RBPs), or in response to changes in environmental conditions, such as \ninflammatory stress or hyperglycemia [4, 5]. In the context of type 1 diabetes (T1D), diversity \nin isoform expression has significant implications for pancreatic function and immune \ntolerance. For example, differential isoform expression of autoantigens IA-2 and G6pc2 \nbetween the pancreas and thymus has been proposed to contribute to the generation of \nautoreactive T cells in T1D [6, 7]. Furthermore, dysregulated splicing events have been \nobserved in islets from individuals with type 2 diabetes (T2D), underscoring the importance \nof splicing regulation in maintaining proper cellular function and immune homeostasis [5]. \nAs one specific example, SNAP-25, a component of the SNARE complex responsible for \nvesicle fusion and exocytosis, exists as two isoforms (SNAP-25a and SNAP-25b). In SNAP-\n25b-deficient mice, [Ca²⁺] elevations are prematurely activated and delayed in termination, \nand insulin secretion is increased [8]. 
\nDespite the critical need to detect splice variants in the context of diabetes, next \ngeneration sequencing (NGS) technologies remain insufficient for this task. Identifying \nisoform-specific gene expression requires sequencing reads that span multiple exons of the \nmRNA transcript. In the human genome, transcript lengths are estimated to average \nbetween 1,800 and 4,900 bp, with the mode of the distribution around 2,000 bp [9]. NGS \ntechnologies typically have read lengths of 150 base pairs, making it difficult to identify isoforms. In \ncontrast, long-read sequencing technologies, such as PacBio and Oxford Nanopore \nTechnologies, can generate full-length reads that capture the entire RNA \nmolecule, thereby providing a clearer picture of isoform diversity. However, published single-cell long-\nread RNA sequencing (sclrRNA-seq) libraries often report shorter read lengths than \nexpected, which may limit transcript coverage and isoform detection. For instance, a recent \nstudy reported a median read length of 900 bp for sclrRNA-seq of two cancer cell lines [10]. \nAdvances in sequencing technologies, especially single-cell approaches, have \nrevealed the complex heterogeneity within the pancreas, uncovering distinct functional and \ntranscriptomic subpopulations across different cell types. In pancreatic islets, single-cell \ngenomics and patch-seq have identified transcriptionally and functionally distinct beta-cell \nsubpopulations, directly linking gene expression to key physiological processes such as \nvesicle exocytosis [11]. 
This heterogeneity underscores the importance of characterizing \nsplicing events and their resulting isoforms at the single-cell level. However, sclrRNA-seq \ntechnologies come with inherent limitations. Nanopore flow cells produce fewer reads than \nIllumina, with around 20,000 reads per cell for a 5,000-cell experiment, well below the \n30,000–50,000 reads per cell typical of NGS. Moreover, Nanopore’s higher error \nrate (1%) increases the likelihood of incorrect barcode and UMI assignments. To overcome \nthese challenges, we have optimized a protocol for pancreatic islets that improves read \nlength, advancing the utility of long-read sequencing for single-cell transcriptomics. \n \nResearch Design and Methods \nDissociation of pancreatic islets \nTen-week-old female C57BL/6 mice were obtained from The Jackson Laboratory. \nPancreatic islets were isolated from mice under ketamine/xylazine/acepromazine \nanesthesia by collagenase delivery into the pancreas via injection into the bile duct. The \ncollagenase-inflated pancreas was surgically removed and digested. After isolation, islets \nwere dissociated using Accutase in a 37°C bead bath for 25-30 minutes. The single-cell \nsuspension was filtered through a 40 µm filter and quenched in RPMI media + 10% FBS. \nCells were washed again with RPMI + 10% FBS and with PBS + 0.1% BSA. Single-cell \nsuspensions were loaded into a 10x Genomics Chromium targeting 4000 cells per sample. \nDissociation of spleens \nSpleens were isolated from mice under ketamine/xylazine/acepromazine \nanesthesia. 
Spleens were dissociated through a 70 µm strainer in cIMDM using a 3 mL \nsyringe plunger. Cells were washed, centrifuged, and treated with 1 mL Ammonium-\nChloride-Potassium (ACK) lysis buffer for 30 seconds, followed by dilution in cIMDM and a \nsecond spin. After one additional cIMDM wash, cells were resuspended in PBS + 0.1% BSA. \nSingle-cell suspensions were loaded into a 10x Genomics Chromium targeting 4000 cells per \nsample. \nscRNA-seq library preparation and insulin depletion \nSingle-cell libraries were prepared using either the Chromium Next GEM Single Cell \n3ʹ Kit v3.1 or the Chromium Next GEM Single Cell 5' Kit v2, following the protocol up to and \nincluding step 2.4, stopping just before fragmentation. For the optimized libraries, the \nfollowing modifications were applied to the 5’ library preparation. In step 1.1, 1 µl of 10 mM dNTP solution \n(Thermo Scientific FERR0191) was added to the reaction. In step 1.5, the extension time was \nincreased from 45 minutes to 2 hours. In step 2.2, 1 µl of 10 mM dNTP solution was added to \nthe reaction and the extension time was increased from 1 minute to 3 minutes. \nInsulin depletion was performed on cDNA from step 2.4 of the 10x Genomics Chromium \nlibrary preparation using the DepleteX™ RNA Depletion Panel (Insulin) kit from Jumpcode \nGenomics. We followed the PacBio MAS-IsoSeq protocol (December 2022, Version 1.0) with \nthe following modifications: During RNP Complex Formation (Step A), we used 0.9 µl Cas9 \ninstead of 2.3 µl, and 1.6 µl Insulin Guide RNA instead of 4.0 µl Single Cell Boost Guide RNA. 
\nDuring Bead Cleanup (Step D), we used 50 µl (1X) AMPure XP Beads instead of 75 µl (1.5X) \nSMRTbell Cleanup Beads. \nFollowing insulin depletion, long-read libraries were prepared from the cDNA using the \nLigation Sequencing Kit V14 (Nanopore SQK-LSK114) and the PCR Expansion (Nanopore EXP-\nPCA001). For 3’ libraries, the Ligation sequencing V14 - Single-cell transcriptomics with 3' \ncDNA prepared using 10X Genomics on PromethION (SQK-LSK114) protocol was used. For \n5’ libraries, the Ligation sequencing V14 - Single-cell transcriptomics with 5' cDNA prepared \nusing 10X Genomics on PromethION (SQK-LSK114) protocol was used. Short Fragment \nBuffer (SFB) was used for library preparation instead of Long Fragment Buffer (LFB). Library \nBeads (LIB) were used for the flow cell priming mix instead of Library Solution (LIS). Libraries \nwere sequenced on R10.4.1 flow cells on either a PromethION 2 Solo (P2S) or PromethION \n2 Integrated (P2i). \nData availability \nAll data will be made available on GEO at time of publication. \n \nResults \nEvaluation of read lengths and isoform detection in published single cell long read datasets \nWe aimed to identify isoform differences between islet cell types and subtypes using \nsingle cell RNA-sequencing. To evaluate the ability of single cell sequencing technologies to \ngenerate full-length reads, we reanalyzed previously published sclrRNA-seq datasets \ngenerated using 10x Genomics and Oxford Nanopore Technologies, focusing on their ability \nto capture full-length transcripts and detect isoform-specific transcript expression. 
Our \nanalysis included seven sclrRNA-seq libraries from five different studies [10, 12-15]. The \nreanalysis revealed an average read length of 794 bp and an average mode of 582 bp, \ncompared to the expected mode of ~2,000 bp for the transcript length distribution in the human genome [9] (Figure \n1A). This discrepancy between the average read length and the expected transcript length \nunderscores the ongoing challenge of capturing full-length transcripts. This shortfall in read \nlength is important because it limits transcript detection. Whereas gene detection \nranges from 60-75% of total reads, transcript detection ranges from 30-60% of total reads \n(Figure 1B). These findings highlight the limitations of current sclrRNA-seq technologies in \nachieving comprehensive transcript-level resolution. \nEfficient and specific depletion of insulin from islet sequencing libraries generates \nenhanced read diversity \nAnalyzing transcript expression requires a higher overall read depth than gene \nexpression analysis, as each gene is associated with multiple transcripts. Initial analysis of \nour sclrRNA-seq libraries of mouse pancreatic islets led to the discovery that the two mouse \ninsulin genes, Ins1 and Ins2, made up 25% of the total reads, impeding our ability to achieve \noptimal read depth (Figure 1C). To overcome this issue, we incorporated an insulin depletion \nstep into the protocol and validated the specificity and efficiency of insulin depletion in a \nbulk short-read RNA-sequencing library of mouse pancreatic islets. The depletion was \nremarkably efficient and highly specific: insulin transcripts were uniquely depleted, while all \nother genes remained completely unaffected (Figure 1D). 
The same insulin depletion was \nthen applied to a single cell pancreatic islet library followed by long-read Nanopore \nsequencing. Importantly, the insulin depletion was as efficient as in the bulk sample (Figure \n1C). This strategy was applied to all subsequent pancreatic islet libraries generated for this \nstudy. \nProtocol modifications enhance read length and transcript identification in islet single cell \nlong read libraries \nMost high-throughput sclrRNA-seq methods rely on 10x Genomics single-cell \ncapture and library preparation, which was originally optimized to generate and amplify \nshorter sequences, raising the question of whether it can effectively amplify full-length \ntranscripts. 10x Genomics offers two types of transcriptomic profiling for single-cell RNA-\nseq: one that captures the 3’ end of transcripts and another that captures the 5’ end. Studies \nhave shown that 3' RNA libraries frequently contain internal priming artifacts [16] that would \nprevent the amplification of full-length reads. To test for internal priming in 3’ vs 5’ libraries, \nwe downloaded libraries generated using each technology in human melanoma samples \nfrom the datasets created by 10x Genomics and analyzed the genomic coverage [17]. The 3’ \nlibraries exhibited a notably higher degree of internal priming compared to 5’ libraries, as \nevidenced by an increased number of reads mapping to the central regions of transcripts in \ngenomic coverage plots (Figure 1E). 
Because reads generated through internal priming \ncannot span the full length of a transcript, this phenomenon likely contributes to the shorter \nread lengths observed in these libraries. To test this, sclrRNA-seq libraries were prepared \nfrom mouse pancreatic islets in parallel using both 3’ and 5’ capture technologies (Figure \n1F). Remarkably, the library prepared with 5’ technology generated longer reads than the 3’ \nlibrary (p < 2 x 10⁻¹⁶) (Figure 1G-H) and provided substantially improved transcript \nidentification (Figure 1B). \nTo further improve the read length, several additional optimization steps were \nintroduced into the islet 5’ library preparation protocol (Chromium Next GEM Single Cell 5' \nReagent Kits v2), including increasing the extension time from 45 minutes to 2 hours during \nGEM-RT Incubation and from 1 to 3 minutes during cDNA amplification, based on the \napproach outlined by Lebrigand et al. [12], and increasing the amount of dNTPs. Remarkably, \nthese modifications resulted in longer reads than those from the 5’ library without \nmodifications (p < 2 x 10⁻¹⁶) (Figure 1I), and better transcript identification than any of the \npublished datasets (Figure 1B). Overall, this emphasizes the preference for 5’ capture over \n3’ capture and highlights the necessity for library prep optimizations to enhance the \namplification of full-length reads. \nIsolating high-quality RNA from pancreatic islets is notoriously difficult, primarily due \nto the presence of digestive enzymes, including RNases that are secreted by the exocrine \npancreas. 
To explore whether a different cell type might yield still longer reads, we applied \n3’, 5’, and optimized 5’ library preparations, as described above, to lymphocytes isolated \nfrom dissociated mouse spleens. The 5’ lymphocyte sample demonstrated better transcript \nidentification compared to the 3’ pancreatic islet sample (Supplemental Figure 1). However, \nthe 10x Genomics Chromium library prep modifications for the 5’ sample did not yield the \nsame improvements in the lymphocyte sample as observed with the pancreatic islet sample \n(Supplemental Figure 1). This suggests that the benefits of these optimizations might be \ntissue-specific and highlights the need for further refinements tailored to different tissue \ntypes. \nIsoform variants identified between alpha and beta cells and within beta cell \nsubpopulations \nWith the improved library preparation, the optimized 5’ sclrRNA-seq dataset from \nmouse pancreatic islets was used to explore whether splicing changes could be detected \nacross different cell types and cell states. Importantly, the sclrRNA-seq dataset allowed clear \nidentification of all expected cell populations (Figure 2A-B). Furthermore, the analysis \nrevealed that cell type identification remains robust whether using gene-level or transcript-\nlevel expression data for dimensionality reduction and clustering, with over 90% \nconcordance between the two approaches (Figure 2D-G). This stability in broad cell type \nclassification aligns with the understanding that major cell types are defined by distinct gene \nexpression patterns. 
However, when examining substructure within these cell types, \nsubstantial differences emerged between gene-level and transcript-level analyses, with \nconsistency ranging from 12% to 92% across subclusters (Figure 2I, Supplemental Figure 2). \nThese findings suggest that while gene-level expression is sufficient for identifying major cell \ntypes, transcript-level analysis provides crucial insights into subtle variations within cell \npopulations. Such variations may reflect different cell states, functions, or responses that \nare not captured by gene-level analysis alone. \nThe primary strength of sclrRNA-seq lies in its ability to capture cell-specific isoform \nexpression. To assess differential splicing, differential transcript usage (DTU) analysis was \nconducted alongside differential gene expression (DGE) analysis. DTU analysis [18] \nidentifies proportional differences in the transcript composition of a gene, comparing how \nmuch each transcript contributes to the total gene expression across conditions. Using this \nanalysis, 342 DTU events were identified between alpha and beta cells, and 57 DTU events \nacross subpopulations of beta cells (Supplemental table). Specifically, when comparing \nalpha and beta cells, we identified isoform-specific differences in Atp5a1, a gene involved in \nATP production and insulin and glucagon secretion (Figure 3A). Similarly, G6pc2, a known \nautoantigen in T1D, displays distinct isoform expression between two beta cell \nsubpopulations (0_beta and 3_beta), despite similar overall gene expression levels (Figure \n3B). 
Interestingly, alternative splicing of G6pc2 has been shown to drive differential \nexpression of G6PC2 transcripts between the pancreas and thymus, highlighting its \npotential as a critical target for isoform-specific studies [7]. Neither of these genes was \nidentified by DGE, underscoring the importance of long read sequencing for detecting \npreviously unidentified RNA differences between cell types and cell states. \n \nDiscussion \nThis study demonstrates how an improved sclrRNA-seq library preparation protocol \nfrom isolated islets produces longer reads and increases the proportion of reads that can be \nconfidently assigned to specific transcripts, improving the utility of long-read sequencing \ndata for identifying splice variants and cellular heterogeneity in pancreatic endocrine cell \npopulations. Specifically, this study demonstrates that islet sclrRNA-seq libraries prepared \nwith 5’ protocols outperform those prepared with 3’ protocols for long-read sequencing. \nEnhancements to the 5’ library preparation further improve read length and transcript \ntagging efficiency in pancreatic islets. Furthermore, depleting insulin transcripts from the \npancreatic islet libraries proved to be a highly effective strategy for maximizing informative \nreads, demonstrating the broader potential of targeted transcript depletion in single-cell \nRNA-sequencing experiments. \nWhile the modified 5' protocol significantly improved read length in islet samples, \nlymphocyte samples showed significant improvement only with the unmodified 5' protocol, \nwith no additional benefit from the modifications. 
This indicates that individual cell types may \nrequire unique modifications and further optimizations. Despite the significant \nimprovements in read length achieved with the modified protocol, it did not meet \nexpectations for full-length transcript coverage. Achieving this goal will require further \nmodifications to the 10x chemistry, including adjustments to the master mix and reverse \ntranscriptase. \nAlthough full-length coverage was not achieved for all transcripts, we successfully \nanalyzed transcript expression and identified differential transcript usage across cell types \nand cell subpopulations. These advancements are critical for uncovering the full complexity \nof transcriptomes and hold immense potential for broad application across tissues, \nenabling deeper insights into cellular heterogeneity, isoform regulation, and functional \ndiversity. Furthermore, investigating these variations at the single-cell level enables us to \nuncover the intricate heterogeneity within tissues, offering a deeper understanding of the \nfunctional and transcriptional diversity that would otherwise go unnoticed. Understanding \nsplicing dysregulation in pancreatic islets is particularly important, as it may reveal how \nalternative splicing shapes beta cell function, immune tolerance, and beta cell susceptibility \nin diabetes. \n \nAcknowledgments \nWe thank Laura White, PhD, and Jay Hesselberth, PhD, for their guidance and support with \nNanopore long-read sequencing technologies. We also thank Scott Beard, BDC Cytometer \nCore Manager, for islet and spleen isolations. \nFunding. 
This work was supported by grants from the National Institutes of Health \n(P30 DK116073 [Lori Sussel], R01 DK082590 [Lori Sussel], and U01 DK127505 [Lori Sussel]). \nDuality of interest. No potential conflicts of interest relevant to this article were reported. \nAuthor Contributions. M.S.H. was responsible for data acquisition and prepared the \noriginal manuscript. K.L.W. and L.S. reviewed and edited the manuscript. C.J.H. and K.L.W. \ndeveloped the computational pipelines. M.S.H., C.J.H., and K.L.W. contributed to data \nanalysis and the graphical presentation of results. All authors contributed to the study’s \nmethodology and conceptualization. K.L.W. is the guarantor of this work and had full access \nto all data in the study and takes responsibility for the integrity of the data and the accuracy \nof the data analysis. \n \nReferences \n1. Gonzalez-Porta, M., et al., Transcriptome analysis of human tissues and cell lines \nreveals one dominant transcript per gene. Genome Biol, 2013. 14(7): p. R70. \n2. Black, D.L., Mechanisms of alternative pre-messenger RNA splicing. Annu Rev \nBiochem, 2003. 72: p. 291-336. \n3. Piazzi, M., et al., Alternative Splicing, RNA Editing, and the Current Limits of Next \nGeneration Sequencing. Genes (Basel), 2023. 14(7). \n4. Juan-Mateu, J., O. Villate, and D.L. Eizirik, MECHANISMS IN ENDOCRINOLOGY: \nAlternative splicing: the new frontier in diabetes research. Eur J Endocrinol, 2016. \n174(5): p. R225-38. \n5. Jeffery, N., et al., Cellular stressors may alter islet hormone cell proportions by \nmoderation of alternative splicing patterns. Hum Mol Genet, 2019. 28(16): p. 2763-\n2774. 
\n6. Diez, J., et al., Differential splicing of the IA-2 mRNA in pancreas and lymphoid \norgans as a permissive genetic mechanism for autoimmunity against the IA-2 type 1 \ndiabetes autoantigen. Diabetes, 2001. 50(4): p. 895-900. \n7. Dogra, R.S., et al., Alternative splicing of G6PC2, the gene coding for the islet-\nspecific glucose-6-phosphatase catalytic subunit-related protein (IGRP), results in \ndifferential expression in human thymus and spleen compared with pancreas. \nDiabetologia, 2006. 49(5): p. 953-7. \n8. Daraio, T., et al., SNAP-25b-deficiency increases insulin secretion and changes \nspatiotemporal profile of Ca(2+) oscillations in beta cell networks. Sci Rep, 2017. \n7(1): p. 7744. \n9. Lopes, I., et al., Gene Size Matters: An Analysis of Gene Length in the Human \nGenome. Front Genet, 2021. 12: p. 559998. \n10. Shiau, C.K., et al., High throughput single cell long-read sequencing analyses of \nsame-cell genotypes and phenotypes in human tumors. Nat Commun, 2023. 14(1): \np. 4124. \n11. Camunas-Soler, J., et al., Patch-Seq Links Single-Cell Transcriptomes to Human \nIslet Dysfunction in Diabetes. Cell Metab, 2020. 31(5): p. 1017-1031 e4. \n12. Lebrigand, K., et al., High throughput error corrected Nanopore single cell \ntranscriptome sequencing. Nat Commun, 2020. 11(1): p. 4025. \n13. Tian, L., et al., Comprehensive characterization of single-cell full-length isoforms in \nhuman and mouse with long-read sequencing. Genome Biol, 2021. 22(1): p. 310. \n14. Wang, Q., et al., Single cell transcriptome sequencing on the Nanopore platform \nwith ScNapBar. RNA, 2021. 27(7): p. 763-70. \n15. You, Y., et al., Identification of cell barcodes from long-read single-cell RNA-seq with \nBLAZE. Genome Biol, 2023. 24(1): p. 66. \n16. Svoboda, M., H.R. Frost, and G. Bosco, Internal oligo(dT) priming introduces \nsystematic bias in bulk and single-cell RNA sequencing count data. NAR Genom \nBioinform, 2022. 4(2): p. lqac035. 
\n17. Shen, L., et al., ngs.plot: Quick mining and visualization of next-generation \nsequencing data by integrating genomic databases. BMC Genomics, 2014. 15: p. \n284. \n18. Tekath, T. and M. Dugas, Differential transcript usage analysis of bulk and single-cell \nRNA-seq data with DTUrtle. Bioinformatics, 2021. 37(21): p. 3781-3787. \n \nFigure legends \nFigure 1 \nRead length and transcript identification comparison between single-cell long-read RNA-\nsequencing (sclrRNA-seq) libraries. (A) Read length distribution of published sclrRNA-seq \nlibraries prepared with 10x Genomics and Nanopore technology. Biological replicates are \nincluded for datasets from Lebrigand et al., 2020 and Wang et al., 2021; other datasets are \nshown as single samples. (B) Proportion of reads across datasets where the gene is \nidentified, the transcript is identified, or neither is identified. Shown are published \nreanalyzed datasets and three mouse pancreatic islet samples: one prepared with 3′ 10x \nGenomics technology, one with 5′ 10x Genomics technology, and one with 5′ 10x Genomics \ntechnology incorporating library preparation optimizations. (C) Proportion of reads aligned \nto Ins1 or Ins2 in a single-cell RNA-seq analysis of mouse pancreatic islets pre- and post-\ninsulin depletion. (D) Volcano plot depicting differential gene expression between non-\ndepleted and insulin-depleted bulk RNA-seq libraries from mouse pancreatic islets. (E) NGS \ncoverage plot indicating read start sites across the genomic region. Libraries are single cell \n10x Genomics preparations derived from human DTC melanoma cells. 
(F) Overview of the \nexperimental workflow. (G) Read length distribution comparing a mouse pancreatic islet \nsclrRNA-seq library prepared using 3′ 10x Genomics technology to published sclrRNA-seq \ndatasets. (H) Read length distribution comparing a mouse pancreatic islet sclrRNA-seq \nlibrary prepared with 5′ 10x Genomics technology to published sclrRNA-seq datasets. (I) \nRead length distribution comparing a mouse islet sclrRNA-seq library prepared using 5′ 10x \nGenomics technology with protocol optimizations to published sclrRNA-seq datasets. \n \nSupplemental Figure 1 \n(A) Proportion of reads across datasets where the gene is identified, the transcript is \nidentified, or neither is identified. Shown are reanalyzed published datasets, three mouse \npancreatic islet samples (prepared with 3′ 10x Genomics technology, 5′ 10x Genomics \ntechnology, and 5′ 10x Genomics technology with library preparation optimizations), and \nthree mouse lymphocyte/spleen samples prepared using the same methods as the \npancreatic islet libraries. \n \nFigure 2 \nComparison of single-cell clustering based on gene expression versus transcript-level \nexpression. (A) UMAP projection of single cells based on gene-level expression. Cells are \ngrouped by gene expression profiles reflecting major pancreatic cell types. (B) Heatmap \nshowing expression of cell type-specific markers across single cells from mouse pancreatic \nislets. (C) UMAP projection of single cells based on gene-level expression, grouped by gene \nexpression profiles reflecting cell subpopulations. 
(D) UMAP projection of single cells based \non transcript-level (isoform) expression, grouped by transcript expression profiles reflecting \nmajor pancreatic cell types. (E) UMAP projection of single cells based on transcript-level \nexpression, grouped by transcript expression profiles reflecting cell subpopulations. (F) \nConfusion matrix showing concordance in cell type identification between gene-based and \ntranscript-based clustering. (G) Bar plot quantifying cell type concordance between \nclustering methods. (H) Confusion matrix showing low concordance in beta cell \nsubpopulation identification between clustering methods. (I) Bar plot quantifying beta cell \nsubpopulation concordance between clustering methods. \n \nSupplemental Figure 2 \n(A) Confusion matrix showing concordance in alpha cell subpopulation identification \nbetween gene-based and transcript-based clustering. (B) Bar plot quantifying alpha cell \nsubpopulation concordance between clustering methods. (C) Confusion matrix showing \nlow concordance in delta cell subpopulation identification between clustering methods. (D) \nBar plot quantifying delta cell subpopulation concordance between clustering methods. \n \n \nFigure 3 \nDifferential transcript usage between cell types and cell subpopulations. (A) Differential \ngene expression (DGE), differential transcript expression (DTE), and differential transcript \nusage (DTU) analysis of Atp5a1. (B) DGE, DTE, and DTU analysis of G6pc2. 
\n \n[Figure 1: Read length and transcript identification comparison between single-cell long-read RNA-sequencing (sclrRNA-seq) libraries. Panels A-I. Plotted axes: read length (bp) vs. density (A, G, H, I); percent of reads by type (Untagged, Gene, Transcript) per dataset (B); percent of reads (Ins1 reads, Ins2 reads, all other reads) with no depletion vs. insulin depletion (C); log2 fold change vs. −log10 P (D); read count per million mapped reads across the genomic region (5′ to 3′) for the 3′ and 5′ libraries (E).] \n[Figure 2: Comparison of single-cell clustering based on gene expression versus transcript-level expression. Panels A-I. UMAP axes: umap_1 vs. umap_2; cell type labels: alpha, beta, delta, EC, MΦ, ppy, qSC; heatmap marker genes: Gcg, Arx, Nkx6-1, Mafa, Ins1, Ins2, Sst, Flt1, Pecam1, Tyrobp, Cd52, Ppy; confusion matrix axes: gene clustering vs. transcript clustering; bar plot axis: percentage overlap.] \n[Figure 3: Differential transcript usage between cell types and cell subpopulations. Panel titles: Atp5a1 gene expression, Atp5a1 transcript expression, Atp5a1 transcript usage (A); G6pc2 gene expression, G6pc2 transcript expression, G6pc2 transcript usage (B). Axes: expression level and proportion, by cell type or transcript.] \n[Supplemental Figure 1: Percent of reads by type (Untagged, Gene, Transcript) per dataset, including the published datasets and the islet and spleen libraries.] \n[Supplemental Figure 2: Panels A-D. Confusion matrices and bar plots of percentage overlap for alpha and delta cell subpopulations.]","source_license":"CC-BY-4.0","license_restricted":false}