Keywords
Diplonemids, subcellular proteomics, cell membrane, metabolism
Abstract
Diplonemids are among the most diverse and abundant protists in the deep ocean, have
extremely complex and ancient cellular systems, and exhibit unique metabolic capacities.
Despite this, we know very little about this major group of eukaryotes. To establish a model
organism for comprehensive investigation, we performed subcellular proteomics on
Paradiplonema papillatum and localized 4,870 proteins to 22 cellular compartments. We
additionally confirmed the predicted location of several proteins by epitope tagging and
fluorescence microscopy. To probe the metabolic capacities of P. papillatum, we explored the
proteins predicted to the cell membrane compartment in our subcellular proteomics dataset.
Our data revealed an accumulation of many carbohydrate-active enzymes (CAZymes). Our
predictions suggest that these CAZymes are exposed to extracellular space, supporting
proposals that diplonemids may specialize in breaking down carbohydrates in plant and algal
cell walls. Further exploration of carbohydrate metabolism revealed an evolutionary
divergence in the function of glycosomes (modified peroxisomes) in diplonemids versus
kinetoplastids. Our subcellular proteome provides a resource for future investigations into the
unique cell biology of diplonemids.
bioRxiv preprint doi: https://doi.org/10.1101/2025.07.16.665091; this version posted July 20, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Introduction
Diplonemids are unicellular, heterotrophic eukaryotes that constitute one of the most
abundant and species-rich protist groups in the world’s oceans (Flegontova, et al. 2016;
Gawryluk, et al. 2016). In addition, recent investigations show a broad distribution
of diplonemids in freshwater environments (Mukherjee, et al. 2020), as well as in all pelagic
zones of the ocean (Obiol, et al. 2020; Lax, et al. 2024). Global metabarcoding surveys estimate
more than 67,000 diplonemid species worldwide, and diplonemids are therefore presumed to be key
ecological players in all marine ecosystems (Tashyreva, et al. 2022).
Despite their importance, our knowledge of diplonemid nutritional strategies, ecological roles,
and molecular and cellular biology remains limited. Beyond general heterotrophy
(Prokopchuk, et al. 2022), investigating their lifestyles and specific feeding modes remains
challenging, partly due to the difficulty of observing diplonemid behavior in nature. By
contrast, the relative ease with which diplonemids can be established in stable axenic cultures
(typically in protein-rich media) is promising, and makes them amenable to an expanding
range of genomic, transcriptomic and proteomic experiments (Škodová-Sveráková, et al.
2021; Valach, Moreira, et al. 2023). Such techniques are necessary to further characterize
diplonemids’ cellular and ecological functions.
A high-quality nuclear genome is available for the diplonemid Paradiplonema papillatum
(formerly Diplonema) (Valach, Moreira, et al. 2023), with two recent assemblies now
available for Diplonema japonicum (Tashyreva, Faktorová, Stříbrná, et al. 2025) and
Rhynchopus euleeides (Tashyreva, Faktorová, Horák, et al. 2025), in addition to several
previously existing transcriptomes (Tashyreva, et al. 2022). However, P. papillatum remains
the only genetically tractable diplonemid, enabling functional investigations by gene deletion
(Faktorová, Kaur, et al. 2020), endogenous tagging of proteins (Akiyoshi, et al. 2025), and
immunoprecipitation (Valach, Benz, et al. 2023). Such tractability has allowed the
investigation of P. papillatum respiratory complexes (Valach, et al. 2018), mitochondrial
ribosomes (Valach, Benz, et al. 2023), and kinetochores (Benz, et al. 2024). Diplonemids
retain many genes that can be traced to the last eukaryotic common ancestor (LECA),
including rare, restricted homologs referred to as jotnarlogs (Záhonová, et al. 2025). Thus,
diplonemids may prove particularly informative for understanding the complexities of the
ancestral eukaryote (Richards, et al. 2024).
Among the many protein-coding genes predicted from its genome, an unexpected finding in
P. papillatum was the identification of several hundred carbohydrate-active enzymes
(CAZymes), with the capacity to digest pectin, cellulose, and -1,3 glycans among other
carbohydrates (Valach, Moreira, et al. 2023). This expanded CAZyme repertoire is
particularly prominent compared with those of its relatives, Euglenida and Kinetoplastea (Valach,
Moreira, et al. 2023). This repertoire implies a proclivity of P. papillatum (and potentially
other diplonemids) towards digestion of cell wall components of plants and algae. However,
it is unclear how these organisms can specifically digest the cell walls of photosynthetic
eukaryotes. Osmotrophy has been proposed (Prokopchuk, et al. 2022), through secreting
enzymes to their exterior, as well as phagotrophy, internally ingesting components of their
prey. Though P. papillatum is a tractable species, tagging and visualizing hundreds of
CAZymes to determine their localization is unrealistic. We therefore sought to perform
subcellular proteomics to localize CAZymes to various intracellular compartments.
Here, we use a subcellular proteomics workflow similar to localization of organelle proteins
by isotope tagging via differential ultracentrifugation (LOPIT-DC) (Geladaki, et al. 2019) to
produce the first subcellular proteome of a diplonemid. With our data, we classified 4,870
proteins to 22 cellular compartments in P. papillatum. We validated several predicted
locations by epitope and fluorescent tagging. Our subcellular proteome provided a clearly
resolved cluster of cell membrane proteins enriched with secreted CAZymes. We suggest
these enzymes can actively degrade plant and algal cell walls, initially at the cell’s exterior.
We also show an ability for internal carbohydrate processing, with various secreted CAZymes
distributed to the lysosomal compartments, and expand on traditional carbohydrate
metabolism across glycosomes and the cytoplasm, demonstrating that its compartmentalization
has diverged from that of the sister clade Kinetoplastea (Opperdoes and Michels 1993).
Finally, we reveal an extensive mitochondrial capacity for varied amino acid digestion,
foregrounding the metabolic versatility of this model diplonemid. Our localization of
thousands of P. papillatum proteins provides a repository of information that will extend our
knowledge of diplonemids, facilitating an exploration of their unusual cell biology and
function.

Results and Discussion
Subcellular proteomics allows predictive clustering of P. papillatum proteins into 22 distinct compartments
To obtain a subcellular map of P. papillatum, we used a modified workflow adapted from a
LOPIT-DC protocol described previously (Geladaki, et al. 2019). Briefly, cells were grown
axenically in ‘Diplo’ media (sea water supplemented with 10% Fetal Bovine Serum and 1 g
tryptone). Approximately 9.9 × 10^8 cells per sample were collected and lysed in detergent-free
lysis buffer in a nitrogen cavitator (250 psi for 10 min). Cell lysates underwent differential
centrifugation, resulting in 11 distinct fractions, including initial unlysed cells. We performed
western blot analysis with antibodies against ATP synthase subunit β from Trypanosoma
brucei (Šubrtová, et al. 2015), mammalian Grp75 (Joseph, et al. 2013) and Grp78 (Chou, et
al. 2020) to ensure that fractional proteomic profiles were distinct (Suppl. Fig. 1).
Label-free quantification (LFQ) analysis was followed by peptide data analysis in
ProteomeDiscoverer (Orsburn 2021) and with R, primarily via the pRoloc package (Breckels,
et al. 2016). Data were quantified against the nuclear and mitochondrial genomes of P.
papillatum (Valach, Moreira, et al. 2023). After quality control, 4,870 unique proteins were
detected in this dataset. Following normalization, proteins without complete peptide coverage
across all fractions underwent imputation via ‘neighbor averaging’ (1,285 proteins) as well as
‘zero’ methods (2,073 proteins).
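The two imputation strategies named above can be illustrated with a minimal Python sketch. It is not the R implementation used in this study; the function names, the choice of k, the distance measure and the toy profiles are all assumptions made purely for illustration. ‘Zero’ imputation replaces each missing fraction value with 0, whereas neighbor averaging fills a missing value with the mean of that fraction in the k proteins whose observed profiles are most similar.

```python
from math import sqrt


def impute_zero(profile):
    """Replace missing values (None) with 0.0."""
    return [0.0 if v is None else v for v in profile]


def _distance(a, b):
    """Root-mean-square distance over fractions observed in both profiles."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_neighbor_average(profiles, target, k=2):
    """Fill gaps in profiles[target] with the mean of its k nearest neighbors."""
    query = profiles[target]
    others = [(name, p) for name, p in profiles.items() if name != target]
    # Rank all other proteins by similarity to the incomplete profile.
    others.sort(key=lambda item: _distance(query, item[1]))
    filled = list(query)
    for i, value in enumerate(query):
        if value is None:
            # Use the k closest proteins that have a value in this fraction.
            donors = [p[i] for _, p in others if p[i] is not None][:k]
            filled[i] = sum(donors) / len(donors) if donors else 0.0
    return filled


# Hypothetical normalized abundance profiles across four fractions.
profiles = {
    "protA": [0.10, 0.20, None, 0.40],
    "protB": [0.10, 0.20, 0.30, 0.40],
    "protC": [0.12, 0.18, 0.34, 0.36],
    "protD": [0.70, 0.20, 0.05, 0.05],
}

zero_filled = impute_zero(profiles["protA"])
neighbor_filled = impute_neighbor_average(profiles, "protA", k=2)
```

In this toy example the gap in protA is filled from protB and protC, the two profiles closest to it, rather than from the dissimilar protD. Zero imputation is the more conservative choice when a missing value plausibly reflects true absence from a fraction.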
To predict cellular localization for the P. papillatum subcellular proteome, we manually
curated a set of 368 proteins constituting markers with canonical localizations (e.g.
mitochondrion, flagellum, cytosol), specific functions (e.g. membrane trafficking
compartments) or those with inferred localization data, corresponding to a total of 22 distinct
subcellular compartments or protein complexes (Table S1). Using a median support vector
machine (SVM) score cutoff (Table S2), we predicted sub-localization of 2,435 proteins (Fig. 1A,B),
with the remainder additionally classified to these compartments with lower confidence (Table S3;
Suppl. Fig. 2). To further corroborate our designated clusters, we mapped predicted target signals and
protein features onto the t-SNE distributions (Fig. 1C). Mitochondrial target peptides (mTP,
predicted via TargetP 2.0) (Armenteros, et al. 2019) are abundant across the three
mitochondrial clusters: matrix, protein complexes and membrane-enriched. Signal peptides,
predicted via SignalP 6.0 (Teufel, et al. 2022), show enrichment across soluble lysosome, cell
membrane, endoplasmic reticulum (ER)/Golgi clusters, as well as endocytic and
multivesicular membrane trafficking compartments. Finally, transmembrane domains (TMD),
predicted via DeepTMHMM (Hallgren, et al. 2022), correlate with the various
membrane-enriched clusters of the diplonemid cell.
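The score cutoff described above can be sketched as follows. This is a minimal illustration in Python rather than the pRoloc workflow itself, and the text does not state whether the median is taken per compartment or over all proteins, so the hypothetical `per_class` flag supports both readings. The function name, protein IDs and scores are assumptions. Proteins at or above the cutoff are labeled as high-confidence predictions, while the rest keep their compartment with a low-confidence label.

```python
from collections import defaultdict
from statistics import median


def apply_median_cutoff(predictions, per_class=True):
    """Label each classifier prediction as high or low confidence.

    predictions: dict mapping protein ID -> (compartment, score).
    Returns a dict mapping protein ID -> (compartment, "high" or "low").
    """
    scores_by_class = defaultdict(list)
    for compartment, score in predictions.values():
        scores_by_class[compartment].append(score)

    if per_class:
        # Assumption: one median threshold per predicted compartment.
        cutoffs = {c: median(s) for c, s in scores_by_class.items()}
    else:
        # Alternative assumption: a single median over all scores.
        overall = median([score for _, score in predictions.values()])
        cutoffs = {c: overall for c in scores_by_class}

    return {
        protein: (compartment, "high" if score >= cutoffs[compartment] else "low")
        for protein, (compartment, score) in predictions.items()
    }


# Hypothetical classifier output for five proteins.
predictions = {
    "p1": ("cell membrane", 0.95),
    "p2": ("cell membrane", 0.60),
    "p3": ("cell membrane", 0.30),
    "p4": ("cytosol", 0.80),
    "p5": ("cytosol", 0.40),
}

labeled = apply_median_cutoff(predictions)
```

A per-compartment threshold avoids penalizing compartments whose members score systematically lower, at the cost of accepting roughly half of every class regardless of how well it is resolved.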
Next, we highlighted proteins that exhibit differences in abundance when P. papillatum was
grown in different media: ‘Diplo’ versus ‘Hemi’ media (sea water supplemented with 10 ml
inactivated horse serum and 1 ml/L LB medium), and oxygen-abundant versus oxygen-depleted
conditions (Fig. 1D) (Škodová-Sveráková, et al. 2021). Cells grown in nutrient-rich ‘Diplo’
medium show enrichment for proteins predicted to the cytosolic ribosome and cell membrane
clusters, including sodium/potassium exchangers and sterol transporters. The nutrient-poorer
‘Hemi’ medium showed notable enrichment across multiple clusters, including the
proteasome, cytosol, soluble lysosome and mitochondrial regions (Fig. 1D). Likewise, aerobic
conditions resulted in the enrichment of several hypothetical cell membrane components,
subunits of mitochondrial complex IV, as well as various soluble lysosomal proteases. By
contrast, anaerobic conditions induced enrichment across clusters of the cytosol, cytosolic
ribosomes, mitochondrial matrix, and translation initiation factors 2 and 3 (Fig. 1D).

Endogenous tagging confirms subcellular localizations inferred from proteomic data
To validate designated clusters, we successfully performed endogenous tagging with either
a V5 epitope or YFP on 12 proteins predicted or classified to various cell compartments,
which typically lack both annotation and homologs outside diplonemids (Fig. 2). Such
proteins were ultimately located to the flagella (Fig. 2A), cytoplasm (B,C), mitochondrion
(D,E), ER/Golgi (F), nucleus (G,H,I), nucleolus (J), endocytic membrane trafficking (K) and,
finally, the cell membrane (L), encompassing nine defined clusters in total (Table S4).
Mitochondrial proteins DIPPA_24150 and DIPPA_15120 co-localize with the organellar
DNA within this reticulated mitochondrion (Suppl. Fig. 3) at the cell periphery (Figs. 2D and
E). In turn, the tagged ER/Golgi candidate DIPPA_04811 shows a signal surrounding the
nuclear DNA, while also branching and extending into the cell posterior (Fig. 2F). Next, we
validated four proteins assigned to the nucleus, which show different sub-localizations by
immunofluorescence analysis (IFA) within this compartment (Figs. 2G-J). The first nuclear
candidate (DIPPA_16310) has a patchy distribution on the outermost periphery of the nuclear
DNA (Fig. 2G). Unlike the novel ER/Golgi protein (Fig. 2F), this candidate does not extend
beyond the nucleus, and hence likely constitutes a novel nuclear membrane component. A
second nuclear candidate (DIPPA_32825) co-localizes with the chromatin signal of the
nucleus (Fig. 2H), similar to the general nuclear signal of a third selected nuclear protein
(DIPPA_24937) (Fig. 2I). The last nuclear candidate (DIPPA_00315) displays a confined
distribution within the nucleus, corresponding to the nucleolus (Fig. 2J). Similarly, this
protein’s uncharacterized homolog within the kinetoplastid T. brucei (Tb927.3.2750) also
displays a nucleolus-like signal when tagged with green fluorescent protein (Billington, et al.
2023).
One protein (DIPPA_21158) classified to the ‘endocytic membrane trafficking’ compartment
seemingly exhibits dual localization, with an ER-like pattern similar to DIPPA_04811 (Fig.
2F), while also showing enrichment towards and encompassing the cell cytopharynx (Fig.
2K). Finally, a protein predicted to the cell membrane cluster (DIPPA_16504) (Fig. 2) shows
a signal enriched across the cell outline, excepting the apical papilla (Fig. 2L). This protein
possesses a signal peptide and a TMD, both of which are enriched for proteins predicted to
the cell membrane (Fig. 1C). This cell membrane cluster also exhibits an accumulation of
predicted signal peptides in tandem with glycosylphosphatidylinositol (GPI)-attachment
domains (Suppl. Fig. 4), further supporting the validity of this newly defined cluster.

Secreted CAZymes localize to the cell membrane and lysosomes
Carbohydrate-active enzymes (CAZymes) are particularly abundant in P. papillatum,
suggesting complex digestive capabilities against plant and algal cell wall carbohydrates
(Valach, Moreira, et al. 2023). Through our subcellular dataset, we show that a notable proportion
of CAZymes bearing signal peptides localize with high confidence to the cell
membrane and the lysosome (Fig. 3). Schematic diagrams of these cell membrane CAZymes
show the presence of a C-terminal TMD and/or GPI anchor sites, preceded by the catalytic
domains of associated enzymes. This topology indicates that the CAZyme domains are
exposed to extracellular space and thus expected to digest external carbohydrate substrates
(Fig. 3A). Enzymatic domains present include pectin esterase, pectin lyase and glycosyl
hydrolases, from which we construct a digestion pathway on the cell membrane to externally
degrade methylated pectin to galacturonic acid monomers (Fig. 3A). Some CAZymes of the
cell membrane lack the predicted TMDs or GPI anchors, such as glycosyl hydrolase, which
degrades hemicellulose to glucose, xylose and galactose. It remains possible that such
CAZymes are released into the extracellular space or simply lack identifiable motifs for cell
anchorage.
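The topological reasoning in the paragraph above can be written as a small decision rule. The sketch below is only an illustration under stated assumptions: the class, the function name and the output labels are hypothetical, the feature flags stand in for predictions from tools such as SignalP and DeepTMHMM, and the rule does not reproduce the analysis behind Fig. 3.

```python
from dataclasses import dataclass


@dataclass
class ProteinFeatures:
    """Predicted sequence features for one cell membrane cluster protein."""
    has_signal_peptide: bool
    has_c_terminal_tmd: bool
    has_gpi_anchor: bool


def infer_surface_topology(features):
    """Return a coarse, hypothetical topology label from predicted features."""
    if not features.has_signal_peptide:
        # Without a secretory signal, surface exposure is not supported.
        return "not supported"
    if features.has_c_terminal_tmd or features.has_gpi_anchor:
        # A C-terminal anchor places the preceding catalytic domain
        # on the extracellular side of the membrane.
        return "anchored, extracellular-facing"
    # Signal peptide but no detectable anchor: the enzyme may be
    # released, or its anchoring motif may have gone undetected.
    return "released or anchor undetected"


# Hypothetical examples mirroring the cases discussed in the text.
tmd_anchored = ProteinFeatures(True, True, False)
gpi_anchored = ProteinFeatures(True, False, True)
no_anchor = ProteinFeatures(True, False, False)
```

The third branch corresponds to the CAZymes that lack predicted TMDs or GPI anchors, for which the two explanations given in the text cannot be distinguished from sequence features alone.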
Candidate sugar transporters, recently identified through genome analysis (Valach, Moreira, et
al. 2023), were not localized to the cell membrane cluster, rather being assigned to the
ER/Golgi and glycosome compartments (Table S3). Thus, we propose that instead of being
passaged directly to the cytoplasm across the cell membrane, digested or partially digested
carbohydrate substrates are engulfed through the cytopharynx, leading to trafficking through
the endocytic vesicles, which have been observed prominently budding off from this
distinctive structure in diplonemids (Tashyreva, et al. 2023). Within the endocytic membrane
trafficking cluster of this dataset, we also identified one secretory CAZyme (Fig. 3B).
Endocytosed contents are typically passaged to the lysosomal compartments, for which we
also define a corresponding cluster of soluble proteins containing numerous signal
peptide-bearing CAZymes, with the ability to digest various forms of pectin and other
carbohydrates, such as sucrose and glycosides (Fig. 3C). We additionally predict one sugar
transporter (DIPPA_16016.mRNA.1) to the multivesicular membrane trafficking body,
enriched for V-type ATPases and other membranous components of the lysosome, suggesting
eventual saccharide transport from these organelles to the cytosol and possibly other
compartments.
Given that these analyzed cells were grown in the protein-rich ‘Diplo’ medium, we did not
necessarily expect an abundance of CAZymes in our extractions. Nonetheless, we detected a
total of 94 different enzymes across our subcellular dataset, 55 of which were not recorded in
previous studies (Table S5) (Škodová-Sveráková, et al. 2021; Valach, Moreira, et al. 2023).
The proteomic presence of these enzymes in a mostly carbohydrate-depleted medium
suggests that most CAZymes are permanently expressed regardless of substrate availability.
We further note that in previous cultivation studies, the lysosomal CAZymes identified in this
study showed conditional enrichment, while the newly identified CAZymes of the cell
membrane do not change in abundance across different conditions or media (Fig. 1D)
(Škodová-Sveráková, et al. 2021). Such constitutive presence supports recent suggestions that
plants and algae are the primary food source of P. papillatum in nature, potentially making
use of both carbohydrates on the external cell walls, as well as the internal proteinaceous
energy sources (Valach, Moreira, et al. 2023).
The soluble lysosome contains a chitinase (Fig. 3C), while in the endocytic trafficking
compartment we documented a complementary glucuromannan-digesting GH92, which
together suggest a proclivity for fungal cell wall digestion (Fig. 3B). The only
observation of P. papillatum behavior in natura comes from its initial isolation
from drifting eelgrass (Porter 1973), a plant that is known to harbor various fungal
cohabitants on its surface (Newell 1981). This documented enzymatic sub-localization
appears consistent with such a supposition for varied sources of prey. Interestingly, a single
CAZyme member, xylan-α-glucuronidase, is predicted with high confidence to the cytosol,
despite the presence of an N-terminal signal sequence. This enzyme is additionally predicted
to have been acquired via horizontal gene transfer from a bacterial endosymbiont, for which
diplonemids have shown a propensity for acquisition (George, et al. 2022; Tashyreva,
Votýpka, et al. 2025), though such endosymbionts are absent from extant P. papillatum.

Subcellular distribution of glycolysis/gluconeogenesis enzymes reveals novel glycosomal insights for P. papillatum
Diplonemids and their sister lineage, the mostly parasitic kinetoplastids, are categorized as
glycomonads due to their shared compartmentalization of part of their glycolytic pathways in
specialized peroxisomes called glycosomes (Michels, et al. 2006), yet the extent to which
these organelles retain the same function in both lineages is unclear. Kinetoplastids localize
the first seven steps of glycolysis to glycosomes (Opperdoes and Michels 1993), while in
diplonemids peroxisomal targeting signals (PTS) are predicted for enzymes of six steps,
suggesting a similar metabolic arrangement, although only five enzymes have been
experimentally confirmed to colocalize with known peroxisomal proteins (Makiuchi, et al.
2011; Morales, et al. 2016). Here, we use our subcellular proteomics dataset to partly confirm
and expand on previous analyses (Fig. 4). As our glycosome cluster showed fractional similarity
with other organelle clusters, we could assign only two enzymatic steps to this organelle, one of
which (step III) represents a newly described designation in P. papillatum (Fig. 4A).
However, we confirmed the cytosolic localization of four enzymes: glyceraldehyde
3-phosphate dehydrogenase (GAPDH, step VI), phosphoglycerate mutase (PGAM, step VIII),
enolase (step IX) and pyruvate kinase (step Xa) (Morales, et al. 2016).
Certain enzymes were not detected in previous investigations; thus, it came as a surprise that
we detected in the glycosome both phosphofructokinase (PFK) and fructose
1,6-bisphosphatase (FBP), which typically participate in glycolysis (step IIIa) and gluconeogenesis
(step IIIb), respectively (Fig. 4A; Suppl. Fig. 5). Their localization demonstrates a capacity for
this organelle to mediate both directions of this pathway (Fig. 4A). The genome of P.
papillatum encodes two PFKs (Morales, et al. 2016), with PFK1 (DIPPA_21987) being a
PPi-dependent variant horizontally acquired from a bacterium (Škodová-Sveráková, et al.
2021), which is typically able to function in an ATP-poor environment. PFK1 also shows the
potential to engage in gluconeogenesis (Škodová-Sveráková, et al. 2021), which along with
FBP further supports the capacity of P. papillatum ‘glycosomes’ to perform steps of
gluconeogenesis. We further note the prediction of a TMD in PFK1, the presence of which
represents an unusual feature for enzymes of this pathway (Fig. 4A), though not without
precedent (Jirsová, et al. 2025). We propose that the N-terminal TMD allows insertion of the
enzyme from within the glycosome, exposing its enzymatic domains to the organellar lumen
(Suppl. Fig. 6A). While previous transcriptome analysis recorded an additional PTS1-lacking
PFK with presumable cytosolic residence (Škodová-Sveráková, et al. 2021), a survey of the
now complete genome confirmed only the presence of PFKs furnished with PTS (Valach,
Moreira, et al. 2023).
One copy (DIPPA_70192) of fructose-bisphosphate aldolase (FBA, step IV) was previously
localized to the glycosomes and indeed shows a corresponding fractional pattern in our
dataset (Fig. 4A; Suppl. Fig. 5). A second FBA (DIPPA_30805), also bearing a PTS2 motif,
displays a more subdued profile with less similarity to the cytosolic and glycosomal
fractional profiles (Suppl. Fig. 6B). We interpret this as co-localization in both compartments,
a phenomenon described for several peroxisomal proteins across eukaryotes (Freitag, et al.
2018). Moreover, while in ‘Hemi’ media, the PTS1-bearing copies of G6P and TIM have
been localized to the glycosomes (Morales, et al. 2016), in our dataset they occupy an
ambiguous position that similarly implies a dual localization, which contrasts with their
confidently placed cytosolic counterparts that lack a PTS (Fig. 4A; Suppl. Fig. 6C,D).
We additionally localized multiple copies of PTS-lacking GAPDH distributed to either the
cell membrane or the mitochondrion, as described elsewhere (Bártulos, et al. 2018). We
demonstrate a convincing cytosolic localization for four additional paralogues of enzymes
lacking a PTS, namely glucose 6-phosphate isomerase (G6P, step II), triosephosphate
isomerase (TIM, step V), phosphoglycerate kinase (PGK, step VII) and PGAM (Fig. 4A).
While such cytosolic localizations may facilitate the glycolytic processing from
glyceraldehyde-3-phosphate to pyruvate, they alternatively reveal the potential for partial
cytosolic gluconeogenesis initiated by processing oxaloacetate to phosphoenolpyruvate via
cytosol-localized PEP carboxykinase (Fig. 4A). Along with other reversible steps of
glycolysis, proteomic enrichment was reported for this enzyme from cells grown in the
glucose-depleted ‘Hemi’ medium (Škodová-Sveráková, et al. 2021), reflecting an increased
use of gluconeogenesis under such conditions (Table S6).
Metabolically adjacent to glycolysis and gluconeogenesis is the pentose phosphate pathway
(PPP), which facilitates the interconversion of simple carbohydrates of different sizes (Fig.
4B). In kinetoplastids, several PPP enzymes possess a PTS, producing a glycosomal or dual
glycosomal and cytosolic localization (Kovárová and Barrett 2016). However, P. papillatum
encodes only a single PTS1-possessing enzyme, phosphogluconolactonase (step II) (Fig. 4B).
Despite its targeting signal, our dataset suggests the enzyme localizes to the cytoplasm
(Suppl. Fig. 6E), demonstrating that, similar to the localizations of certain proteins in T.
brucei (Güther, et al. 2014), the presence of a PTS does not guarantee peroxisomal targeting
(Fig. 4B,C). While ribulose-5-phosphate 3-epimerase (step IV of PPP) and transaldolase
(step VI of PPP) are classified to the soluble lysosome, considering the fractional similarity
of this cluster to that of the cytosol, we regard them as cytosolic (Fig. 4A-C; Suppl. Fig. 5).
By contrast, a copy of glucose-6-phosphate dehydrogenase (step I of PPP) shows a
fractionation profile consistent with that of the endocytic membrane trafficking cluster, which
warrants future investigation (Fig. 4B,C; Suppl. Fig. 6F).
In summary, our localization of individual steps for glycolysis/gluconeogenesis supports the
hypothesis that diplonemids separated from kinetoplastids prior to the complete transfer of
the first seven steps of these pathways into the glycosomes (Morales, et al. 2016), leaving
step VI in the cytosol for diplonemids. Moreover, in these flagellates the cell
membrane-embedded versions of GAPDH underline a compartmentalization of this enzyme distinct from
that of its homologs in kinetoplastids (Moloney, et al. 2023). A further distinction is
represented by PEP carboxykinase, which in P. papillatum remains cytosolic and lacks a PTS,
its glycosomal compartmentalization having evolved only secondarily in the kinetoplastid clade.
The presence of a PTS1 in just a single PPP enzyme likely signifies a remnant of the ancestral
trend in glycomonads towards compartmentalization of this pathway which, unlike in
kinetoplastids, did not continue to progress in diplonemids.

Versatile amino acid digestive capabilities within the mitochondrion
Gluconeogenesis in P. papillatum is presumably supplied by substrates from amino acid
(AA) catabolism (Morales, et al. 2016), which we endeavored to resolve with our subcellular
dataset. Accordingly, we show this protist’s capacity to digest a broad range of AAs,
primarily within the mitochondrion, reminiscent of similar capabilities demonstrated in the
mitochondrion of its fellow euglenozoan, Euglena gracilis (Hammond, et al. 2020) (Suppl.
Fig. 7). Ultimately, the mitochondrion of P. papillatum appears capable of metabolizing
arginine, aspartate, histidine, glutamate, glycine, isoleucine, leucine, proline, serine, threonine
and valine into metabolites that can directly feed into the tricarboxylic acid (TCA) cycle, as
well as glutamine and cysteine with initial cytosolic processing (Suppl. Fig. 7). We
additionally reveal the ability of diplonemids to process the fatty acid propanoate to
propanoyl-CoA, which allows incorporation into AA intermediate-processing pathways
within the mitochondrion, representing a functional distinction from kinetoplastids (Suppl.
Fig. 7).

Conclusions
Previous work has demonstrated the global distribution, relative importance, abundance, and
diversity of marine diplonemids (Tashyreva, et al. 2022), underscoring the value in clarifying
their ecological roles and biology. Only recently has P. papillatum emerged as a genetically
tractable species (Faktorová, Kaur, et al. 2020), opening the entire clade to inquiry via
cellular and molecular methods. Our subcellular proteomics dataset is complementary to
these efforts and provides a pathway towards hypothesis-driven research, thereby accelerating
our understanding of these ecologically and evolutionarily important protists (Valach, Benz, et
al. 2023; Benz, et al. 2024; Akiyoshi, et al. 2025; Záhonová, et al. 2025). In total, our data
enabled us to localize thousands of proteins to 22 distinct subcellular compartments in P.
papillatum. The confidence of our data is strengthened by the endogenous tagging of selected
proteins.
From this wealth of data, we focused specifically on the confidently predicted cluster of cell
membrane proteins. In this cluster, we identified an expanded family of CAZymes,
supporting recent predictions that P. papillatum primarily preys on plants and algae by
degrading their cell walls. CAZymes were also localized to the lysosome, further suggesting
active ingestion of complex carbohydrates. The fact that we supplied P. papillatum with
protein-rich, carbohydrate-limited media raises an intriguing question for future analysis:
why are CAZymes expressed in the absence of carbohydrates? We speculate that they are
produced in anticipation of interacting with these substrates, and hope that in natura studies
may now be used to definitively clarify the ecological role for this and other diplonemids.
In conclusion, we have sub-localized thousands of proteins in a model species representing a 357
major protist group. Given the scarcity of available marine protists that are genetically 358
tractable and can be investigated with relative ease (Faktorová, Nisbet, et al. 2020), our data 359
provide a novel and rich resource to explore diplonemids’ unique cell biology and to map 360
ancestral traits in this free-living heterotrophic flagellate. 361
362
Materials and methods

Key Resource Table
REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Rabbit anti-ATP Synthase-β | Zíková et al. (Šubrtová, et al. 2015) |
Mouse anti-GRP 75 | Enzo | Cat# SPS-826D, RRID:AB_2120451 (https://www.antibodyregistry.org/AB_2120451)
GRP 78 | Novus | Cat# NB100-56413, RRID:AB_838320 (https://www.antibodyregistry.org/AB_838320)
Goat anti-Rabbit IgG (H+L) Secondary Antibody, HRP | Invitrogen | Catalog# 31460 (https://www.thermofisher.com/antibody/product/Goat-anti-Rabbit-IgG-H-L-Secondary-Antibody-Polyclonal/31460)
Goat anti-Mouse IgG (H+L) Secondary Antibody, HRP | Invitrogen | Catalog# 31430 (https://www.thermofisher.com/antibody/product/Goat-anti-Mouse-IgG-H-L-Secondary-Antibody-Polyclonal/31430)
Mouse anti-V5 Monoclonal Antibody (2F11F7) | Invitrogen | Catalog# 37-7500-A555, RRID:AB_2610631 (https://www.antibodyregistry.org/AB_2610631)
Rabbit anti-V5 Polyclonal Antibody | Sigma-Aldrich | Catalog# V8137, RRID:AB_261889 (https://www.antibodyregistry.org/AB_261889)
Goat anti-Mouse IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor™ 555 | Invitrogen | Catalog# A-21422 (https://www.thermofisher.com/antibody/product/Goat-anti-Mouse-IgG-H-L-Cross-Adsorbed-Secondary-Antibody-Polyclonal/A-21422)
Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor™ 488 | Invitrogen | Catalog# A-11008 (https://www.thermofisher.com/antibody/product/Goat-anti-Rabbit-IgG-H-L-Cross-Adsorbed-Secondary-Antibody-Polyclonal/A-11008)
Chemicals, peptides, and recombinant proteins
Critical commercial assays
Pierce™ Dilution-Free™ Rapid Gold BCA Protein Assay | Thermo Scientific | Catalog# A55860
Pierce™ Quantitative Peptide Assays & Standards | Thermo Scientific | Catalog# 23290
Deposited data
Raw peptide data | PRIDE | XXXX
For protein predictions and annotations see Table S3
Experimental models: Cell lines
Paradiplonema papillatum | Porter (Porter 1973) | ATCC50162
Experimental models: Organisms/strains
For cell lines generated for this study see Table S4
Oligonucleotides
For primers used in this study see Suppl. File 2.
Recombinant DNA
pBA3294 vector | Akiyoshi et al. (Akiyoshi, et al. 2025) |
pDP011 vector | GenBank | OQ547858
Software and algorithms
Microsoft Excel | Microsoft | https://www.microsoft.com
R and RStudio | RStudio | https://posit.co/download/rstudio-desktop/
pRoloc | Crook et al. (Crook, et al. 2019) |
Fiji (ImageJ) | Fiji | https://fiji.sc/
SignalP 6.0 | Teufel et al. (Teufel, et al. 2022) | https://services.healthtech.dtu.dk/services/SignalP-6.0/
TargetP 2.0 | Armenteros et al. (Armenteros, et al. 2019) | https://services.healthtech.dtu.dk/services/TargetP-2.0/
DeepTMHMM | Hallgren et al. (Hallgren, et al. 2022) | https://services.healthtech.dtu.dk/services/DeepTMHMM-1.0/
DeepLoc 2.1 | Odum et al. (Odum, et al. 2024) | https://services.healthtech.dtu.dk/services/DeepLoc-2.1/
NetGPI 1.1 | Gíslason et al. (Gíslason, et al. 2021) | https://services.healthtech.dtu.dk/services/NetGPI-1.1/
GhostKOALA | Kanehisa et al. (Kanehisa, et al. 2016) | https://www.kegg.jp/ghostkoala/

Resource availability
Further information and requests for reagents should be directed to and will be fulfilled by
the lead contact, Michael Hammond ([email protected]).

Materials availability
Vectors and novel cell lines generated for this study are available from the lead contact upon
request.

Experimental model and study participant details
P. papillatum (ATCC50162) served as the cell line for both proteomic analysis and cell line
generation.

Strain, culture conditions and preparation for lysis
Paradiplonema papillatum (ATCC50162) cells were cultivated axenically in Diplo media (36
g/L sea salts [Sigma], with 1 g Tryptone and 10 ml Fetal Bovine Serum [Sigma], 0.22 µm
filter sterilized) at 22°C. Cultures were harvested at ~2.5 x 10^6 cells/ml in a combined
volume of 750 ml per sample and processed in triplicate. Cells were
concentrated by centrifugation (900 x g for 10 min) and pellets were resuspended in 6 ml
detergent-free lysis buffer (0.25 M sucrose, 10 mM HEPES, pH 7.4, 2 mM EDTA, 2 mM
Mg(OAc)2) with Halt™ Protease and Phosphatase Inhibitor Cocktail, pre-chilled to 4°C.
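
As a rough illustration of the scale of each sample, the harvest figures above can be turned into an approximate cell number and an approximate cell density in the lysis buffer. This is only a sketch: the `recovery` parameter and the function names are hypothetical, and complete recovery of cells after pelleting is assumed by default.

```python
# Values taken from the culture description above.
CELL_DENSITY_PER_ML = 2.5e6   # approximate density at harvest (cells/ml)
CULTURE_VOLUME_ML = 750.0     # combined culture volume per sample (ml)
LYSIS_VOLUME_ML = 6.0         # lysis buffer used for resuspension (ml)


def cells_per_sample(density_per_ml, volume_ml):
    """Approximate number of cells harvested for one sample."""
    return density_per_ml * volume_ml


def density_after_resuspension(total_cells, buffer_ml, recovery=1.0):
    """Approximate cell density in lysis buffer.

    `recovery` is a hypothetical fraction of cells retained after
    centrifugation; 1.0 assumes no loss.
    """
    return total_cells * recovery / buffer_ml


total = cells_per_sample(CELL_DENSITY_PER_ML, CULTURE_VOLUME_ML)
conc = density_after_resuspension(total, LYSIS_VOLUME_ML)
print(f"{total:.3e} cells per sample")
print(f"{conc:.3e} cells/ml in lysis buffer")
```

Under these assumptions each sample corresponds to roughly 1.9 x 10^9 cells, concentrated about 125-fold before lysis.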

Cell lysis and fractionation
Cell suspensions were lysed by nitrogen cavitation at 250 psi for 10 min (Parr 4639, Parr
Instrument Co.). The lysate was gently released from the chamber to minimize foaming, and the
collected sample underwent differential centrifugation following a previously established
protocol (Geladaki, et al. 2019). Briefly, the lysate was centrifuged at the speeds listed in Table
1. After each spin, the supernatant was transferred to fresh 2 ml centrifuge tubes and subjected to
the next centrifugation step, while the pellet was stored at -80°C after
collection, as was the supernatant fraction from the final spin. Pelleted cell lysate was
additionally collected and stored for proteomic analysis.

Table 1: Fractionation protocol for centrifugation as used in this study, adapted from the LOPIT-DC
protocol (Geladaki, et al. 2019).
Fraction      Centrifuge speed (x g)   Spin time (min)
Cell Lysate   200                      5
1             1,000                    10
2             3,000                    10
3             5,000                    10
4             9,000                    15
5             12,000                   15
6             15,000                   15
7             30,000                   20
8             79,000                   43
9             120,000                  45
Supernatant   NA                       NA

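
The differential centrifugation scheme in Table 1 can also be written down as a small data structure, which is convenient for checking the spin order or for labelling fractions in downstream analysis. The sketch below copies the values from Table 1; the names `Spin`, `SCHEME`, `total_spin_minutes`, and `speeds_are_increasing` are illustrative and not part of the published protocol.

```python
from typing import NamedTuple, Optional


class Spin(NamedTuple):
    fraction: str
    speed_xg: Optional[int]   # None for the final supernatant (no spin)
    minutes: Optional[int]


# Values copied from Table 1.
SCHEME = [
    Spin("Cell Lysate", 200, 5),
    Spin("1", 1_000, 10),
    Spin("2", 3_000, 10),
    Spin("3", 5_000, 10),
    Spin("4", 9_000, 15),
    Spin("5", 12_000, 15),
    Spin("6", 15_000, 15),
    Spin("7", 30_000, 20),
    Spin("8", 79_000, 43),
    Spin("9", 120_000, 45),
    Spin("Supernatant", None, None),
]


def total_spin_minutes(scheme):
    """Sum of spin times, ignoring the supernatant row."""
    return sum(s.minutes for s in scheme if s.minutes is not None)


def speeds_are_increasing(scheme):
    """True if every spin is faster than the one before it."""
    speeds = [s.speed_xg for s in scheme if s.speed_xg is not None]
    return all(a < b for a, b in zip(speeds, speeds[1:]))


print(len(SCHEME), "fractions;", total_spin_minutes(SCHEME), "min of centrifugation")
```

The 11 rows correspond to the 11 fractions collected per replicate.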

Fractional assessment
To assess the distribution and enrichment of proteins across P. papillatum fractions,
immunoblotting was performed using antibodies against ATP synthase subunit β (kindly
provided by A. Zíková) (Šubrtová, et al. 2015), Grp75 (Enzo) (Joseph, et al. 2013) and Grp78
(Novus) (Chou, et al. 2020). Pellets were resuspended in 2x Laemmli buffer (0.125 M
Tris-HCl, pH 6.8, 4% SDS, 20% glycerol, 0.004% bromophenol blue) without DTT. Fractional
samples were quantified using the Pierce™ BCA Protein Assay (Thermo Fisher Sci.).
For each sample, 10 µg of protein was loaded onto an SDS-PAGE gel (Invitrogen Bolt Bis-Tris Plus Mini
Protein Gels, 4-12%, 1.0 mm, WedgeWell™ format) along with a protein marker
(Amersham™ ECL™ Rainbow™ Marker - Full Range). The gel was run for 1 hour at 130 V,
briefly washed in 1x PBS buffer, and transferred onto a methanol-activated PVDF membrane
(iBlot™ 2 Transfer Stacks, PVDF, Invitrogen) using the iBlot 2 Western Blot Transfer Device
(Invitrogen). The membrane was blocked in 5% non-fat dry milk in 1x PBS buffer for 1
hour at room temperature, followed by incubation with the relevant antibodies diluted (1:10000
for ATP-β, and 1:1000 for Grp75 and Grp78) in blocking solution (5% non-fat dry milk in 1x
PBS buffer). Blots were incubated at room temperature for 1 hour, then overnight at 4°C. The
following day, blots were washed three times for 10 min each in 1x PBS, probed with
HRP-linked secondary antibodies (31460/31430, Invitrogen) diluted 1:1000 in blocking solution
for 1 hour at room temperature, and rinsed again three times for 10 min each in 1x PBS-T.
Detection was performed using the Pierce ECL Western Blotting Substrate (Thermo Fisher
Sci.), and imaging was conducted with the Azure 600 (Azure Biosystems).

Sample preparation and LC-MS Analysis
Native protein pellets obtained from differential centrifugation were digested and desalted
following the protocol for the S-Trap Micro Column (ProtiFi, USA). Protein concentration
was quantified using the BCA assay (Thermo Fisher Sci.), while peptide concentration was
measured using a fluorometric kit (Thermo Fisher Sci.).

Liquid-chromatography tandem mass spectrometry
LC-MS/MS analyses were performed at the Biosciences Mass Spectrometry Core Facility,
Arizona State University. Data-dependent mass spectra were collected in positive mode using
an Orbitrap Fusion Lumos mass spectrometer coupled with an UltiMate 3000 UHPLC
(Thermo Fisher Sci.). Peptides were fractionated on an Easy-Spray LC column (50 cm × 75
μm ID, PepMap C18, 2 μm, 100 Å) with an upstream trap column. Each sample was
analyzed in technical triplicate. LC-MS settings were: electrospray potential 1.6 kV, ion transfer
tube temperature 300°C, and the “Universal” peptide analysis method. Full MS scans
(375–1500 m/z) were acquired at a resolution of 120,000 with 3 s cycles. The RF lens was set
to 30%, AGC to “Standard,” and monoisotopic peak determination included charge states
2–7. Dynamic exclusion was 60 s with a 10 ppm mass tolerance. MS/MS spectra were
acquired in centroid mode with a quadrupole isolation window of 1.6 m/z and CID energy of
35%. Peptides were eluted over a 240-min gradient at 0.25 µL/min using 2–80%
acetonitrile/water: 0–3 min (2%), 3–75 min (2–15%), 75–180 min (15–30%), 180–220 min
(30–35%), 220–225 min (35–80%), 225–240 min (80–85%).
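
The 240-min elution gradient listed above is piecewise linear, so the nominal acetonitrile percentage at any time point follows by interpolation within the relevant segment. The sketch below copies the segment boundaries from the text; the function name `acetonitrile_percent` and the assumption of strictly linear ramps within each segment are ours.

```python
# (start_min, end_min, start_percent, end_percent), copied from the text.
GRADIENT = [
    (0, 3, 2, 2),
    (3, 75, 2, 15),
    (75, 180, 15, 30),
    (180, 220, 30, 35),
    (220, 225, 35, 80),
    (225, 240, 80, 85),
]


def acetonitrile_percent(t_min, gradient=GRADIENT):
    """Nominal % acetonitrile at time t_min, assuming linear ramps."""
    if t_min < gradient[0][0] or t_min > gradient[-1][1]:
        raise ValueError("time is outside the gradient")
    for start, end, pct0, pct1 in gradient:
        if start <= t_min <= end:
            fraction = (t_min - start) / (end - start)
            return pct0 + fraction * (pct1 - pct0)


for t in (0, 39, 222.5, 240):
    print(t, "min ->", acetonitrile_percent(t), "% acetonitrile")
```

Such a helper is mainly useful for relating a peptide's retention time to the approximate solvent composition at elution.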

LC-MS/MS analysis of the digested peptides was performed on an EASY-nLC 1200 (Thermo
Fisher Sci.) coupled to an Orbitrap Eclipse Tribrid mass spectrometer (Thermo Fisher Sci.).
Peptides were separated on an Aurora UHPLC column (25 cm × 75 µm, 1.6 µm C18,
AUR2-25075C18A, Ion Opticks) at a flow rate of 0.35 µL/min for a total duration of 135 min and
ionized at 1.6 kV in the positive ion mode. The gradient was composed of 2% solvent B (5
min), 2–6% B (7.5 min), 6–25% B (82.5 min), 25–40% B (30 min), 40–98% B (1 min) and
98% B (15 min); solvent A: 2% ACN and 0.2% FA in water; solvent B: 80% ACN and 0.2%
FA. MS1 scans were acquired at a resolution of 120,000 from 350 to 1,600 m/z, with an AGC target
of 1e6 and a maximum injection time of 50 ms. MS2 scans were acquired in the ion trap using a fast
scan rate on precursors with 2–7 charge states and quadrupole isolation mode (isolation
window: 0.7 m/z) with higher-energy collisional dissociation (HCD, 30%) activation type.
Dynamic exclusion was set to 30 s. The temperature of the ion transfer tube was 300°C and the
S-lens RF level was set to 30.

Raw data processing and quantification
The label-free quantification (LFQ) analysis was performed using Proteome Discoverer 2.4 (Thermo Fisher Sci.) based
on a composite database comprising the predicted proteome of P. papillatum and its mitochondrial ORFs.
Raw files were searched with SequestHT using trypsin as the enzyme, allowing up to three
missed cleavages. Peptide length was set to 6–144 amino acids, with precursor ion mass
tolerance at 20 ppm, fragment mass tolerance at 0.5 Da, and a minimum of one peptide
identified. Carbamidomethyl (C) was a fixed modification, while Acetyl (N-terminus),
Met-loss (N-terminus), and oxidation of Met were dynamic modifications. A target/decoy strategy
was used, and a 1.0% FDR was calculated using Percolator. Data were imported into Proteome
Discoverer 2.4, and features were detected using the Minora Feature Detector algorithm. The
area-under-the-curve for aligned ion chromatograms was calculated to determine relative
abundances. The RAW data have been deposited to the ProteomeXchange Consortium via the
PRIDE partner repository with the dataset identifier XXXXXX.
Proteins and their corresponding LFQ abundance values were imported into the R
programming language and converted into an MSnSet object using the Bioconductor packages
MSnbase (v 2.24.2) and pRoloc (v 1.38.2) (Crook, et al. 2019). The data were examined and
proteins with low confidence (PSM < 3 and without unique peptides) were filtered out.
Triplicates were averaged to generate a 33-dimensional dataset of relative protein
abundance. The dataset was split into its respective experiments (i.e., 1-11, 12-22, 23-33)
to perform hybrid imputation and sum-normalization across rows.
Missing data were imputed first by nearest-neighbor averaging and then by imputing zeros for
all remaining empty cells. Principal component analysis and t-distributed Stochastic
Neighbor Embedding (t-SNE) were applied for dimensional reduction and data visualization.
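
To make the per-experiment normalization concrete, the sketch below splits one protein's 33-value abundance profile into three blocks of 11, fills remaining gaps with zeros, and sum-normalizes each block so that it adds up to 1. It is written in plain Python rather than with the R packages used in the study, the function names are hypothetical, and the nearest-neighbor averaging step that precedes zero-filling is deliberately left out for brevity.

```python
def split_experiments(row, n_experiments=3):
    """Split a profile into equally sized per-experiment blocks."""
    size = len(row) // n_experiments
    return [row[i * size:(i + 1) * size] for i in range(n_experiments)]


def zero_fill(values):
    """Replace missing values (None) with 0.0."""
    return [0.0 if v is None else float(v) for v in values]


def sum_normalize(values):
    """Scale values so they sum to 1; all-zero blocks are left unchanged."""
    total = sum(values)
    if total == 0:
        return list(values)
    return [v / total for v in values]


def process_row(row):
    """Zero-fill and sum-normalize each experiment block of one protein."""
    out = []
    for block in split_experiments(row):
        out.extend(sum_normalize(zero_fill(block)))
    return out


# Toy profile: 3 experiments x 11 fractions, with some missing values.
profile = [1.0] * 11 + [None] * 11 + [2.0, None] + [0.0] * 9
normalized = process_row(profile)
print(normalized[:3], normalized[22:24])
```

Normalizing within each experiment keeps the three replicate profiles on the same scale before they are concatenated for classification.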

Supervised and unsupervised classification
A total of 268 manually curated marker proteins (Table S1) were used as the training set for a support
vector machine (SVM) model with the ‘svmOptimization’ and ‘svmClassification’ functions
in the pRoloc package. Initially, 100 rounds of five-fold cross-validation were performed to
optimize the SVM parameters based on the marker protein abundance profiles. The optimal
parameters for the SVM classifier were then applied to all proteins in the dataset, each receiving a
corresponding SVM score ranging from 0 to 1, with 1 being the score of marker proteins. The
SVM classifier was then applied to unlabeled data (i.e., non-marker proteins) with
corresponding weights applied to each marker class. Each protein was thus classified to one
compartment, and any protein whose classification fell below the global median SVM score
was reset to ‘unknown’, while the other half of the dataset was considered ‘predicted’ to its
corresponding compartment owing to its higher SVM score (Table S3).
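
The median cutoff described above amounts to a simple post-processing step on the classifier output. The following sketch assumes the predictions are available as (protein, compartment, score) tuples for the non-marker proteins; the function name, the input layout, and the choice to keep proteins scoring exactly at the median are assumptions made for illustration, not details taken from the study's R code.

```python
from statistics import median


def apply_median_cutoff(predictions):
    """Reset low-scoring assignments to 'unknown'.

    predictions: list of (protein_id, compartment, svm_score) tuples.
    Proteins scoring below the global median become 'unknown'.
    """
    cutoff = median(score for _, _, score in predictions)
    assignments = {}
    for protein_id, compartment, score in predictions:
        assignments[protein_id] = compartment if score >= cutoff else "unknown"
    return cutoff, assignments


# Toy example with made-up identifiers and scores.
toy = [
    ("p1", "mitochondrion", 0.9),
    ("p2", "ER", 0.2),
    ("p3", "nucleus", 0.5),
    ("p4", "Golgi", 0.4),
]
cutoff, assignments = apply_median_cutoff(toy)
print(cutoff, assignments)
```

A global median is a deliberately conservative choice: by construction about half of the classified proteins remain unassigned.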

Unsupervised clustering was performed using the K-means (KM) algorithm implemented in
the MLearn function from the MLInterfaces package in RStudio (version 1.78.0). KM
initializes k random centroids and iteratively assigns each data point to its nearest centroid, so that all
data points are included in one of the k clusters and the within-cluster variance is minimized.
K-means clustering was run with 22 clusters, corresponding to the number of marker groups
(Table S3).
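
For readers unfamiliar with K-means, the assignment and update steps can be sketched in a few lines of plain Python. This is a generic illustration of Lloyd's algorithm, not the MLInterfaces implementation used in the study; the function names, the seeded random initialization, and the toy data are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
    return labels, centroids


# Toy data: two well-separated groups of 2D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels)
```

In the study the same idea is applied to the 33-dimensional abundance profiles with k = 22.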

Targeting signal prediction, annotation, and conditional enrichment analysis
The P. papillatum protein database was annotated via BLAST search against the CDS of the parasitic
kinetoplastid Trypanosoma brucei 927 (v66) and the free-living Bodo saltans (v66)
(https://tritrypdb.org/tritrypdb/app) as well as baker’s yeast Saccharomyces cerevisiae
(559292) (https://blast.ncbi.nlm.nih.gov/Blast.cgi), with an E-value threshold of 1E-5. Metabolic
pathway analysis was also performed via GhostKOALA (Kanehisa, et al. 2016).
SignalP version 6.0 was used for the prediction of signal peptides, using a confidence
threshold of >0.9 (Fig. 1C) (Teufel, et al. 2022), with NetGPI 1.1 additionally used on this
subset to identify proteins that also possessed predicted C-terminal GPI anchors
(Gíslason, et al. 2021) (Table S3). TargetP 2.0 was used for prediction of mitochondrial
targeting peptides (Armenteros, et al. 2019), with DeepTMHMM (Hallgren, et al. 2022) used for
predictions of transmembrane domains (TMD) (Fig. 1C) (Table S3). Peroxisomal targeting signal prediction was conducted
using a custom regex script designed by Prof. Fred Opperdoes against a broad range of amino acid
combinations, with PTS1 determined by the pattern [SAGCNP][RHKSNQ][LIVFAMY]$ and
PTS2 by ^M.{1,10}[RK][LVI].....[HQ][ILA] (Table S3); hits were then manually
inspected for specific enzymes of relevance (Tables S5-7). DeepLoc 2.1 was additionally used
to assess protein localization predictions and membranous status (Odum, et al. 2024) (Table
S3).
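
The two peroxisomal targeting patterns can be applied to protein sequences with any regular expression engine. The sketch below is not the original script: it assumes that the PTS2 pattern allows 1 to 10 residues between the initial methionine and the [RK][LVI] motif (written here as `.{1,10}`), and the function name and toy sequences are invented for illustration.

```python
import re

# PTS1: C-terminal tripeptide, as given in the text.
PTS1 = re.compile(r"[SAGCNP][RHKSNQ][LIVFAMY]$")
# PTS2: N-terminal nonapeptide; the {1,10} spacer is our reading of the text.
PTS2 = re.compile(r"^M.{1,10}[RK][LVI].{5}[HQ][ILA]")


def peroxisomal_signals(sequence):
    """Report which candidate targeting signals a protein sequence carries."""
    seq = sequence.strip().upper().rstrip("*")  # drop a trailing stop symbol
    return {
        "PTS1": bool(PTS1.search(seq)),
        "PTS2": bool(PTS2.search(seq)),
    }


# Made-up sequences, not taken from the P. papillatum proteome.
print(peroxisomal_signals("MGGGGGGSKL"))       # ends in the tripeptide SKL
print(peroxisomal_signals("MAARLAAAAAHLGGG"))  # N-terminal RL-x5-HL motif
print(peroxisomal_signals("MAAAAAAAW"))        # neither signal
```

Patterns this permissive produce many false positives, which is consistent with the manual inspection of hits described above.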

Protein enrichment data for media and conditional cultivation (Škodová-Sveráková, et al.
2021) were displayed across the dataset, including proteins that displayed enrichment status of
any capacity (Fig. 1D) (Table S3).

Endogenous tagging and P. papillatum microscopy
Cell lines carrying endogenous C-terminal tags on 12 proteins from supervised
protein clusters were generated to verify predictions (Table S4).
Proteins DIPPA_11651.mRNA.1, DIPPA_15120.mRNA.1, DIPPA_04811.mRNA.1,
DIPPA_32825.mRNA.1, and DIPPA_00315.mRNA.1 were tagged with yellow fluorescent
protein using vector pBA3294 (Akiyoshi, et al. 2025). The PacI and AscI restriction sites of
pBA3294 were used to insert two ~2 kb homology arms that were amplified from genomic
DNA by PCR using KOD One polymerase (Merck). Primer sequences are provided in Suppl.
File 2. The first fragment corresponds to the region downstream of the gene ORF (starting just after its
stop codon), flanked by PacI and NotI restriction sites, while the second fragment
corresponds to the 2 kb DNA fragment starting 2 kb upstream of the stop codon and
ending just before the stop codon, flanked by NotI and AscI sites. After cutting the fragments
with the respective restriction enzymes, the two DNA fragments were ligated into pBA3294 that
had been cut with PacI and AscI. Plasmids were validated by nanopore whole-plasmid sequencing
(Plasmidsaurus). Tagging constructs were linearized with NotI, transfected into P. papillatum
cells by electroporation, and selected by the addition of 75 µg/mL G418.
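
The coordinates of the two homology arms follow directly from the position of the stop codon. As a minimal sketch, assuming the locus is available as a plain DNA string on the coding strand and that `stop_start` is the 0-based index of the first base of the stop codon (both hypothetical conventions, as is the function name), the arms could be extracted like this:

```python
def homology_arms(locus, stop_start, arm_len=2000):
    """Return (upstream_arm, downstream_arm) around a stop codon.

    upstream_arm ends just before the stop codon; downstream_arm
    starts just after it. Both have length arm_len.
    """
    stop_end = stop_start + 3
    if stop_start < arm_len or stop_end + arm_len > len(locus):
        raise ValueError("locus is too short for the requested arm length")
    upstream_arm = locus[stop_start - arm_len:stop_start]
    downstream_arm = locus[stop_end:stop_end + arm_len]
    return upstream_arm, downstream_arm


# Toy locus with a TAA stop codon at index 8 and 4-base arms.
toy_locus = "AAAACCCCTAAGGGGTTTT"
up, down = homology_arms(toy_locus, stop_start=8, arm_len=4)
print(up, down)
```

In practice, primer design and the addition of restriction sites would follow from these coordinates; the primers actually used are those listed in Suppl. File 2.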

Cells were pelleted by centrifugation at 1,300 x g for 5 min and fixed with 4% formaldehyde
in PBS for 5 min. Cells were washed twice with 1 mL PBS, resuspended in a
small volume of DABCO mounting media (1% w/v 1,4-diazabicyclo[2.2.2]octane, 90%
glycerol, 50 mM sodium phosphate pH 8.0) with 100 ng/mL DAPI, and mounted onto glass
slides. Images were captured on an Axioimager.Z2 microscope (Zeiss) running ZEN software,
using a Hamamatsu ORCA-Flash4.0 camera and a 63x objective lens (1.40 NA). Typically,
25 z sections spaced 0.24 μm apart were collected.
Proteins DIPPA_07493.mRNA.1, DIPPA_20982.mRNA.1, DIPPA_24150.mRNA.1,
DIPPA_16310.mRNA.1, DIPPA_24837.mRNA.1, DIPPA_21158.mRNA.1, and
DIPPA_16504.mRNA.1 were tagged with a 3xV5 epitope using vector pDP011
(GenBank OQ547858) (Faktorová, et al. 2023) (Table S4). A fusion PCR strategy using
Q5 High-Fidelity DNA Polymerase (New England Biolabs, M0491S) was used to design and obtain
the above DNA constructs, as described previously (Kaur, et al. 2018). Primers used and
product sizes are listed in Suppl. File 2. Gel-purified and ethanol-precipitated DNA
constructs (1-5 µg) were electroporated into P. papillatum cells (5 x 10^7 cells/ml) as described elsewhere
(Kaur, et al. 2018; Faktorová, Kaur, et al. 2020). At 24 h after electroporation, transfected cells
underwent selection in a 24-well plate at 27°C, under increasing concentrations of
hygromycin (100-225 µg/mL). After 3 weeks, transfectants were selected and expanded into
a volume of 10 ml before downstream analyses.
To address subcellular localization of the tagged proteins, an immunofluorescence assay was
performed as described previously (Faktorová, et al. 2023). Briefly, 20 to 30 ml of a log
phase culture was harvested by centrifugation at 1,700 x g for 10 min, resuspended in 500 μl
of 4% paraformaldehyde (dissolved in sea water), and fixed for 15 min on Superfrost Plus
slides (Thermo Fisher Sci.) at room temperature. After removing the fixative with 1x PBS,
cells were permeabilized in ice-cold methanol for 10 min and rinsed with 1x PBS. From this
point on, the slides were kept in a humid chamber. Next, the slides were blocked in 5.5%
(w/v) fetal bovine serum in PBS-T for 45 min at room temperature, and the blocking solution
was removed by washing the cells twice with 1x PBS. The slides were incubated with
either mouse anti-V5 or rabbit anti-V5 primary antibody diluted (1:500; Thermo Fisher Sci.)
in 3% (w/v) bovine serum albumin (Sigma), at 4°C overnight, covered with parafilm.
Afterwards, the primary antibody was removed by washing the slides three times with PBS-T
and twice with 1x PBS. AlexaFluor555-labelled goat anti-mouse (1:1000; Invitrogen) or
AlexaFluor488-labelled goat anti-rabbit (1:1000; Invitrogen) secondary antibody was added
and incubated at room temperature for 1 hour in the dark, covered with parafilm. After that,
the slides were rinsed three times with PBS-T and twice with 1x PBS. All slides were coated
with ProLong Gold Antifade Mountant with DNA Stain DAPI (Life Technol.) and mounted.
Samples were imaged with an Olympus BX63 automated fluorescence microscope equipped
with an Olympus DP74 digital camera. Pictures were acquired with the cellSens Dimension
software (Olympus) and processed with the ImageJ software.

Acknowledgements

We thank A. Zíková (Biology Centre) for the anti-ATP synthase subunit β antibodies. This
work is supported by the National Science Foundation BII: Mechanisms of Cellular
Evolution DBI-2119963 (to J.W.), the Czech Grant Agency grants 23-06479X and 25-15298S
(to J.L.) and a Wellcome Discovery Award 227243/Z/23/Z (to B.A.).

Author contributions
Conceptualization, M.H., J.L. and J.G.W.; Methodology, M.H., D.F., B.A. and J.G.W.;
Software, M.H., Y.P. and T.L.; Validation, M.H.; Formal Analysis, M.H.; Investigation, M.H.,
O.I., D.F., M.S. and B.A.; Data Curation, M.H. and Y.P.; Writing – Original Draft, M.H., O.I.,
D.F., M.S., B.A., J.L. and J.G.W.; Writing – Review & Editing, M.H., J.L. and J.G.W.;
Visualization, M.H.; Supervision, M.H., D.F., J.L. and J.G.W.; Project Administration, M.H.
and J.G.W.; Funding Acquisition, M.H., B.A., J.L. and J.G.W.

Declaration of interests
The authors declare no competing interests.

Reference list
Akiyoshi B, Faktorová D, Lukeš J. 2025. Discovery of unique mitotic mechanisms in
Paradiplonema papillatum. bioRxiv:2025.2003.2021.644664.
Armenteros J, Salvatore M, Emanuelsson O, Winther O, von Heijne G, Elofsson A, Nielsen
H. 2019. Detecting sequence signals in targeting peptides using deep learning. Life Sci
Alliance 2.
Benz C, Raas MWD, Tripathi P, Faktorová D, Tromer EC, Akiyoshi B, Lukeš J. 2024. On the
possibility of yet a third kinetochore system in the protist phylum Euglenozoa. mBio
15:e02936-02924.
Billington K, Halliday C, Madden R, Dyer P, Barker A, Moreira-Leite F, Carrington M,
Vaughan S, Hertz-Fowler C, Dean S, et al. 2023. Genome-wide subcellular protein map for
the flagellate parasite Trypanosoma brucei. Nature Microbiology 8:533-547.
Breckels L, Holden S, Wojnar D, Mulvey C, Christoforou A, Groen A, Trotter M, Kohlbacher
O, Lilley K, Gatto L. 2016. Learning from heterogeneous data sources: An application in
spatial proteomics. PLoS Comput Biol 12.
Bártulos C, Rogers M, Williams T, Gentekaki E, Brinkmann H, Cerff R, Liaud M, Hehl A,
Yarlett N, Gruber A, et al. 2018. Mitochondrial glycolysis in a major lineage of Eukaryotes.
Genome Biol and Evol 10:2310-2325.
Chou C, Yang R, Chan L, Li C, Sun L, Lee H, Lee P, Sher Y, Ying H, Hung M. 2020. The
stabilization of PD-L1 by the endoplasmic reticulum stress protein GRP78 in triple-negative
breast cancer. Am J Cancer Res 10:2621-2634.
Crook OM, Breckels LM, Lilley KS, Kirk PDW, Gatto L. 2019. A Bioconductor workflow
for the Bayesian analysis of spatial proteomics. F1000Res 8:446.
Faktorová D, Kaur B, Valach M, Graf L, Benz C, Burger G, Lukeš J. 2020. Targeted
integration by homologous recombination enables in situ tagging and replacement of genes in
the marine microeukaryote Diplonema papillatum. Environ Microbiol 22:3660-3670.
Faktorová D, Nisbet R, Robledo J, Casacuberta E, Sudek L, Allen A, Ares M, Aresté C,
Balestreri C, Barbrook A, et al. 2020. Genetic tool development in marine protists: emerging
model organisms for experimental cell biology. Nature Methods 17:481-494.
Faktorová D, Záhonová K, Benz C, Dacks J, Field M, Lukeš J. 2023. Functional
differentiation of Sec13 paralogues in the euglenozoan protists. Open Biol 13:220364.
Flegontova O, Flegontov P, Malviya S, Audic S, Wincker P, de Vargas C, Bowler C, Lukeš J,
Horák A. 2016. Extreme diversity of diplonemid eukaryotes in the ocean. Curr Biol
26:3060-3065.
Freitag J, Stehlik T, Stiebler AC, Bölker M. 2018. The obvious and the hidden: Prediction and
function of fungal peroxisomal matrix proteins. Subcell Biochem 89:139-155.
Gawryluk RMR, Del Campo J, Okamoto N, Strassert JFH, Lukeš J, Richards TA, Worden
AZ, Santoro AE, Keeling PJ. 2016. Morphological identification and single-cell genomics of
marine diplonemids. Curr Biol 26:3053-3059.
Geladaki A, Britovšek N, Breckels L, Smith T, Vennard O, Mulvey C, Crook O, Gatto L,
Lilley K. 2019. Combining LOPIT with differential ultracentrifugation for high-resolution
spatial proteomics. Nat Commun 10.
George EE, Tashyreva D, Kwong WK, Okamoto N, Horák A, Husnik F, Lukeš J, Keeling PJ.
2022. Gene transfer agents in bacterial endosymbionts of microbial eukaryotes. Genome Biol
Evol 14,7.
Gíslason M, Nielsen H, Armenteros J, Johansen A. 2021. Prediction of GPI-anchored proteins
with pointer neural networks. Curr Res in Biotech 3:6-13.
Güther M, Urbaniak M, Tavendale A, Prescott A, Ferguson M. 2014. High-confidence
glycosome proteome for procyclic form Trypanosoma brucei by epitope-tag organelle
enrichment and SILAC proteomics. Journal of Proteome Res 13:2796-2806.
Hallgren J, Tsirigos K, Pederson M, Armenteros J, Marcatili P, Nielsen H, Krogh A, Winther
O. 2022. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural
networks. In.
Hammond M, Nenarokova A, Butenko A, Zoltner M, Dobáková E, Field M, Lukeš J. 2020. A
uniquely complex mitochondrial proteome from Euglena gracilis. Mol Biol and Evol
37:2173-2191.
Jirsová D, Licknack TJ, Poh Y-P, Qiu Y, Quan N, Chou T-F, Karr T, Lynch M, Wideman JG.
2025. Subcellular proteomics of Paramecium tetraurelia reveals mosaic localization of
glycolysis and gluconeogenesis. bioRxiv:2025.2004.2024.650466.
Joseph A, Adhihetty P, Wawrzyniak N, Wohlgemuth S, Picca A, Kujoth G, Prolla T,
Leeuwenburgh C. 2013. Dysregulation of mitochondrial quality control processes contribute
to sarcopenia in a mouse model of premature aging. PLoS One 8.
Kanehisa M, Sato Y, Morishima K. 2016. BlastKOALA and GhostKOALA: KEGG Tools for
functional characterization of genome and metagenome sequences. J Mol Biol 428:726-731.
Kaur B, Valach M, Peña-Diaz P, Moreira S, Keeling P, Burger G, Lukeš J, Faktorová D. 2018.
Transformation of Diplonema papillatum, the type species of the highly diverse and abundant
marine microeukaryotes Diplonemida (Euglenozoa). Environ Microbiol 20:1030-1040.
Kovárová J, Barrett M. 2016. The pentose phosphate pathway in parasitic trypanosomatids.
Trends Parasitol 32:622-634.
Lax G, Okamoto N, Keeling PJ. 2024. Phylogenomic position of eupelagonemids, abundant,
and diverse deep-ocean heterotrophs. ISME J 18.
Makiuchi T, Annoura T, Hashimoto M, Hashimoto T, Aoki T, Nara T. 2011.
Compartmentalization of a glycolytic enzyme in Diplonema, a non-kinetoplastid
Euglenozoan. Protist 162:482-489.
Michels P, Bringaud F, Herman M, Hannaert V. 2006. Metabolic functions of glycosomes in
trypanosomatids. Biochim Biophys Acta-Mol Cell Res 1763:1463-1477.
Moloney N, Barylyuk K, Tromer E, Crook O, Breckels L, Lilley K, Waller R, MacGregor P.
2023. Mapping diversity in African trypanosomes using high resolution spatial proteomics.
Nat Commun 14.
Morales J, Hashimoto M, Williams T, Hirawake-Mogi H, Makiuchi T, Tsubouchi A, Kaga N,
Taka H, Fujimura T, Koike M, et al. 2016. Differential remodelling of peroxisome function
underpins the environmental and metabolic adaptability of diplonemids and kinetoplastids.
Proc R Soc B-Biolog Sci 283.
Mukherjee I, Salcher MM, Andrei A, Kavagutti VS, Shabarova T, Grujčić V, Haber M,
Layoun P, Hodoki Y, Nakano SI, et al. 2020. A freshwater radiation of diplonemids. Environ
Microbiol 22:4658-4668.
Newell S. 1981. Fungi and bacteria in or on leaves of Eelgrass (Zostera marina L.) from
Chesapeake Bay. Appl Environ Microbiol 41:1219-1224.
Obiol A, Giner CR, Sánchez P, Duarte CM, Acinas SG, Massana R. 2020. A metagenomic
assessment of microbial eukaryotic diversity in the global ocean. Mol Ecol Resour 20.
Odum M, Teufel F, Thumuluri V, Armenteros J, Johansen A, Winther O, Nielsen H. 2024.
DeepLoc 2.1: multi-label membrane protein type prediction using protein language models.
Nucleic Acids Res 52:W215-W220.
Opperdoes FR, Michels PA. 1993. The glycosomes of the Kinetoplastida. Biochimie
75:231-234.
Orsburn B. 2021. Proteome discoverer-A community enhanced data processing suite for
protein informatics. Proteomes 9.
Porter D. 1973. Isonema papillatum sp. n., a new colorless marine flagellate: A light- and
electronmicroscopic study. J Protozool 20:351-356.
Prokopchuk G, Korytár T, Juricová V, Majstorovic J, Horák A, Šimek K, Lukeš J. 2022.
Trophic flexibility of marine diplonemids-switching from osmotrophy to bacterivory. ISME J
16:1409-1419.
Richards T, Eme L, Archibald J, Leonard G, Coelho S, de Mendoza A, Dessimoz C, Dolezal
P, Fritz-Laylin L, Gabaldon T, et al. 2024. Reconstructing the last common ancestor of all
eukaryotes. PLoS Biol 22.
Tashyreva D, Faktorová D, Horák A, Lukeš J, Archibald J, Oatley G, Sinclair E, Santos C,
Paulini M, Aunin E, et al. 2025. The genome sequences of the diplonemid protist Rhynchopus
euleeides YPF1915 and its bacterial endosymbiont Candidatus Syngnamydia salmonis
(Chlamydiota). Wellcome Open Res 10.
Tashyreva D, Faktorová D, Stříbrná E, Horák A, Lukeš J, Archibald JM, Oatley G, Sinclair E,
Aunin E, Gettle N, et al. 2025. The genome sequences of the diplonemid protist Diplonema
japonicum YFP1604 and its bacterial endosymbiont Ca. Cytomitobacter primus and Ca.
Nesciobacter abundans. 10.
Tashyreva D, Simpson A, Prokopchuk G, Škodová-Sveráková I, Butenko A, Hammond M,
George E, Flegontova O, Záhonová K, Faktorová D, et al. 2022. Diplonemids-a review on
"new" flagellates on the oceanic block. Protist 173:125868.
Tashyreva D, Týč J, Horák A, Lukeš J. 2023. Ultrastructure and 3D reconstruction of a
diplonemid protist (Diplonemea) and its novel membranous organelle. mBio 14:e01921-01923.
Tashyreva D, Votýpka J, Yabuki A, Horák A, Lukeš J. 2025. Description of new diplonemids
(Diplonemea, Euglenozoa) and their endosymbionts: Charting the morphological diversity of
these poorly known heterotrophic flagellates. Protist 177.
Teufel F, Armenteros J, Johansen A, Gíslason M, Pihl S, Tsirigos K, Winther O, Brunak S,
von Heijne G, Nielsen H. 2022. SignalP 6.0 predicts all five types of signal peptides using
protein language models. Nat Biotechnol 40:1023-1025.
Valach M, Benz C, Aguilar L, Gahura O, Faktorová D, Zíková A, Oeffinger M, Burger G,
Gray M, Lukeš J. 2023. Miniature RNAs are embedded in an exceptionally protein-rich
mitoribosome via an elaborate assembly pathway. Nucleic Acids Res 51:6443-6460.
Valach M, Léveillé-Kunst A, Gray MW, Burger G. 2018. Respiratory chain Complex I of
unparalleled divergence in diplonemids. J Biol Chem 293:16043-16056.
Valach M, Moreira S, Petitjean C, Benz C, Butenko A, Flegontova O, Nenarokova A,
Prokopchuk G, Batstone T, Lapébie P, et al. 2023. Recent expansion of metabolic versatility
in Diplonema papillatum, the model species of a highly speciose group of marine eukaryotes.
BMC Biol 21.
Záhonová K, Lukeš J, Dacks JB. 2025. Diplonemid protists possess exotic endomembrane
machinery, impacting models of membrane trafficking in modern and ancient eukaryotes.
Curr Biol 35:1508-1520.e1502.
Škodová-Sveráková I, Záhonová K, Juricová V, Danchenko M, Moos M, Baráth P,
Prokopchuk G, Butenko A, Lukáčová V, Kohútová L, et al. 2021. Highly flexible metabolism
of the marine euglenozoan protist Diplonema papillatum. BMC Biol 19:251.
Šubrtová K, Panicucci B, Zíková A. 2015. ATPaseTb2, a unique membrane-bound FoF1-ATPase
component, is essential in bloodstream and dyskinetoplastic trypanosomes. PLoS
Pathog 11:e1004660.
Figure Legends

Fig. 1: Clustered protein predictions of Paradiplonema papillatum align with predicted
protein features and clarify conditional enrichment trends. A Neighbor-average imputed
t-SNE of the dataset, displaying clustered predictions for 2,797 proteins across 22 cell
compartments. Predictions were generated via support vector modelling conducted on
fractional profiles of marker proteins and applied to the remaining dataset. B Selected
fractional abundances of marker proteins across one replicate of this experiment,
representing distinct profiles that facilitate predictive clustering (SUP, Supernatant).
C Software predictions of protein features (signal peptides, transmembrane domains and
mitochondrial target peptides) across the dataset, demonstrating accumulation in certain
defined compartments. D Proteins determined to be enriched in varying nutrient media
(Diplo or Hemi) or cultivation conditions (aerobic or anaerobic) in a conditional study of
P. papillatum (Škodová-Sveráková, et al. 2021). Additional information for all proteins is
available in Tables S1 and S3.
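
The supervised step described in this legend (a support vector model trained on the fractional profiles of marker proteins, then applied to the remaining proteins) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes a two-class linear SVM trained by subgradient descent, whereas the software, kernel and hyperparameters used for the 22-compartment classification are not stated in this section. All profiles, protein names and function names are hypothetical.

```python
# Minimal linear SVM sketch (hinge loss + L2 penalty, full-batch subgradient descent).
# Assumption: toy 4-fraction profiles and two compartments; not the authors' model.

def decision(w, b, x):
    """Score of profile x relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b on marker profiles X with labels y in {+1, -1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # marker lies inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, labels=("cytosol", "mitochondrion")):
    """Map the sign of the decision score to a compartment label."""
    return labels[1] if decision(w, b, x) >= 0 else labels[0]


# Hypothetical marker profiles: relative abundance across four fractions.
markers = [
    ([0.10, 0.70, 0.10, 0.10], +1),  # mitochondrial markers peak in fraction 2
    ([0.15, 0.65, 0.10, 0.10], +1),
    ([0.05, 0.75, 0.15, 0.05], +1),
    ([0.05, 0.10, 0.15, 0.70], -1),  # cytosolic markers peak in the supernatant
    ([0.10, 0.05, 0.10, 0.75], -1),
    ([0.10, 0.10, 0.20, 0.60], -1),
]
X = [profile for profile, _ in markers]
y = [label for _, label in markers]
w, b = train_linear_svm(X, y)

# Apply the trained model to proteins without marker status.
unlabeled = {
    "protein_a": [0.10, 0.60, 0.20, 0.10],
    "protein_b": [0.10, 0.10, 0.10, 0.70],
}
predictions = {name: predict(w, b, profile) for name, profile in unlabeled.items()}
```

In this toy setting the model separates profiles by where their abundance peaks across fractions; a multi-compartment analysis would need a multi-class scheme and per-protein confidence scores, which this sketch omits.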
Fig. 2: Endogenous tagging of novel proteins confirms supervised cluster predictions.
Tagged proteins are highlighted (black) among relevant predicted clusters, resolved on
neighbor-averaged imputed t-SNE. Individual cell lines were generated via endogenous tagging
and imaged through fluorescence microscopy for comparison with the compartment to which the
relevant protein was predicted. Merged microscopy images show protein signal (green) merged
with nuclear and mitochondrial DNA (blue). All imaged cells are oriented with their apical
regions facing right and posterior facing left; cell membrane outlines are traced for all
images except L, which shows only a trace of the papilla, which lacks signal. Scale bar
represents 5 µm. Proteins A, E, G and J are resolved in zero and neighbor-averaged imputed
t-SNE in Suppl. Fig. 3, for which separate channels of each cell line are also shown. Further
information on cell lines is available in Table S4.

Fig. 3: Secreted Carbohydrate Active Enzymes (CAZymes) primarily localized on cell
membrane and lysosomes. Distribution of signal peptide-enriched CAZymes, which are
predicted with high confidence on neighbor-average imputed t-SNE, corresponding to
highlighted cluster predictions of the cell membrane (A), endocytic membrane trafficking
(B), lysosome (C) and cytosol (D). Proteins of the cell membrane (A) are shown with schematic
representations of software predictions for signal peptides, transmembrane domains
(TMD) and/or GPI attachment sites, which demonstrate extracellular exposure of CAZyme
domains in accordance with conventional membrane topology. Bordered outlines indicate
separate enzymatic reactions for CAZymes, with carbohydrate substrates and products in
black. Further information on CAZymes of P. papillatum is available in Table S5.
Fig. 4: Metabolic reconstruction of glycolysis/gluconeogenesis and pentose phosphate
pathway demonstrates altered glucose metabolism in P. papillatum. Localization of
relevant enzymes across glycolysis/gluconeogenesis (A) and pentose phosphate pathway (B),
resolved on neighbor-average imputed t-SNE (C) with relevant localization clusters
highlighted. Peroxisomal target sequences (PTS), mitochondrial target peptides (mTP) and
transmembrane domains (TMDs) are indicated. Proteins previously localized via antisera
immunolocalization are indicated with *; metabolite shunts between the two pathways are
indicated with dotted arrows. Split coloring of proteins represents their manual designation
to the cytosol (24,25,38) or indicates the possibility of dual localization between the
cytosol and glycosomes (1,2,5,9,12,20), based on inspection of fractionation profiles (Suppl.
Fig. 6) and targeting signals. Protein numbers highlighted in white represent those only
resolved on zero and neighbor-average imputed t-SNE (Suppl. Fig. 5). Further information
is available in Table S6.

Supplementary Figures, Files and Tables
Suppl. Fig. 1: Immunoblot analysis used to resolve fractional distribution across
triplicate samples. 10 µg of protein was loaded for each fraction generated via
differential centrifugation, in addition to the initial cell lysate (CL). ATP synthase-β
antibody was used at a 1:10,000 ratio (A), Grp75 antibody at 1:1,000 (B), and Grp78
antibody (C), which displays non-specific signal. Unlysed cells (UC), Supernatant (S).
Marker band molecular weights (kDa) are indicated in dark grey on the leftmost lane of blots.

Suppl. Fig. 2: Neighbor-averaged and zero imputed t-SNE of clustered protein
predictions, protein features and conditional enrichment of dataset. A Full dataset,
displaying clustered predictions for 4,780 proteins across 22 cell compartments.
Predictions were generated via support vector modelling conducted on fractional profiles of
marker proteins and applied to the remaining dataset. B Selected fractional abundances of
marker proteins across one replicate of this experiment, representing distinct profiles that
facilitate predictive clustering (SUP, Supernatant). C Software predictions of protein
features (signal peptides, transmembrane domains and mitochondrial target peptides) across
the dataset, demonstrating accumulation in certain defined compartments. D Proteins
determined to be enriched in varying nutrient media (Diplo or Hemi) or cultivation conditions
(aerobic or anaerobic) in a conditional study of P. papillatum (Škodová-Sveráková, et al. 2021).
Additional information for all proteins is available in Tables S1 and S3.
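
The two imputation schemes named in this legend are not defined in this section. The sketch below illustrates one possible reading and should be treated as an assumption throughout: "zero" imputation replaces a missing fraction value with 0, and "neighbor-averaged" imputation replaces it with the mean of the observed values in the directly adjacent fractions of the same profile. The function names and the example profile are hypothetical.

```python
from typing import List, Optional

# A fractional profile with None marking a fraction where the protein was not quantified.
Profile = List[Optional[float]]


def impute_zero(profile: Profile) -> List[float]:
    """Assumed 'zero' scheme: every missing value becomes 0.0."""
    return [0.0 if value is None else value for value in profile]


def impute_neighbor_average(profile: Profile) -> List[float]:
    """Assumed 'neighbor-averaged' scheme: mean of observed adjacent fractions.

    Falls back to 0.0 when neither adjacent fraction was observed.
    """
    imputed = []
    for i, value in enumerate(profile):
        if value is not None:
            imputed.append(value)
            continue
        neighbors = [
            profile[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(profile) and profile[j] is not None
        ]
        imputed.append(sum(neighbors) / len(neighbors) if neighbors else 0.0)
    return imputed


# Hypothetical profile with two missing fractions.
example = [0.25, None, 0.75, None]
zeroed = impute_zero(example)
averaged = impute_neighbor_average(example)
```

Under these assumptions, the neighbor-averaged profile keeps a smooth shape across fractions, while the zeroed profile introduces sharp drops that can shift a protein's position in a t-SNE embedding.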
Suppl. Fig. 3: Neighbor-averaged and zero imputed t-SNE of endogenous tagged cell
lines. Tagged proteins are highlighted (black) among relevant predicted clusters, resolved on
neighbor-averaged and zero imputed t-SNE. Individual cell lines were generated via
endogenous tagging and imaged through fluorescence microscopy for comparison with the
compartment to which the relevant protein was predicted. In descending order, panels depict
phase contrast, epitope signal (green), and nuclear and mitochondrial DNA (blue), with merges
below additionally displaying cell membrane outline traces for all images except L, which
shows only a trace of the papilla, which lacks epitope signal. All imaged cells are oriented
with their apical regions facing right and posterior facing left. Scale bar represents 5 µm.
Further information on cell lines is available in Table S4.

Suppl. Fig. 4: Cell membrane cluster shows enrichment of proteins possessing both
predicted signal peptides (SP) and glycosylphosphatidylinositol (GPI) anchors. t-SNE imputed
via neighbor-averaging (A) as well as zeroed dataset (B). Signal peptides were predicted via
SignalP 6.0 with a confidence threshold greater than 0.9, in tandem with NetGPI 1.1 used for
GPI predictions. Further information is available in Table S3.
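
The selection rule in this legend (a signal peptide confidence above 0.9 combined with a predicted GPI anchor) amounts to a simple filter. The sketch below assumes the predictor outputs have already been parsed into per-protein records; it does not call SignalP 6.0 or NetGPI 1.1, and the record fields, function name and protein identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeaturePrediction:
    """Parsed per-protein predictions (hypothetical record layout)."""
    protein_id: str
    sp_probability: float  # signal peptide confidence, 0 to 1
    has_gpi_anchor: bool   # GPI anchor predicted or not


def select_sp_and_gpi(
    predictions: List[FeaturePrediction], sp_threshold: float = 0.9
) -> List[str]:
    """Return IDs of proteins with SP confidence above the threshold and a GPI anchor."""
    return [
        p.protein_id
        for p in predictions
        if p.sp_probability > sp_threshold and p.has_gpi_anchor
    ]


# Hypothetical records for illustration only.
records = [
    FeaturePrediction("example_001", 0.98, True),   # passes both criteria
    FeaturePrediction("example_002", 0.95, False),  # signal peptide only
    FeaturePrediction("example_003", 0.90, True),   # not strictly above 0.9
    FeaturePrediction("example_004", 0.40, True),   # GPI anchor only
]
selected = select_sp_and_gpi(records)
```

The comparison is strict, matching the legend's wording "greater than 0.9", so a protein scoring exactly 0.9 is excluded.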
Suppl. Fig. 5: Metabolic reconstruction of glycolysis/gluconeogenesis and pentose
phosphate pathway on neighbor-averaged and zero imputed t-SNE. Localization of
relevant enzymes across glycolysis/gluconeogenesis (A) and pentose phosphate pathway (B),
resolved on neighbor-average and zero imputed t-SNE (C) with relevant localization clusters
highlighted. Peroxisomal target sequences (PTS), mitochondrial target peptides (mTP) and
transmembrane domains (TMDs) are indicated. Proteins previously localized via antisera
immunolocalization are indicated with *; metabolite shunts between the two pathways are
indicated with dotted arrows. Split coloring of proteins represents their manual designation
to the cytosol (24,25,38) or indicates the possibility of dual localization between the
cytosol and glycosomes (1,2,5,9,12,20), based on inspection of fractionation profiles (Suppl.
Fig. 6) and targeting signals. Further information is available in Table S6.
Suppl. Fig. 6: Fractional and schematic analysis of specific enzymes mediating
carbohydrate metabolism. Schematic depiction of DIPPA_21987, phosphofructokinase 1,
showing phosphofructokinase (PFK) domains, transmembrane domain (TMD) and
peroxisomal target signal, together with its fractional analysis (A), and fractional profiles
of relevant enzymes across glycolysis/gluconeogenesis (B) compared to marker proteins of the
cytosol, glycosomes and endocytic membrane trafficking.

Suppl. Fig. 7: Metabolic reconstruction of amino acid (AA) breakdown for
incorporation in the TCA cycle, localized across cell compartments. AAs and metabolites
of the TCA cycle are indicated in bold. Propanoate metabolism, which involves intermediates
of certain AA digestion, is also depicted. Split coloring indicates manual annotation for
specific enzymes based on certain target peptides or candidate function (top) versus
contrasting predictions (bottom); e.g. Enzyme 2, proline dehydrogenase, which we designate
to the mitochondrion despite low-confidence predictions to the nucleus. Further information
is available in Table S7.
Suppl. File 1: Tables S1-7.
Suppl. File 2: Primer sequences used for endogenous tagging of P. papillatum.