Methods
Experimental design
Gene selection for the genome-scale panel
Targets for the genome-scale panel were selected based on wild-type expression levels in
iPSCs, together with genes that demonstrated growth effects. Guides were then separated into
those targeting genes with fitness effects (fitness genes), which were measured at 3, 4
and 5 days post-infection (DPI), and those targeting genes without fitness effects
(non-fitness genes), which were measured at 6 DPI. Forty non-targeting control guides from
Dolcetto A were used for both panels.
To identify genes with fitness effects, we conducted an essentiality screen using the Dolcetto
library 17. We used 57,050 guides targeting 18,899 genes from the Dolcetto A library and
57,011 guides targeting 18,897 genes from the Dolcetto B library to knock down a total of
18,940 genes in fiaj_1 cells and harvested cells at 3, 4, 5, 9 and 10 days post-infection. At
each time point, fitness effects were quantified by calculating the log2 fold change of
normalised cell counts relative to the read counts in the plasmid library 64, and genes
were considered to have fitness effects if the median fitness effect at day 10 across all
guides was less than -1. The three guides with the lowest log2 fold change at day 10
post-transfection were then chosen for screening. If fewer than three guides were available
across both Dolcetto A and B libraries, all available guides were chosen. In total, this part of
our library consisted of 6,784 guides targeting 2,264 fitness genes.
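The fitness call and guide choice described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the pseudo-count and the toy values are assumptions, and only the median threshold of -1 and the choice of the three lowest guides come from the text.

```python
import math
from statistics import median

def log2_fold_change(day_count, plasmid_count, day_total, plasmid_total, pseudo=1.0):
    """log2 ratio of library-size-normalised counts (the pseudo-count is an assumption)."""
    day_norm = (day_count + pseudo) / day_total
    plasmid_norm = (plasmid_count + pseudo) / plasmid_total
    return math.log2(day_norm / plasmid_norm)

def select_guides(guide_lfc, threshold=-1.0, n_guides=3):
    """Return (is_fitness_gene, chosen_guides) for one gene.

    guide_lfc maps a guide id to its day-10 log2 fold change.
    """
    is_fitness = median(guide_lfc.values()) < threshold
    ranked = sorted(guide_lfc, key=guide_lfc.get)  # most depleted guide first
    return is_fitness, ranked[:n_guides]

# Hypothetical gene with four guides across the two libraries.
toy = {"g1": -2.5, "g2": -1.4, "g3": -0.3, "g4": -1.8}
is_fitness, chosen = select_guides(toy)
```

If fewer than three guides exist for a gene, the slice simply returns all of them, which matches the fallback described in the text.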
Additional genes without a fitness effect in iPSCs were selected based on fitness effects in
cancer cell lines, as well as expression level in iPSCs. Fitness effects in cancer cell lines were
assessed based on the CERES scores of all 1,376 lines in the DepMap consortium 16. Gene
expression in iPSCs was measured in a pilot screen on fiaj_1 cells, and expression values
per cell were normalised by total sum scaling with a scale factor of 100,000 and
log-transformation with a pseudo-count of 1 65. Additional targets were considered if they
were either highly expressed in iPSCs (normalised expression > 0.1), or had strong fitness
effects in cancer cell lines and were expressed in iPSCs (genes with a CERES score > 0.22
or 0.01) or had variable fitness effects (genes with a CERES score standard
deviation across lines > 0.15 and a normalised expression > 0.01). In addition, the 50 genes
with the highest CERES score standard deviation but expression < 0.01 were selected. For each
gene, 3 guides were selected from the Dolcetto A library, complemented with guides from
the Dolcetto B library if fewer than three guides were available in Dolcetto A. In total, 4,962
genes and 14,883 guides were selected.
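The per-cell normalisation mentioned above (total sum scaling followed by a log-transformation with a pseudo-count of 1) can be written out as a short sketch. The function name and toy counts are assumptions, and the natural logarithm is assumed here, following the convention of Seurat's LogNormalize method.

```python
import math

def log_normalise(counts, scale_factor=100_000, pseudo=1.0):
    """Scale one cell's counts to a fixed total, then take log(x + pseudo)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log(c / total * scale_factor + pseudo) for c in counts]

# Hypothetical cell with three genes and 10 UMIs in total.
normalised = log_normalise([0, 1, 9])
```

The same function with `scale_factor=10_000` corresponds to the scale factor reported for the main screens.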
Gene selection for the targeted panel
Genes were selected for the targeted panel for measuring genetic background effects based
on their effect size observed in the genome-scale screens. Only target genes with at least 20
cells per gene, log-normalized expression greater than 0.1 and minimum correlation of
significant trans effects across timepoints and guides larger than 0.5 were considered for
selection. Of these, all genes with more than 12 significant trans effects were selected for
the panel (n=110 genes, 106 of which were fitness genes). For comparison, we added 97
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted November 28, 2024. ; https://doi.org/10.1101/2024.11.28.625833doi: bioRxiv preprint
genes with 5-12 significant trans effects (all of which were fitness genes) and 203 genes with
fewer than 5 significant trans effects (141 of which were fitness genes). In addition, we
considered 7 genes that have been linked to eQTLs with many trans effects 3; 38 genes
associated with monogenic diseases 3; 17 genes with high variance of CERES scores across
lines 16; 15 genes with a high expression heritability 1; and ARID1A, EZH2 and BCOR. Guides
were chosen as in the genome-scale screen apart from 5 outlier guides, for which the target
gene log-fold change was higher than the upper bound of a 95% confidence interval in a
linear regression of guide-level versus target-level log-fold changes and a suitable
replacement guide with a similar growth-effect ten days after transfection could be found in
the Dolcetto libraries. This resulted in a total of 1,355 guides targeting 444 genes and 20
non-targeting control guides for the targeted panel.
Experimental protocol
Molecular cloning
The libraries were cloned into the lentiviral expression vector
pKLV2-U6gRNA5(BbsI)-ccdb-PGKpuroBFP-W (Addgene 67974) 66. Briefly, the guide libraries
were ordered from Twist Biosciences as a 215-mer oligo pool. The pool was composed of
several sub-pools to allow selective amplification of gRNAs with sub-pool-specific
primers (Supplementary Table STM-1). BbsI-digested amplicons encoding
gRNAs were inserted into the BbsI-digested vector by Gibson assembly (NEB Gibson
Assembly Master Mix) according to the manufacturer’s specifications, and transformed by
electroporation (NEB 10-beta Electrocompetent E. coli C3020K). Bacterial cells were
cultured overnight in liquid culture and plasmid DNA was extracted. The plasmid libraries
were pooled together in equimolar ratios to achieve the desired final libraries.
For the construction of the pB-CAGGS-dCas9-KRAB-MeCP2-BSD-mScarlet plasmid, the
pB-CAGGS-dCas9-KRAB-MeCP2 (Addgene 110824) vector was digested with NotI (NEB)
and EcoRV (NEB). The EF1α promoter and blasticidin resistance gene were amplified by PCR
using primers #1009 and #1010 (Supplementary Table STM-1). The SV40 polyA signal
was amplified by PCR using primers #1013 and #1014 (Supplementary Table STM-1). The
mScarlet sequence was amplified by PCR from plasmid pmScarlet_C1 (Addgene 85042)
using primers #1016 and #1012 (Supplementary Table STM-1). All products were purified
with Monarch DNA Cleanup Columns (NEB). The T2A sequence was ordered as a gBlock from
IDT. A Gibson assembly with 4 fragments was incubated at 50°C for 30 minutes and
transformed by electroporation.
Cell culture
Human iPSCs were cultured on Vitronectin XF (StemCell Technologies, 07180)-coated
plates in mTeSR Plus medium (StemCell Technologies). The medium was changed every
other day throughout expansion and all experiments. Cell lines were cultured at 37°C, 5%
CO2.
dCas+ cell line generation and activity validation
For the generation of dCas9-KRAB-MeCP2 iPS cell lines, 3 x 10^5 wild-type cells were
seeded into 12-well plates with ROCKi-containing media. For the transfection of one line,
600 ng of pB-CAGGS-dCas9-KRAB-MeCP2-BSD-mScarlet, 300 ng of mPBase 67 and 100 ng
of a reporter plasmid encoding GFP were mixed with 50 µl of Opti-MEM in one tube, and
50 µl of Opti-MEM was mixed with 2 µl of Lipofectamine Stem (Invitrogen) in another tube.
After 5 minutes of incubation at room temperature, the contents of the tubes were mixed
together and incubated for another 10-30 minutes at room temperature. During incubation,
the media in the wells was refreshed and 0.5 ml media was added. After incubation, 100 µl of
the complexes were added to the wells. 24 h after transfection, 1 ml of media was added to
the cells. 48 h after transfection, blasticidin (TOKU-E) selection was started using a concentration
of 2 µg/ml. The cells were cultured in selection for 2 weeks.
To validate the dCas9-KRAB-MeCP2 activity of the cells, an adapted version of the
previously published Cas9 validation system was used 64. Briefly, cells were transfected with
a plasmid that encodes BFP and GFP and either a mock gRNA or a gRNA targeting the GFP
TSS. 1 x 10^5 cells were seeded into 24-well plates. Cells were transfected with either the
mock or silencing construct using Lipofectamine Stem 24 h later. BFP and GFP expression
were measured by FACS three days after transfection. dCas9-KRAB-MeCP2 activity was
calculated based on the median expression of GFP in BFP-positive cells. Two replicate
measurements were made for all cell lines for both conditions.
Lentivirus production and determination of lentiviral titer
Supernatants containing lentiviral particles were produced by transient transfection of 293FT
cells using Lipofectamine LTX (Invitrogen). 5.4 μg of a lentiviral vector, 5.4 μg of psPax2
(Addgene 12260), 1.2 μg of pMD2.G (Addgene 12259) and 12 μl of PLUS reagent were
added to 3 ml of Opti-MEM and incubated for 5 min at room temperature. 36 μl of the LTX
reagent was then added to this mixture and further incubated for 30 min at room
temperature. The transfection complex was added to 80%-confluent 293FT cells in a 10 cm
dish containing 10 ml of culture medium. After 48 h, the viral supernatant was harvested and
fresh medium was added. After a further 24 h, the lentiviral supernatant was collected and
mixed with the first supernatant, which was then stored at -80°C.
For gRNA library lentiviral titration on dCas9-KRAB-MeCP2-expressing iPSCs, iPSCs were
harvested as single cells using Accutase (StemCell Technologies). iPSCs (3.6 x 10^5 per well
in a 6-well plate) were infected with at least five serial dilutions of lentiviral supernatant
supplemented with 10 µM ROCK inhibitor Y-27632 (StemCell Technologies). Uninfected cells
were used as a negative control. The transduced cell mixture was cultured in 6-well plates in
2 ml per well. 24 h post-transduction, the medium was refreshed with mTeSR Plus without
ROCK inhibitor. After three days of cell culture, the cells were harvested for FACS analysis
and the level of BFP expression was measured. Virus titer was estimated and scaled up
accordingly for subsequent screens.
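One common way to relate the measured BFP-positive fraction to a multiplicity of infection is to assume that the number of integrations per cell follows a Poisson distribution. The text does not state how the titer was computed, so the sketch below is an assumption for illustration only, with hypothetical function names.

```python
import math

def moi_from_fraction(frac_bfp_positive):
    """MOI implied by the fraction of BFP-positive cells under a Poisson model."""
    return -math.log(1.0 - frac_bfp_positive)

def fraction_infected(moi):
    """Expected fraction of cells with at least one integration."""
    return 1.0 - math.exp(-moi)

def single_integrant_share(moi):
    """Among infected cells, the share carrying exactly one integration."""
    return moi * math.exp(-moi) / fraction_infected(moi)

infected_at_target = fraction_infected(0.2)    # roughly 18% of cells
single_at_target = single_integrant_share(0.2)  # roughly 90% of infected cells
```

Under this model, a low target MOI keeps most infected cells at a single guide, at the cost of a small infected fraction.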
Screening and sequencing
Cells were transduced with the lentivirus aiming for an MOI of 0.2. The cells were seeded at
a density of 2.0 x 10^5 to 4.5 x 10^5, depending on the day of harvest. Media was refreshed 24 h
after transduction. Cells were harvested on day 3, 4, 5 or 6 after transduction. On
collection day, cells were harvested with Accutase, spun down and resuspended in
eBioscience Fixable Viability Dye eFluor 780 (Invitrogen) that was diluted 5000-fold in
eBioscience™ Flow Cytometry Staining Buffer (Invitrogen). Cells were stained for at least 5
minutes and then filtered with a Scienceware® Flowmi™ Cell Strainer (SP Belart). Cells were
then sorted based on live/dead staining, BFP and mScarlet expression on an MA900
Multi-Application Cell Sorter (Sony), a BD Influx™ (BD Biosciences) or a MoFlo XDP Cell
Sorter (Beckman Coulter). An equal number of cells was sorted for all the lines. 12 lines
and 8 lines were pooled together for the genes with and without fitness effects, respectively,
and 1.65 x 10^4 cells were loaded in a 10X inlet. The Chromium Next GEM Single Cell 5' Kit v2
(10X Genomics) was used for transcriptome capture, with a modified protocol in which we
added an extra primer to the GEM generation mix to capture gRNAs 21.
Computational analysis
Unless otherwise stated, all analyses were performed in R (version 4.3.1) 68 and Seurat
(version 5.0.3).
Read alignment using CellRanger
Reads were aligned with CellRanger 69 (version 6.0.1), processing each inlet separately.
Alignment was conducted using default parameters, using genome build GRCh38 as a
reference and adding additional sequences for BFP, mScarlet, BSD and
dCas9-KRAB-MeCP2 (Supplementary Table STM-2). The sgRNAs were aligned to libraries
for the fitness genes and other genes, respectively. For one inlet, the minimum threshold for
the GEX/Cite-Seq cell barcode overlap was lowered from 0.1 to 0.01.
De-multiplexing of cells based on natural genetic variation
Individual cells were assigned to the source cell lines by de-multiplexing using natural
sequence variants, as each pool consisted of lines from different individuals. We first used
cellSNP 0.1.7 70 to call genotypes from the BAM files containing the 10x read sequences for
all cells passing the CellRanger filters. We used bcftools 71 (version 1.10.2) to subset a list of
candidate SNPs 72 to only lines present in each inlet and filtered for a minimum allele
frequency threshold of 0.01 and a minimum aggregated count of 20. This output was used in Vireo
(version 0.2.1) 73 to de-multiplex the cells into the number of lines present in each pool using
genotype data for each donor provided by the HipSci consortium 1, with variant
coordinates converted from GRCh37 to genome build GRCh38 using CrossMap 74. In total, 69% and
72% of cells were uniquely assigned to one cell line in the genome-wide and targeted
screens, respectively. Doublets and unassigned cells were removed from further analysis.
Quality control and filtering
High-quality cells were retained based on three criteria: number of RNA UMI counts per cell,
number of unique features per cell and percentage of mitochondrial RNA 75. The number of
RNA UMI counts and unique features per cell were either bimodal or trimodal for each inlet,
and we removed cells that were in the lowest mode of number of features and UMI counts
using inlet-specific thresholds between 1,926 and 39,260 UMIs per cell (average of 14,186
RNA UMIs across inlets) and 2,000 features per cell, as well as cells with a percentage of
mitochondrial genes above 10%. After filtering, we assigned cell cycle scores for each cell
using Seurat’s CellCycleScoring function with cell cycle marker genes retrieved as
cc.genes.updated.2010 65.
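The three filtering criteria can be illustrated with a small sketch. The inlet names, threshold values for individual inlets, field names and the inclusive treatment of boundary values are all assumptions; only the 2,000-feature and 10% mitochondrial cutoffs and the reported threshold range are taken from the text.

```python
# Hypothetical per-inlet UMI thresholds; the text reports a range of 1,926 to 39,260.
UMI_THRESHOLDS = {"inlet_1": 1926, "inlet_2": 14000}

def passes_qc(cell, min_features=2000, max_pct_mito=10.0):
    """Keep a cell only if it clears all three criteria."""
    return (
        cell["n_umi"] >= UMI_THRESHOLDS[cell["inlet"]]
        and cell["n_features"] >= min_features
        and cell["pct_mito"] <= max_pct_mito
    )

cells = [
    {"id": "c1", "inlet": "inlet_1", "n_umi": 5000, "n_features": 2500, "pct_mito": 4.0},
    {"id": "c2", "inlet": "inlet_1", "n_umi": 1500, "n_features": 2500, "pct_mito": 4.0},
    {"id": "c3", "inlet": "inlet_2", "n_umi": 20000, "n_features": 4000, "pct_mito": 12.5},
    {"id": "c4", "inlet": "inlet_2", "n_umi": 9000, "n_features": 3000, "pct_mito": 3.0},
]
kept = [c["id"] for c in cells if passes_qc(c)]
```

Here c2 and c4 fall below their inlet's UMI threshold and c3 exceeds the mitochondrial cutoff, so only c1 is retained.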
Guide assignment
To establish an optimal guide assignment strategy, which is known to impact power and
discoveries 76, we considered a pilot data set knocking down 161 genes with weak fitness
effects in 24 iPSC lines (Supplementary Table ST4-1). We employed five different tools and
evaluated the quality of each assignment by considering the number of knockdowns with
significant on-target down-regulation and the median number of cells per guide (Figure
SMA-B). Based on this, we considered the relative UMI abundance of the most abundant guide
with respect to the total number of guide UMIs in a cell for guide assignment in all further
analyses. Cells were assigned to a guide if the relative frequency of the most abundant guide
was in the upper mode across cells, using a cutoff between 0.5 and 1 (median threshold across
inlets was 0.75, minimum 0.5 and maximum 0.88), and the guide had a minimum of 3 UMIs in the
cell. All other cells were considered unassigned. The percentage of assigned cells varied
across inlets and library sizes, ranging from 9% to 78% in the genome-wide experiments and
37% to 69% in the targeted experiments, with an average of 30% and 55% of cells assigned to
a single guide, respectively.
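The assignment rule can be sketched as below. The function name and toy counts are assumptions, the default cutoff of 0.75 is the reported median across inlets rather than a universal value, and the reading that the 3-UMI minimum applies to the most abundant guide is also an assumption.

```python
def assign_guide(guide_umis, min_fraction=0.75, min_umis=3):
    """Return the dominant guide of a cell, or None if the cell stays unassigned.

    guide_umis maps a guide id to its UMI count in one cell.
    """
    total = sum(guide_umis.values())
    if total == 0:
        return None
    top_guide, top_count = max(guide_umis.items(), key=lambda kv: kv[1])
    if top_count >= min_umis and top_count / total >= min_fraction:
        return top_guide
    return None

clear_cell = assign_guide({"gA": 9, "gB": 1})   # dominant guide at 90%
mixed_cell = assign_guide({"gA": 5, "gB": 4})   # top guide below the cutoff
sparse_cell = assign_guide({"gA": 2})           # too few UMIs
```

In practice the cutoff would be chosen per inlet from the upper mode of the relative-abundance distribution, as described in the text.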
Data integration and variance component analysis
For all cells passing quality control, we normalised the data by total sum scaling with a scale
factor of 10,000 and log-transformation with a pseudo-count of 1, and combined these results
using Seurat’s merge function, keeping all genes with a minimum normalised expression of
0.1 in all inlets. This resulted in a total of 6,471 expressed genes in the genome-wide screen
and 6,517 expressed genes in the targeted screen. To produce UMAP plots, we extracted highly
variable features using FindVariableFeatures, performed PCA using RunPCA and calculated
a UMAP embedding on the top 20 PCs. To quantify the contribution of the different variables
to the transcriptome heterogeneity, we used a linear mixed model on the expression of the
2,000 most highly variable genes in a variance component analysis, including donor / cell
line, batch / inlet, cell cycle phase, sequencing time point and target gene as random effects
and percentage of mitochondrial genes and total number of UMIs per cell as fixed effects. To
remove technical and batch differences as well as line-specific effects, the corresponding
variables were regressed out from the normalised expression data using ScaleData with
vars.to.regress set to the respective variables, and the PCA and UMAP were re-calculated on
the residuals of the model.
Quantifying perturbation effects
To quantify perturbation effects in the genome-wide screen, we defined all unassigned cells
as well as cells assigned to a non-targeting guide as control cells. Based on the gene
expression measurements of all control cells, we used a linear model to estimate the effects
of cell line, inlet, percent of mitochondrial genes, cell cycle scores and total number of UMIs
per cell on the gene expression. To assess the effect of a perturbation within an assigned
cell, we calculated the expected expression of each gene based on the linear model and
compared this to the observed expression, yielding a perturbation effect profile for each cell
defined as the difference of the expected and observed expression.
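The idea of fitting a covariate model on control cells and scoring perturbed cells against it can be sketched for a single gene. This is a simplified illustration, not the authors' code: it uses one covariate plus an intercept, a hand-written least-squares solver, toy data, and it assumes the effect is taken as observed minus expected, since the text does not specify the sign.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, row):
    """Expected expression for one cell given its covariates."""
    return sum(b * v for b, v in zip(beta, row))

# Control cells: columns are an intercept and one covariate (hypothetical values).
X_control = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_control = [1.0, 1.5, 2.0, 2.5]
beta = fit_ols(X_control, y_control)

# One assigned cell: effect = observed minus expected (the sign is an assumption).
x_cell, observed = [1.0, 2.0], 0.8
effect = observed - predict(beta, x_cell)
```

Repeating this for every expressed gene gives the per-cell perturbation effect profile; categorical covariates such as cell line and inlet would enter the design matrix as indicator columns.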
To assess overall perturbation effects per guide, per target or per target x line pairing, we
averaged these effects across all cells assigned to a guide, target or target x line pairing,
respectively, with significance evaluated based on a z-test using the residual variance of
the control fit. For the genome-wide screen, targets were considered for analysis if they had
a minimum of 10 cells assigned to them, and individual guides if they had a minimum of 5
cells. This left a total of 14,982 guides and 6,673 targets to be considered, for which trans
effects on 6,471 genes were calculated, giving a total of 96,948,522 and 43,180,983 trans effects
across guides and targets, respectively. For the targeted screen, effects for each of the 444
knockdowns were computed across all lines, as well as separately for each of the lines
where there were at least 10 assigned cells (total of 8,204 trans effects). As in the
genome-wide screen, only the transcriptomic changes for the genes whose log-normalized
mean expression was greater than 0.1 were computed (6,517 expressed genes). In total,
effects were computed across 53,465,468 target, line, expressed gene triplets.
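The totals reported above are products of the number of perturbation units and the number of expressed genes, which can be checked directly. The last product is an observation rather than a statement from the text: the reported triplet count equals 8,204 multiplied by 6,517, which would be consistent with 8,204 counting target-by-line combinations.

```python
# Genome-wide screen: perturbation units times expressed genes.
N_GENES_GENOME_WIDE = 6_471
guide_level_effects = 14_982 * N_GENES_GENOME_WIDE
target_level_effects = 6_673 * N_GENES_GENOME_WIDE

# Targeted screen: 8,204 units (interpretation assumed) times expressed genes.
N_GENES_TARGETED = 6_517
targeted_triplets = 8_204 * N_GENES_TARGETED
```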
For cis effects of natural genetic variants, we used the same procedure to estimate cis
effect sizes on the known cis gene, using all control cells in the model and replacing the cell
line covariate with the number of alternate alleles as a proxy for the genotype of each donor.
Cis effect sizes for every eQTL were determined as the model coefficient.
Trans effects were considered significant if the p-value after Benjamini-Hochberg correction
across all targets, tested genes and lines (if applicable) was below 0.1.
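A z-test on an averaged effect followed by Benjamini-Hochberg adjustment can be sketched as below. The exact form of the test statistic is not given in the text, so the standard error used here (residual variance divided by the number of assigned cells) is an assumption, as are the function names and toy p-values.

```python
import math

def z_test_p(mean_effect, residual_var, n_cells):
    """Two-sided p-value for a mean effect, assuming SE = sqrt(residual_var / n_cells)."""
    z = mean_effect / math.sqrt(residual_var / n_cells)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

toy_p = [0.01, 0.04, 0.03, 0.5]
toy_adjusted = benjamini_hochberg(toy_p)
significant = [q < 0.1 for q in toy_adjusted]
```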
Power estimation based on down-sampling experiments
We estimated the variance of the estimated transcriptional change due to knockdown and
the impact of the number of assigned cells using a bootstrap procedure. For this, we considered
118 targets of varying effect sizes (11 < number of differentially expressed genes < 551) where
the full data set had at least 1,000 assigned cells. For each target, we subsampled all cells with
replacement to obtain simulated data sets of 5, 10, 25, 50, 100, 250, 500 and 1,000 cells.
Transcriptomic changes for all expressed genes were then computed separately on each of
these data sets exactly as on the full data set (see Quantifying perturbation effects). This was
done 25 times per knockdown, resulting in a total of 25 separate estimates of transcriptional
effect for each of the 8 sample sizes for each of the 118 targets.
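The down-sampling scheme can be sketched for one target and one gene. The simulated per-cell effects, the random seed and the use of a simple mean as the effect estimate are assumptions for illustration; the sample sizes and the 25 repeats come from the text.

```python
import random
from statistics import mean

SAMPLE_SIZES = [5, 10, 25, 50, 100, 250, 500, 1000]
N_REPEATS = 25

def bootstrap_estimates(cell_effects, rng):
    """For each sample size, draw cells with replacement and average, N_REPEATS times."""
    return {
        n: [mean(rng.choices(cell_effects, k=n)) for _ in range(N_REPEATS)]
        for n in SAMPLE_SIZES
    }

rng = random.Random(0)
# Hypothetical per-cell effects on one gene for a target with 1,000 assigned cells.
toy_effects = [rng.gauss(-0.5, 1.0) for _ in range(1000)]
estimates = bootstrap_estimates(toy_effects, rng)

# Mean absolute deviation from the full-data estimate, per sample size.
full_estimate = mean(toy_effects)
abs_error = {n: mean(abs(e - full_estimate) for e in v) for n, v in estimates.items()}
```

The error shrinks as the number of sampled cells grows, which is the relationship used to judge how many assigned cells a target needs.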
Identifying co-regulated and co-regulating genes
To quantify the similarity of targets (and expressed genes) based on their perturbation
effects (perturbation response), we calculated Pearson’s correlation between targets
(expressed genes) based on the log-fold changes for all 6,471 expressed genes (6,673
well-powered targets). A total of 22,261,128 target-target and 20,933,685 expressed
gene-expressed gene pairs were considered for analysis. Two targets (expressed genes)
were considered to be co-regulating (co-regulated) if the absolute Pearson’s correlation
between their trans perturbation (response) was greater than 0.2.
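The pair counts above are the numbers of unordered pairs among 6,673 targets and 6,471 expressed genes, and the co-regulation call is a threshold on the absolute Pearson correlation. The sketch below checks the counts and applies the threshold to three hypothetical log-fold-change profiles; the target names and values are invented for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

# Unordered pairs among targets and among expressed genes.
n_target_pairs = math.comb(6_673, 2)
n_gene_pairs = math.comb(6_471, 2)

# Hypothetical log-fold-change profiles over four expressed genes.
profiles = {
    "T1": [1.0, 2.0, 3.0, 4.0],
    "T2": [2.0, 4.0, 6.0, 8.0],
    "T3": [1.0, -1.0, -1.0, 1.0],
}
co_regulating = [
    (a, b)
    for a, b in combinations(sorted(profiles), 2)
    if abs(pearson(profiles[a], profiles[b])) > 0.2
]
```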
Quantifying heritability of perturbation effects across donors
Heritability was estimated in the targeted screen, considering the 5 donors where both cell
lines from each donor demonstrated a strong response to CRISPR perturbation. We considered
every target-expressed gene pair where we observed a significant trans effect in at least one
line (68,321 target x expressed gene pairs). For each of these pairs, the perturbation effect
profile for every assigned cell obtained from Quantifying perturbation effects was fitted with
a linear mixed model using normalized target gene expression as a fixed effect and guide,
cell line and donor as random effects. The donor effect was quantified by computing the
likelihood ratio between the full model and the model without donor as a random effect. To
assess significance, we created a permutation scheme to obtain an empirical null distribution
of the donor effects. For this, donor labels were permuted across cell lines such that two cell
lines from one donor were assigned to different donors after the permutation. Thereby we
retain the cell line structure in the data but permute the donor structure. This yielded a total
of 544 permutations for the 5 pairs of cell lines. Permutations for every target-gene pair were
computed until 10 null values greater than or equal to the true value were observed (stronger
observations), or until 10^4 null values were computed. The empirical p-value was estimated
to be $p_{td} = \max(10,\ \#\text{stronger observations} + 1) / (\min(\#\text{permutations},\ 10^4) + 1)$. Effects
where the Benjamini-Hochberg adjusted p-value < 0.1 were considered to be significant.
All models were fitted using the lmer function from the R package lme4 (v1.1-35.1) 77.
Log-likelihood was computed using the logLik function from the R stats package.
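The figure of 544 permutations can be reproduced by enumeration: it is the number of ways to regroup the 10 cell lines into 5 pairs such that no pair contains two lines from the same donor. The sketch below uses hypothetical donor labels A to E and a small recursive generator of pairings.

```python
def pairings(items):
    """Yield every way to split a list of items into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

# Ten cell lines: two per donor, donors labelled A to E (labels are hypothetical).
lines = [(donor, replicate) for donor in "ABCDE" for replicate in (1, 2)]

all_pairings = list(pairings(lines))
# Keep pairings in which no permuted "donor" groups two lines of the same true donor.
valid_pairings = [
    p for p in all_pairings if all(a[0] != b[0] for a, b in p)
]
n_all = len(all_pairings)
n_valid = len(valid_pairings)
```

Of the 945 possible pairings, 544 separate every true donor pair, which matches the number stated in the text.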
Identification of off-target effects
A knockdown was considered an off-target effect based on two criteria:
- Genomic proximity: the guide sequence could be mapped to within 1 kbp of a
transcription start site of another gene in the Fantom 5 database 78,79.
- Sequence similarity: any sequence with fewer than 3 nt mismatches to the guide could be
mapped to within 1 kbp of a transcription start site of another gene in the Fantom 5
database 78,79.
To identify guides with potential off-target effects on OCT4, we additionally considered any
guide in our library for which the first 9 nt of the seed sequence could be mapped to a region
within 2 kbp of a transcription start site of OCT4 in the Fantom 5 database.
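The sequence-similarity criterion amounts to finding near matches of a guide within a window around a transcription start site. The sketch below is a simplified illustration with invented sequences: it counts substitutions only, scans one strand, and ignores PAM requirements, none of which is specified in the text.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def near_match_positions(guide, region, max_mismatches=2):
    """Start positions in region where guide matches with at most max_mismatches.

    'Fewer than 3 mismatches' in the text corresponds to max_mismatches=2.
    """
    k = len(guide)
    return [
        i
        for i in range(len(region) - k + 1)
        if mismatches(guide, region[i:i + k]) <= max_mismatches
    ]

# Hypothetical guide and promoter-proximal sequence.
hits = near_match_positions("ACGTACGT", "TTACGAACGTTT")
```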
Functional annotations
Functional annotations were used throughout the analysis, for example for predicting the
number of trans effects and the number of regulating target genes, and the similarity of
transcriptional effects upon target downregulation and co-perturbation. To do this, we made
use of the following annotations:
Conservation scores. Conservation scores were obtained for each target from the
Bioconductor package phastCons100way.UCSC.hg38 (version 3.7.1) 23.
Wildtype expression, correlation and variance of expression. Wild-type gene
expression, variance and co-expression were calculated based on undifferentiated iPSCs 49.
Values were computed based on the log-normalised expression values after regressing out
effects due to donor and technical covariates (percent of mitochondrial genes, total number
of UMIs per cell and number of genes expressed). Two genes were considered
co-expressed on the single-cell level if their absolute Pearson correlation was above 0.3.
Essentiality. Essentiality was quantified as described in Gene selection for the
genome-scale panel based on an iPSC cell line and the DepMap consortium 16.
Expression heritability. Heritability of wild-type expression of iPSCs was obtained from the
HipSci consortium 1.
Protein-protein interactions. Known gene interactions were obtained from the OmniPath
database 80 using the import_all_interactions function from OmnipathR (version 3.10.1) 81. A
pair of genes was considered to be interacting if they formed an interaction pair in the
OmniPath database (undirected).
Protein complexes. Known protein complexes were obtained from the OmniPath
database 80 using the import_omnipath_complexes(resources = c('CORUM', 'hu.MAP'))
function from OmnipathR (version 3.10.1) 81. Pairs of genes were considered to be a protein
complex pair if they were both members of at least one common protein complex.
Transcription factor regulation. Transcription factor-target gene interactions were
obtained from the DoRothEA database 82 using the function get_dorothea in the decoupleR
package (version 2.8.0) 83, using all pairs with confidence level A or B.
Hallmark gene sets. Knockdown and target genes were annotated by their membership in
release 7.5.1 of the MSigDB hallmark gene sets 25. Two genes were considered to be a
hallmark gene set pair if they were both members of at least one gene set.
Co-essentiality. Co-essentiality modules were taken from Wainberg et al. 27. Two genes
were considered to be a co-essentiality module pair if they were both members of at least
one co-essentiality module.
GO Terms. GO term annotations were obtained from the authors of the gProfiler
database 84. Pairs of genes were considered to be an enriched gProfiler pair if they had at
least 1 GO annotation in common.
Expression quantitative trait loci in human iPSCs. Putative trans-eQTLs were obtained from
ref. 3 on February 27, 2023. A pair was considered a cis eQTL-trans eQTL pair if the target gene
and identified trans gene in our data corresponded to a cis eQTL gene and its trans effect.
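Several of the annotations above define a gene pair by shared membership in at least one set (protein complex, hallmark gene set, co-essentiality module or GO term). A minimal sketch of that rule follows; the set and gene names are placeholders, not entries from any of the cited databases.

```python
from itertools import combinations

def shared_membership_pairs(gene_sets):
    """Return all unordered gene pairs that co-occur in at least one set."""
    pairs = set()
    for members in gene_sets.values():
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs

# Placeholder sets; real inputs would come from the annotation resources.
toy_sets = {
    "set_1": ["GENE_A", "GENE_B", "GENE_C"],
    "set_2": ["GENE_C", "GENE_D"],
}
annotated_pairs = shared_membership_pairs(toy_sets)
```

Sorting the members before pairing keeps each pair in a canonical order, so a pair found in two different sets is stored only once.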
Unless otherwise stated, prediction was done by fitting an elastic net regression model with
alpha = 0.1 using the glmnet package (version 4.1-8) 85. When predicting discrete,
positive values (i.e. the number of trans effects and regulating knockdowns), a Poisson
regression was used. When predicting a binary response (i.e. similarity of perturbation
response or perturbation profiles), a logistic regression was used. To account for differences
in statistical power, we controlled for the number of cells by adding this as a term in the
model.
Protein Complex Prediction
We identified candidate complex interactions based on the correlation between the downstream
effects of candidate target genes and those of known complex members (Pearson’s R >
0.2). Candidates were then categorised into known and novel interactions based on
literature review and prioritised according to shared function and cellular compartment with
the complex. AlphaFold-Multimer 86 (version 2.3) was used to model pairwise interactions
between each target gene and all members of the candidate complex. pDockQ 38,87 scores
were calculated for each interaction as well as a random background sample of protein
pairs and a set of known protein interactions. We identified plausible target-complex
interactions with a combination of manual examination of predicted structures and pDockQ
scores. We then aligned the top predicted pairwise target-complex member interactions with
known complex structures using PyMol 88 (version 2.5 Open-Source).
MOFA
Multi-modal factor analysis was used to compare trans effects across cell lines using MOFA2
(version 1.12.1) 89,90. For this, log-fold change values for 445 target genes, expression of
6,517 genes, dCas9-KRAB-MeCP2, BSD and mScarlet and 19 cell lines were z-transformed
and input to MOFA with default parameters, with each cell line as a separate view in the
model and the number of factors set to 8.
Figure S1 | Coverage and variance in the genome-scale screen. A) Venn diagram of genes
selected as targets. B) Number of lines (y-axis) for different numbers of cells recovered per line after
genotyping (x-axis) with gRNA assigned (blue) and not (grey). C) Estimated absolute error of
expression log-fold changes for varying number of assigned cells (top) (relative to estimates from all
genes, with a minimum of 1,000; Methods), histogram of cells per target gene (bottom). D) Number of
genes (y-axis; log10 scale) with increasing amounts of variance explained (colors) by cell cycle and
technical artefacts (x-axis).
Figure S2-1 | Molecular signatures of the trans effects of gene knockdown. A) Down-regulation
of target genes due to CRISPRi. Expression of targeted (red markers) and other genes (grey markers).
Red dots show target gene expression values in control cells (x-axis) and assigned cells (y-axis); grey
dots show expression values of expressed genes beyond the target. B) Quantile-quantile plot of
p-values of trans effects of transcription factors in the DoRothEA database. Orange points indicate the
trans effects of transcription factors on their known targets, while gray points indicate trans effects of the
same transcription factors on non-targets. Number of downstream genes (y-axis) with different
numbers of regulators (x-axis). Labels: six genes with most upstream regulators. C) Histogram of the
number of regulators per expressed gene. Expressed genes with the highest numbers of regulators
are labelled. D) Absolute model coefficients (x-axis) for predicting the number of regulators based on
properties of the expressed gene (y-axis). Blue: negative coefficients (fewer regulators); red: positive
coefficients (more regulators). E) Volcano plot of the log-fold change (x-axis) and log-scale
significances (Benjamini-Hochberg adjusted p-values, y-axis) for trans effects of (target, expressed
gene) pairs with a known eQTL acting in cis on the target and in trans on the expressed gene. Dashed
line: p=0.1. Labels: two pairs with corrected p-value less than 1. F) Comparison of effect size of
natural variation in expression attributed to cis eQTLs and CRISPRi. Red: cis eQTLs with at least
one significant trans effect. Gray: cis eQTLs without any significant trans effects.
Figure S2-2 | Plausible protein interactions predicted from similar trans effects of knockdowns.
Similarity of trans effects between target genes (x- and y-axis) involved in A) regulation of
transcription, B) translation and post-translational processing and C) transcription. Heatmap color:
Pearson’s R. Annotation color: covariates of biological processes involved (see legend). D) A
plausible tetramer formed by RAB10, WDR61, CDC73 and PAF1. RAB10 clashes with CTR9
when the larger structure of the Paf complex 91 is considered. E) Predicted binding structure between RAB10 and the
Paf complex. F) Predicted binding structure of CTU1 with EIF2B3 at a buried interface 92 (6O81) of the
EIF2B complex. G) Predicted binding structure of ELP3 with EIF2B3 and EIF2B5 of the EIF2B
complex.
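The heatmaps in panels A to C report Pearson’s R between the trans-effect profiles of pairs of target genes. The following is a minimal sketch of that computation, assuming each target is represented by a vector of log-fold changes over the same set of expressed genes. The function names, target labels and toy values are hypothetical and are not taken from the study's code or data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def similarity_matrix(profiles):
    """Pairwise Pearson's R for a dict mapping target name -> log-fold-change vector."""
    names = list(profiles)
    return {(a, b): pearson_r(profiles[a], profiles[b]) for a in names for b in names}


# Toy log-fold changes (hypothetical) for three targets over four expressed genes.
toy_profiles = {
    "target_A": [-1.0, 0.5, 0.2, -0.3],
    "target_B": [-2.0, 1.0, 0.4, -0.6],  # same direction as target_A, twice the size
    "target_C": [0.3, -0.5, -0.2, 1.0],  # largely opposite response
}
sim = similarity_matrix(toy_profiles)
```

Because Pearson’s R is scale-invariant, target_A and target_B correlate perfectly here even though their effect sizes differ, which is why correlation captures the shape of a response rather than the knockdown strength.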
Figure S2-3 | Co-regulated modules identified by perturbation response similarity. A) Heatmap
of trans effects (log-fold change, color) on genes in the glycolysis pathway (x-axis) due to knockdown
of hypoxia pathway regulators ARNT, HIF1A and VHL (y-axis). B) Joint up-regulation of cholesterol
biosynthesis pathway members due to down-regulation of a pathway member. Purple nodes: genes
in the cholesterol biosynthesis pathway. Red edges: up-regulation of arrow target upon knockdown of
arrow source. C) As A), but change (color) of cholesterol biosynthesis gene expression (x-axis) upon
knockdown of genes in the cholesterol biosynthesis pathway (y-axis). D) Predicting correlation
between co-perturbation profiles of downstream effects. Coefficient (y-axis) for different covariates
(x-axis) in a generalized linear model trained to predict correlation of downstream gene log-fold
change vectors for pairs of targets. E) Correlation of gene expression values across single cells in
wild-type iPSCs (x-axis) against correlation in response to perturbations of different targets in
CRISPRi screening (y-axis). F) As A) and C) but of trans effects on iPSC marker genes CD24,
DNMT3B, L1TD1, TDGF1 and TERF1 (x-axis) due to different target genes (y-axis). Single star:
predicted off-target activity on OCT4 (Methods). Double star: inconsistent transcriptional change
between guides for the same gene (maximum correlation between guides < 0.1).
Figure S3-1 | Recovery of the targeted screen. A) UMAP representation (x- and y-axis) of
technical-covariate-corrected expression of all assigned cells (markers) in the targeted screen. Colors:
cell lines. B) Number of assigned cells (y-axis) per knockdown per line (x-axis). Dashed line: median
number of cells per knockdown per line. C) Number of knockdowns with at least 10 assigned cells,
plotted per line. Blue: number of knockdowns with significant (Benjamini-Hochberg adjusted p-value <
0.1, t-test) on-target down-regulation in a line. Red: additional number of knockdowns with
non-significant on-target down-regulation. D) Concordance of all trans effects that were significant in
either the genome-scale or the targeted screen. Log-fold change across all cells in the targeted screen
(y-axis) compared to the genome-scale screen (x-axis) for 288,089 (target, downstream gene) pairs.
Each point represents one target-expressed gene pair.
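Panel C calls on-target down-regulation significant when the Benjamini-Hochberg adjusted p-value is below 0.1. As an illustration only, the sketch below implements the standard Benjamini-Hochberg step-up adjustment on a list of raw p-values. The function name and the toy p-values are hypothetical, and the t-test that would produce the raw p-values is not shown.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values (hypothetical), one per knockdown in a single line.
raw_p = [0.01, 0.04, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)
n_significant = sum(q < 0.1 for q in adj_p)
```

With these toy values, three of the four knockdowns would pass the 0.1 cut-off used in the caption.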
Figure S3-2 | Global effect of transcriptional change due to CRISPRi perturbation across cell
lines. A) Percentage of log-fold change variance explained (color) by different MOFA factors (x-axis)
in different cell lines (y-axis). B) MOFA weights of a knockdown (markers) for different factors (x- and
y-axis), for different factor combinations (panels). C) Correlation (color) of MOFA factors (panels)
between cell lines (x- and y-axis).
Figure S3-3 | Genetic background influences transcriptional response due to knockdown. A)
Number of heritable trans effects versus number of trans effects tested per gene. Red genes indicate
knockdowns with more heritable trans effects than expected; blue genes indicate knockdowns with
fewer heritable trans effects than expected, given the number of significant trans effects across lines.
B) An example of loss of heritability. C9orf135 expression change due to knockdown of OCT4.
C9orf135 expression (y-axis; log-normalized) in individual cells (markers) from different cell lines
(x-axis, colors) with OCT4 knockdown. Colored dash: mean expression in knockdown in cell line.
Grey dash: mean expression in control cells in cell line. Colored arrow: median expression change in
line in response to knockdown. C) As B), but expression change of MRPL55 due to knockdown of
MRPL55. D) Expression change of PCSK9 due to knockdown of MED7. E) Expression change of
SEH1L due to knockdown of the trans eQTL hotspot ZNF208. F) Expression change of MOB1A due
to knockdown of the trans eQTL hotspot ZNF611.
Figure S4 | Variation of transcriptional change due to CRISPRi perturbation. A) Strategy for
quantifying sources of variation in transcriptional response to knockdown. B) Percentage of
variance in transcriptional response to knockdown explained by CRISPRi efficacy (on-target
expression), guide, donor and cell line. C) dCas9-KRAB-MeCP2 activity versus fraction of assigned cells.
D) Number of cells recovered per cell line after 14 days of selection (y-axis) for different cell lines
(x-axis) in a pilot experiment with early pooling of lines. E) Repression log-fold change (x-axis) and
log-scale p-value (y-axis) of target gene (markers) in a pilot experiment with early pooling of lines and
late sequencing time point (14 days post-infection).
Figure SM | Comparison of guide assignment strategies. We compared five strategies: a Poisson-Gaussian model 21;
a Gaussian-Gaussian model 69 adopted from previous work; a fixed threshold, in which a cell was assigned to a guide if and
only if more than 5 UMIs of that guide were detected; a ratio threshold, in which a cell was assigned to a guide if and only if
the fraction of that guide’s UMIs among all UMIs in the cell was greater than a given threshold;
and a filtered Gaussian-Gaussian model, in which assignments based on fewer than 5 guide UMIs were disregarded.
To evaluate the quality of each guide assignment method, we considered A) the number of assigned
cells per guide and B) on-target repression.
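The two threshold-based rules in this caption can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the ratio cut-off of 0.5 is a placeholder because the caption does not give its value, and the ratio is computed over guide UMIs only, which is one possible reading of "all UMIs in a given cell".

```python
def assign_fixed_threshold(guide_umis, min_umis=5):
    """Guides with strictly more than `min_umis` UMIs in one cell (the > 5 rule)."""
    return sorted(g for g, n in guide_umis.items() if n > min_umis)


def assign_ratio(guide_umis, min_fraction=0.5):
    """Guides whose share of the cell's guide UMIs exceeds `min_fraction`.

    Assumption: the denominator is the total guide UMI count in the cell;
    `min_fraction=0.5` is a hypothetical placeholder value.
    """
    total = sum(guide_umis.values())
    if total == 0:
        return []
    return sorted(g for g, n in guide_umis.items() if n / total > min_fraction)


# Toy guide UMI counts (hypothetical) for a single cell.
toy_cell = {"guide_1": 12, "guide_2": 6, "guide_3": 1}
fixed_calls = assign_fixed_threshold(toy_cell)
ratio_calls = assign_ratio(toy_cell)
```

In this toy cell the fixed threshold calls two guides while the ratio rule calls one, which shows how the choice of rule changes the number of assigned cells per guide (panel A).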
References
1. Kilpinen, H., Goncalves, A., Leha, A., Afzal, V., Alasoo, K., Ashford, S., Bala, S.,
Bensaddek, D., Casale, F.P., Culley, O.J., et al. (2017). Common genetic variation drives
molecular heterogeneity in human iPSCs. Nature 546, 370–375.
2. Jakubosky, D., D’Antonio, M., Bonder, M.J., Smail, C., Donovan, M.K.R., Young
Greenwald, W.W., Matsui, H., D’Antonio-Chronowska, A., Stegle, O., Smith, E.N., et al.
(2020). Properties of structural variants and short tandem repeats associated with gene
expression and complex traits. Nat. Commun. 11, 1–15.
3. Bonder, M.J., Smail, C., Gloudemans, M.J., Frésard, L., Jakubosky, D., D’Antonio, M.,
Li, X., Ferraro, N.M., Carcamo-Orive, I., Mirauta, B., et al. (2021). Identification of rare
and common regulatory variants in pluripotent cells using population-scale
transcriptomics. Nat. Genet. 53, 313–321.
4. Dixit, A., Parnas, O., Li, B., Chen, J., Fulco, C.P., Jerby-Arnon, L., Marjanovic, N.D.,
Dionne, D., Burks, T., Raychowdhury, R., et al. (2016). Perturb-Seq: Dissecting
Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens.
Cell 167, 1853–1866.e17.
5. Adamson, B., Norman, T.M., Jost, M., Cho, M.Y., Nuñez, J.K., Chen, Y., Villalta, J.E.,
Gilbert, L.A., Horlbeck, M.A., Hein, M.Y., et al. (2016). A Multiplexed Single-Cell
CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein
Response. Cell 167, 1867–1882.e21.
6. Datlinger, P., Rendeiro, A.F., Schmidl, C., Krausgruber, T., Traxler, P., Klughammer, J.,
Schuster, L.C., Kuchler, A., Alpar, D., and Bock, C. (2017). Pooled CRISPR screening
with single-cell transcriptome readout. Nat. Methods 14, 297–301.
7. Xie, S., Duan, J., Li, B., Zhou, P., and Hon, G.C. (2017). Multiplexed Engineering and
Analysis of Combinatorial Enhancer Activity in Single Cells. Mol. Cell 66, 285–299.e5.
8. Jaitin, D.A., Weiner, A., Yofe, I., Lara-Astiaso, D., Keren-Shaul, H., David, E., Salame,
T.M., Tanay, A., van Oudenaarden, A., and Amit, I. (2016). Dissecting Immune Circuits
by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq. Cell 167,
1883–1896.e15.
9. Replogle, J.M., Saunders, R.A., Pogson, A.N., Hussmann, J.A., Lenail, A., Guna, A.,
Mascibroda, L., Wagner, E.J., Adelman, K., Lithwick-Yanai, G., et al. (2022). Mapping
information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell
185, 2559–2575.e28.
10. Yao, D., Binan, L., Bezney, J., Simonton, B., Freedman, J., Frangieh, C.J., Dey, K.,
Geiger-Schuller, K., Eraslan, B., Gusev, A., et al. (2023). Scalable genetic screening for
regulatory circuits using compressed Perturb-seq. Nat. Biotechnol.
https://doi.org/10.1038/s41587-023-01964-9.
11. Niemi, M.E.K., Martin, H.C., Rice, D.L., Gallone, G., Gordon, S., Kelemen, M.,
McAloney, K., McRae, J., Radford, E.J., Yu, S., et al. (2018). Common genetic variants
contribute to risk of rare severe neurodevelopmental disorders. Nature 562, 268–271.
12. Mirauta, B.A., Seaton, D.D., Bensaddek, D., Brenes, A., Bonder, M.J., Kilpinen, H.,
HipSci Consortium, Stegle, O., and Lamond, A.I. (2020). Population-scale proteome
variation in human induced pluripotent stem cells. eLife 9.
https://doi.org/10.7554/eLife.57390.
13. Morris, J.A., Caragine, C., Daniloski, Z., Domingo, J., Barry, T., Lu, L., Davis, K., Ziosi,
M., Glinos, D.A., Hao, S., et al. (2023). Discovery of target genes and pathways at
GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699.
14. Tian, R., Gachechiladze, M.A., Ludwig, C.H., Laurie, M.T., Hong, J.Y., Nathaniel, D.,
Prabhu, A.V., Fernandopulle, M.S., Patel, R., Abshari, M., et al. (2019). CRISPR
Interference-Based Platform for Multimodal Genetic Screens in Human iPSC-Derived
Neurons. Neuron 104, 239–255.e12.
15. Gasperini, M., Hill, A.J., McFaline-Figueroa, J.L., Martin, B., Kim, S., Zhang, M.D.,
Jackson, D., Leith, A., Schreiber, J., Noble, W.S., et al. (2019). A Genome-wide
Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell 176, 1516.
16. Meyers, R.M., Bryan, J.G., McFarland, J.M., Weir, B.A., Sizemore, A.E., Xu, H., Dharia,
N.V., Montgomery, P.G., Cowley, G.S., Pantel, S., et al. (2017). Computational
correction of copy number effect improves specificity of CRISPR-Cas9 essentiality
screens in cancer cells. Nat. Genet. 49, 1779–1784.
17. Sanson, K.R., Hanna, R.E., Hegde, M., Donovan, K.F., Strand, C., Sullender, M.E.,
Vaimberg, E.W., Goodale, A., Root, D.E., Piccioni, F., et al. (2018). Optimized libraries
for CRISPR-Cas9 genetic screens with multiple modalities. Nat. Commun. 9, 5416.
18. Holland, C.H., Szalai, B., and Saez-Rodriguez, J. (2020). Transfer of regulatory
knowledge from human to mouse for functional genomics analysis. Biochim. Biophys.
Acta Gene Regul. Mech. 1863, 194431.
19. Zheng, X., Dumitru, R., Lackford, B.L., Freudenberg, J.M., Singh, A.P., Archer, T.K.,
Jothi, R., and Hu, G. (2012). Cnot1, Cnot2, and Cnot3 maintain mouse and human ESC
identity and inhibit extraembryonic differentiation. Stem Cells 30, 910–922.
20. Park, J., Park, S., and Lee, J.-S. (2023). Role of the Paf1 complex in the maintenance of
stem cell pluripotency and development. FEBS J. 290, 951–961.
21. Replogle, J.M., Norman, T.M., Xu, A., Hussmann, J.A., Chen, J., Cogan, J.Z., Meer,
E.J., Terry, J.M., Riordan, D.P., Srinivas, N., et al. (2020). Combinatorial single-cell
CRISPR screens by direct guide RNA capture and targeted sequencing. Nat.
Biotechnol. 38, 954–961.
22. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P.,
Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the
unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29.
23. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K.,
Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolutionarily conserved
elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15,
1034–1050.
24. Bergmiller, T., Ackermann, M., and Silander, O.K. (2012). Patterns of evolutionary
conservation of essential genes correlate with their compensability. PLoS Genet. 8,
e1002803.
25. Liberzon, A., Birger, C., Thorvaldsdóttir, H., Ghandi, M., Mesirov, J.P., and Tamayo, P.
(2015). The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell
Syst 1, 417–425.
26. Luo, H., Gao, F., and Lin, Y. (2015). Evolutionary conservation analysis between the
essential and nonessential genes in bacterial genomes. Sci. Rep. 5, 13210.
27. Wainberg, M., Kamber, R.A., Balsubramani, A., Meyers, R.M., Sinnott-Armstrong, N.,
Hornburg, D., Jiang, L., Chan, J., Jian, R., Gu, M., et al. (2021). A genome-wide atlas of
co-essential modules assigns function to uncharacterized genes. Nat. Genet. 53,
638–649.
28. Costanzo, M., VanderSluis, B., Koch, E.N., Baryshnikova, A., Pons, C., Tan, G., Wang,
W., Usaj, M., Hanchard, J., Lee, S.D., et al. (2016). A global genetic interaction network
maps a wiring diagram of cellular function. Science 353.
https://doi.org/10.1126/science.aaf1420.
29. Tsai, S.Q., and Joung, J.K. (2016). Defining and improving the genome-wide
specificities of CRISPR-Cas9 nucleases. Nat. Rev. Genet. 17, 300–312.
30. Doench, J.G. (2018). Am I ready for CRISPR? A user’s guide to genetic screens. Nat.
Rev. Genet. 19, 67–80.
31. Zukeran, A., Takahashi, A., Takaoka, S., Mohamed, H.M.A., Suzuki, T., Ikematsu, S.,
and Yamamoto, T. (2016). The CCR4-NOT deadenylase activity contributes to
generation of induced pluripotent stem cells. Biochem. Biophys. Res. Commun. 474,
233–239.
32. Fazzio, T.G., Huff, J.T., and Panning, B. (2008). Chromatin regulation Tip(60)s the
balance in embryonic stem cell self-renewal. Cell Cycle 7, 3302–3306.
33. Huang, W., Chen, T.-Q., Fang, K., Zeng, Z.-C., Ye, H., and Chen, Y.-Q. (2021).
N6-methyladenosine methyltransferases: functions, regulation, and clinical potential. J.
Hematol. Oncol. 14, 117.
34. Poli, J., Gasser, S.M., and Papamichos-Chronakis, M. (2017). The INO80 remodeller in
transcription, replication and repair. Philos. Trans. R. Soc. Lond. B Biol. Sci. 372.
https://doi.org/10.1098/rstb.2016.0290.
35. UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Res.
43, D204–D212.
36. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O.,
Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., et al. (2021). Highly accurate
protein structure prediction with AlphaFold. Nature 596, 583–589.
37. Fusaro, G., Dasgupta, P., Rastogi, S., Joshi, B., and Chellappan, S. (2003). Prohibitin
induces the transcriptional activity of p53 and is exported from the nucleus upon
apoptotic signaling. J. Biol. Chem. 278, 47853–47861.
38. Basu, S., and Wallner, B. (2016). DockQ: A Quality Measure for Protein-Protein Docking
Models. PLoS One 11, e0161879.
39. Tsitsiridis, G., Steinkamp, R., Giurgiu, M., Brauner, B., Fobo, G., Frishman, G.,
Montrone, C., and Ruepp, A. (2023). CORUM: the comprehensive resource of
mammalian protein complexes-2022. Nucleic Acids Res. 51, D539–D545.
40. Kagey, M.H., Newman, J.J., Bilodeau, S., Zhan, Y., Orlando, D.A., van Berkum, N.L.,
Ebmeier, C.C., Goossens, J., Rahl, P.B., Levine, S.S., et al. (2010). Mediator and
cohesin connect gene expression and chromatin architecture. Nature 467, 430–435.
41. Ramasamy, S., Aljahani, A., Karpinska, M.A., Cao, T.B.N., Velychko, T., Cruz, J.N.,
Lidschreiber, M., and Oudelaar, A.M. (2023). The Mediator complex regulates
enhancer-promoter interactions. Nat. Struct. Mol. Biol. 30, 991–1000.
42. Qin, K., Jian, D., Xue, Y., Cheng, Y., Zhang, P., Wei, Y., Zhang, J., Xiong, H., Zhang, Y.,
and Yuan, X. (2021). DDX41 regulates the expression and alternative splicing of genes
involved in tumorigenesis and immune response. Oncol. Rep. 45, 1213–1225.
43. Andreou, A.Z. (2021). DDX41: a multifunctional DEAD-box protein involved in
pre-mRNA splicing and innate immunity. Biol. Chem. 402, 645–651.
44. Shinriki, S., Hirayama, M., Nagamachi, A., Yokoyama, A., Kawamura, T., Kanai, A.,
Kawai, H., Iwakiri, J., Liu, R., Maeshiro, M., et al. (2022). DDX41 coordinates RNA
splicing and transcriptional elongation to prevent DNA replication stress in
hematopoietic cells. Leukemia 36, 2605–2620.
45. Dybkov, O., Preußner, M., El Ayoubi, L., Feng, V.-Y., Harnisch, C., Merz, K., Leupold, P.,
Yudichev, P., Agafonov, D.E., Will, C.L., et al. (2023). Regulation of 3’ splice site
selection after step 1 of splicing by spliceosomal C* proteins. Sci Adv 9, eadf1785.
46. Lee, J.W., Ko, J., Ju, C., and Eltzschig, H.K. (2019). Hypoxia signaling in human
diseases and therapeutic targets. Exp. Mol. Med. 51, 1–13.
47. Haase, V.H. (2009). The VHL tumor suppressor: master regulator of HIF. Curr. Pharm.
Des. 15, 3895–3903.
48. Mullen, P.J., Yu, R., Longo, J., Archer, M.C., and Penn, L.Z. (2016). The interplay
between cell signalling and the mevalonate pathway in cancer. Nat. Rev. Cancer 16,
718–731.
49. Cuomo, A.S.E., Seaton, D.D., McCarthy, D.J., Martinez, I., Bonder, M.J.,
Garcia-Bernardo, J., Amatya, S., Madrigal, P., Isaacson, A., Buettner, F., et al. (2020).
Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects
on gene expression. Nat. Commun. 11, 810.
50. Novak, G., Kyriakis, D., Grzyb, K., Bernini, M., Rodius, S., Dittmar, G., Finkbeiner, S.,
and Skupin, A. (2022). Single-cell transcriptomics of human iPSC differentiation
dynamics reveal a core molecular network of Parkinson’s disease. Commun Biol 5, 49.
51. Shakiba, N., White, C.A., Lipsitz, Y.Y., Yachie-Kinoshita, A., Tonge, P.D., Hussein,
S.M.I., Puri, M.C., Elbaz, J., Morrissey-Scoot, J., Li, M., et al. (2015). CD24 tracks
divergent pluripotent states in mouse and human cells. Nat. Commun. 6, 7329.
52. Liu, Q., Wang, G., Lyu, Y., Bai, M., Jiapaer, Z., Jia, W., Han, T., Weng, R., Yang, Y., Yu,
Y., et al. (2018). The miR-590/Acvr2a/Terf1 Axis Regulates Telomere Elongation and
Pluripotency of Mouse iPSCs. Stem Cell Reports 11, 88–101.
53. Hough, S.R., Laslett, A.L., Grimmond, S.B., Kolle, G., and Pera, M.F. (2009). A
continuum of cell states spans pluripotency and lineage commitment in human
embryonic stem cells. PLoS One 4, e7708.
54. Emani, M.R., Närvä, E., Stubb, A., Chakroborty, D., Viitala, M., Rokka, A., Rahkonen,
N., Moulder, R., Denessiouk, K., Trokovic, R., et al. (2015). The L1TD1 protein
interactome reveals the importance of post-transcriptional regulation in human
pluripotency. Stem Cell Reports 4, 519–528.
55. Iwabuchi, K.A., Yamakawa, T., Sato, Y., Ichisaka, T., Takahashi, K., Okita, K., and
Yamanaka, S. (2011). ECAT11/L1td1 is enriched in ESCs and rapidly activated during
iPSC generation, but it is dispensable for the maintenance and induction of pluripotency.
PLoS One 6, e20461.
56. Wongtrakoongate, P., Li, J., and Andrews, P.W. (2014). DNMT3B inhibits the
re-expression of genes associated with induced pluripotency. Exp. Cell Res. 321,
231–239.
57. Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse
embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676.
58. Li, M., and Belmonte, J.C.I. (2017). Ground rules of the pluripotency gene regulatory
network. Nat. Rev. Genet. 18, 180–191.
59. Riordan, J.D., and Nadeau, J.H. (2017). From Peas to Disease: Modifier Genes,
Network Resilience, and the Genetics of Health. Am. J. Hum. Genet. 101, 177–191.
60. Doench, J.G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E.W., Donovan, K.F., Smith,
I., Tothova, Z., Wilen, C., Orchard, R., et al. (2016). Optimized sgRNA design to
maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34,
184–191.
61. Rohatgi, N., Fortin, J.P., Lau, T., Ying, Y., Zhang, Y., Costa, M., and Reja, R. (2024).
Seed sequences mediate off-target activity in the CRISPR-interference (CRISPRi)
system. bioRxiv, 2024.04.10.588881. https://doi.org/10.1101/2024.04.10.588881.
62. Parts, L., Batté, A., Lopes, M., Yuen, M.W., Laver, M., Luis, B.S., Yue, J., Pons, C., Eray,
E., Aloy, P., et al. (2021). Natural variants suppress mutations in hundreds of essential
genes. Mol. Syst. Biol. https://doi.org/10.15252/msb.202010138.
63. Ünlü, B., Pons, C., Ho, U.L., Batté, A., Aloy, P., and van Leeuwen, J. (2023). Global
analysis of suppressor mutations that rescue human genetic defects. Genome Med. 15.
https://doi.org/10.1186/s13073-023-01232-0.
64. Usluer, S., Hallast, P., Crepaldi, L., Zhou, Y., Urgo, K., Dincer, C., Su, J., Noell, G.,
Alasoo, K., El Garwany, O., et al. (2023). Optimized whole-genome CRISPR
interference screens identify ARID1A-dependent growth regulators in human induced
pluripotent stem cells. Stem Cell Reports 18, 1061–1074.
65. Butler, A., Hoffman, P., Smibert, P., Papalexi, E., and Satija, R. (2018). Integrating
single-cell transcriptomic data across different conditions, technologies, and species.
Nat. Biotechnol. 36, 411–420.
66. Tzelepis, K., Koike-Yusa, H., De Braekeleer, E., Li, Y., Metzakopian, E., Dovey, O.M.,
Mupo, A., Grinkevich, V., Li, M., Mazan, M., et al. (2016). A CRISPR Dropout Screen
Identifies Genetic Vulnerabilities and Therapeutic Targets in Acute Myeloid Leukemia.
Cell Rep. 17, 1193–1205.
67. Yusa, K., Zhou, L., Li, M.A., Bradley, A., and Craig, N.L. (2011). A hyperactive piggyBac
transposase for mammalian applications. Proc. Natl. Acad. Sci. U. S. A. 108,
1531–1536.
68. Ripley, B.D. (2001). The R project in statistical computing. MSOR Connect. 1, 23–25.
69. Zheng, G.X.Y., Terry, J.M., Belgrader, P., Ryvkin, P., Bent, Z.W., Wilson, R., Ziraldo,
S.B., Wheeler, T.D., McDermott, G.P., Zhu, J., et al. (2017). Massively parallel digital
transcriptional profiling of single cells. Nat. Commun. 8, 14049.
70. Huang, X., and Huang, Y. (2021). Cellsnp-lite: an efficient tool for genotyping single
cells. Bioinformatics. https://doi.org/10.1093/bioinformatics/btab358.
71. Danecek, P., Bonfield, J.K., Liddle, J., Marshall, J., Ohan, V., Pollard, M.O., Whitwham,
A., Keane, T., McCarthy, S.A., Davies, R.M., et al. (2021). Twelve years of SAMtools
and BCFtools. Gigascience 10. https://doi.org/10.1093/gigascience/giab008.
72. CellSNP - browse /SNPlist at SourceForge.Net
https://sourceforge.net/projects/cellsnp/files/SNPlist/.
73. Huang, Y., McCarthy, D.J., and Stegle, O. (2019). Vireo: Bayesian demultiplexing of
pooled single-cell RNA-seq data without genotype reference. Genome Biol. 20, 273.
74. Zhao, H., Sun, Z., Wang, J., Huang, H., Kocher, J.-P., and Wang, L. (2014). CrossMap:
a versatile tool for coordinate conversion between genome assemblies. Bioinformatics
30, 1006–1007.
75. Luecken, M.D., and Theis, F.J. (2019). Current best practices in single-cell RNA-seq
analysis: a tutorial. Mol. Syst. Biol. 15, e8746.
76. Braunger, J.M., and Velten, B. (2024). Guide assignment in single-cell CRISPR screens
using crispat. bioRxiv, 2024.05.06.592692. https://doi.org/10.1101/2024.05.06.592692.
77. Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting Linear Mixed-Effects
Models Using lme4. J. Stat. Softw. 67, 1–48.
78. Abugessaisa, I., Noguchi, S., Carninci, P., and Kasukawa, T. (2017). The FANTOM5
Computation Ecosystem: Genomic Information Hub for Promoters and Active
Enhancers. Methods Mol. Biol. 1611, 199–217.
79. Hodgkins, A., Farne, A., Perera, S., Grego, T., Parry-Smith, D.J., Skarnes, W.C., and
Iyer, V. (2015). WGE: a CRISPR database for genome engineering. Bioinformatics 31,
3078–3080.
80. Lab, S.-R. OmniPath :: Intra- & intercellular signaling knowledge.
https://omnipathdb.org/.
81. Türei, D., Korcsmáros, T., and Saez-Rodriguez, J. (2016). OmniPath: guidelines and
gateway for literature-curated signaling pathway resources. Nat. Methods.
82. Garcia-Alonso, L., Holland, C.H., Ibrahim, M.M., Turei, D., and Saez-Rodriguez, J.
(2019). Benchmark and integration of resources for the estimation of human
transcription factor activities. Genome Res. 29, 1363–1375.
83. Badia-i-Mompel, P., Vélez Santiago, J., Braunger, J., Geiss, C., Dimitrov, D., Müller-Dott,
S., Taus, P., Dugourd, A., Holland, C.H., Ramirez Flores, R.O., et al. (2022). decoupleR:
ensemble of computational methods to infer biological activities from omics data.
Bioinformatics Advances 2, vbac016.
84. Raudvere, U., Kolberg, L., Kuzmin, I., Arak, T., Adler, P., Peterson, H., and Vilo, J.
(2019). g:Profiler: a web server for functional enrichment analysis and conversions of
gene lists (2019 update). Nucleic Acids Res. 47, W191–W198.
85. Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization Paths for Generalized
Linear Models via Coordinate Descent. J. Stat. Softw. 33, 1–22.
86. Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., Žídek, A., Bates,
R., Blackwell, S., Yim, J., et al. (2022). Protein complex prediction with
AlphaFold-Multimer. bioRxiv, 2021.10.04.463034.
https://doi.org/10.1101/2021.10.04.463034.
87. Bryant, P., Pozzati, G., and Elofsson, A. (2022). Improved prediction of protein-protein
interactions using AlphaFold2. Nat. Commun. 13, 1265.
88. PyMOL http://www.pymol.org/pymol.
89. Argelaguet, R., Velten, B., Arnol, D., Dietrich, S., Zenz, T., Marioni, J.C., Buettner, F.,
Huber, W., and Stegle, O. (2018). Multi-Omics Factor Analysis-a framework for
unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124.
90. Argelaguet, R., Arnol, D., Bredikhin, D., Deloro, Y., Velten, B., Marioni, J.C., and Stegle,
O. (2020). MOFA+: a statistical framework for comprehensive integration of multi-modal
single-cell data. Genome Biol. 21, 1–17.
91. Kokic, G., Wagner, F.R., Chernev, A., Urlaub, H., and Cramer, P. (2021). Structural basis
of human transcription–DNA repair coupling. Nature 598, 368–372.
92. Kenner, L.R., Anand, A.A., Nguyen, H.C., Myasnikov, A.G., Klose, C.J., McGeever, L.A.,
Tsai, J.C., Miller-Vedam, L.E., Walter, P., and Frost, A. (2019). eIF2B-catalyzed
nucleotide exchange and phosphoregulation by the integrated stress response. Science
364, 491–495.