Keywords
Hi-C data, Self-Attention, Resolution Enhancement, Single-cell Hi-C, Data
Sparsity
1 Introduction
The three-dimensional (3D) conformation of chromosomes is crucial for elucidating genomic processes
within the nuclei of eukaryotic cells. The Hi-C technique facilitates an all-versus-all mapping of
chromosomal fragment interactions, resulting in an n × n interaction frequency contact matrix, where
n is the number of fragments in a chromosome or genome at a specific resolution,
Lieberman-Aiden et al., 2009. These Hi-C data are critical for numerous algorithms designed to
improve the understanding of genome organization, Oluwadare et al., 2019. A major challenge
in this field is the scarcity of high-resolution Hi-C data, which are indispensable for identifying
intricate genomic topologies such as enhancer-promoter interactions and subdomains.
To address this need, deep learning models have been employed to predict high-resolution data
from low-resolution data with remarkable accuracy. Notable models in this area include HiCPlus
Y. Zhang et al., 2018, HiCNN T. Liu and Z. Wang, 2019a, hicGAN Q. Liu et al., 2019, Boost-HiC
Carron et al., 2019, HiCSR Dimmick, 2020, SRHiC Z. Li and Dai, 2020, HiCNN2 T. Liu and
Z. Wang, 2019b, HiCARN Hicks and Oluwadare, 2022, and DeepHiC Hong et al., 2020. These
models leverage various network architectures such as Convolutional Neural Networks (CNNs),
Autoencoders, and Generative Adversarial Networks (GANs). Despite the advancements made
by these models, there remains considerable room for improvement, especially when it comes to
single-cell Hi-C data enhancement, Y. Wang et al., 2023, as all of the aforementioned methods are
designed for bulk Hi-C data enhancement.
Single-cell Hi-C (scHi-C) is a groundbreaking technology that offers a unique opportunity to investigate
3D genome structures at the single-cell level with high resolution, Galitsyna and Gelfand,
bioRxiv preprint doi: https://doi.org/10.1101/2024.12.16.628505; this version posted December 20, 2024. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
perpetuity. It is made available under a CC-BY 4.0 International license.
2021. By capturing chromatin interactions at the individual cell level, scHi-C enables the exploration
of cellular heterogeneity in chromatin conformation, Arrastia et al., 2020; Collombet et al.,
2020; Payne et al., 2021. However, scHi-C data are characterized by high dimensionality, noise,
and sparsity, presenting computational challenges that demand innovative solutions for the accurate
reconstruction of 3D genome structures, Paulsen et al., 2015; Galitsyna and Gelfand, 2021.
Therefore, scHi-C data imputation is crucial, as it enables the reconstruction of enhanced contact
maps from raw and sparse scHi-C data, thereby improving the quality for downstream analyses,
including the reconstruction of chromatin organization at the single-cell level. This enhancement
aids in uncovering cell-to-cell variability and heterogeneity, ultimately providing deeper insights
into cellular functions and disease mechanisms Y. Wang et al., 2023.
Recently, algorithms like ScHiCEDRN, Y. Wang et al., 2023 and Loopenhance, S. Zhang et al.,
2022 have been developed to address the challenges of scHi-C data enhancement. While these
methods aim to improve the resolution of single-cell Hi-C data, they often fall short in capturing
the complex spatial relationships within chromatin structures, especially long-range dependencies.
This limitation leads to the loss of critical interactions, which are essential for accurately
reconstructing chromatin topology.
On the other hand, Attention mechanisms have proven effective in capturing both short-range
and long-range dependencies in various domains, such as natural language processing and computer
vision, Vaswani, 2017. These mechanisms enable models to focus on different regions of the input
data dynamically; hence, they have the potential to be used to enhance the resolution of sparse
datasets like scHi-C by capturing context at multiple scales. The motivation behind our work is to
leverage Attention mechanisms to address challenges unique to scHi-C data, such as sparsity, noise,
and limited coverage. By selectively focusing on relevant chromatin interactions, our approach aims
to provide a more biologically meaningful reconstruction of 3D genome structures.
In this work, we propose ScHiCAtt, which employs a cascading residual network integrated
with an optimal attention mechanism identified through validation across multiple candidates.
ScHiCAtt explores different attention mechanisms, such as self-attention, local attention, global
attention, and dynamic attention (Attention-in-Attention), selecting the optimal mechanism for
each layer during training to determine the best attention mechanism to incorporate for scHi-C data
enhancement. The goal of this experimentation is to allow ScHiCAtt to capture both short-range
and long-range dependencies adaptively, thus enhancing the quality of scHi-C data reconstruction.
Through comprehensive experiments on human and Drosophila data across various downsampling
rates, we demonstrate that ScHiCAtt significantly improves the resolution of scHi-C data.
Our results show superior performance in terms of computational metrics and biological reproducibility
metrics, such as GenomeDISCO, Ursu et al., 2018, compared to existing methods, particularly
under extreme downsampling conditions. Moreover, ScHiCAtt maintains efficient training
times, making it a robust solution for high-resolution single-cell Hi-C data enhancement.
2 Materials and Methods
2.1 Model Architecture
Our model architecture starts with an entry convolution layer (Figure 1A) that processes the
input raw scHi-C contact map. This is followed by a series of cascading blocks interleaved with
attention layers, designed to progressively upscale the resolution of the Hi-C maps. The final
high-resolution Hi-C maps are produced through an exit convolution layer. The architecture also
includes tunable hyperparameters such as the number of cascading blocks and attention layers,
allowing for flexibility in optimizing the model’s performance.
In the following subsections, we explore various attention mechanisms that have been considered
in our study. We describe each mechanism in detail, highlighting its unique features and the
rationale behind its selection for our research. Furthermore, we elucidate how these mechanisms
were implemented within our architecture for evaluation.
2.1.1 Self-Attention Mechanism
The self-attention mechanism in our architecture (Figure 1A) facilitates efficient learning of both
local and global chromatin interactions by allowing the model to dynamically assign weights to
relationships between chromatin loci, regardless of their spatial distance on the Hi-C contact maps.
This capability is crucial for capturing both short-range and long-range dependencies within chromatin
structures.
To achieve this, the attention scores are computed by taking the scaled dot-product between
the input projection matrices: queries ($H$) and keys ($J$), divided by the square root of the key
dimension. The resulting attention scores are passed through a softmax function to compute the
attention weights, which are then applied to the values ($Z$). This enables the model to prioritize
important interactions, enhancing the quality of the predicted high-resolution contact maps.
The process is defined as:
\[
A(H, J, Z) = \mathrm{Softmax}\left(\frac{H \cdot J^{T}}{\sqrt{d_j}}\right) \cdot Z \tag{1}
\]
Here, $H \in \mathbb{R}^{n \times d}$, $J \in \mathbb{R}^{n \times d}$, and $Z \in \mathbb{R}^{n \times d}$ represent the query, key, and value matrices
respectively, where $n$ is the sequence length (number of loci in the Hi-C contact map), and $d$ is the
feature dimension. The term $d_j$ is the dimension of the keys (i.e., $d_j = d$) used to scale the dot
product and stabilize the training process. This mechanism enables the model to focus on critical
chromatin interactions, significantly improving prediction accuracy.
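As an illustration of Equation (1), the following is a minimal NumPy sketch of scaled dot-product attention. It is not the ScHiCAtt implementation: the function names, the toy sizes, and the random inputs are assumptions made for this example only.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row-wise maximum for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(H, J, Z):
    """Scaled dot-product attention A(H, J, Z) as written in Equation (1)."""
    d_j = J.shape[-1]                    # key dimension used for scaling
    scores = H @ J.T / np.sqrt(d_j)      # (n, n) similarity between loci
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ Z, weights

# Toy example with arbitrary sizes (assumed, not taken from the paper).
rng = np.random.default_rng(0)
n, d = 6, 4
H, J, Z = (rng.normal(size=(n, d)) for _ in range(3))
out, weights = self_attention(H, J, Z)
```

Each row of `weights` describes how strongly one locus attends to every other locus, independent of their distance along the chromosome.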
2.1.2 Cascading Residual Blocks
The backbone of our architecture is the cascading residual blocks, illustrated in Figure 1B, Ahn
et al., 2018. Each block comprises residual units with skip connections that progressively refine
the Hi-C contact maps. These cascading blocks are interconnected, allowing for the aggregation of
features across different layers.
2.1.3 Local Attention Mechanism
Local attention is applied within the cascading residual blocks (Figure 1B). It focuses on capturing
fine-grained chromatin interactions within localized regions of the Hi-C contact maps. The use
of depthwise and pointwise convolutions in the local attention mechanism allows the model to
enhance the spatial resolution of the Hi-C maps by emphasizing intricate local details.
\[
\mathrm{LocalAttention}(x_i) = \sum_{j=i-w}^{i+w} \alpha_{ij} x_j \tag{2}
\]
where $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=i-w}^{i+w} \exp(e_{ik})}$, and $e_{ij} = (x_i W_Q)(x_j W_K)^{T}$. Here, $x_i$ is the input at position $i$, $w$ is
the window size defining the local neighborhood, $W_Q$ and $W_K$ are learnable weight matrices for
queries and keys respectively.
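Equation (2) can be sketched directly in NumPy. The version below follows only the formula: it does not include the depthwise and pointwise convolutions mentioned above, and clipping the window at the sequence borders is an assumption of this sketch, as are the function and argument names.

```python
import numpy as np

def local_attention(x, W_Q, W_K, w):
    """Windowed attention over positions i-w..i+w, following Equation (2).

    x: (n, d) inputs; W_Q, W_K: (d, d_k) projection matrices; w: window size.
    """
    n = x.shape[0]
    q, k = x @ W_Q, x @ W_K
    out = np.zeros_like(x, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)  # clip the window at the borders (assumption)
        e = k[lo:hi] @ q[i]                         # scores e_ij for j inside the window
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                        # softmax restricted to the window
        out[i] = alpha @ x[lo:hi]                   # weighted sum of the inputs x_j
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
W_Q, W_K = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
local_out = local_attention(x, W_Q, W_K, w=2)
```

With `w=0` every position attends only to itself and the input is returned unchanged, which is a convenient sanity check.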
2.1.4 Global Attention Mechanism
The global attention mechanism is applied after several cascading residual blocks (Figure 1B) to
ensure that global chromatin structures are preserved. This module aggregates context across
the entire Hi-C map and allows the model to capture large-scale genomic interactions, which are
critical for accurate super-resolution Zhu et al., 2021.
\[
\mathrm{GlobalAttention}(x) = \mathrm{Softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V \tag{3}
\]
where $Q = x W_Q$, $K = x W_K$, $V = x W_V$, and $W_Q$, $W_K$, $W_V$ are learnable weight matrices for
queries, keys, and values, respectively.
2.1.5 Multi-Head Attention Mechanism
The multi-head attention mechanism is designed to enhance the model’s ability to capture complex
relationships in the input data by dividing the input into multiple attention heads. Each head
performs attention operations independently, focusing on different aspects of the input, which
allows the model to extract diverse contextual information.
The mechanism takes three primary inputs: the Query ($Q$), Key ($K$), and Value ($V$) matrices.
These inputs are derived from the original data through linear transformations. The attention
operation for each head calculates a weighted representation of the Value matrix, where the weights
are determined by the similarity between the Query and Key matrices. This is expressed
mathematically as:
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \tag{4}
\]
Here, $d_k$ is the dimensionality of the Key matrix, and the softmax function ensures that the
weights sum to 1, highlighting the most relevant features for each Query.
For multi-head attention, the inputs are split into $h$ separate heads, each with its own $Q$, $K$, and
$V$. The outputs from all heads are concatenated and passed through a final linear transformation,
as shown in the equation below:
\[
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h) W_O \tag{5}
\]
In this equation, $\mathrm{head}_i$ represents the output of the $i$-th attention head, and $W_O$ is the learned
weight matrix for the final linear transformation. This design allows the model to integrate
information from multiple perspectives, improving its ability to capture chromatin interactions and
other complex patterns.
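Equations (4) and (5) can be combined into a short NumPy sketch. Splitting the projected features into h equal column blocks is one common way to form the heads and is an assumption here, as are the function name and the requirement that h divides the feature dimension.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, W_Q, W_K, W_V, W_O, h):
    """Multi-head attention over x of shape (n, d); assumes h divides d."""
    n, d = x.shape
    d_k = d // h                                  # per-head width
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V           # linear projections, each (n, d)
    heads = []
    for i in range(h):
        cols = slice(i * d_k, (i + 1) * d_k)      # columns owned by head i
        weights = softmax(Q[:, cols] @ K[:, cols].T / np.sqrt(d_k))
        heads.append(weights @ V[:, cols])        # Equation (4) for one head
    return np.concatenate(heads, axis=-1) @ W_O   # Equation (5)

rng = np.random.default_rng(0)
n, d, h = 5, 8, 2
x = rng.normal(size=(n, d))
W_Q, W_K, W_V, W_O = (rng.normal(size=(d, d)) for _ in range(4))
mha_out = multi_head_attention(x, W_Q, W_K, W_V, W_O, h)
```

Because each head normalizes its own attention weights, the heads can emphasize different loci before their outputs are mixed by `W_O`.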
2.1.6 Dynamic Attention Mechanism
Dynamic Attention, also referred to as the Attention-in-Attention (A2A) mechanism, combines
static and dynamic attention features to weigh their contributions adaptively. The dynamic attention
module applies global pooling, followed by fully connected layers, to dynamically adjust the
contributions of the features computed with and without attention Huang et al., 2019.
\[
\mathrm{A2A}(x) = w_{\text{non-att}} \cdot \mathrm{NonAttention}(x) + w_{\text{att}} \cdot \mathrm{AttentionBranch}(x) \tag{6}
\]
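The weighting in Equation (6) can be sketched as follows. The text specifies only global pooling followed by fully connected layers; the ReLU activation, the softmax that makes the two weights sum to 1, the layer sizes, and all names below are assumptions of this sketch rather than details of ScHiCAtt.

```python
import numpy as np

def dynamic_weights(x, W1, W2):
    """Global average pooling followed by two fully connected layers.

    x: (C, H, W) feature map; W1: (hidden, C); W2: (2, hidden).
    Returns two weights: (w_non_att, w_att).
    """
    pooled = x.mean(axis=(1, 2))              # global average pool: (C, H, W) -> (C,)
    hidden = np.maximum(0.0, W1 @ pooled)     # ReLU (assumed activation)
    logits = W2 @ hidden                      # one logit per branch
    e = np.exp(logits - logits.max())
    return e / e.sum()                        # softmax normalization (assumed)

def a2a(x, attention_branch, non_attention_branch, W1, W2):
    """Weighted sum of the two branches, as in Equation (6)."""
    w_non_att, w_att = dynamic_weights(x, W1, W2)
    return w_non_att * non_attention_branch(x) + w_att * attention_branch(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 4))
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
branch_weights = dynamic_weights(x, W1, W2)
```

In this sketch the two branches are passed in as callables, so any attention module and any plain convolutional path could be plugged in.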
2.2 Loss Function
To optimize the quality of the enhanced scHi-C contact matrices, we leverage several key loss
functions that address distinct aspects of the reconstruction process. These loss functions ensure
that the generated matrices not only minimize pixel-wise error with respect to the target but also
maintain structural integrity and visual consistency.
2.2.1 Mean Squared Error (MSE)
The goal is to minimize the pixel-wise difference between the true and enhanced scHi-C matrices,
ensuring that the generated maps closely approximate the true scHi-C data.
\[
L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 \tag{7}
\]
In this equation:
• $N$: The total number of data points or pixels in the scHi-C matrices.
• $Y_i$: The true value of the $i$-th pixel in the scHi-C matrix.
• $\hat{Y}_i$: The predicted value of the $i$-th pixel in the enhanced scHi-C matrix.
• $L_{MSE}$: The computed Mean Squared Error, representing the average of the squared differences
between the true and predicted values.
This loss function penalizes larger deviations more heavily due to the squaring operation,
encouraging the model to generate outputs that closely match the true data.
2.2.2 Perceptual Loss
Perceptual loss, based on feature representations from a pre-trained VGG network Wu et al., 2020,
ensures that the generated Hi-C maps are not only pixel-accurate but also visually consistent with
the real Hi-C data.
In the perceptual loss $L_{VGG}$, we utilize the feature maps from specific layers of the pre-trained
VGG network:
\[
L_{VGG} = \frac{1}{N} \sum_{i=1}^{N} \sum_{\ell} \left\| \phi_{\ell}(Y_i) - \phi_{\ell}(\hat{Y}_i) \right\|^2 \tag{8}
\]
where $\phi_{\ell}(\cdot)$ denotes the feature map extracted from the $\ell$-th layer of the VGG network.
2.2.3 Total Variation (TV) Loss
TV loss reduces noise and enforces smoothness in the generated Hi-C maps, improving the overall
visual quality.
\[
L_{TV} = \frac{2\psi \, (h_{TV} + w_{TV})}{F} \tag{9}
\]
2.2.4 Adversarial Loss (AD)
Adversarial loss improves the realism of the generated high-resolution Hi-C maps by ensuring that
the discriminator cannot easily distinguish between real and generated matrices.
\[
L_{AD} = 1 - \frac{1}{N} \sum_{i=1}^{N} D(\hat{Y}_i) \tag{10}
\]
2.3 Evaluation Metrics
To evaluate the effectiveness of our models in enhancing the resolution of scHi-C data, we used
several standard metrics, each of which assesses the quality of the reconstructed contact maps
from a different perspective. They can broadly be categorized as computational metrics, such as the
Structural Similarity Index Measure, Peak Signal-to-Noise Ratio, and Signal-to-Noise Ratio, and biological
reproducibility metrics, such as GenomeDISCO, Ursu et al., 2018.
2.3.1 Structural Similarity Index
Structural Similarity Index Measure (SSIM) quantifies the structural similarities between the true
and enhanced scHi-C matrices.
SSIM is defined as,
\[
\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{11}
\]
Here, $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances, $\sigma_{xy}$ is the covariance
between $x$ and $y$, and $C_1$ and $C_2$ are constants to stabilize the division when the denominator is
close to zero.
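Equation (11) can be evaluated over a whole matrix with a few lines of NumPy. This single-window version is a simplification for illustration: SSIM is usually computed over local sliding windows and averaged, and the source does not state which variant was used. The constants follow the conventional choice $C_1 = (0.01 L)^2$ and $C_2 = (0.03 L)^2$ for data range $L$; the function name and the default range of 1.0 are assumptions.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Equation (11) computed once over the full matrices x and y."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()   # covariance between x and y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

rng = np.random.default_rng(0)
true_map = rng.random((16, 16))
noisy_map = np.clip(true_map + rng.normal(scale=0.2, size=(16, 16)), 0.0, 1.0)
ssim_identical = global_ssim(true_map, true_map)
ssim_noisy = global_ssim(true_map, noisy_map)
```

Identical matrices give a score of 1, and the score falls as the enhanced map departs from the target.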
2.3.2 Peak Signal-to-Noise Ratio
As the name states, Peak Signal-to-Noise Ratio (PSNR) quantifies the ratio between the maximum
achievable signal and the noise that distorts it.
PSNR is defined as
\[
\mathrm{PSNR} = 20 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I}{\sqrt{\mathrm{MSE}}}\right) \tag{12}
\]
In this equation:
• PSNR: Peak Signal-to-Noise Ratio, a metric to measure the quality of the enhanced image.
• $\mathrm{MAX}_I$: The maximum possible pixel value of the image (e.g., 255 for 8-bit images).
• MSE: Mean Squared Error between the original and enhanced images.
• $\log_{10}$: The base-10 logarithm.
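A minimal sketch of Equation (12) follows. The default maximum of 1.0 assumes contact maps normalized to the range [0, 1], which the text does not state; the function name is likewise chosen for this example.

```python
import numpy as np

def psnr(target, enhanced, max_value=1.0):
    """Peak Signal-to-Noise Ratio in decibels, following Equation (12)."""
    mse = np.mean((target - enhanced) ** 2)
    if mse == 0:
        return float("inf")                  # identical inputs: no noise at all
    return 20.0 * np.log10(max_value / np.sqrt(mse))

target = np.zeros((4, 4))
enhanced = np.full((4, 4), 0.1)              # uniform error of 0.1, so MSE = 0.01
psnr_value = psnr(target, enhanced)          # 20 * log10(1 / 0.1) = 20 dB
```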
2.3.3 Mean Squared Error
Mean Squared Error (MSE) calculates the average squared difference between the predicted and
true values.
Mean Squared Error is defined as,
\[
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \tag{13}
\]
2.3.4 Signal-to-Noise Ratio
Signal-to-Noise Ratio (SNR) measures the relationship of the signal power to noise power.
\[
\mathrm{SNR} = 10 \cdot \log_{10}\left(\frac{\sum_{i=1}^{N} y_i^2}{\sum_{i=1}^{N} (x_i - y_i)^2}\right) \tag{14}
\]
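Equation (14) translates directly into NumPy. The equation does not say which of $x$ and $y$ is the reference; this sketch assumes $y$ is the true map (the signal) and $x$ the enhanced map, and the function name is an assumption as well.

```python
import numpy as np

def snr(x, y):
    """Signal-to-Noise Ratio in decibels, following Equation (14).

    y is treated as the reference signal and x as the enhanced map (assumption).
    """
    noise = np.sum((x - y) ** 2)
    if noise == 0:
        return float("inf")                  # no difference between the maps
    return 10.0 * np.log10(np.sum(y ** 2) / noise)

y = np.ones(4)                               # signal power: 4
x = y + 0.1                                  # noise power: 4 * 0.01 = 0.04
snr_value = snr(x, y)                        # 10 * log10(4 / 0.04) = 20 dB
```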
2.3.5 GenomeDISCO
In this study, we utilize GenomeDISCO Ursu et al., 2018 as a measure of biological reproducibility.
GenomeDISCO produces a concordance score ranging from -1 to 1, reflecting the biological similarity
between two contact maps. A higher value indicates better concordance. The methodology
entails applying a smoothing technique to the contact maps through their graph representations,
followed by the calculation of the similarity score on the resulting smoothed matrices.
3 Results
3.1 Dataset Preparation
For this study, we utilized scHi-C datasets as prepared by the ScHiCEDRN framework, which
includes data from both Drosophila melanogaster and Homo sapiens cell lines. The Drosophila
dataset comprises seven chromosomes (chr2L, chr2R, chr3L, chr3R, chr4, chrX, and chrM) (GSE131811)
Ulianov et al., 2021, while the human dataset includes chromosomes from the frontal cortex
(GSE130711) Lee et al., 2019; Luo et al., 2022.
Following the preprocessing steps as described in the ScHiCEDRN framework Y. Wang et
al., 2023, we utilized the low-resolution contact maps provided, which had been downsampled to
varying degrees (75%, 45%, 10% and 2% of the original raw reads). Detailed preprocessing information
can be found in ScHiCEDRN, Y. Wang et al., 2023, and the datasets are publicly available
at https://github.com/BioinfoMachineLearning/ScHiCEDRN. No additional preprocessing was
performed on the data. For the human cell line, chromosomes 1, 3, 5, 7, 8, 9, 11, 13, 15, 16, 17, 19,
21, and 22 from Human cell 1 were used as the training dataset, while chromosomes 4, 14, 18, and
20 were used for validation. For testing, we used chromosomes 2, 6, 10, and 12 from both Human
cell 1 and a different human cell, referred to as Human cell 2, as done by ScHiCEDRN. For testing
on Drosophila cells, we used chromosomes chr2L and chrX.
These datasets were used as inputs for our models, with the raw scHi-C contact maps serving
as the ground truth for model training and evaluation.
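To make the notion of downsampling to a fraction of the raw reads concrete, the sketch below thins an integer contact matrix by keeping each read independently with a fixed probability. This binomial thinning is one common way to simulate lower coverage and is an assumption here; it is not necessarily the procedure used to produce the ScHiCEDRN datasets, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def downsample_contacts(counts, ratio, seed=0):
    """Keep each read with probability `ratio` (binomial thinning, assumed method).

    counts: symmetric (n, n) matrix of non-negative integer contact counts.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(counts)                   # diagonal and upper triangle only
    kept = rng.binomial(upper, ratio)         # thin every entry independently
    return kept + np.triu(kept, k=1).T        # mirror to restore symmetry

rng = np.random.default_rng(1)
raw = rng.integers(0, 20, size=(6, 6))
contacts = np.triu(raw) + np.triu(raw, k=1).T     # toy symmetric contact map
low_coverage = downsample_contacts(contacts, ratio=0.10)
```

Thinning the upper triangle and mirroring it keeps the map symmetric, so each pair of loci is sampled once rather than twice.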
3.2 Hyperparameter Search for Individual Attention Mechanisms
We have conducted an extensive hyperparameter search to determine the optimal configuration for
our architecture. The two criteria to optimize are (i) determining the best-performing attention
mechanism and (ii) its placement within the network layers. Our primary focus is on the implementation
of various attention mechanisms, including Self-Attention, Local Attention, Global
Attention, and Dynamic Attention. The goal is to ascertain which attention mechanism and its
placement within the network layers yield the best performance metrics, specifically the PSNR,
SSIM, and SNR metrics. The loss function applied in this search is the MSE loss.
We performed experiments on the Human cell 1 dataset by integrating each attention mechanism
in different layers of the model (Layers 2, 3, and 5) and evaluated their impact on the
model’s performance. The average results, obtained from the corresponding validation
set chromosomes (Figure 2, Supplementary Figure S1, and Table I), indicate that the choice of
attention mechanism and its placement within the network significantly influences the model’s output
quality. As illustrated in Table I, the model configuration that utilized Self-Attention at Layer
2 consistently outperformed the other configurations across all metrics, implying that Layer
2 effectively captures both local and global chromatin interactions, enhancing the model’s ability
to preserve long-range dependencies while refining the details in the contact maps. Specifically, it
achieved high values of PSNR, SSIM and SNR (Table I). The Dynamic Attention mechanism at the
same layer closely followed these results. Conversely, the Local and Global Attention mechanisms,
while still providing significant improvements over a baseline model, did not achieve the same level
of performance.
3.3 Hyperparameter Search on Composite Attention Mechanism
To evaluate the potential benefits of combining multiple attention mechanisms, we conducted
comprehensive experiments integrating self-attention, local attention, and global attention within
ScHiCAtt’s architecture. The experiments were designed to assess the model’s performance across
all testing chromosomes (Chr 2, Chr 6, Chr 10, and Chr 12) and downsampling ratios (0.75, 0.45,
0.10). Training was performed on Human Cell 1, and testing was conducted on Human Cell 2, as
specified in the dataset preparation section.
Table II presents the performance metrics of ScHiCAtt on composite attention for all tested
chromosomes and downsampling ratios. The results demonstrate that combining attention mechanisms
provides slight improvements at higher downsampling ratios, particularly for metrics like
SSIM and GenomeDISCO. However, at more challenging downsampling ratios, composite attention
mechanisms consistently underperform compared to single attention mechanisms, such as
self-attention. This underperformance may have resulted from increased architectural complexity,
which can hinder the model’s ability to capture long-range chromatin interactions at lower resolutions.
Overall, based on the results, Self-Attention at Layer 2 provides the best overall performance,
which we have adopted as the final configuration for ScHiCAtt.
3.4 Composite Loss Function
To further validate the effectiveness of the Self-Attention mechanism at Layer 2, we extended
our experiments to fine-tune the weights used in the composite loss function. This additional set
of experiments was motivated by the need to explore how different configurations of loss function
weights impact the model’s output quality. We experimented with various configurations for
the composite loss function, which includes Mean Squared Error (MSE), perceptual loss, Total
Variation (TV) loss, and adversarial loss components.
The overall loss is computed as:
\[
L_G = \alpha L_{MSE} + \beta L_{VGG} + \gamma L_{TV} + \delta L_{AD}, \tag{15}
\]
where $\alpha$, $\beta$, $\gamma$, and $\delta$ are scalar weights that control the contributions of each component to
the final loss. By adjusting the weights $\alpha$, $\beta$, $\gamma$, and $\delta$, we aimed to optimize both pixel-wise
accuracy and the structural consistency of the generated Hi-C maps. The optimal configuration
identified was $\alpha = 0.5$, $\beta = 0.3$, $\gamma = 0.1$, and $\delta = 0.1$. See Supplementary Table S1. Our objective
was to find an optimal balance that would enhance the reconstruction quality, particularly for
challenging downsampling ratios. These adjustments significantly improved the reconstruction
quality, especially at extreme downsampling ratios (e.g., 0.10). This indicates that fine-tuning the
loss function weights is crucial for achieving high-resolution scHi-C data that not only aligns closely
with ground truth but also retains essential structural features. Through these experiments, we
confirmed that our proposed ScHiCAtt method, with Self-Attention at Layer 2 and optimized loss
function weights, consistently outperforms other configurations. This establishes ScHiCAtt as a
robust solution for enhancing scHi-C data, especially in scenarios with severe data sparsity.
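The weighted sum in Equation (15) can be sketched as follows, using the reported weights as defaults. The perceptual and TV terms are passed in as precomputed scalars because they depend on a pre-trained VGG network and on quantities not defined in this section; all function names and the example values are assumptions of this sketch.

```python
import numpy as np

def mse_loss(target, enhanced):
    """Equation (7): mean squared pixel-wise error."""
    return float(np.mean((target - enhanced) ** 2))

def adversarial_loss(disc_scores):
    """Equation (10): one minus the mean discriminator score on generated maps."""
    return float(1.0 - np.mean(disc_scores))

def composite_loss(l_mse, l_vgg, l_tv, l_ad,
                   alpha=0.5, beta=0.3, gamma=0.1, delta=0.1):
    """Equation (15) with the reported optimal weights as defaults."""
    return alpha * l_mse + beta * l_vgg + gamma * l_tv + delta * l_ad

# Hypothetical component values, for illustration only.
l_mse = mse_loss(np.zeros(3), np.ones(3))              # 1.0
l_ad = adversarial_loss(np.array([0.6, 0.6]))          # 0.4
total = composite_loss(l_mse, l_vgg=0.1, l_tv=0.05, l_ad=l_ad)
```

Keeping the weights as arguments makes it straightforward to repeat a weight search like the one summarized in Supplementary Table S1.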
3.5 Benchmarking with Other Algorithms
We evaluated the performance of our novel ScHiCAtt method against existing methods, namely
ScHiCEDRN, Y. Wang et al., 2023, Loopenhance, S. Zhang et al., 2022, and DeepHiC, Hong
et al., 2020, across different downsampling ratios (0.75, 0.45, and 0.1). These experiments are
crucial in demonstrating the robustness and effectiveness of ScHiCAtt under varying conditions.
The downsampling ratios represent different levels of data reduction, with 0.75 being the least
and 0.1 being the most extreme. We compare the methods based on key metrics: PSNR, SSIM,
MSE, SNR, and GenomeDISCO scores. Using these metrics, we benchmarked ScHiCAtt and other
algorithms’ ability to generalize across different chromosomes of the same cell type, different cells
of the same species, and different species.
3.5.1 Benchmarking on the Same Cell of the Same Species
To evaluate the performance of different Hi-C resolution enhancement methods on the same cell
from the same species, we conducted experiments on Human Cell 1. The loss function applied
was Mean Squared Error loss. The experiments were performed on four different chromosomes:
chromosomes 2, 6, 10, and 12. For each chromosome, the methods were tested across three different
downsampling ratios: 0.75, 0.45, and 0.10. Table III presents a comprehensive comparison of
the methods ScHiCAtt, ScHiCEDRN, Loopenhance, and DeepHiC across these chromosomes and
downsampling ratios (Figure 3 and Supplementary Figure S2). In Table III, the highest values
for each metric at a given downsampling ratio are bolded to indicate the best-performing method.
ScHiCAtt consistently outperforms other methods across all the chromosomes and downsampling
ratios. Figure 4 shows a side-by-side comparison of heatmaps of the enhanced scHi-C contact maps
from all algorithms for chromosome 12 at a downsampling ratio of 0.75. Altogether, these results
illustrate the consistency of ScHiCAtt’s superiority, highlighting its effectiveness in preserving
high-resolution features even when significant downsampling is applied.
3.5.2 Benchmarking on Different Cells of the Same Species
In addition to evaluating the performance of Hi-C resolution enhancement methods on the same
cell, we extended our analysis to different cells from the same species. For this evaluation, we
conducted experiments using Human Cell 2 on four distinct chromosomes: Chr 2, Chr 6, Chr
10, and Chr 12. The loss function applied was Mean Squared Error loss. Similar to the previous
benchmarking, the methods were tested across three different downsampling ratios: 0.75, 0.45, and
0.10. Table IV summarizes the performance of the methods ScHiCAtt, ScHiCEDRN, Loopenhance,
and DeepHiC across these chromosomes and downsampling ratios. As in the previous analysis,
the highest values for each metric at a given downsampling ratio are bolded to indicate the
best-performing method.
The results demonstrate that ScHiCAtt consistently delivers superior performance across different
cells from the same species. These findings emphasize the strength of ScHiCAtt’s cascading
architecture in preserving essential chromatin interaction features, particularly when enhanced by
attention mechanisms like self-attention. These trends are consistent across the other chromosomes
and downsampling ratios, reaffirming the robustness and effectiveness of ScHiCAtt. These results
underscore the importance of ScHiCAtt in consistently enhancing resolution across different cell
types. This ability is critical for studying cell-specific chromatin interactions, which play a key
role in understanding gene regulation and other genomic functions. These findings highlight the
adaptability and reliability of ScHiCAtt when applied to different cells within the same species,
making it a highly effective tool for enhancing Hi-C data resolution across varying cellular conditions.
Supplementary Figure S3 provides a visual representation of these results, showcasing the
consistent performance of ScHiCAtt across different cells. The graphs clearly depict the ability
of ScHiCAtt to maintain high-resolution details, even when applied to different cellular contexts
within the same species.
3.5.3 Benchmarking Across Different Species
To assess the generalizability of Hi-C resolution enhancement methods across species, we extended
our benchmarking to include cross-species analysis. Specifically, we trained the models on human
Hi-C data and tested them on Drosophila chromosomes. The analysis was conducted on two
Drosophila chromosomes, chr2L and chrX, across three different downsampling ratios: 0.75, 0.45,
and 0.10. The loss function applied was Mean Squared Error loss. Table V presents the comparative
performance of ScHiCAtt, ScHiCEDRN, Loopenhance, and DeepHiC in this cross-species setting.
These results demonstrate the capability of ScHiCAtt to effectively generalize across species,
indicating its robustness in reconstructing chromatin interactions even when the training and
testing datasets come from different organisms. Such generalizability highlights its potential utility
in comparative genomics studies. The specific choice of downsampling ratios (0.75, 0.45, and 0.10)
was informed by typical sparsity levels encountered in single-cell Hi-C data. These ratios allow for
a comprehensive evaluation of the methods’ performance under varying levels of data degradation,
ensuring the robustness of the conclusions drawn from these experiments. Supplementary Figure
S4 illustrates these findings, providing a visual comparison of the methods’ performance across the
two Drosophila chromosomes. The graphs clearly show that ScHiCAtt adapts well to cross-species
scenarios, retaining high-resolution features despite the challenges posed by species differences.
These cross-species benchmarking results underscore the robustness and adaptability of ScHiCAtt
and demonstrate its potential utility in broader genomic studies where cross-species comparisons
are necessary.
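As a rough illustration of what a downsampling ratio means in this setting, the sketch below thins a toy contact matrix by keeping each read independently with probability equal to the ratio, then reports the MSE against the original matrix. The binomial-thinning scheme, the function names `downsample_contacts` and `mean_squared_error`, and the toy matrix are assumptions made for illustration only; the text above does not specify how the downsampled inputs were generated. Plain Python lists are used to keep the example dependency-free.

```python
import random


def downsample_contacts(matrix, ratio, seed=0):
    """Thin a symmetric integer contact matrix (assumed scheme: binomial thinning).

    Each read counted in matrix[i][j] is kept independently with probability
    `ratio`, so the expected total read count is ratio * original.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # One Bernoulli trial per read in the upper triangle.
            kept = sum(rng.random() < ratio for _ in range(matrix[i][j]))
            # Mirror the value so the result stays symmetric.
            out[i][j] = out[j][i] = kept
    return out


def mean_squared_error(a, b):
    """Mean squared difference over all entries of two equally sized matrices."""
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Hypothetical 3-bin contact matrix, used only to exercise the functions.
original = [[20, 8, 2], [8, 18, 6], [2, 6, 22]]
for ratio in (0.75, 0.45, 0.10):
    low = downsample_contacts(original, ratio, seed=1)
    print(ratio, round(mean_squared_error(original, low), 3))
```

Lower ratios discard more reads, so the thinned matrix generally drifts further from the original. This is the degradation that an enhancement model is asked to undo.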
3.6 Topologically Associating Domains Analysis
Topologically Associating Domains (TADs) are intrinsic features of mammalian genomes and key
structural elements of genome organization Dixon et al., 2012. They are crucial for many biological
processes involving CTCF, tRNA, and various insulators and binding proteins. These
elements are often found near TAD boundary regions and are important for maintaining biological
functions such as preventing the spread of heterochromatin, maintaining histone modifications, and
regulating transcription sites Dixon et al., 2012.
To validate the biological relevance of ScHiCAtt’s output, we identified TAD regions
in the enhanced matrices and marked them with blue lines in Figure 5. We compared the TAD regions
identified from ScHiCAtt’s output with those from DeepHiC, ScHiCEDRN, and Loopenhance.
TopDom Shin et al., 2016, a deterministic and widely accepted tool for
extracting TADs, was used to extract TAD regions from each method’s output. We visualized
TADs in the 20 Mb to 24 Mb region. We first took the output of the model trained on
the same cell (Human Cell 1) and passed it to TopDom to call and visualize TADs
(Figure 5A). ScHiCAtt preserves the TAD information, with the predicted TADs marked by blue
lines. For comparison, we called TADs with TopDom on the output of the other three tools.
ScHiCAtt preserved TAD information comparably to the other methods: it yielded 8 TAD regions
in the specified region, similar in number to those identified from the other tools.
To assess the robustness of ScHiCAtt, we conducted the same analysis
on the model’s output for a different cell of the same species (Human Cell 2). We
visualized the TAD regions with blue lines for all four methods (Figure 5B). ScHiCAtt
preserves TAD information in the specified region as effectively as the other methods.
The similar number and lengths of TADs across all methods indicate the robustness of ScHiCAtt,
regardless of the trained model used to generate the enhanced Hi-C data.
To further validate the preserved TADs, we computed the L2 norm to quantify the similarity with
the original Hi-C matrix. A lower L2 norm indicates greater closeness to the original Hi-C matrix.
Because it is challenging to find TAD boundaries from single-cell data, we
calculated the insulation score as described by R. Zhang et al., 2022 at the
TAD boundaries. Using this insulation score, we calculated the differential L2 norm of the TAD
boundaries reported by ScHiCAtt, DeepHiC, Loopenhance, and ScHiCEDRN, comparing them to
those from the original Hi-C matrix (Figure 6). This score reflects how closely each tool preserves
the TADs. ScHiCAtt’s L2 norm scores are 1.19 and 1.47 for the same-cell
and different-cell scenarios, respectively. ScHiCAtt showed a lower score than the other
methods, indicating greater similarity to the original data in the TAD boundary regions.
We used GenomeFlow Trieu et al., 2019 to visualize the TAD regions from genomic bins 500 to 600
to support the differential L2 norm score, as shown in Supplementary Figure S5.
ScHiCAtt’s TADs are more similar to the original TADs, consistent with its differential L2 norm
scores. Considering these metrics, ScHiCAtt efficiently enhances the Hi-C contact
matrix while preserving biological features (e.g., TADs) across different trained models.
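To make the boundary comparison more concrete, the sketch below computes a simple sliding-window insulation profile and the L2 distance between two such profiles. It is a minimal illustration under stated assumptions: the window-based insulation definition, the function names `insulation_profile` and `l2_distance`, the `window` parameter, and the toy matrices are all hypothetical, and the definition used here may differ from the Higashi formulation and the differential L2 norm computed in the paper.

```python
import math


def insulation_profile(matrix, window=2):
    """Mean contact between the `window` bins upstream and downstream of each bin.

    Low values mean few contacts cross the bin, which is the signature of a
    TAD boundary. Bins closer than `window` to either end are skipped.
    """
    n = len(matrix)
    profile = []
    for i in range(window, n - window):
        total = 0.0
        for a in range(i - window, i):           # upstream bins
            for b in range(i + 1, i + window + 1):  # downstream bins
                total += matrix[a][b]
        profile.append(total / (window * window))
    return profile


def l2_distance(p, q):
    """Euclidean (L2) distance between two equally long profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Toy "original" map: two 4-bin domains with strong intra-domain contacts.
original = [[10 if i // 4 == j // 4 else 1 for j in range(8)] for i in range(8)]
# Toy "enhanced" map: same domains, slightly inflated inter-domain contacts.
enhanced = [[10 if i // 4 == j // 4 else 2 for j in range(8)] for i in range(8)]

print(insulation_profile(original))  # [5.5, 1.0, 1.0, 5.5], dip at the boundary
print(l2_distance(insulation_profile(original), insulation_profile(enhanced)))
```

In this toy setting, a smaller distance means the enhanced map's insulation dips line up more closely with those of the original map. That is the intuition behind reading a lower L2 score as better boundary preservation.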
4 Discussion
The results presented in this study demonstrate the effectiveness of the ScHiCAtt method for
enhancing the resolution of single-cell Hi-C data using attention mechanisms. By experimenting with
different attention configurations such as self, local, global, and dynamic attention mechanisms,
ScHiCAtt achieves superior performance across several key metrics, including PSNR, SSIM, SNR,
and GenomeDISCO scores, particularly at higher downsampling ratios. These results underscore
the potential of attention-based models in addressing the challenges of data sparsity and resolution
References
Ahn, Namhyuk, Byungkon Kang, and Kyung-Ah Sohn (2018). “Fast, accurate, and lightweight
super-resolution with cascading residual network”. In: Proceedings of the European Conference
on Computer Vision (ECCV), pp. 252–268.
Arrastia, Mary V et al. (2020). “A single-cell method to map higher-order 3D genome organization
in thousands of individual cells reveals structural heterogeneity in mouse ES cells”. In: bioRxiv,
pp. 2020–08.
Carron, Leopold et al. (2019). “Boost-HiC: computational enhancement of long-range contacts in
chromosomal contact maps”. In: Bioinformatics 35.16, pp. 2724–2729.
Collombet, Samuel et al. (2020). “Parental-to-embryo switch of chromosome organization in early
embryogenesis”. In: Nature 580.7801, pp. 142–146.
Dimmick, Michael (2020). HiCSR: a Hi-C super-resolution framework for producing highly realistic
contact maps. University of Toronto (Canada).
Dixon, Jesse R et al. (2012). “Topological domains in mammalian genomes identified by analysis
of chromatin interactions”. In: Nature 485.7398, pp. 376–380.
Galitsyna, Aleksandra A and Mikhail S Gelfand (2021). “Single-cell Hi-C data analysis: safety in
numbers”. In: Briefings in Bioinformatics 22.6, bbab316.
Hicks, Parker and Oluwatosin Oluwadare (2022). “HiCARN: resolution enhancement of Hi-C data
using cascading residual networks”. In: Bioinformatics 38.9, pp. 2414–2421.
Hong, Hao et al. (2020). “DeepHiC: A generative adversarial network for enhancing Hi-C data
resolution”. In: PLoS Computational Biology 16.2, e1007287.
Huang, Lun et al. (2019). “Attention on attention for image captioning”. In: Proceedings of the
IEEE/CVF International Conference on Computer Vision, pp. 4634–4643.
Lee, Dong-Sung et al. (2019). “Simultaneous profiling of 3D genome structure and DNA
methylation in single human cells”. In: Nature Methods 16.10, pp. 999–1006.
Li, Zhilan and Zhiming Dai (2020). “SRHiC: a deep learning model to enhance the resolution of
Hi-C data”. In: Frontiers in Genetics 11, p. 353.
Lieberman-Aiden, Erez et al. (2009). “Comprehensive mapping of long-range interactions reveals
folding principles of the human genome”. In: Science 326.5950, pp. 289–293.
Liu, Qiao, Hairong Lv, and Rui Jiang (2019). “hicGAN infers super resolution Hi-C data with
generative adversarial networks”. In: Bioinformatics 35.14, pp. i99–i107.
Liu, Tong and Zheng Wang (2019a). “HiCNN: a very deep convolutional neural network to better
enhance the resolution of Hi-C data”. In: Bioinformatics 35.21, pp. 4222–4228.
Liu, Tong and Zheng Wang (2019b). “HiCNN2: enhancing the resolution of Hi-C data using an
ensemble of convolutional neural networks”. In: Genes 10.11, p. 862.
Luo, Chongyuan et al. (2022). “Single nucleus multi-omics identifies human cortical cell regulatory
genome diversity”. In: Cell Genomics 2.3.
Oluwadare, Oluwatosin, Max Highsmith, and Jianlin Cheng (2019). “An overview of methods
for reconstructing 3-D chromosome and genome structures from Hi-C data”. In: Biological
Procedures Online 21, pp. 1–20.
Paulsen, Jonas, Odin Gramstad, and Philippe Collas (2015). “Manifold based optimization for
single-cell 3D genome reconstruction”. In: PLoS Computational Biology 11.8, e1004396.
Payne, Andrew C et al. (2021). “In situ genome sequencing resolves DNA sequence and structure
in intact biological samples”. In: Science 371.6532, eaay3446.
Shin, Hanjun et al. (2016). “TopDom: an efficient and deterministic method for identifying
topological domains in genomes”. In: Nucleic Acids Research 44.7, e70–e70.
Trieu, Tuan et al. (2019). “GenomeFlow: a comprehensive graphical tool for modeling and analyzing
3D genome structure”. In: Bioinformatics 35.8, pp. 1416–1418.
Ulianov, Sergey V et al. (2021). “Order and stochasticity in the folding of individual Drosophila
genomes”. In: Nature Communications 12.1, p. 41.
Ursu, Oana et al. (2018). “GenomeDISCO: a concordance score for chromosome conformation
capture experiments using random walks on contact map graphs”. In: Bioinformatics 34.16,
pp. 2701–2707.
Vaswani, Ashish et al. (2017). “Attention is all you need”. In: Advances in Neural Information
Processing Systems.
Wang, Yanli, Zhiye Guo, and Jianlin Cheng (2023). “Single-cell Hi-C data enhancement with deep
residual and generative adversarial networks”. In: Bioinformatics 39.8, btad458.
Wu, Qiong et al. (2020). “A novel perceptual loss function for single image super-resolution”. In:
Multimedia Tools and Applications 79, pp. 21265–21278.
Zhang, Ruochi, Tianming Zhou, and Jian Ma (2022). “Multiscale and integrative single-cell Hi-C
analysis with Higashi”. In: Nature Biotechnology 40.2, pp. 254–261.
Zhang, Shanshan et al. (2022). “DeepLoop robustly maps chromatin interactions from sparse
allele-resolved or single-cell Hi-C data at kilobase resolution”. In: Nature Genetics 54.7, pp. 1013–1025.
Zhang, Yan et al. (2018). “Enhancing Hi-C data resolution with deep convolutional neural network
HiCPlus”. In: Nature Communications 9.1, p. 750.
Zhu, Hongyu et al. (2021). “Attention mechanisms in CNN-based single image super-resolution: A
brief review and a new perspective”. In: Electronics 10.10, p. 1187.
Figure 1: Architecture of the Cascading Residual Network with Attention for Hi-C
Super Resolution. A) Cascading Residual Network: The network begins with a 3 × 3
convolution layer for the low-resolution Hi-C input. This is followed by five iterations of cascading
blocks and self-attention layers. Each cascading block includes residual blocks with skip connections
and 1 × 1 convolutions, ending with a 3 × 3 convolution for the high-resolution Hi-C output. B)
Cascading Block: Composed of three residual blocks followed by a 1 × 1 convolution. Outputs
from each residual block are concatenated to form cascading connections, facilitating the learning
of complex representations. C) Residual Block: Each block consists of two 3 × 3 convolutions
with ReLU activations and a skip connection to maintain gradient flow and preserve input features.
Figure 2: Performance comparison of models based on attention placement across
different layers. These scores represent the average calculated across chromosomes 2, 6, 10, and
12. (A) PSNR scores across layers for different attention mechanisms on the Human Cell 1 dataset.
(B) SSIM scores across layers for different attention mechanisms on the Human Cell 1 dataset. The
highest scores are achieved with the Self-Attention mechanism, followed by Dynamic Attention,
with Local Attention showing the lowest performance.
Figure 3: Benchmarking of ScHiCAtt and other algorithms across Downsampling Ratios
on the Human Cell 1 dataset. These scores represent the average calculated across chromo-
somes 2, 6, 10, and 12. (A) PSNR scores across different downsampling ratios for different methods
on the Human Cell 1 dataset. (B) SSIM scores across different downsampling ratios for different