Results
Mutations in the core epitope of TMR account for most, but not all, resistance to TMR
The small-molecule inhibitor TMR is generated by hydrolysis of the orally administered
prodrug fostemsavir in the gut (Fig 1A) (28-30). TMR targets the CD4-binding pocket of Env and
stabilizes the trimer in a CD4-unbound state, preventing activation of the entry cascade by the
CD4 receptor (Fig 1B) (31). The main contact sites for TMR are the side chains of Env positions
375, 426 and 434, which cradle this molecule in the pocket (27, 32-36). Mutations at position 475
have also been shown to impact resistance (36-39). The consensus amino acid motif at these
four sites for TMR-sensitive strains is Ser at position 375 and Met at positions 426, 434 and 475,
collectively designated herein the SM3 motif. Mutations at these sites reduce virus sensitivity to
TMR in vitro and were associated with clinical resistance in treated individuals (22, 27, 32).
Several trials were conducted to test the efficacy of fostemsavir and predecessor compounds.
The largest trial, BRIGHTE, enrolled 371 HIV-positive participants, who were treated for up to five
years (see trial design in Fig S1A) (21-23, 26, 27, 40). Plasma samples were collected from the
participants before and during treatment and analyzed by the Monogram Biosciences
bioRxiv preprint doi: https://doi.org/10.64898/2025.12.27.696684; this version posted December 27, 2025. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under a CC-BY-NC-ND 4.0 International license.
PhenoSense GT test (Fig S1B). The assay is based on amplification of env from plasma virus,
and bulk cloning of the amplicons into an expression vector (33, 41). The library of plasmids from
each sample is then sequenced and used to generate a library of pseudoviruses that is tested for
resistance to TMR in vitro, as measured by the concentration required to achieve a 50% reduction
of infection (IC50). The trial thus provided genomic and phenotypic data that we used to establish
the escape paths of the virus in vivo (see Data File S1).
Sequence and TMR IC50 data were available for 570 plasma samples from 360 BRIGHTE
participants. We first examined the distribution of IC50 values based on their HIV-1 subtype
associations (see phylogenetic tree in Fig S2A and Data File S2). Of the 360 samples, 83.3%
were identified as clade B, 3.5% as clade B recombinants, and 9.5% as clade F1. Considerable
variability was observed in IC50 values within and between the clades (Fig 1C). This variability
was significantly reduced in the subgroup of samples that contained the SM3 motif (Fig 1D).
Nevertheless, some SM3-containing samples still exhibited high IC50 values, suggesting that
additional sites impact resistance. We also performed the above analysis for a previously
published panel of 208 Envs (designated herein the Single-Env Dataset) that were tested for
their resistance to TMR (38). A similar decrease in the intra- and inter-clade variability in IC50
values was observed for the SM3-containing Envs of this panel (Fig S2B and Data File S3). This
finding suggested that sensitivity to TMR involves only modest “clade context” effects and that
positions outside the SM3 motif may contribute to resistance.
To establish a threshold that distinguishes between sensitive and resistant samples, we
examined the distribution of IC50 values for the BRIGHTE dataset. A right-skewed distribution was
observed for the pre-treatment samples and a bimodal one for the post-treatment samples (Fig
1E). Based on these data, we selected an IC50 threshold of 50 nM TMR, which represents the 85th
percentile for the pre-treatment samples. This threshold was then used to define changes in
resistance status during treatment. Among the 360 participants, only 132 had sequence and IC50
data for both pre- and post-treatment time points (Fig 1F). Of these, 24 were resistant to TMR
before treatment, 43 remained sensitive to TMR throughout treatment, and 65 gained resistance
on treatment (with a median time of 218 days). For this latter group, designated herein the escape
group, we sought to identify and characterize the mutational paths used by the virus to gain
resistance.
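The grouping described above can be illustrated with a minimal Python sketch. It assumes that a sample is resistant when its IC50 exceeds 50 nM (the cutoff stated in the text) and that a participant belongs to the escape group when the pre-treatment sample is sensitive and at least one post-treatment sample is resistant. The function name, the group labels and the participant values are hypothetical and are not taken from the BRIGHTE data.

```python
THRESHOLD_NM = 50.0  # IC50 cutoff from the text (nM TMR)


def classify_participant(pre_ic50, post_ic50s):
    """Assign a resistance-status group from pre- and post-treatment IC50 values (nM)."""
    if pre_ic50 > THRESHOLD_NM:
        return "resistant_before_treatment"
    # Assumption: a single resistant post-treatment sample is enough to call escape.
    if any(value > THRESHOLD_NM for value in post_ic50s):
        return "escape"
    return "remained_sensitive"


# Hypothetical participants: (pre-treatment IC50, [post-treatment IC50 values])
participants = {
    "P1": (3.2, [4.1, 2.8]),
    "P2": (120.0, [300.0]),
    "P3": (1.5, [12.0, 950.0]),
}

groups = {pid: classify_participant(pre, post) for pid, (pre, post) in participants.items()}
print(groups)
```

Whether one or several resistant post-treatment samples should be required is a design choice that the text does not specify.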
A combined approach to identify Env mutations suspected of increasing HIV-1 resistance to TMR
To identify the escape paths from TMR in the BRIGHTE participants and compare them
with all possible paths available to the virus, we pursued the approach described in Fig 2A. First,
we used four strategies to identify Env mutations suspected of increasing resistance to TMR. We
then tested them in vitro for their effects on TMR resistance and Env function using a pseudovirus
infection assay. Finally, for the resistance-enhancing mutations identified, we examined their
emergence frequencies after treatment in the escape group, to determine if the observed
frequencies can be explained by their effects on fitness and/or TMR resistance.
To determine if the genotype-phenotype datasets from the BRIGHTE trial are sufficiently
informative to identify mutations that increase resistance to TMR, we tested different machine
learning models (see details in the Methods Section). As input for the models, we used the amino
acids at all 856 positions of Env according to the HXBc2 numbering system (42). The outcome to
be predicted was the presence of resistance (IC50 value greater than 50 nM). The XGBoost and
Gradient Boosting classifier algorithms (43) achieved the highest performance across all key
metrics (Fig 2B and Fig S3), and were thus chosen as the foundations for subsequent models.
We then evaluated the optimal number of Env positions to include as input, which were selected
based on their minimal distance (in Ångstroms, Å) from the TMR molecule using coordinates of
the TMR-liganded Env (PDB ID 5U7O) (38). Positions within 4 Å, 5 Å, 7.5 Å, 10 Å or 12.5 Å from any
TMR atom were tested (see positions in Table S1). In addition, we tested the four sites of the SM3
motif, and all 856 positions of Env. Fig 2C shows the area under the curve (AUC) metric for these
tests, which describes performance of the model to distinguish between resistant and sensitive
samples by sequence, whereby a value of 1 indicates perfect discrimination and 0.5 corresponds
to random classification. AUC values greater than 0.9 were observed for most sets, peaking at
0.96 for the 56 positions located within 7.5 Å of TMR (see all metrics in Fig S4 and S5). Given that
the variability in the model's performance metrics across the five folds was also low for the
positions within 7.5 Å (see standard deviations at bottom of Fig 2C), we selected this subset for
subsequent analyses. Finally, to identify the specific mutations that impact resistance, we used a
GB Regressor algorithm with the combined BRIGHTE (570 samples) and Single-Env (208 Envs) datasets
(see normalization of the IC50 values and performance in Fig S6). The Shapley Additive
Explanations (SHAP) value was used to quantify the contribution of mutations to the model's
predictive capacity (44). As shown in Fig 2D, performance of the GB Regressor model was high.
Importantly, it provided a list of mutations, each with an estimated effect on resistance (Fig S7).
The 14 mutations with mean absolute SHAP values greater than 0.01 (Data File S4) were defined
as suspected of impacting resistance, and were further tested in vitro as described below.
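The distance-based selection of input positions described above can be sketched as follows. This is a minimal illustration, assuming that each Env position is represented by the coordinates of its atoms and that a position is selected when any of its atoms lies within the cutoff of any ligand atom. The coordinates are toy values, not coordinates from PDB entry 5U7O, and the function names are hypothetical.

```python
import math


def min_distance(residue_atoms, ligand_atoms):
    """Smallest Euclidean distance (in Å) between any residue atom and any ligand atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in ligand_atoms)


def positions_within(residues, ligand_atoms, cutoff):
    """Return the sorted positions whose minimal distance to the ligand is <= cutoff."""
    return sorted(
        pos for pos, atoms in residues.items()
        if min_distance(atoms, ligand_atoms) <= cutoff
    )


# Toy coordinates (hypothetical): position -> list of (x, y, z) atom coordinates
residues = {
    375: [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    426: [(0.0, 3.0, 0.0)],
    100: [(20.0, 0.0, 0.0)],
}
ligand = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

for cutoff in (1.5, 4.0):
    print(cutoff, positions_within(residues, ligand, cutoff))
```

Running the selection at several cutoffs, as in the loop above, yields nested position sets of the kind compared in Fig 2C.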
Machine learning algorithms identify mutations that are frequently sampled in the datasets.
To identify mutations that are less frequently sampled, we used a probabilistic modeling approach
that only considered mutations appearing in less than 5% of samples (see Methods Section).
For each mutation, the model uses the IC50 values of the samples that contain it to
calculate the relative likelihood that the mutation contributes to resistance (see Methods
Section and Fig S8A). A total of 17 unique mutations were identified by this approach (see Fig
S8B and Data File S5). In addition, to account for mutations that are not represented in our
datasets, we used the structure of the TMR-bound Env (5U7O) to identify Env positions with side
chains that extend into the CD4-binding pocket (Table S2). We then examined the frequency of
all residues at these positions in HIV-1 clade B viruses, represented by a panel of 2,535 Envs
from fostemsavir-untreated individuals (see alignment in Data File S6). Variants that appeared in
at least 0.5% of this panel were selected. A total of 17 such unique mutations were identified.
Finally, we identified 11 unique mutations noted in previous studies to increase resistance to TMR
or its structural analogs (Table S3) (32, 34, 35, 38, 45, 46). The four approaches yielded a total
of 59 unique mutations at 21 Env positions that were suspected of increasing Env resistance to
TMR (Fig 2E).
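The frequency-based selection of pocket variants can be sketched in a few lines of Python. The example assumes that one alignment column is available as a list of single-letter residues and that gap characters are ignored when computing frequencies; both the gap handling and the toy column are assumptions made for illustration. The function returns every residue that passes the cutoff, including the consensus residue, which would be excluded before calling the remainder "mutations".

```python
from collections import Counter


def frequent_variants(column, min_fraction=0.005):
    """Return {residue: frequency} for residues present in at least min_fraction of sequences."""
    counts = Counter(aa for aa in column if aa != "-")  # assumption: gaps are skipped
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items() if n / total >= min_fraction}


# Toy alignment column for one pocket position (hypothetical counts)
column = ["S"] * 950 + ["T"] * 40 + ["N"] * 8 + ["H"] * 2

selected = frequent_variants(column)
print(selected)
```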
Phenotypic analysis of mutations suspected of increasing HIV-1 resistance to TMR
To determine the effects of the 59 mutations on Env fitness and TMR resistance, we
introduced them individually into the Env protein of HIV-1 strain AD8. This well-characterized
clade B strain occupies a closed conformation typical of Tier-2-like primary HIV-1 isolates (47,
48). Pseudoviruses that contain the mutant Envs were tested for their infectivity using Cf2Th cells
that express CD4 and CCR5, and values were normalized for viral particle count by the reverse
transcriptase activity in each sample (49). All 59 mutants were also tested for their resistance to
TMR. In addition, for each mutation, we examined the emergence frequency after treatment in
the escape group, as well as the proportion of the 65 participants that required a single nucleotide
change to acquire the mutation. As shown in Fig 3A, most suspected mutations did not alter
resistance to TMR, while several increased it considerably (e.g., L116P). Interestingly, some
mutations were significantly enriched in the escape group, most notably 426L and 375N, which
appeared in 74% and 38% of these individuals, respectively (Fig 3B). To better understand the
basis for the differential emergence frequencies of mutations after treatment, and based on the
observed variability in the IC50 values, we focused on the subgroup of 18 mutations that increased
the IC50 by 3.5-fold or more (see shaded region in Fig 3A). We designate this subgroup
resistance-enhancing mutations (REMs). For some REMs (e.g., 375H, 375M and 375Y), their low
frequency in the escape group could be explained by the number of nucleotide changes required
(see color of data points in Fig 3A). However, several poorly sampled REMs (e.g., 375I) only
required a single nucleotide substitution. REM emergence frequencies were not associated with
their effects on resistance to TMR (Fig 3C). By contrast, a threshold effect was observed for the
relationship between fitness and emergence rate, whereby low-fitness REMs (e.g., 434K, 255M
and 204D) were poorly sampled whereas REMs with favorable fitness profiles and nucleotide
substitution requirements showed variable frequencies of emergence.
In most individuals, REMs appeared at more than one Env position (Fig 3D). We thus
examined the fitness-resistance profiles of the two-site mutation combinations that appeared
frequently after treatment, as well as combinations that appeared less frequently or not at all (Fig
3E and 3F). The most frequent combination in the escape group, 375N/426L, which requires one
nucleotide substitution in each codon, exhibited a favorable fitness-resistance profile relative to
other two-substitution combinations. Nevertheless, it did not exhibit a unique synergistic
advantage in fitness or TMR resistance that could explain the observed high prevalence of this
combination or of the individual changes (Fig S9).
In the above tests, we compared the emergence frequencies of REMs with their fitness
and TMR resistance levels measured using a single HIV-1 isolate (strain AD8). Nevertheless, it
could be argued that some poorly sampled REMs, such as 116P, may only increase Env
resistance in the context of a limited number of strains (e.g., AD8). Conversely, the 113E and 434I
mutations, which appeared frequently after treatment but did not increase AD8 Env resistance
significantly, could exhibit limited effects in AD8 relative to other strains. To address this concern,
we introduced the 116P, 113E and 434I mutations into the Envs of two clade B transmitted/founder
strains, 700010040.C9.4520 and WEAUd15.410.5017 (48). For the 113E mutation, which
emerged frequently in BRIGHTE, we also tested the Envs of strains QH_692 and JRFL. For all
three REMs, the effects on fitness and resistance were qualitatively similar to those observed for
AD8 (Fig S10). While these tests included only 2-4 variants, they suggested that the effects of
the mutations observed in the Env of strain AD8 can be generalized to other clade B isolates.
In summary, we identified 18 mutations that increase Env resistance to TMR. Some REMs
emerged after treatment at significantly higher frequencies than others. This preference could not
be fully explained by the number of nucleotide changes required or by the level of resistance to
TMR they impart, and only partially by their effects on Env fitness.
The emergence frequency of REMs in the escape group corresponds with their spontaneous emergence rate in the population
The 426L mutation, which appeared in 74% of subjects in the escape group, increased
TMR resistance by 48-fold, while four other suspected REMs at this position (Ile, Thr, Lys and
Arg) did not increase resistance significantly (Fig 3A). To assess the comprehensiveness of our
screening approach, we examined if additional variants at this position may also increase
resistance to TMR. To this end, we performed saturation mutagenesis to test the effects of all
possible mutations at position 426 on Env fitness and resistance to TMR (50).
Replication-competent libraries of HIV-1 AD8 that contain a degenerate codon at this position were used to
infect the T cell line A3R5.7 in the absence or presence of TMR (250 nM). The frequency of each
form in the infected cells was compared with the frequency in the virus library used for infection
(see protocol in Methods Section). While several residues at this position showed fitness levels
similar to the wild-type Met, remarkably, only Leu was able to infect the cells in the presence of
this concentration of TMR (Fig 4A).
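The comparison of variant frequencies before and after infection can be illustrated with a short Python sketch. It assumes that read counts per amino acid are available for the input library and for the infected cells, and it expresses the outcome as a simple frequency ratio. The counts are hypothetical, and the ratio is one plausible summary statistic rather than the scoring used in the study (see the Methods Section for the protocol).

```python
def enrichment(library_counts, selected_counts):
    """Ratio of each variant's frequency after infection to its frequency in the input library."""
    lib_total = sum(library_counts.values())
    sel_total = sum(selected_counts.values())
    ratios = {}
    for aa, lib_n in library_counts.items():
        if lib_n == 0:
            continue  # variant absent from the library: ratio undefined
        lib_freq = lib_n / lib_total
        sel_freq = selected_counts.get(aa, 0) / sel_total
        ratios[aa] = sel_freq / lib_freq
    return ratios


# Hypothetical read counts at one degenerate codon position
library = {"M": 500, "L": 500}
with_drug = {"M": 100, "L": 900}

ratios = enrichment(library, with_drug)
print(ratios)
```

A ratio above 1 indicates that the variant was enriched under the tested condition, and a ratio below 1 indicates depletion.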
We also tested the fitness and TMR resistance profiles of all residues at position 375 (Fig
4B). REMs 375H, 375I, 375M, 375N and 375Y, which showed considerable increases in
resistance using the pseudovirus system (Fig 3A), also showed high resistance in these
experiments (see correlation in Fig 4C). Interestingly, in addition, the 375W and 375F variants
showed high resistance to TMR. These forms were not identified by our four approaches because
they did not appear in the sequence-IC50 datasets or in the panel of 2,535 clade B isolates from
fostemsavir-untreated individuals. Indeed, the emergence frequency of mutations at position 375
in BRIGHTE participants correlated well with their proportion in the untreated clade B population
(Fig 4D). To quantify the propensity for spontaneous emergence of the 18 REMs in
fostemsavir-untreated individuals, we used the panel of 2,535 clade B Envs. The sequences were used to
construct a phylogenetic tree that was partitioned into subgroups. Each position was tested
separately, by assigning the taxa their amino acid occupancy at that position, and subgroups
dominated by the non-ancestral form at that position were excluded (see example of position 426
in Fig 4E). The average number of independent emergence events of each REM and the
variability across the subgroups were then calculated (see Fig 4F). A strong correlation was
observed between the average rate of spontaneous REM emergence in clade B and their
emergence in the escape group (see Fig 4G and Fig S11). These findings suggested that the
same selection pressures exerted in fostemsavir-untreated individuals may determine the
likelihood of the REMs to emerge on treatment.
Immune pressures restrict the escape paths of HIV-1 from TMR
Selection pressures on replicative fitness and inhibitor resistance are primary factors that
guide the evolution of virus proteins in treated individuals. Interestingly, the relationships shown
in Fig 3C only suggested a threshold effect for fitness and none for the level of resistance imparted
by the REMs. We thus sought to identify additional factors that may contribute to the observed
preference for some REMs to emerge (on treatment and in the clade B population). In most
HIV-infected individuals, antibodies are elicited against conserved epitopes that overlap the
coreceptor-binding site (CoR-BS), CD4-binding site (CD4-BS), and inner domain of gp120 (51,
52). These epitopes are not exposed on the native state of Env in most primary isolates, and are
thus designated non-neutralizing. The non-neutralizing antibody response was suggested to
reduce the risk of infection in vaccine trials (53, 54) and alter the evolutionary path of HIV during
early stages of infection (55-57). In both cases, the attributed mechanism was based on
Fc-mediated effector functions. To examine the potential effect of such antibodies on mutation
emergence during treatment, we measured the sensitivity of the REMs to plasma samples from
two HIV-positive individuals that do not reduce infectivity of Env AD8 at the lowest dilution tested
(1:160). As shown in Fig 5A, several REMs that did not emerge on treatment were neutralized by
the plasma (e.g., 116P and 434K), whereas REMs that were sampled more frequently were
resistant.
To determine the epitope specificity of the antibodies in the plasma that differentially
neutralize these mutants, we performed a preliminary experiment using the frequently-sampled
(plasma-resistant) 426L and 375N mutants, and the poorly-sampled (plasma-sensitive) 116P and
434K mutants. Plasmids that encode the Envs were used to transfect human osteosarcoma
(HOS) cells, which express on their surface fully-cleaved Env trimers in their native closed form
(58). Monoclonal antibodies that target different Env epitopes (Fig 5B) were added to the cells,
and their binding efficiency was detected by cell-based ELISA (59). As shown in Fig 5C, the 116P
and 434K changes enhanced binding of antibodies against otherwise-cryptic epitopes that overlap
the CoR-BS, V3 loop and CD4-BS. In addition, these changes reduced binding of antibody
PGT145 that targets a quaternary epitope at the apex of the trimer (60). These features are
consistent with a CD4-bound-like conformation of Env (61, 62). We extended the analysis to all
18 REMs and observed that, in addition to 116P and 434K, mutations 423S, 202E and 204D, which
appeared infrequently in the escape group and were sensitive to non-neutralizing plasma (Fig
5A), also increased binding of the CoR-BS antibody 17b (Fig 5D). Given the CD4-bound-like
conformation of these REMs, we also tested their ability to infect CD4-negative cells (47). Indeed,
a strong relationship was observed between this variable and the level of 17b binding (Fig 5E and
Fig S12). Interestingly, only the 116P mutant showed high binding of 17b but no CD4-independent
infection, suggesting two potential structural-functional outcomes for these changes.
We also examined the effects of the 18 REMs on Env stability. The Envs of diverse HIV-1
isolates can exhibit different levels of conformational stability (47, 59) and sensitivities to
inactivation at physiological temperature (63, 64). Nevertheless, a relationship between stability
of Env variants and their likelihood to appear in HIV-infected individuals has not been established.
To this end, we incubated viruses containing the 18 REMs at 37°C and measured the changes in
residual infectivity over time. Consistent with previous results for the related ADA strain (64), the
half-life of the wild-type AD8 Env was approximately 7 hours (Fig 5F). Interestingly, the
infrequently sampled REMs that were sensitive to non-neutralizing plasma and enhanced
exposure of the CoR-BS also demonstrated low functional stability at 37°C (see correlations in
Fig S13).
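A half-life of this kind can be estimated from residual-infectivity time courses as sketched below. The example assumes first-order (single-exponential) decay and fits a least-squares line to the natural logarithm of infectivity; this decay model is an assumption made for illustration and may differ from the fitting procedure used in the study. The time points and infectivity values are synthetic, generated to follow a 7-hour half-life.

```python
import math


def half_life(times_h, infectivity):
    """Estimate half-life (hours) by least-squares fit of ln(infectivity) against time."""
    ys = [math.log(v) for v in infectivity]
    n = len(times_h)
    mean_x = sum(times_h) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(times_h, ys))
        / sum((x - mean_x) ** 2 for x in times_h)
    )
    # For N(t) = N0 * exp(slope * t) with slope < 0, half-life = -ln(2) / slope
    return -math.log(2) / slope


# Synthetic data following a 7-hour half-life (hypothetical values)
times = [0.0, 2.0, 4.0, 8.0, 16.0]
values = [100.0 * 0.5 ** (t / 7.0) for t in times]

estimate = half_life(times, values)
print(round(estimate, 3))
```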
Taken together, these results suggest that REMs 116P, 434K, 204D, 202E and 423S
increase sensitivity to non-neutralizing plasma by inducing an open (CD4-bound-like) form of Env
that exposes cryptic epitopes targeted by non-neutralizing antibodies. These mutations also
reduce Env stability and functional fitness. Their enhanced resistance to TMR can be explained
by a change in the CD4-binding pocket to a CD4-bound-like conformation that is not conducive
to binding of TMR (note location of the SM3 sites in Fig S14). However, their resistance is
associated with a cost: elimination by the non-neutralizing antibody response.
Amino acid substitution likelihoods impact the path of resistance in vivo
We examined the relationships between the above-described features of the 18 REMs,
including their emergence frequencies during treatment and in the clade B population (see values
in Fig 6A and P-values for the Spearman correlation tests in Fig 6B). We divided these features
into four groups: (i) Functional fitness, measured by the fusion competence of the Env, (ii)
Functional stability, measured by resistance to inactivation at 37°C, (iii) Immune fitness,
captured by exposure of the CoR-BS, sensitivity to the non-neutralizing plasma, and indirectly by
the requirement for CD4 to infect cells, and (iv) Therapeutic fitness, measured by the in vitro
resistance to TMR. Strong correlations were observed between the variables that capture
functional fitness, stability, and immune fitness. However, none of them alone could fully explain
the emergence frequency of the 18 REMs in the escape group, other than a modest correlation
with Env stability at 37°C. As such, we decided to generate a combined fitness variable that
describes the functional fitness, stability and immune resistance of each REM. It was calculated
as the product of their relative infectivity, plasma resistance, and stability at 37°C (expressed as
a fraction of the values measured for the wild-type Env). As shown in Fig 6C and 6D, the combined
fitness of the REMs correlated well with their emergence rates in the BRIGHTE subjects and in
clade B. Yet some REMs, such as 375Y, showed high combined fitness but did not emerge on
treatment.
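The combined fitness variable can be written out as a short Python sketch. It follows the description above (the product of infectivity, plasma resistance and stability at 37°C, each expressed as a fraction of the wild-type value). The dictionary keys and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def combined_fitness(mutant, wild_type):
    """Product of infectivity, plasma resistance and stability, each relative to wild type."""
    keys = ("infectivity", "plasma_resistance", "stability")
    product = 1.0
    for key in keys:
        product *= mutant[key] / wild_type[key]
    return product


# Hypothetical measurements in arbitrary units (stability given as half-life in hours)
wild_type = {"infectivity": 100.0, "plasma_resistance": 1.0, "stability": 7.0}
mutant = {"infectivity": 50.0, "plasma_resistance": 0.5, "stability": 3.5}

score = combined_fitness(mutant, wild_type)
print(score)
```

Because the terms are multiplied, a strong defect in any single component lowers the combined value even when the other two components are close to wild type.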
Based on these findings, we also examined the likelihood of the mutations to appear, as
determined by: (i) The pre-treatment nucleotide sequence of the participants, (ii) The number of
nucleotide changes needed to acquire each REM, and (iii) The type of nucleotide changes
required, based on the transition and transversion rates determined by Martinez Del Rio et al.
(15) (Fig S15A). An algorithm that accounts for these variables was generated, in which mutation
appearance was modeled as a series of stochastic events with mutation-specific probabilities (see
flowchart in Fig S15B). As shown in Fig 6A (bottom rows), similar REM appearance likelihoods
were calculated using the clade B consensus and BRIGHTE pre-treatment sequences as the
starting state. We examined if these likelihoods could explain the outliers in Fig 6C and 6D (see
colors of datapoints). As expected, several REMs with high fitness but low emergence frequencies
(most notably 375Y, 424R and 375I) exhibited low substitution likelihoods.
We extended our analyses to account for the variables associated with both appearance
and persistence of each REM (see flow chart in Fig S15C). Here, we examined the ability to
predict the emergence frequencies of the 18 REMs in the escape group based on their: (i)
Pre-treatment trinucleotide sequence at these sites, (ii) Likelihood for appearance of the REMs (by
number and type of the nucleotide substitutions), and (iii) The combined fitness metric that
incorporates the functional fitness, stability, and immune fitness. As shown in Fig 6E, the model
that integrated all variables demonstrated good predictive performance (AUC = 0.82) and was
robust across multiple cross-validation folds, as indicated by the error bars. The combined fitness
variable and the nucleotide substitution likelihoods contributed to the prediction significantly.
Consistent with the data in Fig 6A, the use of the participants' pre-treatment sequences did not
improve performance relative to the clade B consensus sequence.
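For readers less familiar with the AUC metric reported above, the following sketch computes it directly from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are toy values, and the function is a generic illustration rather than the evaluation code used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = outcome observed, 0 = not observed (hypothetical scores)
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

value = auc(labels, scores)
print(value)
```

A value of 1 corresponds to perfect separation of the two groups and 0.5 to random ranking, which matches the interpretation given for Fig 2C.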
Finally, we explored the contribution of the nucleotide substitution likelihoods to the
observed profile of mutations in BRIGHTE. To this end, we examined separately the effects of
the number of changes required and the probability for each transition or transversion (15). For
these tests, we focused on position 375, which contains the largest number of REMs.
Interestingly, both the number of changes and the expected rate of each substitution contributed
to predictions of REM emergence (Fig 6F). For example, position 375 was occupied in most
subjects by Ser with the trinucleotide sequence AGT (or less frequently by AGC). Substitutions to
Asn (AAT/AAC) or Ile (ATT/ATC) constitute single nucleotide changes (from G to A, or G to T,
respectively). However, the G to A transition is more common than the G to T transversion, likely
explaining the higher frequency of Asn in the escape group and population relative to Ile despite
favorable fitness profiles in both. Indeed, given similar immune and functional fitness levels for
the five REMs at position 375 (Fig 6A), addition of the selection pressures did not further improve
prediction of the on-treatment mutational outcomes (Fig 6F). We note that these calculations are
not intended to capture the quantitative effects of the variables on appearance or persistence of
the REMs. Nevertheless, they clearly demonstrate that the mutational path is explained well by
the properties of Env measured in vitro and modeled in silico.
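The position 375 example can be reproduced with a small Python sketch that lists the nucleotide differences between two codons and labels each as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. The function names are hypothetical, and the sketch only classifies the changes; it does not apply the substitution rates of reference 15.

```python
PURINES = {"A", "G"}


def substitution_type(a, b):
    """Classify a single-nucleotide change as 'transition' or 'transversion'."""
    if a == b:
        return None
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def codon_changes(source, target):
    """List (codon index, from, to, type) for every differing nucleotide of two codons."""
    return [
        (i, s, t, substitution_type(s, t))
        for i, (s, t) in enumerate(zip(source, target))
        if s != t
    ]


# Ser (AGT) to Asn (AAT) and Ser (AGT) to Ile (ATT), as in the text
ser_to_asn = codon_changes("AGT", "AAT")
ser_to_ile = codon_changes("AGT", "ATT")
print(ser_to_asn)
print(ser_to_ile)
```

Both substitutions require one change at the second codon position (index 1 in the zero-based output), but only the change to Asn is a transition.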
Methods
Processing and analysis of sequence data
All BRIGHTE trial samples used in this study were collected and analyzed with informed
consent. Samples were tested for viral loads and CD4 counts. In addition, some samples were
analyzed using the Monogram Biosciences PhenoSense GT assay. Nucleotide and amino acid
sequences for 580 samples from 371 subjects were available for analysis. TMR resistance values
were available for 570 of these samples. Multiple Env positions contained ambiguous nucleotide
or amino acid designations, which reflect the presence of more than one sequenced variant in the
donor. To align such sequences, we initially demultiplexed the variants by retaining for each
position the amino acid or nucleotide found in the clade B ancestral sequence, or, if not present,
the first variant listed at that position. We then used custom Python code to run MAFFT 7.520
for alignment (84). Sequences were then used to determine their clade associations using the
Recombinant Identification Program (RIP) tool (85). Sequence variability information was
subsequently reintegrated into the samples to allow more than one form at each position, and
sequences were trimmed to the subset of 856 positions according to HXBc2 numbering (42).
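The demultiplexing rule described above (keep the ancestral residue when it is among the variants at a position, otherwise keep the first variant listed) can be sketched as follows. The input representation, a list of variant lists per position alongside an ancestral string of equal length, is an assumption made for illustration; the actual data format and the authors' custom code are not described in the text.

```python
def demultiplex(variants_per_position, ancestral):
    """Collapse ambiguous positions to one residue: the ancestral one if present, else the first listed."""
    if len(variants_per_position) != len(ancestral):
        raise ValueError("sequence and ancestral reference must have the same length")
    return "".join(
        anc if anc in variants else variants[0]
        for variants, anc in zip(variants_per_position, ancestral)
    )


# Hypothetical three-position example; the second and third positions are ambiguous
sample = [["S"], ["L", "M"], ["T", "N"]]
ancestral = "SMK"

resolved = demultiplex(sample, ancestral)
print(resolved)
```

In this example the second position resolves to the ancestral Met, and the third position, where the ancestral Lys is absent, resolves to the first listed variant.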
The Single-Env dataset and transformation of TMR resistance values
We also used a previously published dataset composed of 208 Envs from diverse clades,
each associated with an amino acid sequence and TMR resistance value measured in vitro (38).
Accession numbers (listed in Data File S3) were used to download the sequences from the Los
Alamos National Lab (LANL) database (86), which were then aligned and processed as above.
TMR IC50 values for the Single-Env set were measured using an in-house assay (38), whereas
the BRIGHTE trial samples were analyzed using the Monogram Biosciences Entry assay. To
allow us to combine the datasets, we converted the Single-Env set values to a distribution similar
to that of the pre-treatment samples from the BRIGHTE trial (Fig S6A). For this purpose, we first
examined the distribution type of the BRIGHTE and Single-Env datasets, including Beta,
Lognormal, Exponential, Gamma, and Pareto. Based on the sum of squared errors (SSE) metric,
both datasets conformed best to a Beta Distribution. We then converted the Single-Env set IC50
values to the distribution of the pre-treatment BRIGHTE samples using a probability integral
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted December 27, 2025. ; https://doi.org/10.64898/2025.12.27.696684doi: bioRxiv preprint
18
transform approach. In brief, we first mapped the Single-Env s et values to their cumulative 537
probabilities, then applied the inverse cumulative distribution function of the BRIGHTE Beta 538
distribution, and finally rescaled to the target support, ensuring the values follow the BRIGHTE 539
distribution while preserving their rank ordering. 540
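The two-step mapping described above (value to cumulative probability, then cumulative probability to a value on the target distribution) can be sketched as follows. To keep the example dependency-free, it swaps the fitted Beta distributions for empirical cumulative distribution and quantile functions, so it is an empirical quantile mapping rather than the parametric transform used in the study. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def empirical_cdf(sorted_values, x):
    """Mid-rank cumulative probability of x, kept strictly inside (0, 1)."""
    n = len(sorted_values)
    lo = bisect_left(sorted_values, x)
    hi = bisect_right(sorted_values, x)
    return (0.5 * (lo + hi) + 0.5) / (n + 1)


def empirical_quantile(sorted_values, p):
    """Linearly interpolated quantile of a sorted sample at probability p."""
    pos = p * (len(sorted_values) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_values) - 1)
    frac = pos - i
    return sorted_values[i] * (1 - frac) + sorted_values[j] * frac


def quantile_map(source, target):
    """Map source values onto the target distribution, preserving rank order."""
    src = sorted(source)
    tgt = sorted(target)
    return [empirical_quantile(tgt, empirical_cdf(src, x)) for x in source]
```

Because both steps are monotonic, a larger input value never maps to a smaller output value, which is the rank-preserving property noted in the text.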

Classification algorithms to estimate Env resistance to TMR by sequence
We evaluated different classification algorithms for their ability to estimate TMR resistance from amino acid sequence. For these evaluations, as well as for hyperparameter tuning, we defined an IC50 of 50 nM TMR as the threshold to distinguish between sensitive and resistant samples. As input for the model, we first used the amino acids at the 856 positions of Env according to the HXBc2 numbering system. To account for sequence ambiguity in the BRIGHTE data, we used one-hot encoding to convert each position into binary features representing the absence (0) or presence (1) of each amino acid in the sample. In addition, given that some subjects had multiple samples analyzed (before and after treatment), we used a group-stratified 5-fold cross-validation approach, which ensured that all sequences of each subject were assigned to either the training or the test fold.
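A minimal sketch of the two ideas in this paragraph follows: an encoding that allows more than one amino acid to be flagged at a position, and a fold assignment that keeps all samples of a subject together. The 20-letter alphabet (no gap or stop symbols) and the round-robin fold assignment are assumptions made for illustration; class stratification is omitted for brevity, and the names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_sample(variants_per_position):
    """Return one 0/1 flag per (position, amino acid) pair.

    Ambiguous positions simply set more than one flag to 1.
    """
    features = []
    for variants in variants_per_position:
        present = set(variants)
        features.extend(1 if aa in present else 0 for aa in AMINO_ACIDS)
    return features


def grouped_folds(subject_ids, n_folds=5):
    """Assign a fold index to each sample so that no subject spans two folds."""
    fold_of_subject = {}
    for subject in subject_ids:
        if subject not in fold_of_subject:
            # Round-robin assignment in order of first appearance.
            fold_of_subject[subject] = len(fold_of_subject) % n_folds
    return [fold_of_subject[s] for s in subject_ids]
```

With 856 positions and 20 amino acids, this encoding would yield 17,120 binary features per sample before any filtering.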
Following preliminary testing of different classification algorithms, we pursued Extreme Gradient Boosting (XGBoost) from the xgboost package in Python (43), for its high performance across several key metrics (Fig S3). Hyperparameters were tuned through grid search in Python to enhance the model's predictive capacity. The following values were tested: number of estimators (100, 150, 200, 250 and 300 trees); learning rate (η; 0.1, 0.2 and 0.3); maximum depth of trees (3, 5 and 8); and lambda regularization (1, 2 and 3). The fraction of features and samples used by the algorithm was fixed at 1. The AUC was used as the objective function for optimization. Classification metrics were calculated using the metrics module from the sklearn library (87).
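The size of the search space follows directly from the values listed above. The sketch below enumerates it with the standard library. The parameter names follow the scikit-learn-style interface of xgboost; mapping "fraction of features and samples" to `colsample_bytree` and `subsample` is an assumption, since the text does not name the parameters.

```python
from itertools import product

# Values as listed in the text; key names are an assumed mapping.
PARAM_GRID = {
    "n_estimators": [100, 150, 200, 250, 300],
    "learning_rate": [0.1, 0.2, 0.3],
    "max_depth": [3, 5, 8],
    "reg_lambda": [1, 2, 3],
    "subsample": [1],
    "colsample_bytree": [1],
}


def iter_grid(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

The grid contains 5 × 3 × 3 × 3 = 135 combinations, each of which would be scored by cross-validated AUC.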

Gradient Boosting Regressor to identify Env features that impact resistance to TMR
To identify the sequence features that contribute to TMR resistance, we used the Gradient Boosting Regressor (GBR) algorithm. The algorithm was trained on the combined dataset of 570 BRIGHTE and 208 Single-Env samples. Amino acid sequences were used as the input feature set, and the log10-transformed TMR IC50 values as the response variable. To prepare the sequences, we first excluded Env positions with a minimal distance greater than 7.5 Å from any TMR atom on the TMR-liganded structure of Env (PDB ID 5U7O), resulting in 56 remaining positions (Table S1). Next, we applied one-hot encoding to convert the amino acid occupancy at each position into binary features. Of the 243 resulting features, nine exhibited no variation and were excluded. To further reduce dimensionality and mitigate multicollinearity, we dropped the least frequent feature at each position that met two criteria: (i) the position contains at least two unique amino acid variants in the dataset, and (ii) the mutation is included in the feature set tested by the probabilistic approach (see below). This reduced the feature set to 129.
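The distance filter can be sketched as a minimum over all residue-atom to ligand-atom pairs. The example assumes that atomic coordinates have already been extracted from the structure file into plain tuples; parsing of the PDB entry is not shown, and the function names are hypothetical.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom pair from the two sets."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def positions_near_ligand(residue_atoms, ligand_atoms, cutoff=7.5):
    """Return positions whose closest atom lies within `cutoff` angstroms.

    residue_atoms: dict mapping position number to a list of (x, y, z) tuples.
    ligand_atoms: list of (x, y, z) tuples for the ligand.
    """
    return sorted(
        pos
        for pos, atoms in residue_atoms.items()
        if min_distance(atoms, ligand_atoms) <= cutoff
    )
```

Positions beyond the cutoff are dropped before encoding, which is what limits the feature set to residues lining the binding pocket.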
The GBR model was trained using mean squared error (MSE) as the objective function for optimization, with the Friedman improvement score as the criterion for measuring split quality, and a learning rate (η) of 0.1. A k-fold nested grouped cross-validation strategy was used with k = 10 for the outer folds and k = 5 for the inner folds. This nested structure separates parameter optimization from model assessment: the inner cross-validation loop is used for hyperparameter tuning and the outer loop for model evaluation. The grouping strategy ensured that sequences from the same subject appeared in either the training or the test fold, but not both. Hyperparameters of the GBR model were optimized via grid search, including the number of estimators (20, 50, or 100) and the maximum depth of individual regression estimators (3, 5, 10, or full tree). Predicted IC50 values that exceeded 5 µM were capped at the maximal allowable value of 5 µM TMR. To estimate the contribution of features to the model's predictions, we used Shapley Additive Explanations (SHAP) values (44). SHAP values quantitatively describe the impact of each feature on the model's prediction of the outcome, including magnitude, direction and distribution.
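The nested, grouped splitting scheme can be sketched with index lists alone. The example below is an illustration under simple assumptions: subjects are assigned to folds by position in a sorted list rather than at random, and model fitting is left out. The function names are hypothetical.

```python
def group_kfold(groups, k):
    """Yield (train_idx, test_idx) pairs with each group confined to one side."""
    unique = sorted(set(groups))
    for fold in range(k):
        held_out = set(unique[fold::k])
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test


def nested_group_cv(groups, outer_k=10, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for nested grouped CV.

    inner_splits are built only from outer_train, so the outer test fold
    is never seen during hyperparameter tuning.
    """
    for outer_train, outer_test in group_kfold(groups, outer_k):
        inner_groups = [groups[i] for i in outer_train]
        inner_splits = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in te])
            for tr, te in group_kfold(inner_groups, inner_k)
        ]
        yield outer_train, outer_test, inner_splits
```

Hyperparameters would be chosen on the inner splits and the selected model scored once on the corresponding outer test fold.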

Probabilistic approach to identify low-prevalence mutations that impact TMR resistance
The occurrence of mutations at a low frequency in the dataset limits the ability of algorithms to model their effect. To estimate their impact, we implemented a separate procedure. Mutations included in this set were defined by three criteria: (i) they must appear in at least two unique sequences, (ii) they must not appear in more than 5% of all sequences, and (iii) the side chain of the position must be located within 7.5 Å of the TMR molecule. This filtering process resulted in 107 mutations. If a mutation appeared in more than one sample of a BRIGHTE participant, the average value of the log10(IC50) was used.
A prioritization algorithm was used to rank these mutations by their potential impact. For each of the 107 mutations, described by amino acid $a$ and position $p$, we modeled the log10-converted IC50 values of samples containing that mutation using a normal distribution $X_{a,p}$, with mean $\mu$ and variance $\sigma^2$. We assume that $X_{a,p}$ corresponds with the distribution of their effects on the greater population of HIV-1 strains ($Y_{a,p}$). To compare between the 107 mutations, we randomly sampled with replacement the distribution $X_{a,p}$ of each mutation. This value was compared with the randomly sampled values from all other 106 mutations and ranked (the highest value received the lowest rank). This process was repeated 10,000 times, and the average ranks for all 107 mutations across all iterations were calculated. Fig S8B summarizes the top features identified by this approach.
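The ranking procedure can be sketched as a small Monte Carlo loop. The example assumes that each mutation is summarized by the mean and standard deviation of its log10(IC50) values and that draws are taken from the fitted normal distribution; whether the study drew from the fitted distribution or resampled observed values is not specified here, and the function name is hypothetical.

```python
import random


def average_ranks(params, n_iter=10_000, seed=0):
    """Average rank of each mutation across repeated random draws.

    params: dict mapping mutation name to (mean, sd) of log10(IC50).
    Rank 1 is assigned to the highest drawn value in each iteration.
    """
    rng = random.Random(seed)
    names = list(params)
    totals = dict.fromkeys(names, 0)
    for _ in range(n_iter):
        draws = {m: rng.gauss(mu, sd) for m, (mu, sd) in params.items()}
        ordered = sorted(names, key=draws.get, reverse=True)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: totals[m] / n_iter for m in names}
```

A mutation with a high mean but a wide spread will receive a less extreme average rank than one with the same mean and a narrow spread, which is how the procedure accounts for uncertainty in low-prevalence mutations.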

Production and testing of pseudoviruses containing the Env mutations
All 59 mutations suspected of increasing Env resistance to TMR were introduced by site-directed mutagenesis into a pSVIII vector that expresses the Env of strain AD8 under control of the LTR promoter (88). Pseudoviruses that contain the variants were generated by transfection of HEK 293T cells. Briefly, 9.5 × 10⁵ cells were seeded in each well of a 6-well plate and transfected the next day with 0.4 μg of the HIV-1 packaging construct pCMVΔP1ΔenvpA, 1.2 μg of the firefly luciferase-expressing construct pHIvec2.luc, 0.4 μg of pSVIII expressing HIV-1 Env, 0.2 μg of pRev expressing HIV-1 Rev, and 4.2 μL of JetPrime reagent (PolyPlus). The medium was replaced the next day, and virus-containing supernatant was collected 24 h later. Samples were cleared of cell debris by centrifugation at 800 × g and filtered through 0.45 μm pore-size membranes.
As a measure of virus particle content, we quantified the reverse transcriptase activity in the samples using a modified version of a previously published protocol (49), where TaqMan chemistry was used in place of SYBR Green. In brief, RT-qPCR reactions (25 μl total volume) were prepared with TaqMan Gene Expression Master Mix (ThermoFisher) and contained 2.5 mU of MS2 Bacteriophage RNA (Roche), 1 μM of MS2 Forward Primer (5′-TCCTGCTCAACTTCCTGTCGAG-3′), 1 μM of MS2 Reverse Primer (5′-CACAGGTCAAACCTCCTAGGAATG-3′), 200 nM Probe (6[FAM] CGAGACGCTACCATGGCTATCGCTGTAG [TAMsp]), 0.5 U RNase Inhibitor (Fermentas, EO0381), and 2 μl of pseudovirus sample. Pseudovirus samples were lysed in 0.125% Triton X-100, 50 mM KCl, 100 mM Tris HCl (pH 7.4), 0.4 U/μl RNase Inhibitor, and 20% glycerol. A standard curve was generated using a dilution series of recombinant HIV reverse transcriptase (NIH AIDS Reagent Program, Cat. No. 12583) ranging from 10⁴ to 10¹² pU/μl prepared in the same lysis buffer. Reactions were run in 0.1 ml MicroAmp plates (Applied Biosystems, 4306737) on a QuantStudio 3 Real-Time PCR System under the following cycling conditions: 42 °C for 20 min, 50 °C for 2 min, 95 °C for 10 min, followed by 50 cycles of 95 °C for 15 s and 60 °C for 1 min. Virus infectivity was expressed as the mean luciferase activity measured for each virus stock (in relative light units, see below) divided by the reverse transcriptase activity in that sample.
To measure sensitivity of the variants to TMR (BMS-626529, MedChemExpress) or plasma from HIV-infected individuals, Cf2Th-CD4+CCR5+ cells were seeded in 96-well opaque white plates at 2 × 10⁴ cells per well and infected the next day. For neutralization assays, pseudovirus samples were pre-incubated with TMR or plasma for one hour at 37°C. Samples were then added to the target cells and incubated for 3 days in a 37°C, 5% CO2 incubator. To measure infection, the medium was removed, 35 μL of passive lysis buffer (Promega) was added, and samples were subjected to three freeze-thaw cycles. To measure luciferase activity, 100 μL of luciferin buffer (15 mM MgSO4, 15 mM KPO4 [pH 7.6], 1 mM ATP, and 1 mM dithiothreitol) and 50 μL of 1 mM D-luciferin potassium salt (Syd Labs, MA) were added to each sample. Luminescence was recorded using a Synergy H1 microplate reader (BioTek).
To measure the decay rate of virus infectivity at 37°C, pseudovirus stocks were divided into aliquots (one sample for each time point), and all samples were snap-frozen on dry ice immersed in ethanol and stored at −80°C. At different time points, samples were thawed in a 37°C water bath for 2 min and then further incubated at 37°C for 2-48 hours. All samples were subsequently added to the Cf2Th-CD4+CCR5+ cells and infectivity was measured 3 days later by luciferase activity. All tests of infectivity and neutralization were performed in at least three independent experiments, each containing three replicates. Standard errors of the mean were used to quantify the variability in the data.

Statistical analysis of mutation enrichment in the escape group after treatment
We compared the frequency of all 59 suspected mutations in the samples collected before and during treatment in the escape group. The escape group was defined as participants for whom all pre-treatment samples had IC50 values lower than 50 nM TMR, and at least one on-treatment sample had an IC50 greater than 50 nM. To determine enrichment of mutations after treatment in these 65 participants, we tested the null hypothesis that the frequency of the mutations in the pre-treatment samples was equal to or greater than their frequency in the post-treatment samples. Briefly, for each subject we merged all amino acids at each position into a single pre-treatment sample or post-treatment sample. We then calculated for each of the 59 suspected mutations the ratio of the mutation frequency in the post-treatment samples to that in the pre-treatment samples. If a participant contained a mutation in both pre- and post-treatment samples, the individual was excluded from the analysis of that mutation. Timepoint identifiers were then permuted 10,000 times and for each iteration the ratio was calculated. To avoid dividing by zero, a small constant value was added to the denominator. The fraction of iterations in which the ratio for the permuted data was the same as or larger than that of the non-permuted data was used as the P-value for this one-sided test.
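The permutation test for a single mutation can be sketched as follows. Two details are assumptions, since the text does not specify them: timepoint labels are permuted within each participant (a random swap of the pre- and post-treatment labels), and the small constant is set to 1e-6. The function name is hypothetical.

```python
import random


def enrichment_p_value(pairs, n_perm=10_000, eps=1e-6, seed=0):
    """One-sided permutation P-value for post- versus pre-treatment enrichment.

    pairs: list of (pre, post) booleans, one per participant, indicating
        whether the mutation is present in the merged pre- or post-treatment
        sample.
    """
    rng = random.Random(seed)
    # Participants carrying the mutation at both time points are excluded.
    pairs = [(pre, post) for pre, post in pairs if not (pre and post)]
    if not pairs:
        raise ValueError("no participants left after exclusion")

    def ratio(data):
        n = len(data)
        pre_freq = sum(pre for pre, _ in data) / n
        post_freq = sum(post for _, post in data) / n
        # eps keeps the ratio finite when the mutation is absent before treatment.
        return post_freq / (pre_freq + eps)

    observed = ratio(pairs)
    hits = 0
    for _ in range(n_perm):
        permuted = [(b, a) if rng.random() < 0.5 else (a, b) for a, b in pairs]
        if ratio(permuted) >= observed:
            hits += 1
    return hits / n_perm
```

A mutation that is absent before treatment in all participants and present afterwards in many of them yields a very small P-value, because almost every relabeling lowers the ratio.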

Rate of independent substitution events in HIV-1 clade B
We calculated the emergence frequency of the 18 REMs in HIV-1 clade B. To this end, we downloaded Env amino acid sequences from the LANL database, and aligned them to the HXBc2 strain. All sequences that contained any ambiguity or stop codons were removed, as well as sequences within 0.05 amino acid substitutions per site of any other sequence. The resulting dataset consisted of 2,535 Env sequences. We note that while no information was associated with these sequences to indicate lack of fostemsavir treatment of the hosts, the sample collection dates (99.2% collected before FDA approval of fostemsavir) and the sources of the remaining samples render the likelihood of such treatment negligible. Phylogenetic relationships between sequences were calculated using FastTree (89) on the Galaxy platform (90), and the tree was divided into sublineages comprising 80 to 150 sequences each using the Depth-First Search algorithm (91). All sublineages were processed using HyPhy SLAC (92) on Galaxy to infer the most likely ancestral sequence. Custom Python code was then used to calculate the rate of independent mutation events at each Env position using the results of SLAC as an input. In brief, each subgroup tree was recursively searched to determine all substitution events relative to the prior inferred node. A subgroup was ignored if the majority variant differed from the clade consensus sequence. The rate of independent emergence events for each amino acid was calculated as the ratio between the number of emergence events and the total number of nodes. Daughter nodes of inferred substitution events were excluded from this count unless a new substitution event occurred. The code is available through our GitHub repository at https://github.com/haimlab/Independent_mutation_Tree.
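The recursive counting step can be illustrated for a single position and a single target amino acid. This is one reading of the rule described above, not the code from the repository: a node counts as an emergence event when it carries the target residue and its parent does not, and nodes that merely inherit the target residue from their parent are left out of the node count. The nested-dict tree layout and function name are hypothetical.

```python
def count_events(node, target, parent_state=None, counts=None):
    """Count independent emergence events of `target` in a tree.

    node: dict with a "state" key (the residue at the position of interest)
        and an optional "children" list of nodes of the same shape.
    Returns a dict with "events" and "nodes" totals.
    """
    if counts is None:
        counts = {"events": 0, "nodes": 0}
    state = node["state"]
    # A node that inherits the target residue is not an independent event
    # and is excluded from the denominator.
    inherited = parent_state == target and state == target
    if not inherited:
        counts["nodes"] += 1
        if parent_state is not None and state == target:
            counts["events"] += 1
    for child in node.get("children", []):
        count_events(child, target, state, counts)
    return counts
```

The emergence rate for the target residue is then `counts["events"] / counts["nodes"]`.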

Cell-based ELISA to measure binding of antibodies to cell-surface Env variants
To measure effects of the mutations on Env antigenicity, we expressed the different variants on the surface of human osteosarcoma (HOS) cells and measured binding of monoclonal antibodies by cell-based ELISA, as previously described (7, 59, 93). Briefly, HOS cells were seeded in 96-well plates (1.4 × 10⁴ cells per well) and transfected the next day with plasmids that express Env, Tat, and Rev, using 60, 11, and 6 ng of each plasmid per well, respectively, and 0.12 μL per well of JetPrime reagent. Background antibody binding was quantified using cells transfected with a pSVIII construct containing a premature stop codon at Env position 46. Three days later, cells were washed in blocking buffer (BB) composed of a Tris-saline (TS) buffer (140 mM NaCl, 1.8 mM CaCl2, 1 mM MgCl2, and 25 mM Tris [pH 7.5]) supplemented with 3% bovine serum albumin and 1.1% skim milk. Cells were then incubated for 45 min at room temperature in BB containing the antibodies at the following concentrations: 17b and F105 at 5 μg/mL; 39F and 10E8 at 2 μg/mL; CD4-Ig, N6, 447-52D, 35O22, PGT145 and 2G12 at 1 μg/mL. Cells were then washed six times with BB and incubated with a horseradish peroxidase-conjugated goat anti-human IgG for 1 h at room temperature. Cells were subsequently washed six times with BB and six times with TS buffer. Antibody binding was measured by chemiluminescence using 35 μL per well of a 1:1 mix of SuperSignal West Pico chemiluminescent peroxide and luminol enhancer solutions (Thermo Scientific) supplemented with 150 mM NaCl on a Synergy H1 microplate reader. To correct antibody binding values for the expression level of each Env, we also measured in each experiment their recognition by antibody 2G12, which targets an exposed epitope on the high-mannose patch of gp120 (94). The antibody-to-2G12 binding ratio was calculated for each Env variant, and the value was expressed as a fraction of this ratio measured for the wild-type AD8 Env. Binding assays were conducted at least three times in independent experiments, each with three replicate samples. Standard error values were used to describe the variability in the data.

Saturation mutagenesis to determine effects on Env fitness and resistance to TMR
To introduce a degenerate codon that encodes all 20 amino acids (and one stop codon), we used primers that contain the trinucleotide sequence NNK at position 375 or 426 (see schematic of approach and all primers in Fig S17). N represents any nucleotide, and K represents G or T. The NNK-containing primers were used to amplify an Env segment with PrimeSTAR Max DNA Polymerase (Takara), using as template the proviral vector pNL4-3 that encodes the Env of strain AD8. This fragment was then ligated to a second fragment by overlapping PCR to generate a combined fragment that spans the entire env gene. This product was then cloned into the env-deleted pNLAD8 vector (GenBank ID PV345784) using In-Fusion assembly (Takara), and the product was transformed into Zymo Mix&Go DH5α chemically competent cells. At least 350 colonies for each library were collected and pooled, and plasmids were purified using ZR Plasmid Miniprep-Classic. Two micrograms of the provirus library were then used to transfect 1 × 10⁶ HEK 293T cells cultured in wells of 6-well plates using JetPrime reagent. The medium was changed after 4 hours, and 48 hours after transfection the virus was harvested, passed through 0.45 μm filters and treated with 100 U/ml DNase-1 (Roche) for 30 min at 37°C. Samples were then divided into aliquots, snap-frozen using dry ice immersed in ethanol, and stored at −80°C until use. Virus titers were determined by plaque assay after a 96-hour infection of TZM-bl-GFP cells (BEI Resources, HRP-20041, contributed by David G. Russell and David W. Gludish) in DMEM/FCS supplemented with 20 μg/ml DEAE dextran.
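The coding capacity of the NNK degenerate codon used for library construction can be checked by enumeration. The sketch below builds the standard genetic code from its conventional TCAG-ordered string and translates all NNK codons; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first, second and third base each cycling through T, C, A, G).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def nnk_products():
    """Set of translation products (amino acids and '*') encoded by NNK."""
    return {CODON_TABLE[codon] for codon in nnk_codons()}
```

The 32 NNK codons encode all 20 amino acids plus a single stop codon (TAG), consistent with the description above. Amino acids are not equally represented: Leu, Ser and Arg are each encoded by three NNK codons, whereas Met and Trp are encoded by one.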
To eliminate mixed-allele virions from the samples, the above virus libraries were used to infect 2 × 10⁶ A3R5.7 acute lymphoblastic leukemia T cells in 2 mL at an MOI of 0.003 in RPMI/FCS supplemented with 20 μg/ml DEAE dextran. Cells were resuspended in 4 mL fresh culture medium 24 hours after infection, and 2 days later the virus was harvested and filtered as above. Titers were determined using TZM-bl-GFP cells and the samples were used to infect a culture of 1 × 10⁶ A3R5.7 cells in RPMI/FCS supplemented with 40 μg/ml DEAE dextran at an MOI of 0.05. These infections were performed in the absence or presence of 250 nM TMR. At 16 hours post-infection, cells were pelleted and non-integrated viral DNA was purified using the QIAprep Spin Miniprep kit (Qiagen). A 900-bp region encompassing positions 375 and 426 was then amplified using PrimeSTAR Max DNA Polymerase. For each biological replicate, an initial 30-cycle amplification was performed in triplicate, and the samples were pooled and gel-purified using the Zymoclean Gel DNA Recovery Kit (Zymo). If necessary, an additional 8-12 rounds of PCR were performed to obtain sufficient sample for sequencing, which was purified using the QIAquick PCR Purification Kit (Qiagen). To sequence the input virus used for the second round of A3R5.7 cell infection, viral RNA was purified using the Quick RNA Viral Kit (Zymo), reverse transcribed using SuperScript IV (Invitrogen), and PCR amplified using the same primers used with the purified product from the infected cells. Finally, all samples were sequenced by Oxford Nanopore Technology (PlasmidSaurus), which provided an average count of 5,000 reads per sample.
Sequence data (in fastq format) were used to calculate amino acid preferences at positions 375 and 426 in the absence and presence of TMR, based on the method described by Haddox and colleagues (50). This approach calculates the frequency of each amino acid in viral DNA isolated from cells infected by the virus library relative to its frequency in the virus library used for infection. To determine if calculations should be corrected for the error rate in the sequencing reactions, we first examined the incorrect variant calls at the two positions by sequencing three samples infected by the wild-type virus. The average frequency of minority variants across the six samples was 0.13% (standard deviation 0.12%). As such, we performed subsequent calculations in an error-agnostic manner.
To calculate the enrichment ratio ($E$) for each amino acid $a$ at position $p$, we used:
$$E_{p,a} = \frac{f^{\mathrm{cell}}_{p,a} \,/\, f^{\mathrm{cell}}_{p,wt(p)}}{f^{\mathrm{virus}}_{p,a} \,/\, f^{\mathrm{virus}}_{p,wt(p)}}$$
where $f^{\mathrm{cell}}_{p,a}$ and $f^{\mathrm{cell}}_{p,wt(p)}$ are the frequencies in the cell lysate of amino acid $a$ or the wild-type ($wt$) amino acid for position $p$, and $f^{\mathrm{virus}}_{p,a}$ and $f^{\mathrm{virus}}_{p,wt(p)}$ are the frequencies of these forms in the input virus sample used for infection. Finally, we calculated the amino acid preference ($\pi$) for each amino acid as:
$$\pi_{p,a} = \frac{E_{p,a}}{\sum_{a'} E_{p,a'}}$$
where $\sum_{a'} E_{p,a'}$ is the sum of enrichments for all amino acids $a'$ at position $p$.
A complete software package to calculate amino acid preferences is available through our GitHub repository at https://github.com/haimlab/EZDMS.
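The two quantities defined above can be computed for a single position in a few lines. This is a minimal sketch, not the EZDMS package: it assumes that per-amino-acid frequencies have already been tabulated for the cell-derived and input virus samples, and the function names are hypothetical.

```python
def enrichment_ratios(cell_freq, virus_freq, wild_type):
    """Enrichment of each amino acid relative to wild type at one position.

    cell_freq, virus_freq: dicts mapping amino acid to its frequency in the
        infected-cell sample and in the input virus sample, respectively.
    wild_type: the wild-type amino acid at this position.
    """
    return {
        aa: (cell_freq[aa] / cell_freq[wild_type])
        / (virus_freq[aa] / virus_freq[wild_type])
        for aa in cell_freq
    }


def preferences(enrichment):
    """Normalize enrichment ratios so that they sum to 1 at the position."""
    total = sum(enrichment.values())
    return {aa: e / total for aa, e in enrichment.items()}
```

By construction the wild-type amino acid has an enrichment ratio of 1, so variants with ratios below 1 are depleted during infection relative to wild type and variants above 1 are enriched.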

Algorithms to estimate appearance and persistence of REMs
To evaluate the contribution of the different variables to the observed mutational profiles in treated individuals, we designed two algorithms (see flowcharts in Fig S15B and S15C). In both, mutation appearance and persistence were modeled as stochastic events with mutation-specific probabilities. The first algorithm calculated, for each of the 18 REMs individually, the probability of appearance based on the number and type of nucleotide changes required from the initial trinucleotide sequence. To test the probability of REM appearance in clade B, the algorithm was initiated with the trinucleotide sequence at the Env position in the clade B consensus sequence. To test the probability of REM appearance in the 65 participants of the escape group, the algorithm was initiated with the trinucleotide sequence in the pre-treatment samples of each participant. One of the three nucleotide sites was selected randomly, and a nucleotide substitution was randomly introduced at that site based on the transition and transversion likelihoods determined by Martinez Del Rio et al. (15) (see values in Fig S15A). Up to two consecutive mutations were allowed. If the desired REM was acquired within the two attempts, a success event was recorded. For each REM, the algorithm was repeated 1,000 times for the clade B consensus sequence, and 1,000 times for each escape group participant. The fraction of success events among the 1,000 iterations for each participant was defined as their probability of mutation appearance.
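The appearance module can be sketched as a short simulation over codons. The transition probability used here (0.7) is a placeholder and is not the value from Fig S15A; the two possible transversions at a site are treated as equally likely, which is also an assumption. Function names are hypothetical.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons ordered with each base cycling T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutate(codon, rng, p_transition):
    """Introduce one substitution at a randomly chosen codon site."""
    site = rng.randrange(3)
    base = codon[site]
    if rng.random() < p_transition:
        new = TRANSITION[base]
    else:
        # Pick one of the two transversion products with equal probability.
        new = rng.choice([b for b in BASES if b not in (base, TRANSITION[base])])
    return codon[:site] + new + codon[site + 1:]


def appearance_probability(start_codon, target_aa, n_iter=1000, max_steps=2,
                           p_transition=0.7, seed=0):
    """Fraction of iterations in which target_aa is reached within max_steps."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_iter):
        codon = start_codon
        for _ in range(max_steps):
            codon = mutate(codon, rng, p_transition)
            if CODON_TABLE[codon] == target_aa:
                successes += 1
                break
    return successes / n_iter
```

Starting from ATG (Met), for instance, Leu can be reached with a single substitution, whereas His requires three nucleotide changes and therefore cannot appear within two steps.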
The second algorithm (Fig S15C) added a module to the mutation appearance step, which aimed to capture the ability of each REM to persist in the host. It was based on the in vitro measured effects of the REMs on: (i) infectivity, (ii) functional stability at 37°C, and (iii) resistance to non-neutralizing plasma. The three values measured for each REM were expressed as a fraction of the wild-type AD8 Env values, and their product was defined as the Combined Fitness value of each REM. For each success event from the appearance module, persistence was determined by a random process, in which the Combined Fitness value was used as the probability of the REM to persist. If the mutation persisted, a success event was recorded. The fraction of success events among the 1,000 iterations was defined as the probability of REM emergence. Performance was evaluated by compiling the probability values for the 18 REMs in the 65 participants of the escape group, which were compared with their outcomes: the absence (0) or presence (1) of REM emergence in the participant after treatment. Performance was measured using the AUC metric. The code for the above algorithm can be found in our GitHub repository at https://github.com/haimlab/REM-Emergence-Modeling.
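The persistence module and the AUC evaluation can be sketched as follows. This is an illustration and not the code from the repository: the appearance step is reduced to a single random draw with a given probability, and Combined Fitness values above 1 are capped at 1 so that they can be used as probabilities. Both simplifications are assumptions, and the function names are hypothetical.

```python
import random


def combined_fitness(infectivity, stability, plasma_resistance):
    """Product of the three measurements, each relative to wild type."""
    return infectivity * stability * plasma_resistance


def emergence_probability(p_appearance, fitness, n_iter=1000, seed=0):
    """Fraction of iterations in which a mutation both appears and persists."""
    rng = random.Random(seed)
    p_persist = min(fitness, 1.0)  # assumed cap for values above wild type
    successes = 0
    for _ in range(n_iter):
        appeared = rng.random() < p_appearance
        if appeared and rng.random() < p_persist:
            successes += 1
    return successes / n_iter


def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 would indicate that the modeled emergence probabilities carry no information about which REMs emerged in which participants, and a value of 1 would indicate perfect separation.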

Background
Therapy in Heavily Treatment-Experienced Adults with HIV-1. Infect Dis Ther 12:2321-2335.
22. Kozal M, Aberg J, Pialoux G, Cahn P, Thompson M, Molina JM, Grinsztejn B, Diaz R, Castagna A, Kumar P, Latiff G, DeJesus E, Gummel M, Gartland M, Pierce A, Ackerman P, Llamoso C, Lataillade M, Team BT. 2020. Fostemsavir in Adults with Multidrug-Resistant HIV-1 Infection. N Engl J Med 382:1232-1243.
23. Lataillade M, Lalezari JP, Kozal M, Aberg JA, Pialoux G, Cahn P, Thompson M, Molina JM, Moreno S, Grinsztejn B, Diaz RS, Castagna A, Kumar PN, Latiff GH, De Jesus E, Wang M, Chabria S, Gartland M, Pierce A, Ackerman P, Llamoso C. 2020. Safety and efficacy of the HIV-1 attachment inhibitor prodrug fostemsavir in heavily treatment-experienced individuals: week 96 results of the phase 3 BRIGHTE study. Lancet HIV 7:e740-e751.
24. Benlarbi M, Richard J, Bourassa C, Tolbert WD, Chartrand-Lefebvre C, Gendron-Lepage G, Sylla M, El-Far M, Messier-Peet M, Guertin C, Turcotte I, Fromentin R, Verly MM, Prevost J, Clark A, Mothes W, Kaufmann DE, Maldarelli F, Chomont N, Begin P, Tremblay C, Baril JG, Trottier B, Trottier S, Duerr R, Pazgier M, Durand M, Finzi A. 2024. Plasma Human Immunodeficiency Virus 1 Soluble Glycoprotein 120 Association With Correlates of Immune Dysfunction and Inflammation in Antiretroviral Therapy-Treated Individuals With Undetectable Viremia. J Infect Dis 229:763-774.
25. Richard J, Prevost J, Bourassa C, Brassard N, Boutin M, Benlarbi M, Goyette G, Medjahed H, Gendron-Lepage G, Gaudette F, Chen HC, Tolbert WD, Smith AB, 3rd, Pazgier M, Dube M, Clark A, Mothes W, Kaufmann DE, Finzi A. 2023. Temsavir blocks the immunomodulatory activities of HIV-1 soluble gp120. Cell Chem Biol 30:540-552 e6.
26. Clark A, Prakash M, Chabria S, Pierce A, Castillo-Mancilla JR, Wang M, Du F, Tenorio AR. 2024. Inflammatory Biomarker Reduction With Fostemsavir Over 96 Weeks in Heavily Treatment-Experienced Adults With Multidrug-Resistant HIV-1 in the BRIGHTE Study. Open Forum Infect Dis 11:ofae469.
27. Gartland M, Cahn P, DeJesus E, Diaz RS, Grossberg R, Kozal M, Kumar P, Molina JM, Mendo Urbina F, Wang M, Du F, Chabria S, Clark A, Garside L, Krystal M, Mannino F, Pierce A, Ackerman P, Lataillade M. 2022. Week 96 Genotypic and Phenotypic Results of the Fostemsavir Phase 3 BRIGHTE Study in Heavily Treatment-Experienced Adults Living with Multidrug-Resistant HIV-1. Antimicrob Agents Chemother doi:10.1128/aac.01751-21:e0175121.
28. Brown J, Chien C, Timmins P, Dennis A, Doll W, Sandefer E, Page R, Nettles RE, Zhu L, Grasela D. 2013. Compartmental absorption modeling and site of absorption studies to determine feasibility of an extended-release formulation of an HIV-1 attachment inhibitor phosphate ester prodrug. J Pharm Sci 102:1742-1751.
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted December 27, 2025. ; https://doi.org/10.64898/2025.12.27.696684doi: bioRxiv preprint
31
29. Heidary M, Shariati S, Nourigheimasi S, Khorami M, Moradi M, Motahar M, Bahrami P,
Akrami S, Kaviar VH. 2024. Mechanism of action, resistance, interaction,
pharmacokinetics, pharmacodynamics, and safety of fostemsavir. BMC Infect Dis 24:250.
30. Lai YT. 2021. Small Molecule HIV-1 Attachment Inhibitors: Discovery, Mode of Action and
Structural Basis of Inhibition. Viruses 13.
31. Langley DR, Kimura SR, Sivaprakasam P, Zhou N, Dicker I, McAuliffe B, Wang T, Kadow
JF, Meanwell NA, Krystal M. 2015. Homology models of the HIV-1 attachment inhibitor
BMS-626529 bound to gp120 suggest a unique mechanism of action. Proteins 83:331-50.
32. Lataillade M, Zhou N, Joshi SR, Lee S, Stock DA, Hanna GJ, Krystal M, team AIs. 2018.
Viral Drug Resistance Through 48 Weeks, in a Phase 2b, Randomized, Controlled Trial
of the HIV-1 Attachment Inhibitor Prodrug, Fostemsavir. J Acquir Immune Defic Syndr
77:299-307.
33. Gartland M, Zhou N, Stewart E, Pierce A, Clark A, Ackerman P, Llamoso C, Lataillade M,
Krystal M. 2021. Susceptibility of global HIV-1 clinical isolates to fostemsavir using the
PhenoSense(R) Entry assay. J Antimicrob Chemother 76:648-652.
34. Ray N, Hwang C, Healy MD, Whitcomb J, Lataillade M, Wind-Rotolo M, Krystal M, Hanna
GJ. 2013. Prediction of virological response and assessment of resistance emergence to
the HIV-1 attachment inhibitor BMS-626529 during 8-day monotherapy with its prodrug
BMS-663068. J Acquir Immune Defic Syndr 64:7-15.
35. Zhou N, Nowicka-Sans B, McAuliffe B, Ray N, Eggers B, Fang H, Fan L, Healy M, Langley
DR, Hwang C, Lataillade M, Hanna GJ, Krystal M. 2014. Genotypic correlates of
susceptibility to HIV-1 attachment inhibitor BMS-626529, the active agent of the prodrug
BMS-663068. J Antimicrob Chemother 69:573-81.
36. Prevost J, Chen Y, Zhou F, Tolbert WD, Gasser R, Medjahed H, Nayrac M, Nguyen DN,
Gottumukkala S, Hessell AJ, Rao VB, Pozharski E, Huang RK, Matthies D, Finzi A,
Pazgier M. 2023. Structure-function analyses reveal key molecular determinants of HIV-1
CRF01_AE resistance to the entry inhibitor temsavir. Nat Commun 14:6710.
37. Gartland M, Arnoult E, Foley BT, Lataillade M, Ackerman P, Llamoso C, Krystal M. 2021.
Prevalence of gp160 polymorphisms known to be related to decreased susceptibility to
temsavir in different subtypes of HIV-1 in the Los Alamos National Laboratory HIV
Sequence Database. J Antimicrob Chemother 76:2958-2964.
38. Pancera M, Lai YT, Bylund T, Druz A, Narpala S, O'Dell S, Schon A, Bailer RT, Chuang
GY, Geng H, Louder MK, Rawi R, Soumana DI, Finzi A, Herschhorn A, Madani N,
Sodroski J, Freire E, Langley DR, Mascola JR, McDermott AB, Kwong PD. 2017. Crystal
structures of trimeric HIV envelope with entry inhibitors BMS-378806 and BMS-626529.
Nat Chem Biol 13:1115-1122.
39. Zuze BJL, Radibe BT, Choga WT, Bareng OT, Moraka NO, Maruapula D, Seru K,
Mokgethi P, Mokaleng B, Ndlovu N, Kelentse N, Pretorius-Holme M, Shapiro R, Lockman
S, Makhema J, Novitsky V, Seatla KK, Moyo S, Gaseitsiwe S. 2023. Fostemsavir
resistance-associated polymorphisms in HIV-1 subtype C in a large cohort of
treatment-naive and treatment-experienced individuals in Botswana. Microbiol Spectr
11:e0125123.
40. Hanna GJ, Lalezari J, Hellinger JA, Wohl DA, Nettles R, Persson A, Krystal M, Lin P,
Colonno R, Grasela DM. 2011. Antiviral activity, pharmacokinetics, and safety of
BMS-488043, a novel oral small-molecule HIV-1 attachment inhibitor, in HIV-1-infected
subjects. Antimicrob Agents Chemother 55:722-8.
41. Nettles RE, Schurmann D, Zhu L, Stonier M, Huang SP, Chang I, Chien C, Krystal M,
Wind-Rotolo M, Ray N, Hanna GJ, Bertz R, Grasela D. 2012. Pharmacodynamics, safety,
and pharmacokinetics of BMS-663068, an oral HIV-1 attachment inhibitor in
HIV-1-infected subjects. J Infect Dis 206:1002-11.
42. Korber B, Foley B, Kuiken C, Pillai S, Sodroski J. 1998. Numbering positions in HIV relative
to HXBc2. Los Alamos: Los Alamos Natl Lab:iii-102-iii-103.
43. Chen TQ, Guestrin C. 2016. XGBoost: A Scalable Tree Boosting System. KDD '16:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining doi:10.1145/2939672.2939785:785-794.
44. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J,
Bansal N, Lee SI. 2020. From Local Explanations to Global Understanding with
Explainable AI for Trees. Nat Mach Intell 2:56-67.
45. Gartland M, Stewart E, Zhou N, Li Z, Rose R, Beloor J, Clark A, Tenorio AR, Krystal M.
2024. Characterization of clinical envelopes with lack of sensitivity to the HIV-1 inhibitors
temsavir and ibalizumab. Antiviral Res 228:105957.
46. Zhou N, Fan L, Ho HT, Nowicka-Sans B, Sun Y, Zhu Y, Hu Y, McAuliffe B, Rose B, Fang
H, Wang T, Kadow J, Krystal M, Alexander L, Colonno R, Lin PF. 2010. Increased
sensitivity of HIV variants selected by attachment inhibitors to broadly neutralizing
antibodies. Virology 402:256-61.
47. Haim H, Strack B, Kassa A, Madani N, Wang L, Courter JR, Princiotto A, McGee K,
Pacheco B, Seaman MS, Smith AB, 3rd, Sodroski J. 2011. Contribution of intrinsic
reactivity of the HIV-1 envelope glycoproteins to CD4-independent infection and global
inhibitor sensitivity. PLoS Pathog 7:e1002101.
48. Seaman MS, Janes H, Hawkins N, Grandpre LE, Devoy C, Giri A, Coffey RT, Harris L,
Wood B, Daniels MG, Bhattacharya T, Lapedes A, Polonis VR, McCutchan FE, Gilbert
PB, Self SG, Korber BT, Montefiori DC, Mascola JR. 2010. Tiered categorization of a
diverse panel of HIV-1 Env pseudoviruses for assessment of neutralizing antibodies. J
Virol 84:1439-52.
49. Vermeire J, Naessens E, Vanderstraeten H, Landi A, Iannucci V, Van Nuffel A, Taghon T,
Pizzato M, Verhasselt B. 2012. Quantification of reverse transcriptase activity by real-time
PCR as a fast and accurate method for titration of HIV, lenti- and retroviral vectors. PLoS
One 7:e50859.
50. Haddox HK, Dingens AS, Hilton SK, Overbaugh J, Bloom JD. 2018. Mapping mutational
effects along the evolutionary landscape of HIV envelope. Elife 7.
51. Guan Y, Pazgier M, Sajadi MM, Kamin-Lewis R, Al-Darmarki S, Flinko R, Lovo E, Wu X,
Robinson JE, Seaman MS, Fouts TR, Gallo RC, DeVico AL, Lewis GK. 2013. Diverse
specificity and effector function among human antibodies to HIV-1 envelope glycoprotein
epitopes exposed by CD4 binding. Proc Natl Acad Sci U S A 110:E69-78.
52. Richard J, Nguyen DN, Tolbert WD, Gasser R, Ding S, Vezina D, Yu Gong S, Prevost J,
Gendron-Lepage G, Medjahed H, Gottumukkala S, Finzi A, Pazgier M. 2021. Across
Functional Boundaries: Making Nonneutralizing Antibodies To Neutralize HIV-1 and
Mediate Fc-Mediated Effector Killing of Infected Cells. mBio 12:e0140521.
53. Forthal D, Hope TJ, Alter G. 2013. New paradigms for functional HIV-specific
nonneutralizing antibodies. Curr Opin HIV AIDS 8:393-401.
54. Mayr LM, Su B, Moog C. 2017. Non-Neutralizing Antibodies Directed against HIV and
Their Functions. Front Immunol 8:1590.
55. Horwitz JA, Bar-On Y, Lu CL, Fera D, Lockhart AAK, Lorenzi JCC, Nogueira L, Golijanin
J, Scheid JF, Seaman MS, Gazumyan A, Zolla-Pazner S, Nussenzweig MC. 2017.
Non-neutralizing Antibodies Alter the Course of HIV-1 Infection In Vivo. Cell 170:637-648.e10.
56. Mielke D, Bandawe G, Zheng J, Jones J, Abrahams MR, Bekker V, Ochsenbauer C,
Garrett N, Abdool Karim S, Moore PL, Morris L, Montefiori D, Anthony C, Ferrari G,
Williamson C. 2021. ADCC-mediating non-neutralizing antibodies can exert immune
pressure in early HIV-1 infection. PLoS Pathog 17:e1010046.
57. Santra S, Tomaras GD, Warrier R, Nicely NI, Liao HX, Pollara J, Liu P, Alam SM, Zhang
R, Cocklin SL, Shen X, Duffy R, Xia SM, Schutte RJ, Pemble Iv CW, Dennison SM, Li H,
Chao A, Vidnovic K, Evans A, Klein K, Kumar A, Robinson J, Landucci G, Forthal DN,
Montefiori DC, Kaewkungwal J, Nitayaphan S, Pitisuttithum P, Rerks-Ngarm S, Robb ML,
Michael NL, Kim JH, Soderberg KA, Giorgi EE, Blair L, Korber BT, Moog C, Shattock RJ,
Letvin NL, Schmitz JE, Moody MA, Gao F, Ferrari G, Shaw GM, Haynes BF. 2015. Human
Non-neutralizing HIV-1 Envelope Monoclonal Antibodies Limit the Number of Founder
Viruses during SHIV Mucosal Infection in Rhesus Macaques. PLoS Pathog 11:e1005042.
58. Haim H, Salas I, Sodroski J. 2013. Proteolytic processing of the human immunodeficiency
virus envelope glycoprotein precursor decreases conformational flexibility. J Virol
87:1884-9.
59. Johnson J, Zhai Y, Salimi H, Espy N, Eichelberger N, DeLeon O, O'Malley Y, Courter J,
Smith AB, 3rd, Madani N, Sodroski J, Haim H. 2017. Induction of a Tier-1-Like Phenotype
in Diverse Tier-2 Isolates by Agents That Guide HIV-1 Env to Perturbation-Sensitive,
Nonnative States. J Virol 91.
60. Lee JH, Andrabi R, Su CY, Yasmeen A, Julien JP, Kong L, Wu NC, McBride R, Sok D,
Pauthner M, Cottrell CA, Nieusma T, Blattner C, Paulson JC, Klasse PJ, Wilson IA, Burton
DR, Ward AB. 2017. A Broadly Neutralizing Antibody Targets the Dynamic HIV Envelope
Trimer Apex via a Long, Rigidified, and Anionic beta-Hairpin Structure. Immunity
46:690-702.
61. Cale EM, Driscoll JI, Lee M, Gorman J, Zhou T, Lu M, Geng H, Lai YT, Chuang GY,
Doria-Rose NA, Mothes W, Kwong PD, Mascola JR. 2022. Antigenic analysis of the HIV-1
envelope trimer implies small differences between structural states 1 and 2. J Biol Chem
298:101819.
62. Munro JB, Gorman J, Ma X, Zhou Z, Arthos J, Burton DR, Koff WC, Courter JR, Smith
AB, 3rd, Kwong PD, Blanchard SC, Mothes W. 2014. Conformational dynamics of single
HIV-1 envelope trimers on the surface of native virions. Science 346:759-63.
63. Agrawal N, Leaman DP, Rowcliffe E, Kinkead H, Nohria R, Akagi J, Bauer K, Du SX,
Whalen RG, Burton DR, Zwick MB. 2011. Functional stability of unliganded envelope
glycoprotein spikes among isolates of human immunodeficiency virus type 1 (HIV-1).
PLoS One 6:e21339.
64. Gift SK, Leaman DP, Zhang L, Kim AS, Zwick MB. 2017. Functional Stability of HIV-1
Envelope Trimer Affects Accessibility to Broadly Neutralizing Antibodies at Its Apex. J Virol
91.
65. Veillette M, Coutu M, Richard J, Batraville LA, Dagher O, Bernard N, Tremblay C,
Kaufmann DE, Roger M, Finzi A. 2015. The HIV-1 gp120 CD4-bound conformation is
preferentially targeted by antibody-dependent cellular cytotoxicity-mediating antibodies in
sera from HIV-1-infected individuals. J Virol 89:545-51.
66. Williams KL, Cortez V, Dingens AS, Gach JS, Rainwater S, Weis JF, Chen X, Spearman
P, Forthal DN, Overbaugh J. 2015. HIV-specific CD4-induced Antibodies Mediate Broad
and Potent Antibody-dependent Cellular Cytotoxicity Activity and Are Commonly Detected
in Plasma From HIV-infected humans. EBioMedicine 2:1464-77.
67. Kolchinsky P, Kiprilov E, Sodroski J. 2001. Increased neutralization sensitivity of
CD4-independent human immunodeficiency virus variants. J Virol 75:2041-50.
68. Salimi H, Johnson J, Flores MG, Zhang MS, O'Malley Y, Houtman JC, Schlievert PM,
Haim H. 2020. The lipid membrane of HIV-1 stabilizes the viral envelope glycoproteins
and modulates their sensitivity to antibody neutralization. J Biol Chem 295:348-362.
69. Zhang Z, Wang Q, Nguyen HT, Chen HC, Chiu TJ, Smith AB, 3rd, Sodroski JG. 2023.
Alterations in gp120 glycans or the gp41 fusion peptide-proximal region modulate the
stability of the human immunodeficiency virus (HIV-1) envelope glycoprotein pretriggered
conformation. J Virol 97:e0059223.
70. Haim H, Si Z, Madani N, Wang L, Courter JR, Princiotto A, Kassa A, DeGrace M,
McGee-Estrada K, Mefford M, Gabuzda D, Smith AB, 3rd, Sodroski J. 2009. Soluble CD4 and
CD4-mimetic compounds inhibit HIV-1 infection by induction of a short-lived activated
state. PLoS Pathog 5:e1000360.
71. Currenti J, Chopra A, John M, Leary S, McKinnon E, Alves E, Pilkinton M, Smith R, Barnett
L, McDonnell WJ, Lucas M, Noel F, Mallal S, Conrad JA, Kalams SA, Gaudieri S. 2019.
Deep sequence analysis of HIV adaptation following vertical transmission reveals the
impact of immune pressure on the evolution of HIV. PLoS Pathog 15:e1008177.
72. McBrien JB, Kumar NA, Silvestri G. 2018. Mechanisms of CD8(+) T cell-mediated
suppression of HIV/SIV replication. Eur J Immunol 48:898-914.
73. Fritschi CJ, Anang S, Gong Z, Mohammadi M, Richard J, Bourassa C, Severino KT,
Richter H, Yang D, Chen HC, Chiu TJ, Seaman MS, Madani N, Abrams C, Finzi A,
Hendrickson WA, Sodroski JG, Smith AB, 3rd. 2023. Indoline CD4-mimetic compounds
mediate potent and broad HIV-1 inhibition and sensitization to antibody-dependent cellular
cytotoxicity. Proc Natl Acad Sci U S A 120:e2222073120.
74. Matsumoto K, Kuwata T, Tolbert WD, Richard J, Ding S, Prevost J, Takahama S, Judicate
GP, Ueno T, Nakata H, Kobayakawa T, Tsuji K, Tamamura H, Smith AB, 3rd, Pazgier M,
Finzi A, Matsushita S. 2023. Characterization of a Novel CD4 Mimetic Compound
YIR-821 against HIV-1 Clinical Isolates. J Virol 97:e0163822.
75. Beerenwinkel N, Daumer M, Oette M, Korn K, Hoffmann D, Kaiser R, Lengauer T, Selbig
J, Walter H. 2003. Geno2pheno: Estimating phenotypic drug resistance from HIV-1
genotypes. Nucleic Acids Res 31:3850-5.
76. Eshleman SH, Hackett J, Jr., Swanson P, Cunningham SP, Drews B, Brennan C, Devare
SG, Zekeng L, Kaptue L, Marlowe N. 2004. Performance of the Celera Diagnostics
ViroSeq HIV-1 Genotyping System for sequence-based analysis of diverse human
immunodeficiency virus type 1 strains. J Clin Microbiol 42:2711-7.
77. Liu TF, Shafer RW. 2006. Web resources for HIV type 1 genotypic-resistance test
interpretation. Clin Infect Dis 42:1608-18.
78. Sanchez V, Masia M, Robledano C, Padilla S, Ramos JM, Gutierrez F. 2010. Performance
of genotypic algorithms for predicting HIV-1 tropism measured against the
enhanced-sensitivity Trofile coreceptor tropism assay. J Clin Microbiol 48:4135-9.
79. Rawi R, Mall R, Shen CH, Farney SK, Shiakolas A, Zhou J, Bensmail H, Chun TW,
Doria-Rose NA, Lynch RM, Mascola JR, Kwong PD, Chuang GY. 2019. Accurate Prediction for
Antibody Resistance of Clinical HIV-1 Isolates. Sci Rep 9:14696.
80. Magaret CA, Benkeser DC, Williamson BD, Borate BR, Carpp LN, Georgiev IS, Setliff I,
Dingens AS, Simon N, Carone M, Simpkins C, Montefiori D, Alter G, Yu WH, Juraska M,
Edlefsen PT, Karuna S, Mgodi NM, Edugupanti S, Gilbert PB. 2019. Prediction of VRC01
neutralization sensitivity by HIV-1 gp160 sequence features. PLoS Comput Biol
15:e1006952.
81. Williamson BD, Magaret CA, Karuna S, Carpp LN, Gelderblom HC, Huang Y, Benkeser
D, Gilbert PB. 2023. Application of the SLAPNAP statistical learning tool to broadly
neutralizing antibody HIV prevention research. iScience 26:107595.
82. Kantor R. 2024. Overview of HIV-1 drug resistance testing assays, on UpToDate.
https://www.uptodate.com/contents/overview-of-hiv-1-drug-resistance-testing-assays#H1277302675.
Accessed Oct 18, 2025.
83. Simen BB, Simons JF, Hullsiek KH, Novak RM, Macarthur RD, Baxter JD, Huang C,
Lubeski C, Turenchalk GS, Braverman MS, Desany B, Rothberg JM, Egholm M, Kozal
MJ, Terry Beirn Community Programs for Clinical Research on A. 2009. Low-abundance
drug-resistant viral variants in chronically HIV-infected, antiretroviral treatment-naive
patients significantly impact treatment outcomes. J Infect Dis 199:693-701.
84. Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7:
improvements in performance and usability. Mol Biol Evol 30:772-80.
85. Siepel AC, Halpern AL, Macken C, Korber BT. 1995. A computer program designed to
screen rapidly for HIV type 1 intersubtype recombinant sequences. AIDS Res Hum
Retroviruses 11:1413-6.
86. Kuiken C, Korber B, Shafer RW. 2003. HIV sequence databases. AIDS Rev 5:52-61.
87. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M,
Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M,
Perrot M, Duchesnay E. 2011. Scikit-learn: Machine Learning in Python. Journal of
Machine Learning Research 12:2825-2830.
88. Helseth E, Kowalski M, Gabuzda D, Olshevsky U, Haseltine W, Sodroski J. 1990. Rapid
complementation assays measuring replicative potential of human immunodeficiency
virus type 1 envelope glycoprotein mutants. J Virol 64:2416-20.
89. Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 -- approximately maximum-likelihood
trees for large alignments. PLoS One 5:e9490.
90. Galaxy C. 2024. The Galaxy platform for accessible, reproducible, and collaborative data
analyses: 2024 update. Nucleic Acids Res 52:W83-W94.
91. Prosperi MC, Ciccozzi M, Fanti I, Saladini F, Pecorari M, Borghi V, Di Giambenedetto S,
Bruzzone B, Capetti A, Vivarelli A, Rusconi S, Re MC, Gismondo MR, Sighinolfi L, Gray
RR, Salemi M, Zazzi M, De Luca A, group Ac. 2011. A novel methodology for large-scale
phylogeny partition. Nat Commun 2:321.
92. Kosakovsky Pond SL, Poon AFY, Velazquez R, Weaver S, Hepler NL, Murrell B, Shank
SD, Magalis BR, Bouvier D, Nekrutenko A, Wisotsky S, Spielman SJ, Frost SDW, Muse
SV. 2020. HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using
Phylogenies. Mol Biol Evol 37:295-299.
93. Haim H, Salas I, McGee K, Eichelberger N, Winter E, Pacheco B, Sodroski J. 2013.
Modeling virus- and antibody-specific factors to predict human immunodeficiency virus
neutralization efficiency. Cell Host Microbe 14:547-58.
94. Trkola A, Purtscher M, Muster T, Ballaun C, Buchacher A, Sullivan N, Srinivasan K,
Sodroski J, Moore JP, Katinger H. 1996. Human monoclonal antibody 2G12 defines a
distinctive neutralization epitope on the gp120 glycoprotein of human immunodeficiency
virus type 1. J Virol 70:1100-8.
[Figure 1 image. (A) Chemical structures of temsavir (BMS-626529) and fostemsavir (BMS-663068). (B) Env trimer with the SM3 sites 375S, 426M, 434M and 475M labeled. (C) All samples and (D) SM3-containing samples: resistance, log10(IC50), grouped by clade (A1; A1, G; AE; B; B, D; B, F1; B, G; C; D; F1; G). (E) Frequency (%) versus log10(IC50), µM, for pre-treatment (n=385) and post-treatment (n=185) samples. (F) Frequency (subjects), with the label "On-treatment emergence (days after initiation)" and subject counts of 24, 65 and 43.]
Figure 1. TMR resistance and escape in the BRIGHTE clinical trial. (A) Structure of fostemsavir and the active
metabolite TMR. (B) Cryo-electron microscopy model of the Env trimer bound to TMR (in magenta, PDB ID
8TTW). The inset shows the CD4-binding pocket with the SM3 sites labeled. (C,D) Resistance values of all 570
samples from BRIGHTE trial participants and of the subset of samples that contain the TMR-sensitive SM3 motif.
Samples are grouped by their inferred clade associations. (E) Distribution of IC50 values for samples collected
before and after fostemsavir treatment. The threshold used to define resistance (50 nM) is shown by a dotted
line. (F) TMR resistance outcomes in the 132 BRIGHTE trial participants for whom genotype and IC50 data were
available for samples collected both before and after treatment.
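The two sample classifications used in this figure, presence of the SM3 motif and resistance at the 50 nM threshold, can be sketched as follows. This is a minimal illustration and not the authors' code: the sample records are hypothetical, and treating "resistant" as IC50 strictly above the threshold is an assumption.

```python
import math

# Consensus residues of the TMR-sensitive SM3 motif (Ser375, Met426, Met434, Met475).
SM3_MOTIF = {375: "S", 426: "M", 434: "M", 475: "M"}

# Resistance threshold from the caption: 50 nM, expressed here in µM.
RESISTANCE_THRESHOLD_UM = 0.05


def has_sm3_motif(residues):
    """Return True if all four SM3 positions carry the consensus residue."""
    return all(residues.get(pos) == aa for pos, aa in SM3_MOTIF.items())


def is_resistant(ic50_um, threshold_um=RESISTANCE_THRESHOLD_UM):
    """Assumption: a sample counts as resistant when its IC50 exceeds the threshold."""
    return ic50_um > threshold_um


# Hypothetical samples: residues at the four sites and IC50 in µM.
samples = [
    {"id": "s1", "residues": {375: "S", 426: "M", 434: "M", 475: "M"}, "ic50_um": 0.002},
    {"id": "s2", "residues": {375: "N", 426: "M", 434: "M", 475: "M"}, "ic50_um": 1.5},
    {"id": "s3", "residues": {375: "S", 426: "M", 434: "M", 475: "M"}, "ic50_um": 0.4},
]

for s in samples:
    print(
        s["id"],
        "SM3" if has_sm3_motif(s["residues"]) else "non-SM3",
        "resistant" if is_resistant(s["ic50_um"]) else "sensitive",
        round(math.log10(s["ic50_um"]), 2),  # value as plotted on the log10 axis
    )
```

The hypothetical sample s3 carries the full SM3 motif yet exceeds the threshold, which is the kind of case panel D is meant to expose.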
[Figure 2 image. (A) Workflow: sequence-IC50 data from BRIGHTE (570 samples) and the Single-Env set (208 Envs) feed a GB Regressor model for high-frequency mutations (n = 14) and a probabilistic model for low-frequency mutations (n = 17); the TMR-bound structure gives a structure-guided estimate (n = 17); prior studies contribute n = 11. Together these identify 59 suspected mutations, followed by: mutation frequency in HIV-1 clade B; introduce mutations in HIV-1 AD8 Env; measure in vitro effects on fitness and TMR resistance; evaluate emergence frequency in BRIGHTE subjects that developed resistance. (B) Metric (AUC, Accuracy, F1 Score, Precision, Sensitivity, Specificity) for XGBoost, Gradient Boosting, Logistic Regression, AdaBoost, Random Forest and SVM. (C) Average AUC and variability in AUC at resistance thresholds of 0.05, 0.1, 0.5 and 1 µM. (D) Predicted versus measured log10(IC50); R2: 0.75, MSE: 0.63.]
Figure 2. Identification of mutations suspected of increasing HIV-1 resistance to TMR. (A) Our approach to
identify the mutations. The number of mutations identified by each approach is shown. (B) Performance of
different algorithms to predict TMR resistance by sequence. Amino acid sequence at the 856 positions of Env in
the 570 BRIGHTE trial samples was used as input. Average metrics for five-fold cross-validation are shown. Error
bars, standard deviation (SD). (C) Performance of the XGBoost algorithm to predict TMR resistance in the
BRIGHTE samples by Env sequence. As input, we used amino acids at the four SM3 positions, all 856 positions, or
positions within the indicated distances from TMR on the TMR-bound structure of Env. Performance was tested
using the indicated IC50 thresholds to define resistance. Average AUC values and their variability (SD) across the
five folds are shown. (D) Performance of a Gradient Boosting Regressor model to predict resistance of 778
samples from the BRIGHTE and Single-Env datasets by sequence. MSE, mean squared error. (E) The 59 Env
mutations suspected of increasing resistance to TMR identified by the four approaches shown in panel A.
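As a rough illustration of the evaluation described for panel C, the sketch below binarizes measured IC50 values at several thresholds, computes the AUC of a set of prediction scores, and summarizes per-fold AUCs as a mean and SD. It does not train any model; the IC50 values, scores and fold AUCs are hypothetical, and the use of the sample (n-1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_threshold(ic50_um, scores, thresholds_um):
    """Binarize measured IC50 at each threshold and score the same predictions."""
    return {t: roc_auc([v > t for v in ic50_um], scores) for t in thresholds_um}


def summarize_folds(fold_aucs):
    """Average AUC and its variability (sample SD) across cross-validation folds."""
    return mean(fold_aucs), stdev(fold_aucs)


# Hypothetical measured IC50 values (µM) and model scores for six samples.
ic50_um = [0.001, 0.02, 0.08, 0.3, 0.7, 2.0]
scores = [0.1, 0.2, 0.4, 0.35, 0.8, 0.9]
aucs = auc_by_threshold(ic50_um, scores, [0.05, 0.1, 0.5, 1.0])

# Hypothetical AUCs from five folds.
avg_auc, sd_auc = summarize_folds([0.90, 0.92, 0.94, 0.91, 0.93])
print(aucs, round(avg_auc, 3), round(sd_auc, 4))
```

Changing the threshold changes which samples count as positives, so the same predictions can yield different AUC values at each threshold.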
[Figure 3 image, panels A and C-F. (A) Fitness (fold wild-type) versus temsavir resistance, IC50 (fold wild-type), for single mutations; legend: emergence frequency in escape group (open symbol, not observed) and % participants requiring a 1-NT change (>90%, 50-90%, 10-50%, <10%). (C) Emergence frequency (%) versus TMR resistance (fold-WT) and fitness (fold-WT). (D) Number of subjects versus number of REMs (0 to 4). (E) Emergence frequency (%) of two-mutation combinations, with the number of REMs (0, 1 or 2) indicated: 375N 426L; 426L 434I; 424V 426L; 113E 375N; 113E 426L; 375N 475I; 426L 475I; 375N 424V; 375N 434I; 113E 424V; 375M 426L; 113E 434I; 426L 429R; 113E 375M; 113E 475I; 255I 426L; 423F 426L; 424R 426L; 424V 434I. (F) Fitness (fold wild-type) versus temsavir resistance, IC50 (fold wild-type), for two-mutation combinations; legend: emergence frequency in escape group (%) and NT changes from Anc_B (2 or 3).]
Figure 3. Mutations that increase resistance to TMR are not represented equally in individuals that develop
resistance. (A) The 59 mutations suspected of increasing resistance to TMR were introduced in the AD8 Env and
tested in a pseudovirus system for their fitness (infectivity normalized for virus particle count) and resistance to
TMR. Datapoint size corresponds to emergence frequency in the escape group. Color corresponds to the
percentage of BRIGHTE subjects that required one nucleotide (NT) change to acquire the mutation. (B) Emergence
frequency of the mutations in the escape group. Significance of mutation enrichment in the post-treatment
samples, as determined in a permutation test, is indicated. (C) Relationship between frequency of the 18
mutations that increase resistance by more than 3.5-fold (designated REMs) and their effects on TMR resistance
or relative fitness. (D) Number of REMs that emerged in each of the 65 escape group subjects. (E) Emergence
frequency of mutation combinations in the escape group. (F) Fitness versus TMR resistance of two-mutation
combinations. Datapoint size describes their frequency in the escape group, and color describes the number of
NT changes required to acquire the mutation from the clade B ancestral form.
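A permutation test for enrichment of a mutation in post-treatment samples, as mentioned for panel B, could look like the sketch below. The caption does not specify the test statistic or design, so this version is an assumption: it treats pre- and post-treatment samples as two unpaired groups, uses the difference in mutation frequency as the statistic, and reports a one-sided p-value. All data are hypothetical.

```python
import random


def permutation_enrichment_p(pre, post, n_permutations=10000, seed=0):
    """One-sided p-value for a higher mutation frequency in post-treatment samples.

    pre and post are lists of 0/1 flags (1 = mutation present in the sample).
    """
    rng = random.Random(seed)
    observed = sum(post) / len(post) - sum(pre) / len(pre)
    pooled = list(pre) + list(post)
    n_post = len(post)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_post = pooled[:n_post]
        perm_pre = pooled[n_post:]
        diff = sum(perm_post) / n_post - sum(perm_pre) / len(perm_pre)
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical counts: mutation in 5 of 100 pre-treatment and 30 of 65 post-treatment samples.
pre_flags = [1] * 5 + [0] * 95
post_flags = [1] * 30 + [0] * 35
p_enriched = permutation_enrichment_p(pre_flags, post_flags)

# Hypothetical control: identical frequency in both groups.
p_null = permutation_enrichment_p([0, 1] * 20, [0, 1] * 20)
print(p_enriched, p_null)
```

With 10,000 permutations the smallest reportable p-value is about 0.0001, so resolving thresholds such as 0.0005 requires at least this many permutations.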
[Figure 3 image, panel B. Emergence frequency in escape group (%) for 426L, 375N, 113E, 475I, 434I, 424V, 375M, 429R, 424R, 423F, 434T, 255I, 375H, 373T, 202K, 113A, 255A, 426I, 202E, 375I, 423S, 434K and 595F, with a 3.5-fold cutoff for fold-increase in IC50. Enrichment in post-treatment samples, P-value: *, <0.05; ***, <0.0005.]
[Figure 4 image. (A,B) AA preference for each amino acid (A C D E F G H I K L M N P Q R S T V W Y) at position 426 and position 375, with no inhibitor and with TMR 250 nM. (C) AA preference with TMR 250 nM (replicative HIV-1) versus TMR IC50 (nM), pseudovirus system (values up to >25,000); points labeled F, H, I, M, N, S, T, W, Y. (D) Emergence frequency in BRIGHTE escape group (%) versus frequency in clade B (%); points labeled N, M, H, I, W, Y, F. (E,F) Avg. emergence rate in clade B (%) for each amino acid at position 426 (M). (G) Emergence frequency in BRIGHTE escape group (%) versus avg. emergence rate in clade B (%); rS = 0.82, P = 0.00004.]
Figure 4. The frequency of REM emergence in the escape group corresponds with their spontaneous
emergence frequency in the population. (A,B) Saturation mutagenesis to determine the effects of all amino
acid changes at positions 426 and 375 on HIV-1 fitness ("No inhibitor") and resistance to TMR (250 nM). The
wild-type form is shown in maroon. (C) Relationship between the preference for amino acids at position 375 in
the presence of TMR and their IC50 values measured using the pseudovirus system. (D) Relationship between
the emergence frequency of amino acids at position 375 in the escape group and their frequency in a panel of
2,535 clade B Envs from fostemsavir-untreated individuals. (E,F) Example of the emergence rate of mutations at
position 426 in HIV-1 clade B. The tree was constructed using amino acid sequences of the 2,535 Envs. Branches
are colored by the amino acid in each taxon at position 426. The tree was partitioned into subgroups, which
were excluded if the dominant form at that position differed from the clade B consensus. For all remaining
subgroups, the number of new substitution events at position 426 to each amino acid was calculated, and these
values were averaged. Error bars, standard errors of the mean. (G) Relationship between emergence frequency
of the 18 REMs in clade B and their emergence frequency in the BRIGHTE escape group.
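The rS value reported in panel G is a Spearman rank correlation. A minimal pure-Python sketch of that statistic is shown below, computed as the Pearson correlation of average ranks so that ties are handled; the input values are hypothetical and are not taken from the figure.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rS: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical emergence rates in clade B (%) and in the escape group (%).
clade_b_rate = [0.01, 0.05, 0.2, 0.8, 2.0]
escape_freq = [0.0, 1.5, 3.0, 20.0, 60.0]
print(spearman(clade_b_rate, escape_freq))
```

Because only ranks are used, the statistic is unaffected by the log-scaled axes on which such data are typically plotted.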
[Figure 5 image. X-axis categories in panels A, D and F: WT, 426L, 375N, 475I, 375M, 429R, 424R, 434T, 375H, 202E, 375I, 423S, 434K, 116P, 116Q, 204D, 255M, 375Y, 475L. (A) Resistance to HIV+ plasma (1/dilution) for Plasma 1295 and Plasma 1641, with emergence frequency in the BRIGHTE escape group. (B) Top and side views of the Env trimer with epitopes labeled: CoR-BS, CD4-BS (cryptic), CD4-BS (exposed), V3 loop, Apex, Interface, MPER. (C) Normalized antibody binding (fold wild-type) for 375N, 426L, 434K and 116P across the epitopes CoR-BS (cryptic), CD4-BS (cryptic), CD4-BS (exposed), V3 loop (cryptic), Apex, Interface and MPER. (D) Normalized 17b binding (fold WT). (E) Relative CD4 independence (%) versus 17b binding (fold-WT); labeled points include 116P, 202E, 423S, 434K and 204D. (F) Resistance to 37°C (IT50, hours). Legend: frequency in escape group, high or low.]
Figure 5. REMs that are poorly sampled in the BRIGHTE escape group exhibit an open conformation of Env
that is sensitive to non-neutralizing antibodies. (A) Sensitivity of the 18 REMs to plasma from two HIV-infected
individuals. Their emergence rates in the escape group are shown as magenta bars. (B) Cryo-EM structure of the
Env trimer ectodomain in the CD4-unliganded state (PDB IDs 6U59 and 6UJV). Residues associated with binding
of the antibodies we tested are colored. (C) Envs containing the indicated REMs were expressed on HOS cells,
and binding of monoclonal antibodies was measured by cell-based ELISA. Values were normalized for cell
surface expression of the Envs using antibody 2G12, and are expressed as a fraction of their binding to the
wild-type AD8 Env. (D) Binding efficiency of the 18 REMs to antibody 17b, which targets a cryptic epitope
overlapping the CoR-BS. (E) Relationship between binding of the REMs to 17b and their infection of CD4-negative
cells, expressed as a percent of their measured infection of CD4-positive cells. (F) Resistance of the variants to
incubation at 37°C. Values indicate the time until a 50% decrease in infectivity is detected.
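
The normalization described for panel C can be sketched as follows. This is one reading of the caption, not the authors' analysis code: the function name, its arguments, and the example readings are hypothetical, and any background subtraction applied to the raw ELISA signals is not stated in the caption and is therefore left out.

```python
def normalized_binding(mab_signal, expr_signal, wt_mab_signal, wt_expr_signal):
    """Return antibody binding as fold wild-type, corrected for surface expression.

    mab_signal      -- ELISA signal of the test antibody on the mutant Env
    expr_signal     -- signal of the expression-control antibody (2G12 in the caption)
                       on the same mutant Env
    wt_mab_signal   -- test-antibody signal on the wild-type Env
    wt_expr_signal  -- expression-control signal on the wild-type Env
    All four inputs are hypothetical raw readings in the same units.
    """
    if expr_signal <= 0 or wt_expr_signal <= 0 or wt_mab_signal <= 0:
        raise ValueError("reference signals must be positive")
    # Step 1: divide by the expression control to correct for surface expression.
    mutant_ratio = mab_signal / expr_signal
    wt_ratio = wt_mab_signal / wt_expr_signal
    # Step 2: express the corrected mutant value relative to wild-type.
    return mutant_ratio / wt_ratio


# Hypothetical readings: the mutant binds the test antibody twice as well as
# wild-type once both are corrected for expression.
example = normalized_binding(3.0, 1.5, 2.0, 2.0)
```

A value above 1 then indicates more binding per unit of surface-expressed Env than the wild-type AD8 Env, which is how the fold-WT axes of panels C and D are read.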
[Figure 6B. Log10(P-value) for Spearman correlations between the features in panel A. ns, not significant.
Columns: (1) Frequency in escape group (%); (2) Emergence rate in Clade B; (3) Plasma resistance; (4) CoR-BS exposure; (5) CD4 independence; (6) Fitness (infectivity); (7) Stability at 37°C.]

                                 (1)     (2)     (3)     (4)     (5)     (6)     (7)
Increase in TMR IC50 (fold WT)   ns      ns      ns      ns      ns      ns      ns
Stability at 37°C                -1.8    -2.0    -3.5    -2.1    < -4    -2.3
Fitness (infectivity)            ns      ns      -1.8    -1.7    -4
CD4 independence                 ns      -1.8    -3.5    -2.3
CoR-BS exposure                  ns      ns      < -4
Plasma resistance                ns      -2.3
Emergence rate in Clade B        < -4
[Figure 6A. Features measured for the 18 REMs.
Columns: (1) Frequency in escape group (%); (2) Emergence rate in Clade B; (3) Plasma resistance; (4) CoR-BS exposure; (5) CD4 independence; (6) Fitness (infectivity); (7) Stability at 37°C; (8) Increase in TMR IC50 (fold WT); (9) Substitution likelihood from con_B; (10) Avg. substitution likelihood in BRIGHTE.]

REM     (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)      (9)     (10)
426L    74.2    10.3    >6.3    0.95    0.03    0.68    7.7     48.2     0.24    0.22
375N    42.6    3.3     >6.3    0.48    0.01    2.1     12.3    3.7      0.32    0.27
475I    20      1.1     >6.3    0.89    0.01    0.63    11.4    11.8     0.35    0.33
375M    9.5     0.65    >6.3    0.41    0.01    0.80    7.2     82.2     0.007   0.008
429R    6.3     4.9     >6.3    1.6     0.003   1.4     9.9     4.9      0.06    0.12
424R    6.2     0.05    >6.3    0.48    0.08    0.68    8.3     227.9    0.04    0.04
434T    4.8     0.6     4.9     0.91    0.02    1.1     5.6     11.8     0.39    0.37
375H    4.6     0.8     >6.3    0.65    0.03    1.8     5.7     129.3    0.02    0.03
202E    1.5     0       0.09    4.6     4.3     0.33    0.85    3.5      0.05    0.05
375I    1.5     1.8     >6.3    0.49    0.01    0.83    5.5     30       0.13    0.16
423S    1.5     0       0.08    3.3     7.4     0.07    1.05    10       0.13    0.15
434K    1.5     0.09    0.07    5.1     18.7    0.09    0.97    53.2     0.07    0.08
116P    0       0       0.25    4.7     0.09    0.72    2.5     11535    0.35    0.34
116Q    0       0       1.3     0.85    0.04    0.83    3.8     9.5      0.07    0.06
204D    0       0       0.06    6.36    15.3    0.01    3.3     54.8     0.17    0.17
255M    0       0.04    >6.3    0.77    0.24    0.17    2.4     48.3     0.07    0.13
375Y    0       0       >6.3    0.57    0.00    1.8     10.3    134.8    0.05    0.05
475L    0       0.05    >6.3    0.58    0.03    0.49    6.4     14.9     0.26    0.25
[Figure 6 graphic, panels C-F. Panels C and D: Combined Fitness plotted against Emergence in clade B (%) and Emergence in BRIGHTE (%); points are labeled by REM and colored by Subst. likelihood. Correlations shown on the plots: rS = 0.62, P = 0.006; rS = 0.54, P = 0.02. Panel B color scale: Log10(P-value), with bins ns, < -1.3, < -2, < -3 and < -4. Panels E and F: Average AUC (axis range 0.5 to 0.85) for the variable combinations below.]

Panel E models            1   2   3   4   5   6   7   8
Pre-treatment sequence    -   +   -   -   +   -   +   +
Subst. likelihood         -   -   +   -   +   +   -   +
Combined Fitness          -   -   -   +   -   +   +   +

Panel F models            1   2   3   4
# of nucl. subst.         -   +   +   +
Ti&Tv likelihoods         -   -   +   +
Combined Fitness          -   -   -   +
Figure 6. Virus and host factors that guide emergence of the REMs. (A) Summary of all features measured for
the 18 REMs. Substitution likelihoods from the clade B consensus (con_B) trinucleotide sequence describe the
number of nucleotide changes required and the transition/transversion rate for each change. This likelihood was
also calculated using the pre-treatment nucleotide sequences of the 65 escape group subjects. (B) P-values for a
Spearman correlation test that compares the values shown in panel A. ns, not significant. (C,D) The combined
fitness metric for each REM was calculated as the product of its effects on fitness, resistance to plasma,
and stability at 37°C. This value was compared with REM emergence frequencies in clade B or in the escape
group. Data points are colored by the substitution likelihood based on the con_B sequence or the subjects'
pre-treatment sequences, respectively. (E) Simulations of REM emergence were performed with different
combinations of the indicated variables. The fraction of success events (mutation acquisition) in the 1000
iterations was compared with REM frequency in the 65 subjects (see algorithm in Fig S15C). Data for all 18
REMs were compiled to calculate the AUC. Error bars, SDs for 10 simulations. (F) Contribution of number
and type of nucleotide substitutions to predict emergence of the five REMs at position 375 (H, I, M, N and Y).
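
The combined fitness metric of panels C and D can be illustrated with a short sketch. The values are those listed in panel A; everything else is an assumption made for illustration: the function names are hypothetical, plasma resistance values reported as ">6.3" are entered as 6.3, and the rank correlation is a plain tie-aware Spearman implementation. Because the handling of censored values in the original analysis is not described in the caption, the coefficient computed here need not match the rS values shown in the figure.

```python
# Panel A values for the 18 REMs, in the order they appear in the figure.
REMS = ("426L 375N 475I 375M 429R 424R 434T 375H 202E "
        "375I 423S 434K 116P 116Q 204D 255M 375Y 475L").split()
FITNESS = [0.68, 2.1, 0.63, 0.80, 1.4, 0.68, 1.1, 1.8, 0.33,
           0.83, 0.07, 0.09, 0.72, 0.83, 0.01, 0.17, 1.8, 0.49]
# Assumption: entries reported as ">6.3" are capped at 6.3.
PLASMA = [6.3, 6.3, 6.3, 6.3, 6.3, 6.3, 4.9, 6.3, 0.09,
          6.3, 0.08, 0.07, 0.25, 1.3, 0.06, 6.3, 6.3, 6.3]
STABILITY = [7.7, 12.3, 11.4, 7.2, 9.9, 8.3, 5.6, 5.7, 0.85,
             5.5, 1.05, 0.97, 2.5, 3.8, 3.3, 2.4, 10.3, 6.4]
ESCAPE_FREQ = [74.2, 42.6, 20, 9.5, 6.3, 6.2, 4.8, 4.6, 1.5,
               1.5, 1.5, 1.5, 0, 0, 0, 0, 0, 0]


def combined_fitness(fitness, plasma, stability):
    """Product of fitness, plasma resistance and 37°C stability for each REM."""
    return [f * p * s for f, p, s in zip(fitness, plasma, stability)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


combined = combined_fitness(FITNESS, PLASMA, STABILITY)
rho = spearman(combined, ESCAPE_FREQ)
for rem, value in zip(REMS, combined):
    print(f"{rem}\t{value:.3g}")
print(f"Spearman coefficient vs. escape-group frequency: {rho:.2f}")
```

Using a product means that a REM scoring near zero on any one of the three properties receives a low combined value regardless of the other two, which suits variants such as 204D that retain measurable stability but have very low infectivity and plasma resistance.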