Reusability Report: Meta-Learning for Antigen-Specific T-Cell Receptor Binder Identification | Research Square

Dong Xu, Fei He, Xianyu Wang

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7456773/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 06 May, 2026 in Nature Machine Intelligence. Version 1.

Abstract

Accurate prediction of peptide-T-cell receptor (TCR) binding is vital for immunotherapy, vaccine design, and diagnostics. PanPep, a meta-learning framework, was developed to generalize diverse TCR binder predictions. This study presents a comprehensive and unbiased evaluation of PanPep’s reusability and practical utility.
We reproduced its reported performance on original datasets and further benchmarked it against the control tools using both classification metrics and virtual screening enrichment evaluations. Leveraging a newly curated independent dataset, we have demonstrated PanPep’s superior generalization to unseen antigens with few or no known TCR binders. We further extended PanPep to peptide-TCRα and peptide-TCRαβ binding prediction, demonstrating its applicability in more biologically and physiologically relevant contexts. Despite its strengths, PanPep shows limitations in early binder enrichment and reduced robustness to novel TCRs, indicating sensitivity to training data composition and negative sampling strategies. This work establishes a reproducible and extensible benchmarking framework for general peptide-TCR binding prediction and related applications. Overall, our study suggests substantial room for improvement in TCR binder prediction, particularly concerning its practical applicability.

Introduction

T-cell receptors (TCRs) recognize peptides presented by major histocompatibility complex (MHC) molecules, initiating adaptive immune responses. Identifying peptide-specific TCR binders is crucial for immunotherapy, vaccine design, and diagnostic applications, particularly for targeting neoantigens and novel antigens 1 . Given the vast combinatorial diversity of peptides and TCRs, exhaustive experimental screening is impractical. Computational prediction offers a cost-effective, scalable, and time-efficient alternative, enabling high-throughput in silico screening to bridge the gap between sequence diversity and experimental limitations 2 .
Early computational approaches predicted antigen-TCR interactions by grouping TCRs based on sequence similarity, with tools such as GLIPH 3 , GIANA 4 , and DeepTCR 5 . While effective in narrow contexts, these methods were often restricted to specific antigen types and human leukocyte antigen (HLA) alleles, with limited generalizability 6 . More recent AI-based tools reformulated the task as a sequence modeling problem, using tokenized amino acids and attention-based architectures to infer peptide-TCR interactions. Representative models include DlpTCR 7 , ERGO 8 , pMTnet 9 , TEIM 10 , and PanPep 11 . Other recent models, such as PISTE 12 , UnifyImmun 13 , and deepAntigen 6 , have expanded the scope to HLA-antigen-TCR multimers. Despite this progress, predicting peptide-TCR binding remains challenging due to the extreme diversity of antigen sequences and TCR types. Models tend to overfit to peptides with abundant binders, limiting performance on rare or unseen antigens. To address this, Gao et al. proposed Pan-Peptide Meta-Learning (PanPep) 11 , a meta-learning framework designed to enhance generalization by training a meta-learner to capture shared features across diverse peptide-TCR interactions ( Fig. 1a ). This meta-learner serves as an adaptable base model for specific peptides through majority learning (more than 100 known binders) or few-shot learning (5 to 100 labeled binders). PanPep employs a Neural Turing Machine to support zero-shot learning and evaluates it using peptides with fewer than 5 annotated binders. This approach produces a tailored predictor for each antigen, offering greater adaptability to biological diversity than a single general model applied across all antigens. While PanPep marks a significant advancement, its real-world applicability has not been sufficiently evaluated. First, its classification-based assessments rely on negative sampling strategies that can introduce evaluation bias.
In this field, two main negative sampling strategies have been adopted ( Fig. 1c ): reshuffling known positive pairs (hereafter “reshuffling”) and randomly drawing from a background repertoire (hereafter “background-drawing”) 14 . To avoid the false negatives that reshuffling can introduce through cross-reactive TCRs, PanPep chose background-drawing. However, this approach may overestimate performance, because antigen-irrelevant negatives drawn from the background may be too easy to distinguish from true binders 15 . Second, many tools in this domain are optimized for narrow settings, such as known peptides or common MHC alleles, and their performance often declines on novel peptides or unfamiliar TCR repertoires. Third, PanPep models only the TCRβ chain, omitting the TCRα chain due to limited data availability, which may reduce prediction confidence in clinical and immunological settings. Therefore, while PanPep represents meaningful progress, broader validation that identifies necessary methodological refinements is essential to establish the reliability and utility of TCR binder prediction in real-world immunological applications. This report presents a comprehensive, unbiased evaluation of peptide-TCR binding predictors, focusing on PanPep and its reported competitors. We benchmarked their performance using both classification metrics under multi-fold cross-validation and enrichment-based metrics from a virtual screening perspective, in which candidate TCRs from an entire repertoire are ranked by their predicted binding likelihood to a given peptide, thereby minimizing biases introduced by negative sampling subsets. Leveraging data accumulated since PanPep’s publication, we conducted direct comparisons on an independent dataset. Furthermore, we extended PanPep to TCRα and paired TCRαβ inputs ( Fig. 1 ).

Results

We report below the reusability testing results across five scenarios. These evaluations first examined PanPep’s inference- and training-level reproducibility using both the original dataset and a newly curated independent dataset.
We then assessed its extendibility to peptide-TCRα and peptide-TCRαβ binding recognition, applying the same source code to these new task datasets.

Inference-Level Reproducibility

Case 1: Reported Performance Reproductions with the PanPep-Provided Dataset

We assessed inference-level reproducibility by directly evaluating the originally released model weights on the provided test dataset. To assess stability, we followed PanPep’s balanced classification protocol and conducted 100-fold cross-validation under two negative sampling strategies. Across the majority (peptides with ≥100 TCRβ binders), few-shot (5-100 binders), and zero-shot (<5 binders) groups ( Figs. 2a-f, Extended Data Figs. 1a-f ), boxplots of ROC-AUC and PR-AUC closely matched PanPep’s reported performance. Notably, the results for DlpTCR and ERGO-II differed from those in PanPep’s report, because these tools were originally benchmarked on separate test datasets ( Extended Data Fig. 2 ). In those settings, PanPep underperformed in the majority group, and the relative performance trends among the tools were inconsistent across the two datasets. In our head-to-head comparison on PanPep’s own test set, however, PanPep outperformed both tools across all three groups under the background-drawing strategy ( Figs. 2a-c, Extended Data Figs. 1a-c ). Under the reshuffling strategy ( Figs. 2d-f, Extended Data Figs. 1d-f ), its predictive power declined, approaching random guessing, as previously reported 15 . To further dissect the performance gap between the two negative sampling strategies, we decomposed PanPep’s predictions from a representative fold of the 100-fold classification evaluations in both the majority and few-shot groups into confusion matrices ( Extended Data Fig. 3 ). The results revealed that the performance drop under the reshuffling strategy was primarily attributable to an inflated false-positive rate.
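The confusion-matrix decomposition described above can be illustrated with a short sketch. This is not the authors’ analysis code: it assumes the model outputs binding probabilities and that a fixed decision threshold of 0.5 is applied (a hypothetical choice), and the toy scores are invented only to show how the same positives combined with harder negatives change the false-positive rate.

```python
import numpy as np

def confusion_counts(labels, scores, threshold=0.5):
    """Return (TP, FP, TN, FN) for binary labels and predicted probabilities.

    `threshold` is an assumed cut-off; it is not specified in the text.
    """
    labels = np.asarray(labels, dtype=bool)
    preds = np.asarray(scores, dtype=float) >= threshold
    tp = int(np.sum(preds & labels))
    fp = int(np.sum(preds & ~labels))
    tn = int(np.sum(~preds & ~labels))
    fn = int(np.sum(~preds & labels))
    return tp, fp, tn, fn

def false_positive_rate(labels, scores, threshold=0.5):
    """FP / (FP + TN): the share of negatives wrongly predicted as binders."""
    _, fp, tn, _ = confusion_counts(labels, scores, threshold)
    return fp / (fp + tn) if (fp + tn) else 0.0

# Toy fold: identical positives, two kinds of negatives (illustrative numbers only).
pos_scores = [0.9, 0.8, 0.7, 0.6]
background_neg = [0.1, 0.2, 0.3, 0.2]   # easy, antigen-irrelevant negatives
reshuffled_neg = [0.8, 0.7, 0.6, 0.4]   # "hard" negatives reusing binder TCRs

labels = [1] * 4 + [0] * 4
fpr_bg = false_positive_rate(labels, pos_scores + background_neg)
fpr_rs = false_positive_rate(labels, pos_scores + reshuffled_neg)
print(f"FPR background-drawing: {fpr_bg:.2f}, FPR reshuffling: {fpr_rs:.2f}")
```

In this toy case the true-positive count is the same for both negative sets, so any drop in performance comes entirely from negatives scored above the threshold, which is the pattern the text attributes to the reshuffling strategy.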
In this setting, negatives were generated by permuting positive peptide-TCR pairs, which introduced “hard negatives” that retained strong sequence-level or contextual similarity to true binders. PanPep misclassified these samples, likely because it had memorized TCRs that appeared with positive labels during meta-learning training. By contrast, PanPep’s training relied on a background-drawing strategy that produced negatives from a vastly heterogeneous TCR library, where the probability of encountering such challenging TCRs was extremely low. As a result, adaptation did not expose the model to enough representative negatives to correct this memorization bias. This indicates that (1) PanPep’s learning is strongly driven by TCR information, and (2) current background-drawing negatives poorly approximate the true distribution of non-binding TCRs, leaving a distributional gap between the two negative sampling strategies. To extend PanPep’s analysis, we quantified performance variance under both negative sampling strategies using 100-fold cross-validation ( Figs. 2a-f, Extended Data Figs. 1a-f ). Variance in ROC-AUC and PR-AUC was notably higher in the few-shot and zero-shot groups (4.7-4.3% for ROC-AUC in Figs. 2bcef, 5.6-5.0% for PR-AUC in Extended Data Figs. 1bcef ) than in the majority group ( Figs. 2ad and Extended Data Figs. 1ad ). This instability indicates that the small, artificially balanced negative subsets used in current benchmarking protocols fail to capture the scale and diversity of real peptide-TCRβ interactions, thereby introducing evaluation bias.

Alternative Evaluation Scheme from a Virtual Screening Perspective

In practical applications, the goal of peptide-TCRβ binding recognition is to prioritize potential TCRβ binders from a repertoire for a given peptide, thereby supporting downstream vaccine or therapeutic design.
To align with this objective, we propose benchmarking peptide-TCRβ predictors with a virtual screening formulation, which reflects real-world applications more closely than either the reshuffling or the background-drawing strategy ( Methods ). This benchmarking strategy evaluates all possible pairings between a given peptide and the entire TCRβ repertoire, focusing on the early enrichment of true binding pairs at the top ranks. In the Enrichment plots comparing PanPep, DlpTCR, and ERGO-II ( Figs. 2g-i ), PanPep achieved higher Success rates at the top ranks in the majority, few-shot, and zero-shot settings. This confirms that PanPep identifies the TCRβ binders of a given peptide better than the control tools, despite its limitations in zero-shot classification with reshuffling negatives. The Boltzmann Enhanced Discrimination of Receiver Operating Characteristic (BEDROC, see Methods ) ( Figs. 2j-l ), which weights early ranks more heavily, and the Hit rates ( Extended Data Figs. 1g-i ) consistently show the same advantage over the control tools.

Case 2: Challenging PanPep’s Generalization with a New Independent Dataset

To evaluate PanPep’s generalizability, we constructed an independent test set containing both unseen peptides and novel TCRβ sequences that are not present in its original training data ( Methods ). Since this dataset was also unseen by DlpTCR and ERGO-II, we first performed a fair zero-shot comparison across all tools, including PanPep’s meta-learner and distilled versions. Enrichment plots, BEDROC scores, and Hit rates ( Figs. 3ab, Extended Data Fig. 4a ) show that PanPep-meta and PanPep-distill ranked more ground-truth TCRβ binders at the top than the two control tools, underscoring the strength of its meta-training in zero-shot scenarios. Classification metrics from 100-fold cross-validation ( Figs. 3cd, Extended Data Figs. 4bc ) further support PanPep’s superior generalization.
However, within the top 1% of the ranked TCRβ sequences (57,099,6 sequences), fewer than 10% of known TCRβ binders were recovered, suggesting that its performance remains inadequate for practical deployment. In the majority group ( Figs. 3e-h, Extended Data Figs. 4d-f ), PanPep’s task-adapted models outperformed both control tools, with significant gains in BEDROC (p < 0.05), ROC-AUC (p < 0.0001), and PR-AUC (p < 0.001). A similar trend was observed in the few-shot setting, with consistent improvements across all metrics ( Figs. 3i-l, Extended Data Figs. 4h-j ). Adapted models significantly outperformed their meta-learners (p 100 for majority vs. 3~5 for few-shot) and more training iterations (1000 for majority vs. 3 for few-shot). The superiority of task adaptation was further supported by comparing zero-shot performance on the entire dataset ( Figs. 3a-d ) with the adapted models in the majority and few-shot settings ( Figs. 3e-l ). In the zero-shot group ( Figs. 3m-p, Extended Data Figs. 4k-m ), PanPep achieved strong performance in both virtual screening and classification with background-drawing negatives. In contrast, under the reshuffling strategy ( Figs. 3dhlp ), PanPep performed close to random and was outperformed by the control tools, which were trained with reshuffling negatives. These results again highlight the distributional gap between the two negative sampling strategies and its substantial impact on model performance. To probe the source of PanPep’s generalization, we split the independent dataset into two subsets: (1) unseen peptides paired with TCRβs from PanPep’s repertoire (i.e., TCRβs present in PanPep’s training data), and (2) unseen peptides paired with novel TCRβs absent from PanPep’s repertoire (the unknown-unknown case), which is a stricter test of generalization to truly novel combinations 15 .
In this more challenging scenario, the performance of PanPep and of its meta-learner declined significantly across both classification and virtual screening metrics ( Figs. 3q-t ). This gap suggests that PanPep’s success depends heavily on the TCRβ repertoire it learned during training.

Training-Level Reproducibility

Case 3: Retrained PanPep and Evaluated on the Independent Dataset

We tested training-level reproducibility by retraining PanPep’s model weights using 10-fold cross-validation on peptides from its original training set, and then evaluating whether the retrained models consistently achieved performance similar to the original. This process yielded ten independently trained PanPep models, each evaluated on the same independent testing dataset ( Methods ). To ensure a fair comparison, we first assessed these models in zero-shot settings alongside the two control tools. In the virtual screening evaluation ( Figs. 4ab, Extended Data Fig. 5a ), PanPep maintained superior performance across metrics, including early Success rates, Hit rates, and BEDROC scores. Classification results ( Figs. 4cd, Extended Data Figs. 5bc ) showed that all ten reproduced models consistently outperformed DlpTCR and ERGO-II, with 5.6-21.4% improvements in ROC-AUC and PR-AUC. Although the performance of the original PanPep model fell within the range of the ten reproduced models, we observed considerable variance (1.4-21.6%) across them. This indicates that although PanPep is reproducible, its performance is sensitive to training data splits. Furthermore, across the ten reproductions, PanPep-meta consistently outperformed PanPep-distill, suggesting that PanPep’s distillation process may require further refinement to fully realize its potential in reproducible applications. The ten reproduced PanPep models consistently outperformed the control tools in Success rates, Hit rates, and BEDROC scores across the majority, few-shot, and zero-shot groups ( Figs. 4eimfjn, Extended Data Figs. 5dgj ).
However, their classification performance ( Figs. 4gkohlp, Extended Data Figs. 5efhikl ) declined under the reshuffling negative strategy, particularly in the zero-shot setting. This suggests that PanPep’s reliance on background-drawing negatives during training may reduce its robustness to more challenging negative samples. Task adaptation, especially in the majority and few-shot settings, could improve performance ( Figs. 4gkhl, Extended Data Figs. 5efhi ).

Reusability in Peptide-TCRα Binding Recognition

Case 4: Extended PanPep to Peptide-TCRα Binding

The TCRα chain aids peptide-MHC recognition but is insufficient for strong binding 16 . Because public TCRα data are limited, peptide-TCRα binding is a small-data challenge well suited to testing PanPep’s extendibility. We derived a PanPep-TCRα dataset from the DlpTCR and ERGO-II studies ( Methods ), excluding peptides with fewer than three binders to meet PanPep’s meta-training requirements. We conducted 10-fold cross-validation with varied sampling for PanPep, benchmarking only against DlpTCR, because ERGO-II does not support TCRα-only prediction. In the majority group ( Figs. 5a-d, Extended Data Figs. 6a-c ), PanPep’s task-adapted models were competitive with DlpTCR in Enrichment plots, Hit rates, and BEDROC scores. ROC-AUC and PR-AUC from balanced classification evaluations showed similar trends. Notably, under the reshuffling negative strategy, PanPep retained predictive power, whereas DlpTCR struggled. In the few-shot group ( Figs. 5e-h, Extended Data Figs. 6d-f ), PanPep showed a limited advantage, with only 4 of 10 adapted models outperforming DlpTCR in BEDROC. In the zero-shot setting, both models achieved comparable enrichment, though PanPep lagged in classification metrics ( Figs. 5i-l, Extended Data Figs. 6h-j ). This may be due to the smaller size and lower diversity of PanPep’s training data (156 peptides with fewer than three TCRα binders were excluded to satisfy PanPep’s meta-training requirements).
Additionally, performance variance among the 10 models was high (2.7%-36% in Fig. 5 and Extended Data Fig. 6 ), suggesting that data scarcity also limits model robustness in peptide-TCRα prediction.

Reusability in Peptide-TCRαβ Binding Recognition

Case 5: Extended PanPep to Peptide-TCRαβ Binding

The CDR3α and CDR3β loops together form the functional interface for peptide recognition and stable binding to peptide-MHC (pMHC) complexes 16 . However, most public datasets provide only TCRβ sequences because the β chain is easier to sequence, which complicates peptide-TCRαβ binding prediction. A practical workaround is to combine separate peptide-TCRα and peptide-TCRβ predictors to infer peptide-TCRαβ interactions 8 . To test PanPep’s reusability in a realistic biological context, we paired the reproduced 10-fold PanPep models trained on TCRα and TCRβ data, creating ten predictors for peptide-TCRαβ binding ( Methods ). Following the DlpTCR and ERGO-II protocols, we applied these models to a peptide-TCRαβ test set derived from their studies in a zero-shot setting. The Enrichment plots, BEDROC scores, and Hit rates ( Fig. 6ab, Extended Data Fig. 7a ) show that PanPep’s 10-fold models consistently outperformed DlpTCR and ERGO-II, further supporting PanPep’s extendibility and reusability. Notably, PanPep-meta again outperformed the PanPep-distill models (p < 0.0001 in Fig. 6b ). This trend was also reflected in the 10-fold classification evaluations (p < 0.001 in Figs. 6cd and Extended Data Figs. 7bc ). We further applied task adaptation to the majority and few-shot groups using their peptide-TCRαβ support data. Only one peptide belonged to the majority group; for this peptide, DlpTCR outperformed both PanPep and ERGO-II in virtual screening metrics ( Figs. 6ef, Extended Data Fig. 7d ). Few PanPep models surpassed ERGO-II after task adaptation, reflecting the difficulty of coordinating the TCRα and TCRβ models, which is also evident in the classification results ( Figs. 6gh, Extended Data Figs. 7ef ).
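The text states that separate peptide-TCRα and peptide-TCRβ predictions are combined into a peptide-TCRαβ score, but this section does not give the combination rule. As a hypothetical sketch only, the snippet below shows three simple fusion rules (mean, product, minimum) applied to aligned per-chain probabilities; the function name and the rules themselves are assumptions for illustration, not PanPep’s or the authors’ implementation.

```python
import numpy as np

def fuse_chain_scores(p_alpha, p_beta, rule="mean"):
    """Combine peptide-TCRα and peptide-TCRβ probabilities into one αβ score.

    Hypothetical helper: the fusion rule is an assumption for illustration.
    """
    p_alpha = np.asarray(p_alpha, dtype=float)
    p_beta = np.asarray(p_beta, dtype=float)
    if p_alpha.shape != p_beta.shape:
        raise ValueError("alpha and beta scores must be aligned pair by pair")
    if rule == "mean":      # both chains contribute equally
        return (p_alpha + p_beta) / 2.0
    if rule == "product":   # treats the two chains as independent evidence
        return p_alpha * p_beta
    if rule == "min":       # a pair scores high only if both chains agree
        return np.minimum(p_alpha, p_beta)
    raise ValueError(f"unknown rule: {rule}")

# Each index is one paired TCRαβ clonotype scored against the same peptide.
p_a = [0.9, 0.4, 0.7]
p_b = [0.8, 0.9, 0.2]
for rule in ("mean", "product", "min"):
    print(rule, np.round(fuse_chain_scores(p_a, p_b, rule), 3))
```

The rules agree when both chains give similar scores and diverge when the chains disagree, as for the second and third pairs above, which is one place where coordination between the TCRα and TCRβ models can matter.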
In the few-shot group, most PanPep models outperformed the control tools, though task adaptation yielded inconsistent gains with notable variance ( Figs. 6i-l, Extended Data Figs. 7g-i ). In the zero-shot group, PanPep retained an advantage, but PanPep-distill still underperformed relative to PanPep-meta ( Figs. 6m-p, Extended Data Figs. 7j-l ). However, PanPep’s early Success rates of ~24% and ROC-AUCs/PR-AUCs of ~0.55 indicate that peptide-TCRαβ binding prediction remains an unsolved challenge in real-world biological contexts.

Discussion

In this study, we demonstrated that the reported performance of PanPep can be reproduced using the provided model weights, data, and training protocol. In addition to the classification evaluation, we comprehensively assessed its performance in a virtual screening setting. Compared with two control tools, PanPep showed clear advantages in its meta-learner, few-shot, and zero-shot settings, especially on an independent dataset consisting of newly released antigens and their TCRβ binders. This confirmed its generalizability to unseen antigens with few or no known TCR binders, which remains a bottleneck in the field. Beyond reproduction, we successfully reused PanPep’s code to build predictors for peptide-TCRα and peptide-TCRαβ binding, extending its scope to more physiologically relevant contexts. These results highlight PanPep’s progress in antigen-TCR interaction modeling. This study also revealed several limitations in PanPep’s current design. First, PanPep demonstrated limited early enrichment of TCR binders (e.g., within the top 0.1% of our virtual screening evaluations), indicating persistent challenges in real-world antigen-TCR screening. Second, the high variance observed across cross-validations suggests that the imbalance between TCR binders and non-binders remains a significant issue. Third, the marked performance decline on unseen peptide-unseen TCRβ combinations indicates PanPep’s limited generalizability to novel TCRs.
Moreover, PanPep’s few-shot adaptation and zero-shot distillation did not consistently outperform the meta-learner, implying that pre-learned knowledge may be degraded during fine-tuning, a phenomenon known as catastrophic forgetting 17 . While PanPep aimed to relate unseen antigens to learned tasks and to create a zero-shot predictor using a distilled Neural Turing Machine, achieving this goal requires a universal antigen representation strategy and a distillation procedure that preserves predictive power. In its current form, PanPep distills all task-specific models into just three virtual representations, which may not sufficiently capture the diversity of the task space and thus limits adaptability to novel tasks. A promising future direction involves adopting scaling laws 18 , as in molecular foundation models 19 , which improve the contextual representation of amino acid sequences by increasing model and data size through unsupervised learning strategies such as masked language modeling 20 or autoregressive modeling 21 . Unlike meta-learning, these models do not rely on task partitions and can facilitate broader generalization. Representations of antigens and peptides derived from large-scale corpora may offer more robust support for meta-learning and zero-shot task modeling. Additionally, techniques such as Elastic Weight Consolidation 17 and Parameter-Efficient Fine-Tuning 22,23 may help preserve generalization and mitigate catastrophic forgetting during adaptation. We also recommend developing negative sampling strategies that combine broad repertoire coverage with representative reshuffled peptide-TCR pairs while excluding cross-reactive cases. Such strategies would help regularize model training, mitigate overfitting to peptide- or TCR-specific features, and ultimately enhance the robustness of meta-learning and task adaptation.
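The negative sampling recommendation above can be made concrete with a small sketch that builds negatives for one peptide by background-drawing, by reshuffling, and by a mixture of the two. It is a minimal illustration under stated assumptions: positives and the background repertoire are toy Python collections, “cross-reactive” is approximated by excluding any TCR already known to bind the target peptide, and the mixing parameter `reshuffled_frac` is hypothetical rather than a value used in this study.

```python
import random

def background_negatives(peptide, positives, background, k, rng, exclude=()):
    """Background-drawing: sample up to k TCRs from a repertoire.

    Known binders of `peptide` and anything in `exclude` are skipped.
    """
    binders = {tcr for pep, tcr in positives if pep == peptide}
    blocked = binders | set(exclude)
    pool = [tcr for tcr in dict.fromkeys(background) if tcr not in blocked]
    return [(peptide, tcr) for tcr in rng.sample(pool, min(k, len(pool)))]

def reshuffled_negatives(peptide, positives, k, rng):
    """Reshuffling: pair `peptide` with TCRs observed binding other peptides.

    TCRs that also bind `peptide` are removed, a simple stand-in for
    excluding cross-reactive cases.
    """
    binders = {tcr for pep, tcr in positives if pep == peptide}
    pool = sorted({tcr for pep, tcr in positives if pep != peptide} - binders)
    return [(peptide, tcr) for tcr in rng.sample(pool, min(k, len(pool)))]

def mixed_negatives(peptide, positives, background, k, reshuffled_frac=0.5, seed=0):
    """Hypothetical mixture: a share of hard reshuffled negatives, rest from background."""
    rng = random.Random(seed)
    n_hard = int(round(k * reshuffled_frac))
    hard = reshuffled_negatives(peptide, positives, n_hard, rng)
    used = {tcr for _, tcr in hard}
    easy = background_negatives(peptide, positives, background, k - len(hard), rng, used)
    return hard + easy

# Toy data: CASSA binds both PEP1 and PEP2, so it is never a PEP1 negative.
positives = [("PEP1", "CASSA"), ("PEP1", "CASSB"), ("PEP2", "CASSC"),
             ("PEP2", "CASSA"), ("PEP3", "CASSD")]
background = ["CASSX", "CASSY", "CASSZ", "CASSW", "CASSA"]
print(mixed_negatives("PEP1", positives, background, k=4))
```

The hard negatives come only from TCRs with verified binding to other peptides, while the remainder preserves broad repertoire coverage; tuning the mixing fraction is one way to trade off the two biases discussed in the Results.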
This study provides a comprehensive evaluation of PanPep’s reusability in virtual screening, and its extension to TCRα and TCRαβ prediction tasks, revealing both its strengths and limitations. Our optimized implementation of PanPep supports multi-GPU parallelization to accelerate modeling and inference on the full TCR repertoire. This work also lays the foundation for evaluating future antigen-TCR binding predictors, as well as related models such as those for HLA-antigen, HLA-antigen-TCR, and protein-protein interactions.

Declarations

Data Availability

The dataset used by PanPep is publicly available on Zenodo (https://doi.org/10.5281/zenodo.7544387), and our newly curated dataset has been deposited on Zenodo as well (https://doi.org/10.5281/zenodo.16943691).

Code Availability

The original code of PanPep is available at https://github.com/bm2-lab/PanPep. Our code to run the reproducibility results and to analyze the reusability is available via GitHub at https://github.com/coffee19850519/PanPep_Reusability.

Acknowledgements

We thank Kai Liu, Qiuyu Lv, and Zhiyuan Yang for their technical support. This work was funded by the National Institutes of Health (NIH) R35GM126985.

Author contributions

D.X. and F.H. conceived and designed the study. X.W. and F.H. developed the code, conducted the evaluation, and created the visualizations. F.H. and D.X. drafted and revised the manuscript. D.X. supervised the study. All authors reviewed and approved the final manuscript.

Competing interests

The authors declare no competing interests.

References

1. Braun, D. A. et al. A neoantigen vaccine generates antitumour immunity in renal cell carcinoma. Nature 639, 474–482 (2025).
2. Hudson, D., Fernandes, R. A., Basham, M., Ogg, G. & Koohy, H. Can we predict T cell specificity with digital biology and machine learning? Nat. Rev. Immunol. 23, 511–521 (2023).
3. Huang, H., Wang, C., Rubelt, F., Scriba, T. J. & Davis, M. M. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nat. Biotechnol. 38, 1194–1202 (2020).
4. Zhang, H., Zhan, X. & Li, B. GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation. Nat. Commun. 12, 4699 (2021).
5. Sidhom, J.-W., Larman, H. B., Pardoll, D. M. & Baras, A. S. DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires. Nat. Commun. 12, 1605 (2021).
6. Que, J. et al. Identifying T cell antigen at the atomic level with graph convolutional network. Nat. Commun. 16, 5171 (2025).
7. Xu, Z. et al. DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor. Brief. Bioinform. 22, bbab335 (2021).
8. Springer, I., Tickotsky, N. & Louzoun, Y. Contribution of T Cell Receptor Alpha and Beta CDR3, MHC Typing, V and J Genes to Peptide Binding Prediction. Front. Immunol. 12, 664514 (2021).
9. Lu, T. et al. Deep learning-based prediction of the T cell receptor–antigen binding specificity. Nat. Mach. Intell. 3, 864–875 (2021).
10. Peng, X. et al. Characterizing the interaction conformation between T-cell receptors and epitopes with deep learning. Nat. Mach. Intell. 5, 395–407 (2023).
11. Gao, Y. et al. Pan-Peptide Meta Learning for T-cell receptor–antigen binding recognition. Nat. Mach. Intell. 5, 236–249 (2023).
12. Feng, Z. et al. Sliding-attention transformer neural architecture for predicting T cell receptor–antigen–human leucocyte antigen binding. Nat. Mach. Intell. 6, 1216–1230 (2024).
13. Yu, C., Fang, X., Tian, S. & Liu, H. A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules. Nat. Mach. Intell. 7, 278–292 (2025).
14. Gao, Y., Gao, Y., Dong, K., Wu, S. & Liu, Q. Reply to: The pitfalls of negative data bias for the T-cell epitope specificity challenge. Nat. Mach. Intell. 5, 1063–1065 (2023).
15. Dens, C., Laukens, K., Bittremieux, W. & Meysman, P. The pitfalls of negative data bias for the T-cell epitope specificity challenge. Nat. Mach. Intell. 5, 1060–1062 (2023).
16. Zareie, P. et al. Canonical T cell receptor docking on peptide–MHC is essential for T cell signaling. Science 372, eabe9124 (2021).
17. Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114, 3521–3526 (2017).
18. Kaplan, J. et al. Scaling Laws for Neural Language Models. Preprint at https://doi.org/10.48550/arXiv.2001.08361 (2020).
19. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model.
20. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
21. OpenAI et al. GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2024).
22. Houlsby, N. et al. Parameter-Efficient Transfer Learning for NLP. Preprint at https://doi.org/10.48550/arXiv.1902.00751 (2019).
23. Ding, N. et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat. Mach. Intell. 5, 220–235 (2023).
24. Xiong, G.-L. et al. Improving structure-based virtual screening performance via learning from scoring function components. Brief. Bioinform. 22, bbaa094 (2021).
25. Vita, R. et al. The Immune Epitope Database (IEDB): 2024 update. Nucleic Acids Res. 53, D436–D443 (2025).
26. Goncharov, M. et al. VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2. Nat. Methods 19, 1017–1019 (2022).
27. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).

Methods

Reproducibility Test Setup

In the reproducibility test, we followed PanPep’s classification evaluation protocol and additionally conducted a virtual screening evaluation.
For classification, we applied PanPep’s balanced sampling strategy, selecting an equal number of non-binding and binding TCRs for each peptide to construct the test set. PanPep and the control tools were evaluated on these balanced subsets. To ensure robustness, we performed 100-fold cross-validation, which allowed broader coverage of negative pairs. Performance was assessed using ROC-AUC and PR-AUC, consistent with PanPep’s original report. The virtual screening evaluation offers a more comprehensive assessment by testing a model’s ability to achieve early enrichment. Unlike PanPep’s classification evaluation, which relies on balanced subsets, virtual screening considers all possible peptide-TCR pairs, thereby minimizing bias from subsampling. Early enrichment reflects how well a model ranks known true binders near the top of the list, which is essential for improving experimental efficiency. To assess this, we report the Enrichment plot, Hit rate, and BEDROC 24 . An Enrichment plot visualizes how effectively a model ranks true binders at the top of a sorted list of candidate TCRs for a given peptide. Candidate TCRs from an entire repertoire are sorted by predicted binding likelihood, and the cumulative proportion of true binders recovered is plotted against the proportion of the ranked list examined. Hit rate is the proportion of true TCR binders retrieved in the top-k predictions. BEDROC applies exponentially greater weight to true binders that appear at earlier ranks. Its formula is defined as follows:

$$\mathrm{BEDROC} = \frac{\sum_{i=1}^{n} e^{-\alpha r_i / N}}{\frac{n}{N}\left(\frac{1 - e^{-\alpha}}{e^{\alpha/N} - 1}\right)} \times \frac{R_a \sinh(\alpha/2)}{\cosh(\alpha/2) - \cosh(\alpha/2 - \alpha R_a)} + \frac{1}{1 - e^{\alpha(1 - R_a)}}, \qquad R_a = \frac{n}{N}$$

where n is the number of TCR binders and N is the total number of screened TCRs. The term r_i/N normalizes the rank r_i of the i-th TCR binder. The parameter α controls the emphasis on early ranks; with α = 20, the value used in this work, approximately 80% of the weight falls on the top 8% of ranks. The virtual screening evaluation required extensive computation due to the large number of peptide-TCR pairs.
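The virtual screening metrics defined above (Enrichment curve, Hit rate, and BEDROC) can be computed with a short NumPy sketch. It follows the standard closed-form BEDROC definition, which we assume matches the one used in this study; ranks are 1-based with rank 1 the highest predicted score, ties are broken by input order, and the function names are illustrative, not those of the authors’ cuML-based implementation. The helper `top_weight_fraction` lets readers check what share of the exponential weight a given α places on the top of the ranked list.

```python
import numpy as np

def binder_ranks(scores, labels):
    """1-based ranks of true binders after sorting scores from high to low."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    sorted_labels = np.asarray(labels, dtype=bool)[order]
    return np.flatnonzero(sorted_labels) + 1

def hit_rate(scores, labels, top_frac=0.01):
    """Fraction of all true binders found within the top `top_frac` of the list."""
    ranks = binder_ranks(scores, labels)
    cutoff = max(1, int(np.ceil(top_frac * len(scores))))
    return float(np.sum(ranks <= cutoff)) / len(ranks)

def enrichment_curve(scores, labels):
    """Return (fraction of list examined, cumulative fraction of binders recovered)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.cumsum(np.asarray(labels, dtype=float)[order])
    n_total = len(order)
    return np.arange(1, n_total + 1) / n_total, hits / hits[-1]

def bedroc(scores, labels, alpha=20.0):
    """Standard BEDROC: 1 when all binders rank first, 0 when all rank last."""
    r = binder_ranks(scores, labels).astype(float)
    n, N = len(r), len(scores)
    ra = n / N
    # RIE; expm1 keeps e^(alpha/N) - 1 accurate when N is very large.
    rie = np.sum(np.exp(-alpha * r / N)) / (
        ra * (-np.expm1(-alpha)) / np.expm1(alpha / N))
    scale = ra * np.sinh(alpha / 2) / (
        np.cosh(alpha / 2) - np.cosh(alpha / 2 - alpha * ra))
    return float(rie * scale + 1.0 / (1.0 - np.exp(alpha * (1.0 - ra))))

def top_weight_fraction(alpha, top_frac):
    """Share of the exponential rank weight falling inside the top `top_frac`."""
    return float((1.0 - np.exp(-alpha * top_frac)) / (1.0 - np.exp(-alpha)))

# Toy screen: 1,000 candidate TCRs for one peptide, 10 true binders.
rng = np.random.default_rng(0)
labels = np.zeros(1000, dtype=int)
labels[:10] = 1
scores = rng.random(1000) + 0.5 * labels   # binders get a modest score boost
print("BEDROC:", round(bedroc(scores, labels), 3))
print("Hit rate @1%:", hit_rate(scores, labels, 0.01))
```

On the full repertoire, every peptide must be scored and ranked against tens of millions of candidate TCRβ sequences before these metrics can be computed, and that scoring step dominates the cost of the virtual screening evaluation.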
To accelerate the virtual screening computation, we optimized PanPep's code to support multi-GPU parallelism. All peptides were divided into k groups, where k equals the number of available GPUs, and each GPU processed one group. For each peptide, its corresponding TCR pairs were batched and processed on the assigned GPU. This parallelized workflow was executed on a machine with 8 GPUs, a 56-core CPU, and 512 GB of physical memory, using a batch size of 150. Virtual screening metrics were implemented using the cuML Python package to leverage GPU-based matrix operations. These improvements allowed the virtual screening evaluations to complete in a practical time frame.

Training and Test Data Provided by PanPep

In the inference-level reusability evaluation, we used PanPep's original test dataset, which includes 276 peptides and their 34,711 TCRβ binders, forming a total of 36,487 peptide-TCRβ binding pairs. The dataset was categorized into majority, few-shot, and zero-shot groups, comprising 25, 122, and 129 peptides, respectively. For the balanced classification evaluation, an equal number of non-binding TCRβ sequences was either randomly sampled from a background repertoire of 57,107,565 TCRβ sequences or generated by reshuffling the 34,711 known binders, corresponding to the two negative sampling strategies. In the virtual screening evaluation, each peptide was tested against the entire background repertoire of 57,107,565 TCRβ sequences to identify and rank the most likely binding candidates.

For the training-level reusability evaluation, PanPep's original training dataset was divided into 10 folds, each containing 188 peptides with varying proportions of majority, few-shot, and zero-shot samples. These folds were used to retrain PanPep's meta-learner under 10-fold cross-validation.
During training, balanced negative sampling was applied by selecting non-binding TCRβ sequences from the same background library for each peptide, consistent with PanPep's original negative sampling protocol. The resulting 10 meta-learner models were then evaluated on PanPep's original test dataset using both classification and virtual screening metrics.

Independent Testing Data Construction

We followed PanPep's data curation protocol to retrieve all available human HLA class I-related peptide-TCRβ binding records from the IEDB 25, VDJdb 26, and McPAS 27 TCR databases, excluding the PIRD database because it has recently become inaccessible. PanPep's data quality-control criteria were then applied to remove low-confidence records. After excluding PanPep's original training and evaluation data, the remaining records were used as the positive set for our independently curated benchmark dataset. This dataset includes 670 unique peptides and 4,362 unique TCRβ sequences, forming 4,377 peptide-TCRβ binding pairs. Following PanPep's task definitions, these peptides were grouped into majority, few-shot, and zero-shot categories containing 4, 150, and 516 peptides, respectively. The corresponding non-binding TCRβ set was constructed either by randomly sampling from PanPep's control repertoire of 57,107,565 TCRβ sequences or by reshuffling the 4,362 known binders, depending on the chosen negative sampling strategy.

After gathering all data sources, we identified 11,550 novel TCRβ sequences that were not present in PanPep's original TCRβ repertoire. This allowed us to construct an unseen-peptide, unseen-TCRβ subset to evaluate PanPep's generalization in completely novel settings. Specifically, these 11,550 TCRβs were treated as an unseen TCRβ library, and the 391 unseen peptides known to bind them yielded 1,991 peptide-TCRβ binding pairs. All non-binding pairs between these peptides and the unseen TCRβ library were used as negative samples in this subset.
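The task grouping, the two negative sampling strategies, and the unseen-TCR filter described in this section can be sketched as follows. This is a hypothetical illustration, not PanPep's data pipeline: pairs are assumed to be (peptide, CDR3β) string tuples, all function names are invented for the example, and the group thresholds follow the Introduction (more than 100 binders for majority, 5 to 100 for few-shot, fewer than 5 for zero-shot); the Results text writes the majority boundary as ≥100, so the exact cut-off at 100 is an assumption. Materializing and sorting the candidate pool is fine for small examples but would be impractical for a 57-million-sequence repertoire.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (peptide, CDR3beta); hypothetical representation


def group_tasks(pairs: Iterable[Pair]) -> Dict[str, str]:
    """Assign each peptide to a task group by its number of distinct known binders."""
    binders: Dict[str, Set[str]] = defaultdict(set)
    for pep, tcr in pairs:
        binders[pep].add(tcr)
    groups = {}
    for pep, tcrs in binders.items():
        k = len(tcrs)
        groups[pep] = "majority" if k > 100 else "few-shot" if k >= 5 else "zero-shot"
    return groups


def sample_negatives(pep: str, n: int, pool: Iterable[str], positives: Set[Pair],
                     rng: random.Random) -> List[Pair]:
    """Draw n distinct TCRs from pool that are not known binders of pep."""
    candidates = sorted({t for t in pool if (pep, t) not in positives})
    return [(pep, t) for t in rng.sample(candidates, n)]  # ValueError if pool too small


def background_negatives(pep, n, background, positives, rng):
    # Background-drawing: negatives come from an external TCR repertoire.
    return sample_negatives(pep, n, background, positives, rng)


def reshuffled_negatives(pep, n, pairs, positives, rng):
    # Reshuffling: pep is re-paired with TCRs known to bind other peptides.
    return sample_negatives(pep, n, (t for _, t in pairs), positives, rng)


def unseen_tcrs(curated: Iterable[str], reference_repertoire: Iterable[str]) -> Set[str]:
    """TCRs in the newly curated data that never occur in the reference repertoire."""
    return set(curated) - set(reference_repertoire)
```

The only difference between the two strategies in this sketch is the candidate pool, which is exactly where the hard negatives discussed in the Results come from: reshuffled candidates are themselves real binders, whereas background candidates are mostly unrelated TCRs.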
Construction of Peptide-TCRα and TCRαβ Binding Datasets for Reusability Test

To extend PanPep to peptide-TCRα binding prediction, we applied its meta-training framework to DlpTCR's training data 7, which included 273 unique peptides and 4,508 unique TCRα sequences forming 4,922 binding pairs. A total of 156 peptides with fewer than three TCRα binders were excluded because they did not meet the minimum support and query requirements of PanPep's meta-training protocol. For evaluation, we compiled peptide-TCRα binding records from the IEDB, VDJdb, and McPAS databases, together with the test data from the DlpTCR and ERGO-II studies. Both PanPep and DlpTCR were evaluated on this compiled test set (ERGO-II does not support peptide-TCRα binding prediction), which contained 215 unique peptides and 1,126 unique TCRα sequences with 14,436 binding pairs. Following PanPep's task definitions, the test set was partitioned into 11 majority, 186 few-shot, and 931 zero-shot tasks. All TCRα sequences from both the training and test data were pooled into a TCRα library of 37,461 sequences. Negatives were constructed either by pairing peptides with sequences from this library or by reshuffling the 1,126 known binders, depending on the adopted sampling strategy.

Similarly, we compiled peptide-TCRαβ binding records from IEDB, VDJdb, and McPAS, together with data from the DlpTCR and ERGO-II studies, yielding 286 unique peptides and 472 unique TCRαβ sequences with 723 documented interactions. Because peptide-TCRαβ data are scarce, all records were reserved exclusively for benchmarking. This dataset was categorized into 1 majority, 18 few-shot, and 267 zero-shot tasks. All TCRαβ sequences were consolidated into a library of 24,191 sequences. Negatives were generated either by pairing the peptides with sequences from this library or by reshuffling the 472 known binding pairs, depending on the adopted sampling strategy.
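The exclusion of peptides with fewer than three TCRα binders can be illustrated by a small task-building step that keeps only peptides able to fill both a support set and a query set. This is a hypothetical sketch, not PanPep's training code: the function name is invented, and `n_support=2` is an assumed value chosen only so that three binders leave at least one query example; the support and query sizes PanPep actually uses are not stated in this section.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_tasks(pairs: Iterable[Tuple[str, str]], min_binders: int = 3,
                n_support: int = 2, seed: int = 0):
    """Group binders by peptide, drop peptides with too few distinct binders,
    and split each remaining peptide's binders into support and query sets."""
    if n_support >= min_binders:
        raise ValueError("min_binders must exceed n_support to leave a query set")
    rng = random.Random(seed)
    by_pep: Dict[str, set] = defaultdict(set)
    for pep, tcr in pairs:
        by_pep[pep].add(tcr)
    tasks: Dict[str, Dict[str, List[str]]] = {}
    excluded: List[str] = []
    for pep, tcrs in by_pep.items():
        if len(tcrs) < min_binders:
            excluded.append(pep)  # cannot fill both a support and a query set
            continue
        ordered = sorted(tcrs)  # sort first so the seeded shuffle is reproducible
        rng.shuffle(ordered)
        tasks[pep] = {"support": ordered[:n_support], "query": ordered[n_support:]}
    return tasks, excluded
```

For instance, a peptide with three CDR3α binders yields two support and one query sequence, while a peptide with only two binders is returned in `excluded`, mirroring the 156 peptides removed above.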
Extending PanPep to Peptide-TCRα and TCRαβ Binding Recognition

We used the training code provided by PanPep's authors to perform meta-learning on the peptide-TCRα dataset, modifying the input to accept a peptide sequence and the corresponding CDR3α sequence of a TCR. The resulting TCRα-oriented meta-learner was fine-tuned on task-specific support data in the majority and few-shot settings. For the zero-shot setting, peptide-TCRα models were generated by distilling task learners from the meta-learning process, following PanPep's zero-shot protocol. To evaluate model stability, we conducted 10-fold cross-validation throughout the peptide-TCRα modeling process.

For peptide-TCRαβ binding prediction, the CDR3α and CDR3β sequences were input separately, each together with the peptide sequence, into the PanPep-TCRα and PanPep-TCRβ models, respectively. The two predictions were averaged to produce a final binding score for the peptide-TCRαβ pair. For majority and few-shot tasks, the meta-learners trained on peptide-TCRα and peptide-TCRβ data were fine-tuned independently using their respective support sets. In the zero-shot setting, the distilled PanPep-TCRα and PanPep-TCRβ models were applied directly without further adaptation. To assess the overall stability of this approach, we also evaluated peptide-TCRαβ binding performance using the 10-fold cross-validated PanPep-TCRα and PanPep-TCRβ models.

References
22. Vita, R. et al. The Immune Epitope Database (IEDB): 2024 update. Nucleic Acids Res. 53, D436–D443 (2025).
23. Goncharov, M. et al. VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2. Nat. Methods 19, 1017–1019 (2022).
24. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).

Additional Declarations

There is NO Competing Interest.
Supplementary Files
ExtendedData.docx
{"props":{"pageProps":{"initialData":{"identity":"rs-7456773","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":517133371,"identity":"945aeefd-0a09-40d1-8096-7bd13cb4c834","order_by":0,"name":"Dong Xu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAzUlEQVRIiWNgGAWjYHACAyC2YWCDcCSAOIEoLWmkazmMLEBAi8GN5I2PC36dj+bjP/7wMe8eCwZ+9hwDvFokZ6QVG8/su53bJpFjbMzzTIJBsucNfi38Ejlm0rw9IC08bNI8BySA9hKwhQ2i5VxuG//xZ2At9oS0gG3h+XEgt40hwQxiiwQhv/Q8KzbmbUgG+8VwzgEJHokzzwrwajE4Dgwxnj92ufP7jz988OZAnRx/e/IGvFrAgLENweYhrBwM/hCpbhSMglEwCkYmAAChT0ADUukQqgAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0000-0002-4809-0514","institution":"University of Missouri - 
Columbia","correspondingAuthor":true,"prefix":"","firstName":"Dong","middleName":"","lastName":"Xu","suffix":""},{"id":517133372,"identity":"46e32e48-88d9-4eca-b0b9-6808d68f7a95","order_by":1,"name":"Fei He","email":"","orcid":"https://orcid.org/0000-0002-3284-9506","institution":"University of Missouri - Columbia","correspondingAuthor":false,"prefix":"","firstName":"Fei","middleName":"","lastName":"He","suffix":""},{"id":517133373,"identity":"fc0f22ce-a0e6-4c54-8d22-fc37e38fac1b","order_by":2,"name":"Xianyu Wang","email":"","orcid":"","institution":"University of Missouri-Columbia","correspondingAuthor":false,"prefix":"","firstName":"Xianyu","middleName":"","lastName":"Wang","suffix":""}],"badges":[],"createdAt":"2025-08-25 20:10:15","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7456773/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7456773/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s42256-026-01236-6","type":"published","date":"2026-05-06T04:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":91683880,"identity":"e2955552-87b4-4f65-8367-bd8cc7d678bf","added_by":"auto","created_at":"2025-09-19 07:09:00","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":274649,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSchematic Overview of the PanPep Reusability Workflow. (a) \u003c/strong\u003eWorkflow for reproducing and evaluating PanPep. The model takes the CDR3 region of the TCR chain and the target peptide as input. PanPep trains a meta-learner across multiple peptide datasets, which is then adapted to specific peptide tasks, yielding task learners under majority (\u0026gt;100 known TCR binders) and few-shot (5-100 known TCR binders) settings. For peptides with fewer than five known TCR binders, PanPep distills a zero-shot learner from the meta-learning stage. 
These learners predict peptide-TCR binding scores for classifying peptide-TCR pairs or ranking candidate TCRs for a given peptide, enabling evaluations from both classification and virtual screening perspectives. Classification performance was assessed with ROC-AUC and PR-AUC under background-drawing and reshuffling negative sampling strategies. Virtual screening performance was evaluated using Enrichment plots, BEDROC scores, and Hit rates. \u003cstrong\u003e(b) \u003c/strong\u003eFive test scenarios designed in this study to evaluate PanPep’s reusability: inference- and training-level reproducibility on the original and newly curated datasets, and extension to peptide-TCRα and peptide-TCRαβ binding recognition.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/693e45f16a2085b431fe40d3.png"},{"id":91683116,"identity":"01b47f3a-0f50-49c8-891c-fa2f92cac27d","added_by":"auto","created_at":"2025-09-19 07:00:59","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":263526,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformance Comparisons of PanPep, DlpTCR, and ERGO-II on Peptide-TCRβ Binding Data provided by PanPep. \u003c/strong\u003e(\u003cstrong\u003ea-c\u003c/strong\u003e) ROC-AUCs from balanced classification evaluations using background-drawing negatives under 100-fold cross-validation for the majority, few-shot, and zero-shot settings. (\u003cstrong\u003ed-f\u003c/strong\u003e) ROC-AUCs from balanced classification evaluations using reshuffling negatives under 100-fold cross-validation. (\u003cstrong\u003eg-i\u003c/strong\u003e) Enrichment plots assessing virtual screening performance. The early enrichment region highlights the model’s ability to efficiently identify TCR binders with minimal experimental effort. 
(\u003cstrong\u003ej-l\u003c/strong\u003e) BEDROC scores quantifying early enrichment performance from a virtual screening perspective.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/b3e1c5c072874731fbc4ba0c.png"},{"id":91683114,"identity":"f420deba-0ede-48ca-8a77-aef3a74f56d8","added_by":"auto","created_at":"2025-09-19 07:00:59","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":407936,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformance of PanPep, DlpTCR, and ERGO-II in Virtual Screening and Classification Evaluations on a Newly Curated Independent Dataset. (a-b) \u003c/strong\u003ePanPep’s zero-shot virtual screening evaluation on the entire dataset. \u003cstrong\u003e(c-d) \u003c/strong\u003eROC-AUC scores from zero-shot balanced classification using background-drawing and reshuffling negatives under 100-fold cross-validation. \u003cstrong\u003e(e-h) \u003c/strong\u003eVirtual screening (enrichment plot and BEDROC) and classification (ROC-AUCs from both negative sampling strategies) evaluations under 100-fold cross-validation in the majority setting. \u003cstrong\u003e(i-l)\u003c/strong\u003e Corresponding evaluations in the few-shot setting. \u003cstrong\u003e(m-p) \u003c/strong\u003eCorresponding evaluations in the zero-shot setting. Dashed arrows indicate significant performance gaps between PanPep and controls. \u003cstrong\u003e(q-r) \u003c/strong\u003eVirtual screening performance comparison between the unseen peptide-seen TCRβ subset and the unseen peptide-unseen TCRβ subset using Enrichment plots and BEDROC scores. (\u003cstrong\u003es-t\u003c/strong\u003e) ROC-AUCs of these two subsets from both negative sampling strategies under 100-fold cross-validation. 
Dashed arrows show performance gaps between the two subsets.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/70817933e5fc6ec98292a64a.png"},{"id":91683118,"identity":"8f70b50c-ac92-4447-a638-6454d0bba8d0","added_by":"auto","created_at":"2025-09-19 07:00:59","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":353293,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformance of Reproduced PanPep on the Independent Dataset. \u003c/strong\u003e(\u003cstrong\u003ea-d\u003c/strong\u003e) Enrichment plot, BEDROC scores, and ROC-AUC scores from zero-shot balanced classification evaluations using two negative sampling strategies on the entire dataset, over 10 independent PanPep reproductions. (\u003cstrong\u003ee-h\u003c/strong\u003e) Virtual screening (enrichment plots and BEDROC) and classification (ROC-AUCs from both negative sampling strategies) evaluations for the majority group in the majority setting across 10 reproductions. (\u003cstrong\u003ei-l\u003c/strong\u003e) Corresponding evaluations for the few-shot group in the few-shot setting. (\u003cstrong\u003em-p\u003c/strong\u003e) Corresponding evaluations for the zero-shot group in the zero-shot setting. Dashed arrows indicate significant performance gaps between PanPep and controls.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/6a849f5b689425d68d7e1112.png"},{"id":91683878,"identity":"49240583-1263-480a-8d80-6c92e482dcc8","added_by":"auto","created_at":"2025-09-19 07:08:59","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":279205,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformance of Reproduced PanPep on Peptide-TCRα Binding Scenario. 
(a-d\u003c/strong\u003e) Enrichment plots, BEDROC scores, and ROC-AUCs from balanced classification using two negative sampling strategies, comparing PanPep’s task-adapted model with its meta-learner for the majority group in the majority setting across 10 reproductions. (\u003cstrong\u003ee-h\u003c/strong\u003e) Corresponding evaluations for the few-shot group in the few-shot setting. (\u003cstrong\u003ei-l\u003c/strong\u003e) Evaluations comparing PanPep’s distilled model with its meta-learner for the zero-shot group in the zero-shot setting. Dashed arrows indicate significant performance gaps between PanPep and controls.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/9c733b402b92f01a8fbb3e5d.png"},{"id":91683119,"identity":"10a3cd9d-9eb6-40f4-b49d-fb3dc4245846","added_by":"auto","created_at":"2025-09-19 07:00:59","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":386906,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePerformance of Reproduced PanPep on Peptide-TCRαβ Binding Scenario. (a-d\u003c/strong\u003e) Enrichment plot, BEDROC scores, and ROC-AUC scores from balanced classification evaluations using two negative sampling strategies, comparing PanPep’s distilled model with its meta-learner, on the entire dataset in the zero-shot setting across 10 reproductions. (\u003cstrong\u003ee-h\u003c/strong\u003e) Corresponding evaluations comparing PanPep’s task-adapted model with its meta-learner for the majority group in the majority setting. (\u003cstrong\u003ei-l\u003c/strong\u003e) Corresponding evaluations for the few-shot group in the few-shot setting. (\u003cstrong\u003em-p\u003c/strong\u003e) Corresponding evaluations for the zero-shot group in the zero-shot setting. 
Dashed arrows indicate significant performance gaps between PanPep and controls.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/ef232549544a30178f95fabf.png"},{"id":108669123,"identity":"b39d4a02-a4f0-4df8-a41d-dbbd255bf675","added_by":"auto","created_at":"2026-05-07 07:12:19","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1951007,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/dd4572c1-b50f-47bc-aca0-f4260daaebe5.pdf"},{"id":91683120,"identity":"ba3e06cf-4285-4c7e-8a96-030a04bc5415","added_by":"auto","created_at":"2025-09-19 07:00:59","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":2936695,"visible":true,"origin":"","legend":"","description":"","filename":"ExtendedData.docx","url":"https://assets-eu.researchsquare.com/files/rs-7456773/v1/6710ee92554c2c6dddaba1ca.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Reusability Report: Meta-Learning for Antigen-Specific T-Cell Receptor Binder Identification","fulltext":[{"header":"Introduction","content":"\u003cp\u003eT-cell receptors (TCRs) recognize peptides presented by major histocompatibility complex (MHC) molecules, initiating adaptive immune responses. Identifying peptide-specific TCR binders is crucial for immunotherapy, vaccine design, and diagnostic applications, particularly for targeting neoantigens and novel antigens\u003csup\u003e1\u003c/sup\u003e. Given the vast combinatorial diversity of peptides and TCRs, exhaustive experimental screening is impractical. 
Computational prediction offers a cost-effective, scalable, and time-efficient alternative, enabling high-throughput \u003cem\u003ein silico\u003c/em\u003e screening to bridge the gap between sequence diversity and experimental limitations\u003csup\u003e2\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eEarly computational approaches predicted antigen-TCR interactions by grouping TCRs based on sequence similarity, with tools such as GLIPH\u003csup\u003e3\u003c/sup\u003e, GIANA\u003csup\u003e4\u003c/sup\u003e, and DeepTCR\u003csup\u003e5\u003c/sup\u003e. While effective in narrow contexts, these methods were often restricted to specific antigen types and human leukocyte antigen (HLA) alleles, limiting their generalizability\u003csup\u003e6\u003c/sup\u003e. More recent AI-based tools reformulated the task as a sequence modeling problem, using tokenized amino acids and attention-based architectures to infer peptide-TCR interactions. Representative models include DlpTCR\u0026nbsp;\u003csup\u003e7\u003c/sup\u003e, ERGO\u003csup\u003e8\u003c/sup\u003e, pMTnet\u003csup\u003e9\u003c/sup\u003e, TEIM\u003csup\u003e10\u003c/sup\u003e, and PanPep\u003csup\u003e11\u003c/sup\u003e. Other recent models, such as PISTE\u003csup\u003e12\u003c/sup\u003e, UnifyImmun\u003csup\u003e13\u003c/sup\u003e, and deepAntigen\u003csup\u003e6\u003c/sup\u003e, have expanded the scope to HLA-antigen-TCR multimers.\u003c/p\u003e\n\u003cp\u003eDespite progress, predicting peptide-TCR binding remains challenging due to the extreme diversity of antigen sequences and TCR types. Models tend to overfit on peptides with abundant binders, limiting performance on rare or unseen antigens. To address this, Gao et al. proposed Pan-Peptide Meta-Learning (PanPep)\u0026nbsp;\u003csup\u003e11\u003c/sup\u003e, a meta-learning framework designed to enhance generalization by training a meta-learner to capture shared features across diverse peptide-TCR interactions (\u003cstrong\u003eFig. 1a\u003c/strong\u003e). 
This meta-learner serves as an adaptable base model for specific peptides through \u003cstrong\u003emajority learning\u003c/strong\u003e (more than 100 known binders) or \u003cstrong\u003efew-shot learning\u003c/strong\u003e (5 to 100 labeled binders). PanPep employs a Neural Turing Machine to support \u003cstrong\u003ezero-shot learning\u003c/strong\u003e and evaluates it using peptides with fewer than 5 annotated binders. This approach produces a tailored predictor for each antigen, offering greater adaptability to biological diversity than a single general model applied across all antigens.\u003c/p\u003e\n\u003cp\u003eWhile PanPep marks a significant advancement, its real-world applicability remains insufficiently evaluated. First, its classification-based assessments rely on negative sampling strategies that can introduce evaluation bias. Two main negative sampling strategies have been adopted in this field (\u003cstrong\u003eFig. 1c\u003c/strong\u003e): reshuffling known positive pairs (named \u003cstrong\u003ereshuffling\u003c/strong\u003e) and randomly drawing from a background repertoire (named \u003cstrong\u003ebackground\u003c/strong\u003e-\u003cstrong\u003edrawing\u003c/strong\u003e)\u003csup\u003e14\u003c/sup\u003e. To avoid the false negatives that reshuffling can introduce from cross-reactive TCRs, PanPep chose background-drawing. However, because background-drawn negatives are largely antigen-irrelevant, this approach may overestimate performance\u003csup\u003e15\u003c/sup\u003e. Second, many tools in this domain are optimized for narrow settings, such as known peptides or common MHC alleles, and their performance often declines on novel peptides or unfamiliar TCR repertoires. Third, PanPep models only the TCRβ chain, omitting the TCRα chain due to limited data availability, which may reduce prediction confidence in clinical and immunological settings. 
Therefore, while PanPep represents meaningful progress, broader validation that reveals necessary methodological refinements is essential to establish the reliability and utility of TCR binder predictions in real-world immunological applications.\u003c/p\u003e\n\u003cp\u003eThis report presents a comprehensive, unbiased evaluation of peptide-TCR binding predictors, focusing on PanPep and its reported competitors. We benchmarked their performance using both classification metrics under multi-fold cross-validation and enrichment-based metrics from a virtual screening perspective, which ranks candidate TCRs from an entire repertoire by their predicted binding likelihood to a given peptide, thereby minimizing biases introduced by negative sampling subsets. Leveraging data accumulated since PanPep’s publication, we conducted direct comparisons on an independent dataset. Furthermore, we extended PanPep to TCRα and paired TCRαβ inputs (\u003cstrong\u003eFig. 1\u003c/strong\u003e).\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eWe report below the reusability testing results across five scenarios. These evaluations first examined PanPep\u0026rsquo;s inference- and training-level reproducibility using both the original dataset and a newly curated independent dataset. We then assessed its extensibility to peptide-TCR\u0026alpha; and peptide-TCR\u0026alpha;\u0026beta; binding recognition, applying the same source code to these new task datasets.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInference-Level Reproducibility\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCase 1: Reported Performance Reproductions with the PanPep-Provided Dataset\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe assessed inference-level reproducibility by directly evaluating the original released model weights on the provided test dataset. 
To assess stability, we followed PanPep\u0026rsquo;s balanced classification protocol and conducted 100-fold cross-validation under two negative sampling strategies. Across the majority (peptides with \u0026ge;100 TCR\u0026beta; binders), few-shot (5-100 binders), and zero-shot (\u0026lt;5 binders) groups (\u003cstrong\u003eFigs. 2\u003c/strong\u003e\u003cstrong\u003ea-f, Extended Data Figs. 1a-f\u003c/strong\u003e), boxplots of ROC-AUC and PR-AUC closely matched PanPep\u0026rsquo;s reported performance. Notably, the results for DlpTCR and ERGO-II differed from those in PanPep\u0026rsquo;s report, because these tools were originally benchmarked on separate test datasets (\u003cstrong\u003eExtended Data Fig. 2\u003c/strong\u003e). In those original benchmarks, PanPep underperformed in the majority group, and the relative performance trends among the tools were inconsistent across the two datasets. In our head-to-head comparison on PanPep\u0026rsquo;s own test set, however, PanPep outperformed both tools across all three groups under the background-drawing strategy (\u003cstrong\u003eFigs. 2a-c, Extended Data Figs. 1a-c\u003c/strong\u003e). Under the reshuffling strategy (\u003cstrong\u003eFigs. 2d-f, Extended Data Figs. 1d-f\u003c/strong\u003e), its predictive power declined, approaching random guessing as previously reported\u003csup\u003e15\u003c/sup\u003e.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTo further dissect the performance gap between the two negative sampling strategies, we decomposed PanPep\u0026rsquo;s predictions from a representative fold of the 100-fold classification evaluations in both the majority and few-shot groups into confusion matrices (\u003cstrong\u003eExtended Data Fig. 3\u003c/strong\u003e). The results revealed that the performance drop under the reshuffling strategy was primarily attributable to an inflated false-positive rate. 
In this setting, negatives were generated by permuting positive peptide-TCR pairs, which introduced \u0026ldquo;hard negatives\u0026rdquo; that retained strong sequence-level or contextual similarity to true binders. PanPep misclassified these samples, likely because it over-memorized TCRs paired with positive labels during meta-training. By contrast, PanPep\u0026rsquo;s background-drawing strategy produced negatives from a highly heterogeneous TCR library, in which the probability of encountering such challenging TCRs is extremely low. As a result, adaptation did not expose the model to enough representative negatives to correct this memorization bias. This indicates that (1) PanPep\u0026rsquo;s learning is strongly driven by TCR information, and (2) current background-drawing negatives poorly approximate the true distribution of non-binding TCRs, leaving a distributional gap between the two negative sampling strategies.\u003c/p\u003e\n\u003cp\u003eTo extend PanPep\u0026rsquo;s analysis, we quantified performance variance across the two negative sampling strategies using 100-fold cross-validation (\u003cstrong\u003eFigs. 2a-2f, Extended Data Figs. 1a-1f\u003c/strong\u003e). Variance in ROC-AUC and PR-AUC was notably higher in the few-shot and zero-shot groups (4.7-4.3% for ROC-AUC in\u003cstrong\u003e\u0026nbsp;Figs. 2\u003c/strong\u003e\u003cstrong\u003ebcef\u003c/strong\u003e,\u0026nbsp;5.6-5.0%\u0026nbsp;for PR-AUC in \u003cstrong\u003eExtended Data Figs. 1bcef\u003c/strong\u003e) than in the majority group (\u003cstrong\u003eFigs. 2ad\u0026nbsp;\u003c/strong\u003eand\u003cstrong\u003e\u0026nbsp;Extended Data Figs. 1ad\u003c/strong\u003e). 
This instability indicates that the small, artificially balanced negative subsets used in current benchmarking protocols fail to capture the scale and diversity of real peptide-TCR\u0026beta; interactions, thereby introducing evaluation bias.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eAlternative Evaluation Scheme from a Virtual Screening Perspective\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eIn practical applications, the goal of peptide-TCR\u0026beta; binding recognition is to prioritize potential TCR\u0026beta; binders from a repertoire for a given peptide, thereby supporting downstream vaccine or therapeutic design. To align with this objective, we propose benchmarking peptide-TCR\u0026beta; predictors using a virtual screening formulation, which more closely reflects real-world applications than either the reshuffling or background-drawing strategies (\u003cstrong\u003eMethods\u003c/strong\u003e). This benchmarking strategy evaluates all possible pairings between a given peptide and the entire TCR\u0026beta; repertoire, with particular focus on the early enrichment of true binding pairs at the top ranks. In the Enrichment plots (\u003cstrong\u003eFigs. 2g-i\u003c/strong\u003e), PanPep demonstrated higher success rates at the top ranks than DlpTCR and ERGO-II in the majority, few-shot, and zero-shot settings. This confirms that PanPep identifies TCR\u0026beta; binders for a given peptide more effectively than the control tools, despite its limitations in zero-shot classification with reshuffling negatives. The Boltzmann Enhanced Discrimination of Receiver Operating Characteristic (BEDROC, see \u003cstrong\u003eMethods\u003c/strong\u003e) (\u003cstrong\u003eFigs. 2j-l\u003c/strong\u003e), which emphasizes early ranks, along with Hit rates (\u003cstrong\u003eExtended Data Figs. 
1g-i\u003c/strong\u003e), consistently confirms PanPep\u0026rsquo;s advantage over the control tools.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCase 2: Challenging PanPep\u0026rsquo;s Generalization with a New Independent Dataset\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eTo evaluate PanPep\u0026rsquo;s generalizability, we constructed an independent test set containing both unseen peptides and novel TCR\u0026beta; sequences that are not present in its original training data (\u003cstrong\u003eMethods\u003c/strong\u003e). Since this dataset was also unseen by DlpTCR and ERGO-II, we first performed a fair zero-shot comparison across all tools, including PanPep\u0026rsquo;s meta-learner and distilled versions. Enrichment plots, BEDROC scores, and Hit rates (\u003cstrong\u003eFigs. 3ab, Extended Data Fig. 4a\u003c/strong\u003e) show that PanPep-meta and PanPep-distill ranked more ground-truth TCR\u0026beta; binders at the top than the two control tools, underscoring the strength of its meta-training in zero-shot scenarios. Classification metrics from 100-fold cross-validation (\u003cstrong\u003eFigs. 3cd, Extended Data Figs. 4bc\u003c/strong\u003e) further support PanPep\u0026rsquo;s superior generalization. However, in the top 1% of the ranked TCR\u0026beta; candidates (57,099,6 candidates), fewer than 10% of known TCR\u0026beta; binders were recovered, suggesting that its performance remains inadequate for practical deployment.\u003c/p\u003e\n\u003cp\u003eIn the majority group (\u003cstrong\u003eFigs. 3e-h, Extended Data Figs. 4d-f\u003c/strong\u003e), PanPep\u0026rsquo;s task-adapted models outperformed both control tools, with significant gains in BEDROC (p \u0026lt; 0.05), ROC-AUC (p \u0026lt; 0.0001), and PR-AUC (p \u0026lt; 0.001). A similar trend was observed in the few-shot setting, with consistent improvements across all metrics (\u003cstrong\u003eFigs. 3i-l, Extended Data Figs. 4h-j\u003c/strong\u003e). 
Adapted models significantly outperformed their meta-learners (p \u0026lt; 0.001), particularly in the majority group, benefiting from a larger number of support examples (\u0026gt;100 for majority vs. 3-5 for few-shot) and more training iterations (1000 for majority vs. 3 for few-shot). The benefit of task adaptation was further supported by comparing zero-shot performance on the entire dataset (\u003cstrong\u003eFigs. 3a-d\u003c/strong\u003e) with that of the adapted models in the majority and few-shot settings (\u003cstrong\u003eFigs. 3e-l\u003c/strong\u003e). In the zero-shot group (\u003cstrong\u003eFigs. 3m-p\u003c/strong\u003e and \u003cstrong\u003eExtended Data Figs. 4k-m\u003c/strong\u003e), PanPep achieved strong performance in both virtual screening and classification with background-drawing negatives. In contrast, under the reshuffling strategy (\u003cstrong\u003eFigs. 3dhlp\u003c/strong\u003e), PanPep performed close to random and was outperformed by the control tools, which were trained with reshuffling negatives. These results again highlight the distributional gap between the two negative sampling strategies and its substantial impact on model performance.\u003c/p\u003e\n\u003cp\u003eTo probe the source of PanPep\u0026rsquo;s generalization, we split the independent dataset into two subsets: (1) unseen peptides paired with TCR\u0026beta;s from PanPep\u0026rsquo;s repertoire (i.e., TCR\u0026beta;s present in PanPep\u0026rsquo;s training data), and (2) unseen peptides paired with novel TCR\u0026beta;s absent from PanPep\u0026rsquo;s repertoire (the unknown-unknown case), which is a stricter test of generalization to truly novel combinations\u003csup\u003e15\u003c/sup\u003e. In this more challenging scenario, the performance of PanPep declined significantly across both classification and virtual screening metrics, as did that of its meta-learner (\u003cstrong\u003eFigs. 3q-t\u003c/strong\u003e).
This gap suggests that PanPep\u0026rsquo;s success depends heavily on the TCR\u0026beta; repertoire it learned during training.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTraining-Level Reproducibility\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCase 3: Retrained PanPep and Evaluated on the Independent Dataset\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe tested training-level reproducibility by retraining PanPep\u0026rsquo;s model weights using 10-fold cross-validation on the peptides in its original training set and then evaluating whether the retrained models consistently matched the performance of the original model. This process yielded ten independently trained PanPep models, each evaluated on the same independent test dataset (\u003cstrong\u003eMethods\u003c/strong\u003e). To ensure a fair comparison, we first assessed these models in the zero-shot setting, alongside the two control tools.\u003c/p\u003e\n\u003cp\u003eIn the virtual screening evaluation (\u003cstrong\u003eFigs. 4ab, Extended Data Fig. 5a\u003c/strong\u003e), PanPep maintained superior performance across metrics, including early Success rates, Hit rates, and BEDROC scores. Classification results (\u003cstrong\u003eFigs. 4cd, Extended Data Figs. 5bc\u003c/strong\u003e) showed that all ten reproduced models consistently outperformed DlpTCR and ERGO-II, with 5.6-21.4% improvements in ROC-AUC and PR-AUC. While the performance of the original PanPep model fell within the range of the ten reproduced models, we observed considerable variance (1.4-21.6%) across them. This indicates that although PanPep is reproducible, its performance is sensitive to how the training data are split. Furthermore, across the ten reproductions, PanPep-meta consistently outperformed PanPep-distill.
These results suggest that the distillation process in PanPep may require further refinement to realize its full potential in reproducible applications.\u003c/p\u003e\n\u003cp\u003eThe ten reproduced PanPep models consistently outperformed the control tools in Success rates, Hit rates, and BEDROC scores across the majority, few-shot, and zero-shot groups (\u003cstrong\u003eFigs. 4eimfjn, Extended Data Figs. 5dgj\u003c/strong\u003e). However, their classification performance (\u003cstrong\u003eFigs. 4gkohlp, Extended Data Figs. 5efhikl\u003c/strong\u003e) declined under the reshuffling negative strategy, particularly in the zero-shot setting. This suggests that PanPep\u0026rsquo;s reliance on background-drawing negatives during training may reduce its robustness to more challenging negative samples. Task adaptation, especially in the majority and few-shot settings, can improve performance (\u003cstrong\u003eFigs. 4gkhl, Extended Data Figs. 5efhi\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReusability in Peptide-TCR\u0026alpha; Binding Recognition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCase 4: Extended PanPep to Peptide-TCR\u0026alpha; Binding\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe TCR\u0026alpha; chain contributes to peptide-MHC recognition but is insufficient on its own for strong binding\u003csup\u003e16\u003c/sup\u003e. Because public TCR\u0026alpha; data are limited, peptide-TCR\u0026alpha; binding prediction is a small-data problem and thus a useful test of PanPep\u0026rsquo;s extendibility. We derived a PanPep-TCR\u0026alpha; dataset from the DlpTCR and ERGO-II studies (\u003cstrong\u003eMethods\u003c/strong\u003e), excluding peptides with fewer than three binders to meet PanPep\u0026rsquo;s meta-training requirements.
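The exclusion rule can be illustrated with a minimal Python sketch. It groups binding pairs by peptide, drops peptides with fewer than three distinct binders (the threshold stated in the text), and splits each remaining peptide's binders into support and query sets. The helper name `build_meta_tasks`, the 50% query fraction, and the example sequences are illustrative assumptions, not PanPep's actual code or split sizes.

```python
import random
from collections import defaultdict

MIN_BINDERS = 3  # minimum binders per peptide, as stated in the text


def build_meta_tasks(pairs, min_binders=MIN_BINDERS, query_fraction=0.5, seed=0):
    """Group (peptide, tcr) binding pairs into per-peptide meta-learning tasks.

    Peptides with fewer than `min_binders` distinct binders are dropped.
    The support/query split rule (query_fraction) is a hypothetical choice.
    """
    binders = defaultdict(set)
    for peptide, tcr in pairs:
        binders[peptide].add(tcr)

    rng = random.Random(seed)
    tasks, dropped = {}, []
    for peptide in sorted(binders):
        tcrs = sorted(binders[peptide])  # sort first so the shuffle is reproducible
        if len(tcrs) < min_binders:
            dropped.append(peptide)
            continue
        rng.shuffle(tcrs)
        # Keep at least one query example and at least one support example.
        n_query = min(max(1, int(len(tcrs) * query_fraction)), len(tcrs) - 1)
        tasks[peptide] = {"support": tcrs[n_query:], "query": tcrs[:n_query]}
    return tasks, dropped


# Illustrative pairs (CDR3 strings are placeholders, not curated data).
pairs = [
    ("GILGFVFTL", "CASSIRSSYEQYF"),
    ("GILGFVFTL", "CASSLAPGATNEKLFF"),
    ("GILGFVFTL", "CASSPDRGNTEAFF"),
    ("NLVPMVATV", "CASSLAPGATNEKLFF"),
]
tasks, dropped = build_meta_tasks(pairs)
print(sorted(tasks), dropped)  # ['GILGFVFTL'] ['NLVPMVATV']
```

With three binders, the sketch keeps two for support and one for query, which is the smallest task that still has both parts.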
We conducted 10-fold cross-validation for PanPep using varied data sampling and benchmarked it only against DlpTCR, as ERGO-II does not support TCR\u0026alpha; prediction.\u003c/p\u003e\n\u003cp\u003eIn the majority group (\u003cstrong\u003eFigs. 5a-d, Extended Data Figs. 6a-c\u003c/strong\u003e), PanPep\u0026rsquo;s task-adapted models were competitive with DlpTCR in Enrichment plots, Hit rates, and BEDROC scores. ROC-AUC and PR-AUC results from balanced classification evaluations showed similar trends. Notably, under the reshuffling negative strategy, PanPep retained predictive power, whereas DlpTCR struggled. In the few-shot group (\u003cstrong\u003eFigs. 5e-h, Extended Data Figs. 6d-f\u003c/strong\u003e), PanPep showed only a limited advantage, with just 4 of 10 adapted models outperforming DlpTCR in BEDROC. In the zero-shot setting, both models achieved comparable enrichment, although PanPep lagged behind in classification metrics (\u003cstrong\u003eFigs. 5i-l, Extended Data Figs. 6h-j\u003c/strong\u003e). This may be due to the smaller size and lower diversity of PanPep\u0026rsquo;s training data (156 peptides with fewer than three TCR\u0026alpha; binders were excluded to satisfy PanPep\u0026rsquo;s meta-training requirements). Additionally, performance variance among the 10 models was high (2.7%-36% in \u003cstrong\u003eFig. 5\u003c/strong\u003e and \u003cstrong\u003eExtended Data Fig.
6\u003c/strong\u003e), suggesting that data scarcity also hinders model robustness in peptide-TCR\u0026alpha; prediction.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReusability in Peptide-TCR\u0026alpha;\u0026beta; Binding Recognition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eCase 5: Extended PanPep to Peptide-TCR\u0026alpha;\u0026beta; Binding\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThe CDR3\u0026alpha; and CDR3\u0026beta; loops together form the functional interface for peptide recognition and stable binding to peptide-MHC (pMHC) complexes\u003csup\u003e16\u003c/sup\u003e. However, most public datasets provide only TCR\u0026beta; sequences because the \u0026beta; chain is easier to sequence, which complicates peptide-TCR\u0026alpha;\u0026beta; binding prediction. A practical workaround is to combine separate peptide-TCR\u0026alpha; and peptide-TCR\u0026beta; predictors to infer peptide-TCR\u0026alpha;\u0026beta; interactions\u003csup\u003e8\u003c/sup\u003e. To test PanPep\u0026rsquo;s reusability in a realistic biological context, we paired the reproduced 10-fold PanPep models trained on TCR\u0026alpha; and TCR\u0026beta; data, creating ten predictors for peptide-TCR\u0026alpha;\u0026beta; binding (\u003cstrong\u003eMethods\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003eFollowing the DlpTCR and ERGO-II protocols, we applied these models in a zero-shot setting to a peptide-TCR\u0026alpha;\u0026beta; test set derived from their studies. The Enrichment plots, BEDROC scores, and Hit rates (\u003cstrong\u003eFigs. 6ab, Extended Data Fig. 7a\u003c/strong\u003e) show that PanPep\u0026rsquo;s 10-fold models consistently outperformed DlpTCR and ERGO-II, further supporting PanPep\u0026rsquo;s extendibility and reusability. Notably, PanPep-meta again outperformed the PanPep-distill models (p \u0026lt; 0.0001 in \u003cstrong\u003eFig. 6b\u003c/strong\u003e).
This trend was also reflected in the 10-fold classification evaluations (p \u0026lt; 0.001 in \u003cstrong\u003eFigs. 6cd\u003c/strong\u003e and \u003cstrong\u003eExtended Data Figs. 7bc\u003c/strong\u003e). We further applied task adaptation to the majority and few-shot groups using their peptide-TCR\u0026alpha;\u0026beta; support data. Only one peptide belonged to the majority group; for this peptide, DlpTCR outperformed both PanPep and ERGO-II in virtual screening metrics (\u003cstrong\u003eFigs. 6ef, Extended Data Fig. 7d\u003c/strong\u003e). Few PanPep models surpassed ERGO-II after task adaptation, possibly reflecting the difficulty of coordinating separately trained TCR\u0026alpha; and TCR\u0026beta; models; this was also evident in the classification results (\u003cstrong\u003eFigs. 6gh, Extended Data Figs. 7ef\u003c/strong\u003e). In the few-shot group, most PanPep models outperformed the control tools, although task adaptation yielded inconsistent gains with notable variance (\u003cstrong\u003eFigs. 6i-l, Extended Data Figs. 7g-i\u003c/strong\u003e). In the zero-shot group, PanPep retained its advantage, but PanPep-distill still underperformed PanPep-meta (\u003cstrong\u003eFigs. 6m-p, Extended Data Figs. 7j-l\u003c/strong\u003e). However, PanPep\u0026rsquo;s early Success rates of ~24% and ROC-AUCs/PR-AUCs of ~0.55 indicate that peptide-TCR\u0026alpha;\u0026beta; binding prediction remains an unsolved challenge in real-world biological contexts.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we demonstrated that the reported performance of PanPep can be reproduced using the provided model weights, data, and training protocol. In addition to the classification evaluation, we comprehensively assessed its performance in a virtual screening setting.
Compared with the two control tools, PanPep showed clear advantages for its meta-learner and in the few-shot and zero-shot settings, especially on an independent dataset consisting of newly released antigens and their TCRβ binders. This demonstrates its generalizability to unseen antigens with few or no known TCR binders, which remains a bottleneck in the field. Beyond reproduction, we successfully reused PanPep’s code to build predictors for peptide-TCRα and peptide-TCRαβ binding, extending its scope to more physiologically relevant contexts. These results highlight PanPep’s progress in antigen-TCR interaction modeling.\u003c/p\u003e\n\u003cp\u003eThis study also revealed several limitations in PanPep’s current design. First, PanPep demonstrated limited early enrichment of TCR binders (e.g., within the top 0.1% in our virtual screening evaluations), reflecting persistent challenges in real-world antigen-TCR screening. Second, the high variance observed across cross-validation folds suggests that the imbalance between TCR binders and non-binders remains a significant issue. Third, the marked performance decline on unseen peptide-unseen TCRβ combinations indicates PanPep’s limited generalizability to novel TCRs. Moreover, PanPep’s few-shot adaptation and zero-shot distillation did not consistently outperform the meta-learner, suggesting that pre-learned knowledge may be degraded during fine-tuning, a phenomenon known as catastrophic forgetting\u003csup\u003e17\u003c/sup\u003e. While PanPep aimed to relate unseen antigens to learned tasks and to create a zero-shot predictor using a distilled Neural Turing Machine, achieving this goal requires a universal antigen representation and a distillation process that preserves predictive power.
In its current form, PanPep distills all task-specific models into just three virtual representations, which may not sufficiently capture the diversity of the task space and may thus limit adaptability to novel tasks.\u003c/p\u003e\n\u003cp\u003eA promising future direction is to exploit the scaling behavior\u003csup\u003e18\u003c/sup\u003e of molecular foundation models\u003csup\u003e19\u003c/sup\u003e, whose contextual representations of amino-acid sequences improve as model and data size increase under self-supervised objectives such as masked language modeling\u003csup\u003e20\u003c/sup\u003e or autoregressive modeling\u003csup\u003e21\u003c/sup\u003e. Unlike meta-learning, these models do not rely on task partitions and can facilitate broader generalization. Representations of antigens and peptides derived from large-scale corpora may offer more robust support for meta-learning and zero-shot task modeling. Additionally, techniques such as Elastic Weight Consolidation\u003csup\u003e17\u003c/sup\u003e and Parameter-Efficient Fine-Tuning\u003csup\u003e22,23\u003c/sup\u003e may help preserve generalization and mitigate catastrophic forgetting during adaptation. We also recommend developing negative sampling strategies that combine broad repertoire coverage with representative reshuffled peptide-TCR pairs while excluding cross-reactive cases. Such strategies would help regularize model training, mitigate overfitting to peptide- or TCR-specific features, and ultimately enhance the robustness of meta-learning and task adaptation.\u003c/p\u003e\n\u003cp\u003eThis study provides a comprehensive evaluation of PanPep’s reusability in virtual screening and of its extension to TCRα and TCRαβ prediction tasks, revealing both its strengths and limitations. Our optimized implementation of PanPep supports multi-GPU parallelization to accelerate modeling and inference on the full TCR repertoire.
This work also lays the foundation for evaluating future antigen-TCR binding predictors, as well as related models such as those for HLA-antigen, HLA-antigen-TCR, and protein-protein interactions.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe dataset used by PanPep is publicly available on Zenodo (https://doi.org/10.5281/zenodo.7544387), and our newly curated dataset has been deposited on Zenodo as well (https://doi.org/10.5281/zenodo.16943691).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe original code of PanPep is available at https://github.com/bm2-lab/PanPep. Our code to run the reproducibility results and to analyze the reusability is available via GitHub at https://github.com/coffee19850519/PanPep_Reusability.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank Kai Liu, Qiuyu Lv, and Zhiyuan Yang for their technical support. This work was funded by the National Institutes of Health (NIH) R35GM126985.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eD.X. and F.H. conceived and designed the study. X.W. and F.H. developed the code, conducted the evaluation, and created the visualizations. F.H. and D.X. drafted and revised the manuscript. D.X. supervised the study. All authors reviewed and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBraun, D. A. \u003cem\u003eet al.\u003c/em\u003e A neoantigen vaccine generates antitumour immunity in renal cell carcinoma. 
\u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e639\u003c/strong\u003e, 474\u0026ndash;482 (2025).\u003c/li\u003e\n\u003cli\u003eHudson, D., Fernandes, R. A., Basham, M., Ogg, G. \u0026amp; Koohy, H. Can we predict T cell specificity with digital biology and machine learning? \u003cem\u003eNat. Rev. Immunol.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 511\u0026ndash;521 (2023).\u003c/li\u003e\n\u003cli\u003eHuang, H., Wang, C., Rubelt, F., Scriba, T. J. \u0026amp; Davis, M. M. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 1194\u0026ndash;1202 (2020).\u003c/li\u003e\n\u003cli\u003eZhang, H., Zhan, X. \u0026amp; Li, B. GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 4699 (2021).\u003c/li\u003e\n\u003cli\u003eSidhom, J.-W., Larman, H. B., Pardoll, D. M. \u0026amp; Baras, A. S. DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 1605 (2021).\u003c/li\u003e\n\u003cli\u003eQue, J. \u003cem\u003eet al.\u003c/em\u003e Identifying T cell antigen at the atomic level with graph convolutional network. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 5171 (2025).\u003c/li\u003e\n\u003cli\u003eXu, Z. \u003cem\u003eet al.\u003c/em\u003e DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor. \u003cem\u003eBrief. Bioinform.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, bbab335 (2021).\u003c/li\u003e\n\u003cli\u003eSpringer, I., Tickotsky, N. \u0026amp; Louzoun, Y. 
Contribution of T Cell Receptor Alpha and Beta CDR3, MHC Typing, V and J Genes to Peptide Binding Prediction. \u003cem\u003eFront. Immunol.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 664514 (2021).\u003c/li\u003e\n\u003cli\u003eLu, T. \u003cem\u003eet al.\u003c/em\u003e Deep learning-based prediction of the T cell receptor\u0026ndash;antigen binding specificity. \u003cem\u003eNat. Mach. Intell.\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, 864\u0026ndash;875 (2021).\u003c/li\u003e\n\u003cli\u003ePeng, X. \u003cem\u003eet al.\u003c/em\u003e Characterizing the interaction conformation between T-cell receptors and epitopes with deep learning. \u003cem\u003eNat. Mach. Intell.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 395\u0026ndash;407 (2023).\u003c/li\u003e\n\u003cli\u003eGao, Y. \u003cem\u003eet al.\u003c/em\u003e Pan-Peptide Meta Learning for T-cell receptor\u0026ndash;antigen binding recognition. \u003cem\u003eNat. Mach. Intell.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 236\u0026ndash;249 (2023).\u003c/li\u003e\n\u003cli\u003eFeng, Z. \u003cem\u003eet al.\u003c/em\u003e Sliding-attention transformer neural architecture for predicting T cell receptor\u0026ndash;antigen\u0026ndash;human leucocyte antigen binding. \u003cem\u003eNat. Mach. Intell.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 1216\u0026ndash;1230 (2024).\u003c/li\u003e\n\u003cli\u003eYu, C., Fang, X., Tian, S. \u0026amp; Liu, H. A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR molecules. \u003cem\u003eNat. Mach. Intell.\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 278\u0026ndash;292 (2025).\u003c/li\u003e\n\u003cli\u003eGao, Y., Gao, Y., Dong, K., Wu, S. \u0026amp; Liu, Q. Reply to: The pitfalls of negative data bias for the T-cell epitope specificity challenge. \u003cem\u003eNat. Mach. 
Intell.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1063\u0026ndash;1065 (2023).\u003c/li\u003e\n\u003cli\u003eDens, C., Laukens, K., Bittremieux, W. \u0026amp; Meysman, P. The pitfalls of negative data bias for the T-cell epitope specificity challenge. \u003cem\u003eNat. Mach. Intell.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 1060\u0026ndash;1062 (2023).\u003c/li\u003e\n\u003cli\u003eZareie, P. \u003cem\u003eet al.\u003c/em\u003e Canonical T cell receptor docking on peptide\u0026ndash;MHC is essential for T cell signaling. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e372\u003c/strong\u003e, eabe9124 (2021).\u003c/li\u003e\n\u003cli\u003eKirkpatrick, J. \u003cem\u003eet al.\u003c/em\u003e Overcoming catastrophic forgetting in neural networks. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e114\u003c/strong\u003e, 3521\u0026ndash;3526 (2017).\u003c/li\u003e\n\u003cli\u003eKaplan, J. \u003cem\u003eet al.\u003c/em\u003e Scaling Laws for Neural Language Models. Preprint at https://doi.org/10.48550/arXiv.2001.08361 (2020).\u003c/li\u003e\n\u003cli\u003eLin, Z. \u003cem\u003eet al.\u003c/em\u003e Evolutionary-scale prediction of atomic-level protein structure with a language model. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e379\u003c/strong\u003e, 1123\u0026ndash;1130 (2023).\u003c/li\u003e\n\u003cli\u003eDevlin, J., Chang, M.-W., Lee, K. \u0026amp; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Preprint at https://doi.org/10.48550/arXiv.1810.04805 (2018).\u003c/li\u003e\n\u003cli\u003eOpenAI \u003cem\u003eet al.\u003c/em\u003e GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2024).\u003c/li\u003e\n\u003cli\u003eHoulsby, N. \u003cem\u003eet al.\u003c/em\u003e Parameter-Efficient Transfer Learning for NLP. Preprint at https://doi.org/10.48550/arXiv.1902.00751 (2019).\u003c/li\u003e\n\u003cli\u003eDing, N. \u003cem\u003eet al.\u003c/em\u003e Parameter-efficient fine-tuning of large-scale pre-trained language models. \u003cem\u003eNat. Mach.
Intell.\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 220\u0026ndash;235 (2023).\u003c/li\u003e\n\u003cli\u003eXiong, G.-L. \u003cem\u003eet al.\u003c/em\u003e Improving structure-based virtual screening performance via learning from scoring function components. \u003cem\u003eBrief. Bioinform.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, bbaa094 (2021).\u003c/li\u003e\n\u003cli\u003eVita, R. \u003cem\u003eet al.\u003c/em\u003e The Immune Epitope Database (IEDB): 2024 update. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, D436\u0026ndash;D443 (2025).\u003c/li\u003e\n\u003cli\u003eGoncharov, M. \u003cem\u003eet al.\u003c/em\u003e VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 1017\u0026ndash;1019 (2022).\u003c/li\u003e\n\u003cli\u003eTickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. \u0026amp; Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 2924\u0026ndash;2929 (2017).\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003eReproducibility Test Setup\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn the reproducibility test, we followed PanPep\u0026rsquo;s classification evaluation protocol and additionally conducted a virtual screening evaluation. For classification, we applied PanPep\u0026rsquo;s balanced sampling strategy by selecting an equal number of unbound and bound TCRs for each peptide to construct the test set. PanPep and the control tools were evaluated on these balanced subsets. To ensure robustness, we performed 100-fold cross-validation, which allowed broader coverage of negative pairs. 
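The balanced protocol can be sketched in plain Python. For one peptide, the sketch draws as many negatives as there are known binders, either from a background repertoire (background-drawing) or from the binders of other peptides (our reading of reshuffling). It then scores one round with a rank-based ROC-AUC and with average precision, a common PR-AUC estimator. The function names, tie handling, and toy data are assumptions for illustration. The published evaluation may compute these metrics differently, for example with scikit-learn.

```python
import random


def sample_negatives(peptide, binders, background, strategy, rng):
    """Draw as many non-binding TCRs for `peptide` as it has known binders.

    binders: dict mapping peptide -> set of binding TCRs
    strategy: "background" (draw from a background repertoire) or
              "reshuffle" (reuse binders of other peptides; assumed reading)
    """
    own = binders[peptide]
    if strategy == "background":
        pool = [t for t in background if t not in own]
    elif strategy == "reshuffle":
        others = {t for p, tcrs in binders.items() if p != peptide for t in tcrs}
        pool = sorted(others - own)  # never reuse this peptide's own binders
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    if len(pool) < len(own):
        raise ValueError("not enough candidate negatives")
    return rng.sample(pool, len(own))


def roc_auc(pos, neg):
    """Probability that a random binder outscores a random non-binder (ties = 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(pos, neg):
    """Mean precision at the rank of each binder (a PR-AUC estimate)."""
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                    key=lambda x: -x[0])  # stable sort: tied binders stay first
    hits, total = 0, 0.0
    for rank, (_, is_binder) in enumerate(ranked, start=1):
        if is_binder:
            hits += 1
            total += hits / rank
    return total / len(pos)


# Toy example: two peptides and a tiny background repertoire.
binders = {"PEP_A": {"T1", "T2"}, "PEP_B": {"T3", "T4", "T5"}}
background = ["T1", "B1", "B2", "B3", "B4"]
rng = random.Random(0)
neg_bg = sample_negatives("PEP_A", binders, background, "background", rng)
neg_rs = sample_negatives("PEP_A", binders, background, "reshuffle", rng)
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

Repeating the sampling with fresh random draws and averaging the two metrics gives a repeated balanced evaluation. For a repertoire of tens of millions of sequences, sampling indices and rejecting known binders would be cheaper than filtering the whole list as done here.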
Performance was assessed using ROC-AUC and PR-AUC, consistent with PanPep\u0026rsquo;s original report.\u003c/p\u003e\n\u003cp\u003eThe virtual screening evaluation offers a more comprehensive assessment by testing a model\u0026rsquo;s ability to achieve early enrichment. Unlike PanPep\u0026rsquo;s classification evaluation, which relies on balanced subsets, virtual screening considers all possible peptide-TCR pairs, thereby minimizing bias from subsampling. Early enrichment reflects how well a model ranks known true binders near the top of the list, which is essential for improving experimental efficiency. To assess this, we report the Enrichment plot, Hit rate, and BEDROC\u003csup\u003e24\u003c/sup\u003e. An Enrichment plot visualizes how effectively a model ranks true binders at the top of a sorted list of candidate TCRs for a given peptide: candidate TCRs from an entire repertoire are sorted by predicted binding likelihood, and the cumulative proportion of true binders recovered is plotted against the proportion of the ranked list examined. Here, Hit rate refers to the proportion of true TCR binders retrieved in the top-k predictions. BEDROC applies exponentially greater weight to true binders that appear at earlier ranks.
Its formula is defined as follows:\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAa4AAAAuCAYAAABwMv32AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAAArSSURBVHhe7d3PayPlHwfw93zvbZnkpiKS6UUWCUjqwloFF7YTV+lli8mhhwUXywTxsti1CYsXU5viDxCtqeBxSVYUFmHCpsJ6yA/YuocUCgo6PUjxlEnc/QceD2ae78yTpNvWJpu07xcMdJ7nSSZtk+czz89oQggBIiKiMfE/NYGIiGiUMXAREdFYYeAiIqKxwsBFRERjhYGLiIjGCgMXERGNFQYuIiIaKwxcREQ0Vhi4iIhorDBw0bG4rov19XWEQiE1i4hooBi46Mh2dnbw2WefYWpqCu12W80+UalUCqFQCKVSCaFQCDMzM2oRIjpjGLjoyKLRKLLZLM6dO6dmnai9vT0sLS3BMAzUajW0Wq2BB0oiGn0MXGPCdV0kk0lomoaZmRm4rovp6Wns7e2pRUfC+vo6NE071hGPxwEAkUgEzzzzDNrtNrLZLHZ2djA3N6deiojOGAauMRGPxxEKhSCEwNzcHG7evIn3338fkUhELToSFhYWoOs6AMC2bQghHns4jgPDMALPc//+fRmsyuUyotEoSqVSoAwRnS0MXGOgVCqh3W5jY2MDADA1NSW70UZVJBLBjz/+CABYXFw8VMswEongo48+CqTVajW8+uqrQOf3XllZwdNPPx0oQ0RnCwPXGNjd3UUikZDn9+7dw3vvvRcoM4pmZ2eRy+XQbrcDr/8gzz//fOA8m80imUwCAJaWltBqtRCNRgNliOhsYeAaE+12G67rIpPJ4I8//sDk5CTW19fVYkP16NEjNanL8vIyTNPEgwcPkMlk1Owu0WgUd+/eVZOJiCQGrjFgmiaKxSLOnz+Pt956C++88w7m5+dhmqZadGji8TgWFxcBANPT0wcGpVu3bkHXdayurp6a8Sn/ZJJRniRDdCoJoiGoVCoCgNB1XTiOo2aPJNu2hWEYQtd1Ydu2iMViotlsCiGESKfTolAoCCGEsCxL5HI55dFEpLJtW1iWJT9H/Tyu3LEDl2maAoA8dF2XH2Thq6jUw/8BV/N6vVD1OoZhiEajESjTaDREIpGQr6NXJeIvA0DEYjG1CA1YLpcbm7+94zhC13XRaDRkAPO/7wzDEI7jCMdxRCwWE5VKRQghRD6f73p/EpEQhUJBWJYVSHMcR1iWJUzTDKSLTp1tmmZXTBD/JXB5gcmTz+cD56ITdPxBJJ1Oyw+46FRk3gv2KgD1F1Cvk06nA2UajYbQdV3k83l5DiBQeXiVUC6XE81mU16Lhs+7Eel1czFsB91c5XI5+Z6ybVuk02n5OO895t0o2bYdyOv1QSM6yxzHEYZhBNJs2xa5XK5v4BKdGNGrrjj2GNejR48CYyyXLl0K5KOzaPbChQvyPJvNYnZ2Vp4/fPgQFy9eBDpToT///HOUy2W4rivL7O/vB2akvfDCC/JnALhx4waSyaScGh6NRmGaJsrlsiyTSqUwNzeH5eVlhMNhRCIR/PLLL75noWHxxrtu3LiBarWqZg/V7Oxs11oyIQSWl5cB37KDWq2Gqakp7OzsAAC+++47WJYFIQReeukl7O7uAp2dPq5cuYJwOBy4DtFZ9/3333fNLL58+TKWl5fx3HPPBdL9FhYWsLa2piYff4wrnU7LSOg4jjBNM3BX2mw2ZUup2WyKRCIh8zz+LhaP2lqyLEve+VYqFWEYhrzDdRxHAOgaMzFNUz6mX5mjUrss1YMOzz/eNYzWSbPZFJZlCV3Xu1pI/di2LdDpvvZaWIVCIdBKE53uDwCiUqmIRqPR1RVCRP/Wn/0+d/6et168Lnu/Y9e4akWuNuf
Ubhh/UPOoFb7aLSg6Ywnec/iDluh0T/bq8vMqkoPK0OP1+r+eFG+8a1DP7xeLxeT4qf+G66Tl83l5w0RE/2eaZlcjxfO4wNXrscfuKiyXy3AcB0IIVCoVrK2tBaYE1+t12Z1i23ZXF1+1WkUsFguk1ev1QNre3h4cx0Gz2USj0UCr1cL58+dl/p9//tm1d12xWISu67JLsleZYVH34Bv14zDUx6jHYZmmCV3XBz6l37/ryP7+Pra2tvDss8+qxU7Mzs7OE+8CJTrtjhW4qtUqDMOQ++TNzs6i3W7jr7/+kmXu3bsnt+q5fPmy3P3AU6/XAwHFdV2sra3h+vXrMm17exuxWAzhcBjRaBShUAi//vqrzG+325iampLnAPDpp5/igw8+kOeO43SVOY54PN5VST+uwlbHTkb9UAnfeI8/7aDjMFzXxbVr1/Dxxx8PfBeM3d1dOI4DTdNw7do1XL9+veu9eFKWlpawsbERGMclon8dZsOCXra3tzExMRFIO1bgqtfrcqDNdV2kUqlAKwedFpk3YcN1XczMzMjBbXQC28svvwx07orj8TiSyWSgUrlz507g+5disRjq9bo8j0aj+Oabb+C6rnwdAPD222/LMq+99posAwCbm5uBxbLelyFOT0+jVCr1rdTu3r3bVUkftcKmf928eROGYQxtr8V0Og0hBG7fvo07d+6o2UQ0YBcvXkStVlOTgc4kvX68XryuG9xAx+EhNJvNwLgTAJFIJAKDZ97UeP/hnwrpn07s5anjDt7gOJTxKnU9jWVZslyvdWDexBD/tbznKxQKclGpN81efTydrEKhIAzDGNrf2ZuGi876MbWvnIgGz/sc+j/33jotr26OxWKBtcDigOnwRw5cp4lpmjIIptPpvrNe/iuvsh6VStP/hsnn84G/wyB5a+6GcS0iGi29FiAfxKunet3kHqur8DSZmJhAtVqF4ziYnJwMdGeehFQqhcnJSTiOo2Y9MVeuXMHVq1fRbDaxsrKCq1evdjfFT9gwx7WIaPQkk0m8+eabckjnIKVSCZubm7h161bPdZGaOMMDNJlMBqurqygUCnj48CFWVlbw888/D6Ri1TQNlUplKAP31WoVr7zyipqMXC6HCxcu4Msvv0SxWITruojH40NZjJ1KpdBqtVAsFtUsIqIjOdMtrmw2CyGE3Hlj3L7ryZuQEgqFEAqF5M7rj9sRIhQKwXVd/PTTTwiHw0duZXrX9AKfpmkH7vpeKpWwtbWFr776Ss0iIjqyMx24xl08HgcA/P7777AsS249dJCnnnoKxWIR8Xgcly5dwvb2NjY3N9VifZVKJSwtLaHdbuPbb7/FxsYG0ul032vv7e1hcXERP/zwQ88mfy+hUChwnslkjvQaieiUUwe9aDD8syN78c+yVI9eq8q9HctFZxCz14ycQfG28/ImWViW1XdiSywW6zkrqB9vi67jyufzPbcXI6LTgy2uEaF26/mPXt8IPMyFtar79+/DNE3Zrbq1tRXY0cSTyWQQDoe7FjEf5JNPPgmce+vsDsMbP/v777/VLCI6RRi4hsC/2/1JelILa2u1mtzV3xsf29/fD0y8KJVKWF1dRblc7tpl5KDj66+/hmEY8nkWFhYCO6yo5TVNk12myWQS586dk2WJ6HRi4BqwTCYjWyPz8/Oykv2vFhYWcPv2bWiahkQigXfffVctMjBbW1vy62omJibQarWwubkZaPF98cUXvkcczfT0tPz5t99+w4svvijP1dZovxYpEZ1eZ3o6PI2+TCaD119/HejMltR67AtpmqYMXqVSCR9++OFQpvgT0ZPBFheNvPn5ebnJptra8re41tfX8cYbb+DBgwfQNI27tBOdUmxxERHRWGGLi4iIxgoDFxERjRUGLiIiGisMXERENFYYuIiIaKz8Aw0oErG/XXnxAAAAAElFTkSuQmCC\" width=\"430\" 
height=\"46\"\u003e\u003c/p\u003e\n\u003cp\u003eWhere \u003cem\u003en\u003c/em\u003e is the number of TCR binders and \u003cem\u003eN\u003c/em\u003e is the total number of screened TCRs. The term \u003cimg src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADQAAAAfCAYAAACoE+4eAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAAAJ7SURBVFhH7Ze/bvpADMe/4QEqATsDN1aIJUhVuze8QbIzhQdgyM6Q8AJ060YqdehChcRI2jkD62ViDH/EC/g3/OBEnAioCoRW/UgebF+ic+zzORoREX4RBW746fwFdO38BXTt/AV07fzYgJ6enmBZFjfnE5Dv+9A0Db7vo9FooNls8iV78X0fALBarbgrn4Bubm7gOA6en5/x8vKi7EEQqM3uw7Is3N7ecvN/KCd0XScpZcImpVQ2ACkxDEOtnUwmCX1LLhmaz+dYLpeoVqsJm2maSieilIxGI+Vfr9eYz+dKVxARGYaR+BJCCArDkAd/MobDIdm2zc2k6zo3ZeK6bmK/k8lE+UCb9O1Wn+M4mek8J2EYkmma3PxlCgAwm80S6a7VartJvBir1eqoprAXIiLbtqnf7xNtsiWEoOFwyIPPhJcrl0sDIiIhROL8HBvMNQIpJQGgOI4pDEMqFosUx3FikeM4KoOngGfxFKLePRgMEt1FCJHoGoe4tpIrvL29odFoqDOl6zo+Pz+V7nkeSqWS0jmj0Sh1X+xKHhB2enm/30/cQ1LKk7TTS6HRgc/4/v6O6XSKTqfDXVfJwdHn4+MD9/f3CIKAu75FEATQNA3tdjuhe57Hl6aIogjNZjN7TzxlHMdxqFgsnmUU2r5721WPGX2klLRtZFnN62CGut0uFosF6vU6d32LKIpQqVRg2zZeX18RRREeHx/5shTVahWWZaFcLnMXcEzJnYvxeIy7uzu0Wi30ej2Mx2M8PDwAm86qaVpKMkuMw1N2KXbLyzAMEkKkLvR96LqeOdHkEtB2/Hddl2jzO/GV6X734ubPHWzbP43cztC5+AepTnxxbGPOeAAAAABJRU5ErkJggg==\" width=\"52\" height=\"31\"\u003enormalizes the rank \u003cem\u003er\u003csub\u003ei\u0026nbsp;\u003c/sub\u003e\u003c/em\u003eof the \u003cem\u003ei\u003csup\u003eth\u003c/sup\u003e\u003c/em\u003e TCR binder. The parameter 𝛼 controls the emphasis on early top ranks, with a common choice in our work 𝛼=20 placing approximately ~80% of the weight on the top 1% ranks.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe virtual screening evaluation required extensive computation due to the large number of peptide-TCR pairs. To accelerate this process, we optimized PanPep\u0026rsquo;s code to support multi-GPU parallelism. 
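The ranking metrics defined above can be made concrete with a short CPU-only sketch. It is not the cuML-based implementation mentioned in the text.

- `binder_ranks` turns model scores into the 1-based ranks r_i of the true binders.
- `hit_rate` gives the fraction of binders recovered within the top k.
- `enrichment_curve` evaluates that fraction at several screened proportions.
- `bedroc` follows the standard Truchon-Bayly form, the same one used by RDKit's `CalcBEDROC`. We assume this matches the formula shown above.

The function names are hypothetical.

```python
import math


def binder_ranks(scores, labels):
    """Return 1-based ranks of true binders after sorting by descending score.

    Python's sort is stable, so tied scores keep their input order.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [rank for rank, i in enumerate(order, start=1) if labels[i]]


def hit_rate(ranks, k):
    """Fraction of all true binders found within the top-k ranked candidates."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def enrichment_curve(ranks, n_total, fractions):
    """(screened fraction, recovered fraction) points for an Enrichment plot."""
    return [(f, hit_rate(ranks, int(f * n_total))) for f in fractions]


def bedroc(ranks, n_total, alpha=20.0):
    """BEDROC from 1-based binder ranks among n_total screened TCRs."""
    n = len(ranks)
    if n == 0 or n == n_total:
        raise ValueError("need at least one binder and one non-binder")
    ra = n / n_total
    # Robust Initial Enhancement: observed vs. uniformly expected exponential weight.
    observed = sum(math.exp(-alpha * r / n_total) for r in ranks)
    expected = ra * -math.expm1(-alpha) / math.expm1(alpha / n_total)
    rie = observed / expected
    # Rescale RIE so that the best ranking gives 1 and the worst gives 0.
    factor = ra * math.sinh(alpha / 2) / (
        math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra))
    constant = 1 / (1 - math.exp(alpha * (1 - ra)))
    return rie * factor + constant


scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 1, 0, 1]
ranks = binder_ranks(scores, labels)
print(ranks, hit_rate(ranks, 2))  # [2, 4] 0.5
```

`math.expm1` keeps the expected-weight term accurate when N is in the tens of millions and alpha/N is tiny. Sorting tens of millions of scores per peptide in pure Python would be slow, which is one motivation for the GPU implementation described in the text.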
All peptides were divided into \u003cem\u003ek\u003c/em\u003e groups, where\u0026nbsp;\u003cem\u003ek\u003c/em\u003e equals the number of available GPUs, and each GPU processed one group. For each peptide, its corresponding TCR pairs were batched and processed on the assigned GPU. This parallelized workflow was executed on a machine with 8 GPUs, a 56-core CPU, and 512 GB of physical memory, using a batch size of 150. Virtual screening metrics were implemented using the cuML Python package to leverage GPU-based matrix operations. These improvements enabled efficient and timely execution of the virtual screening evaluations.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTraining and Test Data Provided by PanPep\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn the inference-level reusability evaluation, we utilized PanPep\u0026rsquo;s original test dataset, which includes 276 peptides and their 34,711 TCR\u0026beta; binders, forming a total of 36,487 peptide-TCR\u0026beta; binding pairs. The dataset was categorized into majority, few-shot, and zero-shot groups, comprising 25, 122, and 129 peptides, respectively. For the balanced classification evaluation, an equal number of non-binding TCR\u0026beta; sequences were either randomly sampled from a background repertoire of 57,107,565 TCR\u0026beta; sequences or generated by reshuffling the 34,711 known binders, corresponding to different negative sampling strategies.\u0026nbsp;In the virtual screening evaluation, each peptide was tested against the entire background repertoire of 57,107,565 TCR\u0026beta; sequences to identify and rank the most likely binding candidates.\u003c/p\u003e\n\u003cp\u003eFor the training-level reusability evaluation, PanPep\u0026rsquo;s original training dataset was divided into 10 folds, each containing 188 peptides with varying proportions of majority, few-shot, and zero-shot samples. These folds were used to retrain PanPep\u0026rsquo;s meta-learner under 10-fold cross-validation. 
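A peptide-level split like the one described here can be sketched as follows. The unique peptides are shuffled once and dealt into ten disjoint folds, and each of the ten models is trained on the other nine folds. Training on nine folds per model, the helper names, the seed, and the placeholder peptide IDs are our assumptions. The text states only that each fold contains 188 peptides, which implies 1,880 peptides in total.

```python
import random


def peptide_folds(peptides, k=10, seed=0):
    """Shuffle unique peptides once and deal them into k disjoint folds."""
    unique = sorted(set(peptides))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique)
    return [unique[i::k] for i in range(k)]


def cross_validation_splits(peptides, k=10, seed=0):
    """Yield (training peptides, held-out peptides) for each of the k folds."""
    folds = peptide_folds(peptides, k, seed)
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, held_out


peptides = [f"PEP{i:04d}" for i in range(1880)]  # placeholder peptide IDs
splits = list(cross_validation_splits(peptides))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 1692 188
```

Splitting on peptides rather than on individual pairs keeps every pair of a held-out peptide out of training. Because the split is not stratified, each fold ends up with a different mix of majority, few-shot, and zero-shot peptides, consistent with the varying proportions mentioned in the text.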
During training, balanced negative sampling was applied by selecting non-binding TCR\u0026beta; sequences from the same background library for each peptide, consistent with PanPep\u0026rsquo;s original negative sampling protocol. The resulting 10 meta-learner models were then evaluated on PanPep\u0026rsquo;s original test dataset using both classification and virtual screening metrics.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIndependent Testing Data Construction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe followed PanPep\u0026rsquo;s data curation protocol to retrieve all available human HLA class I-restricted peptide-TCR\u0026beta; binding records from the IEDB\u003csup\u003e25\u003c/sup\u003e, VDJdb\u003csup\u003e26\u003c/sup\u003e, and McPAS-TCR\u003csup\u003e27\u003c/sup\u003e databases; the PIRD database was excluded because it has recently become inaccessible. PanPep\u0026rsquo;s quality-control criteria were then applied to remove low-confidence records. After excluding PanPep\u0026rsquo;s original training and evaluation data, the remaining records were used as the positive set for our independently curated benchmark dataset. This dataset includes 670 unique peptides and 4,362 unique TCR\u0026beta; sequences, forming 4,377 peptide-TCR\u0026beta; binding pairs. Following PanPep\u0026rsquo;s task definitions, these peptides were grouped into majority, few-shot, and zero-shot categories containing 4, 150, and 516 peptides, respectively. The corresponding non-binding TCR\u0026beta; set was constructed either by randomly sampling from PanPep\u0026rsquo;s background repertoire of 57,107,565 TCR\u0026beta; sequences or by reshuffling the 4,362 known binders, depending on the chosen negative sampling strategy.\u003c/p\u003e\n\u003cp\u003eAfter pooling all data sources, we identified 11,550 novel TCR\u0026beta; sequences that were not present in PanPep\u0026rsquo;s original TCR\u0026beta; repertoire. 
This allowed us to construct a subset in which both the peptides and the TCR\u0026beta; sequences are unseen, and to use it to evaluate PanPep\u0026rsquo;s generalization in a completely novel setting. Specifically, these 11,550 TCR\u0026beta; sequences were treated as an unseen TCR\u0026beta; library, and the 391 unseen peptides with known binders in this library yielded 1,991 peptide-TCR\u0026beta; binding pairs. All non-binding pairs between these peptides and the unseen TCR\u0026beta; library were used as negative samples in this subset.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConstruction of Peptide-TCR\u0026alpha; and TCR\u0026alpha;\u0026beta; Binding Datasets for Reusability Test\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo extend PanPep to peptide-TCR\u0026alpha; binding prediction, we applied its meta-training framework to DlpTCR\u0026rsquo;s training data\u003csup\u003e7\u003c/sup\u003e, which included 273 unique peptides and 4,508 unique TCR\u0026alpha; sequences forming 4,922 binding pairs. A total of 156 peptides with fewer than three TCR\u0026alpha; binders were excluded because they did not meet the minimum support and query requirements of PanPep\u0026rsquo;s meta-training protocol. For evaluation, we compiled peptide-TCR\u0026alpha; binding records from the IEDB, VDJdb, and McPAS databases, together with the test data from the DlpTCR and ERGO-II studies. Both PanPep and DlpTCR were evaluated on this compiled test set (ERGO-II does not support peptide-TCR\u0026alpha; binding prediction), which contained 215 unique peptides and 1,126 unique TCR\u0026alpha; sequences with 14,436 binding pairs. Following PanPep\u0026rsquo;s task definitions, the test set was partitioned into 11 majority, 186 few-shot, and 931 zero-shot tasks. All TCR\u0026alpha; sequences from both the training and test data were pooled into a TCR\u0026alpha; library of 37,461 sequences. 
Negatives were constructed either by pairing peptides with sequences from this library or by reshuffling the 1,126 known binders, depending on the adopted sampling strategy.\u003c/p\u003e\n\u003cp\u003eSimilarly, we compiled peptide-TCR\u0026alpha;\u0026beta; binding records from IEDB, VDJdb, and McPAS, together with data from the DlpTCR and ERGO-II studies, yielding 286 unique peptides and 472 unique TCR\u0026alpha;\u0026beta; sequences with 723 documented interactions. Because peptide-TCR\u0026alpha;\u0026beta; data are scarce, all records were reserved exclusively for benchmarking. This dataset was categorized into 1 majority, 18 few-shot, and 267 zero-shot tasks. All TCR\u0026alpha;\u0026beta; sequences were consolidated into a library of 24,191 sequences. Negatives were generated either by pairing the peptides with sequences from this library or by reshuffling the 472 known binders, depending on the adopted sampling strategy.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExtending PanPep to Peptide-TCR\u0026alpha; and TCR\u0026alpha;\u0026beta; Binding Recognition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe used the training code provided by PanPep\u0026rsquo;s authors to perform meta-learning on the peptide-TCR\u0026alpha; dataset, modifying the input to accept a peptide sequence and the CDR3\u0026alpha; sequence of a TCR. The resulting TCR\u0026alpha;-oriented meta-learner was fine-tuned on task-specific support data for the majority and few-shot settings. For the zero-shot setting, peptide-TCR\u0026alpha; models were generated by distilling task learners from the meta-learning process, following PanPep\u0026rsquo;s zero-shot protocol. 
To evaluate the model\u0026rsquo;s stability, we conducted 10-fold cross-validation throughout the peptide-TCR\u0026alpha; binding modeling process.\u003c/p\u003e\n\u003cp\u003eFor peptide-TCR\u0026alpha;\u0026beta; binding prediction, the CDR3\u0026alpha; and CDR3\u0026beta; sequences were input separately into the PanPep-TCR\u0026alpha; and PanPep-TCR\u0026beta; models, along with the peptide sequence. The individual predictions from each model were averaged to generate a final binding score for the peptide-TCR\u0026alpha;\u0026beta; pair. For majority and few-shot tasks, the meta-learners trained on peptide-TCR\u0026alpha; and peptide-TCR\u0026beta; data were fine-tuned independently using their respective support sets. In the zero-shot setting, the distilled PanPep-TCR\u0026alpha; and PanPep-TCR\u0026beta; models were applied directly without further adaptation. To assess the overall stability of this approach, we also evaluated peptide-TCR\u0026alpha;\u0026beta; binding performance using the 10-fold cross-validated PanPep-TCR\u0026alpha; and PanPep-TCR\u0026beta; models.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReferences\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e22. Vita, R. \u003cem\u003eet al.\u003c/em\u003e The Immune Epitope Database (IEDB): 2024 update. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e53\u003c/strong\u003e, D436-D443 (2025).\u003c/p\u003e\n\u003cp\u003e23. Goncharov, M. \u003cem\u003eet al.\u003c/em\u003e VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 1017-1019 (2022).\u003c/p\u003e\n\u003cp\u003e24. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. \u0026amp; Friedman, N. 
McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 2924-2929 (2017).\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7456773/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7456773/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Accurate prediction of peptide-T-cell receptor (TCR) binding is vital for immunotherapy, vaccine design, and diagnostics. PanPep, a meta-learning framework, was developed to generalize diverse TCR binder predictions. This study presents a comprehensive and unbiased evaluation of PanPep’s reusability and practical utility. We reproduced its reported performance on original datasets and further benchmarked it against the control tools using both classification metrics and virtual screening enrichment evaluations. Leveraging a newly curated independent dataset, we have demonstrated PanPep’s superior generalization to unseen antigens with few or no known TCR binders. We further extended PanPep to peptide-TCRα and peptide-TCRαβ binding prediction, demonstrating its applicability in more biologically and physiologically relevant contexts. Despite its strengths, PanPep shows limitations in early binder enrichment and reduced robustness to novel TCRs, indicating sensitivity to training data composition and negative sampling strategies. This work establishes a reproducible and extensible benchmarking framework for general peptide-TCR binding prediction and related applications. 
Overall, our study suggests substantial room for improvement in TCR binder prediction, particularly concerning its practical applicability.","manuscriptTitle":"Reusability Report: Meta-Learning for Antigen-Specific T-Cell Receptor Binder Identification","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-19 07:00:55","doi":"10.21203/rs.3.rs-7456773/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-machine-intelligence","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"natmachintell","sideBox":"Learn more about [Nature Machine Intelligence](http://www.nature.com/natmachintell/)","snPcode":"","submissionUrl":"","title":"Nature Machine Intelligence","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Research","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"6add7d16-74de-4a54-bad6-8a360034ec8b","owner":[],"postedDate":"September 19th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":54951971,"name":"Biological sciences/Computational biology and bioinformatics/Software"},{"id":54951972,"name":"Biological sciences/Computational biology and bioinformatics/Computational models"}],"tags":[],"updatedAt":"2026-05-07T07:11:39+00:00","versionOfRecord":{"articleIdentity":"rs-7456773","link":"https://doi.org/10.1038/s42256-026-01236-6","journal":{"identity":"nature-machine-intelligence","isVorOnly":false,"title":"Nature Machine Intelligence"},"publishedOn":"2026-05-06 04:00:00","publishedOnDateReadable":"May 6th, 2026"},"versionCreatedAt":"2025-09-19 07:00:55","video":"","vorDoi":"10.1038/s42256-026-01236-6","vorDoiUrl":"https://doi.org/10.1038/s42256-026-01236-6","workflowStages":[]},"version":"v1","identity":"rs-7456773","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7456773","identity":"rs-7456773","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
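The early-enrichment metric in the Methods is given only as an equation image. Its description matches the standard BEDROC score of Truchon and Bayly (2007): n binders, N screened TCRs, ranks normalized by N, and an early-recognition parameter α. Assuming it is that score, a minimal Python sketch follows. This is not the authors' cuML-based implementation, and the function names `bedroc` and `top_weight_fraction` are introduced here for illustration. The second helper uses the continuous approximation (1 − e^(−αz)) / (1 − e^(−α)) to give the share of exponential weight that falls in the top fraction z of the ranked list.

```python
import math
from typing import Sequence


def bedroc(ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """BEDROC score (Truchon & Bayly, 2007) for the 1-based ranks of the true binders.

    ranks   -- ranks r_i of the n known binders among all screened TCRs
    n_total -- N, the total number of screened TCRs
    alpha   -- early-recognition parameter (alpha = 20 in the text)
    """
    n = len(ranks)
    if not 0 < n < n_total:
        raise ValueError("need 0 < n < N")
    ra = n / n_total  # fraction of binders in the screened library
    # Robust initial enhancement (RIE): mean exponential weight of the binders,
    # divided by its expected value under a uniformly random ranking.
    observed = sum(math.exp(-alpha * r / n_total) for r in ranks) / n
    expected = (1.0 - math.exp(-alpha)) / (n_total * (math.exp(alpha / n_total) - 1.0))
    rie = observed / expected
    # Affine rescaling of RIE so that the best ranking scores 1 and the worst 0.
    scale = ra * math.sinh(alpha / 2) / (
        math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra)
    )
    offset = 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
    return rie * scale + offset


def top_weight_fraction(alpha: float, z: float) -> float:
    """Share of the total exponential weight carried by the top fraction z of the list
    (continuous approximation)."""
    return (1.0 - math.exp(-alpha * z)) / (1.0 - math.exp(-alpha))


if __name__ == "__main__":
    N = 1000
    print(bedroc(range(1, 11), N))         # all 10 binders ranked first
    print(bedroc(range(N - 9, N + 1), N))  # all 10 binders ranked last
    print(top_weight_fraction(20.0, 0.08), top_weight_fraction(20.0, 0.01))
```

Under this weighting, α = 20 puts roughly 80% of the weight in the top 8% of the list and about 18% in the top 1%. For cross-checking, RDKit provides a reference `CalcBEDROC` in `rdkit.ML.Scoring.Scoring`.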
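The multi-GPU screening workflow in the Methods can be sketched as a peptide partition followed by per-peptide batching. This is an illustration under stated assumptions, not the modified PanPep code. The round-robin split, the helper names, and the `score_batch` callable are all assumptions; `score_batch` is a hypothetical stand-in for one PanPep forward pass on a given GPU. Only k equal to the number of GPUs and the batch size of 150 come from the text.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical scorer: (peptide, batch of CDR3s, GPU index) -> one score per CDR3.
ScoreBatch = Callable[[str, List[str], int], List[float]]


def partition_peptides(peptides: Sequence[str], k: int) -> List[List[str]]:
    """Split peptides into k near-equal groups, one per GPU (round-robin, an assumption)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    groups: List[List[str]] = [[] for _ in range(k)]
    for i, peptide in enumerate(peptides):
        groups[i % k].append(peptide)
    return groups


def batched(items: Iterable[T], batch_size: int = 150) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items (150 in the text)."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def screen_group(gpu: int, group: Sequence[str], tcr_library: Sequence[str],
                 score_batch: ScoreBatch, batch_size: int = 150) -> Dict[str, List[float]]:
    """Score every TCR in the library against each peptide assigned to one GPU."""
    results: Dict[str, List[float]] = {}
    for peptide in group:
        scores: List[float] = []
        for batch in batched(tcr_library, batch_size):
            scores.extend(score_batch(peptide, batch, gpu))
        results[peptide] = scores
    return results
```

In a real run, each group would typically go to its own worker process pinned to one device, so the k groups are scored concurrently. That orchestration is omitted here.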
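The Methods use two negative sampling strategies: random draws from a background repertoire, and reshuffling of the known binders. A small balanced sampler illustrates both. The rejection-sampling loop, the one-negative-per-positive rule, and all names here are assumptions for illustration. The text does not specify how PanPep or this study implemented the draw.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (peptide, CDR3 sequence)


def sample_negatives(positives: Sequence[Pair],
                     background: Optional[Sequence[str]] = None,
                     strategy: str = "random",
                     seed: int = 0,
                     max_tries: int = 1000) -> List[Pair]:
    """Draw one non-binding TCR per positive pair, giving a balanced dataset.

    strategy="random":    sample TCRs from a background repertoire
    strategy="reshuffle": re-pair peptides with the known binders of other peptides
    """
    rng = random.Random(seed)
    binders: Dict[str, Set[str]] = defaultdict(set)
    for peptide, tcr in positives:
        binders[peptide].add(tcr)

    if strategy == "random":
        if background is None:
            raise ValueError("the random strategy needs a background repertoire")
        pool = list(background)
    elif strategy == "reshuffle":
        pool = sorted({tcr for _, tcr in positives})
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    negatives: List[Pair] = []
    for peptide, _ in positives:
        # Rejection sampling: skip TCRs already known to bind this peptide.
        for _ in range(max_tries):
            tcr = rng.choice(pool)
            if tcr not in binders[peptide]:
                negatives.append((peptide, tcr))
                break
        else:
            raise RuntimeError(f"no non-binder found for {peptide}")
    return negatives
```

Negatives may repeat in this sketch. For a repertoire of roughly 57 million sequences, it would be more practical to sample indices into an on-disk array than to copy the repertoire into a Python list.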
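Two bookkeeping steps from the Methods are sketched below: dropping peptides with fewer than three TCRα binders, and splitting data for 10-fold cross-validation. The threshold of 3 comes from the text. Splitting at the peptide level, shuffling with a fixed seed, and the function names are assumptions, since the text does not state how the folds were drawn.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (peptide, CDR3 sequence)


def filter_min_binders(pairs: Sequence[Pair], min_binders: int = 3) -> List[Pair]:
    """Keep only pairs whose peptide has at least min_binders distinct TCR binders."""
    by_peptide: Dict[str, Set[str]] = defaultdict(set)
    for peptide, tcr in pairs:
        by_peptide[peptide].add(tcr)
    keep = {p for p, tcrs in by_peptide.items() if len(tcrs) >= min_binders}
    return [(p, t) for p, t in pairs if p in keep]


def peptide_kfold(peptides: Sequence[str], k: int = 10, seed: int = 0) -> List[List[str]]:
    """Assign each distinct peptide to exactly one of k folds (shuffled, then striped)."""
    unique = sorted(set(peptides))
    random.Random(seed).shuffle(unique)
    return [unique[i::k] for i in range(k)]
```

Fold i would serve as the held-out set while the remaining folds train the meta-learner. How each peptide's binders are then divided into support and query sets is not shown.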
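For the unseen-peptide, unseen-TCRβ subset, every non-binding combination of the 391 peptides and the 11,550-sequence library served as a negative. That is 391 × 11,550 = 4,516,050 candidate pairs in total. The sketch below therefore streams the complement of the known positives with a generator instead of building the full list. The function name is hypothetical.

```python
from itertools import product
from typing import Iterable, Iterator, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (peptide, CDR3 sequence)


def exhaustive_negatives(positives: Iterable[Pair], peptides: Sequence[str],
                         tcr_library: Sequence[str]) -> Iterator[Pair]:
    """Yield every (peptide, TCR) combination that is not a known binding pair."""
    known: Set[Pair] = set(positives)
    for pair in product(peptides, tcr_library):
        if pair not in known:
            yield pair
```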
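The peptide-TCRαβ score in the Methods is the mean of the two chain-specific predictions. The minimal sketch below assumes each model is exposed as a hypothetical callable that maps a peptide and one CDR3 sequence to a binding probability. The actual PanPep interface may differ.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: (peptide, CDR3) -> predicted binding probability.
Scorer = Callable[[str, str], float]


def score_alpha_beta(peptide: str, cdr3_alpha: str, cdr3_beta: str,
                     alpha_model: Scorer, beta_model: Scorer) -> float:
    """Average the chain-specific predictions into one peptide-TCRαβ score."""
    return 0.5 * (alpha_model(peptide, cdr3_alpha) + beta_model(peptide, cdr3_beta))


def score_many(peptide: str, receptors: Sequence[Tuple[str, str]],
               alpha_model: Scorer, beta_model: Scorer) -> List[float]:
    """Score a list of (CDR3α, CDR3β) receptors against one peptide."""
    return [score_alpha_beta(peptide, a, b, alpha_model, beta_model) for a, b in receptors]
```

Weighting the two chains equally follows the text; it implicitly treats the α and β models as equally informative.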