Dealing with differential misclassification of an outcome or a covariate in association studies with an internally validated sample. Application to the use of a serological test for the diagnosis of SARS-CoV-2 infection

Research Article

Júlia Lacombe Ossó, Benjamin Glemain, Céline Dorival, Hélène Blanché, Cédric Lemogne, Jean-François Deleuze, Olivier Robineau, Mathilde Touvier, Gianluca Severi, Marie Zins, Xavier Lamballerie, Fabrice Carrat

This is a preprint posted on Research Square; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6751937/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 01 Dec, 2025 in BMC Medical Research Methodology (https://doi.org/10.1186/s12874-025-02698-9). Version 1 posted 05 Jun, 2025.

Abstract

Background

To present an analytical framework for correcting misclassification when an imperfect test is used as an indicator of a disease in association studies, taking into account that part of the sample has joint test and disease data.

Methods

We explored two scenarios, depending on whether the disease is a covariate or the outcome. The analysis sample includes an internal validation sample in which the disease status is known in addition to the test result. We used joint likelihood models that take into account classification errors and the possibly non-random selection of the validation sample. Simulations were performed to evaluate the methods.

We illustrated our framework using data from a multi-cohort COVID-19 serological study conducted in France between 2020 and 2021, with serology as the imperfect test and SARS-CoV-2 infection as the disease. The dataset included concomitant measurements of the serological test and the SARS-CoV-2 infection status in 7% of participants. We estimated 1) the association between incident persistent symptoms (outcome) and SARS-CoV-2 infection (covariate) and 2) the association between infection (outcome) and several covariates. For comparison, we also estimated ‘naïve’ models using serology without correction, and models based solely on the validation sample.

Results

Simulations confirmed the methods’ ability to correct for misclassification and non-random selection of the validation sample. In the application, the estimated sensitivities and specificities of the serological test with respect to SARS-CoV-2 infection were 86.2% to 87.7% and 95.8% to 97.5%, respectively.
Considering SARS-CoV-2 infection as a covariate, the corrected analysis showed a significant association between infection and persistent symptoms, whereas the other analyses did not. Considering SARS-CoV-2 infection as the outcome, the corrected analysis confirmed the association between infection and age, gender and active smoking, but did not retrieve an association with living with at least one child at home or with previous smoking, both of which were identified in the naïve analysis.

Conclusion

This methodological framework can be applied in association studies when an imperfect test is used as an indicator of a disease and the disease status has been validated in a subset of the sample. We extended previous work to deal with non-random selection of this validated sample.

Registration: NCT04392388

Keywords: Epidemiologic Biases, Differential Misclassification, Imperfect Test, Sampling bias, Serology, SARS-CoV-2, Likelihood

Additional Declarations: No competing interests reported.

Supplementary Files: SupplementarymaterialsBMC.docx

Editorial history at the journal:
Editorial decision: Revision requested, 04 Jul, 2025
Reviews received at journal, 03 Jul, 2025
Reviews received at journal, 22 Jun, 2025
Reviews received at journal, 14 Jun, 2025
Reviewers agreed at journal, 06 Jun, 2025
Reviewers agreed at journal, 03 Jun, 2025
Reviewers agreed at journal, 03 Jun, 2025
Reviewers invited by journal, 03 Jun, 2025
Editor invited by journal, 30 May, 2025
Editor assigned by journal, 29 May, 2025
Submission checks completed at journal, 29 May, 2025
First submitted to journal, 26 May, 2025
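The Methods paragraph of the abstract describes joint likelihood models that combine validated and non-validated participants. The abstract does not give the authors' model, so the following is only a minimal sketch of the general idea for the scenario where the disease is the outcome, under strong simplifying assumptions that are not taken from the paper: a single covariate `x`, a logistic model for the disease, a sensitivity `se` and specificity `sp` that are constant across participants (that is, non-differential misclassification), and a validation sample whose selection can be ignored. The function and parameter names (`log_likelihood`, `b0`, `b1`, `records`) are hypothetical.

```python
import math


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(params, records):
    """Log-likelihood for a misclassified binary outcome with internal validation.

    ASSUMPTIONS (illustrative, not the paper's model): constant se/sp and
    ignorable selection into the validation sample.

    params  -- (b0, b1, se, sp): intercept, slope, test sensitivity, test specificity
    records -- iterable of (x, t, d); t is the test result (0/1) and d is the
               true disease status (0/1), or None when it was not validated
    """
    b0, b1, se, sp = params
    total = 0.0
    for x, t, d in records:
        p = expit(b0 + b1 * x)                 # P(D = 1 | x)
        p_t_d1 = se if t == 1 else 1.0 - se    # P(T = t | D = 1)
        p_t_d0 = 1.0 - sp if t == 1 else sp    # P(T = t | D = 0)
        if d is None:
            # Disease status unknown: sum over both possible values of D.
            contrib = p * p_t_d1 + (1.0 - p) * p_t_d0
        elif d == 1:
            contrib = p * p_t_d1
        else:
            contrib = (1.0 - p) * p_t_d0
        total += math.log(contrib)
    return total
```

In practice such a function would be passed to a numerical optimizer to estimate the coefficients together with `se` and `sp`. Handling differential misclassification and non-random selection of the validation sample, which is the focus of the paper, would require additional model components that this sketch deliberately omits.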