High-Dimensional Semi-Parametric Model Statistical Inference for Model-Free and FDR Controlled Risk Feature Selection

preprint OA: closed
Authors: Xue-ting Song, Zi-tong Lu, Yu-fan Gao, Jian Xiao (corresponding author)
Affiliation: Zhongnan University of Economics and Law

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-4182761/v1
License: CC BY 4.0
Status: Posted. Version 1, posted May 2nd, 2024 on Research Square.

Abstract

Developing high-dimensional statistical inference methods to identify the individual features associated with a response is very important for analyzing large-scale datasets from economics, finance, medicine, cancer studies, and bioinformatics. However, modern data sets are often not only high-dimensional but also exhibit unknown relationships between the response and its explanatory features. Reliable statistical inference depends on accurate modeling of the observed data. Such an involved modeling task can be handled by a state-of-the-art semi-parametric model with few model assumptions. In this paper, based on a high-dimensional semi-parametric model, we use estimators of the unknown parameters' directions together with the symmetrized data aggregation approach to develop a novel, model-free feature selection method that achieves fine-mapping of risk features while controlling the false discovery rate (FDR) of the selection. The proposed method can be applied to the analysis of both continuous and discrete response data sets. Simulation studies demonstrate that the proposed method has robust feature selection performance while controlling the FDR very well. The analysis of a real ocean microbiome data set indicates that our method is indeed effective at detecting risk features.

Subject areas: Biological sciences/Computational biology and bioinformatics; Physical sciences/Mathematics and computing
Keywords: Statistical inference, Semi-parametric model, Feature selection, FDR control, Symmetrized data aggregation approach

Additional Declarations: No competing interests reported.
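
As a rough illustration of the symmetrized data aggregation idea named in the abstract, the sketch below shows a generic thresholding rule of that family. It is not taken from the preprint, whose full text is not reproduced here. It assumes that each feature j has two independently computed statistics (hypothetical inputs t1 and t2, for example from two halves of the data) and that their product w_j is symmetric about zero for null features, so the count of large negative w_j estimates the number of false positives among large positive w_j. The function names and the level alpha are illustrative assumptions.

```python
import numpy as np


def symmetrized_stats(t1, t2):
    """Combine two per-feature statistics into one symmetrized statistic.

    Assumption: t1 and t2 come from independent data halves, so for a null
    feature the product is equally likely to be positive or negative.
    """
    return np.asarray(t1, dtype=float) * np.asarray(t2, dtype=float)


def sda_select(w, alpha=0.1):
    """Select features from symmetrized statistics w at nominal FDR level alpha.

    Returns (selected_indices, threshold). If no threshold meets the level,
    returns an empty selection and an infinite threshold.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes of w, in ascending order.
    candidates = np.unique(np.abs(w[w != 0]))
    for t in candidates:
        n_neg = np.sum(w <= -t)  # proxy for false positives among w >= t
        n_pos = np.sum(w >= t)   # number of features that would be selected
        fdp_hat = (1 + n_neg) / max(n_pos, 1)
        if fdp_hat <= alpha:
            return np.flatnonzero(w >= t), float(t)
    return np.array([], dtype=int), float("inf")


if __name__ == "__main__":
    # Toy input: ten clearly positive statistics and four small ones near zero.
    w = [5, 4, 6, 7, 8, 9, 10, 11, 12, 13, -0.5, 0.3, -0.2, 0.1]
    selected, threshold = sda_select(w, alpha=0.2)
    print("threshold:", threshold)
    print("selected feature indices:", selected.tolist())
```

The rule returns the smallest threshold whose estimated false discovery proportion is at most alpha, so it selects as many features as the estimate allows. How the per-feature statistics are built from the semi-parametric direction estimators is specific to the paper and is not sketched here.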
