{"paper_id":"1ab746dd-0dbf-44d3-9eec-b01ca839b749","body_text":"Protocol Update: The Normative Modelling\nParadigm for Computational Psychiatry\nAugustijn A.A. de Boer 1,2, Johanna M.M. Bayer 1,2,3,\nCharlotte Fraza1,2, Alice Chavanne 1,2, Barbora Rehak Buckova 1,2,\nKonstantinos Tsilimparis 1,2, Emin Serin 1,2,4,5, Antoine Bernas 1,2,\nRamona Cirstian 1,2, Mariam Zabihi 1,2, Saige Rutherford 1,2,\nAbdool Al-Khaledi 1,6, Thomas Wolfers 7,8,\nChristian F. Beckmann 1,2, Seyed Mostafa Kia 1,2,9,\nAndre F. Marquand 1,2*\n1*Medical Neuroscience, Radboud University Medical Center, Geert\nGrooteplein Zuid 10, Nijmegen, 6525GA, The Netherlands.\n2Donders Center for Cognitive Neuroimaging, Kapittelweg 29,\nNijmegen, 6525EN, The Netherlands.\n3Montreal Neurological institute, McGill University, 3801 University\nStreet, Montreal, H3A 2B4, Canada.\n4Research Division of Mind and Brain, Department of Psychiatry and\nPsychotherapy CCM,Charit´ e-Universit¨ atsmedizin Berlin, Corporate\nMember of Freie Universit¨ at Berlin, Humboldt-Universit¨ at zu Berlin,\nand Berlin Institute of Health, Berlin, Germany.\n5German Center for Mental Health (DZPG), Partner Site Berlin -\nPotsdam, Germany.\n6Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.\n7Friedrich-Schiller University, Jena, Germany.\n8University of Tuebingen, Germany.\n9Research Center for Cognitive Science and Artificial Intelligence,\nTilburg University, Warandelaan 2, Nijmegen, 5037AB, The\nNetherlands.\n*Corresponding author(s). E-mail(s): andre.marquand@donders.ru.nl;\n1\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nAbstract\nNormative Modelling (’brain growth charting’) is now a well-established method\nfor computational psychiatry and involves charting centiles of variation across\na population in terms of mappings between biology and behavior, providing\nstatistical inferences at the level of the individual. These models have helped\nthe field to move away from case-control analysis toward individual-level analy-\nsis. Correspondingly, normative modelling has now been applied to chart brain\ndevelopment and ageing in many populations and has been used to quantify indi-\nvidual deviations across various neurological and psychiatric conditions. This has\nbeen supported by large-scale models that are openly accessible for diverse brain\nimaging modalities. As normative modelling continues to grow, several recent\nmethodological developments, such as non-Gaussian models, longitudinal models,\nand federated learning, have been implemented in different software tools, includ-\ning the Predictive Clinical Neuroscience toolkit (PCNtoolkit). In this protocol\nupdate, we provide: (i) a revised overview of this methodological landscape; (ii)\nan update to our 2022 standardised analytical protocol for normative modelling\nof neuroimaging data, including options for federated and longitudinal normative\nmodels; (iii) practical guidance suited to both novice and experienced practi-\ntioners supported by open-source code examples implemented in the refactored\nversion of PCNtoolkit; and (iv) updated models for cortical thickness, volumetric\ndata, diffusion-weighted imaging and longitudinal data for use by the community.\nKeywords:Normative Modeling, Computational Psychiatry, Precision Medicine,\nMachine Learning\n1 Introduction\n1.1 Development of the protocol\nA major goal in the field of computational psychiatry is to develop reliable biomarkers\nfor psychiatric disorders[1]. Traditional case-control approaches largely fail to provide\nmeaningful insight into psychiatric disorders because the rigid and categorical assump-\ntions of the methodology do not match the continuous and heterogeneous nature of\nthe biology [2].\nNormative Modelling (NM) has emerged as a mature statistical method for analyz-\ning heterogeneity in clinical cohorts[3]. The NM paradigm moves away from traditional\ncase-control approaches by mapping trajectories of healthy development – anatomical,\nneurological, behavioral – from a large reference dataset, and quantifies how individ-\nuals deviate from these norms. This is typically achieved by modelling biological or\nother response variables (e.g., regional brain volume) as a function of certain covari-\nates (e.g., age, sex, total brain volume), thereby rendering the deviation space mostly\nindependent of these variables.\nThe approach has been widely adopted in recent years. Large-scale normative\nmodels have been deployed [4–6] and applied to parse heterogeneity in several clinical\napplications [7, 8]. Currently, various methodological innovations have been proposed\nto improve the modelling of non-Gaussian data [9], longitudinal data [10–12], model\n2\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\ncomparison [13], and federated learning [14]. These innovations allow researchers to\nmore accurately estimate the mean and centiles of the population distribution, enable\nthe tracking of deviation scores over time, compare models to find which one gives the\nmost accurate deviation scores and update model parameters without sharing data.\nThe Predictive Clinical Neuroscience toolkit (PCNtoolkit) is an open-source\nPython package for NM, which was developed to fulfill two critical needs. First, to\nprovide a flexible and user-friendly software package for normative modelling, specif-\nically tailored to neuroimaging data, and second, to ease reproducibility efforts. The\ninitial distributions of the PCNtoolkit were published under the name nispat, with\nthe first version of the PCNtoolkit being released as version 0.15 in 2020. A protocol\nwas published for version 0.20 of the PCNtoolkit in 2022 [15], and continued develop-\nment led to the last release within this generation, version0.35 in April 2025. Ongoing\nmethod development and an increased adoption of the NM paradigm demanded a\nmajor refactoring of the PCNtoolkit to keep it usable and maintainable. This led to\nthe PCNtoolkit version 1.x.x, with an easier interface to familiar NM techniques,\nrefinement of model estimation and inference techniques in addition to new features\nsuch as longitudinal modelling and federated pipelines.\nThe present protocol provides an update to the 2022 protocol [15] and: (i) pro-\nvides an overview of the recommended workflow for normative modelling, illustrated\nwith examples of the main methods implemented in the latest PCNtoolkit version\n1.x.x, and (ii) accommodates the updated syntax and also showcases new features,\nincluding new estimation algorithms for modelling non-Gaussian distributions [9] and\nlongitudinal data [12]. In more detail, this protocol provides updated code examples\nfor the methods covered in the 2022 protocol, and extends the protocol by covering\nlongitudinal normative modelling, hierarchical Bayesian regression (HBR), model com-\nparison using the Watanabe–Akaike information criterion (WAIC) [13], data synthesis,\ndata harmonization, post-hoc-analysis suggestions, and federated learning methods;\ntransferring, extending, and merging models. See figure 1 for an overview of the\nprotocol.\nRather than a comprehensive PCNtoolkit guide, this protocol provides general\nrecommendations to brain charting through normative modelling, accompanied by\ncode examples written for the PCNtoolkit version 1.x.x. For a complete reference to\nthe PCNtoolkit, see the PCNtoolkit documentation and additional examples in the\nassociated code repository. Moreover, the general workflow is also applicable to other\nsoftware implementations [5, 6, 16, 17].\nThe protocol is structured into four workflows. The first workflow presents the basic\nsequential process that can be followed to fit and use NM with the PCNtoolkit. Next,\nwe provide a set of secondary workflows that can be applied in any order, but require\na fitted model (i.e., completed main workflow). The subsections on federated learning\nmethods require one or more pre-estimated models, and in some cases an independent\ntraining dataset. Finally, the final section focuses on community engagement.\nIn more detail, insection 1, the reader is guided through a complete workflow of\ncreating a set of normative models with PCNtoolkit. First, for completeness, we pro-\nvide a summary of the NM paradigm, what the goals of NM are, and how the NM\nparadigm can be applied to neuroimaging data. Then, we walk through the installation\n3\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nFig. 1: Overview of the protocol.\nand verification of the PCNtoolkit. This is followed by a step-by-step guide through\nthe process of creating a normative model. The user is guided through the selection\nand filtering of a dataset on the basis of clinical and demographic information, data\nmissingness, research question, and data modality. Then, we discuss Quality Control\n(QC) procedures and demonstrate how to use the PCNtoolkit to perform some basic\ndata preprocessing steps, such as outlier detection and cleaning, missingness detection\nand handling, train-test splitting, and data scaling. We select and configure the algo-\nrithm and model parameters, which depend on the data regime, model requirements,\nand the research question. We fit the model on the selected and prepared dataset,\nand discuss the fitting of NMs in parallel on a compute cluster, and the evaluation of\nmodels in a K-Fold cross-validation setting. We then verify the output and discuss the\ninterpretation of the resulting plots, results, and model evaluation metrics.\nIn section 2, we cover a variety of post-hoc analysis methods and extensions\nto normative modelling, including the following topics: longitudinal NM using Z-\ndiff scores and the velocity modelling framework, HBR model comparison using the\n4\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nWAIC [13], and how to use this framework for hypothesis testing, data synthesis, data\nharmonization, and other post-hoc-analysis ideas.\nIn section 3 we cover the three strategies for federated learning that are imple-\nmented in the PCNtoolkit: i) model transfer, which allows for transferring a reference\nmodel to a new local dataset (resulting in a smaller local model which can be used on\nthe local data), ii) model extension, which allows for extending a reference model with\ndata from a new dataset (resulting in a larger reference model which can be used on\nboth the new and the original dataset; and iii) model merging, which allows for merg-\ning two or more fitted models into a single model. We discuss the differences between\nthese strategies and when to use each one.\nIn section 4, we cover the community engagement aspect of the PCNtoolkit. We\nmention our GitHub repository, which contains all the source code, and how to interact\nwith the community. We also mention our contribution guidelines and how possible\ncontributions could look like.\n1.2 Applications of Normative Modelling\nNM is a general framework for deriving subject-level deviation scores from a reference\npopulation, and as a consequence, it can be applied to a wide range of (medical) data\nand research questions. Here we list a few:\n1. Derive a normative model from a reference population of healthy individuals,\nand draw further conclusions about the population based on the estimated model\nparameters and deviation scores. For example, in Gaiser et al. [18], a large-scale nor-\nmative model is fit to the cerebellum to understand cerebellar development across\nthe adolescent period.\n2. Transfer a reference NM trained on a large, healthy population to a new, smaller\nlocal dataset from a different population, and use it to derive biomarkers from the\nlocal clinical samples, for example, as is done in the context of psychosis in Worker\net al. [19].\n3. Use normative models fit to cross-sectional data to identify centile crossings in\nlongitudinal data, as is done in the context of schizophrenia in Rehak Buckova et al.\n[11] and in dementia in Bayer et al. [12].\n4. Use z-scores as features in downstream analysis, for example, as features in\nsupervised [20] or unsupervised [21] machine learning models.\n1.3 Comparison to other Methods\nWhen NM is viewed as a framework for deriving centiles of variation from a refer-\nence population, a wide range of statistical methods can serve as its foundation. For\nexample, the original Gaussian processes (GP) approach for NM [3, 22], but also other\nalgorithms can be used, including Bayesian Linear Regression (BLR) [23], Hierarchi-\ncal Bayesian Regression (HBR) [24, 25] and generalised additive models of location,\nscale and shape (GAMLSS) [5, 26]. Future developments may expand the range of\nmethods that can be seen as forms of NM even further. We will not compare these\nspecific methods here because all methods are highly flexible, depending on different\nparameterisations (e.g., the specific distributional form for the centiles, underlying\n5\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nbasis for modelling non-linearity, etc.) and whether the model is probabilistic or not.\nThese modelling choices are more important than the particular method chosen, and\nalso make direct comparison difficult. Rather, we believe that the most important\naspect of NM is not the underlying (regression) method itself, but the framework that\nit provides for analyzing deviation scores from a reference population. The important\nfeature of normative modelling is that it estimates centiles of variation in addition\nto the mean. In contrast, a regression model that is fit to data but does not pro-\nvide centiles of variation, such as a simple linear regression model, can be seen as\nan approximation to NM when viewed in this light. However, such simple methods\ndo not properly account for different variance components [7] and may not provide a\ngood fit of the estimated centiles to the underlying data, which can bias downstream\ninferences [9, 27]. Moreover, it is increasingly recognised that the choice of reference\ncohort is of equal importance to the choice of algorithm, because of the potential for\nintroducing demographic or racial bias [28, 29].\n1.4 Experimental Design\nSeveral important design decisions should be considered before applying this protocol.\nThe most fundamental choices in designing a normative modelling analytic strategy\nare the choice of response variables and covariates. Whilst many normative mod-\nels predict brain-derived measures based on age and sex, whilst accounting for site\n(e.g., using batch effects), this is not the only way such models can be employed.\nFor example, in Marquand et al. [3] a normative model was constructed to predict\nreward-related brain activity on the basis of delay discounting scores, which measure\nindividual variability in reward sensitivity and are sometimes taken as a proxy for\nimpulsivity.\nAnother important design consideration is the choice of the reference cohort and\ndata partitioning strategy (i.e., how to divide the available data into training and\ntest sets), which directly influences how the deviations from such a model should be\ninterpreted. Whilst these aspects have been covered elsewhere [7, 15], in brief, the ref-\nerence cohort should be chosen to be as representative as possible of the population\nof study interest. If a healthy cohort is chosen, then deviations should be understood\nwith respect to this cohort. Alternatively, a population-based cohort could be used\n(i.e., including individuals with brain disorders), but then the prevalence of each sub-\ngroup in the reference cohort (or training set) should be taken into consideration when\ninterpreting the target cohort (or test set), where the prevalence may be different. In\nmany cases (for example, for rare conditions), it may be desirable to ensure that a\nsufficient number of individuals having the phenotype of interest are retained in the\ntest set to maximise the possibility of detecting downstream effects. In other words,\nthe training and test sets should be stratified for the clinical labels.\n1.5 Regulatory Approvals\nIf the goal is to fit a normative model from scratch, the main regulatory considera-\ntion is permission to use the data used to fit the model, because in most instances\nit is necessary to aggregate cohorts to provide good coverage of the lifespan. This\n6\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\ncan either be performed using publicly available datasets or according to appropriate\nmaterial transfer agreements (MTAs). If the data cannot be shared, federated learning\ntechniques can be employed to estimate normative models without the need to share\ndata [14, 24]. Alternatively, if the goal is to transfer a pre-fit model to a new dataset,\nsuch approval is not necessary because this is done via the model coefficients, which\nare summary statistics that are no longer traceable to any individual participant.\n2 Materials\n2.1 Hardware\nComputing infrastructure required to run this protocol: a Mac, Linux or Windows\nwith installed Windows Subsystem for Linux (WSL) computer with enough space to\nstore the derived phenotypes from imaging data of the train and test set. In addition,\nan HPC infrastructure (Slurm or Torque) can be beneficial to parallelize fitting large\nnumbers of normative models (e.g., for high-resolution data)\nThe code examples in this protocol have been tested on a MacBook Pro with an\nM3 Chip and 18GB of RAM. The dataset used in these experiments is rather small,\nso everything should be possible to run on any computer with a similar or lower\nconfiguration. Experiments using a larger dataset will require more RAM, and we\nrecommend using a compute cluster for those cases (See section 3.2.9 for more details\nabout fitting models in parallel using the PCNtoolkit). The resources requested for\nthe jobs in these experiments are always 16 GB of RAM and 4 cores.\nConsidering the available hard disk space, we recommend having at least 10 times\nthe size of the dataset in free hard disk space. This is because the PCNtoolkit stores\nthe predicted centiles (5 centiles per subject per feature), the Z-scores and the log-\nprobability of the observed values (both 1 per subject per feature), which all together\nadd up to 7 times the size of the original dataset. The saved model files and the plots,\nwhich are generated by the PCNtoolkit, also take up additional space. However, these\nfunctionalities can be disabled if the user desires to save disk space.\n2.2 Software\nThe protocol requires Python 3.12 or a later installation and a few\ncore scientific packages. Python can be installed from the official\nPython website (https://www.python.org/downloads/) or via Anaconda\n(https://www.anaconda.com/download ). We recommend running the code examples\nwithin an isolated virtual environment, which can be created either with conda or the\nbuilt-in venv module, to ensure dependency consistency across platforms. Detailed\ninstallation instructions are available from the official documentation for each tool:\n• conda: https://docs.anaconda.com/anaconda/install/\n• venv: https://docs.python.org/3/tutorial/venv.html\nThe main package used in this protocol is PCNtoolkit (current ver-\nsion 1.x.x), which can be installed via pip following the instructions at\nhttps://pcntoolkit.readthedocs.io/en/latest/pages/quickstart.html.\n7\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nSoftware environment\nIn this manuscript, all analyses were performed in Python 3.12 using the follow-\ning core packages: PCNtoolkit 1.1.2; PyMC 5.25.1; ArviZ 0.22.0; PyTensor 2.31.7;\nNumPy 2.2.6; SciPy 1.16.1; scikit–learn 1.7.2; pandas 2.3.2; xarray 2025.9.0; xar-\nray–einstants 0.9.1; matplotlib 3.10.6; h5py 3.14.0; and h5netcdf 1.6.4. Optional\ncomponents used where noted include nutpie 0.15.2 (for accelerated sampling) and\nseaborn 0.13.2 (for figure styling).\nRandom seeds were fixed globally using np.random.seed(42) and\npymc.set\ntt rng(42)to ensure bitwise-reproducible sampling, data splitting, and\nposterior predictive simulations. All scripts and environment specifications are\navailable in the accompanying repository, allowing for exact replication of the\ncomputational environment.\n2.3 Data\nThe current protocol uses the FCON1000 dataset, which can be downloaded from\nhttps://fcon\n1000.projects.nitrc.org/.\n3 Procedure\n3.1 Installation and verification of the PCNtoolkit\nWe recommend using a virtual environment to run the PCNtoolkit. This can be done\nusing a Conda environment (conda) or a Python virtual environment (venv). The\nPCNtoolkit can be installed using pip:\npip install pcntoolkit\nAnd the installation can be verified by running the following command:\npython -c \"import pcntoolkit; print(pcntoolkit.__version__)\"\n3.2 Main workflow\n3.2.1 Data Selection\nData inclusion criteria\nA large reference – or normative – set is required to train a well-calibrated normative\nmodel. A generalizable normative model should incorporate two types of uncertainty:\nepistemic and aleatoric [7]. Epistemic uncertainty refers to the uncertainty in our\nknowledge, while aleatoric uncertainty arises from the natural variation in the popula-\ntion. The latter is what we aim to capture with our normative models; to model it as\naccurately as possible, the former has to be minimized. Because epistemic uncertainty\ndecreases with the size of the training data, it is desirable to have a large reference\ndata set. It is also important to consider the range of the covariates [30]. For example,\nsince many datasets have a limited age range, which can introduce collinearity with\n8\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nbatch or site effects, it is desirable to have overlap (i.e., multiple datasets spanning\nthe same age range) to assist in disentangling these effects.\nWhat can be considered a reference set depends on the research question. If the\ngoal is to parse deviations from a healthy population, the reference set should include\nonly healthy participants. If the goal is to parse heterogeneity within a clinical cohort,\none could choose to use a set of participants with that particular condition as the\nreference set. For instance, if one is interested in parsing heterogeneity within a clinical\ncohort of patients with major depressive disorder (MDD), the reference set could\ninclude MDD patients who have responded well to medication, and contrast them\nwith non-responders.\nWe will use cortical thickness imaging-derived phenotypes (IDPs) from several\ndifferent datasets, which we list below. The datasets are all pre-processed and included\nin this repository. All preprocessing was done using Freesurfer with the recon-all\ncommand. We extracted the mean cortical thickness values following the Destrieux\nparcellation [31].\nWe use the FCON1000 dataset, which is freely available at theFCON1000 website.\nIn addition, we will use several clinical datasets from the OpenNeuro portal:\n1. ds003568 – Contains data from 49 subjects (31F, 18M), aged between 12 and\n19 (mean 15.9, std 1.8). Clinical labels are Healthy Control (HC, 20) and Major\nDepressive Disorder (MDD, 29) [32].\n2. ds003653 – Contains data from 73 subjects (56F, 17M), aged between 19 and 44\n(mean 29.1, std 6.6). Clinical labels are HC (39) and MDD (33) [33].\nimport pandas as pd\nfcon_df = pd.read_csv(\n\"https://raw.githubusercontent.com/predictive-clinical-neuroscience/\n,→pu25_code/refs/heads/main/data/fcon1000.csv\"\n)\nopenneuro_df = pd.read_csv(\n\"https://raw.githubusercontent.com/predictive-clinical-neuroscience/\n,→pu25_code/refs/heads/main/data/openneuro.csv\"\n)\ndata = pcn.load_data(\"data/fcon1000.csv\")\nCovariates, batch effects, and response variables\nAs we will see, the input data generally consist of several covariates, batch effects,\nand response variables. The covariates are typically continuous variables, like age and\nheight, but can also be numerically encoded categorical variables (for example, using\none-hot-encoding), like sex. The batch effect dimensions are categorical variables, for\nexample, scan site or scanner brand/model. The response variables are continuous\nvariables, like brain volumes or brain activity patterns. All these variables must be\navailable for all subjects in the reference set.\nNormative models made using PCNtoolkit version1.x.xconsist of one or more\nlinear regressions on the selected covariates, together with a mechanism to handle\n9\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nbatch effects, the details of which depend on the model type. The choice of covariates\nand batch-effect variables depends on the research question, but they should gener-\nally include all factors known to influence the measurements and be available in all\ndatasets, such as age, sex, sleep quality, or site. Batch effects are used to group sub-\njects together in the same way that random effects are used in classical multilevel\nmodelling. Typically, batch effects are used to model site or scanner effects, but they\ncould also be applied to model ancestry or imaging field strength. Whilst batch effects\ncan be encoded as dummy encoded covariates, this is less flexible and is equivalent to\na fixed intercept model (i.e., only allowing shifts in the intercept). In contrast, using\nbatch effects allows the model to estimate separate slope, intercept and basis function\nparameters for each site with partial pooling across levels of the batch effects (see Kia\net al. [14, 24].\nThe primary goal in normative modelling is to compute deviation (Z) scores that\nare as independent as possible from these covariates and batch effects. For example,\nif the aim is to study the effect of meditation on brain activity, the normative model\nshould include all relevant covariates—such as age, sex, sleep quality, and scanner\nsite. By modelling and removing the variance explained by these known factors, the\nresulting deviation scores reflect variability that is most likely related to effects of\ninterest rather than demographic or technical influences.\nWhen selecting covariates, care must be taken when including variables that are\nhighly correlated with one another, as multicollinearity can inflate parameter uncer-\ntainty and make it difficult to interpret the contribution of individual covariates.\nHowever, this is principally relevant for applications where parameter interpretation is\nimportant. For example, if both sleep quality and stress level are included as predictors,\ntheir shared variance may obscure distinct effects on brain activity.\nsubject_id = \"sub_id\"\ncovariates = [\"age\"]\nbatch_effects = [\"site\", \"sex\"]\nresponse_variables = [ col for col in df.columns if col not in covariates +\n,→batch_effects + [subject_id]]\nresponse_variables = list(\nfilter(lambda x: df[x].var() > 0, response_variables)\n)# Remove variables with no variance\n3.2.2 Data Preparation\nPreparing the data for modelling involves a sequence of preprocessing steps designed to\nensure data quality and reproducibility. Note that standard neuroimaging preprocess-\ning steps such as spatial normalisation and/or cortical surface reconstruction should\nbe applied first. All subjects in a normative modelling analysis need to be in the same\nspace.\nSteps specific to normative modelling include handling missing values and outliers,\ncombining multiple data sources, splitting the data into training and test sets, and\nrescaling covariates and response variables (e.g., via standardisation). All procedures\n10\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nare implemented in the PCNtoolkit, with the NormData class providing dedicated\nfunctionality for data cleaning, harmonization, and preparation.\nPCNtoolkit automatically excludes subjects with missing values and identifies out-\nliers based on feature-wise Z-scores. The stringency of this filter is controlled via the\nz thresholdargument. In this tutorial, we setz threshold = 10 to illustrate the proce-\ndure while avoiding unnecessary data loss; however, for practical analyses, a different\nvalue may be selected to balance sensitivity and specificity. However, care needs to be\ntaken not to be overly aggressive with outlier removal as this may result in truncating\nthe empirical distribution, thereby making it difficult to model with a continuous dis-\ntribution. If a customized approach to missingness or outlier handling is desired, data\nmay also be preprocessed externally before being passed to PCNtoolkit.\nThe train–test split is performed using thetrain\ntest splitfunction from\nscikit-learn, with automatic stratification on batch-effect variables (e.g.,siteand\nsex) to maintain balanced distributions and to avoid demographic or acquisition drift\nbetween subsets. Data scaling is applied internally, and the corresponding scaling\ncoefficients are stored within the fitted model object to ensure that identical trans-\nformations are applied to all future data during prediction, or when transferring the\nmodel.\nfrom pcntoolkit import NormData\nimport numpy as np\nreference_norm_data = NormData.from_dataframe(\nname=\"fcon1000\",\ndataframe=df,\ncovariates=covariates,\nbatch_effects=batch_effects,\nresponse_vars=response_variables,\nsubject_ids=subject_id,\nremove_Nan=True,\nremove_outliers=True,\nz_threshold=10,\n)\n3.2.3 Combining Different Data Sources\nIn many cases, the reference data are supplemented with additional cohorts from other\nsources. Here, we demonstrate how to combine the FCON1000 dataset with clinical\ndatasets from OpenNeuro. The reference dataset has already been loaded above; we\nnow load the additional data into NormData objects, merge the healthy controls with\nthe reference set, and retain the patient subset for subsequent analysis.\n# Split into healthy and patient groups\nhealthy_idx = openneuro_df[\"group\"] == \"HC\"\nhealthy_group = openneuro_df[healthy_idx]\npatient_groups = openneuro_df[~healthy_idx]\n# Load the healthy and patient data\n11\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nhealthy_clinical_data = NormData.from_dataframe(\nname=\"openneuro_healthy\",\ndataframe=healthy_group,\ncovariates=covariates,\nbatch_effects=batch_effects,\nresponse_vars=response_variables,\nremove_Nan=True,\nremove_outliers=True,\n)\npatient_clinical_data = NormData.from_dataframe(\nname=\"openneuro_patient\",\ndataframe=patient_groups,\ncovariates=covariates,\nbatch_effects=batch_effects,\nresponse_vars=response_variables,\nremove_Nan=True,\nremove_outliers=True,\n)\n# Merge the healthy data with the reference data\nnorm_train = reference_norm_data.merge(healthy_clinical_data)\n# Check for schema compatibility before merging with the patient data\nif not norm_train.check_compatibility(patient_clinical_data):\nraise ValueError(\"The data are not compatible.\")\nelse:\nprint(\"The data are compatible.\")\n3.2.4 Train–Test Split\nThe reference set is divided into a training set used for model fitting and a test set\nreserved for model evaluation. Assessing the model on independent data is critical\nto ensure accurate estimates of generalisabilty. If the goal is to provide a reference\nmodel for new data, the model can be refit to the whole dataset once generalisability\nhas been established. A trade-off exists between model accuracy and the reliability\nof performance estimates: a larger training set improves model precision, whereas a\nlarger test set yields more stable validation metrics. We recommend an 80/20, 70/30\nor 50/50 train–test ratio, depending on dataset size.\nTo obtain unbiased evaluation metrics, the test set should be statistically similar\nto the training set. Stratification of the train–test split ensures that the distributions\nof key batch-effect variables (e.g., site, sex) remain consistent across subsets. The\nfollowing command performs this operation using theNormData.train\ntest split\nmethod, which internally calls the scikit-learn function of the same name.\ntrain, test = norm_train.train_test_split(\nsplits=(0.8, 0.2),\nsplit_names=[\"train\", \"test\"],\nrandom_state=42,\n12\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n)\n3.2.5 Standardization\nAs described above, PCNtoolkit automatically stores the scaling coefficients used dur-\ning feature scaling parameters within the fitted model object. Note that these are\nderived from the training set only. This ensures that all subsequent predictions and\ntransferred models apply the same transformations to incoming data, eliminating man-\nual preprocessing and guaranteeing consistent scaling across datasets during model\ndevelopment and deployment stages.\n3.2.6 Fitting Models\nTheNormativeModelclass\nThe central class in PCNtoolkit is the NormativeModel class. It has all the functions\nthat are required to build and use a normative model. To construct aNormativeModel\nobject, the user needs to provide a template regression model, which is copied and fit\nfor each feature in the reference set. Here we use the default (unparameterized) HBR\nclass as a template, but we will elaborate on this next.\nfrom pcntoolkit import NormativeModel, HBR\nmodel = NormativeModel(\ntemplate_regression_model=HBR(),\nsave_dir=\"../out/models/main_workflow_model_default_HBR\",\ninscaler=\"standardize\",\noutscaler=\"standardize\",\nname=\"main_workflow_model_default_HBR\",\n)\nTheRegressionModelclass\nThe current implementation of PCNtoolkit only supports the creation of univariate\nnormative models. This means that each NormativeModel object contains a collection\nof RegressionModel objects. For each response variable, the NormativeModel class\nmakes a copy of a template regression and fits it to the response variable. This template\nis what the user should provide when creating a NormativeModel.\nThe RegressionModel type currently has two concrete implementations: Bayesian\nLinear Regression (BLR) and Hierarchical Bayesian Regression (HBR), and both\ncome with their strengths and weaknesses. For an in-depth coverage of both methods,\nincluding their parameters and usage, please refer to the GitHub repository.\n3.2.7 Hierarchical Bayesian Regression (HBR)\nA Hierarchical Bayesian Regression (HBR) model assumes a hierarchical generative\nprocess for the data and performs full posterior estimation via Markov Chain Monte\nCarlo (MCMC). Configuring an HBR model involves specifying a likelihood function\nand defining priors over its parameters, which may include both fixed and random\neffects.\n13\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nThe PCNtoolkit provides several likelihood functions, including Normal, SHASHb,\nand Beta likelihoods, each suited to different data characteristics [9]. The Normal\nlikelihood is used for approximately Gaussian response variables, the SHASHb likeli-\nhood for unbounded and asymmetric data, and the Beta likelihood for interval data\nbounded within a fixed range. Each likelihood has interpretable parameters, such as\nµ(mean),σ(scale),ϵ(skewness), and δ (kurtosis), which can each be modeled hier-\narchically to capture structured variation across covariates and batch effects. See [9]\nfor more details.\nOnce a likelihood is chosen, priors for each parameter are specified using the\nmake\npriorfunction. Parameters can vary linearly with covariates, include random\n(batch) effects, or remain fixed across all observations. Random-effect priors are imple-\nmented in a non-centered hierarchical form (reparameterized) to stabilize sampling\nand avoid funnel pathologies during inference.\nIn this workflow, we use the Normal likelihood, which assumes Gaussian feature\ndistributions after preprocessing. This is appropriate for cortical thickness data, where\ndeviations are approximately symmetric and unbounded. Bothµandσare modeled\nhierarchically as functions of age (covariate) with random intercepts for site and sex,\nallowing age effects to vary smoothly while accounting for systematic offsets across\nacquisition sites and sexes.\nExample with Normal likelihood configuration\nFor illustration, we define a hierarchical Normal model where each observation yn is\ngenerated as:\nyn ∼ N (µn, σ+2\nn) n ∈ {1, ..., N } (1)\nµn = w⊤\nµ ϕµ(xn) + τµn (2)\nσn = w⊤\nσ ϕσ(xn) + τσ (3)\nσ+\nn = ln(1 + exp(σn/3)) ∗ 3 (4)\nwσ ∼ pwσ(wσ), τ σ ∼ pτσ(τσ), w µ ∼ pwµ(wµ), (5)\nτµn = µτµ +\nβX\nb=1\nσ+\nτµ b\nντµ b[Bn,b] (6)\nµτµ ∼ pµτµ (µτµ), σ τµ b ∼ p+\nστµ b\n(στµ b) (7)\nντµ b ∼ N (ντµ b | 0vb, Ivb) (8)\nwhere:\n• β is the number of batch effect dimensions, i.e., for batch effects (site, sex), β = 2.\n• vb denotes the number of unique batch effect values of the b’th batch effect\ndimension.\n• N denotes the number of observations/datapoints.\n• B denotes the N × β matrix of batch effect indices.\n14\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nEquations 1–8 describe the generative process implemented in the PCNtoolkit’s\nHBR class, where each response variable is modeled with a Normal likelihood whose\nmean (µ n) depends on covariates (here, age) and batch effects (here, site and sex),\nand whose variance (σ 2\nn) depends on covariates. The following code block imple-\nments this hierarchical Normal model. The mean term varies nonlinearly with age\nand includes site- and sex-specific intercepts, while the standard deviation term cap-\ntures heteroskedasticity across age. For brevity, comments have been omitted; a fully\nannotated version and extended examples are provided in the accompanying Jupyter\nnotebook.\nfrom pcntoolkit import make_prior, BsplineBasisFunction, NormalLikelihood, HBR\nmu = make_prior(\nlinear=True,\nslope=make_prior(),\nintercept=make_prior(\nrandom=True,\nmu=make_prior(dist_params=(0.0, 3.0)),\nsigma=make_prior(\ndist_params=(0.0, 3.0),\nmapping=\"softplus\",# <- ensures sigma > 0\nmapping_params=(0.0, 3.0),\n),\n),\nbasis_function=BsplineBasisFunction(basis_column=0, nknots=5, degree=3),#\n,→<- Allow nonlinearity\n)\nsigma = make_prior(\nlinear=True,\nslope=make_prior(dist_params=(0.0, 5.0)),\nintercept=make_prior(dist_params=(1.0, 3.0)),\nbasis_function=BsplineBasisFunction(basis_column=0, nknots=5, degree=3),\nmapping=\"softplus\",# <- ensures sigma > 0\nmapping_params=(0.0, 3.0),\n)\nlikelihood = NormalLikelihood(mu, sigma)\ntemplate_hbr = HBR(\nname=\"template_hbr\",\nlikelihood=likelihood,\n)\n# Create a NormativeModel with the template regression model\nmodel = NormativeModel(\ntemplate_regression_model=template_hbr,\nsave_dir=\"../out/models/main_workflow_model_HBR\",\n)\n15\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nThis example demonstrates how hierarchical priors for µ and σ can be defined\nin the PCNtoolkit using B-spline basis functions and positive-domain mappings\n(via softplus). The non-centered parameterization and weakly informative pri-\nors improve sampling efficiency and numerical stability. Detailed explanations,\nadditional likelihoods (SHASHb, Beta), and alternative parameterizations (linear,\nrandom, fixed) are available in the accompanying Jupyter notebook. For full\nconfiguration details and extended examples, we refer the readers to the tuto-\nrial on the main PCNToolkit workflow at https://github.com/predictive-clinical-\nneuroscience/pu25\ncode/blob/main/notebooks/1 main workflow.ipynb.\n3.2.8 Fitting the normative model to the data\nWith the data prepared and both the template regression model and the normative\nmodel configured, we fit on the training set and generate predictions for the test set in\na single call. This command estimates the hierarchical parameters, applies the trained\nmodel to the test data, and stores predictions, model artifacts, evaluation metrics, and\ndiagnostic plots.\nmodel.fit_predict(train, test)\nAn overview of model outputs\nRunning the workflow writes all intermediate and final outputs in the specified output\npath undermodels/main workflow model HBR/. This directory contains the fitted\nmodel specifications, posterior samples, evaluation metrics, and visual diagnostics used\nin the subsequent sections.\n• model/— per-feature subfolders containing idata.nc (posterior traces and diag-\nnostics) and regression\nmodel.json(serialized feature-level model specification);\nplus a top-levelnormative model.jsonsummarizing the full multi-feature model\nconfiguration.\n• plots/— centile and quantile–quantile (QQ) diagnostics for both train and test\nsets, one set per feature.\n• results/— CSV outputs with all numerical results:\ncentiles\n{train,test}.csv,Z {train,test}.csv,logp {train,test}.csv,\nandstatistics {train,test}.csv.\nA schematic of the default folder hierarchy:\nmodels/\n|-- main_workflow_model_HBR/\n| |-- model/\n| | |-- <feature_1>/\n| | | |-- idata.nc\n| | | ‘-- regression_model.json\n| | |-- <feature_2>/\n| | | |-- idata.nc\n| | | ‘-- regression_model.json\n| | ‘-- normative_model.json\n16\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n| |\n| |-- plots/\n| | |-- centiles_<feature>_{train,test}_*.png\n| | ‘-- qq_<feature>_{train,test}.png\n| |\n| ‘-- results/\n| |-- centiles_{train,test}.csv\n| |-- Z_{train,test}.csv\n| |-- logp_{train,test}.csv\n| ‘-- statistics_{train,test}.csv\nEach folder is automatically generated by the PCNtoolkit during model fitting\nand prediction. Together, these outputs contain all information needed to reproduce\nthe diagnostics, metrics, and visualizations, which will be described in Section 3.3.1.\nA fully commented walkthrough of the saving conventions, file contents, and plot\ninterpretation is provided in the accompanying Jupyter notebook.\n3.2.9 Fitting Models in Parallel\nFor large datasets (either in terms of the number of samples and response variables),\nfitting models sequentially can be quite time-consuming. In case of access to a high-\nperformance computing cluster running standard workload managers such as Torque\nor Slurm, operations such as ‘fit\npredict‘ can be parallelized by using the Runner class.\nThe Runner class will allow running the same tasks as in the previous step, but in\na distributed fashion. This is a little more involved than in the previous step. For\nexample, you need to specify the Python executable and time and memory limits for\neach job. You can also specify the number of batches into which to split the workload.\nHere we will use a Slurm cluster, using one model per batch with 12 hours max run\ntime and 2GB memory per job. This can be achieved in the following way:\nimport os\nimport sys\nfrom pcntoolkit import Runner\n# get path to python executable\nvenv_path = os.path.join(os.path.dirname(os.path.dirname(sys.executable)))\n# configure runner\nrunner = Runner(\ncross_validate=False,\nparallelize=True,\nn_batches = len(reference_norm_data.response_vars),\nenvironment=venv_path,\njob_type=\"slurm\",\ntime_limit=\"12:00:00\",\nmemory = \"2GB\",\nn_cores=1,\n17\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nlog_dir=\"../out/models/main_workflow_model_HBR/logs/\",\ntemp_dir=\"../out/models/main_workflow_model_HBR/tmp/\",\n#preamble = \"<commands to run before the job starts>\",\n)\nrunner.fit_predict(model, train, train, observe=False)\n3.3 Secondary workflows\nIn this section, we describe secondary workflows in Figure 1 that can be performed\nafter the primary normative model has been fit.\n3.3.1 Model Evaluation and Calibration\nModel evaluation in normative modelling involves three complementary components:\n(i) inspection of convergence diagnostics to ensure stable inference, (ii) assessment of\nquantitative model performance using both mean-based and probabilistic metrics, and\n(iii) visualization of model calibration through posterior predictive plots.\nConvergence diagnostics\nPosterior summaries were inspected for all model parameters using the ArviZ visuali-\nsation framework, including slope coefficients, site- and sex-specific offsets, and scale\nparameters. For the Hierarchical Bayesian Regression (HBR) models used in this\nworkflow, convergence and sampling reliability were evaluated using standard MCMC\ndiagnostics: all parameters were required to have potential scale reduction factors ( ˆR)\n≤ 1.01, bulk and tail effective sample sizes (ESS) ≥ 1000, and Monte Carlo standard\nerrors (MCSE) below 1% of the corresponding posterior standard deviation. Chains\ncan also be visually inspected to confirm stationary and adequate mixing across draws.\nThese diagnostics apply specifically to HBR models, which rely on Markov Chain\nMonte Carlo sampling. For Bayesian Linear Regression (BLR) models—estimated via\ndeterministic optimization rather than MCMC—different convergence criteria apply,\ntypically based on gradient norms and optimizer tolerance; however, such models are\nnot used in the present analysis.\nTo illustrate the diagnostic workflow, we performed this analysis for one represen-\ntative feature (lh\nG and S frontomargin). This feature was chosen as a typical example\nwith a moderate age effect, heteroskedasticity, and variance across sites. Because con-\nvergence behavior is generally consistent across features with comparable sample sizes\nand data characteristics, diagnostics for this representative case serve as a practical\nindicator of overall model stability.\nAll convergence diagnostics were computed with ArviZ and\nsaved per feature in the corresponding idata.nc file under\n../out/models/main\nworkflow model HBR/model/<feature>/. Each file stores\nposterior draws and diagnostics, including ˆR, ESS, and MCSE. The aggregated\ndiagnostic table can be reproduced directly from disk:\nimport arviz as az\n18\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nFig. 2: Traceplots for the covariate slope parameters (slope mu) from the Normal\nlikelihood model. Each color represents one of the MCMC chains. The dense, well-\nmixed traces indicate stable posterior exploration and full chain convergence.\nimport pandas as pd\nfrom pathlib import Path\nmodel_dir = Path(\"../out/models/main_workflow_model_HBR/model\")\nrows = []\nfor d in model_dir.iterdir():\nif d.is_dir() and (d / \"idata.nc\").exists():\nidata = az.from_netcdf(d / \"idata.nc\")\ns = az.summary(idata)# includes R-hat, ESS_bulk, ESS_tail, MCSE\ns[\"feature\"] = d.name\nrows.append(s)\ndiag = pd.concat(rows)\ndiag.to_csv(\"../out/models/main_workflow_model_HBR/results/diagnostics_summary\n,→.csv\")\nTraceplots for the slope parameters of µ (slope mu) are shown in Fig. 2. All\nfour chains overlap tightly and explore the posterior space evenly, with no visual\nsigns of non-stationarity, confirming that the hierarchical regression components have\nconverged well.\nTable 1: Aggregated posterior diagnostics\nacross all parameters (74 total). Values are\nshown as median [IQR].\nMetric Median IQR\nˆR1.00 1.00–1.01\nESSbulk 4513 2176–5502\nESStail 3789 2533–4454\nPosterior mean -0.11 [-0.61, 0.35]\nPosterior SD 0.42 [0.24, 0.92]\nMonte Carlo SE (mean) 0.006 [0.003, 0.026]\n19\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nQuantitative model performance\nModel performance was then evaluated using both mean-based and probabilistic met-\nrics exported by the PCNtoolkit. Mean-based metrics—the coefficient of determination\n(R2), root mean squared error (RMSE), standardized mean squared error (SMSE),\nand explained variance (EXPV)—quantify how well the predicted mean ˆyapproxi-\nmates the observed data. Probabilistic metrics assess the full predictive distribution;\ntherefore, they are more suitable for evaluating the quality of estimated centiles in\nthe NM context: the mean standardized log loss (MSLL) compares predictive log-\nlikelihoods to a variance-standardized baseline, while the mean absolute centile error\n(MACE) [34] measures the average deviation of predicted centiles from their expected\nuniform targets. Lower RMSE, SMSE, MSLL and MACE values, and higher EXPV\nandR 2 indicate better overall fit.\nBecause normative models estimate both mean and variance, mean-based\nmetrics alone provide an incomplete view of performance. A model may display\nmodestR 2 yet excellent calibration of uncertainty (and vice versa), as reflected\nin the centile and QQ plots (Fig. 3). This reflects the purpose of normative mod-\nelling: to accurately estimate the conditional distribution of each feature rather\nthan only its central tendency. All metric values are stored automatically in\n../out/models/main\nworkflow model HBR/results/statistics {train,test}.csv\nand can be loaded as follows:\nimport pandas as pd\ntable = pd.read_csv(\n\"../out/models/main_workflow_model_HBR/results/statistics_test.csv\",\nindex_col=\"statistic\"\n).T\nVisual assessment of calibration\nQualitative assessment of calibration is performed using the PCNtoolkit’s built-in\nplotting utilities.plot centilesvisualizes predicted centile trajectories across covari-\nates, whileplot qqcompares empirical and theoretical quantiles of standardized\nresiduals (Z-scores). Together, these provide complementary views of functional and\ndistributional model fit. Figure 3 shows the two plots produced by this code snippet.\nfrom pcntoolkit import plot_centiles, plot_qq\nplot_centiles(\nmodel,\ncentiles=[0.05, 0.5, 0.95],\ncovariate=\"age\",\ncovariate_range=(10, 80),\nbatch_effects={\"site\": [\"ICBM\", \"Leiden_2180\"], \"sex\": [\"M\", \"F\"]},\nscatter_data=train,\nhue_data=\"site\",\nmarkers_data=\"sex\",\nshow_other_data=True,\n)\n20\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n(a) Centile plot\n (b) QQ plot\nFig. 3: Visualization of normative model calibration for a representative region of\ninterest.(a)Predicted centiles (5th, 50th, 95th) across age, with observed data over-\nlaid. Smooth median and outer centile alignment indicate a good functional fit across\ncovariates. (b) Quantile–quantile (QQ) plot comparing empirical versus theoretical\nquantiles of standardized residuals. Alignment with the identity line indicates accu-\nrate predictive variance; curvature suggests over- or under-dispersion. For a detailed\ninterpretation of QQ plots in normative modelling, see [26].\nplot_qq(train, plot_id_line=True)\n3.3.2 Longitudinal normative modelling\nWhilst many normative models are estimated using longitudinal data, the models\nthemselves are fundamentally cross-sectional in nature in that they involve estimating\nsmooth curves to link the group-level centiles across different timepoints. In order to\nobtain true longitudinal inferences, additional steps are necessary. Most importantly,\nto provide a statistical estimate to quantify the magnitude of change between two\nor more consecutive z-score measurements derived from a normative model. In other\nwords, longitudinal normative modelling seeks to determine whether a centile crossing\nhas occurred for a given individual between two or more timepoints. Since individuals\ndo not track centiles exactly as they move through time (e.g., due to measurement\nnoise), it is essential to take longitudinal variance into account both when estimating\nthe normative model and when evaluating change between time points, to obtain a\nreliable characterization of individual trajectories [12]. This requires longitudinal data.\nThe PCNtoolkit and supporting scripts allow for two ways of obtaining an estimate\nof the significance of longitudinal change: namely constructing ‘Z-diff’ scores that\nprovide a statistical estimate of centile crossing between two timepoints based on BLR\nmodels [11] or using velocity models, which provide the ability to estimate ’thrive\nlines’ [12] which can quantify centile crossings for larger numbers of timepoints. An\nalternative method for longitudinal data analysis is presented in Di Biase et al. [35].\n21\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n3.3.3 Federated Learning\nA key feature of PCNtoolkit is the ability to support federated learning, as described\nin detail in [14, 24]. Concrete examples of this workflow are provide a notebook in\nthe associated code repository. In brief, several methods for federated normative mod-\neling, i.e., estimating normative models on decentralized data, are supported: model\ntransfer involves deriving a new set of coefficients for data sites that were not\nincluded in the reference dataset by distilling the information from an existing ref-\nerence model. In the context of HBR, this is operationalised by means of applying\nan informative prior distribution (corresponding to the posterior distribution from\nthe base reference model) to the new data and re-estimating parameters. Whilst this\napproach is usually the most efficient, please note that a model transfer may be prob-\nlematic if the data from the new site has a very different distribution from the data in\nthe training set. In contrast, a model extend involves updating the reference model\nwith data from new sites, which can then be passed back to the site initially con-\ntributing the original model. Whilst this is usually fairly robust, please note that\nduring model extension, if the new sites have poor quality data, this may have a dele-\nterious impact on the original model (e.g., containing severe outliers or spanning a\nnon-overlapping portion of the covariate range). A model merge can be performed to\ncombine two separately estimated normative models (e.g., from different sites). All\nthree modules operate without requiring access to the reference datasets used to derive\nthe original reference model, which is essential in settings where data privacy is a\nprimary concern.\n3.4 Community Aspects\nIn order to maximise value to the community, we provide access to an updated set\nof lifespan reference models derived from tens of thousands of individuals aggregated\nfrom datasets that span the lifespan. These models supersede the cortical thickness and\nvolumetric models presented in Rutherford et al. [4] and fractional anisotropy measures\nderived from diffusion data [36], with additional models to be brought online in the\nfuture, for example, functional connectivity normative models presented in Rutherford\net al. [20]. This is necessary because models estimated in early versions of PCNtoolkit\nare not compatible with PCNtoolkit 1.x.x. These reference models are available for\ndownload via a notebook available in the associated code repository. We welcome\nthe contributions of new models from other groups via PCNPortal [16], our online\ncollaborative normative modelling portal.\nWe also welcome contributions from the field, in terms of contributions of new\nmethods for normative modeling and different assessment metrics, as well as general\nimprovements to the code. These can be submitted through standard pull requests\nor GitHub issues to PCNToolkit GitHub repository. This was a major redesign con-\nsideration in PCNtoolkit 1.x.x and is now much easier due to the object-oriented\ndesign.\n22\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n4 Troubleshooting\nTroubleshooting is best addressed by consulting the additional documentation avail-\nable online at ReadTheDocs, including additional tutorials for other algorithm\nimplementations, a glossary to clarify the jargon associated with the software, a\nreference guide with links to normative modeling publications and a frequently-asked-\nquestions page where many common errors (and their solutions) are discussed in detail.\nAdditionally, questions can be raised and answered by submitting issues via GitHub.\n5 Timing\nThe normative modeling portion of this protocol (including evaluation and visualiza-\ntion) can be completed in ∼57–72 minutes, although additional time is required to\nassess additional models. These timing estimates are based on the use of the compute\nplatform described above to run the code.\n6 Anticipated Results\nTogether, this protocol describes the process by which normative models can be fit,\nmodel convergence assessed and the fit of the centiles to the data to be ascertained. In\nthe example given, these quantitative and visual diagnostics confirm robust model con-\nvergence, well-calibrated uncertainty estimates, and accurate functional dependence\nacross covariates. Starting from raw, multi-site neuroimaging data, we have demon-\nstrated a reproducible end-to-end workflow covering data selection, preprocessing,\nmodel fitting, evaluation, and visualization. The resulting normative model captures\nboth central trends and population variability, allowing individual-level deviations to\nbe expressed in a statistically calibrated manner. With convergence and calibration\nverified, the model is ready for downstream applications such as longitudinal analyses,\ndata harmonization, and federated normative modeling on decentralized data.\nThis protocol covers the basic normative model estimation workflow (i.e., how to\nfit a normative model from scratch). We provide additional examples in the accom-\npanying GitHub repository, including examples showing how to transfer models to a\nnew dataset, to extend an existing model to include new sites, to synthesize data, to\nperform longitudinal analysis and to harmonise data to a common normative reference\nmodel.\n7 Key references\n• De Boer et al. [9]\n• Bayer et al. [12]\n• Fraza et al. [37]\n• Berthet et al. [38]\n• Cirstian et al. [36]\n23\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nAcknowledgments\nWe greatly acknowledge funding from the European Research Council under a\nconsolidator grant (grant number 101001118), the Wellcome Trust (grant numbers\n215698/Z/19/Z and 226706/Z/22/Z) and the Raynor Foundation (Raynor Cerebellum\nCharts project).\nCompeting Interests\nCFB is a shareholder and director of SBGNeuro. The other authors report no\ncompeting interests.\nAuthor Contributions\nAAAdB, JMMB, CF, AC, BRB: writing software, running experiments, drafting and\nrevising the manuscript; AC: running experiments, revising the manuscript; KT, ES:\nwriting software, drafting and revising the manuscript AB, RC, MZ, SR, AAK: running\nexperiments, drafting and revising the manuscript; TW, CFB, SMK: conceptualisa-\ntion, supervision, drafting and revising the manuscript; AFM: funding acqusition,\nconceptualisation, writing software, supervision, drafting and revising the manuscript.\nData Availability\nAll data used in this manuscript are publicly available. This includes the FCON1000\ndataset, which is freely available at the FCON1000 website and two clinical datasets\nfrom the OpenNeuro portal, namely:\n1. ds003568, see ref: [32].\n2. ds003653, see ref: [33].\nCode Availability\nThe PCNtoolkit software package is available online at :\nhttps://github.com/amarquand/PCNtoolkit and all code necessary to run this pro-\ntocol is available at: https://github.com/predictive-clinical-neuroscience/pu25\ncode.\nReferences\n[1] Schumann, G., Binder, E.B., Holte, A., de Kloet, E.R., Oedegaard, K.J., Robbins,\nT.W., Walker-Tilley, T.R., Bitter, I., Brown, V.J., Buitelaar, J., Ciccocioppo,\nR., Cools, R., Escera, C., Fleischhacker, W., Flor, H., Frith, C.D., Heinz, A.,\nJohnsen, E., Kirschbaum, C., Klingberg, T., Lesch, K.-P., Lewis, S., Maier, W.,\nMann, K., Martinot, J.-L., Meyer-Lindenberg, A., M¨ uller, C.P., M¨ uller, W.E.,\nNutt, D.J., Persico, A., Perugi, G., Pessiglione, M., Preuss, U.W., Roiser, J.P.,\nRossini, P.M., Rybakowski, J.K., Sandi, C., Stephan, K.E., Undurraga, J., Vieta,\nE., van der Wee, N., Wykes, T., Haro, J.M., Wittchen, H.U.: Stratified medicine\n24\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nfor mental disorders. European Neuropsychopharmacology 24(1), 5–50 (2014)\nhttps://doi.org/10.1016/j.euroneuro.2013.09.010\n[2] Finn, E.S., Todd Constable, R.: Individual variation in functional brain connec-\ntivity: Implications for personalized approaches to psychiatric disease. Dialogues\nin Clinical Neuroscience 18(3), 277–287 (2016) https://doi.org/10.31887/DCNS.\n2016.18.3/efinn\n[3] Marquand, A.F., Rezek, I., Buitelaar, J., Beckmann, C.F.: Understanding Het-\nerogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control\nStudies. Biological Psychiatry 80(7), 552–561 (2016) https://doi.org/10.1016/j.\nbiopsych.2015.12.023\n[4] Rutherford, S., Fraza, C., Dinga, R., Kia, S.M., Wolfers, T., Zabihi, M., Berthet,\nP., Worker, A., Verdi, S., Andrews, D., Han, L.K., Bayer, J.M., Dazzan, P.,\nMcGuire, P., Mocking, R.T., Schene, A., Sripada, C., Tso, I.F., Duval, E.R.,\nChang, S.-E., Penninx, B.W., Heitzeg, M.M., Burt, S.A., Hyde, L.W., Amaral,\nD., Wu Nordahl, C., Andreasssen, O.A., Westlye, L.T., Zahn, R., Ruhe, H.G.,\nBeckmann, C., Marquand, A.F.: Charting brain growth and aging at high spatial\nprecision. eLife 11, 72904 https://doi.org/10.7554/eLife.72904\n[5] Bethlehem, R.A.I., Seidlitz, J., White, S.R., Vogel, J.W., Anderson, K.M., Adam-\nson, C., Adler, S., Alexopoulos, G.S., Anagnostou, E., Areces-Gonzalez, A., Astle,\nD.E., Auyeung, B., Ayub, M., Bae, J., Ball, G., Baron-Cohen, S., Beare, R.,\nBedford, S.A., Benegal, V., Beyer, F., Blangero, J., Blesa C´ abez, M., Boardman,\nJ.P., Borzage, M., Bosch-Bayard, J.F., Bourke, N., Calhoun, V.D., Chakravarty,\nM.M., Chen, C., Chertavian, C., Chetelat, G., Chong, Y.S., Cole, J.H., Corvin,\nA., Costantino, M., Courchesne, E., Crivello, F., Cropley, V.L., Crosbie, J., Cross-\nley, N., Delarue, M., Delorme, R., Desrivieres, S., Devenyi, G.A., Di Biase, M.A.,\nDolan, R., Donald, K.A., Donohoe, G., Dunlop, K., Edwards, A.D., Elison, J.T.,\nEllis, C.T., Elman, J.A., Eyler, L., Fair, D.A., Feczko, E., Fletcher, P.C., Fon-\nagy, P., Franz, C.E., Galan-Garcia, L., Gholipour, A., Giedd, J., Gilmore, J.H.,\nGlahn, D.C., Goodyer, I.M., Grant, P.E., Groenewold, N.A., Gunning, F.M., Gur,\nR.E., Gur, R.C., Hammill, C.F., Hansson, O., Hedden, T., Heinz, A., Henson,\nR.N., Heuer, K., Hoare, J., Holla, B., Holmes, A.J., Holt, R., Huang, H., Im,\nK., Ipser, J., Jack, C.R., Jackowski, A.P., Jia, T., Johnson, K.A., Jones, P.B.,\nJones, D.T., Kahn, R.S., Karlsson, H., Karlsson, L., Kawashima, R., Kelley, E.A.,\nKern, S., Kim, K.W., Kitzbichler, M.G., Kremen, W.S., Lalonde, F., Landeau,\nB., Lee, S., Lerch, J., Lewis, J.D., Li, J., Liao, W., Liston, C., Lombardo, M.V.,\nLv, J., Lynch, C., Mallard, T.T., Marcelis, M., Markello, R.D., Mathias, S.R.,\nMazoyer, B., McGuire, P., Meaney, M.J., Mechelli, A., Medic, N., Misic, B., Mor-\ngan, S.E., Mothersill, D., Nigg, J., Ong, M.Q.W., Ortinau, C., Ossenkoppele, R.,\nOuyang, M., Palaniyappan, L., Paly, L., Pan, P.M., Pantelis, C., Park, M.M.,\nPaus, T., Pausova, Z., Paz-Linares, D., Pichet Binette, A., Pierce, K., Qian,\nX., Qiu, J., Qiu, A., Raznahan, A., Rittman, T., Rodrigue, A., Rollins, C.K.,\nRomero-Garcia, R., Ronan, L., Rosenberg, M.D., Rowitch, D.H., Salum, G.A.,\n25\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nSatterthwaite, T.D., Schaare, H.L., Schachar, R.J., Schultz, A.P., Schumann,\nG., Sch¨ oll, M., Sharp, D., Shinohara, R.T., Skoog, I., Smyser, C.D., Sperling,\nR.A., Stein, D.J., Stolicyn, A., Suckling, J., Sullivan, G., Taki, Y., Thyreau, B.,\nToro, R., Traut, N., Tsvetanov, K.A., Turk-Browne, N.B., Tuulari, J.J., Tzou-\nrio, C., Vachon-Presseau, E., Valdes-Sosa, M.J., Valdes-Sosa, P.A., Valk, S.L.,\nAmelsvoort, T., Vandekar, S.N., Vasung, L., Victoria, L.W., Villeneuve, S., Vill-\nringer, A., Vertes, P.E., Wagstyl, K., Wang, Y.S., Warfield, S.K., Warrier, V.,\nWestman, E., Westwater, M.L., Whalley, H.C., Witte, A.V., Yang, N., Yeo, B.,\nYun, H., Zalesky, A., Zar, H.J., Zettergren, A., Zhou, J.H., Ziauddeen, H., Zug-\nman, A., Zuo, X.N., Rowe, C., Frisoni, G.B., Binette, A.P., Bullmore, E.T.,\nAlexander-Bloch, A.F.: Brain charts for the human lifespan. Nature 604(7906),\n525–533 (2022) https://doi.org/10.1038/s41586-022-04554-y\n[6] Ge, R., Yu, Y., Qi, Y.X., Fan, Y.-n., Chen, S., Gao, C., Haas, S.S., New, F.,\nBoomsma, D.I., Brodaty, H., Brouwer, R.M., Buckner, R., Caseras, X., Criv-\nello, F., Crone, E.A., Erk, S., Fisher, S.E., Franke, B., Glahn, D.C., Dannlowski,\nU., Grotegerd, D., Gruber, O., Pol, H.E.H., Schumann, G., Tamnes, C.K., Wal-\nter, H., Wierenga, L.M., Jahanshad, N., Thompson, P.M., Frangou, S., Agartz,\nI., Asherson, P., Ayesa-Arriola, R., Banaj, N., Banaschewski, T., Baumeister,\nS., Bertolino, A., Borgwardt, S., Bourque, J., Brandeis, D., Breier, A., Buite-\nlaar, J.K., Cannon, D.M., Cervenka, S., Conrod, P.J., Crespo-Facorro, B., Davey,\nC.G., Haan, L., Zubicaray, G.I., Giorgio, A.D., Frodl, T., Gruner, P., Gur, R.E.,\nGur, R.C., Harrison, B.J., Hatton, S.N., Hickie, I., Howells, F.M., Huyser, C.,\nJernigan, T.L., Jiang, J., Joska, J.A., Kahn, R.S., Kalnin, A.J., Kochan, N.A.,\nKoops, S., Kuntsi, J., Lagopoulos, J., Lazaro, L., Lebedeva, I.S., Lochner, C.,\nMartin, N.G., Mazoyer, B., McDonald, B.C., McDonald, C., McMahon, K.L.,\nMedland, S., Modabbernia, A., Mwangi, B., Nakao, T., Nyberg, L., Piras, F.,\nPortella, M.J., Qiu, J., Roffman, J.L., Sachdev, P.S., Sanford, N., Satterthwaite,\nT.D., Saykin, A.J., Sellgren, C.M., Sim, K., Smoller, J.W., Soares, J.C., Sommer,\nI.E., Spalletta, G., Stein, D.J., Thomopoulos, S.I., Tomyshev, A.S., Tordesillas-\nGuti´ errez, D., Trollor, J.N., Ent, D., Heuvel, O.A., Erp, T.G., Haren, N.E.,\nVecchio, D., Veltman, D.J., Wang, Y., Weber, B., Wei, D., Wen, W., Westlye,\nL.T., Williams, S.C., Wright, M.J., Wu, M.-J., Yu, K.: Normative modelling of\nbrain morphometry across the lifespan with CentileBrain: Algorithm benchmark-\ning and model optimisation. The Lancet Digital Health 6(3), 211–221 (2024)\nhttps://doi.org/10.1016/S2589-7500(23)00250-9\n[7] Marquand, A.F., Kia, S.M., Zabihi, M., Wolfers, T., Buitelaar, J.K., Beckmann,\nC.F.: Conceptualizing mental disorders as deviations from normative function-\ning. Molecular Psychiatry 24(10), 1415–1424 (2019) https://doi.org/10.1038/\ns41380-019-0441-1\n[8] Fraza, C., Rutherford, S., Buˇ ckov´ a, B.R., Beckmann, C.F., Marquand, A.F.:\nThe promise of quantifying individual risk for brain disorders through normative\nmodeling, a narrative review. Neuroscience andhttps://doi.org/10.1038/s41586-\n022-04554-y Biobehavioral Reviews 176, 106284 (2025) https://doi.org/10.1016/\n26\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\nj.neubiorev.2025.106284\n[9] De Boer, A.A.A., Bayer, J.M.M., Kia, S.M., Rutherford, S., Zabihi, M., Fraza,\nC., Barkema, P., Westlye, L.T., Andreassen, O.A., Hinne, M., Beckmann, C.F.,\nMarquand, A.: Non-Gaussian normative modelling with hierarchical Bayesian\nregression. Imaging Neuroscience 2, 2–00132 (2024) https://doi.org/10.1162/\nimag\na 00132\n[10] Di Biase, M., Tian, Y., Bethlehem, J. Richard and Seidlitz, Alexander-Bloch, T.\nAaron. and Yeo, Zalesky, A.: Mapping human brain charts cross-sectionally and\nlongitudinally. Proceedings of the National Academy of Sciences 120(20) (2023)\nhttps://doi.org/10.1073/pnas.2216798120\n[11] Rehak Buckova, B., Fraza, C., Reh´ ak, R., Koleniˇ c, M., Beckmann, C.F.,ˇSpaniel,\nF., Marquand, A.F., Hlinka, J.: Using normative models pre-trained on cross-\nsectional data to evaluate intra-individual longitudinal changes in neuroimaging\ndata. eLife 13, 95823 (2025) https://doi.org/10.7554/eLife.95823\n[12] Bayer, J.M.M., Boer, A.A.A., Rehak-Bucova, B., Fraza, C.J., Banaschewski,\nT., Barker, G.J., Bokde, A.L.W., Bruehl, R., Desrivieres, S., Flor, H., Gar-\navan, H., Gowland, P., Grigis, A., Heinz, A., Lemaitre, H., Martinot, J.-L.,\nMartinot, M.-L.P., Artigues, E., Nees, F., Orfanos, D.P., Paus, T., Poustka, L.,\nSmolka, M.N., Holz, N., Vaidya, N., Walter, H., Whelan, R., Wirsching, P.,\nSchumann, G., Initiative, A.D.N., Kraguljac, N., Beckmann, C.F., Marquand,\nA.F.: Charting the velocity of brain growth and development. arXiv (2026).\nhttps://doi.org/10.48550/ARXIV.2601.07591\n[13] Watanabe, S.: A widely applicable Bayesian information criterion. J. Mach. Learn.\nRes. 14(1), 867–897 (2013)\n[14] Kia, S.M., Huijsdens, H., Rutherford, S., De Boer, A., Dinga, R., Wolfers, T.,\nBerthet, P., Mennes, M., Andreassen, O.A., Westlye, L.T., Beckmann, C.F.,\nMarquand, A.F.: Closing the life-cycle of normative modeling using federated\nhierarchical Bayesian regression. PLOS ONE 17(12), 0278776 (2022) https://doi.\norg/10.1371/journal.pone.0278776\n[15] Rutherford, S., Kia, S.M., Wolfers, T., Fraza, C., Zabihi, M., Dinga, R., Berthet,\nP., Worker, A., Verdi, S., Ruhe, H.G., Beckmann, C.F., Marquand, A.F.: The\nnormative modeling framework for computational psychiatry. Nature Protocols\n17(7), 1711–1734 (2022) https://doi.org/10.1038/s41596-022-00696-5\n[16] Barkema, P., Rutherford, S., Lee, H.-C., Kia, S.M., Savage, H., Beckmann,\nC., Marquand, A.: Predictive Clinical Neuroscience Portal (PCNportal): Instant\nonline access to research-grade normative models for clinical neuroscientists. Well-\ncome Open Research 8, 326 (2023) https://doi.org/10.12688/wellcomeopenres.\n19591.2\n27\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n[17] Little, B., Alyas, N., Surtees, A., Winston, G.P., Duncan, J.S., Cousins, D.A.,\nTaylor, J.-P., Taylor, P., Leiberg, K., Wang, Y.: Brain morphology normative\nmodelling platform for abnormality and centile estimation: Brain MoNoCle.\nImaging Neuroscience3, 00438 (2025) https://doi.org/10.1162/imag\na 00438\n[18] Gaiser, C., van der Vliet, R., de Boer, A.A.A., Donchin, O., Berthet, P., Devenyi,\nG.A., Mallar Chakravarty, M., Diedrichsen, J., Marquand, A.F., Frens, M.A.,\nMuetzel, R.L.: Population-wide cerebellar growth models of children and ado-\nlescents. Nature Communications 15(1), 2351 (2024) https://doi.org/10.1038/\ns41467-024-46398-2\n[19] Worker, A., Berthert, P., Lawrence, A.J., Kia, S.M., Arango, C., Dinga, R.,\nGalderisi, S., Glenthøj, B., Kahn, R.S., Leslie, A., Murray, R.M., Pariante,\nC.M., Pantelis, C., Weiser, M., Winter-van Rossum, I., McGuire, P., Dazzan, P.,\nMarquand, A.F.: Extreme deviations from the normative model reveal cortical\nheterogeneity and associations with negative symptom severity in first-episode\npsychosis from the OPTiMiSE and GAP studies. Translational Psychiatry 13(1),\n373 (2023) https://doi.org/10.1038/s41398-023-02661-6\n[20] Rutherford, S., Barkema, P., Tso, I.F., Sripada, C., Beckmann, C.F., Ruhe, H.G.,\nMarquand, A.F.: Evidence for embracing normative modeling. eLife 12, 85082\n(2023) https://doi.org/10.7554/eLife.85082\n[21] Zabihi, M., Floris, D.L., Kia, S.M., Wolfers, T., Tillmann, J., Arenas, A.L.,\nMoessnang, C., Banaschewski, T., Holt, R., Baron-Cohen, S., Loth, E., Char-\nman, T., Bourgeron, T., Murphy, D., Ecker, C., Buitelaar, J.K., Beckmann,\nC.F., Marquand, A.: Fractionating autism based on neuroanatomical norma-\ntive modeling. Translational Psychiatry 10(1) (2020) https://doi.org/10.1038/\ns41398-020-01057-0\n[22] Kia, S.M., Marquand, A.: Normative Modeling of Neuroimaging Data using\nScalable Multi-Task Gaussian Processes. International Conference on Medi-\ncal Image Computing and Computer-Assisted Intervention, 127–135 (2018)\narXiv:1806.01047\n[23] Fraza, C.J., Dinga, R., Beckmann, C.F., Marquand, A.F.: Warped Bayesian Lin-\near Regression for Normative Modelling of Big Data. bioRxiv, 2021–0405438429\n(2021)\n[24] Kia, S.M., Huijsdens, H., Dinga, R., Wolfers, T., Mennes, M., Andreassen, O.A.,\nWestlye, L.T., Beckmann, C.F., Marquand, A.F.: Hierarchical Bayesian Regres-\nsion for Multi-Site Normative Modeling of Neuroimaging Data. International\nConference on Medical Image Computing and Computer-Assisted Intervention,\n699–709 (2020) arXiv:2005.12055\n[25] Bayer, J.M.M., Dinga, R., Kia, S.M., Kottaram, A.R., Wolfers, T., Lv, J., Zalesky,\nA., Schmaal, L., Marquand, A.: Accommodating site variation in neuroimaging\n28\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\ndata using normative and hierarchical Bayesian models. NeuroImage264, 119699\n(2022) https://doi.org/10.1016/j.neuroimage.2022.119699\n[26] Dinga, R., Fraza, C.J., Bayer, J.M.M., Kia, S.M., Beckmann, C.F., Marquand,\nA.F.: Normative Modeling of Neuroimaging Data Using Generalized Additive\nModels of Location Scale and Shape. Neuroscience (2021). https://doi.org/10.\n1101/2021.06.14.448106\n[27] Marquand, A., Rutherford, S., Dinga, R.: Fairly evaluating the performance of\nnormative models. The Lancet Digital Health 6(11), 775 (2024) https://doi.org/\n10.1016/S2589-7500(24)00200-0\n[28] Rutherford, S., Wolfers, T., Fraza, C., Harnett, N.G., Beckmann, C.F., Ruhe,\nH.G., Marquand, A.F.: To which reference class do you belong? Measuring racial\nfairness of reference classes with normative modeling. arXiv (2024). https://doi.\norg/10.48550/ARXIV.2407.19114\n[29] Sun, L., Qin, W., Liang, X., Wang, C., Men, W., Duan, Y., Fan, X.-R., Cai, Q.,\nQiu, S., Wang, M., Gong, Q., Tian, Y., Liang, P., Liu, Z., Zhang, X., Song, H.,\nYe, Z., Zhang, P., Dong, Q., Tao, S., Zhu, W., Zhang, J., Xie, F., Feng, J., Zhang,\nJ., Liu, C., Qian, Q., Zhang, B., Meng, M., Hu, L., Gao, J.-H., Jiang, T., Zhu,\nX., Zhang, Y., Liu, L., Liu, H., Liao, W., Wang, D., Wang, H., Guo, T., Dai, Z.,\nLui, S., Xu, K., Li, L., Xie, P., Feng, C., Cui, G., Wu, J., Yin, X., Ding, G., Xian,\nJ., Zhao, L., Lu, J., Liu, Z., Han, Y., Yuan, Z., Zhang, X., Si, T., Zhou, F., Bi,\nY., Wu, D., Gao, F., Wang, F., Qin, S., Wang, G., Chen, F., Zhang, Z., Sui, J.,\nChen, H., Cai, J., Liu, S., Geng, Z., Zhang, C., Mao, N., Yin, H., Liu, B., Ma, H.,\nGao, B., Miao, Y., Kong, X.-Z., Zhou, Y., Liu, L., Hu, J., Wang, L., Zhang, Q.,\nShu, H., Wang, P., Lee, T.M.C., Cao, Q., Yang, L., Zhang, X., Luo, W., Liang,\nM., Yao, H., Li, M., Huang, H., Peng, Y., Han, Z., Zhou, C., Xu, H., Feng, M.,\nShen, W., Hu, Y., Chen, H., Wang, Y., Gong, G., Yan, Z., Xu, X., Liu, J., Chen,\nG., Wang, P., Yang, Y., Yao, D., Han, T., He, H., Chen, C., Zou, Q., Liu, H.,\nZhang, H., Chai, C., Lu, C., Tu, Y., Liu, Y., Lin, D., Zhao, W., Xu, X., Liu, X.,\nCui, Z., Wang, Z., Huang, R., Li, Z., Liu, Y., Li, X., Yang, X., Zhang, N., Chen,\nA., Zhang, B., Qin, P., Liu, C., Yao, Z., Wei, Y., Yuan, H., Wang, F., Zhang,\nY., Zhang, Q., Hu, F., Xie, H., Wu, X., Wang, J., Fan, G., Wang, Z., Zhang, D.,\nZhong, H., Wang, Y., Bai, L., Li, Y., Wei, X., Wang, J., Zhang, Y., He, H., Li,\nS., Zhang, T., Jiang, F., Yang, J., Chen, F., Liu, F., Liu, H., Chen, N., Yang, J.,\nHou, B., Huang, C.-C., Zhu, J., Cai, H., Wei, D., Chen, Q., Wei, Y., Miao, P., Li,\nY., Liu, Y., Yang, N., Gao, X., Liu, Y., Shen, Y., Huang, X., Ji, G.-J., Zhang, L.,\nQiu, J., Yu, Y., Lin, C.-P., Feng, F., Li, K., Yu, C., He, Y.: Population-specific\nbrain charts reveal chinese-western differences in neurodevelopmental trajectories\n(2025) https://doi.org/10.1101/2025.06.17.659820\n[30] Elleaume, C., Hebling Vieira, B., Floris, D.L., Langer, N.: Toward robust\nneuroanatomical normative models: Influence of sample size and covariates\ndistributions (2025) https://doi.org/10.7554/elife.108952.1\n29\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint \n\n[31] DESTRIEUX, C., FISCHL, B., DALE, A., HALGREN, E.: Automatic parcella-\ntion of human cortical gyri and sulci using standard anatomical nomenclature.\nNeuroImage53(1), 1–15 (2010) https://doi.org/10.1016/j.neuroimage.2010.06.\n010\n[32] Liuzzi, L., Chang, K.K., Zheng, C., Keren, H., Saha, D., Nielson, D.M., Stringaris,\nA.: Magnetoencephalographic correlates of mood and reward dynamics in human\nadolescents. Cerebral Cortex 32(15), 3318–3330 (2022) https://doi.org/10.1093/\ncercor/bhab417\n[33] Baranger, D.A.A., Halchenko, Y.O., Satz, S., Ragozzino, R., Iyengar, S., Swartz,\nH.A., Manelis, A.: Aberrant Levels of Cortical Myelin Distinguish Individuals\nwith Unipolar Depression from Healthy Controls. medRxiv (2021). https://doi.\norg/10.1101/2021.02.25.21252472\n[34] Zamanzadeh, M., Verduyn, Y., Boer, A., Ros, T., Wolfers, T., Dinga, R.,\nˇSaf´ aˇ r Postma, M., Marquand, A.F., Wingerden, M., Kia, S.M.: MEGaNorm: Nor-\nmative modeling of MEG brain oscillations across the human lifespan. bioRxiv,\n2025–06 (2025)\n[35] Di Biase, M., Tian, Y., Bethlehem, R., Seidlitz, J., Alexander-Bloch, A.,\nYeo, B.T., Zalesky, A.: Mapping human brain charts cross-sectionally\nand longitudinally. Proceedings of the National Academy of Sciences\n120(20), 2216798120 (2023) https://doi.org/10.1073/pnas.2216798120\nhttps://www.pnas.org/doi/pdf/10.1073/pnas.2216798120\n[36] Cirstian, R., Forde, N.J., Zhang, H., Hellemann, G.S., Beckmann, C.F., Kraguljac,\nN.V., Marquand, A.F.: Lifespan normative models of white matter fractional\nanisotropy: Applications to early psychosis. Biological Psychiatry (2025) https:\n//doi.org/10.1016/j.biopsych.2025.07.021\n[37] Fraza, C., Sønderby, I.E., Boen, R., Shi, Y., Beckmann, C.F., Marquand, A.F.:\nUnraveling the link between CNVs, cognition and individual neuroimaging devi-\nation scores from a population-based reference cohort. Nature Mental Health\n2(12), 1451–1463 (2024) https://doi.org/10.1038/s44220-024-00322-1 . Accessed\n2026-02-10\n[38] Berthet, P., Haatveit, B.C., Kjelkenes, R., Worker, A., Kia, S.M., Wolfers, T.,\nRutherford, S., Alnaes, D., Dinga, R., Pedersen, M.L., Dahl, A., Fernandez-\nCabello, S., Dazzan, P., Agartz, I., Nesv˚ ag, R., Ueland, T., Andreassen, O.A.,\nSimonsen, C., Westlye, L.T., Melle, I., Marquand, A.: A 10-Year Longitudinal\nStudy of Brain Cortical Thickness in People with First-Episode Psychosis Using\nNormative Models. Schizophrenia Bulletin, 107 (2024) https://doi.org/10.1093/\nschbul/sbae107 . Accessed 2024-10-27\n30\n.CC-BY 4.0 International licenseperpetuity. It is made available under a \npreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in \nThe copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint","source_license":"CC-BY-4.0","license_restricted":false}