Reference
set. For instance, if one is interested in parsing heterogeneity within a clinical
cohort of patients with major depressive disorder (MDD), the reference set could
include MDD patients who have responded well to medication, and contrast them
with non-responders.
We will use cortical thickness imaging-derived phenotypes (IDPs) from several
different datasets, which we list below. The datasets are all pre-processed and included
in this repository. All preprocessing was done using Freesurfer with the recon-all
command. We extracted the mean cortical thickness values following the Destrieux
parcellation [31].
We use the FCON1000 dataset, which is freely available at theFCON1000 website.
In addition, we will use several clinical datasets from the OpenNeuro portal:
1. ds003568 – Contains data from 49 subjects (31F, 18M), aged between 12 and
19 (mean 15.9, std 1.8). Clinical labels are Healthy Control (HC, 20) and Major
Depressive Disorder (MDD, 29) [32].
2. ds003653 – Contains data from 73 subjects (56F, 17M), aged between 19 and 44
(mean 29.1, std 6.6). Clinical labels are HC (39) and MDD (33) [33].
import pandas as pd
fcon_df = pd.read_csv(
"https://raw.githubusercontent.com/predictive-clinical-neuroscience/
,→pu25_code/refs/heads/main/data/fcon1000.csv"
)
openneuro_df = pd.read_csv(
"https://raw.githubusercontent.com/predictive-clinical-neuroscience/
,→pu25_code/refs/heads/main/data/openneuro.csv"
)
data = pcn.load_data("data/fcon1000.csv")
Covariates, batch effects, and response variables
As we will see, the input data generally consist of several covariates, batch effects,
and response variables. The covariates are typically continuous variables, like age and
height, but can also be numerically encoded categorical variables (for example, using
one-hot-encoding), like sex. The batch effect dimensions are categorical variables, for
example, scan site or scanner brand/model. The response variables are continuous
variables, like brain volumes or brain activity patterns. All these variables must be
available for all subjects in the reference set.
Normative models made using PCNtoolkit version1.x.xconsist of one or more
linear regressions on the selected covariates, together with a mechanism to handle
9
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
batch effects, the details of which depend on the model type. The choice of covariates
and batch-effect variables depends on the research question, but they should gener-
ally include all factors known to influence the measurements and be available in all
datasets, such as age, sex, sleep quality, or site. Batch effects are used to group sub-
jects together in the same way that random effects are used in classical multilevel
modelling. Typically, batch effects are used to model site or scanner effects, but they
could also be applied to model ancestry or imaging field strength. Whilst batch effects
can be encoded as dummy encoded covariates, this is less flexible and is equivalent to
a fixed intercept model (i.e., only allowing shifts in the intercept). In contrast, using
batch effects allows the model to estimate separate slope, intercept and basis function
parameters for each site with partial pooling across levels of the batch effects (see Kia
et al. [14, 24].
The primary goal in normative modelling is to compute deviation (Z) scores that
are as independent as possible from these covariates and batch effects. For example,
if the aim is to study the effect of meditation on brain activity, the normative model
should include all relevant covariates—such as age, sex, sleep quality, and scanner
site. By modelling and removing the variance explained by these known factors, the
resulting deviation scores reflect variability that is most likely related to effects of
interest rather than demographic or technical influences.
When selecting covariates, care must be taken when including variables that are
highly correlated with one another, as multicollinearity can inflate parameter uncer-
tainty and make it difficult to interpret the contribution of individual covariates.
However, this is principally relevant for applications where parameter interpretation is
important. For example, if both sleep quality and stress level are included as predictors,
their shared variance may obscure distinct effects on brain activity.
subject_id = "sub_id"
covariates = ["age"]
batch_effects = ["site", "sex"]
response_variables = [ col for col in df.columns if col not in covariates +
,→batch_effects + [subject_id]]
response_variables = list(
filter(lambda x: df[x].var() > 0, response_variables)
)# Remove variables with no variance
3.2.2 Data Preparation
Preparing the data for modelling involves a sequence of preprocessing steps designed to
ensure data quality and reproducibility. Note that standard neuroimaging preprocess-
ing steps such as spatial normalisation and/or cortical surface reconstruction should
be applied first. All subjects in a normative modelling analysis need to be in the same
space.
Steps specific to normative modelling include handling missing values and outliers,
combining multiple data sources, splitting the data into training and test sets, and
rescaling covariates and response variables (e.g., via standardisation). All procedures
10
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
are implemented in the PCNtoolkit, with the NormData class providing dedicated
functionality for data cleaning, harmonization, and preparation.
PCNtoolkit automatically excludes subjects with missing values and identifies out-
liers based on feature-wise Z-scores. The stringency of this filter is controlled via the
z thresholdargument. In this tutorial, we setz threshold = 10 to illustrate the proce-
dure while avoiding unnecessary data loss; however, for practical analyses, a different
value may be selected to balance sensitivity and specificity. However, care needs to be
taken not to be overly aggressive with outlier removal as this may result in truncating
the empirical distribution, thereby making it difficult to model with a continuous dis-
tribution. If a customized approach to missingness or outlier handling is desired, data
may also be preprocessed externally before being passed to PCNtoolkit.
The train–test split is performed using thetrain
test splitfunction from
scikit-learn, with automatic stratification on batch-effect variables (e.g.,siteand
sex) to maintain balanced distributions and to avoid demographic or acquisition drift
between subsets. Data scaling is applied internally, and the corresponding scaling
coefficients are stored within the fitted model object to ensure that identical trans-
formations are applied to all future data during prediction, or when transferring the
model.
from pcntoolkit import NormData
import numpy as np
reference_norm_data = NormData.from_dataframe(
name="fcon1000",
dataframe=df,
covariates=covariates,
batch_effects=batch_effects,
response_vars=response_variables,
subject_ids=subject_id,
remove_Nan=True,
remove_outliers=True,
z_threshold=10,
)
3.2.3 Combining Different Data Sources
In many cases, the reference data are supplemented with additional cohorts from other
sources. Here, we demonstrate how to combine the FCON1000 dataset with clinical
datasets from OpenNeuro. The reference dataset has already been loaded above; we
now load the additional data into NormData objects, merge the healthy controls with
the reference set, and retain the patient subset for subsequent analysis.
# Split into healthy and patient groups
healthy_idx = openneuro_df["group"] == "HC"
healthy_group = openneuro_df[healthy_idx]
patient_groups = openneuro_df[~healthy_idx]
# Load the healthy and patient data
11
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
healthy_clinical_data = NormData.from_dataframe(
name="openneuro_healthy",
dataframe=healthy_group,
covariates=covariates,
batch_effects=batch_effects,
response_vars=response_variables,
remove_Nan=True,
remove_outliers=True,
)
patient_clinical_data = NormData.from_dataframe(
name="openneuro_patient",
dataframe=patient_groups,
covariates=covariates,
batch_effects=batch_effects,
response_vars=response_variables,
remove_Nan=True,
remove_outliers=True,
)
# Merge the healthy data with the reference data
norm_train = reference_norm_data.merge(healthy_clinical_data)
# Check for schema compatibility before merging with the patient data
if not norm_train.check_compatibility(patient_clinical_data):
raise ValueError("The data are not compatible.")
else:
print("The data are compatible.")
3.2.4 Train–Test Split
The reference set is divided into a training set used for model fitting and a test set
reserved for model evaluation. Assessing the model on independent data is critical
to ensure accurate estimates of generalisabilty. If the goal is to provide a reference
model for new data, the model can be refit to the whole dataset once generalisability
has been established. A trade-off exists between model accuracy and the reliability
of performance estimates: a larger training set improves model precision, whereas a
larger test set yields more stable validation metrics. We recommend an 80/20, 70/30
or 50/50 train–test ratio, depending on dataset size.
To obtain unbiased evaluation metrics, the test set should be statistically similar
to the training set. Stratification of the train–test split ensures that the distributions
of key batch-effect variables (e.g., site, sex) remain consistent across subsets. The
following command performs this operation using theNormData.train
test split
method, which internally calls the scikit-learn function of the same name.
train, test = norm_train.train_test_split(
splits=(0.8, 0.2),
split_names=["train", "test"],
random_state=42,
12
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
)
3.2.5 Standardization
As described above, PCNtoolkit automatically stores the scaling coefficients used dur-
ing feature scaling parameters within the fitted model object. Note that these are
derived from the training set only. This ensures that all subsequent predictions and
transferred models apply the same transformations to incoming data, eliminating man-
ual preprocessing and guaranteeing consistent scaling across datasets during model
development and deployment stages.
3.2.6 Fitting Models
TheNormativeModelclass
The central class in PCNtoolkit is the NormativeModel class. It has all the functions
that are required to build and use a normative model. To construct aNormativeModel
object, the user needs to provide a template regression model, which is copied and fit
for each feature in the reference set. Here we use the default (unparameterized) HBR
class as a template, but we will elaborate on this next.
from pcntoolkit import NormativeModel, HBR
model = NormativeModel(
template_regression_model=HBR(),
save_dir="../out/models/main_workflow_model_default_HBR",
inscaler="standardize",
outscaler="standardize",
name="main_workflow_model_default_HBR",
)
TheRegressionModelclass
The current implementation of PCNtoolkit only supports the creation of univariate
normative models. This means that each NormativeModel object contains a collection
of RegressionModel objects. For each response variable, the NormativeModel class
makes a copy of a template regression and fits it to the response variable. This template
is what the user should provide when creating a NormativeModel.
The RegressionModel type currently has two concrete implementations: Bayesian
Linear Regression (BLR) and Hierarchical Bayesian Regression (HBR), and both
come with their strengths and weaknesses. For an in-depth coverage of both methods,
including their parameters and usage, please refer to the GitHub repository.
3.2.7 Hierarchical Bayesian Regression (HBR)
A Hierarchical Bayesian Regression (HBR) model assumes a hierarchical generative
process for the data and performs full posterior estimation via Markov Chain Monte
Carlo (MCMC). Configuring an HBR model involves specifying a likelihood function
and defining priors over its parameters, which may include both fixed and random
effects.
13
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
The PCNtoolkit provides several likelihood functions, including Normal, SHASHb,
and Beta likelihoods, each suited to different data characteristics [9]. The Normal
likelihood is used for approximately Gaussian response variables, the SHASHb likeli-
hood for unbounded and asymmetric data, and the Beta likelihood for interval data
bounded within a fixed range. Each likelihood has interpretable parameters, such as
µ(mean),σ(scale),ϵ(skewness), and δ (kurtosis), which can each be modeled hier-
archically to capture structured variation across covariates and batch effects. See [9]
for more details.
Once a likelihood is chosen, priors for each parameter are specified using the
make
priorfunction. Parameters can vary linearly with covariates, include random
(batch) effects, or remain fixed across all observations. Random-effect priors are imple-
mented in a non-centered hierarchical form (reparameterized) to stabilize sampling
and avoid funnel pathologies during inference.
In this workflow, we use the Normal likelihood, which assumes Gaussian feature
distributions after preprocessing. This is appropriate for cortical thickness data, where
deviations are approximately symmetric and unbounded. Bothµandσare modeled
hierarchically as functions of age (covariate) with random intercepts for site and sex,
allowing age effects to vary smoothly while accounting for systematic offsets across
acquisition sites and sexes.
Example with Normal likelihood configuration
For illustration, we define a hierarchical Normal model where each observation yn is
generated as:
yn ∼ N (µn, σ+2
n) n ∈ {1, ..., N } (1)
µn = w⊤
µ ϕµ(xn) + τµn (2)
σn = w⊤
σ ϕσ(xn) + τσ (3)
σ+
n = ln(1 + exp(σn/3)) ∗ 3 (4)
wσ ∼ pwσ(wσ), τ σ ∼ pτσ(τσ), w µ ∼ pwµ(wµ), (5)
τµn = µτµ +
βX
b=1
σ+
τµ b
ντµ b[Bn,b] (6)
µτµ ∼ pµτµ (µτµ), σ τµ b ∼ p+
στµ b
(στµ b) (7)
ντµ b ∼ N (ντµ b | 0vb, Ivb) (8)
where:
• β is the number of batch effect dimensions, i.e., for batch effects (site, sex), β = 2.
• vb denotes the number of unique batch effect values of the b’th batch effect
dimension.
• N denotes the number of observations/datapoints.
• B denotes the N × β matrix of batch effect indices.
14
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
Equations 1–8 describe the generative process implemented in the PCNtoolkit’s
HBR class, where each response variable is modeled with a Normal likelihood whose
mean (µ n) depends on covariates (here, age) and batch effects (here, site and sex),
and whose variance (σ 2
n) depends on covariates. The following code block imple-
ments this hierarchical Normal model. The mean term varies nonlinearly with age
and includes site- and sex-specific intercepts, while the standard deviation term cap-
tures heteroskedasticity across age. For brevity, comments have been omitted; a fully
annotated version and extended examples are provided in the accompanying Jupyter
notebook.
from pcntoolkit import make_prior, BsplineBasisFunction, NormalLikelihood, HBR
mu = make_prior(
linear=True,
slope=make_prior(),
intercept=make_prior(
random=True,
mu=make_prior(dist_params=(0.0, 3.0)),
sigma=make_prior(
dist_params=(0.0, 3.0),
mapping="softplus",# 0
mapping_params=(0.0, 3.0),
),
),
basis_function=BsplineBasisFunction(basis_column=0, nknots=5, degree=3),#
,→<- Allow nonlinearity
)
sigma = make_prior(
linear=True,
slope=make_prior(dist_params=(0.0, 5.0)),
intercept=make_prior(dist_params=(1.0, 3.0)),
basis_function=BsplineBasisFunction(basis_column=0, nknots=5, degree=3),
mapping="softplus",# 0
mapping_params=(0.0, 3.0),
)
likelihood = NormalLikelihood(mu, sigma)
template_hbr = HBR(
name="template_hbr",
likelihood=likelihood,
)
# Create a NormativeModel with the template regression model
model = NormativeModel(
template_regression_model=template_hbr,
save_dir="../out/models/main_workflow_model_HBR",
)
15
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
This example demonstrates how hierarchical priors for µ and σ can be defined
in the PCNtoolkit using B-spline basis functions and positive-domain mappings
(via softplus). The non-centered parameterization and weakly informative pri-
ors improve sampling efficiency and numerical stability. Detailed explanations,
additional likelihoods (SHASHb, Beta), and alternative parameterizations (linear,
random, fixed) are available in the accompanying Jupyter notebook. For full
configuration details and extended examples, we refer the readers to the tuto-
rial on the main PCNToolkit workflow at https://github.com/predictive-clinical-
neuroscience/pu25
code/blob/main/notebooks/1 main workflow.ipynb.
3.2.8 Fitting the normative model to the data
With the data prepared and both the template regression model and the normative
model configured, we fit on the training set and generate predictions for the test set in
a single call. This command estimates the hierarchical parameters, applies the trained
model to the test data, and stores predictions, model artifacts, evaluation metrics, and
diagnostic plots.
model.fit_predict(train, test)
An overview of model outputs
Running the workflow writes all intermediate and final outputs in the specified output
path undermodels/main workflow model HBR/. This directory contains the fitted
model specifications, posterior samples, evaluation metrics, and visual diagnostics used
in the subsequent sections.
• model/— per-feature subfolders containing idata.nc (posterior traces and diag-
nostics) and regression
model.json(serialized feature-level model specification);
plus a top-levelnormative model.jsonsummarizing the full multi-feature model
configuration.
• plots/— centile and quantile–quantile (QQ) diagnostics for both train and test
sets, one set per feature.
• results/— CSV outputs with all numerical results:
centiles
{train,test}.csv,Z {train,test}.csv,logp {train,test}.csv,
andstatistics {train,test}.csv.
A schematic of the default folder hierarchy:
models/
|-- main_workflow_model_HBR/
| |-- model/
| | |-- /
| | | |-- idata.nc
| | | ‘-- regression_model.json
| | |-- /
| | | |-- idata.nc
| | | ‘-- regression_model.json
| | ‘-- normative_model.json
16
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
| |
| |-- plots/
| | |-- centiles__{train,test}_*.png
| | ‘-- qq__{train,test}.png
| |
| ‘-- results/
| |-- centiles_{train,test}.csv
| |-- Z_{train,test}.csv
| |-- logp_{train,test}.csv
| ‘-- statistics_{train,test}.csv
Each folder is automatically generated by the PCNtoolkit during model fitting
and prediction. Together, these outputs contain all information needed to reproduce
the diagnostics, metrics, and visualizations, which will be described in Section 3.3.1.
A fully commented walkthrough of the saving conventions, file contents, and plot
interpretation is provided in the accompanying Jupyter notebook.
3.2.9 Fitting Models in Parallel
For large datasets (either in terms of the number of samples and response variables),
fitting models sequentially can be quite time-consuming. In case of access to a high-
performance computing cluster running standard workload managers such as Torque
or Slurm, operations such as ‘fit
predict‘ can be parallelized by using the Runner class.
The Runner class will allow running the same tasks as in the previous step, but in
a distributed fashion. This is a little more involved than in the previous step. For
example, you need to specify the Python executable and time and memory limits for
each job. You can also specify the number of batches into which to split the workload.
Here we will use a Slurm cluster, using one model per batch with 12 hours max run
time and 2GB memory per job. This can be achieved in the following way:
import os
import sys
from pcntoolkit import Runner
# get path to python executable
venv_path = os.path.join(os.path.dirname(os.path.dirname(sys.executable)))
# configure runner
runner = Runner(
cross_validate=False,
parallelize=True,
n_batches = len(reference_norm_data.response_vars),
environment=venv_path,
job_type="slurm",
time_limit="12:00:00",
memory = "2GB",
n_cores=1,
17
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
log_dir="../out/models/main_workflow_model_HBR/logs/",
temp_dir="../out/models/main_workflow_model_HBR/tmp/",
#preamble = "",
)
runner.fit_predict(model, train, train, observe=False)
3.3 Secondary workflows
In this section, we describe secondary workflows in Figure 1 that can be performed
after the primary normative model has been fit.
3.3.1 Model Evaluation and Calibration
Model evaluation in normative modelling involves three complementary components:
(i) inspection of convergence diagnostics to ensure stable inference, (ii) assessment of
quantitative model performance using both mean-based and probabilistic metrics, and
(iii) visualization of model calibration through posterior predictive plots.
Convergence diagnostics
Posterior summaries were inspected for all model parameters using the ArviZ visuali-
sation framework, including slope coefficients, site- and sex-specific offsets, and scale
parameters. For the Hierarchical Bayesian Regression (HBR) models used in this
workflow, convergence and sampling reliability were evaluated using standard MCMC
diagnostics: all parameters were required to have potential scale reduction factors ( ˆR)
≤ 1.01, bulk and tail effective sample sizes (ESS) ≥ 1000, and Monte Carlo standard
errors (MCSE) below 1% of the corresponding posterior standard deviation. Chains
can also be visually inspected to confirm stationary and adequate mixing across draws.
These diagnostics apply specifically to HBR models, which rely on Markov Chain
Monte Carlo sampling. For Bayesian Linear Regression (BLR) models—estimated via
deterministic optimization rather than MCMC—different convergence criteria apply,
typically based on gradient norms and optimizer tolerance; however, such models are
not used in the present analysis.
To illustrate the diagnostic workflow, we performed this analysis for one represen-
tative feature (lh
G and S frontomargin). This feature was chosen as a typical example
with a moderate age effect, heteroskedasticity, and variance across sites. Because con-
vergence behavior is generally consistent across features with comparable sample sizes
and data characteristics, diagnostics for this representative case serve as a practical
indicator of overall model stability.
All convergence diagnostics were computed with ArviZ and
saved per feature in the corresponding idata.nc file under
../out/models/main
workflow model HBR/model//. Each file stores
posterior draws and diagnostics, including ˆR, ESS, and MCSE. The aggregated
diagnostic table can be reproduced directly from disk:
import arviz as az
18
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
Fig. 2: Traceplots for the covariate slope parameters (slope mu) from the Normal
likelihood model. Each color represents one of the MCMC chains. The dense, well-
mixed traces indicate stable posterior exploration and full chain convergence.
import pandas as pd
from pathlib import Path
model_dir = Path("../out/models/main_workflow_model_HBR/model")
rows = []
for d in model_dir.iterdir():
if d.is_dir() and (d / "idata.nc").exists():
idata = az.from_netcdf(d / "idata.nc")
s = az.summary(idata)# includes R-hat, ESS_bulk, ESS_tail, MCSE
s["feature"] = d.name
rows.append(s)
diag = pd.concat(rows)
diag.to_csv("../out/models/main_workflow_model_HBR/results/diagnostics_summary
,→.csv")
Traceplots for the slope parameters of µ (slope mu) are shown in Fig. 2. All
four chains overlap tightly and explore the posterior space evenly, with no visual
signs of non-stationarity, confirming that the hierarchical regression components have
converged well.
Table 1: Aggregated posterior diagnostics
across all parameters (74 total). Values are
shown as median [IQR].
Metric Median IQR
ˆR1.00 1.00–1.01
ESSbulk 4513 2176–5502
ESStail 3789 2533–4454
Posterior mean -0.11 [-0.61, 0.35]
Posterior SD 0.42 [0.24, 0.92]
Monte Carlo SE (mean) 0.006 [0.003, 0.026]
19
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
Quantitative model performance
Model performance was then evaluated using both mean-based and probabilistic met-
rics exported by the PCNtoolkit. Mean-based metrics—the coefficient of determination
(R2), root mean squared error (RMSE), standardized mean squared error (SMSE),
and explained variance (EXPV)—quantify how well the predicted mean ˆyapproxi-
mates the observed data. Probabilistic metrics assess the full predictive distribution;
therefore, they are more suitable for evaluating the quality of estimated centiles in
the NM context: the mean standardized log loss (MSLL) compares predictive log-
likelihoods to a variance-standardized baseline, while the mean absolute centile error
(MACE) [34] measures the average deviation of predicted centiles from their expected
uniform targets. Lower RMSE, SMSE, MSLL and MACE values, and higher EXPV
andR 2 indicate better overall fit.
Because normative models estimate both mean and variance, mean-based
metrics alone provide an incomplete view of performance. A model may display
modestR 2 yet excellent calibration of uncertainty (and vice versa), as reflected
in the centile and QQ plots (Fig. 3). This reflects the purpose of normative mod-
elling: to accurately estimate the conditional distribution of each feature rather
than only its central tendency. All metric values are stored automatically in
../out/models/main
workflow model HBR/results/statistics {train,test}.csv
and can be loaded as follows:
import pandas as pd
table = pd.read_csv(
"../out/models/main_workflow_model_HBR/results/statistics_test.csv",
index_col="statistic"
).T
Visual assessment of calibration
Qualitative assessment of calibration is performed using the PCNtoolkit’s built-in
plotting utilities.plot centilesvisualizes predicted centile trajectories across covari-
ates, whileplot qqcompares empirical and theoretical quantiles of standardized
residuals (Z-scores). Together, these provide complementary views of functional and
distributional model fit. Figure 3 shows the two plots produced by this code snippet.
from pcntoolkit import plot_centiles, plot_qq
plot_centiles(
model,
centiles=[0.05, 0.5, 0.95],
covariate="age",
covariate_range=(10, 80),
batch_effects={"site": ["ICBM", "Leiden_2180"], "sex": ["M", "F"]},
scatter_data=train,
hue_data="site",
markers_data="sex",
show_other_data=True,
)
20
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
(a) Centile plot
(b) QQ plot
Fig. 3: Visualization of normative model calibration for a representative region of
interest.(a)Predicted centiles (5th, 50th, 95th) across age, with observed data over-
laid. Smooth median and outer centile alignment indicate a good functional fit across
covariates. (b) Quantile–quantile (QQ) plot comparing empirical versus theoretical
quantiles of standardized residuals. Alignment with the identity line indicates accu-
rate predictive variance; curvature suggests over- or under-dispersion. For a detailed
interpretation of QQ plots in normative modelling, see [26].
plot_qq(train, plot_id_line=True)
3.3.2 Longitudinal normative modelling
Whilst many normative models are estimated using longitudinal data, the models
themselves are fundamentally cross-sectional in nature in that they involve estimating
smooth curves to link the group-level centiles across different timepoints. In order to
obtain true longitudinal inferences, additional steps are necessary. Most importantly,
to provide a statistical estimate to quantify the magnitude of change between two
or more consecutive z-score measurements derived from a normative model. In other
words, longitudinal normative modelling seeks to determine whether a centile crossing
has occurred for a given individual between two or more timepoints. Since individuals
do not track centiles exactly as they move through time (e.g., due to measurement
noise), it is essential to take longitudinal variance into account both when estimating
the normative model and when evaluating change between time points, to obtain a
reliable characterization of individual trajectories [12]. This requires longitudinal data.
The PCNtoolkit and supporting scripts allow for two ways of obtaining an estimate
of the significance of longitudinal change: namely constructing ‘Z-diff’ scores that
provide a statistical estimate of centile crossing between two timepoints based on BLR
models [11] or using velocity models, which provide the ability to estimate ’thrive
lines’ [12] which can quantify centile crossings for larger numbers of timepoints. An
alternative method for longitudinal data analysis is presented in Di Biase et al. [35].
21
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
3.3.3 Federated Learning
A key feature of PCNtoolkit is the ability to support federated learning, as described
in detail in [14, 24]. Concrete examples of this workflow are provide a notebook in
the associated code repository. In brief, several methods for federated normative mod-
eling, i.e., estimating normative models on decentralized data, are supported: model
transfer involves deriving a new set of coefficients for data sites that were not
included in the reference dataset by distilling the information from an existing ref-
erence model. In the context of HBR, this is operationalised by means of applying
an informative prior distribution (corresponding to the posterior distribution from
the base reference model) to the new data and re-estimating parameters. Whilst this
approach is usually the most efficient, please note that a model transfer may be prob-
lematic if the data from the new site has a very different distribution from the data in
the training set. In contrast, a model extend involves updating the reference model
with data from new sites, which can then be passed back to the site initially con-
tributing the original model. Whilst this is usually fairly robust, please note that
during model extension, if the new sites have poor quality data, this may have a dele-
terious impact on the original model (e.g., containing severe outliers or spanning a
non-overlapping portion of the covariate range). A model merge can be performed to
combine two separately estimated normative models (e.g., from different sites). All
three modules operate without requiring access to the reference datasets used to derive
the original reference model, which is essential in settings where data privacy is a
primary concern.
3.4 Community Aspects
In order to maximise value to the community, we provide access to an updated set
of lifespan reference models derived from tens of thousands of individuals aggregated
from datasets that span the lifespan. These models supersede the cortical thickness and
volumetric models presented in Rutherford et al. [4] and fractional anisotropy measures
derived from diffusion data [36], with additional models to be brought online in the
future, for example, functional connectivity normative models presented in Rutherford
et al. [20]. This is necessary because models estimated in early versions of PCNtoolkit
are not compatible with PCNtoolkit 1.x.x. These reference models are available for
download via a notebook available in the associated code repository. We welcome
the contributions of new models from other groups via PCNPortal [16], our online
collaborative normative modelling portal.
We also welcome contributions from the field, in terms of contributions of new
References
[1] Schumann, G., Binder, E.B., Holte, A., de Kloet, E.R., Oedegaard, K.J., Robbins,
T.W., Walker-Tilley, T.R., Bitter, I., Brown, V.J., Buitelaar, J., Ciccocioppo,
R., Cools, R., Escera, C., Fleischhacker, W., Flor, H., Frith, C.D., Heinz, A.,
Johnsen, E., Kirschbaum, C., Klingberg, T., Lesch, K.-P., Lewis, S., Maier, W.,
Mann, K., Martinot, J.-L., Meyer-Lindenberg, A., M¨ uller, C.P., M¨ uller, W.E.,
Nutt, D.J., Persico, A., Perugi, G., Pessiglione, M., Preuss, U.W., Roiser, J.P.,
Rossini, P.M., Rybakowski, J.K., Sandi, C., Stephan, K.E., Undurraga, J., Vieta,
E., van der Wee, N., Wykes, T., Haro, J.M., Wittchen, H.U.: Stratified medicine
24
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
for mental disorders. European Neuropsychopharmacology 24(1), 5–50 (2014)
https://doi.org/10.1016/j.euroneuro.2013.09.010
[2] Finn, E.S., Todd Constable, R.: Individual variation in functional brain connec-
tivity: Implications for personalized approaches to psychiatric disease. Dialogues
in Clinical Neuroscience 18(3), 277–287 (2016) https://doi.org/10.31887/DCNS.
2016.18.3/efinn
[3] Marquand, A.F., Rezek, I., Buitelaar, J., Beckmann, C.F.: Understanding Het-
erogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control
Studies. Biological Psychiatry 80(7), 552–561 (2016) https://doi.org/10.1016/j.
biopsych.2015.12.023
[4] Rutherford, S., Fraza, C., Dinga, R., Kia, S.M., Wolfers, T., Zabihi, M., Berthet,
P., Worker, A., Verdi, S., Andrews, D., Han, L.K., Bayer, J.M., Dazzan, P.,
McGuire, P., Mocking, R.T., Schene, A., Sripada, C., Tso, I.F., Duval, E.R.,
Chang, S.-E., Penninx, B.W., Heitzeg, M.M., Burt, S.A., Hyde, L.W., Amaral,
D., Wu Nordahl, C., Andreasssen, O.A., Westlye, L.T., Zahn, R., Ruhe, H.G.,
Beckmann, C., Marquand, A.F.: Charting brain growth and aging at high spatial
precision. eLife 11, 72904 https://doi.org/10.7554/eLife.72904
[5] Bethlehem, R.A.I., Seidlitz, J., White, S.R., Vogel, J.W., Anderson, K.M., Adam-
son, C., Adler, S., Alexopoulos, G.S., Anagnostou, E., Areces-Gonzalez, A., Astle,
D.E., Auyeung, B., Ayub, M., Bae, J., Ball, G., Baron-Cohen, S., Beare, R.,
Bedford, S.A., Benegal, V., Beyer, F., Blangero, J., Blesa C´ abez, M., Boardman,
J.P., Borzage, M., Bosch-Bayard, J.F., Bourke, N., Calhoun, V.D., Chakravarty,
M.M., Chen, C., Chertavian, C., Chetelat, G., Chong, Y.S., Cole, J.H., Corvin,
A., Costantino, M., Courchesne, E., Crivello, F., Cropley, V.L., Crosbie, J., Cross-
ley, N., Delarue, M., Delorme, R., Desrivieres, S., Devenyi, G.A., Di Biase, M.A.,
Dolan, R., Donald, K.A., Donohoe, G., Dunlop, K., Edwards, A.D., Elison, J.T.,
Ellis, C.T., Elman, J.A., Eyler, L., Fair, D.A., Feczko, E., Fletcher, P.C., Fon-
agy, P., Franz, C.E., Galan-Garcia, L., Gholipour, A., Giedd, J., Gilmore, J.H.,
Glahn, D.C., Goodyer, I.M., Grant, P.E., Groenewold, N.A., Gunning, F.M., Gur,
R.E., Gur, R.C., Hammill, C.F., Hansson, O., Hedden, T., Heinz, A., Henson,
R.N., Heuer, K., Hoare, J., Holla, B., Holmes, A.J., Holt, R., Huang, H., Im,
K., Ipser, J., Jack, C.R., Jackowski, A.P., Jia, T., Johnson, K.A., Jones, P.B.,
Jones, D.T., Kahn, R.S., Karlsson, H., Karlsson, L., Kawashima, R., Kelley, E.A.,
Kern, S., Kim, K.W., Kitzbichler, M.G., Kremen, W.S., Lalonde, F., Landeau,
B., Lee, S., Lerch, J., Lewis, J.D., Li, J., Liao, W., Liston, C., Lombardo, M.V.,
Lv, J., Lynch, C., Mallard, T.T., Marcelis, M., Markello, R.D., Mathias, S.R.,
Mazoyer, B., McGuire, P., Meaney, M.J., Mechelli, A., Medic, N., Misic, B., Mor-
gan, S.E., Mothersill, D., Nigg, J., Ong, M.Q.W., Ortinau, C., Ossenkoppele, R.,
Ouyang, M., Palaniyappan, L., Paly, L., Pan, P.M., Pantelis, C., Park, M.M.,
Paus, T., Pausova, Z., Paz-Linares, D., Pichet Binette, A., Pierce, K., Qian,
X., Qiu, J., Qiu, A., Raznahan, A., Rittman, T., Rodrigue, A., Rollins, C.K.,
Romero-Garcia, R., Ronan, L., Rosenberg, M.D., Rowitch, D.H., Salum, G.A.,
25
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
Satterthwaite, T.D., Schaare, H.L., Schachar, R.J., Schultz, A.P., Schumann,
G., Sch¨ oll, M., Sharp, D., Shinohara, R.T., Skoog, I., Smyser, C.D., Sperling,
R.A., Stein, D.J., Stolicyn, A., Suckling, J., Sullivan, G., Taki, Y., Thyreau, B.,
Toro, R., Traut, N., Tsvetanov, K.A., Turk-Browne, N.B., Tuulari, J.J., Tzou-
rio, C., Vachon-Presseau, E., Valdes-Sosa, M.J., Valdes-Sosa, P.A., Valk, S.L.,
Amelsvoort, T., Vandekar, S.N., Vasung, L., Victoria, L.W., Villeneuve, S., Vill-
ringer, A., Vertes, P.E., Wagstyl, K., Wang, Y.S., Warfield, S.K., Warrier, V.,
Westman, E., Westwater, M.L., Whalley, H.C., Witte, A.V., Yang, N., Yeo, B.,
Yun, H., Zalesky, A., Zar, H.J., Zettergren, A., Zhou, J.H., Ziauddeen, H., Zug-
man, A., Zuo, X.N., Rowe, C., Frisoni, G.B., Binette, A.P., Bullmore, E.T.,
Alexander-Bloch, A.F.: Brain charts for the human lifespan. Nature 604(7906),
525–533 (2022) https://doi.org/10.1038/s41586-022-04554-y
[6] Ge, R., Yu, Y., Qi, Y.X., Fan, Y.-n., Chen, S., Gao, C., Haas, S.S., New, F.,
Boomsma, D.I., Brodaty, H., Brouwer, R.M., Buckner, R., Caseras, X., Criv-
ello, F., Crone, E.A., Erk, S., Fisher, S.E., Franke, B., Glahn, D.C., Dannlowski,
U., Grotegerd, D., Gruber, O., Pol, H.E.H., Schumann, G., Tamnes, C.K., Wal-
ter, H., Wierenga, L.M., Jahanshad, N., Thompson, P.M., Frangou, S., Agartz,
I., Asherson, P., Ayesa-Arriola, R., Banaj, N., Banaschewski, T., Baumeister,
S., Bertolino, A., Borgwardt, S., Bourque, J., Brandeis, D., Breier, A., Buite-
laar, J.K., Cannon, D.M., Cervenka, S., Conrod, P.J., Crespo-Facorro, B., Davey,
C.G., Haan, L., Zubicaray, G.I., Giorgio, A.D., Frodl, T., Gruner, P., Gur, R.E.,
Gur, R.C., Harrison, B.J., Hatton, S.N., Hickie, I., Howells, F.M., Huyser, C.,
Jernigan, T.L., Jiang, J., Joska, J.A., Kahn, R.S., Kalnin, A.J., Kochan, N.A.,
Koops, S., Kuntsi, J., Lagopoulos, J., Lazaro, L., Lebedeva, I.S., Lochner, C.,
Martin, N.G., Mazoyer, B., McDonald, B.C., McDonald, C., McMahon, K.L.,
Medland, S., Modabbernia, A., Mwangi, B., Nakao, T., Nyberg, L., Piras, F.,
Portella, M.J., Qiu, J., Roffman, J.L., Sachdev, P.S., Sanford, N., Satterthwaite,
T.D., Saykin, A.J., Sellgren, C.M., Sim, K., Smoller, J.W., Soares, J.C., Sommer,
I.E., Spalletta, G., Stein, D.J., Thomopoulos, S.I., Tomyshev, A.S., Tordesillas-
Guti´ errez, D., Trollor, J.N., Ent, D., Heuvel, O.A., Erp, T.G., Haren, N.E.,
Vecchio, D., Veltman, D.J., Wang, Y., Weber, B., Wei, D., Wen, W., Westlye,
L.T., Williams, S.C., Wright, M.J., Wu, M.-J., Yu, K.: Normative modelling of
brain morphometry across the lifespan with CentileBrain: Algorithm benchmark-
ing and model optimisation. The Lancet Digital Health 6(3), 211–221 (2024)
https://doi.org/10.1016/S2589-7500(23)00250-9
[7] Marquand, A.F., Kia, S.M., Zabihi, M., Wolfers, T., Buitelaar, J.K., Beckmann,
C.F.: Conceptualizing mental disorders as deviations from normative function-
ing. Molecular Psychiatry 24(10), 1415–1424 (2019) https://doi.org/10.1038/
s41380-019-0441-1
[8] Fraza, C., Rutherford, S., Buˇ ckov´ a, B.R., Beckmann, C.F., Marquand, A.F.:
The promise of quantifying individual risk for brain disorders through normative
modeling, a narrative review. Neuroscience andhttps://doi.org/10.1038/s41586-
022-04554-y Biobehavioral Reviews 176, 106284 (2025) https://doi.org/10.1016/
26
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
j.neubiorev.2025.106284
[9] De Boer, A.A.A., Bayer, J.M.M., Kia, S.M., Rutherford, S., Zabihi, M., Fraza,
C., Barkema, P., Westlye, L.T., Andreassen, O.A., Hinne, M., Beckmann, C.F.,
Marquand, A.: Non-Gaussian normative modelling with hierarchical Bayesian
regression. Imaging Neuroscience 2, 2–00132 (2024) https://doi.org/10.1162/
imag
a 00132
[10] Di Biase, M., Tian, Y., Bethlehem, J. Richard and Seidlitz, Alexander-Bloch, T.
Aaron. and Yeo, Zalesky, A.: Mapping human brain charts cross-sectionally and
longitudinally. Proceedings of the National Academy of Sciences 120(20) (2023)
https://doi.org/10.1073/pnas.2216798120
[11] Rehak Buckova, B., Fraza, C., Reh´ ak, R., Koleniˇ c, M., Beckmann, C.F.,ˇSpaniel,
F., Marquand, A.F., Hlinka, J.: Using normative models pre-trained on cross-
sectional data to evaluate intra-individual longitudinal changes in neuroimaging
data. eLife 13, 95823 (2025) https://doi.org/10.7554/eLife.95823
[12] Bayer, J.M.M., Boer, A.A.A., Rehak-Bucova, B., Fraza, C.J., Banaschewski,
T., Barker, G.J., Bokde, A.L.W., Bruehl, R., Desrivieres, S., Flor, H., Gar-
avan, H., Gowland, P., Grigis, A., Heinz, A., Lemaitre, H., Martinot, J.-L.,
Martinot, M.-L.P., Artigues, E., Nees, F., Orfanos, D.P., Paus, T., Poustka, L.,
Smolka, M.N., Holz, N., Vaidya, N., Walter, H., Whelan, R., Wirsching, P.,
Schumann, G., Initiative, A.D.N., Kraguljac, N., Beckmann, C.F., Marquand,
A.F.: Charting the velocity of brain growth and development. arXiv (2026).
https://doi.org/10.48550/ARXIV.2601.07591
[13] Watanabe, S.: A widely applicable Bayesian information criterion. J. Mach. Learn.
Res. 14(1), 867–897 (2013)
[14] Kia, S.M., Huijsdens, H., Rutherford, S., De Boer, A., Dinga, R., Wolfers, T.,
Berthet, P., Mennes, M., Andreassen, O.A., Westlye, L.T., Beckmann, C.F.,
Marquand, A.F.: Closing the life-cycle of normative modeling using federated
hierarchical Bayesian regression. PLOS ONE 17(12), 0278776 (2022) https://doi.
org/10.1371/journal.pone.0278776
[15] Rutherford, S., Kia, S.M., Wolfers, T., Fraza, C., Zabihi, M., Dinga, R., Berthet,
P., Worker, A., Verdi, S., Ruhe, H.G., Beckmann, C.F., Marquand, A.F.: The
normative modeling framework for computational psychiatry. Nature Protocols
17(7), 1711–1734 (2022) https://doi.org/10.1038/s41596-022-00696-5
[16] Barkema, P., Rutherford, S., Lee, H.-C., Kia, S.M., Savage, H., Beckmann,
C., Marquand, A.: Predictive Clinical Neuroscience Portal (PCNportal): Instant
online access to research-grade normative models for clinical neuroscientists. Well-
come Open Research 8, 326 (2023) https://doi.org/10.12688/wellcomeopenres.
19591.2
27
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
[17] Little, B., Alyas, N., Surtees, A., Winston, G.P., Duncan, J.S., Cousins, D.A.,
Taylor, J.-P., Taylor, P., Leiberg, K., Wang, Y.: Brain morphology normative
modelling platform for abnormality and centile estimation: Brain MoNoCle.
Imaging Neuroscience3, 00438 (2025) https://doi.org/10.1162/imag
a 00438
[18] Gaiser, C., van der Vliet, R., de Boer, A.A.A., Donchin, O., Berthet, P., Devenyi,
G.A., Mallar Chakravarty, M., Diedrichsen, J., Marquand, A.F., Frens, M.A.,
Muetzel, R.L.: Population-wide cerebellar growth models of children and ado-
lescents. Nature Communications 15(1), 2351 (2024) https://doi.org/10.1038/
s41467-024-46398-2
[19] Worker, A., Berthert, P., Lawrence, A.J., Kia, S.M., Arango, C., Dinga, R.,
Galderisi, S., Glenthøj, B., Kahn, R.S., Leslie, A., Murray, R.M., Pariante,
C.M., Pantelis, C., Weiser, M., Winter-van Rossum, I., McGuire, P., Dazzan, P.,
Marquand, A.F.: Extreme deviations from the normative model reveal cortical
heterogeneity and associations with negative symptom severity in first-episode
psychosis from the OPTiMiSE and GAP studies. Translational Psychiatry 13(1),
373 (2023) https://doi.org/10.1038/s41398-023-02661-6
[20] Rutherford, S., Barkema, P., Tso, I.F., Sripada, C., Beckmann, C.F., Ruhe, H.G.,
Marquand, A.F.: Evidence for embracing normative modeling. eLife 12, 85082
(2023) https://doi.org/10.7554/eLife.85082
[21] Zabihi, M., Floris, D.L., Kia, S.M., Wolfers, T., Tillmann, J., Arenas, A.L.,
Moessnang, C., Banaschewski, T., Holt, R., Baron-Cohen, S., Loth, E., Char-
man, T., Bourgeron, T., Murphy, D., Ecker, C., Buitelaar, J.K., Beckmann,
C.F., Marquand, A.: Fractionating autism based on neuroanatomical norma-
tive modeling. Translational Psychiatry 10(1) (2020) https://doi.org/10.1038/
s41398-020-01057-0
[22] Kia, S.M., Marquand, A.: Normative Modeling of Neuroimaging Data using
Scalable Multi-Task Gaussian Processes. International Conference on Medi-
cal Image Computing and Computer-Assisted Intervention, 127–135 (2018)
arXiv:1806.01047
[23] Fraza, C.J., Dinga, R., Beckmann, C.F., Marquand, A.F.: Warped Bayesian Lin-
ear Regression for Normative Modelling of Big Data. bioRxiv, 2021–0405438429
(2021)
[24] Kia, S.M., Huijsdens, H., Dinga, R., Wolfers, T., Mennes, M., Andreassen, O.A.,
Westlye, L.T., Beckmann, C.F., Marquand, A.F.: Hierarchical Bayesian Regres-
sion for Multi-Site Normative Modeling of Neuroimaging Data. International
Conference on Medical Image Computing and Computer-Assisted Intervention,
699–709 (2020) arXiv:2005.12055
[25] Bayer, J.M.M., Dinga, R., Kia, S.M., Kottaram, A.R., Wolfers, T., Lv, J., Zalesky,
A., Schmaal, L., Marquand, A.: Accommodating site variation in neuroimaging
28
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
data using normative and hierarchical Bayesian models. NeuroImage264, 119699
(2022) https://doi.org/10.1016/j.neuroimage.2022.119699
[26] Dinga, R., Fraza, C.J., Bayer, J.M.M., Kia, S.M., Beckmann, C.F., Marquand,
A.F.: Normative Modeling of Neuroimaging Data Using Generalized Additive
Models of Location Scale and Shape. Neuroscience (2021). https://doi.org/10.
1101/2021.06.14.448106
[27] Marquand, A., Rutherford, S., Dinga, R.: Fairly evaluating the performance of
normative models. The Lancet Digital Health 6(11), 775 (2024) https://doi.org/
10.1016/S2589-7500(24)00200-0
[28] Rutherford, S., Wolfers, T., Fraza, C., Harnett, N.G., Beckmann, C.F., Ruhe,
H.G., Marquand, A.F.: To which reference class do you belong? Measuring racial
fairness of reference classes with normative modeling. arXiv (2024). https://doi.
org/10.48550/ARXIV.2407.19114
[29] Sun, L., Qin, W., Liang, X., Wang, C., Men, W., Duan, Y., Fan, X.-R., Cai, Q.,
Qiu, S., Wang, M., Gong, Q., Tian, Y., Liang, P., Liu, Z., Zhang, X., Song, H.,
Ye, Z., Zhang, P., Dong, Q., Tao, S., Zhu, W., Zhang, J., Xie, F., Feng, J., Zhang,
J., Liu, C., Qian, Q., Zhang, B., Meng, M., Hu, L., Gao, J.-H., Jiang, T., Zhu,
X., Zhang, Y., Liu, L., Liu, H., Liao, W., Wang, D., Wang, H., Guo, T., Dai, Z.,
Lui, S., Xu, K., Li, L., Xie, P., Feng, C., Cui, G., Wu, J., Yin, X., Ding, G., Xian,
J., Zhao, L., Lu, J., Liu, Z., Han, Y., Yuan, Z., Zhang, X., Si, T., Zhou, F., Bi,
Y., Wu, D., Gao, F., Wang, F., Qin, S., Wang, G., Chen, F., Zhang, Z., Sui, J.,
Chen, H., Cai, J., Liu, S., Geng, Z., Zhang, C., Mao, N., Yin, H., Liu, B., Ma, H.,
Gao, B., Miao, Y., Kong, X.-Z., Zhou, Y., Liu, L., Hu, J., Wang, L., Zhang, Q.,
Shu, H., Wang, P., Lee, T.M.C., Cao, Q., Yang, L., Zhang, X., Luo, W., Liang,
M., Yao, H., Li, M., Huang, H., Peng, Y., Han, Z., Zhou, C., Xu, H., Feng, M.,
Shen, W., Hu, Y., Chen, H., Wang, Y., Gong, G., Yan, Z., Xu, X., Liu, J., Chen,
G., Wang, P., Yang, Y., Yao, D., Han, T., He, H., Chen, C., Zou, Q., Liu, H.,
Zhang, H., Chai, C., Lu, C., Tu, Y., Liu, Y., Lin, D., Zhao, W., Xu, X., Liu, X.,
Cui, Z., Wang, Z., Huang, R., Li, Z., Liu, Y., Li, X., Yang, X., Zhang, N., Chen,
A., Zhang, B., Qin, P., Liu, C., Yao, Z., Wei, Y., Yuan, H., Wang, F., Zhang,
Y., Zhang, Q., Hu, F., Xie, H., Wu, X., Wang, J., Fan, G., Wang, Z., Zhang, D.,
Zhong, H., Wang, Y., Bai, L., Li, Y., Wei, X., Wang, J., Zhang, Y., He, H., Li,
S., Zhang, T., Jiang, F., Yang, J., Chen, F., Liu, F., Liu, H., Chen, N., Yang, J.,
Hou, B., Huang, C.-C., Zhu, J., Cai, H., Wei, D., Chen, Q., Wei, Y., Miao, P., Li,
Y., Liu, Y., Yang, N., Gao, X., Liu, Y., Shen, Y., Huang, X., Ji, G.-J., Zhang, L.,
Qiu, J., Yu, Y., Lin, C.-P., Feng, F., Li, K., Yu, C., He, Y.: Population-specific
brain charts reveal chinese-western differences in neurodevelopmental trajectories
(2025) https://doi.org/10.1101/2025.06.17.659820
[30] Elleaume, C., Hebling Vieira, B., Floris, D.L., Langer, N.: Toward robust
neuroanatomical normative models: Influence of sample size and covariates
distributions (2025) https://doi.org/10.7554/elife.108952.1
29
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint
[31] DESTRIEUX, C., FISCHL, B., DALE, A., HALGREN, E.: Automatic parcella-
tion of human cortical gyri and sulci using standard anatomical nomenclature.
NeuroImage53(1), 1–15 (2010) https://doi.org/10.1016/j.neuroimage.2010.06.
010
[32] Liuzzi, L., Chang, K.K., Zheng, C., Keren, H., Saha, D., Nielson, D.M., Stringaris,
A.: Magnetoencephalographic correlates of mood and reward dynamics in human
adolescents. Cerebral Cortex 32(15), 3318–3330 (2022) https://doi.org/10.1093/
cercor/bhab417
[33] Baranger, D.A.A., Halchenko, Y.O., Satz, S., Ragozzino, R., Iyengar, S., Swartz,
H.A., Manelis, A.: Aberrant Levels of Cortical Myelin Distinguish Individuals
with Unipolar Depression from Healthy Controls. medRxiv (2021). https://doi.
org/10.1101/2021.02.25.21252472
[34] Zamanzadeh, M., Verduyn, Y., Boer, A., Ros, T., Wolfers, T., Dinga, R.,
ˇSaf´ aˇ r Postma, M., Marquand, A.F., Wingerden, M., Kia, S.M.: MEGaNorm: Nor-
mative modeling of MEG brain oscillations across the human lifespan. bioRxiv,
2025–06 (2025)
[35] Di Biase, M., Tian, Y., Bethlehem, R., Seidlitz, J., Alexander-Bloch, A.,
Yeo, B.T., Zalesky, A.: Mapping human brain charts cross-sectionally
and longitudinally. Proceedings of the National Academy of Sciences
120(20), 2216798120 (2023) https://doi.org/10.1073/pnas.2216798120
https://www.pnas.org/doi/pdf/10.1073/pnas.2216798120
[36] Cirstian, R., Forde, N.J., Zhang, H., Hellemann, G.S., Beckmann, C.F., Kraguljac,
N.V., Marquand, A.F.: Lifespan normative models of white matter fractional
anisotropy: Applications to early psychosis. Biological Psychiatry (2025) https:
//doi.org/10.1016/j.biopsych.2025.07.021
[37] Fraza, C., Sønderby, I.E., Boen, R., Shi, Y., Beckmann, C.F., Marquand, A.F.:
Unraveling the link between CNVs, cognition and individual neuroimaging devi-
ation scores from a population-based reference cohort. Nature Mental Health
2(12), 1451–1463 (2024) https://doi.org/10.1038/s44220-024-00322-1 . Accessed
2026-02-10
[38] Berthet, P., Haatveit, B.C., Kjelkenes, R., Worker, A., Kia, S.M., Wolfers, T.,
Rutherford, S., Alnaes, D., Dinga, R., Pedersen, M.L., Dahl, A., Fernandez-
Cabello, S., Dazzan, P., Agartz, I., Nesv˚ ag, R., Ueland, T., Andreassen, O.A.,
Simonsen, C., Westlye, L.T., Melle, I., Marquand, A.: A 10-Year Longitudinal
Study of Brain Cortical Thickness in People with First-Episode Psychosis Using
Normative Models. Schizophrenia Bulletin, 107 (2024) https://doi.org/10.1093/
schbul/sbae107 . Accessed 2024-10-27
30
.CC-BY 4.0 International licenseperpetuity. It is made available under a
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted February 18, 2026. ; https://doi.org/10.64898/2026.02.17.706268doi: bioRxiv preprint