BARTharm
Paper
Prevot E, et al. (2025). BARTharm: MRI Harmonization Using Image Quality Metrics and Bayesian Non-parametric. bioRxiv. Published online 2025. doi:10.1101/2025.06.04.657792 https://www.biorxiv.org/content/10.1101/2025.06.04.657792v1
Source code
Overview
BARTharm is a statistical harmonization framework for imaging-derived phenotypes (IDPs) from MRI that removes scanner-induced variability while preserving biological signal.
Unlike traditional methods (e.g., ComBat), BARTharm:
does not rely on discrete scanner/site labels
leverages Image Quality Metrics (IQMs) as continuous proxies for acquisition variability
uses Bayesian Additive Regression Trees (BART) to model complex, non-linear effects
This enables:
modeling continuous scanner variation
capturing within-scanner heterogeneity
harmonizing unseen or anonymized datasets
The method jointly models biological signal and scanner-related variation within a unified Bayesian framework.
BARTharm is fully data-driven, and its advantages are clearest in two scenarios: under model misspecification, and when scanner-related variables are correlated with biological covariates. In both, standard methods can inflate false positive rates (FPR). By modeling scanner effects as flexible, non-linear functions of IQMs, BARTharm reduces residual acquisition-related variability that would otherwise bias downstream analyses, which yields better-calibrated inference and more reliable detection of true biological effects.
Method Summary
BARTharm decomposes each IDP into the sum of three components:
μ(·): scanner-related effects (learned from IQMs)
τ(·): biological signal
ε: noise
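Written out for a single subject i, the decomposition can be sketched as follows. The notation is an assumption for illustration, not necessarily the paper's: y_i is the IDP, q_i the vector of IQMs, x_i the biological covariates, and the Gaussian noise with a single variance corresponds to the homoskedastic version described further down.

```latex
% Notation (y_i, q_i, x_i, sigma^2) assumed for illustration; not taken from the paper.
y_i = \mu(q_i) + \tau(x_i) + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad \mu(\cdot),\ \tau(\cdot) \text{ each given an independent BART prior}
```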
The scanner and biological components, μ(·) and τ(·), are modeled with independent BART ensembles, allowing:
non-linear effects
high-order interactions
fully data-driven learning
Harmonized data are obtained by removing the estimated scanner component, i.e. by subtracting the estimate of μ(·) from each observed IDP.
This avoids the restrictive location-scale assumptions of classical approaches like ComBat.
The model described above is the homoskedastic version: it captures scanner effects in the mean structure through flexible, non-linear functions of IQMs, without requiring scanner or site labels. This lets the model account for complex, continuous acquisition variability and for within-scanner heterogeneity.
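The fit-then-subtract logic of the homoskedastic model can be sketched in plain Python. This is not BARTharm: as a loudly stated simplification, each BART ensemble is replaced by a single least-squares regression stump, there are no priors and no MCMC, and every name here (`fit_stump`, `backfit`, `harmonize`) is hypothetical. What it does illustrate is the backfitting idea that Bayesian backfitting samplers for BART also use: μ is refit to y − τ̂ from the IQM alone, τ is refit to y − μ̂ from the covariate alone, and harmonized values are y − μ̂. Centring μ̂ before subtracting is our own choice, so the harmonized data keep the original overall mean.

```python
from statistics import fmean

# Hypothetical sketch: one regression stump per component stands in for a BART ensemble.
# A stump is (threshold, left_value, right_value): a single-split regression tree.
Stump = tuple[float, float, float]


def fit_stump(v: list[float], r: list[float]) -> Stump:
    """Least-squares single split of residuals r on one feature v."""
    m = fmean(r)
    best = (float("inf"), float("inf"), m, m)  # fallback: constant fit, no split
    levels = sorted(set(v))
    for a, b in zip(levels, levels[1:]):
        t = (a + b) / 2  # candidate threshold between neighbouring feature values
        left = [ri for vi, ri in zip(v, r) if vi <= t]
        right = [ri for vi, ri in zip(v, r) if vi > t]
        ml, mr = fmean(left), fmean(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1], best[2], best[3]


def predict_stump(s: Stump, v: float) -> float:
    t, left, right = s
    return left if v <= t else right


def backfit(q: list[float], x: list[float], y: list[float], sweeps: int = 10) -> tuple[Stump, Stump]:
    """Alternate: refit mu on y - tau_hat using IQMs, then tau on y - mu_hat using covariates."""
    tau_hat = [0.0] * len(y)
    for _ in range(sweeps):  # assumes sweeps >= 1
        mu_s = fit_stump(q, [yi - ti for yi, ti in zip(y, tau_hat)])
        mu_hat = [predict_stump(mu_s, qi) for qi in q]
        tau_s = fit_stump(x, [yi - mi for yi, mi in zip(y, mu_hat)])
        tau_hat = [predict_stump(tau_s, xi) for xi in x]
    return mu_s, tau_s


def harmonize(y: list[float], q: list[float], mu_s: Stump, mu_center: float) -> list[float]:
    """Subtract the estimated scanner effect, centred so the overall mean is kept."""
    return [yi - (predict_stump(mu_s, qi) - mu_center) for yi, qi in zip(y, q)]


# Toy data: one IQM q, one binary covariate x, a step-shaped scanner effect, no noise.
q = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
y = [(2.0 if qi >= 4 else 0.0) + xi for qi, xi in zip(q, x)]

mu_s, tau_s = backfit(q, x, y)
mu_center = fmean([predict_stump(mu_s, qi) for qi in q])
y_harm = harmonize(y, q, mu_s, mu_center)
print(y_harm)  # [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]

# A subject from an unseen site needs only its IQM, not a scanner label.
print(harmonize([5.0], [10.0], mu_s, mu_center))  # [4.0]
```

In the toy output the step in q is gone and only the covariate pattern (x plus the average scanner offset of 1) remains. Because μ̂ is a function of IQMs rather than of scanner IDs, the same fitted function harmonizes the unseen subject in the last line.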
There is also a heteroskedastic version, which extends the model to account for scanner-specific differences in variance. Here the residual variance is allowed to vary across scanners (when scanner labels are available) through a multiplicative scaling term that captures differences in noise level and reliability across acquisition settings. The resulting harmonization removes both additive (mean) and multiplicative (variance) scanner effects while preserving the estimated biological signal.
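One plausible reading of "remove the multiplicative effect but keep the biological signal" is sketched below. All of it is an assumption rather than the paper's estimator (which is Bayesian, not plug-in): residuals r_i = y_i − μ̂(q_i) − τ̂(x_i) are taken as given, each scanner's scale is estimated by the population standard deviation of its residuals, and residuals are rescaled to the pooled standard deviation before τ̂ is added back. The function `harmonize_hetero` is hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def harmonize_hetero(tau_hat: list[float], resid: list[float], scanner: list[str]) -> list[float]:
    """Hypothetical sketch: keep tau_hat, rescale each scanner's residuals to a pooled spread.

    tau_hat: estimated biological signal per subject
    resid:   y - mu_hat - tau_hat per subject (mean effects already removed)
    scanner: scanner label per subject (this variant needs labels)
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for r, s in zip(resid, scanner):
        groups[s].append(r)
    scale = {s: pstdev(rs) for s, rs in groups.items()}  # per-scanner residual SD
    pooled = pstdev(resid)  # common target spread shared by all scanners
    return [
        # Leave residuals untouched for a scanner with zero spread (e.g. one subject).
        t + r * (pooled / scale[s] if scale[s] > 0 else 1.0)
        for t, r, s in zip(tau_hat, resid, scanner)
    ]


out = harmonize_hetero([0.0] * 4, [1.0, -1.0, 2.0, -2.0], ["A", "A", "B", "B"])
print(out)  # every entry is about ±1.581 (±sqrt(2.5)): both scanners now share one spread
```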
Key Advantages
No reliance on scanner IDs
Works with missing or anonymized metadata
Handles non-linear scanner effects
Captures continuous acquisition variability
Naturally extends to unseen datasets
Provides uncertainty quantification via Bayesian inference
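Because the model is Bayesian, each posterior draw of μ yields its own harmonized value, so a harmonized IDP can be reported with an interval rather than as a point estimate alone. A minimal sketch, assuming you already hold posterior draws of μ(q_i) for one subject (the numbers below are made up), ignoring any centring, and summarizing with a linear-interpolation quantile; `harmonized_interval` and `quantile` are hypothetical helpers, and the paper's own summaries may differ.

```python
from statistics import fmean


def quantile(sorted_vals: list[float], p: float) -> float:
    """Linear-interpolation quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def harmonized_interval(y_i: float, mu_draws: list[float], level: float = 0.95) -> tuple[float, float, float]:
    """Posterior mean and equal-tailed interval of y_i - mu over posterior draws of mu."""
    draws = sorted(y_i - m for m in mu_draws)  # one harmonized value per posterior draw
    tail = (1.0 - level) / 2.0
    return fmean(draws), quantile(draws, tail), quantile(draws, 1.0 - tail)


# Made-up posterior draws of mu(q_i) for one subject, for illustration only.
mean, lower, upper = harmonized_interval(3.0, [0.8, 0.9, 1.0, 1.1, 1.2])
print(round(mean, 3), round(lower, 3), round(upper, 3))  # 2.0 1.81 2.19
```

With real MCMC output there would be hundreds or thousands of draws; the same summary then propagates uncertainty about the scanner effect into every harmonized value.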
Compared to standard harmonization:
ComBat assumes linear, additive and multiplicative (location-scale) scanner effects
BARTharm learns flexible functions directly from data
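For contrast, the location-scale model behind ComBat is usually written as below, for feature v of subject j scanned at site i: γ_iv is an additive site shift and δ_iv a multiplicative site scale, both indexed by a discrete site label, and covariates enter linearly through β_v. This is the standard textbook form, not something taken from the BARTharm paper. In BARTharm, the site terms give way to the function μ(·) of continuous IQMs, and the linear covariate term to the BART function τ(·).

```latex
% Standard ComBat form; symbols follow common usage, not the BARTharm paper.
y_{ijv} = \alpha_v + x_{ij}^{\top}\beta_v + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv},
\qquad \varepsilon_{ijv} \sim \mathcal{N}(0, \sigma_v^2),
\\[4pt]
y^{\mathrm{ComBat}}_{ijv}
  = \frac{y_{ijv} - \hat\alpha_v - x_{ij}^{\top}\hat\beta_v - \hat\gamma_{iv}}{\hat\delta_{iv}}
  + \hat\alpha_v + x_{ij}^{\top}\hat\beta_v
```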
Implementation
To be implemented.