NeuroComBat
- class uniharmony.combat.NeuroComBat(empirical_bayes: bool = True, parametric_adjustments: bool = True, mean_only: bool = False)
Harmonize scanner effects in multi-site imaging data.
This transformer performs harmonization using the parametric empirical Bayes framework proposed in ComBat [1] and adapted to neuroimaging data in [2].
- Parameters:
- empirical_bayes : bool, optional (default True)
Whether to use empirical Bayes estimation of the site effects.
- parametric_adjustments : bool, optional (default True)
Whether to perform parametric adjustments.
- mean_only : bool, optional (default False)
Whether to adjust only the mean (no scaling).
- Attributes:
- sites_ : array, shape (n_samples,)
Fitted site names.
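The difference between the default behaviour and mean_only=True can be pictured with a deliberately simplified, single-feature sketch. It is not the library's implementation: it ignores covariates and empirical Bayes shrinkage, assumes every site has non-constant values, and the helper name adjust_feature is hypothetical. Each site's values are shifted to the grand mean and, unless mean_only is set, also rescaled to the pooled within-site standard deviation.

```python
from math import sqrt
from statistics import fmean, pstdev


def adjust_feature(values, sites, mean_only=False):
    """Toy location/scale adjustment of one feature (no covariates, no EB)."""
    grand_mean = fmean(values)

    # Group the values by site label.
    by_site = {}
    for v, s in zip(values, sites):
        by_site.setdefault(s, []).append(v)
    site_mean = {s: fmean(vs) for s, vs in by_site.items()}
    site_sd = {s: pstdev(vs) for s, vs in by_site.items()}

    # Pooled within-site standard deviation (spread left after removing site means).
    pooled_sd = sqrt(fmean([(v - site_mean[s]) ** 2 for v, s in zip(values, sites)]))

    adjusted = []
    for v, s in zip(values, sites):
        centered = v - site_mean[s]          # remove the additive site effect
        if not mean_only:
            # remove the multiplicative site effect as well
            centered = centered / site_sd[s] * pooled_sd
        adjusted.append(centered + grand_mean)
    return adjusted


values = [1.0, 2.0, 3.0, 10.0, 14.0, 18.0]
sites = ["A", "A", "A", "B", "B", "B"]

shifted = adjust_feature(values, sites, mean_only=True)
print(shifted)  # [7.0, 8.0, 9.0, 4.0, 8.0, 12.0]: same means, site B still wider

scaled = adjust_feature(values, sites, mean_only=False)
print(scaled)   # both sites now share the same mean and spread
```

With mean_only=True the two sites end up with the same mean but keep their own spread; with the default, the spread is aligned as well.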
Methods
- fit(X, sites[, categorical_covariates, ...]): Compute per-feature statistics to perform harmonization.
- fit_design_matrix(sites, ...): Fit encoders and make design matrix.
- fit_ls_model(data, design, idx_per_site[, ...]): Fit the location/scale (L/S) model.
- fit_standardize(X, design, n_samples, ...[, ...]): Standardize the features.
- fit_transform(X, sites, **fit_params): Fit to data, then transform it.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- harmonize(data, mean, idx_per_site[, epsilon]): Compute the final harmonized data.
- set_fit_request(*[, categorical_covariates, ...]): Configure whether metadata should be requested to be passed to the fit method.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- set_transform_request(*[, ...]): Configure whether metadata should be requested to be passed to the transform method.
- transform(X, sites[, ...]): Harmonize data.
- transform_design_matrix(sites, ...): Transform using fitted encoders and make design matrix.
- transform_standardize(X, design, n_samples): Standardize features using the fitted standardization.
References
[1] Johnson, W. Evan, Cheng Li, and Ariel Rabinovic. “Adjusting batch effects in microarray expression data using empirical Bayes methods.” Biostatistics 8.1 (2007): 118-127. https://doi.org/10.1093/biostatistics/kxj037
[2] Fortin, Jean-Philippe, et al. “Harmonization of cortical thickness measurements across scanners and sites.” NeuroImage 167 (2018): 104-120. https://doi.org/10.1016/j.neuroimage.2017.11.024
- fit(X: ArrayLike, sites: ArrayLike, categorical_covariates: ArrayLike | None = None, continuous_covariates: ArrayLike | None = None, var_epsilon: float = 1e-08, delta_epsilon: float = 1e-08, tau_2_epsilon: float = 1e-10, max_iter: int = 1000) -> NeuroComBat
Compute per-feature statistics to perform harmonization.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The training input samples.
- sites : array-like, shape (n_samples,)
Site label of each sample.
- categorical_covariates : array-like, shape (n_samples, n_categorical_covariates) or None, optional (default None)
Categorical covariates to preserve during harmonization (e.g., sex, disease).
- continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)
Continuous covariates to preserve during harmonization (e.g., age, clinical scores).
- var_epsilon : float, optional (default 1e-8)
Small constant added to the variance to avoid division by zero.
- delta_epsilon : float, optional (default 1e-8)
Small constant added to the delta variance to avoid division by zero in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
- tau_2_epsilon : float, optional (default 1e-10)
Small constant added to the tau_2 variance to avoid division by zero in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
- max_iter : int, optional (default 1000)
Maximum number of iterations for the solver in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
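Why a constant such as var_epsilon is needed can be seen on a feature that is constant across samples. The sketch below is a toy z-scoring routine written with the standard library only; the function name standardize is hypothetical, and exactly where the library adds the constant is an assumption based on the parameter description above.

```python
from math import sqrt
from statistics import fmean, pvariance


def standardize(values, var_epsilon=1e-8):
    """Toy z-scoring; var_epsilon keeps the denominator strictly positive."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / sqrt(var + var_epsilon) for v in values]


constant_feature = [3.0, 3.0, 3.0, 3.0]   # zero variance

z = standardize(constant_feature)
print(z)  # [0.0, 0.0, 0.0, 0.0]

try:
    standardize(constant_feature, var_epsilon=0.0)
except ZeroDivisionError:
    print("without the constant, a zero-variance feature cannot be standardized")
```

The constant is small enough to leave features with ordinary variance essentially unchanged while keeping degenerate features finite.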
- set_fit_request(*, categorical_covariates: bool | None | str = '$UNCHANGED$', continuous_covariates: bool | None | str = '$UNCHANGED$', delta_epsilon: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$', tau_2_epsilon: bool | None | str = '$UNCHANGED$', var_epsilon: bool | None | str = '$UNCHANGED$') -> NeuroComBat
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- categorical_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for categorical_covariates parameter in fit.
- continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for continuous_covariates parameter in fit.
- delta_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for delta_epsilon parameter in fit.
- max_iter : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for max_iter parameter in fit.
- sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sites parameter in fit.
- tau_2_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for tau_2_epsilon parameter in fit.
- var_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for var_epsilon parameter in fit.
- Returns:
- self : object
The updated object.
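The four request options can be summarised with a small toy model, seen from the point of view of a single estimator and a single metadata parameter. This is only an illustration of the rules listed for set_fit_request, not scikit-learn's routing code: the function resolve_request is hypothetical, and a plain ValueError stands in for whatever error the meta-estimator raises.

```python
def resolve_request(name, request, provided):
    """Toy model of one metadata request.

    name     -- parameter name expected by fit (e.g. "sites")
    request  -- True, False, None, or an alias string
    provided -- dict of metadata the user handed to the meta-estimator
    Returns the keyword arguments that would be forwarded to fit.
    """
    if request is True:
        # Requested: forwarded if present, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it is treated as an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: looked up under the alias, forwarded under the original name.
        return {name: provided[request]} if request in provided else {}
    raise TypeError(f"unsupported request value: {request!r}")


user_metadata = {"scanner": ["A", "A", "B"]}
forwarded = resolve_request("sites", "scanner", user_metadata)
print(forwarded)  # {'sites': ['A', 'A', 'B']}
```

In this model, requesting sites under the alias "scanner" lets the user pass scanner=... to the meta-estimator while fit still receives it as sites.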
- set_transform_request(*, categorical_covariates: bool | None | str = '$UNCHANGED$', continuous_covariates: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$') -> NeuroComBat
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- categorical_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for categorical_covariates parameter in transform.
- continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for continuous_covariates parameter in transform.
- sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sites parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X: ArrayLike, sites: ArrayLike, categorical_covariates: ArrayLike | None = None, continuous_covariates: ArrayLike | None = None) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
Harmonize data.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The data to be harmonized.
- sites : array-like, shape (n_samples,)
Site label of each sample.
- categorical_covariates : array-like, shape (n_samples, n_categorical_covariates) or None, optional (default None)
Categorical covariates to preserve during harmonization (e.g., sex, disease).
- continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)
Continuous covariates to preserve during harmonization (e.g., age, clinical scores).
- Returns:
- array, shape (n_samples, n_features)
The array containing the harmonized data across sites.
- Raises:
- ValueError
If one or more sites were not seen during fit.
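Because transform rejects site labels it has not been fitted on, it can be useful to check new data beforehand. The helper below is a hypothetical pre-check written with the standard library; it only mirrors the documented behaviour (a ValueError for unseen sites) and does not reproduce the library's own validation or error message.

```python
def check_sites(fitted_sites, new_sites):
    """Raise ValueError if new_sites contains a label absent from fitted_sites."""
    unseen = sorted(set(new_sites) - set(fitted_sites))
    if unseen:
        raise ValueError(f"Unseen site(s): {unseen}")


# Labels seen at fit time versus labels in the data to harmonize.
check_sites(["A", "B"], ["A", "A", "B"])   # passes silently

try:
    check_sites(["A", "B"], ["A", "C"])
except ValueError as err:
    message = str(err)
    print(message)  # Unseen site(s): ['C']
```

Samples from unseen sites would have to be dropped, or the transformer refitted on data that includes those sites, before harmonization.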
Examples
Analysing NeuroComBat behaviour with imbalance across sites