NeuroComBat

class uniharmony.combat.NeuroComBat(empirical_bayes: bool = True, parametric_adjustments: bool = True, mean_only: bool = False)

Harmonize scanner effects in multi-site imaging data.

This transformer harmonizes data using the parametric empirical Bayes framework proposed for ComBat [1] and adapted to neuroimaging data in [2].

Parameters:
empirical_bayes : bool, optional (default True)

Whether to perform empirical Bayes.

parametric_adjustments : bool, optional (default True)

Whether to perform parametric adjustments.

mean_only : bool, optional (default False)

Whether to adjust only the mean (no scaling).

Attributes:
sites_ : array, shape (n_samples,)

Fitted site names.
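
As a rough illustration of what mean_only controls, the sketch below applies a simplified location/scale adjustment to a single feature. It is a toy under stated assumptions and not the NeuroComBat implementation: it ignores covariates and empirical Bayes shrinkage, assumes every site has non-zero variance, and the function name harmonize_feature is hypothetical.

```python
from math import sqrt
from statistics import fmean


def harmonize_feature(values, sites, mean_only=False):
    """Toy per-site location/scale adjustment for one feature (hypothetical helper)."""
    grand_mean = fmean(values)

    # Group the values by site.
    by_site = {}
    for value, site in zip(values, sites):
        by_site.setdefault(site, []).append(value)

    site_mean = {s: fmean(v) for s, v in by_site.items()}
    # Population variance of each site around its own mean (assumed non-zero).
    site_var = {
        s: fmean([(x - site_mean[s]) ** 2 for x in v]) for s, v in by_site.items()
    }
    # Pooled within-site variance across all samples.
    pooled_var = fmean([(x - site_mean[s]) ** 2 for x, s in zip(values, sites)])

    out = []
    for value, site in zip(values, sites):
        centred = value - site_mean[site]
        if not mean_only:
            # Rescale so that every site shares the pooled spread.
            centred *= sqrt(pooled_var / site_var[site])
        out.append(centred + grand_mean)
    return out


values = [1.0, 2.0, 3.0, 10.0, 14.0, 18.0]
sites = ["A", "A", "A", "B", "B", "B"]
shifted = harmonize_feature(values, sites, mean_only=True)   # site means aligned only
scaled = harmonize_feature(values, sites, mean_only=False)   # means and spreads aligned
```

With mean_only=True the two sites end up with the same mean but keep their own spread; with mean_only=False the spread is matched as well.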

Methods

fit(X, sites[, categorical_covariates, ...])

Compute per-feature statistics to perform harmonization.

fit_design_matrix(sites, ...)

Fit encoders and make design matrix.

fit_ls_model(data, design, idx_per_site[, ...])

Fit L/S model.

fit_standardize(X, design, n_samples, ...[, ...])

Standardization of the features.

fit_transform(X, sites, **fit_params)

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

harmonize(data, mean, idx_per_site[, epsilon])

Compute the final harmonized data.

set_fit_request(*[, categorical_covariates, ...])

Configure whether metadata should be requested to be passed to the fit method.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

set_transform_request(*[, ...])

Configure whether metadata should be requested to be passed to the transform method.

transform(X, sites[, ...])

Harmonize data.

transform_design_matrix(sites, ...)

Transform using fitted encoders and make design matrix.

transform_standardize(X, design, n_samples)

Standardize features on fitted standardization of input.
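
fit_design_matrix is summarised above only as fitting encoders and making a design matrix. For orientation, the sketch below shows one way such a matrix could be laid out in a ComBat-style model: one indicator column per site, followed by covariate columns. The helper names (one_hot, make_design) and the drop-first encoding of categorical covariates are assumptions made for illustration, not the library's actual encoders.

```python
def one_hot(labels, drop_first=False):
    """Return (categories, rows) with one indicator column per kept category."""
    categories = sorted(set(labels))
    if drop_first:
        # Drop the first level to avoid collinearity with the site columns.
        categories = categories[1:]
    rows = [[1.0 if label == c else 0.0 for c in categories] for label in labels]
    return categories, rows


def make_design(sites, categorical=None, continuous=None):
    """Build a toy design matrix: site indicators, then covariates (hypothetical helper)."""
    _, site_rows = one_hot(sites)
    design = [list(row) for row in site_rows]
    if categorical is not None:
        _, cat_rows = one_hot(categorical, drop_first=True)
        for row, extra in zip(design, cat_rows):
            row.extend(extra)
    if continuous is not None:
        for row, value in zip(design, continuous):
            row.append(float(value))
    return design


design = make_design(
    sites=["A", "A", "B"],
    categorical=["F", "M", "F"],
    continuous=[30, 40, 50],
)
# Columns: site A, site B, sex == "M", age
```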

References

[1]

W. Evan Johnson, Cheng Li, and Ariel Rabinovic. “Adjusting batch effects in microarray expression data using empirical Bayes methods.” Biostatistics, 8(1):118-127, 2007. https://doi.org/10.1093/biostatistics/kxj037

[2]

Fortin, Jean-Philippe, et al. “Harmonization of cortical thickness measurements across scanners and sites.” Neuroimage 167 (2018): 104-120. https://doi.org/10.1016/j.neuroimage.2017.11.024

fit(X: ArrayLike, sites: ArrayLike, categorical_covariates: ArrayLike | None = None, continuous_covariates: ArrayLike | None = None, var_epsilon: float = 1e-08, delta_epsilon: float = 1e-08, tau_2_epsilon: float = 1e-10, max_iter: int = 1000) -> NeuroComBat

Compute per-feature statistics to perform harmonization.

Parameters:
X : array-like, shape (n_samples, n_features)

The training input samples.

sites : array-like, shape (n_samples,)

Site label of each sample.

categorical_covariates : array-like, shape (n_samples, n_categorical_covariates) or None, optional (default None)

The categorical covariates to be preserved during harmonization (e.g., sex, disease).

continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)

The continuous covariates to be preserved during harmonization (e.g., age, clinical scores).

var_epsilon : float, optional (default 1e-8)

Small constant added to the variance to avoid division by zero.

delta_epsilon : float, optional (default 1e-8)

Small constant added to the delta variance to avoid division by zero in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.

tau_2_epsilon : float, optional (default 1e-10)

Small constant added to the tau_2 variance to avoid division by zero in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.

max_iter : int, optional (default 1000)

Maximum number of iterations for the solver in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
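
To make the role of var_epsilon concrete, the following sketch standardizes a single feature and adds a small constant to the variance before dividing. It only illustrates the general guard described above; where exactly the library applies the constant is an assumption, and the standardize helper is hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def standardize(values, var_epsilon=1e-8):
    """Centre and scale one feature, guarding against zero variance (hypothetical helper)."""
    mean = fmean(values)
    # Adding var_epsilon keeps the denominator strictly positive.
    variance = pvariance(values, mu=mean) + var_epsilon
    return [(value - mean) / sqrt(variance) for value in values]


constant = standardize([3.0, 3.0, 3.0])  # zero variance, still well defined
varying = standardize([1.0, 2.0, 3.0])
```

Without the added constant, the constant-valued feature would trigger a division by zero; with it, the feature simply maps to zeros.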

set_fit_request(*, categorical_covariates: bool | None | str = '$UNCHANGED$', continuous_covariates: bool | None | str = '$UNCHANGED$', delta_epsilon: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$', tau_2_epsilon: bool | None | str = '$UNCHANGED$', var_epsilon: bool | None | str = '$UNCHANGED$') -> NeuroComBat

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
categorical_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for categorical_covariates parameter in fit.

continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for continuous_covariates parameter in fit.

delta_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for delta_epsilon parameter in fit.

max_iter : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for max_iter parameter in fit.

sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sites parameter in fit.

tau_2_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for tau_2_epsilon parameter in fit.

var_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for var_epsilon parameter in fit.

Returns:
self : object

The updated object.

set_transform_request(*, categorical_covariates: bool | None | str = '$UNCHANGED$', continuous_covariates: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$') -> NeuroComBat

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
categorical_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for categorical_covariates parameter in transform.

continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for continuous_covariates parameter in transform.

sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sites parameter in transform.

Returns:
self : object

The updated object.

transform(X: ArrayLike, sites: ArrayLike, categorical_covariates: ArrayLike | None = None, continuous_covariates: ArrayLike | None = None) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]

Harmonize data.

Parameters:
X : array-like, shape (n_samples, n_features)

The data to be harmonized.

sites : array-like, shape (n_samples,)

Site label of each sample.

categorical_covariates : array-like, shape (n_samples, n_categorical_covariates) or None, optional (default None)

The categorical covariates to be preserved during harmonization (e.g., sex, disease).

continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)

The continuous covariates to be preserved during harmonization (e.g., age, clinical scores).

Returns:
array, shape (n_samples, n_features)

The array containing the harmonized data across sites.

Raises:
ValueError

If sites contains one or more sites that were not seen during fit.
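
The unseen-site condition can be pictured as a simple set comparison between the sites stored at fit time and those passed to transform. The sketch below is a minimal stand-in for that check; the function name check_sites_seen and the error message are hypothetical, and the library's own validation may differ.

```python
def check_sites_seen(fitted_sites, new_sites):
    """Raise ValueError if new_sites contains labels absent from fitted_sites."""
    unseen = sorted(set(new_sites) - set(fitted_sites))
    if unseen:
        raise ValueError(f"Unseen site(s) at transform time: {unseen}")


fitted = ["site_a", "site_b"]

# All labels were seen during fitting, so this passes silently.
check_sites_seen(fitted, ["site_b", "site_a", "site_a"])

# A label that was never fitted triggers the error.
try:
    check_sites_seen(fitted, ["site_a", "site_c"])
    raised = False
except ValueError as err:
    raised = True
    message = str(err)
```

In practice this means every site that appears in the data to be harmonized must also be represented in the data used for fitting.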

Examples

Binary classification with NeuroComBat

Analysing NeuroComBat behaviour with imbalance across sites

Using NeuroComBat with MAREoS dataset