ComBatGAM

class uniharmony.combat.ComBatGAM(empirical_bayes: bool = True, parametric_adjustments: bool = True, mean_only: bool = False)

Harmonize multi-site scanner effects controlling for non-linear age effects.

This extends NeuroComBat by using Generalized Additive Models (GAMs) to control for non-linear covariate effects.

Parameters:
empirical_bayes : bool, optional (default True)

Whether to perform empirical Bayes.

parametric_adjustments : bool, optional (default True)

Whether to perform parametric adjustments.

mean_only : bool, optional (default False)

Whether to adjust only the mean (no scaling).
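The effect of mean_only can be pictured with a stripped-down location/scale adjustment. The sketch below is not the uniharmony implementation: it ignores covariates and empirical Bayes entirely, and the helper name adjust_location_scale and its eps argument are assumptions made for illustration only.

```python
import numpy as np


def adjust_location_scale(X, sites, mean_only=False, eps=1e-8):
    """Align per-site feature means (and optionally variances) to the pooled data.

    Hypothetical helper for illustration; not part of uniharmony.
    """
    X = np.asarray(X, dtype=float)
    sites = np.asarray(sites)
    grand_mean = X.mean(axis=0)
    site_labels = np.unique(sites)

    # Residuals after removing each site's own mean (the additive site effect).
    resid = X.copy()
    for s in site_labels:
        mask = sites == s
        resid[mask] -= X[mask].mean(axis=0)

    # Pooled within-site standard deviation; eps guards against zero variance.
    pooled_std = np.sqrt(resid.var(axis=0) + eps)

    out = np.empty_like(X)
    for s in site_labels:
        mask = sites == s
        r = resid[mask]
        if not mean_only:
            # Also remove the multiplicative site effect.
            r = r / np.sqrt(r.var(axis=0) + eps) * pooled_std
        out[mask] = grand_mean + r
    return out


rng = np.random.default_rng(0)
sites = np.repeat(["A", "B"], 50)
X = rng.normal(size=(100, 3))
X[sites == "B"] = X[sites == "B"] * 2.0 + 5.0  # site B is shifted and rescaled

X_mean = adjust_location_scale(X, sites, mean_only=True)
X_full = adjust_location_scale(X, sites, mean_only=False)
```

With mean_only=True the two sites end up with the same feature means but keep their own spread; with mean_only=False their spreads are matched as well.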

Attributes:
sites_ : array, shape (n_samples,)

Fitted site names.

Methods

fit(X, sites, smooth_covariates[, ...])

Compute per-feature statistics to perform harmonization.

fit_design_matrix(sites, ...)

Fit encoders and make design matrix.

fit_ls_model(data, design, idx_per_site[, ...])

Fit L/S model.

fit_standardize(X, design, n_samples, ...[, ...])

Standardization of the features.

fit_transform(X, sites, smooth_covariates, ...)

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

harmonize(data, mean, idx_per_site[, epsilon])

Compute the final harmonized data.

set_fit_request(*[, continuous_covariates, ...])

Configure whether metadata should be requested to be passed to the fit method.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

set_transform_request(*[, ...])

Configure whether metadata should be requested to be passed to the transform method.

transform(X, sites, smooth_covariates[, ...])

Harmonize data.

transform_design_matrix(sites, ...)

Transform using fitted encoders and make design matrix.

transform_standardize(X, design, n_samples)

Standardize features on fitted standardization of input.

References

[1]

Pomponio, R., Shou, H., Davatzikos, C., et al. (2019). “Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan.” NeuroImage 208. https://doi.org/10.1016/j.neuroimage.2019.116450.

fit(X: ArrayLike, sites: ArrayLike, smooth_covariates: ArrayLike, smooth_covariates_bounds: tuple[float, float] | None = None, continuous_covariates: ArrayLike | None = None, df: int = 10, degree: int = 3, var_epsilon: float = 1e-08, delta_epsilon: float = 1e-08, tau_2_epsilon: float = 1e-10, max_iter: int = 1000) -> ComBatGAM

Compute per-feature statistics to perform harmonization.

Parameters:
X : array-like, shape (n_samples, n_features)

The training input samples.

sites : array-like, shape (n_samples,)

Sites.

smooth_covariates : array-like, shape (n_samples, n_smooth_covariates)

The smooth, non-linear covariates (e.g., age), which are modeled with GAMs.

smooth_covariates_bounds : tuple of (float, float) or None, optional (default None)

Custom boundaries for the smoothing terms, given as (minimum, maximum). Useful when holdout data covers a different range than the training data. Currently not supported for models with multiple smooth covariates.

continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)

The continuous covariates to be preserved during harmonization (e.g., clinical scores).

df : int, optional (default 10)

Number of basis functions (degrees of freedom) for the B-splines. The default is the value used in the original implementation.

degree : int, optional (default 3)

Degree of the B-splines. The default is the value used in the original implementation.

var_epsilon : float, optional (default 1e-8)

Small constant to add to variance to avoid division by zero.

delta_epsilon : float, optional (default 1e-8)

Small constant to add to delta variance to avoid division by zero in full mode. This is only used if empirical_bayes=True and parametric_adjustments=True.

tau_2_epsilon : float, optional (default 1e-10)

Small constant to add to tau_2 variance to avoid division by zero in full mode. This is only used if empirical_bayes=True and parametric_adjustments=True.

max_iter : int, optional (default 1000)

Maximum number of iterations for the solver in full mode. This is only used if empirical_bayes=True and parametric_adjustments=True.
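To make df, degree, and smooth_covariates_bounds more concrete, the following sketch builds a clamped B-spline basis with the Cox-de Boor recursion. It is a hand-rolled illustration, not the spline code the library uses: the function name bspline_basis and the choice of equally spaced interior knots are assumptions. It shows that df sets the number of basis columns and that passing the same bounds for training and holdout data keeps both on the same basis.

```python
import numpy as np


def bspline_basis(x, df=10, degree=3, bounds=None):
    """Evaluate a clamped B-spline basis with `df` columns (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if df < degree + 1:
        raise ValueError("df must be at least degree + 1")
    lo, hi = (x.min(), x.max()) if bounds is None else bounds

    # Clamped knot vector: boundary knots repeated degree + 1 times,
    # plus equally spaced interior knots (an assumption of this sketch).
    n_interior = df - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])

    # Degree-0 basis: indicator of each half-open knot interval.
    n0 = len(knots) - 1
    B = np.zeros((x.size, n0))
    for i in range(n0):
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Points exactly at the upper bound belong to the last non-empty interval.
    B[x == hi, n0 - degree - 1] = 1.0

    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        new = np.zeros((x.size, n0 - d))
        for i in range(n0 - d):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                new[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                new[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = new
    return B


age_train = np.linspace(20, 80, 200)
age_holdout = np.linspace(30, 70, 50)

# Same explicit bounds for both sets, so the columns mean the same thing.
B_train = bspline_basis(age_train, df=10, degree=3, bounds=(20, 80))
B_holdout = bspline_basis(age_holdout, df=10, degree=3, bounds=(20, 80))
```

Without explicit bounds, the holdout ages (30 to 70) would define their own knot positions, and the resulting columns would no longer line up with the basis built from the training ages.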

fit_transform(X: ArrayLike, sites: ArrayLike, smooth_covariates: ArrayLike, **fit_params) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]

Fit to data, then transform it.

Fits transformer to X and sites with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like, shape (n_samples, n_features)

Input samples.

sites : array-like, shape (n_samples, 1)

Sites.

smooth_covariates : array-like, shape (n_samples, n_smooth_terms)

The smooth, non-linear covariates (e.g., age), which are modeled with GAMs.

**fit_params : dict

Additional fit parameters.

Returns:
array, shape (n_samples, n_features)

Transformed array.

set_fit_request(*, continuous_covariates: bool | None | str = '$UNCHANGED$', degree: bool | None | str = '$UNCHANGED$', delta_epsilon: bool | None | str = '$UNCHANGED$', df: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$', smooth_covariates: bool | None | str = '$UNCHANGED$', smooth_covariates_bounds: bool | None | str = '$UNCHANGED$', tau_2_epsilon: bool | None | str = '$UNCHANGED$', var_epsilon: bool | None | str = '$UNCHANGED$') -> ComBatGAM

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for continuous_covariates parameter in fit.

degree : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for degree parameter in fit.

delta_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for delta_epsilon parameter in fit.

df : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for df parameter in fit.

max_iter : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for max_iter parameter in fit.

sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sites parameter in fit.

smooth_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for smooth_covariates parameter in fit.

smooth_covariates_bounds : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for smooth_covariates_bounds parameter in fit.

tau_2_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for tau_2_epsilon parameter in fit.

var_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for var_epsilon parameter in fit.

Returns:
self : object

The updated object.

set_transform_request(*, continuous_covariates: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$', smooth_covariates: bool | None | str = '$UNCHANGED$') -> ComBatGAM

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for continuous_covariates parameter in transform.

sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sites parameter in transform.

smooth_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for smooth_covariates parameter in transform.

Returns:
self : object

The updated object.

transform(X: ArrayLike, sites: ArrayLike, smooth_covariates: ArrayLike, continuous_covariates: ArrayLike | None = None) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]

Harmonize data.

Parameters:
X : array-like, shape (n_samples, n_features)

The data to be harmonized.

sites : array-like, shape (n_samples,)

Sites.

smooth_covariates : array-like, shape (n_samples, n_smooth_covariates)

The smooth, non-linear covariates (e.g., age), which are modeled with GAMs.

continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)

The continuous covariates to be preserved during harmonization (e.g., clinical scores).

Returns:
array, shape (n_samples, n_features)

The array containing the harmonized data across sites.

Raises:
ValueError

If one or more sites were not seen during fitting.
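A check of this kind can be sketched in a few lines. The helper name check_sites_seen and the error message below are hypothetical and chosen for illustration; the library's actual check and wording may differ.

```python
import numpy as np


def check_sites_seen(sites, fitted_sites):
    """Raise ValueError if `sites` contains labels absent from `fitted_sites`.

    Hypothetical helper for illustration; not part of uniharmony.
    """
    unseen = np.setdiff1d(np.asarray(sites), np.asarray(fitted_sites))
    if unseen.size > 0:
        raise ValueError(f"Unseen site(s) at transform time: {unseen.tolist()}")


fitted_sites = np.array(["A", "B", "C"])

check_sites_seen(["A", "C", "A"], fitted_sites)  # all labels known, no error

try:
    check_sites_seen(["A", "D"], fitted_sites)   # "D" was never fitted
except ValueError as err:
    message = str(err)
```

In practice this means every site present in the data passed to transform must also have appeared in the data passed to fit.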

Examples

Binary classification with ComBatGAM

Analysing ComBatGAM behaviour with imbalance across sites

Using ComBatGAM with MAREoS dataset