ComBatGAM
- class uniharmony.combat.ComBatGAM(empirical_bayes: bool = True, parametric_adjustments: bool = True, mean_only: bool = False)
Harmonize multi-site scanner effects controlling for non-linear age effects.
This improves on NeuroComBat by using Generalized Additive Models (GAMs) to control for non-linear effects.
- Parameters:
- empirical_bayes : bool, optional (default True)
Whether to use empirical Bayes estimation.
- parametric_adjustments : bool, optional (default True)
Whether to perform parametric adjustments.
- mean_only : bool, optional (default False)
Whether to adjust only the mean (no scaling).
- Attributes:
- sites_ : array, shape (n_samples,)
Fitted site names.
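To illustrate what mean_only changes, the following is a minimal sketch of a location-and-scale adjustment for a single feature. It is not the library's implementation: it assumes no covariates and no empirical Bayes shrinkage, and the helper name `adjust_feature` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def adjust_feature(values, sites, mean_only=False, var_epsilon=1e-8):
    """Toy per-site location(/scale) adjustment for one feature.

    Hypothetical helper for illustration only: no covariates, no
    empirical Bayes, population statistics throughout.
    """
    grand_mean = mean(values)

    # Group the values by site and compute per-site statistics.
    site_values = {}
    for v, s in zip(values, sites):
        site_values.setdefault(s, []).append(v)
    site_mean = {s: mean(vs) for s, vs in site_values.items()}
    site_sd = {s: (pstdev(vs) ** 2 + var_epsilon) ** 0.5
               for s, vs in site_values.items()}

    # Pooled within-site standard deviation (site means removed first).
    residuals = [v - site_mean[s] for v, s in zip(values, sites)]
    pooled_sd = (mean(r * r for r in residuals) + var_epsilon) ** 0.5

    adjusted = []
    for v, s in zip(values, sites):
        centered = v - site_mean[s]              # remove the site's additive shift
        if not mean_only:
            centered *= pooled_sd / site_sd[s]   # also remove the site's scale
        adjusted.append(grand_mean + centered)
    return adjusted


values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
sites = ["A", "A", "A", "B", "B", "B"]
print(adjust_feature(values, sites, mean_only=True))
# [10.0, 11.0, 12.0, 1.0, 11.0, 21.0]
```

With mean_only=True both sites end up centred on the grand mean but keep their original spread; with the default, the per-site spread is also brought to the pooled value.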
Methods
- fit(X, sites, smooth_covariates[, ...]): Compute per-feature statistics to perform harmonization.
- fit_design_matrix(sites, ...): Fit encoders and make design matrix.
- fit_ls_model(data, design, idx_per_site[, ...]): Fit L/S model.
- fit_standardize(X, design, n_samples, ...[, ...]): Standardization of the features.
- fit_transform(X, sites, smooth_covariates, ...): Fit to data, then transform it.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- harmonize(data, mean, idx_per_site[, epsilon]): Compute the final harmonized data.
- set_fit_request(*[, continuous_covariates, ...]): Configure whether metadata should be requested to be passed to the fit method.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- set_transform_request(*[, ...]): Configure whether metadata should be requested to be passed to the transform method.
- transform(X, sites, smooth_covariates[, ...]): Harmonize data.
- transform_design_matrix(sites, ...): Transform using fitted encoders and make design matrix.
- transform_standardize(X, design, n_samples): Standardize features on fitted standardization of input.
References
[1] Pomponio, R., Shou, H., Davatzikos, C., et al. (2019). “Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan.” NeuroImage, 208. https://doi.org/10.1016/j.neuroimage.2019.116450.
- fit(X: ArrayLike, sites: ArrayLike, smooth_covariates: ArrayLike, smooth_covariates_bounds: tuple[float, float] | None = None, continuous_covariates: ArrayLike | None = None, df: int = 10, degree: int = 3, var_epsilon: float = 1e-08, delta_epsilon: float = 1e-08, tau_2_epsilon: float = 1e-10, max_iter: int = 1000) -> ComBatGAM
Compute per-feature statistics to perform harmonization.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The training input samples.
- sites : array-like, shape (n_samples,)
Sites.
- smooth_covariates : array-like, shape (n_samples, n_smooth_covariates)
The smooth, non-linear covariates. GAMs are used for optimal smoothing (e.g., age).
- smooth_covariates_bounds : tuple of (float, float) or None, optional (default None)
Custom boundaries for the smoothing terms. Useful when holdout data covers a different range than the training data; specify the bounds as (minimum, maximum). Currently not supported for models with multiple smooth covariates.
- continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)
The continuous covariates to be preserved during harmonization (e.g., clinical scores).
- df : int, optional (default 10)
Number of basis functions or degrees of freedom for BSplines. The default is the value used in the original implementation.
- degree : int, optional (default 3)
Degree of the spline for BSplines. The default is the value used in the original implementation.
- var_epsilon : float, optional (default 1e-8)
Small constant added to the variance to avoid division by zero.
- delta_epsilon : float, optional (default 1e-8)
Small constant added to the delta variance to avoid division by zero in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
- tau_2_epsilon : float, optional (default 1e-10)
Small constant added to the tau_2 variance to avoid division by zero in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
- max_iter : int, optional (default 1000)
Maximum number of iterations for the solver in full mode. Only used if empirical_bayes=True and parametric_adjustments=True.
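As an illustration of when smooth_covariates_bounds matters, the sketch below derives a (minimum, maximum) tuple spanning both the training and holdout ranges of a single smooth covariate. The helper `covariate_bounds` and the age values are hypothetical; only the (minimum, maximum) format comes from the parameter description above.

```python
def covariate_bounds(*samples):
    """Return (minimum, maximum) over several 1-D collections of one covariate."""
    flat = [float(v) for sample in samples for v in sample]
    if not flat:
        raise ValueError("at least one value is required")
    return (min(flat), max(flat))


train_age = [22.0, 35.5, 48.0, 61.0]
holdout_age = [18.0, 40.0, 74.5]  # extends beyond the training range

bounds = covariate_bounds(train_age, holdout_age)
print(bounds)  # (18.0, 74.5)
```

The resulting tuple is the kind of value that could be passed as smooth_covariates_bounds to fit, so that the spline basis covers the holdout range as well.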
- fit_transform(X: ArrayLike, sites: ArrayLike, smooth_covariates: ArrayLike, **fit_params) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
Fit to data, then transform it.
Fits transformer to X and sites with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like, shape (n_samples, n_features)
Input samples.
- sites : array-like, shape (n_samples, 1)
Sites.
- smooth_covariates : array-like, shape (n_samples, n_smooth_terms)
The smooth, non-linear covariates. GAMs are used for optimal smoothing (e.g., age).
- **fit_params : dict
Additional fit parameters.
- Returns:
- array, shape (n_samples, n_features)
Transformed array.
- set_fit_request(*, continuous_covariates: bool | None | str = '$UNCHANGED$', degree: bool | None | str = '$UNCHANGED$', delta_epsilon: bool | None | str = '$UNCHANGED$', df: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$', smooth_covariates: bool | None | str = '$UNCHANGED$', smooth_covariates_bounds: bool | None | str = '$UNCHANGED$', tau_2_epsilon: bool | None | str = '$UNCHANGED$', var_epsilon: bool | None | str = '$UNCHANGED$') -> ComBatGAM
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for continuous_covariates parameter in fit.
- degree : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for degree parameter in fit.
- delta_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for delta_epsilon parameter in fit.
- df : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for df parameter in fit.
- max_iter : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for max_iter parameter in fit.
- sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sites parameter in fit.
- smooth_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for smooth_covariates parameter in fit.
- smooth_covariates_bounds : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for smooth_covariates_bounds parameter in fit.
- tau_2_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for tau_2_epsilon parameter in fit.
- var_epsilon : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for var_epsilon parameter in fit.
- Returns:
- self : object
The updated object.
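The four request options can be summarised with a small sketch of the routing decision for a single call. This is a toy model written for illustration, not scikit-learn's routing code: the function `route_metadata` is hypothetical, it raises a plain ValueError where scikit-learn raises its own error type, and it omits the UNCHANGED default.

```python
def route_metadata(requests, provided):
    """Decide which user-provided metadata reaches `fit`.

    requests: maps a fit parameter name to True, False, None, or an alias (str).
    provided: maps the names the user passed to the meta-estimator to values.
    """
    routed = {}
    for name, request in requests.items():
        if request is True:
            # Requested: pass it along if supplied, silently skip otherwise.
            if name in provided:
                routed[name] = provided[name]
        elif request is False:
            # Not requested: never passed on, even if supplied.
            continue
        elif request is None:
            # Not requested, and supplying it is treated as an error.
            if name in provided:
                raise ValueError(f"{name!r} was provided but not requested")
        elif isinstance(request, str):
            # Alias: the user supplies the value under a different name.
            if request in provided:
                routed[name] = provided[request]
        else:
            raise TypeError(f"unsupported request for {name!r}: {request!r}")
    return routed


requests = {"sites": "scanner", "smooth_covariates": True, "df": False}
provided = {"scanner": ["A", "B"], "smooth_covariates": [[30.0], [45.0]], "df": 5}
print(route_metadata(requests, provided))
# {'sites': ['A', 'B'], 'smooth_covariates': [[30.0], [45.0]]}
```

Here sites is supplied under the alias scanner, smooth_covariates is passed through unchanged, and df is dropped because its request is False.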
- set_transform_request(*, continuous_covariates: bool | None | str = '$UNCHANGED$', sites: bool | None | str = '$UNCHANGED$', smooth_covariates: bool | None | str = '$UNCHANGED$') -> ComBatGAM
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- continuous_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for continuous_covariates parameter in transform.
- sites : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sites parameter in transform.
- smooth_covariates : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for smooth_covariates parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X: ArrayLike, sites: ArrayLike, smooth_covariates: ArrayLike, continuous_covariates: ArrayLike | None = None) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
Harmonize data.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The data to be harmonized.
- sites : array-like, shape (n_samples,)
Sites.
- smooth_covariates : array-like, shape (n_samples, n_smooth_covariates)
The smooth, non-linear terms. GAMs are used for optimal smoothing (e.g., age).
- continuous_covariates : array-like, shape (n_samples, n_continuous_covariates) or None, optional (default None)
The continuous covariates to be preserved during harmonization (e.g., clinical scores).
- Returns:
- array, shape (n_samples, n_features)
The array containing the harmonized data across sites.
- Raises:
- ValueError
If one or more sites were not seen during fit.
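Because transform raises a ValueError for unseen sites, it can help to check holdout site labels before calling it. A minimal sketch, assuming the fitted site names are available as a plain collection (for example, read from the fitted site names attribute); the helper `unseen_sites` and the labels are hypothetical:

```python
def unseen_sites(fitted_sites, new_sites):
    """Return the sorted site labels in new_sites that were absent at fit time."""
    return sorted(set(new_sites) - set(fitted_sites))


fitted = ["site_a", "site_b", "site_c"]
holdout = ["site_a", "site_d", "site_b", "site_d"]

missing = unseen_sites(fitted, holdout)
print(missing)  # ['site_d']
```

An empty result means every holdout sample belongs to a site that was present during fit.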
Examples
Analysing ComBatGAM behaviour with imbalance across sites