make_multisite_classification

uniharmony.datasets.make_multisite_classification(n_sites: int = 2, n_samples: int | list[int] = 1000, n_features: int = 10, n_classes: int = 2, balance_per_site: list[float] | list[list[float]] | None = None, signal_type: Literal['linear', 'circular', 'moons', 'blobs', 'gaussian_quantiles'] = 'linear', signal_strength: float = 1.0, noise_strength: list[float] | float = 0.1, site_effect_type: Literal['location', 'scale', 'location+scale'] = 'location', site_effect_strength: list[float] | float = 3.0, site_effect_homogeneous: bool = True, random_state: int | RandomState = 42, **kwargs) -> tuple[ndarray, ndarray, ndarray]

Simulate multi-site data with signal, noise, and site effect components.

The data are generated in three steps. First, a ‘base’ problem is generated using sklearn functions, selected with “signal_type”. Then, each site is simulated and a site effect component, selected with “site_effect_type”, is added to X. The strength of the ‘Effect of Site’ (EoS) is controlled by “site_effect_strength”: if a list is passed, each element gives the strength for the corresponding site, and the list length must equal n_sites; if a single value is passed, all sites have the same EoS. Finally, Gaussian noise is added to each site, controlled by “noise_strength”.
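The three steps can be sketched with NumPy alone. This is an illustrative sketch, not the library's implementation: a two-class Gaussian problem stands in for the sklearn generator, the site effect is assumed to be an additive (“location”) shift drawn once per site, and the helper name `simulate_sites` and its `n_per_site` argument are hypothetical.

```python
import numpy as np


def simulate_sites(n_sites=2, n_per_site=100, n_features=5,
                   signal_strength=1.0, site_effect_strength=3.0,
                   noise_strength=0.1, seed=42):
    """Hypothetical sketch: base problem + per-site shift + Gaussian noise."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts, site_parts = [], [], []
    for site in range(n_sites):
        # 1) Base problem: two classes whose means differ by signal_strength
        #    (a stand-in for the sklearn generator chosen by signal_type).
        y = rng.integers(0, 2, size=n_per_site)
        X = rng.normal(size=(n_per_site, n_features))
        X += signal_strength * y[:, None]
        # 2) Site effect (assumed 'location' type): one offset vector per
        #    site, shared by every sample of that site.
        offset = site_effect_strength * rng.normal(size=n_features)
        X += offset
        # 3) Gaussian noise added to the samples of this site.
        X += noise_strength * rng.normal(size=X.shape)
        X_parts.append(X)
        y_parts.append(y)
        site_parts.append(np.full(n_per_site, site))
    return (np.vstack(X_parts),
            np.concatenate(y_parts),
            np.concatenate(site_parts))


X, y, sites = simulate_sites(n_sites=3, n_per_site=50)
print(X.shape, y.shape, sites.shape)
```

The return value mirrors the documented `(X, y, sites)` triple, so the same downstream code can be tried on the sketch and on the real function.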

Parameters:
n_classes : int, optional (default 2)

Number of classes to simulate (2 for binary, >2 for multi-class).

n_sites : int, optional (default 2)

Number of sites to simulate.

n_samples : int | list[int], optional (default 1000)

If an int is provided, it is the total number of samples across all sites. If a list is provided, it gives the number of samples for each site, and its length must equal n_sites.

balance_per_site : list of float, list of list of float or None, optional (default None)

Class balance for each site. If None, uses balanced classes (0.5 for binary, equal distribution for multi-class).

n_features : int, optional (default 10)

Number of features per sample.

signal_type : str, optional (default “linear”)

Type of signal used to generate the base problem. Options: “linear”, “circular”, “moons”, “blobs”, “gaussian_quantiles”.

signal_strength : float, optional (default 1.0)

Strength of the signal component separating classes. Passed as class_sep to sklearn.datasets.make_classification.

noise_strength : list of float or float, optional (default 0.1)

Strength of the noise component for each site. If a single value is passed, all sites have the same noise_strength.

site_effect_type : str, optional (default “location”)

Type of site effect to add to the original data. Options: “location”, “scale”, “location+scale”.
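As a rough illustration of the three options, the sketch below applies each effect type to the feature matrix of a single site. It assumes that “location” is an additive per-feature offset and “scale” a multiplicative per-feature factor; the library's exact parameterization is not stated on this page, and `apply_site_effect` is a hypothetical helper.

```python
import numpy as np


def apply_site_effect(X, effect_type, strength, rng):
    """Hypothetical helper: shift and/or rescale the features of one site."""
    if effect_type not in ("location", "scale", "location+scale"):
        raise ValueError(f"unknown effect_type: {effect_type!r}")
    n_features = X.shape[1]
    X_out = X.copy()
    if "scale" in effect_type:
        # Assumed multiplicative effect: one factor >= 1 per feature.
        factor = 1.0 + strength * rng.uniform(size=n_features)
        X_out = X_out * factor
    if "location" in effect_type:
        # Assumed additive effect: one offset per feature.
        X_out = X_out + strength * rng.normal(size=n_features)
    return X_out


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

X_loc = apply_site_effect(X, "location", 3.0, rng)
X_scale = apply_site_effect(X, "scale", 3.0, rng)
X_both = apply_site_effect(X, "location+scale", 3.0, rng)

# A location effect moves the feature means but leaves the spread unchanged;
# a scale effect changes the spread.
print("std ratio (location):", X_loc.std(axis=0) / X.std(axis=0))
print("std ratio (scale):   ", X_scale.std(axis=0) / X.std(axis=0))
```

Under these assumptions, a pure location effect can be removed by per-site mean centering, whereas a scale effect also requires per-site rescaling.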

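site_effect_strength : list of float or float, optional (default 3.0)

Strength of the site-specific effects. If a list is passed, it must contain one value per site; a single value applies to all sites.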

site_effect_homogeneous : bool, optional (default True)

Whether the site effect is homogeneous (same for all samples in a site).

random_state : int or RandomState instance (default 42)

The seed of the pseudo random number generator or RandomState for reproducibility.

kwargs : dict

Additional keyword arguments passed to sklearn.datasets.make_classification.

Returns:
X : np.ndarray of shape (n_samples, n_features)

Simulated feature matrix

y : np.ndarray of shape (n_samples,)

Class labels (0 to n_classes-1)

sites : np.ndarray of shape (n_samples,)

Site labels (0 to n_sites-1)
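Because `sites` is aligned row by row with `X` and `y`, a boolean mask selects the samples of one site. The sketch below uses small stand-in arrays with the documented shapes rather than a real call to the function; the values, including the crude per-site shift, are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in arrays with the documented shapes (not produced by the library).
sites = np.repeat([0, 1, 2], 100)           # (n_samples,)
y = rng.integers(0, 2, size=sites.size)     # (n_samples,)
X = rng.normal(size=(sites.size, 4))        # (n_samples, n_features)
X += 3.0 * sites[:, None]                   # crude per-site shift for the demo

site_means = {}
for site in np.unique(sites):
    mask = sites == site                    # rows belonging to this site
    site_means[int(site)] = X[mask].mean(axis=0)
    counts = np.bincount(y[mask], minlength=2)
    print(f"site {site}: n={mask.sum()}, class counts={counts}")
```

Comparing the per-site feature means is a quick way to see whether a location-type site effect is present before fitting any model.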

Examples

>>> from uniharmony.datasets import make_multisite_classification
>>> X, y, sites = make_multisite_classification(
...     n_sites=3, n_samples=300, n_features=20, n_classes=3
... )
>>> X.shape, y.shape, sites.shape
((300, 20), (300,), (300,))

Examples

Impact of Effects of Site in ML

Compute metrics by site

Discover biases in metrics by site

Explore EoS with dimensionality reduction techniques

Characterise a multisite problem

Multisite data generation

Generate imbalance multisite data

Binary classification with NeuroComBat

Analysing NeuroComBat behaviour with imbalance across sites

Binary classification with ComBatGAM

Analysing ComBatGAM behaviour with imbalance across sites

Binary classification using ISI

Multiclass classification using ISI

Multisite Harmonization using Inter-Site Matched Interpolation (ISMI)

IntraSiteInterpolation advance usage

Binary classification using OTDA