make_multisite_classification#
- uniharmony.datasets.make_multisite_classification(n_sites: int = 2, n_samples: int | list[int] = 1000, n_features: int = 10, n_classes: int = 2, balance_per_site: list[float] | list[list[float]] | None = None, signal_type: Literal['linear', 'circular', 'moons', 'blobs', 'gaussian_quantiles'] = 'linear', signal_strength: float = 1.0, noise_strength: list[float] | float = 0.1, site_effect_type: Literal['location', 'scale', 'location+scale'] = 'location', site_effect_strength: list[float] | float = 3.0, site_effect_homogeneous: bool = True, random_state: int | RandomState = 42, **kwargs) → tuple[ndarray, ndarray, ndarray]
Simulate multi-site data with signal, noise, and site effect components.
The data are generated in three steps. First, a ‘base’ problem is generated using sklearn functions, selected with “signal_type”. Then each site is simulated and a site effect component, selected with “site_effect_type”, is added to X. The strength of the ‘Effect of Site’ (EoS) is controlled by site_effect_strength: if a list is passed, each element gives the site_effect_strength of the corresponding site, and the list length must equal n_sites; if a single value is passed, all sites have the same EoS. Finally, Gaussian noise is added to each site, controlled by “noise_strength”.
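The steps above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the two-class Gaussian base problem stands in for the sklearn generators, and the helper name `simulate_sites`, its `n_per_site` argument, and the exact way each site effect is drawn (one random offset per site for “location”, one random multiplier per site for “scale”) are assumptions made for the example.

```python
import numpy as np


def simulate_sites(n_sites=2, n_per_site=100, n_features=5,
                   effect_type="location", effect_strength=3.0,
                   noise_strength=0.1, seed=42):
    """Hypothetical sketch: base problem -> site effect -> Gaussian noise."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts, site_parts = [], [], []
    for site in range(n_sites):
        # Base problem (stand-in for the sklearn generators): two classes
        # whose means differ by 1 in every feature.
        y = rng.integers(0, 2, size=n_per_site)
        X = rng.normal(size=(n_per_site, n_features)) + y[:, None]

        # Site effect (assumed form): one multiplier and/or offset per site,
        # shared by all samples of that site.
        if "scale" in effect_type:
            X = X * (1.0 + effect_strength * rng.uniform(0.5, 1.0))
        if "location" in effect_type:
            X = X + effect_strength * rng.normal()

        # Gaussian noise controlled by noise_strength.
        X = X + noise_strength * rng.normal(size=X.shape)

        X_parts.append(X)
        y_parts.append(y)
        site_parts.append(np.full(n_per_site, site))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(site_parts)


X, y, sites = simulate_sites(n_sites=3, n_per_site=50)
```

The three returned arrays mirror the documented return values: a feature matrix, class labels, and one site label per sample.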
- Parameters:
- n_classes : int, optional (default 2)
Number of classes to simulate (2 for binary, >2 for multi-class).
- n_sites : int, optional (default 2)
Number of sites to simulate.
- n_samples : int | list[int], optional (default 1000)
If an int is provided, the total number of samples across all sites. If a list is provided, the number of samples for each site; the list length must equal n_sites.
- balance_per_site : list of float, list of list of float or None, optional (default None)
Class balance for each site. If None, uses balanced classes (0.5 for binary, equal distribution for multi-class).
- n_features : int, optional (default 10)
Number of features per sample.
- signal_type : str, optional (default “linear”)
Type of signal used to generate the base problem. Options: “linear”, “circular”, “moons”, “blobs”, “gaussian_quantiles”.
- signal_strength : float, optional (default 1.0)
Strength of the signal component separating classes. Passed as class_sep to sklearn.datasets.make_classification.
- noise_strength : list of float or float, optional (default 0.1)
Strength of the noise component for each site. If a single value is passed, all sites have the same noise_strength.
- site_effect_type : str, optional (default “location”)
Type of site effect to add to the original data. Options: “location”, “scale”, “location+scale”.
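- site_effect_strength : list of float or float, optional (default 3.0)
Strength of site-specific effects. If a list is passed, it must contain one value per site; if a single value is passed, all sites share it.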
- site_effect_homogeneous : bool, optional (default True)
Whether the site effect is homogeneous (same for all samples in a site).
- random_state : int or RandomState instance, optional (default 42)
The seed of the pseudo random number generator or RandomState for reproducibility.
- kwargs : dict
Additional keyword arguments passed to sklearn.datasets.make_classification.
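Several parameters above (noise_strength, site_effect_strength) accept either a single value or one value per site. The sketch below shows one way such an argument could be normalized and validated against n_sites; the helper `per_site_values` is hypothetical and is not part of the documented API.

```python
def per_site_values(value, n_sites, name="value"):
    """Hypothetical helper: expand a scalar to one value per site, or validate a list."""
    if isinstance(value, (int, float)):
        # A single value applies to every site.
        return [float(value)] * n_sites
    values = [float(v) for v in value]
    if len(values) != n_sites:
        raise ValueError(
            f"{name} has {len(values)} elements but n_sites is {n_sites}"
        )
    return values


print(per_site_values(3.0, n_sites=3))              # [3.0, 3.0, 3.0]
print(per_site_values([0.1, 0.2, 0.5], n_sites=3))  # [0.1, 0.2, 0.5]
```

A list whose length differs from n_sites raises a ValueError in this sketch, matching the documented requirement that list lengths equal n_sites.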
- Returns:
- X : np.ndarray of shape (n_samples, n_features)
Simulated feature matrix
- y : np.ndarray of shape (n_samples,)
Class labels (0 to n_classes-1)
- sites : np.ndarray of shape (n_samples,)
Site labels (0 to n_sites-1)
Examples
>>> from uniharmony.datasets import make_multisite_classification
>>> X, y, sites = make_multisite_classification(
...     n_sites=3, n_samples=300, n_features=20, n_classes=3
... )
>>> X.shape, y.shape, sites.shape
((300, 20), (300,), (300,))
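Once the three arrays are available, a quick per-site summary helps confirm the class balance and see the site effect. The sketch below uses stand-in arrays with the documented shapes so that it runs without the library; the location shift of 3.0 per site index is an assumption chosen only to make the effect visible.

```python
import numpy as np

# Stand-in arrays with the documented shapes; in practice these would be
# the X, y, sites returned by make_multisite_classification.
rng = np.random.default_rng(0)
sites = np.repeat([0, 1, 2], 100)
y = rng.integers(0, 3, size=sites.size)
X = rng.normal(size=(sites.size, 20)) + 3.0 * sites[:, None]

summary = {}
for s in np.unique(sites):
    mask = sites == s
    # Samples per class within this site.
    class_counts = np.bincount(y[mask], minlength=3)
    summary[int(s)] = (int(mask.sum()), class_counts.tolist(), float(X[mask].mean()))
    print(f"site {s}: n={mask.sum()}, "
          f"class counts={class_counts.tolist()}, "
          f"mean feature value={X[mask].mean():.2f}")
```

Diverging per-site feature means, as in this stand-in data, are the kind of location effect that harmonization methods aim to remove.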
Examples#
Explore EoS with dimensionality reduction techniques
Analysing NeuroComBat behaviour with imbalance across sites
Analysing ComBatGAM behaviour with imbalance across sites
Multisite Harmonization using Inter-Site Matched Interpolation (ISMI)