Multisite Data Simulation

Overview

uniharmony.datasets.make_multisite_classification generates synthetic classification datasets that mimic real-world multi-center studies. The simulated data includes site-related variation of the kind found in actual multi-site research, which makes it useful for testing machine learning algorithms, developing statistical methods, and teaching data science concepts.

Usage

from uniharmony import make_multisite_classification

# Generate data with 3 sites, 500 samples total
X, y, sites = make_multisite_classification(
    n_sites=3,
    n_samples=500,
    n_features=20,
    random_state=42  # For reproducibility
)
# X contains your features
# y has labels
# sites tells you which site each sample came from
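
A quick sanity check on the returned arrays is to count how many samples each site contributed and how the labels are distributed within each site. The sketch below runs on NumPy stand-in arrays rather than on the library's output, so it works without uniharmony installed. It assumes that X has shape (n_samples, n_features) and that y and sites are 1-D arrays of length n_samples. The helper name summarize_sites is hypothetical and not part of the library.

```python
import numpy as np

# Stand-ins shaped like the outputs above (assumed shapes, not library output).
# Replace these three arrays with the real X, y, sites.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)
sites = rng.integers(0, 3, size=500)


def summarize_sites(y, sites):
    """Return {site: {label: count}} for every site/label pair present."""
    summary = {}
    for site in np.unique(sites):
        labels, counts = np.unique(y[sites == site], return_counts=True)
        summary[site.item()] = {
            label.item(): int(count) for label, count in zip(labels, counts)
        }
    return summary


summary = summarize_sites(y, sites)
for site, label_counts in summary.items():
    total = sum(label_counts.values())
    print(f"site {site}: {total} samples, label counts {label_counts}")
```

A strongly uneven label distribution across sites is worth noticing early, because it can confound site effects with class differences.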

Behind the Scenes

Each sample’s feature values are the sum of three components, each of which can be adjusted:

Feature = Signal + Noise + Site Effect
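
The additive decomposition above can be illustrated with a minimal NumPy sketch. This is not the library's implementation. It assumes a class-dependent mean shift as the signal, independent Gaussian noise, and one constant offset vector per site as the site effect. All variable names and scale values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_sites, n_classes = 300, 5, 3, 2

# Labels and site membership, assigned uniformly at random for illustration.
y = rng.integers(0, n_classes, size=n_samples)
sites = rng.integers(0, n_sites, size=n_samples)

# Signal: one mean vector per class, shared by all sites.
class_means = rng.normal(scale=2.0, size=(n_classes, n_features))
signal = class_means[y]

# Noise: independent Gaussian noise for every sample and feature.
noise = rng.normal(scale=1.0, size=(n_samples, n_features))

# Site effect: one additive offset vector per site, applied to all its samples.
site_offsets = rng.normal(scale=1.5, size=(n_sites, n_features))
site_effect = site_offsets[sites]

# Feature = Signal + Noise + Site Effect
X = signal + noise + site_effect

# Per-site feature means differ because of the site offsets.
for s in range(n_sites):
    print(f"site {s} mean of feature 0: {X[sites == s, 0].mean():.2f}")
```

Because the site offset is added independently of the label, a harmonization method that removes it should leave the class signal intact. That property is what this kind of simulated data lets you check.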

Flexible Configuration

You can choose the number of sites, set the total number of samples, and control both the number of features and the class balance. The simulator supports binary classification (yes/no) as well as multi-class problems.
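
This section does not show the keyword names that control the number of classes or the class balance, so the sketch below does not call the library. It uses plain NumPy to illustrate what a class balance means for a multi-class label vector. The class_weights values are hypothetical target proportions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 500
class_weights = [0.6, 0.3, 0.1]  # hypothetical target proportions, 3 classes

# Draw each label independently according to the target proportions.
y = rng.choice(len(class_weights), size=n_samples, p=class_weights)

# Observed proportions fluctuate around the targets because of sampling.
proportions = np.bincount(y, minlength=len(class_weights)) / n_samples
for label, (target, observed) in enumerate(zip(class_weights, proportions)):
    print(f"class {label}: target {target:.2f}, observed {observed:.2f}")
```

With independent draws the observed proportions only approximate the targets. A generator that needs exact class counts would allocate labels deterministically and then shuffle them.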