IntraSiteInterpolation advanced usage
Global Maximum Balancing
# Balance all sites to the single largest class count found anywhere:
import numpy as np
from uniharmony.datasets import make_multisite_classification
from uniharmony.interpolation import IntraSiteInterpolation
X, y, sites = make_multisite_classification(balance_per_site=[0.3, 0.7])
isi = IntraSiteInterpolation(balance_strategy="global_max", interpolator="random", random_state=42)
X_balanced, y_balanced = isi.fit_resample(X, y, sites=sites)
print(f"Global target count: {isi.target_count_}")
2026-05-18 13:27:57 [info ] Overall class balance across sites: [np.float64(0.5), np.float64(0.5)]
2026-05-18 13:27:57 [debug ] Total Samples to generate 1000
2026-05-18 13:27:57 [debug ] Total Samples to generate per site [500 500]
2026-05-18 13:27:57 [info ] For site 0
2026-05-18 13:27:57 [info ] Generating 500 samples
2026-05-18 13:27:57 [debug ] Balance 0.3 for site 0
2026-05-18 13:27:57 [debug ] Site 0, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info ] For site 1
2026-05-18 13:27:57 [info ] Generating 500 samples
2026-05-18 13:27:57 [debug ] Balance 0.7 for site 1
2026-05-18 13:27:57 [debug ] Site 1, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info ] Generated 1000 samples across 2 sites
2026-05-18 13:27:57 [info ] Class distribution: [500 500]
2026-05-18 13:27:57 [info ] Site distribution: [500 500]
2026-05-18 13:27:57 [info ] [ISI] Starting fit_resample
2026-05-18 13:27:57 [debug ] [ISI] N target for global_max strategy = 350
2026-05-18 13:27:57 [info ] [ISI] Processing site 0
2026-05-18 13:27:57 [debug ] [ISI] For site 0, N target for per_site strategy = 350
2026-05-18 13:27:57 [info ] [ISI] Processing site 1
2026-05-18 13:27:57 [debug ] [ISI] For site 1, N target for per_site strategy = 350
Global target count: 350
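The reported target of 350 can be checked by hand. The sketch below is a NumPy-only illustration, not the library's implementation: `class_counts_per_site` and `global_max_target` are hypothetical helper names, and the toy labels assume a 350/150 class split in one site and 150/350 in the other, which is consistent with the 500 samples per site and the 0.3 / 0.7 balances shown in the log.

```python
import numpy as np


def class_counts_per_site(y, sites):
    """Return {site: {class: count}} for integer label and site arrays."""
    y = np.asarray(y)
    sites = np.asarray(sites)
    counts = {}
    for site in np.unique(sites):
        labels, n = np.unique(y[sites == site], return_counts=True)
        counts[int(site)] = {int(label): int(c) for label, c in zip(labels, n)}
    return counts


def global_max_target(y, sites):
    """Largest class count found in any single site (hypothetical helper)."""
    counts = class_counts_per_site(y, sites)
    return max(c for per_class in counts.values() for c in per_class.values())


# Toy labels (assumed split): site 0 has 350/150, site 1 has 150/350
y_toy = np.concatenate([np.repeat([0, 1], [350, 150]), np.repeat([0, 1], [150, 350])])
sites_toy = np.repeat([0, 1], 500)

print(class_counts_per_site(y_toy, sites_toy))
print(global_max_target(y_toy, sites_toy))  # 350
```

Running `class_counts_per_site(y_balanced, sites_balanced)` on resampled data is a quick way to confirm that every class in every site reached the target, provided the resampled site labels are available; how to obtain them is not shown in this example.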
Covariates
Stratified Interpolation with Covariates
# Preserve demographic distributions while balancing classes.
# Synthetic samples are interpolated only between participants matching on all covariates:
rng = np.random.default_rng(54)
n_samples = 1000
X, y, sites = make_multisite_classification(n_samples=n_samples, balance_per_site=[0.3, 0.7])
sex = rng.integers(0, 2, (n_samples, 1))
age = rng.standard_normal((n_samples, 1)) * 10 + 50
isi = IntraSiteInterpolation(balance_strategy="per_site", random_state=42)
X_balanced, y_balanced = isi.fit_resample(
    X, y, sites=sites, categorical_covariate=sex, continuous_covariate=age, n_bins_cont_cov=2
)  # Age binned into 2 bins (n_bins_cont_cov=2)
2026-05-18 13:27:57 [info ] Overall class balance across sites: [np.float64(0.5), np.float64(0.5)]
2026-05-18 13:27:57 [debug ] Total Samples to generate 1000
2026-05-18 13:27:57 [debug ] Total Samples to generate per site [500 500]
2026-05-18 13:27:57 [info ] For site 0
2026-05-18 13:27:57 [info ] Generating 500 samples
2026-05-18 13:27:57 [debug ] Balance 0.3 for site 0
2026-05-18 13:27:57 [debug ] Site 0, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info ] For site 1
2026-05-18 13:27:57 [info ] Generating 500 samples
2026-05-18 13:27:57 [debug ] Balance 0.7 for site 1
2026-05-18 13:27:57 [debug ] Site 1, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info ] Generated 1000 samples across 2 sites
2026-05-18 13:27:57 [info ] Class distribution: [500 500]
2026-05-18 13:27:57 [info ] Site distribution: [500 500]
2026-05-18 13:27:57 [info ] [ISI] Starting fit_resample
2026-05-18 13:27:57 [debug ] Using 1 categorical covariates
2026-05-18 13:27:57 [debug ] No tolerance specified, using exact matching
2026-05-18 13:27:57 [debug ] Using 1 continuous covariates with tolerance: [0.]
2026-05-18 13:27:57 [info ] [ISI] Processing site 0
2026-05-18 13:27:57 [debug ] [ISI] For site 0, N target for per_site strategy = 350
2026-05-18 13:27:57 [info ] [ISI] Processing site 1
2026-05-18 13:27:57 [debug ] [ISI] For site 1, N target for per_site strategy = 350
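To make the idea of covariate-matched interpolation concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code `IntraSiteInterpolation` runs: the helper names `stratum_labels` and `interpolate_within_strata` are hypothetical, the continuous covariate is assumed to be split at quantile edges, and each synthetic sample is assumed to be a random convex combination of two samples from the same stratum.

```python
import numpy as np


def stratum_labels(categorical, continuous, n_bins):
    """Combine a categorical covariate and a binned continuous one into a stratum id."""
    categorical = np.asarray(categorical).ravel()
    continuous = np.asarray(continuous).ravel()
    # Interior quantile edges; n_bins=2 splits at the median (assumed binning rule)
    edges = np.quantile(continuous, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(continuous, edges)  # bin index in 0..n_bins-1
    _, cat_codes = np.unique(categorical, return_inverse=True)
    return cat_codes * n_bins + bins


def interpolate_within_strata(X, strata, n_new, rng):
    """Draw n_new rows, each a convex combination of two rows sharing a stratum."""
    X = np.asarray(X, dtype=float)
    if n_new == 0:
        return np.empty((0, X.shape[1])), np.empty(0, dtype=int)
    # Only strata with at least two members can supply a pair
    ids, counts = np.unique(strata, return_counts=True)
    usable = ids[counts >= 2]
    if usable.size == 0:
        raise ValueError("no stratum has two or more samples")
    new_rows, new_strata = [], []
    for _ in range(n_new):
        s = rng.choice(usable)
        i, j = rng.choice(np.flatnonzero(strata == s), size=2, replace=False)
        lam = rng.uniform()  # interpolation weight in [0, 1)
        new_rows.append(lam * X[i] + (1 - lam) * X[j])
        new_strata.append(s)
    return np.vstack(new_rows), np.array(new_strata)


# Toy minority-class data with one binary and one continuous covariate
rng = np.random.default_rng(0)
X_min = rng.standard_normal((150, 5))
sex_toy = rng.integers(0, 2, 150)
age_toy = rng.standard_normal(150) * 10 + 50

strata = stratum_labels(sex_toy, age_toy, n_bins=2)
X_new, strata_new = interpolate_within_strata(X_min, strata, n_new=200, rng=rng)
print(X_new.shape)  # (200, 5)
```

Because every synthetic row is built from two parents in the same stratum, the synthetic rows inherit that stratum's covariate combination, which is what keeps the demographic distribution of the parents from being mixed across groups.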
Total running time of the script: (0 minutes 1.062 seconds)