IntraSiteInterpolation advanced usage#

Global Maximum Balancing#

# Balance all sites to the single largest class count found anywhere:

import numpy as np

from uniharmony.datasets import make_multisite_classification
from uniharmony.interpolation import IntraSiteInterpolation


X, y, sites = make_multisite_classification(balance_per_site=[0.3, 0.7])
isi = IntraSiteInterpolation(balance_strategy="global_max", interpolator="random", random_state=42)

X_balanced, y_balanced = isi.fit_resample(X, y, sites=sites)
print(f"Global target count: {isi.target_count_}")
2026-05-18 13:27:57 [info     ] Overall class balance across sites: [np.float64(0.5), np.float64(0.5)]
2026-05-18 13:27:57 [debug    ] Total Samples to generate 1000
2026-05-18 13:27:57 [debug    ] Total Samples to generate per site [500 500]
2026-05-18 13:27:57 [info     ] For site 0
2026-05-18 13:27:57 [info     ] Generating 500 samples
2026-05-18 13:27:57 [debug    ] Balance 0.3 for site 0
2026-05-18 13:27:57 [debug    ] Site 0, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info     ] For site 1
2026-05-18 13:27:57 [info     ] Generating 500 samples
2026-05-18 13:27:57 [debug    ] Balance 0.7 for site 1
2026-05-18 13:27:57 [debug    ] Site 1, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info     ] Generated 1000 samples across 2 sites
2026-05-18 13:27:57 [info     ] Class distribution: [500 500]
2026-05-18 13:27:57 [info     ] Site distribution: [500 500]
2026-05-18 13:27:57 [info     ] [ISI] Starting fit_resample
2026-05-18 13:27:57 [debug    ] [ISI] N target for global_max strategy = 350
2026-05-18 13:27:57 [info     ] [ISI] Processing site 0
2026-05-18 13:27:57 [debug    ] [ISI] For site 0, N target for per_site strategy = 350
2026-05-18 13:27:57 [info     ] [ISI] Processing site 1
2026-05-18 13:27:57 [debug    ] [ISI] For site 1, N target for per_site strategy = 350
Global target count: 350
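
The target of 350 reported above can be reproduced by hand. The block below is a minimal sketch of what "global_max" appears to compute, judging from the comment and the log output: the largest class count found at any single site. It does not use uniharmony. The helper names `class_counts_by_site` and `global_max_target` are hypothetical, and the toy labels only mirror the 30/70 and 70/30 splits of the example (which class is the minority at which site is chosen arbitrarily here).

```python
import numpy as np


def class_counts_by_site(y, sites):
    """Return {site: {class: count}} for integer-like label and site arrays."""
    y = np.asarray(y)
    sites = np.asarray(sites)
    counts = {}
    for site in np.unique(sites):
        labels, n = np.unique(y[sites == site], return_counts=True)
        counts[int(site)] = {int(label): int(k) for label, k in zip(labels, n)}
    return counts


def global_max_target(y, sites):
    """Largest class count at any single site (assumed meaning of 'global_max')."""
    counts = class_counts_by_site(y, sites)
    return max(max(per_class.values()) for per_class in counts.values())


# Toy labels: two sites of 500 samples each, split 30/70 and 70/30.
sites_toy = np.repeat([0, 1], 500)
y_toy = np.concatenate([
    np.repeat([0, 1], [150, 350]),  # site 0
    np.repeat([0, 1], [350, 150]),  # site 1
])

target = global_max_target(y_toy, sites_toy)

# Number of synthetic samples each (site, class) pair would need to reach the target.
deficits = {
    site: {label: target - k for label, k in per_class.items()}
    for site, per_class in class_counts_by_site(y_toy, sites_toy).items()
}

print(target)    # 350
print(deficits)  # {0: {0: 200, 1: 0}, 1: {0: 0, 1: 200}}
```

With two sites of equal size and mirrored class ratios, the per-site maximum is also 350 at both sites, which would explain why the per-site log lines above report the same target.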

Covariates#

Stratified Interpolation with Covariates#

# Preserve demographic distributions while balancing classes.
# Synthetic samples are interpolated only between participants matching on all covariates:
rng = np.random.default_rng(54)
n_samples = 1000
X, y, sites = make_multisite_classification(n_samples=n_samples, balance_per_site=[0.3, 0.7])
sex = rng.integers(0, 2, (n_samples, 1))
age = rng.standard_normal((n_samples, 1)) * 10 + 50
isi = IntraSiteInterpolation(balance_strategy="per_site", random_state=42)

X_balanced, y_balanced = isi.fit_resample(
    X, y, sites=sites, categorical_covariate=sex, continuous_covariate=age, n_bins_cont_cov=2
)  # Age is binned into 2 bins (n_bins_cont_cov=2)
2026-05-18 13:27:57 [info     ] Overall class balance across sites: [np.float64(0.5), np.float64(0.5)]
2026-05-18 13:27:57 [debug    ] Total Samples to generate 1000
2026-05-18 13:27:57 [debug    ] Total Samples to generate per site [500 500]
2026-05-18 13:27:57 [info     ] For site 0
2026-05-18 13:27:57 [info     ] Generating 500 samples
2026-05-18 13:27:57 [debug    ] Balance 0.3 for site 0
2026-05-18 13:27:57 [debug    ] Site 0, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info     ] For site 1
2026-05-18 13:27:57 [info     ] Generating 500 samples
2026-05-18 13:27:57 [debug    ] Balance 0.7 for site 1
2026-05-18 13:27:57 [debug    ] Site 1, site effect strength [3.0, 3.0]
2026-05-18 13:27:57 [info     ] Generated 1000 samples across 2 sites
2026-05-18 13:27:57 [info     ] Class distribution: [500 500]
2026-05-18 13:27:57 [info     ] Site distribution: [500 500]
2026-05-18 13:27:57 [info     ] [ISI] Starting fit_resample
2026-05-18 13:27:57 [debug    ] Using 1 categorical covariates
2026-05-18 13:27:57 [debug    ] No tolerance specified, using exact matching
2026-05-18 13:27:57 [debug    ] Using 1 continuous covariates with tolerance: [0.]
2026-05-18 13:27:57 [info     ] [ISI] Processing site 0
2026-05-18 13:27:57 [debug    ] [ISI] For site 0, N target for per_site strategy = 350
2026-05-18 13:27:57 [info     ] [ISI] Processing site 1
2026-05-18 13:27:57 [debug    ] [ISI] For site 1, N target for per_site strategy = 350
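
To make the idea of covariate-stratified interpolation concrete, here is a minimal NumPy-only sketch. It is not the uniharmony implementation. The helper names `make_strata` and `interpolate_in_stratum` are hypothetical, the equal-frequency (quantile) binning of the continuous covariate is an assumption, and linear interpolation between random pairs stands in for whatever interpolator the library applies. A real run would presumably also restrict each pair to the same site and class.

```python
import numpy as np


def make_strata(categorical, continuous, n_bins):
    """Combine a categorical covariate with a binned continuous one into stratum ids."""
    categorical = np.asarray(categorical).ravel()
    continuous = np.asarray(continuous).ravel()
    # Assumption: equal-frequency (quantile) bins; the library may bin differently.
    inner_edges = np.quantile(continuous, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(continuous, inner_edges)  # values in 0 .. n_bins - 1
    # Encode each (category, bin) pair as a single integer stratum id.
    _, cat_codes = np.unique(categorical, return_inverse=True)
    return cat_codes * n_bins + bins


def interpolate_in_stratum(X, strata, stratum, n_new, rng):
    """Create n_new points on segments between random pairs from one stratum."""
    idx = np.flatnonzero(strata == stratum)
    if idx.size < 2:
        raise ValueError("need at least two samples in the stratum")
    a = rng.choice(idx, size=n_new)
    b = rng.choice(idx, size=n_new)
    w = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    return X[a] + w * (X[b] - X[a])


# Toy data shaped like the example above (hypothetical values).
rng_toy = np.random.default_rng(0)
n_toy = 200
X_toy = rng_toy.standard_normal((n_toy, 3))
sex_toy = rng_toy.integers(0, 2, (n_toy, 1))
age_toy = rng_toy.standard_normal((n_toy, 1)) * 10 + 50

strata_toy = make_strata(sex_toy, age_toy, n_bins=2)
X_new = interpolate_in_stratum(X_toy, strata_toy, stratum=0, n_new=5, rng=rng_toy)
print(np.unique(strata_toy), X_new.shape)
```

Because every synthetic point is a convex combination of two members of the same stratum, it inherits that stratum's covariate profile, which is how the demographic distribution is kept intact while the minority class is topped up.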
