Utilities and helper functions
Unified harmonization methods for ML and DL.
load_MAREoS(effects=None, effect_types=None, effect_examples=None, as_numpy=True, data_dir=None, force_download=False, verbose=False)
Load multiple MAREoS datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `effects` | list of str, str or None, optional (default None) | List of effects to load. If None, loads all `["eos", "true"]`. | `None` |
| `effect_types` | list of str, str or None, optional (default None) | List of effect types to load. If None, loads all `["simple", "interaction"]`. | `None` |
| `effect_examples` | list of str, str or None, optional (default None) | List of examples to load. If None, loads all `["1", "2"]`. | `None` |
| `as_numpy` | bool, optional (default True) | If True, return the data as NumPy arrays. | `True` |
| `data_dir` | Path \| None, optional (default None) | Directory containing the MAREoS data files. If None, the data are downloaded to a cache directory. | `None` |
| `force_download` | bool, optional (default False) | Force the dataset to be downloaded again, e.g. in case of corrupt files. | `False` |
| `verbose` | bool, optional (default False) | Control verbosity. | `False` |
Returns:
| Type | Description |
|---|---|
| dict of str and dict | Nested dictionary mapping each dataset name (e.g. `"eos_simple1"`) to a dictionary holding that dataset's contents. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If any parameter contains invalid values. |
Examples:
```python
>>> datasets = load_MAREoS()
>>> len(datasets)
8
>>> datasets = load_MAREoS(effects=["eos"], effect_types=["simple"])
>>> len(datasets)
2
>>> list(datasets.keys())
['eos_simple1', 'eos_simple2']
```
Source code in src/uniharmony/datasets/_load_mareos.py
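The dataset counts in the examples follow from combining every selected effect, effect type and example: 2 × 2 × 2 = 8 datasets by default, and 1 × 1 × 2 = 2 for `effects=["eos"], effect_types=["simple"]`. The sketch below mimics only that selection and naming logic in plain Python, without loading any data. `mareos_dataset_names` is a hypothetical helper, not part of uniharmony, and the `<effect>_<type><example>` key format is inferred from the example output alone.

```python
from itertools import product

# Accepted values, as listed in the parameter table above.
ALL_EFFECTS = ["eos", "true"]
ALL_EFFECT_TYPES = ["simple", "interaction"]
ALL_EFFECT_EXAMPLES = ["1", "2"]


def _as_list(value, allowed, name):
    """Normalize None / str / list of str into a validated list (hypothetical helper)."""
    if value is None:
        return list(allowed)
    if isinstance(value, str):
        value = [value]
    invalid = [v for v in value if v not in allowed]
    if invalid:
        raise ValueError(f"Invalid {name}: {invalid}; expected a subset of {allowed}")
    return list(value)


def mareos_dataset_names(effects=None, effect_types=None, effect_examples=None):
    """Return keys in the '<effect>_<type><example>' format seen in the examples (assumed)."""
    effects = _as_list(effects, ALL_EFFECTS, "effects")
    effect_types = _as_list(effect_types, ALL_EFFECT_TYPES, "effect_types")
    effect_examples = _as_list(effect_examples, ALL_EFFECT_EXAMPLES, "effect_examples")
    # Every combination of the three selections yields one dataset.
    return [
        f"{effect}_{etype}{example}"
        for effect, etype, example in product(effects, effect_types, effect_examples)
    ]


print(len(mareos_dataset_names()))  # 8
print(mareos_dataset_names(effects="eos", effect_types="simple"))  # ['eos_simple1', 'eos_simple2']
```

Accepting a bare string as well as a list mirrors the "list of str, str or None" parameter types documented above.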
make_multisite_classification(n_sites=2, n_samples=1000, balance_per_site=None, n_features=10, signal_strength=1.0, noise_strength=1.0, site_effect_strength=3.0, site_effect_homogeneous=True, n_classes=2, random_state=42, verbose=False)
Simulate multi-site data with signal, noise, and site effect components.
The data are generated as `X = signal + noise + site_effect`. All components are sampled from Gaussian distributions.
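A minimal NumPy sketch of this additive model follows. It is not uniharmony's implementation: `simulate_multisite` is a hypothetical name, the class signal is assumed to be one Gaussian centroid per class, the site effect is assumed to be one Gaussian offset vector per site (the homogeneous case only), and samples are assigned to sites and classes round-robin so that classes are balanced within each site. Only a subset of the documented parameters is modelled.

```python
import numpy as np


def simulate_multisite(n_sites=2, n_samples=1000, n_features=10,
                       signal_strength=1.0, noise_strength=1.0,
                       site_effect_strength=3.0, n_classes=2, random_state=42):
    """Toy generator for X = signal + noise + site_effect (hypothetical sketch)."""
    # Accept either an int seed or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)

    # Round-robin assignment: sites receive (nearly) equal numbers of samples,
    # and classes are (nearly) balanced within each site.
    idx = np.arange(n_samples)
    site_labels = idx % n_sites
    y = (idx // n_sites) % n_classes

    # Signal: one Gaussian centroid per class, scaled by signal_strength.
    class_centers = rng.normal(size=(n_classes, n_features))
    signal = signal_strength * class_centers[y]

    # Noise: i.i.d. Gaussian values for every sample and feature.
    noise = noise_strength * rng.normal(size=(n_samples, n_features))

    # Homogeneous site effect: one Gaussian offset shared by all samples of a site.
    site_offsets = rng.normal(size=(n_sites, n_features))
    site_effect = site_effect_strength * site_offsets[site_labels]

    X = signal + noise + site_effect
    return X, y, site_labels


X, y, site_labels = simulate_multisite(n_sites=3, n_samples=300, n_features=20, n_classes=3)
print(X.shape, y.shape, site_labels.shape)  # (300, 20) (300,) (300,)
```

With the default `site_effect_strength=3.0` and unit-scale signal and noise, the site offsets dominate the feature values, which is the situation harmonization methods are meant to correct.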
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_sites` | int, optional (default 2) | Number of sites to simulate. | `2` |
| `n_samples` | int, optional (default 1000) | Total number of samples across all sites. | `1000` |
| `balance_per_site` | list of float or None, optional (default None) | Class balance for each site. If None, classes are balanced (0.5 for binary, an equal distribution for multi-class). | `None` |
| `n_features` | int, optional (default 10) | Number of features per sample. | `10` |
| `signal_strength` | float, optional (default 1.0) | Strength of the signal component that separates the classes. | `1.0` |
| `noise_strength` | float, optional (default 1.0) | Strength of the noise component. | `1.0` |
| `site_effect_strength` | float, optional (default 3.0) | Strength of the site-specific effects. | `3.0` |
| `site_effect_homogeneous` | bool, optional (default True) | Whether the site effect is homogeneous (the same for all samples in a site). | `True` |
| `n_classes` | int, optional (default 2) | Number of classes to simulate (2 for binary, >2 for multi-class). | `2` |
| `random_state` | int or RandomState instance, optional (default 42) | Seed of the pseudo-random number generator, or a RandomState instance, for reproducibility. | `42` |
| `verbose` | bool, optional (default False) | Whether to print progress information. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `X` | np.ndarray of shape (n_samples, n_features) | Simulated feature matrix. |
| `y` | np.ndarray of shape (n_samples,) | Class labels (0 to n_classes-1). |
| `site_labels` | np.ndarray of shape (n_samples,) | Site labels (0 to n_sites-1). |
Examples:
```python
>>> X, y, site_labels = make_multisite_classification(
...     n_sites=3, n_samples=300, n_features=20, n_classes=3
... )
>>> X.shape, y.shape, site_labels.shape
((300, 20), (300,), (300,))
```
Source code in src/uniharmony/datasets/_make_multisite_classification.py
verbosity(min_level='info')
Set the verbosity level of the logger.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `min_level` | {"critical", "error", "warning", "info", "debug"}, optional | Minimum level to log. | `"info"` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If `min_level` is not one of the accepted levels. |
Source code in src/uniharmony/_verbose.py
verbosity_context(min_level='info')
Context manager to control the logger verbosity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `min_level` | {"critical", "error", "warning", "info", "debug"}, optional | Minimum level to log. | `"info"` |
Yields:
| Type | Description |
|---|---|
| `None` | |
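A context manager with this interface can be written with `contextlib.contextmanager`. In the sketch below, `min_level_context` and the logger name `"uniharmony"` are hypothetical, and restoring the previous level on exit is an assumption about the intended behaviour rather than something documented above.

```python
import logging
from contextlib import contextmanager

# Accepted level names mapped to the standard logging constants.
_LEVELS = {
    "critical": logging.CRITICAL,
    "error": logging.ERROR,
    "warning": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


@contextmanager
def min_level_context(min_level="info", logger_name="uniharmony"):
    """Hypothetical stand-in for verbosity_context(): temporarily change the logger level."""
    if min_level not in _LEVELS:
        raise ValueError(f"min_level must be one of {sorted(_LEVELS)}, got {min_level!r}")
    logger = logging.getLogger(logger_name)
    previous = logger.level  # remember the level set before entering
    logger.setLevel(_LEVELS[min_level])
    try:
        yield  # yields None, matching the Yields table
    finally:
        logger.setLevel(previous)  # assumption: the previous level is restored on exit


logger = logging.getLogger("uniharmony")
logger.setLevel(logging.WARNING)
with min_level_context("debug"):
    print(logger.level)  # 10
print(logger.level)  # 30
```

Restoring the level in a `finally` clause keeps the logger consistent even when the body of the `with` block raises.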