Load MAREoS dataset


Imports

from pathlib import Path

import pandas as pd

from uniharmony.datasets import load_MAREoS

We can call the helper function to load all the datasets (approx. 3 MB). The files are stored in the cache, so we don’t have to manage them ourselves.

datasets = load_MAREoS()
2026-05-18 13:04:33 [info     ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_simple1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_simple2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_interaction1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_interaction2_data.csv
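If you want to check what ended up in the cache, a minimal sketch like the following lists the CSV files below a directory. The helper name list_cached_csvs is hypothetical (it is not part of uniharmony), and the cache location is only assumed from the log output above, so it may differ on your machine.

```python
from pathlib import Path


def list_cached_csvs(cache_dir):
    """Return the sorted names of all CSV files found below cache_dir."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing has been downloaded to this location yet
        return []
    return sorted(p.name for p in cache_dir.rglob("*.csv"))


# Assumed location, copied from the log output; adjust it for your machine
cache_dir = Path.home() / ".cache" / "uniharmony" / "MAREoS"
for name in list_cached_csvs(cache_dir):
    print(name)
```

If the directory does not exist, the helper returns an empty list and nothing is printed.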

Exploration

Let’s now explore what the datasets look like.

print(datasets.keys())
dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2', 'true_simple1', 'true_simple2', 'true_interaction1', 'true_interaction2'])

We now have all the datasets in a dictionary, eight in total.
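The keys printed above all start with either eos or true, followed by an underscore. As a small sketch, the keys can be grouped by that prefix with plain string handling. The helper group_by_effect is hypothetical, and the key list is copied from the printed output rather than loaded from the package.

```python
def group_by_effect(keys):
    """Group dataset keys by the part before the first underscore."""
    groups = {}
    for key in keys:
        effect, _, _rest = key.partition("_")
        groups.setdefault(effect, []).append(key)
    return groups


# Keys copied from the output of print(datasets.keys())
keys = [
    "eos_simple1", "eos_simple2", "eos_interaction1", "eos_interaction2",
    "true_simple1", "true_simple2", "true_interaction1", "true_interaction2",
]
for effect, members in group_by_effect(keys).items():
    print(effect, len(members), members)
```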

# Select one dataset and explore what is inside the dictionary
dataset = datasets["eos_simple1"]
print(dataset.keys())
dict_keys(['X', 'y', 'sites', 'covs', 'folds'])

Let’s unpack what is inside the keys. This is the typical way you can use the dataset for further downstream analysis.

X = dataset["X"]
y = dataset["y"]

print(f"Load X with shape:{X.shape} and y:{y.shape}")
Load X with shape:(1001, 14) and y:(1001,)
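The dictionary also contains a folds entry. Its exact layout is not shown here, so the following is only a sketch under the assumption that folds holds one fold label per sample. The helper split_by_fold is hypothetical and runs on stand-in labels instead of the real data.

```python
from collections import defaultdict


def split_by_fold(folds):
    """Yield (fold_label, train_idx, test_idx) for each distinct fold label.

    Assumes `folds` is a sequence with one fold label per sample.
    """
    by_fold = defaultdict(list)
    for i, label in enumerate(folds):
        by_fold[label].append(i)
    for label in sorted(by_fold):
        test_idx = by_fold[label]
        # Every sample that is not in the held-out fold is used for training
        train_idx = sorted(
            i for other, idx in by_fold.items() if other != label for i in idx
        )
        yield label, train_idx, test_idx


# Stand-in fold labels for six samples (not taken from MAREoS)
example_folds = [1, 2, 1, 3, 2, 3]
for label, train_idx, test_idx in split_by_fold(example_folds):
    print(f"fold {label}: train={train_idx} test={test_idx}")
```

With NumPy arrays, the returned index lists can be used directly for indexing, for example X[train_idx] and y[test_idx].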

Variations

# You can use the helper function to return only a subset of the datasets
datasets = load_MAREoS(effects="eos")
print(datasets.keys())
2026-05-18 13:04:33 [info     ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction2_data.csv
dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2'])
datasets = load_MAREoS(effects="eos", effect_types="simple")
print(datasets.keys())
2026-05-18 13:04:33 [info     ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple2_data.csv
dict_keys(['eos_simple1', 'eos_simple2'])
datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1")
print(datasets.keys())
2026-05-18 13:04:33 [info     ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
dict_keys(['eos_simple1'])
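The three filters narrow down the same naming scheme: effect, effect type and example number. The sketch below mimics that selection in plain Python so you can see which keys to expect for a given combination. It is not the library's implementation; expected_keys and the constants are hypothetical names, and accepting either a string or a list is a convenience of this sketch only.

```python
from itertools import product

# Values inferred from the dataset keys printed in this example
EFFECTS = ("eos", "true")
EFFECT_TYPES = ("simple", "interaction")
EFFECT_EXAMPLES = ("1", "2")


def _as_tuple(value, default):
    """Turn None, a single string, or an iterable into a tuple of strings."""
    if value is None:
        return default
    if isinstance(value, str):
        return (value,)
    return tuple(value)


def expected_keys(effects=None, effect_types=None, effect_examples=None):
    """Return the dataset keys matching the given filters (None keeps all)."""
    combos = product(
        _as_tuple(effects, EFFECTS),
        _as_tuple(effect_types, EFFECT_TYPES),
        _as_tuple(effect_examples, EFFECT_EXAMPLES),
    )
    return [f"{effect}_{kind}{number}" for effect, kind, number in combos]


print(expected_keys(effects="eos"))
print(expected_keys(effects="eos", effect_types="simple"))
print(expected_keys(effects="eos", effect_types="simple", effect_examples="1"))
```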

Returning the dataset as a DataFrame lets you see the simulated areas. Pass as_numpy=False to load the data as a pandas.DataFrame, whose columns are named after the simulated areas of the brain.

datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1", as_numpy=False)
dataset = datasets["eos_simple1"]["X"]
# Show at most 8 columns
pd.set_option("display.max_columns", 8)
dataset.head()
2026-05-18 13:04:33 [info     ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
id  Lthal        Rthal        Lcaud        Rcaud        ...  Lamyg        Ramyg        Laccumb      Raccumb
1   8895.369099  8383.870372  3803.558492  4357.165963  ...  1563.622497  1854.229137  698.637972   701.906213
2   8679.346875  6654.136742  3924.041654  3745.063498  ...  1511.933005  1020.152948  709.090077   534.448106
3   9191.801201  7159.776871  3444.265568  3158.455008  ...  1761.132128  1114.841164  785.199200   717.806882
4   7531.473405  6694.021219  4984.063517  4689.035649  ...  1774.472741  742.652391   1104.007368  769.837240
5   7070.478721  5575.244389  3285.175734  2234.129050  ...  1722.570593  1414.669078  1000.597680  440.375965

5 rows × 14 columns
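The column names suggest a left/right naming scheme. Assuming the leading L or R marks the hemisphere (an assumption, not something stated in this example), a small sketch can pair the columns per region. The helper pair_hemispheres is hypothetical, and the column list contains only the eight names visible in the table above.

```python
def pair_hemispheres(columns):
    """Map each region name to its (left, right) column names.

    Assumes a leading "L" or "R" marks the hemisphere of a column.
    """
    left = {c[1:]: c for c in columns if c.startswith("L")}
    right = {c[1:]: c for c in columns if c.startswith("R")}
    # Keep only regions that have both a left and a right column
    return {region: (left[region], right[region]) for region in left if region in right}


# The eight column names visible in the truncated table
columns = ["Lthal", "Rthal", "Lcaud", "Rcaud", "Lamyg", "Ramyg", "Laccumb", "Raccumb"]
pairs = pair_hemispheres(columns)
for region, (left_col, right_col) in pairs.items():
    print(region, left_col, right_col)
```

With the DataFrame loaded, list(dataset.columns) could be passed in place of the hard-coded list.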



Load the dataset into a user-defined folder. If you want to inspect the CSV files yourself, you can pass a directory where the function should save the data. Here we pass a directory inside the repository, using a path relative to this example to locate an appropriate folder.

data_dir = Path().resolve().parents[1] / "src" / "uniharmony" / "datasets" / "data"
datasets = load_MAREoS(data_dir=data_dir)
Unzipping contents of '/home/runner/.cache/uniharmony/public_datasets.zip' to '/home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS'
2026-05-18 13:04:33 [info     ] MAREoS datasets downloaded: 16 CSV files in /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_simple2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_interaction1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_interaction2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_simple1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_simple2_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_interaction1_data.csv
2026-05-18 13:04:33 [info     ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_interaction2_data.csv
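The expression Path().resolve().parents[1] climbs two levels up from the current working directory: parents[0] is the direct parent and parents[1] is the one above it. The sketch below illustrates this with a pure path, so it does not touch the file system. The working directory used here is hypothetical; only the repository root and the data folder are taken from the log output above.

```python
from pathlib import PurePosixPath

# Hypothetical working directory, two levels below the repository root
cwd = PurePosixPath("/home/runner/work/UniHarmony/UniHarmony/docs/examples")

# parents[0] is .../docs, parents[1] is the repository root
repo_root = cwd.parents[1]
data_dir = repo_root / "src" / "uniharmony" / "datasets" / "data"

print(repo_root)
print(data_dir)
```

If the example is run from a different folder depth, the index passed to parents has to change accordingly.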

Total running time of the script: (0 minutes 0.903 seconds)
