Load MAREoS dataset
Imports
from pathlib import Path
import pandas as pd
from uniharmony.datasets import load_MAREoS
We can call the helper function to load all the datasets (approx. 3 MB). The files are stored in the cache, so we don’t have to manage them ourselves.
datasets = load_MAREoS()
2026-05-18 13:04:33 [info ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_simple1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_simple2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_interaction1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/true_interaction2_data.csv
Exploration
Let’s now explore what the datasets look like.
print(datasets.keys())
dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2', 'true_simple1', 'true_simple2', 'true_interaction1', 'true_interaction2'])
We now have all the datasets in a dictionary, 8 in total.
# Select one dataset and explore what is inside the dictionary
dataset = datasets["eos_simple1"]
print(dataset.keys())
dict_keys(['X', 'y', 'sites', 'covs', 'folds'])
Let’s unpack what is inside the keys. This is the typical way to use a dataset for downstream analysis.
Load X with shape:(1001, 14) and y:(1001,)
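The code that produced the line above is not shown here. As a minimal sketch, assuming the five entries are sample-aligned (one row of X, one label, one site, one covariate row and one fold entry per sample, which is an assumption rather than something stated on this page), the helper below unpacks a dataset dictionary and checks that alignment. `unpack_dataset` is a hypothetical helper, not part of uniharmony, and the small stand-in dictionary takes the place of the real `datasets["eos_simple1"]`.

```python
def unpack_dataset(dataset):
    """Return (X, y, sites, covs, folds) after checking they share a length.

    Hypothetical helper, not part of uniharmony. It only relies on len(),
    so it works for lists, NumPy arrays and pandas objects alike.
    """
    keys = ("X", "y", "sites", "covs", "folds")
    missing = [key for key in keys if key not in dataset]
    if missing:
        raise KeyError(f"dataset is missing keys: {missing}")
    parts = tuple(dataset[key] for key in keys)
    lengths = {key: len(part) for key, part in zip(keys, parts)}
    # Assumption: every entry holds exactly one item per sample.
    if len(set(lengths.values())) != 1:
        raise ValueError(f"entries are not sample-aligned: {lengths}")
    return parts


# Stand-in with the same keys as the real dataset (all values are made up).
toy_dataset = {
    "X": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "y": [0, 1, 0],
    "sites": ["a", "b", "a"],
    "covs": [[1.0], [2.0], [3.0]],
    "folds": [0, 1, 2],
}

X, y, sites, covs, folds = unpack_dataset(toy_dataset)
print(f"Loaded {len(X)} samples with {len(X[0])} features each")
```

With the real data, the same call would be `unpack_dataset(datasets["eos_simple1"])`, and `X.shape` and `y.shape` then give the shapes printed above.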
Variations
2026-05-18 13:04:33 [info ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_interaction2_data.csv
dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2'])
2026-05-18 13:04:33 [info ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple2_data.csv
dict_keys(['eos_simple1', 'eos_simple2'])
2026-05-18 13:04:33 [info ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
dict_keys(['eos_simple1'])
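The three outputs above are consistent with narrowing the selection one filter at a time: first by effect (`eos`), then by effect type (`simple`), then by example (`1`). The sketch below reproduces the dataset names that such a selection would yield. `expected_keys` is a hypothetical helper, not part of uniharmony, and the option values `true`, `interaction` and `2` are inferred from the key names printed earlier rather than taken from the library's documentation.

```python
from itertools import product

# Option values inferred from the eight key names shown earlier (assumption).
EFFECTS = ("eos", "true")
EFFECT_TYPES = ("simple", "interaction")
EFFECT_EXAMPLES = ("1", "2")


def _as_tuple(value, default):
    """Normalize None, a single string, or an iterable of strings to a tuple."""
    if value is None:
        return default
    if isinstance(value, str):
        return (value,)
    return tuple(value)


def expected_keys(effects=None, effect_types=None, effect_examples=None):
    """List the dataset names matching the given filters (hypothetical helper)."""
    combos = product(
        _as_tuple(effects, EFFECTS),
        _as_tuple(effect_types, EFFECT_TYPES),
        _as_tuple(effect_examples, EFFECT_EXAMPLES),
    )
    # Names follow the pattern <effect>_<type><example>, e.g. "eos_simple1".
    return [f"{effect}_{kind}{example}" for effect, kind, example in combos]


print(expected_keys(effects="eos"))
print(expected_keys(effects="eos", effect_types="simple"))
print(expected_keys(effects="eos", effect_types="simple", effect_examples="1"))
```

Calling `expected_keys()` with no filters lists all eight names, in the same order as the full dictionary shown in the Exploration section.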
Returning the dataset as a DataFrame lets you see the simulated areas. You can choose to load the dataset as a pandas.DataFrame, which includes the simulated areas of the brain.
datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1", as_numpy=False)
dataset = datasets["eos_simple1"]["X"]
# Limit the display to 8 columns
pd.set_option("display.max_columns", 8)
dataset.head()
2026-05-18 13:04:33 [info ] MAREoS datasets already exist at: /home/runner/.cache/uniharmony/MAREoS
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/.cache/uniharmony/MAREoS/public_datasets/eos_simple1_data.csv
Load the dataset in a user-defined folder. If we want to inspect the CSV files ourselves, we can pass a directory where the function should save the data. Let’s pass a directory inside the repository, using a path relative to this example to locate the appropriate folder.
Unzipping contents of '/home/runner/.cache/uniharmony/public_datasets.zip' to '/home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS'
2026-05-18 13:04:33 [info ] MAREoS datasets downloaded: 16 CSV files in /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_simple1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_simple2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_interaction1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/eos_interaction2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_simple1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_simple2_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_interaction1_data.csv
2026-05-18 13:04:33 [info ] Getting data file: /home/runner/work/UniHarmony/UniHarmony/src/uniharmony/datasets/data/MAREoS/public_datasets/true_interaction2_data.csv
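Once the files sit in a folder of your choice, they can be inspected with the standard library alone. A minimal sketch follows; `list_csv_files` is a hypothetical helper, not part of uniharmony, and the temporary directory with placeholder files merely stands in for the real data folder.

```python
import tempfile
from pathlib import Path


def list_csv_files(data_dir):
    """Return the CSV files below data_dir, sorted by path (hypothetical helper)."""
    return sorted(Path(data_dir).rglob("*.csv"))


# Demonstrate on a throwaway folder that mimics the layout in the log above.
with tempfile.TemporaryDirectory() as tmp:
    public = Path(tmp) / "MAREoS" / "public_datasets"
    public.mkdir(parents=True)
    for name in ("eos_simple1_data.csv", "true_simple1_data.csv"):
        (public / name).write_text("placeholder\n")
    found = [path.name for path in list_csv_files(tmp)]

print(found)
```

With the real folder, each returned path can be passed to `pd.read_csv` to look at the raw table.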