
This tutorial shows how to download the MAREoS datasets from the web.

Let's start with the imports.

# Imports
from pathlib import Path

from uniharmony import load_MAREoS
# We can call the helper function to load all the datasets (approx. 3 MB).
# The files are stored in the cache, so we don't have to manage them ourselves.
datasets = load_MAREoS()
# Now let's explore what the datasets look like
print(datasets.keys())
dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2', 'true_simple1', 'true_simple2', 'true_interaction1', 'true_interaction2'])

We now have all the datasets in a dictionary, eight in total.

# Select one dataset and explore what is inside the dictionary
dataset = datasets["eos_simple1"]
print(dataset.keys())
dict_keys(['X', 'y', 'sites', 'covs', 'folds'])
# Let's unpack what is inside the keys. This is the typical way to use
# the dataset in downstream analysis.
X = dataset["X"]
y = dataset["y"]

print(f"Load X with shape:{X.shape} and y:{y.shape}")
Load X with shape:(1001, 14) and y:(1001,)
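
The remaining keys (`sites`, `covs`, `folds`) can be unpacked the same way. The sketch below shows one possible use of the site labels, assuming `sites` is a one-dimensional array aligned row by row with `X`. That layout is an assumption, and the small random dictionary is only a stand-in so the example runs without downloading anything.

```python
import numpy as np

# Stand-in for one entry of the dictionary returned by load_MAREoS().
# The shapes and the layout of "sites" are assumptions made for illustration.
rng = np.random.default_rng(0)
dataset = {
    "X": rng.normal(size=(12, 14)),
    "y": rng.integers(0, 2, size=12),
    "sites": np.repeat(["site_a", "site_b", "site_c"], 4),
}

X, y, sites = dataset["X"], dataset["y"], dataset["sites"]

# Count the samples per site and compute the per-site mean of each feature.
site_names, counts = np.unique(sites, return_counts=True)
site_means = np.stack([X[sites == name].mean(axis=0) for name in site_names])

for name, count in zip(site_names, counts):
    print(f"{name}: {count} samples")
print(f"Per-site feature means have shape {site_means.shape}")
```

With the real data, replace the stand-in dictionary with `datasets["eos_simple1"]` and check the actual type of `sites` before relying on this pattern.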

Load datasets by condition

# You can use the helper function to return only a subset of the datasets
datasets = load_MAREoS(effects="eos")
print(datasets.keys())
dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2'])
datasets = load_MAREoS(effects="eos", effect_types="simple")
print(datasets.keys())
dict_keys(['eos_simple1', 'eos_simple2'])
datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1")
print(datasets.keys())
dict_keys(['eos_simple1'])
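
Judging from the printed keys, each name appears to combine the effect, the effect type, and the example number. The sketch below mirrors that naming pattern with a hypothetical helper, `select_keys`, which is not part of uniharmony and is not necessarily how `load_MAREoS` filters internally. It can be handy when all datasets are already loaded and only some are needed.

```python
# Keys copied from the output of load_MAREoS() shown earlier in this tutorial.
all_keys = [
    "eos_simple1", "eos_simple2", "eos_interaction1", "eos_interaction2",
    "true_simple1", "true_simple2", "true_interaction1", "true_interaction2",
]


def select_keys(keys, effect=None, effect_type=None, example=None):
    """Hypothetical helper: filter keys named '<effect>_<effect_type><example>'."""
    selected = []
    for key in keys:
        key_effect, rest = key.split("_", 1)
        # The last character is taken as the example number (assumes one digit).
        key_type, key_example = rest[:-1], rest[-1]
        if effect is not None and key_effect != effect:
            continue
        if effect_type is not None and key_type != effect_type:
            continue
        if example is not None and key_example != example:
            continue
        selected.append(key)
    return selected


print(select_keys(all_keys, effect="eos", effect_type="simple"))
print(select_keys(all_keys, effect_type="interaction", example="2"))
```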

Returning the dataset as a DataFrame shows the simulated areas

You can choose to load the dataset as a pandas.DataFrame, whose column names are the simulated areas of the brain.

datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1", as_numpy=False)
dataset = datasets["eos_simple1"]["X"]
dataset.head()
id        Lthal        Rthal        Lcaud        Rcaud         Lput         Rput         Lpal         Rpal       Lhippo       Rhippo        Lamyg        Ramyg      Laccumb      Raccumb
1   8895.369099  8383.870372  3803.558492  4357.165963  7231.227420  5647.496253  1294.448052  2270.489516  3928.692453  5421.703185  1563.622497  1854.229137   698.637972   701.906213
2   8679.346875  6654.136742  3924.041654  3745.063498  5895.190311  5164.702016  1939.028843  2017.027485  3110.500978  6202.638815  1511.933005  1020.152948   709.090077   534.448106
3   9191.801201  7159.776871  3444.265568  3158.455008  4858.917213  5392.683202  2191.623004  2415.533638  4202.892743  5147.131015  1761.132128  1114.841164   785.199200   717.806882
4   7531.473405  6694.021219  4984.063517  4689.035649  5553.881746  6494.219094  2242.955773  2381.085589  3629.289874  5748.866950  1774.472741   742.652391  1104.007368   769.837240
5   7070.478721  5575.244389  3285.175734  2234.129050  6526.943910  6041.981350  1601.192576  1630.750753  3944.610918  6150.220123  1722.570593  1414.669078  1000.597680   440.375965
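
With named columns, features can be selected by area name rather than by position. The sketch below uses a small stand-in DataFrame built from the first two rows and four columns of the table above, so it runs without the download. The variable names are illustrative only.

```python
import pandas as pd

# Stand-in built from the first two rows and four columns of the table above.
X_df = pd.DataFrame(
    {
        "Lthal": [8895.369099, 8679.346875],
        "Rthal": [8383.870372, 6654.136742],
        "Lcaud": [3803.558492, 3924.041654],
        "Rcaud": [4357.165963, 3745.063498],
    },
    index=pd.Index([1, 2], name="id"),
)

# Select features by column name instead of by position.
thalamus = X_df[["Lthal", "Rthal"]]
print(thalamus)

# Convert back to a NumPy array when downstream code expects one.
X_array = X_df.to_numpy()
print(X_array.shape)
```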

Load the dataset into a user-defined folder

If we want to inspect the CSV files ourselves, we can pass a directory where the function will save the data.

# Let's pass a directory inside the repository.
# We use a path relative to this example to locate the appropriate folder.
data_dir = Path().resolve().parent / "src" / "uniharmony" / "datasets" / "data"
datasets = load_MAREoS(data_dir=data_dir)
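
After the call, it can be useful to check what was written to the folder. The sketch below lists CSV files with `pathlib`, assuming they sit directly in the chosen directory with a `.csv` extension. A temporary directory with made-up file names stands in for `data_dir` so the example runs on its own.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for data_dir; the file names are made up
# and do not reflect the names used by load_MAREoS.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    for name in ("example_a.csv", "example_b.csv"):
        (data_dir / name).write_text("id,value\n1,0.5\n")

    # List the CSV files found directly inside the directory.
    csv_files = sorted(path.name for path in data_dir.glob("*.csv"))
    print(csv_files)
```

If the files end up in subfolders, `data_dir.rglob("*.csv")` searches recursively instead.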