This tutorial shows how to download the MAREoS dataset from the web

Let's start with the imports

# Imports
from pathlib import Path

from uniharmony import load_MAREoS

# We can call the helper function to load all the datasets (approx. 3 MB).
# The files will be stored in the cache, so we don't have to worry about them
datasets = load_MAREoS()

# Let's now explore what the datasets look like
print(datasets.keys())

dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2', 'true_simple1', 'true_simple2', 'true_interaction1', 'true_interaction2'])

We now have all the datasets in a dictionary. There are 8 datasets in total.

# Select one dataset and explore what is inside the dictionary
dataset = datasets["eos_simple1"]
print(dataset.keys())

dict_keys(['X', 'y', 'sites', 'covs', 'folds'])

# Let's unpack what is inside the keys. This is the typical way you can use
# the dataset for further downstream analysis.
X = dataset["X"]
y = dataset["y"]

print(f"Load X with shape:{X.shape} and y:{y.shape}")

Load X with shape:(1001, 14) and y:(1001,)
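
As a rough sketch of what downstream use might look like, the snippet below splits rows into train and test sets by fold. It does not call `load_MAREoS`; it uses a small hand-written stand-in with some of the same keys. The assumption that `folds` holds one fold label per row of `X` is mine and is not confirmed by this tutorial, and `split_by_fold` is a hypothetical helper, not part of uniharmony.

```python
# Stand-in for one entry of the dictionary returned by load_MAREoS().
# ASSUMPTION: "sites" and "folds" hold one label per row of X. The real
# layout is not documented in this tutorial and may differ. "covs" is omitted.
dataset = {
    "X": [[1.0, 2.0], [1.5, 2.5], [3.0, 4.0], [3.5, 4.5], [5.0, 6.0], [5.5, 6.5]],
    "y": [0, 1, 0, 1, 0, 1],
    "sites": ["a", "a", "b", "b", "c", "c"],
    "folds": [1, 2, 3, 1, 2, 3],
}


def split_by_fold(dataset, test_fold):
    """Return (train_idx, test_idx) row indices for one held-out fold."""
    train_idx, test_idx = [], []
    for i, fold in enumerate(dataset["folds"]):
        if fold == test_fold:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx


# Hold out each fold in turn and report the split sizes
for fold in sorted(set(dataset["folds"])):
    train_idx, test_idx = split_by_fold(dataset, fold)
    print(f"fold {fold}: {len(train_idx)} train rows, {len(test_idx)} test rows")
```

If the real `folds` entry turns out to be structured differently (for example, as lists of indices), only `split_by_fold` would need to change.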

Load datasets by condition

# You can use the helper function to return only a subset of the datasets
datasets = load_MAREoS(effects="eos")
print(datasets.keys())

dict_keys(['eos_simple1', 'eos_simple2', 'eos_interaction1', 'eos_interaction2'])

datasets = load_MAREoS(effects="eos", effect_types="simple")
print(datasets.keys())

dict_keys(['eos_simple1', 'eos_simple2'])

datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1")
print(datasets.keys())

dict_keys(['eos_simple1'])
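
The keys printed above appear to follow the pattern `<effect>_<effect_type><effect_example>`. The sketch below rebuilds the key names from that pattern, which can help predict what a given filter combination will return. The values `"true"`, `"interaction"`, and `"2"` are inferred from the printed keys rather than from the function's documentation, and `expected_keys` is a hypothetical helper, not part of uniharmony.

```python
from itertools import product

# Values inferred from the keys printed by load_MAREoS() in this tutorial
effects = ["eos", "true"]
effect_types = ["simple", "interaction"]
effect_examples = ["1", "2"]


def expected_keys(effects, effect_types, effect_examples):
    """Build key names following the observed <effect>_<type><example> pattern."""
    return [
        f"{effect}_{effect_type}{example}"
        for effect, effect_type, example in product(effects, effect_types, effect_examples)
    ]


all_keys = expected_keys(effects, effect_types, effect_examples)
print(len(all_keys))  # 2 effects x 2 types x 2 examples = 8
print(expected_keys(["eos"], ["simple"], ["1"]))
```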

Returning the dataset as a DataFrame shows the simulated areas

You can choose to load the dataset as a pandas.DataFrame, whose columns are named after the simulated areas of the brain.

datasets = load_MAREoS(effects="eos", effect_types="simple", effect_examples="1", as_numpy=False)
dataset = datasets["eos_simple1"]["X"]
dataset.head()

	Lthal	Rthal	Lcaud	Rcaud	Lput	Rput	Lpal	Rpal	Lhippo	Rhippo	Lamyg	Ramyg	Laccumb	Raccumb
id
1	8895.369099	8383.870372	3803.558492	4357.165963	7231.227420	5647.496253	1294.448052	2270.489516	3928.692453	5421.703185	1563.622497	1854.229137	698.637972	701.906213
2	8679.346875	6654.136742	3924.041654	3745.063498	5895.190311	5164.702016	1939.028843	2017.027485	3110.500978	6202.638815	1511.933005	1020.152948	709.090077	534.448106
3	9191.801201	7159.776871	3444.265568	3158.455008	4858.917213	5392.683202	2191.623004	2415.533638	4202.892743	5147.131015	1761.132128	1114.841164	785.199200	717.806882
4	7531.473405	6694.021219	4984.063517	4689.035649	5553.881746	6494.219094	2242.955773	2381.085589	3629.289874	5748.866950	1774.472741	742.652391	1104.007368	769.837240
5	7070.478721	5575.244389	3285.175734	2234.129050	6526.943910	6041.981350	1601.192576	1630.750753	3944.610918	6150.220123	1722.570593	1414.669078	1000.597680	440.375965
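
The column names shown above come in pairs that differ only in their first letter. As a small sketch, the snippet below groups them by region, assuming the leading `L` or `R` marks the left or right hemisphere (this reading is an assumption, as the tutorial does not state it). `hemisphere_pairs` is a hypothetical helper written for this illustration.

```python
# Column names copied from the DataFrame header above
columns = [
    "Lthal", "Rthal", "Lcaud", "Rcaud", "Lput", "Rput", "Lpal", "Rpal",
    "Lhippo", "Rhippo", "Lamyg", "Ramyg", "Laccumb", "Raccumb",
]


def hemisphere_pairs(columns):
    """Map each region name to its (left, right) column names.

    ASSUMPTION: the first character is the hemisphere marker ("L" or "R")
    and every region has both columns; a missing side raises KeyError.
    """
    regions = {}
    for name in columns:
        side, region = name[0], name[1:]
        regions.setdefault(region, {})[side] = name
    return {region: (sides["L"], sides["R"]) for region, sides in regions.items()}


pairs = hemisphere_pairs(columns)
print(len(pairs))
print(pairs["thal"])
```

With the DataFrame loaded, a pair such as `pairs["thal"]` could be used to select both columns of one region at once.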

Load the dataset into a user-defined folder

If we want to inspect the CSV files ourselves, we can pass a directory in which the function saves the data.

# Let's pass a directory inside the repository.
# We use a path relative to this example to locate an appropriate folder
data_dir = Path().resolve().parent / "src" / "uniharmony" / "datasets" / "data"
datasets = load_MAREoS(data_dir=data_dir)
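
Because the path above is built relative to the current working directory, it can be worth checking what actually ends up in the folder. The sketch below lists the CSV files in a directory using only `pathlib`. It assumes the files are written directly into `data_dir` as `.csv` files, which this tutorial suggests but does not spell out. `list_csv_files` and the placeholder file names are hypothetical.

```python
import tempfile
from pathlib import Path


def list_csv_files(data_dir):
    """Return the sorted names of the CSV files found directly in data_dir."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise NotADirectoryError(f"{data_dir} is not an existing directory")
    return sorted(p.name for p in data_dir.glob("*.csv"))


# Demonstrate on a temporary directory with placeholder files
# (these names are made up and are not the real MAREoS file names)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_a.csv").write_text("id,value\n1,2\n")
    (Path(tmp) / "notes.txt").write_text("not a csv\n")
    found = list_csv_files(tmp)
    print(found)
```

In the tutorial's setting, calling `list_csv_files(data_dir)` after `load_MAREoS(data_dir=data_dir)` would show which files were saved.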