ON-Harmony Dataset

Overview

The ON-Harmony dataset (~58 GB) contains:

20 participants, each scanned on 6 scanners
9 participants with 5 additional within-scanner repeats
5 modalities: T1w, T2w, SWI, dMRI, rfMRI
Defaced anatomical images with defacing masks

Usage

from uniharmony.datasets import load_ONharmony

datasets = load_ONharmony()

Examples

Download all anatomical files for a subject and a session

from uniharmony.datasets import load_ONharmony

# Download every anatomical file for one participant and one session
load_ONharmony(
    subjects="15320",
    sessions="NOT1ACH001",
    modalities="anat",
    suffixes="all",
    extensions="all",
    target_path="./ON-Harmony",
    dataset_source_URL="https://github.com/OpenNeuroDatasets/ds004712.git",  # Also the default
    root_files=[],  # An empty list skips the dataset's root-level files
    hidden=True,
    copy=True,
    tmp_clean=False,  # Keep the cached clone so a later call can reuse it instead of cloning again
)

# To fetch every session of the same subject, we can reuse the cached clone
# (tmp_clean=False above), so the dataset does not need to be cloned again.
# This speeds up the process, as only the requested files need to be fetched (a "get").

# Download the anatomical files from all sessions of the same participant
load_ONharmony(
    subjects="15320",
    sessions="all",
    modalities="anat",
    suffixes="all",
    extensions="all",
    target_path="./ON-Harmony",
    root_files=[],  # An empty list skips the dataset's root-level files
    hidden=True,
    copy=True,
    tmp_clean=True,  # Delete the cached clone now that we are done
)
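To check which sessions actually ended up on disk after the second call, a small helper can list the `sub-*/ses-*` folders under `target_path`. This is a minimal sketch that assumes the downloaded files keep the BIDS layout of the OpenNeuro repository (`sub-<label>/ses-<label>/...`); `list_downloaded_sessions` is a hypothetical helper, not part of `uniharmony`.

```python
from pathlib import Path
from typing import Dict, List, Union


def list_downloaded_sessions(target_path: Union[str, Path]) -> Dict[str, List[str]]:
    """Return {subject_label: [session_label, ...]} found under target_path.

    Assumes (hypothetically) a BIDS layout: target_path/sub-<label>/ses-<label>/...
    """
    base = Path(target_path)
    layout: Dict[str, List[str]] = {}
    for sub_dir in sorted(base.glob("sub-*")):
        if not sub_dir.is_dir():
            continue  # ignore files that happen to match sub-*
        # Strip the "ses-" prefix to get the bare session codes
        sessions = sorted(
            ses.name[len("ses-"):]
            for ses in sub_dir.glob("ses-*")
            if ses.is_dir()
        )
        # Strip the "sub-" prefix to get the bare subject label
        layout[sub_dir.name[len("sub-"):]] = sessions
    return layout
```

If that layout holds, `list_downloaded_sessions("./ON-Harmony")` should map `"15320"` to every session fetched by the calls above.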

Direct Download (No hidden cache)

from uniharmony.datasets import (
    load_ONharmony,
)

# Download directly to target (no cache)
load_ONharmony(
    subjects="15320",
    sessions="NOT1ACH001",
    modalities="anat",
    suffixes="all",
    extensions="all",
    target_path="./ON-Harmony",
    root_files=[],  # An empty list skips the dataset's root-level files
    hidden=False,
    copy=True,          # Ignored when hidden=False
    tmp_clean=False,    # Ignored when hidden=False
)

Download diffusion MRI

from uniharmony.datasets import load_ONharmony

# DWI requires .nii.gz, .json, .bval, and .bvec
load_ONharmony(
    subjects="all",
    sessions="all",
    modalities="dwi",
    tasks="all",
    runs="all",
    target_path="./on_harmony_dwi",
    suffixes="dwi",
    extensions=[".nii.gz", ".json", ".bval", ".bvec"],
    root_files=["dataset_description.json"],
    hidden=True,
    copy=True,
    tmp_clean=True,
)
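Because each DWI image is only usable together with its `.json`, `.bval` and `.bvec` sidecars, it can be worth checking a download for incomplete sets. The sketch below assumes BIDS-style file names in which the sidecars share the image's stem (for example `sub-<label>_ses-<label>_dwi.nii.gz` and `sub-<label>_ses-<label>_dwi.bval`); `missing_dwi_sidecars` is a hypothetical helper, not a `uniharmony` function.

```python
from pathlib import Path
from typing import Dict, List, Union

# Sidecar files expected next to every DWI image (see the comment in the example above)
SIDECAR_EXTENSIONS = (".json", ".bval", ".bvec")


def missing_dwi_sidecars(root: Union[str, Path]) -> Dict[str, List[str]]:
    """Map each *_dwi.nii.gz under `root` to the sidecar extensions it lacks.

    Assumes (hypothetically) BIDS-style names where the sidecars share the
    image's stem, e.g. sub-01_ses-A_dwi.nii.gz -> sub-01_ses-A_dwi.bval.
    """
    base = Path(root)
    missing: Dict[str, List[str]] = {}
    for image in sorted(base.rglob("*_dwi.nii.gz")):
        stem = image.name[: -len(".nii.gz")]
        # Collect every expected sidecar that is not present next to the image
        absent = [
            ext for ext in SIDECAR_EXTENSIONS
            if not image.with_name(stem + ext).exists()
        ]
        if absent:
            missing[image.relative_to(base).as_posix()] = absent
    return missing
```

Under those naming assumptions, an empty result from `missing_dwi_sidecars("./on_harmony_dwi")` means every image has its full set of sidecars.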

Advanced Usage

Working with ON-Harmony Session Names

The ON-Harmony session codes encode the site and scanner on which each session was acquired:

Session Code	Scanner	Site
`NOT1ACH001`	Philips Achieva	Nottingham
`NOT2ING001`	Philips Ingenia	Nottingham
`NOT3GEM001`	GE MR750	Nottingham
`NOT4GEP001`	GE Premier	Nottingham
`OXF1PRI001`	Siemens Prisma (32ch)	Oxford
`OXF2PRI001`	Siemens Prisma (64ch)	Oxford
`OXF3TRI001`	Siemens Trio	Oxford
`OXF4GEP001`	GE Premier (21ch)	Oxford

from uniharmony.datasets import load_ONharmony

# Download only the T1w images from the two Oxford Prisma scanners
load_ONharmony(
    subjects="all",
    sessions=["OXF1PRI001", "OXF2PRI001"],
    modalities="anat",
    target_path="./oxford_prisma",
    suffixes="T1w",
    extensions=".nii.gz",
    root_files="all",
)
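Because the codes follow a fixed pattern, they can be split programmatically, for example to build the `sessions` list for one site instead of typing the codes by hand. This sketch assumes each code is a 3-letter site prefix, a 1-digit scanner index, a 3-letter model code and a 3-digit session number (`001` for the sessions in the table); that reading of the trailing digits and the helper names `parse_session` and `sessions_for_site` are hypothetical, while the scanner labels are copied from the table.

```python
import re
from typing import List, NamedTuple

# Site prefixes and scanner labels copied from the session-code table
SITES = {"NOT": "Nottingham", "OXF": "Oxford"}
SCANNERS = {
    "NOT1ACH": "Philips Achieva",
    "NOT2ING": "Philips Ingenia",
    "NOT3GEM": "GE MR750",
    "NOT4GEP": "GE Premier",
    "OXF1PRI": "Siemens Prisma (32ch)",
    "OXF2PRI": "Siemens Prisma (64ch)",
    "OXF3TRI": "Siemens Trio",
    "OXF4GEP": "GE Premier (21ch)",
}

# Assumed layout: site (3 letters), scanner index (1 digit),
# model code (3 letters), session number (3 digits)
SESSION_RE = re.compile(r"^([A-Z]{3})(\d)([A-Z]{3})(\d{3})$")


class SessionInfo(NamedTuple):
    code: str
    site: str
    scanner: str
    number: int


def parse_session(code: str) -> SessionInfo:
    """Split an ON-Harmony session code into site, scanner and number."""
    match = SESSION_RE.match(code)
    if match is None:
        raise ValueError(f"Unrecognised session code: {code!r}")
    site, index, model, number = match.groups()
    scanner_key = f"{site}{index}{model}"
    return SessionInfo(
        code=code,
        site=SITES.get(site, site),  # fall back to the raw prefix
        scanner=SCANNERS.get(scanner_key, scanner_key),
        number=int(number),
    )


def sessions_for_site(codes: List[str], site: str) -> List[str]:
    """Keep only the codes acquired at `site` (e.g. "Oxford")."""
    return [c for c in codes if parse_session(c).site == site]


# The eight codes from the table
ALL_CODES = [key + "001" for key in SCANNERS]
oxford = sessions_for_site(ALL_CODES, "Oxford")
print(oxford)  # → ['OXF1PRI001', 'OXF2PRI001', 'OXF3TRI001', 'OXF4GEP001']
```

The resulting `oxford` list could then be passed as `sessions=oxford` to `load_ONharmony`, in the same way as the explicit list in the example above.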

Citation

If you use the ON-Harmony dataset, please cite:

@article{warrington2025multi,
  title={A multi-site, multi-modal travelling-heads resource for brain MRI harmonisation},
  author={Warrington, Shaun and Torchi, Andrea and Mougin, Olivier and Campbell, Jon and Ntata, Asante and Craig, Martin and Assimopoulos, Stephania and Alfaro-Almagro, Fidel and Miller, Karla L and Jenkinson, Mark and others},
  journal={Scientific Data},
  volume={12},
  number={1},
  pages={609},
  year={2025},
  publisher={Nature Publishing Group UK London}
}