DataLad Integration#
uniharmony provides a transparent way to download, manage, and interact with BIDS-compatible neuroimaging datasets using DataLad. It is designed primarily for the ON-Harmony dataset (ds004712), a multi-site, multi-modal travelling-heads MRI harmonisation resource.
Overview#
The DataLad integration can be used for:
Cloning DataLad datasets from available repositories
Selectively downloading BIDS-compliant files (subjects, sessions, modalities, tasks, runs, suffixes, extensions)
Managing disk space via hidden caches and automatic cleanup
Operating in two modes: hidden cache (default) or direct target directory
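Selective downloading relies on the entities that BIDS encodes in each filename. The following is a minimal sketch of that idea using only the standard library. `parse_bids_name` and `matches` are hypothetical helpers written for illustration, not part of uniharmony, and the filenames are made up.

```python
from pathlib import PurePosixPath


def parse_bids_name(path: str) -> dict:
    """Split a BIDS filename into its entities, suffix, and extension."""
    name = PurePosixPath(path).name
    # Everything from the first dot onwards is the extension (keeps ".nii.gz" whole).
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    # Key-value chunks such as "sub-01" become {"sub": "01"}.
    entities = dict(p.split("-", 1) for p in parts if "-" in p)
    # The final chunk without a dash (for example "T1w") is the suffix.
    entities["suffix"] = parts[-1] if "-" not in parts[-1] else ""
    entities["extension"] = dot + ext
    return entities


def matches(path: str, **wanted) -> bool:
    """Return True if every requested entity takes one of the allowed values."""
    found = parse_bids_name(path)
    return all(found.get(key) in allowed for key, allowed in wanted.items())


# Illustrative paths only; a real dataset listing would be used instead.
files = [
    "sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz",
    "sub-01/ses-A/func/sub-01_ses-A_task-rest_bold.nii.gz",
    "sub-02/ses-A/anat/sub-02_ses-A_T1w.json",
]
selected = [
    f for f in files
    if matches(f, sub={"01"}, suffix={"T1w"}, extension={".nii.gz"})
]
print(selected)
```

Only the first path passes all three filters. How uniharmony matches files internally may differ from this sketch.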
Key Features#
| Feature | Description |
|---|---|
| Hidden/Visible mode | Use a temporary cache ( |
| Selective downloads | Filter by subject, session, modality, task, run, suffix, and extension |
| Automatic cleanup | Drop files from cache after copying to save disk space ( |
| Symlink resolution | Convert DataLad annex symlinks to real files when using |
| Cache management | Clean temporary folders on demand |
| BIDS-aware | Follows Brain Imaging Data Structure conventions for file discovery |
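Symlink resolution can be pictured as replacing each link with a copy of the content it points to. The sketch below shows that idea in plain Python on a throwaway directory. `materialise_symlink` is a hypothetical name, and this is not necessarily how uniharmony performs the conversion.

```python
import shutil
import tempfile
from pathlib import Path


def materialise_symlink(link: Path) -> None:
    """Replace a symlink with a real copy of the file it points to."""
    if not link.is_symlink():
        return  # already a regular file, nothing to do
    # strict=True raises FileNotFoundError if the content has not been fetched.
    target = link.resolve(strict=True)
    link.unlink()  # removes the link itself, not the target
    shutil.copy2(target, link)


# Demonstration on a temporary directory that mimics an annexed file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    annex_object = root / "annex_object"
    annex_object.write_bytes(b"image data")
    link = root / "sub-01_T1w.nii.gz"
    link.symlink_to(annex_object)

    materialise_symlink(link)
    was_resolved = link.is_file() and not link.is_symlink()
    content = link.read_bytes()

print(was_resolved, content)
```

Inside a DataLad dataset, the `datalad unlock` command serves a similar purpose by turning annexed symlinks into regular files.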
Core Concepts#
Usage#
Downloading Data#
download_bids_dataset()#
Download derivative files (processed data) from a BIDS dataset.
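from uniharmony.datasets import download_bids_dataset

# subjects, sessions, modalities, suffixes, extensions, target_path,
# dataset_url, and root_files must be defined by the caller beforehand.
download_bids_dataset(
    subjects=subjects,
    sessions=sessions,
    modalities=modalities,
    tasks="all",
    runs="all",
    suffixes=suffixes,
    extensions=extensions,
    target_path=target_path,
    dataset_url=dataset_url,
    root_files=root_files,
)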
Dataset Management#
clean_tmp()#
Remove the temporary DataLad cache directory.
from uniharmony.datasets import clean_tmp
# Remove default cache
clean_tmp()
# Remove custom-named cache
clean_tmp("my_custom_cache")
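Removing a DataLad cache by hand can fail because git-annex write-protects the content it stores. The sketch below shows one way to delete such a tree with the standard library by restoring write permission first. `remove_cache` is a hypothetical helper for illustration; whether `clean_tmp` handles permissions in the same way is not stated here.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def remove_cache(cache_dir: Path) -> bool:
    """Delete cache_dir after restoring write permission. Returns True if removed."""
    if not cache_dir.exists():
        return False
    # Make every directory and regular file writable so deletion is not refused.
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        os.chmod(dirpath, stat.S_IRWXU)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # chmod would follow the link
                os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    shutil.rmtree(cache_dir)
    return True


# Demonstration: build a small read-only tree, then remove it.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "datalad_cache"
    obj_dir = cache / "objects"
    obj_dir.mkdir(parents=True)
    obj = obj_dir / "blob"
    obj.write_bytes(b"x")
    obj.chmod(stat.S_IRUSR)                     # read-only file
    obj_dir.chmod(stat.S_IRUSR | stat.S_IXUSR)  # read-only directory

    removed = remove_cache(cache)
    still_there = cache.exists()

print(removed, still_there)
```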
Utility Functions#
list_available_files()#
List all files in a dataset (useful for exploration).
from uniharmony.datasets import list_available_files
from pathlib import Path
files = list_available_files(Path("/tmp/datalad_cache/ds004712"))
print(f"Found {len(files)} files")
for f in files[:10]:
    print(f)
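A listing like this is often easier to read once summarised. The sketch below tallies files by extension. It assumes the listing is a sequence of path-like entries and uses a made-up stand-in list in place of a real result. `full_extension` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePosixPath


def full_extension(path) -> str:
    """Return the whole extension, so "x.nii.gz" gives ".nii.gz"."""
    return "".join(PurePosixPath(str(path)).suffixes)


# Stand-in for a real listing; these paths are illustrative only.
files = [
    "dataset_description.json",
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-01/anat/sub-01_T1w.json",
    "sub-02/anat/sub-02_T1w.nii.gz",
]

counts = Counter(full_extension(f) for f in files)
for ext, n in counts.most_common():
    print(f"{ext or '(none)'}: {n}")
```

Filenames with extra dots before the extension would need stricter parsing than this helper provides.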
Troubleshooting#
Disk space running out#
Pass tmp_clean=True to drop files from the cache once they have been copied to the target:
download_bids_dataset(
    # ...
    tmp_clean=True,
)
Or clean the cache manually:
from uniharmony.datasets import clean_tmp
clean_tmp()
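It can also help to check free space before starting a large download. The sketch below uses `shutil.disk_usage` from the standard library. `has_free_space` is a hypothetical helper, and any threshold passed to it is a value you would choose for your own dataset selection.

```python
import shutil
from pathlib import Path


def has_free_space(path: Path, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least required_gib free."""
    # Walk up to the nearest existing directory, since the target may not exist yet.
    probe = path.resolve()
    while not probe.exists():
        probe = probe.parent
    free_gib = shutil.disk_usage(probe).free / 1024**3
    return free_gib >= required_gib


# Example check against the current directory with a zero-size requirement.
print(has_free_space(Path("."), 0))
```

A caller could run this check on the intended target directory and skip or reduce the download when it returns False.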
Clone fails with SSL error#
Configure Git to rewrite git:// and SSH-style GitHub URLs to HTTPS:
git config --global url."https://".insteadOf "git://"
git config --global url."https://github.com/".insteadOf "git@github.com:"