report_metrics_by_site

uniharmony.metrics.report_metrics_by_site(y_true: ndarray[tuple[Any, ...], dtype[_ScalarT]], y_pred: ndarray[tuple[Any, ...], dtype[_ScalarT]], sites: ndarray[tuple[Any, ...], dtype[_ScalarT]], metrics: Callable | list[Callable], metric_kwargs: dict[str, Any] | list[dict[str, Any]] | None = None, overall_performance: bool = True, skip_empty_sites: bool = True) → dict[str, dict[str | int, float]]

Compute one or more metrics stratified by site.

Accepts either a single metric function or a sequence of metrics. Each metric can receive its own set of keyword arguments via metric_kwargs. If y_pred contains continuous scores but a metric requires discrete predictions, the scores are automatically binarized using the threshold keyword argument for that metric (default: 0.5).
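The automatic binarization described above can be pictured with a minimal sketch. It is not uniharmony's code: the helper name `binarize_scores` is hypothetical, it works on plain Python lists, and whether a score exactly equal to the threshold becomes 1 (`>=`) or 0 (`>`) is an assumption here.

```python
def binarize_scores(y_pred, threshold=0.5):
    """Map continuous scores to 0/1 labels (hypothetical helper).

    Assumption: a score equal to the threshold becomes 1 (``>=``);
    uniharmony may use a strict ``>`` instead.
    """
    return [1 if score >= threshold else 0 for score in y_pred]


y_scores = [0.1, 0.9, 0.2, 0.4, 0.3, 0.8]
print(binarize_scores(y_scores))  # [0, 1, 0, 0, 0, 1]
print(binarize_scores(y_scores, threshold=0.25))  # [0, 1, 0, 1, 1, 1]
```

Lowering the threshold turns more scores into positive predictions, so a per-metric threshold can change discrete metrics such as accuracy while leaving score-based metrics such as ROC AUC unaffected.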

Parameters:

y_true (np.ndarray): Ground-truth (correct) target values.
y_pred (np.ndarray): Estimated targets as returned by a classifier, or probability estimates / decision-function outputs.
sites (np.ndarray): Site identifiers used for stratification; may be strings or integers.
metrics (callable or list of callable): Metric function, or list of metric functions, to compute (e.g., from sklearn.metrics). Pass a single callable for one metric or a sequence for several.
metric_kwargs (dict, list of dict, or None; optional, default None): Keyword arguments for the metrics. A single dict is passed to every metric. A list is matched by position, so metric_kwargs[i] is passed to metrics[i], and it must have the same length as metrics. Include threshold (default: 0.5) for metrics that require discrete predictions when y_pred contains continuous scores.
overall_performance (bool, optional, default True): If True, include an "overall" entry for each metric, computed across all sites.
skip_empty_sites (bool, optional, default True): If True, skip sites with no samples.
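The metric_kwargs rules can be read as a normalization step that yields exactly one keyword dictionary per metric. The sketch below illustrates that reading and is not uniharmony's implementation: `normalize_metric_kwargs` is a hypothetical name, and popping `threshold` before the metric is called is an assumption about how that key is kept away from functions (such as sklearn's accuracy_score) that do not accept it.

```python
def normalize_metric_kwargs(metrics, metric_kwargs=None):
    """Return (metrics, per-metric kwargs, per-metric thresholds). Hypothetical helper."""
    # A single callable is treated as a one-element list of metrics.
    if callable(metrics):
        metrics = [metrics]
    metrics = list(metrics)

    if metric_kwargs is None:
        kwargs_list = [{} for _ in metrics]
    elif isinstance(metric_kwargs, dict):
        # One dict is shared by all metrics; copy it so each metric gets its own.
        kwargs_list = [dict(metric_kwargs) for _ in metrics]
    else:
        if len(metric_kwargs) != len(metrics):
            raise ValueError(
                f"metric_kwargs has {len(metric_kwargs)} entries, "
                f"but {len(metrics)} metrics were given"
            )
        kwargs_list = [dict(kw) for kw in metric_kwargs]

    # Assumption: 'threshold' is consumed here and never forwarded to the metric.
    thresholds = [kw.pop("threshold", 0.5) for kw in kwargs_list]
    return metrics, kwargs_list, thresholds
```

For example, two metrics with metric_kwargs={"threshold": 0.3} yield thresholds [0.3, 0.3] and two empty keyword dictionaries, while a list of one dict for two metrics raises ValueError.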

Returns:

dict: Dictionary mapping metric names to site-wise results. Each inner dictionary maps site identifiers (plus "overall" when overall_performance is True) to metric values. When a single metric is passed, the result contains one top-level key (the metric’s __name__).
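The shape of the returned dictionary follows from a simple loop: evaluate the metric once on all samples (the "overall" entry) and once on each site's subset. The sketch below shows that loop for a single metric on plain lists. The names `metrics_by_site` and `accuracy` are hypothetical, site keys are sorted (an assumed ordering), and neither binarization nor skip_empty_sites is modelled.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels (stand-in for sklearn.metrics.accuracy_score)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def metrics_by_site(y_true, y_pred, sites, metric, overall_performance=True, **kwargs):
    """Return {metric name: {"overall"?, site: value}} for one metric (hypothetical)."""
    per_site = {}
    if overall_performance:
        per_site["overall"] = metric(y_true, y_pred, **kwargs)
    for site in sorted(set(sites)):
        # Indices of the samples that belong to this site.
        idx = [i for i, s in enumerate(sites) if s == site]
        per_site[site] = metric(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], **kwargs
        )
    return {metric.__name__: per_site}


y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0, 1]  # the example scores binarized at 0.5
sites = ["A", "A", "B", "B", "A", "B"]
print(metrics_by_site(y_true, y_pred, sites, accuracy))
# {'accuracy': {'overall': 0.8333333333333334, 'A': 1.0, 'B': 0.6666666666666666}}
```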

Raises:

TypeError: If inputs have incorrect types.
ValueError: If metric_kwargs is a list whose length differs from the number of metrics, or if y_true, y_pred and sites have mismatched lengths.
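A minimal sketch of checks that would raise the errors listed above. The function name `validate_inputs` and its messages are hypothetical; the real function may validate more, or differently (for example, the array types themselves).

```python
def validate_inputs(y_true, y_pred, sites, metrics):
    """Raise TypeError/ValueError for malformed inputs (hypothetical checks)."""
    # metrics must be a callable or an iterable of callables;
    # list() itself raises TypeError for non-iterables such as an int.
    candidates = [metrics] if callable(metrics) else list(metrics)
    for metric in candidates:
        if not callable(metric):
            raise TypeError(f"metrics must be callables, got {type(metric).__name__}")

    # y_true, y_pred and sites must describe the same samples.
    lengths = {"y_true": len(y_true), "y_pred": len(y_pred), "sites": len(sites)}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"input arrays have mismatched lengths: {lengths}")
```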

Examples

Single metric:

>>> import numpy as np
>>> from sklearn.metrics import accuracy_score
>>> from uniharmony.metrics import report_metrics_by_site
>>> y_true = np.array([0, 1, 0, 1, 0, 1])
>>> y_scores = np.array([0.1, 0.9, 0.2, 0.4, 0.3, 0.8])
>>> sites = np.array(["A", "A", "B", "B", "A", "B"])
>>> report_metrics_by_site(y_true, y_scores, sites, accuracy_score)
{'accuracy_score': {'overall': 0.833, 'A': 1.0, 'B': 0.667}}

Single metric with custom threshold:

>>> report_metrics_by_site(
...     y_true, y_scores, sites, accuracy_score, metric_kwargs={"threshold": 0.25}
... )
{'accuracy_score': {'overall': 0.833, 'A': 0.667, 'B': 1.0}}

Multiple metrics:

>>> from sklearn.metrics import roc_auc_score, f1_score
>>> report_metrics_by_site(
...     y_true,
...     y_scores,
...     sites,
...     metrics=[roc_auc_score, accuracy_score, f1_score],
...     metric_kwargs=[
...         {},
...         {"threshold": 0.5},
...         {"threshold": 0.5, "average": "macro"},
...     ],
...     overall_performance=True,
... )
{'roc_auc_score': {'overall': 1.0, 'A': 1.0, 'B': 1.0},
 'accuracy_score': {'overall': 0.833, 'A': 1.0, 'B': 0.667},
 'f1_score': {'overall': 0.829, 'A': 1.0, 'B': 0.667}}
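Because the return value is a plain nested dictionary, it is easy to post-process, for example to flag metrics whose values differ strongly between sites. The helper below is a hypothetical illustration, not part of uniharmony, and the input dictionary is an illustrative value shaped like the return value. It reports the spread between the best and worst site for each metric, ignoring the "overall" key.

```python
def site_spread(report):
    """Max minus min of the per-site values for each metric, ignoring 'overall'."""
    spread = {}
    for metric_name, per_site in report.items():
        values = [value for site, value in per_site.items() if site != "overall"]
        spread[metric_name] = max(values) - min(values) if values else 0.0
    return spread


report = {
    "roc_auc_score": {"overall": 1.0, "A": 1.0, "B": 1.0},
    "accuracy_score": {"overall": 0.833, "A": 1.0, "B": 0.667},
}
for name, gap in site_spread(report).items():
    print(f"{name}: {gap:.3f}")
# roc_auc_score: 0.000
# accuracy_score: 0.333
```

A large spread suggests the model performs unevenly across sites, which is the kind of site bias that per-site reporting is meant to surface.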

Related examples

Compute metrics by site

Discover biases in metrics by site
