report_metrics_by_site

uniharmony.metrics.report_metrics_by_site(y_true: ndarray[tuple[Any, ...], dtype[_ScalarT]], y_pred: ndarray[tuple[Any, ...], dtype[_ScalarT]], sites: ndarray[tuple[Any, ...], dtype[_ScalarT]], metrics: Callable | list[Callable], metric_kwargs: dict[str, Any] | list[dict[str, Any]] | None = None, overall_performance: bool = True, skip_empty_sites: bool = True) → dict[str, dict[str | int, float]]

Compute one or more metrics stratified by site.

Accepts either a single metric function or a sequence of metrics. Each metric can receive its own set of keyword arguments via metric_kwargs. If y_pred contains continuous scores but a metric requires discrete predictions, the scores are automatically binarized using the threshold keyword argument for that metric (default: 0.5).
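The thresholding and per-site stratification described above can be pictured with a small NumPy sketch. This is not the library's implementation: the helper name `accuracy_by_site_sketch` is hypothetical, accuracy is computed by hand, and treating scores at or above the threshold as positive is an assumption, since the description does not say how ties at the threshold are handled.

```python
import numpy as np


def accuracy_by_site_sketch(y_true, y_scores, sites, threshold=0.5):
    """Hypothetical helper: accuracy overall and per site after thresholding."""
    y_true = np.asarray(y_true)
    sites = np.asarray(sites)
    # Assumption: scores at or above the threshold count as the positive class.
    y_pred = (np.asarray(y_scores) >= threshold).astype(int)

    results = {"overall": float(np.mean(y_pred == y_true))}
    for site in np.unique(sites):
        mask = sites == site  # boolean mask selecting this site's samples
        results[str(site)] = float(np.mean(y_pred[mask] == y_true[mask]))
    return results


y_true = np.array([0, 1, 0, 1, 0, 1])
y_scores = np.array([0.1, 0.9, 0.2, 0.4, 0.3, 0.8])
sites = np.array(["A", "A", "B", "B", "A", "B"])
result = accuracy_by_site_sketch(y_true, y_scores, sites)
```

With the default threshold of 0.5, the score 0.4 for a positive sample at site "B" becomes a negative prediction, so that site scores lower than site "A".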

Parameters:
y_true : np.ndarray

Ground-truth (correct) target values.

y_pred : np.ndarray

Estimated targets as returned by a classifier, or probability estimates / decision function outputs.

sites : np.ndarray

Site identifiers for stratification. Can be strings or integers.

metrics : callable or list of callable

Metric function or list of metric functions to compute (e.g., from sklearn.metrics). Pass a single callable for one metric, or a sequence for multiple metrics.

metric_kwargs : dict or list of dict or None, optional (default None)

Keyword arguments for each metric. If a single dict, it is passed to all metrics. If a list, metric_kwargs[i] is passed to metrics[i], and the list must have the same length as metrics. Include threshold (default: 0.5) for metrics that require discrete predictions when y_pred contains continuous scores.

overall_performance : bool, optional (default True)

If True, include an "overall" key for each metric computed across all sites.

skip_empty_sites : bool, optional (default True)

If True, skip sites with no samples.

Returns:
dict

Dictionary mapping metric names to site-wise results. Each inner dictionary maps site identifiers to metric values. When a single metric is passed, the result contains one top-level key (the metric’s __name__).

Raises:
TypeError

If inputs have incorrect types.

ValueError

If metric_kwargs length does not match metrics length or if input arrays have mismatched lengths.
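The pairing rules for metrics and metric_kwargs, together with the length check listed under ValueError, can be sketched in plain Python. The helper `pair_metrics_with_kwargs` and the placeholder `dummy_metric` are hypothetical and only mirror the documented behavior; the library's internal validation and error messages may differ.

```python
def pair_metrics_with_kwargs(metrics, metric_kwargs=None):
    """Hypothetical helper: return a list of (metric, kwargs) pairs."""
    # A single callable is treated as a one-element list.
    metric_list = [metrics] if callable(metrics) else list(metrics)

    if metric_kwargs is None:
        kwargs_list = [{} for _ in metric_list]
    elif isinstance(metric_kwargs, dict):
        # One dict is shared by every metric (copied per metric).
        kwargs_list = [dict(metric_kwargs) for _ in metric_list]
    else:
        kwargs_list = [dict(kw) for kw in metric_kwargs]
        if len(kwargs_list) != len(metric_list):
            raise ValueError(
                f"metric_kwargs has {len(kwargs_list)} entries "
                f"but {len(metric_list)} metrics were given"
            )
    return list(zip(metric_list, kwargs_list))


def dummy_metric(y_true, y_pred, **kwargs):
    """Placeholder metric used only to exercise the pairing logic."""
    return 0.0


pairs = pair_metrics_with_kwargs([dummy_metric, dummy_metric], {"threshold": 0.3})
```

Here both metrics receive their own copy of `{"threshold": 0.3}`; passing a list of three dicts for two metrics would raise ValueError.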

Examples

Single metric:

>>> import numpy as np
>>> from sklearn.metrics import accuracy_score
>>> from uniharmony.metrics import report_metrics_by_site
>>> y_true = np.array([0, 1, 0, 1, 0, 1])
>>> y_scores = np.array([0.1, 0.9, 0.2, 0.4, 0.3, 0.8])
>>> sites = np.array(["A", "A", "B", "B", "A", "B"])
>>> report_metrics_by_site(y_true, y_scores, sites, accuracy_score)
{'accuracy_score': {'overall': 0.833, 'A': 1.0, 'B': 0.667}}

Single metric with custom threshold (0.35 cleanly separates the two classes in this toy data):

>>> report_metrics_by_site(
...     y_true, y_scores, sites, accuracy_score, metric_kwargs={"threshold": 0.35}
... )
{'accuracy_score': {'overall': 1.0, 'A': 1.0, 'B': 1.0}}

Multiple metrics:

>>> from sklearn.metrics import roc_auc_score, f1_score
>>> report_metrics_by_site(
...     y_true,
...     y_scores,
...     sites,
...     metrics=[roc_auc_score, accuracy_score, f1_score],
...     metric_kwargs=[
...         {},
...         {"threshold": 0.5},
...         {"threshold": 0.5, "average": "macro"},
...     ],
...     overall_performance=True,
... )
{'roc_auc_score': {'overall': 1.0, 'A': 1.0, 'B': 1.0},
 'accuracy_score': {'overall': 0.833, 'A': 1.0, 'B': 0.667},
 'f1_score': {'overall': 0.829, 'A': 1.0, 'B': 0.667}}
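Because the result is a plain nested dictionary, it can be post-processed without further library calls. The sketch below works on a hand-written dictionary in the documented shape (its values are illustrative, not library output) and uses a hypothetical helper, `site_gap`, to report the spread between the best and worst site for each metric while ignoring the "overall" entry.

```python
def site_gap(report):
    """Hypothetical helper: best-minus-worst site value for each metric."""
    gaps = {}
    for metric_name, by_site in report.items():
        # Leave out the aggregate entry so only real sites are compared.
        site_values = {k: v for k, v in by_site.items() if k != "overall"}
        worst = min(site_values, key=site_values.get)
        best = max(site_values, key=site_values.get)
        gaps[metric_name] = {
            "worst_site": worst,
            "best_site": best,
            "gap": site_values[best] - site_values[worst],
        }
    return gaps


# Illustrative values only, shaped like the documented return value.
example_report = {
    "accuracy_score": {"overall": 0.8, "A": 0.9, "B": 0.7},
    "f1_score": {"overall": 0.75, "A": 0.75, "B": 0.75},
}
gaps = site_gap(example_report)
```

A large gap for a metric is a hint that performance differs between sites and may be worth a closer look.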

Examples

Compute metrics by site

Discover biases in metrics by site
