
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/02-multisite-data/03-plot_generate_imbalance_multisite_data.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_02-multisite-data_03-plot_generate_imbalance_multisite_data.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_02-multisite-data_03-plot_generate_imbalance_multisite_data.py:


Generate imbalanced multisite data
==================================

This example shows how to generate an imbalanced multisite dataset
using the ``balance_per_site`` parameter of the ``make_multisite_classification`` function.

.. GENERATED FROM PYTHON SOURCE LINES 10-12

Imports
-------

.. GENERATED FROM PYTHON SOURCE LINES 12-24

.. code-block:: Python


    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    from uniharmony import verbosity
    from uniharmony.datasets import make_multisite_classification


    sns.set_theme(style="whitegrid")
    verbosity("warning")








.. GENERATED FROM PYTHON SOURCE LINES 25-28

Data generation
---------------
Let's start by calling the function with its default arguments, which creates a balanced two-site problem.

.. GENERATED FROM PYTHON SOURCE LINES 30-52

.. code-block:: Python


    X, y, sites = make_multisite_classification()
    df = pd.DataFrame({"Class": y, "Site": sites})

    general_balance = len(y[y == 1]) / len(y)
    y_site_0 = y[sites == 0]
    y_site_1 = y[sites == 1]
    site_0_balance = len(y_site_0[y_site_0 == 1]) / len(y_site_0)
    site_1_balance = len(y_site_1[y_site_1 == 1]) / len(y_site_1)

    print(
        "The class distribution is balanced across sites and in general \n"
        f"General balance: {general_balance:.2f} \n"
        f"site 0 balance: {site_0_balance:.2f} \n"
        f"site 1 balance: {site_1_balance:.2f}"
    )

    plt.figure(figsize=[10, 6])
    plt.title("Class and site distribution")
    sns.countplot(df, x="Class", hue="Site")
    plt.grid(axis="y", color="black", alpha=0.5, linestyle="--")




.. image-sg:: /auto_examples/02-multisite-data/images/sphx_glr_03-plot_generate_imbalance_multisite_data_001.png
   :alt: Class and site distribution
   :srcset: /auto_examples/02-multisite-data/images/sphx_glr_03-plot_generate_imbalance_multisite_data_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The class distribution is balanced across sites and in general 
    General balance: 0.50 
    site 0 balance: 0.50 
    site 1 balance: 0.50
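
The balance computation above is written out by hand for each site. As a minimal sketch, the same numbers can be obtained with a small helper that loops over the site labels. The helper ``class_balance`` and the toy arrays are hypothetical and not part of ``uniharmony``; the arrays only stand in for the ``y`` and ``sites`` returned by ``make_multisite_classification``.

.. code-block:: Python

    import numpy as np


    def class_balance(y, sites, positive_label=1):
        """Return the fraction of positive samples overall and per site."""
        y = np.asarray(y)
        sites = np.asarray(sites)
        is_positive = y == positive_label
        overall = float(is_positive.mean())
        # One entry per distinct site label, keyed by a plain Python value
        per_site = {
            s.item(): float(is_positive[sites == s].mean())
            for s in np.unique(sites)
        }
        return overall, per_site


    # Toy labels (hypothetical): four samples per site, half of them class 1
    y_toy = np.array([0, 1, 0, 1, 1, 1, 0, 0])
    sites_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    overall, per_site = class_balance(y_toy, sites_toy)
    print(f"General balance: {overall:.2f}")
    for site, balance in per_site.items():
        print(f"site {site} balance: {balance:.2f}")

Because the helper iterates over ``np.unique(sites)``, it should work unchanged for more than two sites.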




.. GENERATED FROM PYTHON SOURCE LINES 53-54

Let's now create a site imbalance problem. That means that, while the total number of examples per class is balanced overall, the classes are not equally distributed within each site.

.. GENERATED FROM PYTHON SOURCE LINES 56-78

.. code-block:: Python


    X, y, sites = make_multisite_classification(balance_per_site=[0.3, 0.7])
    df = pd.DataFrame({"Class": y, "Site": sites})
    general_balance = len(y[y == 1]) / len(y)
    y_site_0 = y[sites == 0]
    y_site_1 = y[sites == 1]
    site_0_balance = len(y_site_0[y_site_0 == 1]) / len(y_site_0)
    site_1_balance = len(y_site_1[y_site_1 == 1]) / len(y_site_1)

    print(
        "The class distribution is imbalanced across sites but balanced in general \n"
        f"General balance: {general_balance:.2f} \n"
        f"site 0 balance: {site_0_balance:.2f} \n"
        f"site 1 balance: {site_1_balance:.2f}"
    )

    plt.figure(figsize=[10, 6])
    plt.title("Class and site distribution")
    sns.countplot(df, x="Class", hue="Site")
    plt.grid(axis="y", color="black", alpha=0.5, linestyle="--")





.. image-sg:: /auto_examples/02-multisite-data/images/sphx_glr_03-plot_generate_imbalance_multisite_data_002.png
   :alt: Class and site distribution
   :srcset: /auto_examples/02-multisite-data/images/sphx_glr_03-plot_generate_imbalance_multisite_data_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The class distribution is imbalanced across sites but balanced in general 
    General balance: 0.50 
    site 0 balance: 0.30 
    site 1 balance: 0.70
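
The general balance stays at 0.50 here because the overall balance is the site-size-weighted average of the per-site balances. The sketch below illustrates that arithmetic. The function ``overall_balance`` and the site sizes are hypothetical, chosen only for illustration; they are not taken from ``uniharmony``.

.. code-block:: Python

    def overall_balance(site_sizes, balance_per_site):
        """Weighted average of per-site class-1 fractions."""
        total = sum(site_sizes)
        # Expected number of class-1 samples contributed by each site
        positives = sum(n * b for n, b in zip(site_sizes, balance_per_site))
        return positives / total


    # Equal-sized sites: the two imbalances cancel out
    equal_sites = overall_balance([250, 250], [0.3, 0.7])
    # Unequal sites: the larger site dominates the overall balance
    unequal_sites = overall_balance([400, 100], [0.3, 0.7])

    print(f"Equal site sizes: {equal_sites:.2f}")
    print(f"Unequal site sizes: {unequal_sites:.2f}")

With equal site sizes, balances of 0.3 and 0.7 average to 0.5. If the sites differed in size, the overall balance would shift toward the balance of the larger site.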




.. GENERATED FROM PYTHON SOURCE LINES 79-99

Finally, let's create a problem in which the classes are imbalanced overall and every site has the same class imbalance.

.. code-block:: Python


    X, y, sites = make_multisite_classification(balance_per_site=[0.3, 0.3])
    df = pd.DataFrame({"Class": y, "Site": sites})
    general_balance = len(y[y == 1]) / len(y)
    y_site_0 = y[sites == 0]
    y_site_1 = y[sites == 1]
    site_0_balance = len(y_site_0[y_site_0 == 1]) / len(y_site_0)
    site_1_balance = len(y_site_1[y_site_1 == 1]) / len(y_site_1)

    print(
        "The classes are imbalanced in general, but have the same imbalance across sites\n"
        f"General balance: {general_balance:.2f} \n"
        f"site 0 balance: {site_0_balance:.2f} \n"
        f"site 1 balance: {site_1_balance:.2f}"
    )

    plt.figure(figsize=[10, 6])
    plt.title("Class and site distribution")
    sns.countplot(df, x="Class", hue="Site")
    plt.grid(axis="y", color="black", alpha=0.5, linestyle="--")



.. image-sg:: /auto_examples/02-multisite-data/images/sphx_glr_03-plot_generate_imbalance_multisite_data_003.png
   :alt: Class and site distribution
   :srcset: /auto_examples/02-multisite-data/images/sphx_glr_03-plot_generate_imbalance_multisite_data_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-05-18 13:04:34 [warning  ] Not enough samples of class 0 in global dataset. Requested 350, available 330. Consider adjusting balance_per_site or generating more samples.
    2026-05-18 13:04:34 [warning  ] Not enough samples of class 0 in global dataset. Requested 350, available 330. Consider adjusting balance_per_site or generating more samples.
    The classes are imbalanced in general, but have the same imbalance across sites
    General balance: 0.30 
    site 0 balance: 0.30 
    site 1 balance: 0.30
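
The warning in the output reports how many class-0 samples were requested. As a rough sketch, that number can be estimated from the site sizes and the requested balances. The helper ``requested_class_counts`` is hypothetical, and the site size of 250 is an assumption that happens to be consistent with the 350 in the warning; it is not stated by the library output, and the actual internals of ``make_multisite_classification`` may differ.

.. code-block:: Python

    def requested_class_counts(site_sizes, balance_per_site):
        """Estimate how many samples of each class the sites ask for."""
        # Class-1 samples requested by each site, rounded to whole samples
        class_1 = sum(round(n * b) for n, b in zip(site_sizes, balance_per_site))
        # Everything else must come from class 0
        class_0 = sum(site_sizes) - class_1
        return class_0, class_1


    # Assumed: two sites of 250 samples each (not confirmed by the output)
    class_0, class_1 = requested_class_counts([250, 250], [0.3, 0.3])
    print(f"Requested class 0: {class_0}, class 1: {class_1}")

Under that assumption, a balance of 0.3 at both sites asks for more class-0 samples than a roughly balanced pool would contain, which is the situation the warning describes.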





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.432 seconds)


.. _sphx_glr_download_auto_examples_02-multisite-data_03-plot_generate_imbalance_multisite_data.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 03-plot_generate_imbalance_multisite_data.ipynb <03-plot_generate_imbalance_multisite_data.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 03-plot_generate_imbalance_multisite_data.py <03-plot_generate_imbalance_multisite_data.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 03-plot_generate_imbalance_multisite_data.zip <03-plot_generate_imbalance_multisite_data.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
