
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/01-basic-examples/05-plot_multisite_data_characterization.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_01-basic-examples_05-plot_multisite_data_characterization.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_01-basic-examples_05-plot_multisite_data_characterization.py:


Characterize a multisite problem
================================

.. GENERATED FROM PYTHON SOURCE LINES 7-8

Before applying any harmonization technique, the first step is to understand and characterize the data.

.. GENERATED FROM PYTHON SOURCE LINES 10-12

Imports
-------

.. GENERATED FROM PYTHON SOURCE LINES 12-29

.. code-block:: Python


    import matplotlib.pyplot as plt
    import seaborn as sns

    from uniharmony import verbosity
    from uniharmony.datasets import (
        get_multisite_data_statistics,
        make_multisite_classification,
        print_statistics_summary,
    )
    from uniharmony.plot import plot_features_by_site


    sns.set_theme(style="whitegrid")
    verbosity("warning")









.. GENERATED FROM PYTHON SOURCE LINES 30-33

Data generation
---------------
Use the multisite data generator to simulate a dataset with 5 sites, 1000 samples, 10 features, and 3 classes.

.. GENERATED FROM PYTHON SOURCE LINES 33-46

.. code-block:: Python


    print("Generating example data...")
    X, y, sites = make_multisite_classification(
        n_sites=5,
        n_samples=1000,
        n_features=10,
        n_classes=3,
        random_state=42,
    )

    print("\n" + "=" * 60)






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Generating example data...

    ============================================================
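
For intuition about what a site effect can look like, the following minimal sketch simulates one with plain NumPy. It is an illustration only: it assumes an additive shift and a multiplicative scale per site, and it does not claim to reproduce how ``make_multisite_classification`` generates data internally. The names ``simulate_site_effects``, ``shift`` and ``scale`` are hypothetical.

.. code-block:: Python

    import numpy as np


    def simulate_site_effects(n_sites=5, n_per_site=200, n_features=10, seed=42):
        """Return (X, sites) where each site has its own shift and scale."""
        rng = np.random.default_rng(seed)
        blocks, labels = [], []
        for site in range(n_sites):
            # Hypothetical site effect: one additive shift and one scale per feature.
            shift = rng.normal(loc=0.0, scale=1.0, size=n_features)
            scale = rng.uniform(low=0.5, high=2.0, size=n_features)
            clean = rng.standard_normal((n_per_site, n_features))
            blocks.append(clean * scale + shift)
            labels.append(np.full(n_per_site, site))
        return np.vstack(blocks), np.concatenate(labels)


    X_demo, sites_demo = simulate_site_effects()
    print(X_demo.shape, np.unique(sites_demo))

Data built this way has the same underlying signal at every site, so any difference between sites comes from the simulated shift and scale alone.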




.. GENERATED FROM PYTHON SOURCE LINES 47-48

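Next, compute summary statistics for the whole dataset and for each class, site, and feature. The verbosity is raised to ``info`` while the summary is printed and then restored to ``warning``.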

.. GENERATED FROM PYTHON SOURCE LINES 48-65

.. code-block:: Python


    print("Computing statistics...")
    print("=" * 60)

    # Compute statistics
    stats = get_multisite_data_statistics(
        X=X,
        y=y,
        sites=sites,
        feature_names=[f"feat_{i}" for i in range(X.shape[1])],
    )
    verbosity("info")
    # Print summary
    print_statistics_summary(stats)
    verbosity("warning")






.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Computing statistics...
    ============================================================
    2026-05-18 13:03:34 [info     ] ============================================================
    2026-05-18 13:03:34 [info     ] DATASET STATISTICS SUMMARY
    2026-05-18 13:03:34 [info     ] ============================================================
    2026-05-18 13:03:34 [info     ] 
    OVERALL:
    2026-05-18 13:03:34 [info     ]   Samples: 1000
    2026-05-18 13:03:34 [info     ]   Features: 10
    2026-05-18 13:03:34 [info     ]   Sites: 5
    2026-05-18 13:03:34 [info     ]   Classes: 3
    2026-05-18 13:03:34 [info     ] 
    CLASS DISTRIBUTION:
    2026-05-18 13:03:34 [info     ]   class_0: 335 samples (33.5%)
    2026-05-18 13:03:34 [info     ]   class_1: 335 samples (33.5%)
    2026-05-18 13:03:34 [info     ]   class_2: 330 samples (33.0%)
    2026-05-18 13:03:34 [info     ] 
    SITE DISTRIBUTION:
    2026-05-18 13:03:34 [info     ]   site_0: 200 samples (20.0%)
    2026-05-18 13:03:34 [info     ]   site_1: 200 samples (20.0%)
    2026-05-18 13:03:34 [info     ]   site_2: 200 samples (20.0%)
    2026-05-18 13:03:34 [info     ]   site_3: 200 samples (20.0%)
    2026-05-18 13:03:34 [info     ]   site_4: 200 samples (20.0%)
    2026-05-18 13:03:34 [info     ] 
    SITE STATISTICS (summary):
    2026-05-18 13:03:34 [info     ]   site_0:
    2026-05-18 13:03:34 [info     ]     Samples: 200
    2026-05-18 13:03:34 [info     ]     Class distribution: {'class_0': 67, 'class_1': 67, 'class_2': 66}
    2026-05-18 13:03:34 [info     ]   site_1:
    2026-05-18 13:03:34 [info     ]     Samples: 200
    2026-05-18 13:03:34 [info     ]     Class distribution: {'class_0': 67, 'class_1': 67, 'class_2': 66}
    2026-05-18 13:03:34 [info     ]   site_2:
    2026-05-18 13:03:34 [info     ]     Samples: 200
    2026-05-18 13:03:34 [info     ]     Class distribution: {'class_0': 67, 'class_1': 67, 'class_2': 66}
    2026-05-18 13:03:34 [info     ]   site_3:
    2026-05-18 13:03:34 [info     ]     Samples: 200
    2026-05-18 13:03:34 [info     ]     Class distribution: {'class_0': 67, 'class_1': 67, 'class_2': 66}
    2026-05-18 13:03:34 [info     ]   site_4:
    2026-05-18 13:03:34 [info     ]     Samples: 200
    2026-05-18 13:03:34 [info     ]     Class distribution: {'class_0': 67, 'class_1': 67, 'class_2': 66}
    2026-05-18 13:03:34 [info     ] 
    FEATURE STATISTICS (first 5 features):
    2026-05-18 13:03:34 [info     ]   feat_0: mean=0.8137, std=1.6804, MAD=1.2405
    2026-05-18 13:03:34 [info     ]   feat_1: mean=-0.2313, std=1.9461, MAD=1.4023
    2026-05-18 13:03:34 [info     ]   feat_2: mean=1.0819, std=2.0994, MAD=1.4588
    2026-05-18 13:03:34 [info     ]   feat_3: mean=0.8475, std=1.6420, MAD=1.2095
    2026-05-18 13:03:34 [info     ]   feat_4: mean=0.7886, std=1.5675, MAD=1.1310
    2026-05-18 13:03:34 [info     ] 
    CORRELATIONS:
    2026-05-18 13:03:34 [info     ]   Average Inter-Site Correlation: 0.9485
    2026-05-18 13:03:34 [info     ] ============================================================
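
The feature statistics in the summary above are pooled over all sites. To see how far each site deviates from the pool, a per-site breakdown can help. The sketch below uses only NumPy on stand-in data, so it makes no assumption about the structure of the object returned by ``get_multisite_data_statistics``. ``per_site_mean_std`` is a hypothetical helper, not part of the library.

.. code-block:: Python

    import numpy as np


    def per_site_mean_std(X, sites):
        """Map each site label to the (mean, std) of every feature at that site."""
        out = {}
        for site in np.unique(sites):
            rows = X[sites == site]
            # ddof=1 gives the sample standard deviation.
            out[site.item()] = (rows.mean(axis=0), rows.std(axis=0, ddof=1))
        return out


    # Stand-in data: two sites, the second shifted by +3 on every feature.
    rng = np.random.default_rng(0)
    X_demo = np.vstack(
        [rng.standard_normal((200, 3)), rng.standard_normal((200, 3)) + 3.0]
    )
    sites_demo = np.repeat([0, 1], 200)

    summary = per_site_mean_std(X_demo, sites_demo)
    for site, (mean, std) in summary.items():
        print(f"site_{site}: mean={np.round(mean, 2)}, std={np.round(std, 2)}")

Large gaps between the per-site means, relative to the per-site standard deviations, are a first hint that harmonization may be needed.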




.. GENERATED FROM PYTHON SOURCE LINES 66-77

.. code-block:: Python


    # Plot every feature grouped by site, overlaying the individual points
    fig2, ax2 = plot_features_by_site(
        X,
        sites,
        figsize=(14, 7),
        rotation=45,
        show_points=True,
        title="All Features by Site (with individual points)",
    )
    plt.show()



.. image-sg:: /auto_examples/01-basic-examples/images/sphx_glr_05-plot_multisite_data_characterization_001.png
   :alt: All Features by Site (with individual points)
   :srcset: /auto_examples/01-basic-examples/images/sphx_glr_05-plot_multisite_data_characterization_001.png
   :class: sphx-glr-single-img
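
A plot like this is easier to act on with a number next to it. One common choice, used here as an assumption and not something ``plot_features_by_site`` computes, is the fraction of each feature's variance explained by site (eta squared from a one-way ANOVA). Values near 0 suggest little site effect; values near 1 suggest the site dominates. ``site_eta_squared`` is a hypothetical helper, shown on stand-in data.

.. code-block:: Python

    import numpy as np


    def site_eta_squared(X, sites):
        """Between-site sum of squares divided by total sum of squares, per feature."""
        grand_mean = X.mean(axis=0)
        ss_total = ((X - grand_mean) ** 2).sum(axis=0)
        ss_between = np.zeros(X.shape[1])
        for site in np.unique(sites):
            rows = X[sites == site]
            ss_between += len(rows) * (rows.mean(axis=0) - grand_mean) ** 2
        return ss_between / ss_total


    # Stand-in data: feature 0 is shifted by site, feature 1 is not.
    rng = np.random.default_rng(0)
    sites_demo = np.repeat(np.arange(5), 200)
    X_demo = rng.standard_normal((1000, 2))
    X_demo[:, 0] += 2.0 * sites_demo

    eta2 = site_eta_squared(X_demo, sites_demo)
    print(np.round(eta2, 3))

The helper assumes that no feature is constant, since a constant feature has a total sum of squares of zero.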






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 3.870 seconds)


.. _sphx_glr_download_auto_examples_01-basic-examples_05-plot_multisite_data_characterization.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 05-plot_multisite_data_characterization.ipynb <05-plot_multisite_data_characterization.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 05-plot_multisite_data_characterization.py <05-plot_multisite_data_characterization.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 05-plot_multisite_data_characterization.zip <05-plot_multisite_data_characterization.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
