Data Loader Module
==================

The data loader module loads the microscopy file formats commonly used in cellular imaging analysis. It provides a single interface for commercial and open-source formats, detects the format automatically, and degrades gracefully when an optional dependency is missing.

Overview
--------

The ``UniversalDataLoader`` class serves as the primary entry point for all data loading operations in iPA. It automatically detects file formats based on extensions and routes to appropriate loaders, providing a consistent API regardless of the input format.
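
As an illustration of extension-based routing, the sketch below maps a file suffix to a format name. The ``EXTENSION_TO_FORMAT`` table and the ``detect_format`` helper are hypothetical names, not part of the iPA API, and the actual dispatch logic inside ``UniversalDataLoader`` may differ.

.. code-block:: python

    from pathlib import Path

    # Hypothetical lookup table built from the extensions this page documents.
    EXTENSION_TO_FORMAT = {
        ".mrc": "mrc",
        ".tif": "tiff",
        ".tiff": "tiff",
        ".npz": "npz",
        ".lif": "lif",
        ".czi": "czi",
        ".nd2": "nd2",
    }


    def detect_format(file_path):
        """Return a format name for file_path, or raise ValueError if unknown."""
        suffix = Path(file_path).suffix.lower()  # match case-insensitively
        try:
            return EXTENSION_TO_FORMAT[suffix]
        except KeyError:
            raise ValueError(f"Unsupported file extension: {suffix!r}") from None


    print(detect_format("tomogram.MRC"))
    print(detect_format("cells.nd2"))

Lower-casing the suffix means ``tomogram.MRC`` and ``tomogram.mrc`` resolve to the same format, and an unknown extension fails early with a clear error.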

Key Features
~~~~~~~~~~~~

* **Unified Interface**: Single method for loading all supported formats
* **Automatic Format Detection**: Format identified from the file extension
* **Multi-channel Support**: Selective channel loading for multi-dimensional data
* **Optional Normalization**: Built-in intensity normalization
* **Graceful Degradation**: Clear warnings for missing optional dependencies
* **Batch Processing**: Efficient loading of multiple files

Supported File Formats
-----------------------

Core Formats (Always Available)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These formats are supported by default; no packages beyond iPA's standard installation are required:

+-----------+--------------------------------------------------+---------------+
| Extension | Description                                      | Library       |
+===========+==================================================+===============+
| .mrc      | Medical Research Council format                  | mrcfile       |
+-----------+--------------------------------------------------+---------------+
| .tif/.tiff| Tagged Image File Format                         | tifffile      |
+-----------+--------------------------------------------------+---------------+
| .npz      | NumPy array archive (optionally compressed)      | numpy         |
+-----------+--------------------------------------------------+---------------+

Optional Formats (Require Additional Packages)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These formats require installing optional dependencies:

+-----------+------------------------------------------+---------------+
| Extension | Description                              | Package       |
+===========+==========================================+===============+
| .lif      | Leica Image Format                       | readlif       |
+-----------+------------------------------------------+---------------+
| .czi      | Carl Zeiss Image format                  | czifile       |
+-----------+------------------------------------------+---------------+
| .nd2      | Nikon ND2 format                         | nd2reader     |
+-----------+------------------------------------------+---------------+

To install optional dependencies:

.. code-block:: bash

    pip install readlif      # For Leica .lif files
    pip install czifile      # For Zeiss .czi files
    pip install nd2reader    # For Nikon .nd2 files
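
Before loading an optional format, it can be useful to check which of these packages are importable in the current environment. The following is a minimal sketch using only the standard library; the ``OPTIONAL_PACKAGES`` mapping and the ``available_optional_formats`` helper are illustrative names and not part of iPA.

.. code-block:: python

    from importlib.util import find_spec

    # Extension -> import name of the package listed in the table above.
    OPTIONAL_PACKAGES = {
        ".lif": "readlif",
        ".czi": "czifile",
        ".nd2": "nd2reader",
    }


    def available_optional_formats():
        """Map each optional extension to True if its package can be imported."""
        return {
            ext: find_spec(package) is not None
            for ext, package in OPTIONAL_PACKAGES.items()
        }


    for ext, ok in available_optional_formats().items():
        status = "available" if ok else "missing (install the package above)"
        print(f"{ext}: {status}")

``find_spec`` only looks the package up and does not import it, so the check is cheap and has no side effects.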

UniversalDataLoader Class
-------------------------

.. autoclass:: ipa.data_loader.UniversalDataLoader
   :members:
   :undoc-members:
   :show-inheritance:
   :member-order: bysource

Core Methods
~~~~~~~~~~~~

load_data
^^^^^^^^^

.. automethod:: ipa.data_loader.UniversalDataLoader.load_data

Loads microscopy data from a file with automatic format detection.

**Parameters**:

- ``file_path`` (str): Path to the image file
- ``channel`` (int, optional): Channel index for multi-channel data. Default: None (loads all channels)
- ``normalize`` (bool, optional): Whether to normalize intensity to [0, 1]. Default: False
- ``return_metadata`` (bool, optional): Whether to return metadata dictionary. Default: False

**Returns**:

- If ``return_metadata=False``: ``data`` (np.ndarray) - Loaded image data
- If ``return_metadata=True``: Tuple of ``(data, metadata)`` where ``metadata`` is a dict containing:

  - ``format`` (str): Detected file format
  - ``shape`` (tuple): Data shape
  - ``dtype`` (str): Data type
  - ``channels`` (int): Number of channels (if applicable)

**Example**:

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader
    
    # Simple loading
    data = UniversalDataLoader.load_data('sample.mrc')
    print(f"Shape: {data.shape}, Dtype: {data.dtype}")
    
    # Load specific channel with normalization
    data = UniversalDataLoader.load_data(
        'multichannel.tif',
        channel=0,
        normalize=True
    )
    
    # Load with metadata
    data, metadata = UniversalDataLoader.load_data(
        'experiment.czi',
        return_metadata=True
    )
    print(f"Format: {metadata['format']}")
    print(f"Shape: {metadata['shape']}")
    print(f"Channels: {metadata.get('channels', 'N/A')}")
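
One common way to map intensities to [0, 1] is min-max scaling, shown here for a single intensity value :math:`x`. Whether ``normalize=True`` uses exactly this formula is an assumption; consult the ``load_data`` source for the actual behavior.

.. math::

   x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

Here :math:`x_{\min}` and :math:`x_{\max}` are the minimum and maximum intensities in the loaded array. A constant image (:math:`x_{\max} = x_{\min}`) needs special handling to avoid division by zero.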

batch_load
^^^^^^^^^^

.. automethod:: ipa.data_loader.UniversalDataLoader.batch_load

Loads multiple files, applying the same parameters to each.

**Parameters**:

- ``file_list`` (list): List of file paths to load
- ``channel`` (int, optional): Channel index. Default: None
- ``normalize`` (bool, optional): Whether to normalize. Default: False

**Returns**:

- ``results`` (dict): Dictionary mapping filenames to loaded data arrays

  - Key: filename (str)
  - Value: ``np.ndarray``, or ``None`` if loading failed

**Example**:

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader
    
    # Batch load multiple files
    file_list = [
        'exp_01.mrc',
        'exp_02.mrc',
        'exp_03.mrc'
    ]
    
    results = UniversalDataLoader.batch_load(
        file_list,
        channel=0,
        normalize=True
    )
    
    # Process results
    for filename, data in results.items():
        if data is not None:
            print(f"{filename}: {data.shape}")
        else:
            print(f"Failed to load {filename}")
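
The return contract described above (one entry per input file, with ``None`` marking a failed load) can be sketched with a generic wrapper. This is an illustration only: ``load_many``, its ``loader`` argument, and ``fake_loader`` are hypothetical, and the actual error handling inside ``batch_load`` may differ.

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)


    def load_many(file_list, loader):
        """Call loader on each path; store None for any file that fails."""
        results = {}
        for path in file_list:
            try:
                results[path] = loader(path)
            except Exception as exc:
                # Keep going so that one bad file does not abort the batch.
                logger.warning("Failed to load %s: %s", path, exc)
                results[path] = None
        return results


    def fake_loader(path):
        """Stand-in loader for demonstration: rejects anything that is not .mrc."""
        if not path.endswith(".mrc"):
            raise ValueError("unsupported format")
        return [0.0, 1.0]  # placeholder for an image array


    results = load_many(["exp_01.mrc", "notes.txt"], fake_loader)
    failed = [name for name, data in results.items() if data is None]
    print(f"Loaded {len(results) - len(failed)} file(s), failed: {failed}")

In real use, ``loader`` could be ``UniversalDataLoader.load_data``; the stand-in keeps the example runnable without image files.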

Complete Workflow Examples
--------------------------

Basic Data Loading
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader
    
    # Load MRC file (cryo-ET data)
    volume = UniversalDataLoader.load_data('tomogram.mrc')
    print(f"Loaded volume: {volume.shape}")
    
    # Load TIFF with normalization (SIM/WFM data)
    image = UniversalDataLoader.load_data(
        'sim_image.tif',
        normalize=True
    )
    print(f"Normalized image range: [{image.min():.3f}, {image.max():.3f}]")

Multi-channel Data Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader
    
    # Load multi-channel ND2 file
    # Channel 0: Nuclei (DAPI)
    nuclei = UniversalDataLoader.load_data(
        'cells.nd2',
        channel=0,
        normalize=True
    )
    
    # Channel 1: Actin (Phalloidin)
    actin = UniversalDataLoader.load_data(
        'cells.nd2',
        channel=1,
        normalize=True
    )
    
    # Channel 2: ISGs (Insulin)
    isgs = UniversalDataLoader.load_data(
        'cells.nd2',
        channel=2,
        normalize=True
    )
    
    print(f"Nuclei shape: {nuclei.shape}")
    print(f"Actin shape: {actin.shape}")
    print(f"ISGs shape: {isgs.shape}")

Batch Processing Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader
    import os
    from glob import glob
    
    # Find all MRC files in directory
    mrc_files = glob('data/*.mrc')
    print(f"Found {len(mrc_files)} MRC files")
    
    # Batch load all files
    results = UniversalDataLoader.batch_load(
        mrc_files,
        normalize=True
    )
    
    # Process loaded data
    for filepath, data in results.items():
        if data is not None:
            filename = os.path.basename(filepath)
            print(f"{filename}: shape={data.shape}, "
                  f"mean={data.mean():.3f}, std={data.std():.3f}")
        else:
            print(f"Failed to load: {filepath}")
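
When a directory mixes several formats, the file list passed to ``batch_load`` can be assembled by filtering on extension. The sketch below uses only the standard library; ``find_supported_files`` and the ``SUPPORTED_EXTENSIONS`` set are illustrative names, and the set simply mirrors the extensions listed in the format tables on this page.

.. code-block:: python

    import tempfile
    from pathlib import Path

    SUPPORTED_EXTENSIONS = {".mrc", ".tif", ".tiff", ".npz", ".lif", ".czi", ".nd2"}


    def find_supported_files(directory):
        """Return sorted paths (as strings) of files with a supported extension."""
        return sorted(
            str(p)
            for p in Path(directory).iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
        )


    # Demonstrate on a temporary directory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.mrc", "b.TIF", "readme.txt"):
            (Path(tmp) / name).touch()
        found = [Path(p).name for p in find_supported_files(tmp)]
        print(found)

The resulting list of paths can then be passed to ``batch_load`` in place of the ``glob`` result.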

Integration with Partitioning Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example shows how data loading integrates with the partitioning module:

.. code-block:: python

    import numpy as np

    from ipa.data_loader import UniversalDataLoader
    from ipa.processing.partitioning import Partitioning

    # Step 1: Load masks
    pm_mask = UniversalDataLoader.load_data('plasma_membrane.mrc')
    ne_mask = UniversalDataLoader.load_data('nuclear_envelope.mrc')

    print(f"PM mask shape: {pm_mask.shape}")
    print(f"NE mask shape: {ne_mask.shape}")

    # Step 2: Create partitions
    partitioner = Partitioning(root_dir="results/", n_slices=8)
    center, ne_edge, pm_edge = partitioner.extract_ne_pm_edges(pm_mask, ne_mask)

    partition_mask = partitioner.create_nepm_radial_partitions(
        ne_edge, pm_edge,
        shape=pm_mask.shape,
        n_slices=8,
        pm_mask=pm_mask,
        ne_mask=ne_mask
    )

    # Count distinct labels; the -1 discounts the background label.
    # np.unique is used because NumPy arrays have no .unique() method.
    n_partitions = len(np.unique(partition_mask)) - 1
    print(f"Created {n_partitions} partitions")

Logging System
--------------

The ``data_loader`` module also includes a logging system for tracking analysis workflows:

.. toctree::
   :maxdepth: 2

   log