Data Loader Module
==================

The data loader module provides comprehensive support for loading multiple microscopy file formats commonly used in cellular imaging analysis. It offers a unified interface for handling various commercial and open-source imaging formats with automatic format detection and optional dependency management.

Overview
--------

The ``UniversalDataLoader`` class serves as the primary entry point for all data loading operations in iPA. It automatically detects file formats based on extensions and routes to appropriate loaders, providing a consistent API regardless of the input format.

Key Features
~~~~~~~~~~~~

* **Unified Interface**: Single method for loading all supported formats
* **Automatic Format Detection**: Intelligent format identification based on file extension
* **Multi-channel Support**: Selective channel loading for multi-dimensional data
* **Optional Normalization**: Built-in intensity normalization
* **Graceful Degradation**: Clear warnings for missing optional dependencies
* **Batch Processing**: Efficient loading of multiple files

Supported File Formats
----------------------

Core Formats (Always Available)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These formats are supported by default with no additional dependencies:

+-------------+----------------------------------+----------+
| Extension   | Description                      | Library  |
+=============+==================================+==========+
| .mrc        | Medical Research Council format  | mrcfile  |
+-------------+----------------------------------+----------+
| .tif/.tiff  | Tagged Image File Format         | tifffile |
+-------------+----------------------------------+----------+
| .npz        | Compressed NumPy array format    | numpy    |
+-------------+----------------------------------+----------+

Optional Formats (Require Additional Packages)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These formats require
installing optional dependencies:

+-----------+-------------------------+-----------+
| Extension | Description             | Package   |
+===========+=========================+===========+
| .lif      | Leica Image Format      | readlif   |
+-----------+-------------------------+-----------+
| .czi      | Carl Zeiss Image format | czifile   |
+-----------+-------------------------+-----------+
| .nd2      | Nikon ND2 format        | nd2reader |
+-----------+-------------------------+-----------+

To install optional dependencies:

.. code-block:: bash

    pip install readlif      # For Leica .lif files
    pip install czifile      # For Zeiss .czi files
    pip install nd2reader    # For Nikon .nd2 files

UniversalDataLoader Class
-------------------------

.. autoclass:: ipa.data_loader.UniversalDataLoader
   :members:
   :undoc-members:
   :show-inheritance:
   :member-order: bysource

Core Methods
~~~~~~~~~~~~

load_data
^^^^^^^^^

.. automethod:: ipa.data_loader.UniversalDataLoader.load_data

Loads microscopy data from a file with automatic format detection.

**Parameters**:

- ``file_path`` (str): Path to the image file
- ``channel`` (int, optional): Channel index for multi-channel data. Default: None (loads all channels)
- ``normalize`` (bool, optional): Whether to normalize intensity to [0, 1]. Default: False
- ``return_metadata`` (bool, optional): Whether to return a metadata dictionary. Default: False

**Returns**:

- If ``return_metadata=False``: ``data`` (np.ndarray), the loaded image data
- If ``return_metadata=True``: Tuple of ``(data, metadata)`` where ``metadata`` is a dict containing:

  - ``format`` (str): Detected file format
  - ``shape`` (tuple): Data shape
  - ``dtype`` (str): Data type
  - ``channels`` (int): Number of channels (if applicable)

**Example**:

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader

    # Simple loading
    data = UniversalDataLoader.load_data('sample.mrc')
    print(f"Shape: {data.shape}, Dtype: {data.dtype}")

    # Load specific channel with normalization
    data = UniversalDataLoader.load_data(
        'multichannel.tif',
        channel=0,
        normalize=True
    )

    # Load with metadata
    data, metadata = UniversalDataLoader.load_data(
        'experiment.czi',
        return_metadata=True
    )
    print(f"Format: {metadata['format']}")
    print(f"Shape: {metadata['shape']}")
    print(f"Channels: {metadata.get('channels', 'N/A')}")

batch_load
^^^^^^^^^^

.. automethod:: ipa.data_loader.UniversalDataLoader.batch_load

Loads multiple files with the same parameters applied to each.

**Parameters**:

- ``file_list`` (list): List of file paths to load
- ``channel`` (int, optional): Channel index. Default: None
- ``normalize`` (bool, optional): Whether to normalize. Default: False

**Returns**:

- ``results`` (dict): Dictionary mapping filenames to loaded data arrays

  - Key: filename (str)
  - Value: np.ndarray or None (if loading failed)

**Example**:

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader

    # Batch load multiple files
    file_list = [
        'exp_01.mrc',
        'exp_02.mrc',
        'exp_03.mrc'
    ]
    results = UniversalDataLoader.batch_load(
        file_list,
        channel=0,
        normalize=True
    )

    # Process results; a value of None marks a file that failed to load
    for filename, data in results.items():
        if data is not None:
            print(f"{filename}: {data.shape}")
        else:
            print(f"Failed to load {filename}")

Complete Workflow Examples
--------------------------

Basic Data Loading
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader

    # Load MRC file (cryo-ET data)
    volume = UniversalDataLoader.load_data('tomogram.mrc')
    print(f"Loaded volume: {volume.shape}")

    # Load TIFF with normalization (SIM/WFM data)
    image = UniversalDataLoader.load_data(
        'sim_image.tif',
        normalize=True
    )
    print(f"Normalized image range: [{image.min():.3f}, {image.max():.3f}]")

Multi-channel Data Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ipa.data_loader import UniversalDataLoader

    # Load multi-channel ND2 file
    # Channel 0: Nuclei (DAPI)
    nuclei = UniversalDataLoader.load_data(
        'cells.nd2',
        channel=0,
        normalize=True
    )

    # Channel 1: Actin (Phalloidin)
    actin = UniversalDataLoader.load_data(
        'cells.nd2',
        channel=1,
        normalize=True
    )

    # Channel 2: ISGs (Insulin)
    isgs = UniversalDataLoader.load_data(
        'cells.nd2',
        channel=2,
        normalize=True
    )

    print(f"Nuclei shape: {nuclei.shape}")
    print(f"Actin shape: {actin.shape}")
    print(f"ISGs shape: {isgs.shape}")

Batch Processing Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import os
    from glob import glob

    from ipa.data_loader import UniversalDataLoader

    # Find all MRC files in directory
    mrc_files = glob('data/*.mrc')
    print(f"Found {len(mrc_files)} MRC files")

    # Batch load all files
    results = UniversalDataLoader.batch_load(
        mrc_files,
        normalize=True
    )

    # Process loaded data
    for filepath, data in results.items():
        if data is not None:
            filename = os.path.basename(filepath)
            print(f"{filename}: shape={data.shape}, "
                  f"mean={data.mean():.3f}, std={data.std():.3f}")
        else:
            print(f"Failed to load: {filepath}")

Integration with Partitioning Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example shows how data loading integrates with the partitioning module:

.. code-block:: python

    import numpy as np

    from ipa.data_loader import UniversalDataLoader
    from ipa.processing.partitioning import Partitioning

    # Step 1: Load masks
    pm_mask = UniversalDataLoader.load_data('plasma_membrane.mrc')
    ne_mask = UniversalDataLoader.load_data('nuclear_envelope.mrc')
    print(f"PM mask shape: {pm_mask.shape}")
    print(f"NE mask shape: {ne_mask.shape}")

    # Step 2: Create partitions
    partitioner = Partitioning(root_dir="results/", n_slices=8)
    center, ne_edge, pm_edge = partitioner.extract_ne_pm_edges(pm_mask, ne_mask)
    partition_mask = partitioner.create_nepm_radial_partitions(
        ne_edge,
        pm_edge,
        shape=pm_mask.shape,
        n_slices=8,
        pm_mask=pm_mask,
        ne_mask=ne_mask
    )

    # Count distinct labels, minus one for the background label.
    # Assumes partition_mask is array-like; NumPy arrays have no .unique() method.
    print(f"Created {len(np.unique(partition_mask)) - 1} partitions")

Logging System
--------------

The ``data_loader`` module includes a comprehensive logging system for tracking analysis workflows:

.. toctree::
   :maxdepth: 2

   log
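
The module's own logging API is documented on the linked page and is not reproduced here. As a minimal sketch, assuming only Python's standard ``logging`` module, the following shows one way to record which entries of a ``batch_load``-style result loaded and which failed. The function ``summarize_batch`` and the logger name are hypothetical and are not part of iPA; the input follows the documented return shape of ``batch_load`` (filename mapped to an array, or ``None`` on failure).

.. code-block:: python

    import logging

    # Hypothetical logger name, chosen for this illustration only.
    logger = logging.getLogger("ipa_example.loading")


    def summarize_batch(results):
        """Log one line per file and return (loaded, failed) filename lists.

        ``results`` maps filename -> loaded data, or None if loading failed.
        """
        loaded, failed = [], []
        for filename, data in results.items():
            if data is None:
                logger.warning("failed to load %s", filename)
                failed.append(filename)
            else:
                # getattr keeps the sketch usable with non-array stand-ins.
                logger.info("loaded %s with shape %s",
                            filename, getattr(data, "shape", None))
                loaded.append(filename)
        return loaded, failed


    if __name__ == "__main__":
        logging.basicConfig(level=logging.INFO)
        # Stand-in for a real batch_load result; a nested list replaces the array.
        fake_results = {"exp_01.mrc": [[0, 1], [1, 0]], "exp_02.mrc": None}
        print(summarize_batch(fake_results))

In a real workflow the dictionary returned by ``UniversalDataLoader.batch_load`` could be passed to such a helper directly, so that failed files are recorded once instead of being checked at every later step.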