Data Loader Module

The data loader module loads the microscopy file formats commonly used in cellular imaging analysis. It provides a single interface for commercial and open-source imaging formats, detects the format automatically, and warns when an optional dependency is missing.

Overview

The UniversalDataLoader class is the primary entry point for all data loading operations in iPA. It detects the file format from the file extension and routes each file to the appropriate loader, so the API is the same regardless of the input format.
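Extension-based routing of this kind can be pictured with a small lookup table. The following is only a sketch of the general technique, not the iPA implementation: the names `_load_mrc`, `_load_tiff`, `_LOADERS`, and `detect_and_load` are hypothetical, and the stand-in loaders return strings instead of image data.

```python
from pathlib import Path


# Hypothetical stand-in loaders; real ones would call mrcfile, tifffile, etc.
def _load_mrc(path):
    return f"mrc:{path}"


def _load_tiff(path):
    return f"tiff:{path}"


# Map lowercase extensions to loader callables.
_LOADERS = {
    ".mrc": _load_mrc,
    ".tif": _load_tiff,
    ".tiff": _load_tiff,
}


def detect_and_load(filepath):
    """Pick a loader from the file extension and call it."""
    suffix = Path(filepath).suffix.lower()
    try:
        loader = _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix!r}") from None
    return loader(filepath)


print(detect_and_load("scan.TIFF"))
```

Lowercasing the suffix makes the lookup case-insensitive, and an unknown extension fails early with a clear error rather than inside a format-specific reader.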

Key Features

  • Unified Interface: Single method for loading all supported formats

  • Automatic Format Detection: The format is identified from the file extension

  • Multi-channel Support: Selective channel loading for multi-dimensional data

  • Optional Normalization: Built-in intensity normalization

  • Graceful Degradation: Clear warnings for missing optional dependencies

  • Batch Processing: Load multiple files in one call with shared parameters

Supported File Formats

Core Formats (Always Available)

These formats are supported by default with no additional dependencies:

Extension     Description                        Library
.mrc          Medical Research Council format    mrcfile
.tif/.tiff    Tagged Image File Format           tifffile
.npz          Compressed NumPy array format      numpy
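A .npz file is an archive of named arrays, so a loader has to pick one by key. How UniversalDataLoader chooses the key is not stated here; the sketch below only shows the plain NumPy round trip, and the key name "raw" is an arbitrary choice for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

volume = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.npz"
    # Store the array under an explicit key ("raw" is arbitrary here).
    np.savez_compressed(path, raw=volume)

    # np.load returns a lazy, dict-like archive; .files lists the stored keys.
    with np.load(path) as archive:
        keys = archive.files
        restored = archive["raw"]

print(keys, restored.shape)
```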

Optional Formats (Require Additional Packages)

These formats require installing optional dependencies:

Extension     Description                Package
.lif          Leica Image Format         readlif
.czi          Carl Zeiss Image format    czifile
.nd2          Nikon ND2 format           nd2reader

To install optional dependencies:

pip install readlif      # For Leica .lif files
pip install czifile      # For Zeiss .czi files
pip install nd2reader    # For Nikon .nd2 files
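To see which optional readers are available before loading, the packages can be probed without importing them. This is a minimal sketch using the standard library; the helper name `missing_optional_packages` is hypothetical and not part of iPA, and the extension-to-package mapping is copied from the table above.

```python
import importlib.util
import warnings

# Extension -> package name, taken from the optional formats table.
OPTIONAL_PACKAGES = {
    ".lif": "readlif",
    ".czi": "czifile",
    ".nd2": "nd2reader",
}


def missing_optional_packages(packages=OPTIONAL_PACKAGES):
    """Return {extension: package} for packages that are not importable."""
    missing = {}
    for ext, package in packages.items():
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(package) is None:
            missing[ext] = package
    return missing


for ext, package in missing_optional_packages().items():
    warnings.warn(f"{ext} files need '{package}': pip install {package}")
```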

UniversalDataLoader Class

class ipa.data_loader.UniversalDataLoader

Bases: object

Universal data loader for multiple microscopy file formats.

Supports .mrc, .tif/.tiff, .npz, .lif, .czi, and .nd2 files, with automatic format detection and optional dependency handling.

static load_data(filepath: str, channel: int | None = None, normalize: bool = False) → ndarray

Load data from various microscopy file formats.

Parameters:
  • filepath – Path to the data file (supports absolute paths, relative paths, or indexed keys like ‘sxt/784_5/raw’)

  • channel – Channel number for multi-channel data (0-indexed)

  • normalize – Whether to normalize values to [0,1] range

Returns:

Data array

get_last_metadata() → Dict

Get metadata from the last loaded file.

static get_supported_formats() → Dict[str, str]

Get supported file formats.

Returns:

Dictionary mapping file extensions to descriptions
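One use for such a dictionary is to screen a file list before loading. The sketch below uses a hypothetical stand-in dictionary instead of calling get_supported_formats(), and it assumes the keys are lowercase extensions with a leading dot, which is not confirmed here.

```python
from pathlib import Path

# Hypothetical stand-in for the dictionary returned by get_supported_formats().
supported = {
    ".mrc": "Medical Research Council format",
    ".tif": "Tagged Image File Format",
    ".tiff": "Tagged Image File Format",
}


def split_by_support(paths, formats):
    """Split paths into (loadable, skipped) by lowercase file extension."""
    loadable, skipped = [], []
    for p in paths:
        target = loadable if Path(p).suffix.lower() in formats else skipped
        target.append(p)
    return loadable, skipped


ok, skipped = split_by_support(["a.mrc", "b.TIF", "notes.txt"], supported)
print(ok, skipped)
```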

static batch_load(file_list: list, channel: int | None = None, normalize: bool = False) → Dict[str, ndarray]

Load multiple files in batch.

Parameters:
  • file_list – List of file paths to load

  • channel – Channel selection for all files

  • normalize – Whether to normalize all data

Returns:

Dictionary mapping filenames to data arrays

Core Methods

load_data

static UniversalDataLoader.load_data(filepath: str, channel: int | None = None, normalize: bool = False) → ndarray

Loads microscopy data from a file with automatic format detection.

Parameters:

  • filepath (str): Path to the image file

  • channel (int, optional): Channel index for multi-channel data (0-indexed). Default: None (loads all channels)

  • normalize (bool, optional): Whether to normalize intensity to [0, 1]. Default: False

  • return_metadata (bool, optional): Whether to also return a metadata dictionary. Default: False. This parameter does not appear in the method signature shown for load_data; the class reference lists get_last_metadata() for retrieving metadata from the last loaded file.

Returns:

  • If return_metadata=False: data (np.ndarray), the loaded image data

  • If return_metadata=True: tuple of (data, metadata), where metadata is a dict containing:

      - format (str): Detected file format
      - shape (tuple): Data shape
      - dtype (str): Data type
      - channels (int): Number of channels (if applicable)

Example:

from ipa.data_loader import UniversalDataLoader

# Simple loading
data = UniversalDataLoader.load_data('sample.mrc')
print(f"Shape: {data.shape}, Dtype: {data.dtype}")

# Load specific channel with normalization
data = UniversalDataLoader.load_data(
    'multichannel.tif',
    channel=0,
    normalize=True
)

# Load with metadata
data, metadata = UniversalDataLoader.load_data(
    'experiment.czi',
    return_metadata=True
)
print(f"Format: {metadata['format']}")
print(f"Shape: {metadata['shape']}")
print(f"Channels: {metadata.get('channels', 'N/A')}")
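The channel and normalize options can be pictured with plain NumPy. This sketch rests on two assumptions that the text does not confirm: that channels lie along the first axis, and that normalization is linear min-max scaling. The helper names `select_channel` and `normalize_minmax` are hypothetical.

```python
import numpy as np


def select_channel(data, channel=None, channel_axis=0):
    """Return one channel along channel_axis, or the data unchanged if channel is None."""
    if channel is None:
        return data
    return np.take(data, channel, axis=channel_axis)


def normalize_minmax(data):
    """Rescale intensities linearly to [0, 1]; a constant image maps to all zeros."""
    data = np.asarray(data, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)


# Toy stack with shape (channels, y, x); the channel-first layout is an assumption.
stack = np.array([[[0, 50], [100, 200]],
                  [[5, 5], [5, 5]]], dtype=np.uint8)
first = normalize_minmax(select_channel(stack, channel=0))
print(first.min(), first.max())
```

Converting to floating point before scaling avoids integer overflow and truncation on uint8 or uint16 images, and the constant-image branch avoids a division by zero.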

batch_load

static UniversalDataLoader.batch_load(file_list: list, channel: int | None = None, normalize: bool = False) → Dict[str, ndarray]

Loads multiple files, applying the same channel and normalize settings to each.

Parameters:

  • file_list (list): List of file paths to load

  • channel (int, optional): Channel index. Default: None

  • normalize (bool, optional): Whether to normalize. Default: False

Returns:

  • results (dict): Dictionary mapping filenames to loaded data arrays

      - Key: filename (str)
      - Value: np.ndarray, or None if loading failed

Example:

from ipa.data_loader import UniversalDataLoader

# Batch load multiple files
file_list = [
    'exp_01.mrc',
    'exp_02.mrc',
    'exp_03.mrc'
]

results = UniversalDataLoader.batch_load(
    file_list,
    channel=0,
    normalize=True
)

# Process results
for filename, data in results.items():
    if data is not None:
        print(f"{filename}: {data.shape}")
    else:
        print(f"Failed to load {filename}")
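The "None on failure" behaviour described for batch_load can be sketched as a loop that catches errors for each file. This is an illustration of the pattern under stated assumptions, not the iPA source: `batch_load_sketch` and `fake_loader` are hypothetical names, and the fake loader returns a list instead of an array.

```python
def batch_load_sketch(file_list, load_one):
    """Apply load_one to each path; record None when a file fails to load."""
    results = {}
    for path in file_list:
        try:
            results[path] = load_one(path)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            print(f"Warning: could not load {path}: {exc}")
            results[path] = None
    return results


def fake_loader(path):
    # Hypothetical loader used only for this demonstration.
    if path.endswith(".bad"):
        raise IOError("unreadable file")
    return [0, 1, 2]


results = batch_load_sketch(["a.mrc", "b.bad"], fake_loader)
failed = [name for name, data in results.items() if data is None]
print(failed)
```

Returning None for failed entries keeps the result dictionary aligned with the input list, which is why the examples check `data is not None` before using each array.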

Complete Workflow Examples

Basic Data Loading

from ipa.data_loader import UniversalDataLoader

# Load MRC file (cryo-ET data)
volume = UniversalDataLoader.load_data('tomogram.mrc')
print(f"Loaded volume: {volume.shape}")

# Load TIFF with normalization (SIM/WFM data)
image = UniversalDataLoader.load_data(
    'sim_image.tif',
    normalize=True
)
print(f"Normalized image range: [{image.min():.3f}, {image.max():.3f}]")

Multi-channel Data Handling

from ipa.data_loader import UniversalDataLoader

# Load multi-channel ND2 file
# Channel 0: Nuclei (DAPI)
nuclei = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=0,
    normalize=True
)

# Channel 1: Actin (Phalloidin)
actin = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=1,
    normalize=True
)

# Channel 2: ISGs (Insulin)
isgs = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=2,
    normalize=True
)

print(f"Nuclei shape: {nuclei.shape}")
print(f"Actin shape: {actin.shape}")
print(f"ISGs shape: {isgs.shape}")

Batch Processing Pipeline

from ipa.data_loader import UniversalDataLoader
import os
from glob import glob

# Find all MRC files in directory
mrc_files = glob('data/*.mrc')
print(f"Found {len(mrc_files)} MRC files")

# Batch load all files
results = UniversalDataLoader.batch_load(
    mrc_files,
    normalize=True
)

# Process loaded data
for filepath, data in results.items():
    if data is not None:
        filename = os.path.basename(filepath)
        print(f"{filename}: shape={data.shape}, "
              f"mean={data.mean():.3f}, std={data.std():.3f}")
    else:
        print(f"Failed to load: {filepath}")

Integration with Partitioning Workflow

This example shows how data loading integrates with the partitioning module:

import numpy as np

from ipa.data_loader import UniversalDataLoader
from ipa.processing.partitioning import Partitioning

# Step 1: Load masks
pm_mask = UniversalDataLoader.load_data('plasma_membrane.mrc')
ne_mask = UniversalDataLoader.load_data('nuclear_envelope.mrc')

print(f"PM mask shape: {pm_mask.shape}")
print(f"NE mask shape: {ne_mask.shape}")

# Step 2: Create partitions
partitioner = Partitioning(root_dir="results/", n_slices=8)
center, ne_edge, pm_edge = partitioner.extract_ne_pm_edges(pm_mask, ne_mask)

partition_mask = partitioner.create_nepm_radial_partitions(
    ne_edge, pm_edge,
    shape=pm_mask.shape,
    n_slices=8,
    pm_mask=pm_mask,
    ne_mask=ne_mask
)

# Count distinct labels with np.unique (NumPy arrays have no .unique() method),
# assuming partition_mask is array-like; subtract 1 to exclude the background label
n_partitions = len(np.unique(partition_mask)) - 1
print(f"Created {n_partitions} partitions")

Logging System

The data_loader module includes a comprehensive logging system for tracking analysis workflows: