Data Loader Module

The data loader module loads the microscopy file formats commonly used in cellular imaging analysis. It offers a single interface across commercial and open-source imaging formats, detects the format automatically, and handles optional dependencies.

Overview

The UniversalDataLoader class serves as the primary entry point for all data loading operations in iPA. It automatically detects file formats based on extensions and routes to appropriate loaders, providing a consistent API regardless of the input format.
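
As a rough illustration of extension-based routing, the sketch below maps a file suffix to a loader key. The `LOADERS` table and the `detect_format` helper are hypothetical names used for illustration only; the actual routing table inside iPA is not shown in this document and may differ.

```python
from pathlib import Path
from typing import Dict

# Hypothetical registry (assumption): extension -> loader key.
# The extensions are the ones listed in this document.
LOADERS: Dict[str, str] = {
    ".mrc": "mrc",
    ".tif": "tiff",
    ".tiff": "tiff",
    ".npz": "npz",
    ".lif": "lif",
    ".czi": "czi",
    ".nd2": "nd2",
}


def detect_format(filepath: str) -> str:
    """Return the loader key for a path, based only on its extension."""
    suffix = Path(filepath).suffix.lower()  # case-insensitive match
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return LOADERS[suffix]


print(detect_format("tomogram.MRC"))
print(detect_format("data/sim_image.tiff"))
```

Lower-casing the suffix before the lookup means `.MRC` and `.mrc` route to the same loader.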

Key Features

Unified Interface: Single method for loading all supported formats
Automatic Format Detection: Format is identified from the file extension
Multi-channel Support: Selective channel loading for multi-dimensional data
Optional Normalization: Built-in intensity normalization
Graceful Degradation: Clear warnings for missing optional dependencies
Batch Processing: Efficient loading of multiple files

Supported File Formats

Core Formats (Always Available)

These formats are supported by default with no additional dependencies:

Extension	Description	Library
.mrc	Medical Research Council format	mrcfile
.tif/.tiff	Tagged Image File Format	tifffile
.npz	Compressed NumPy array format	numpy

Optional Formats (Require Additional Packages)

These formats require installing optional dependencies:

Extension	Description	Package
.lif	Leica Image Format	readlif
.czi	Carl Zeiss Image format	czifile
.nd2	Nikon ND2 format	nd2reader

To install optional dependencies:

pip install readlif      # For Leica .lif files
pip install czifile      # For Zeiss .czi files
pip install nd2reader    # For Nikon .nd2 files
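
Before loading, it can help to check which optional packages are importable in the current environment. The following is a minimal sketch using only the standard library; `available_optional_formats` is a hypothetical helper, not part of iPA, and the warning text is illustrative rather than the message iPA itself emits.

```python
import importlib.util
import warnings

# Import names of the optional packages listed above, mapped to the
# extension each one serves.
OPTIONAL_PACKAGES = {"readlif": ".lif", "czifile": ".czi", "nd2reader": ".nd2"}


def available_optional_formats(packages=OPTIONAL_PACKAGES):
    """Return the extensions whose backing package can be imported."""
    available = []
    for module_name, extension in packages.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None:
            warnings.warn(
                f"{module_name} is not installed; {extension} files cannot "
                f"be loaded. Install it with: pip install {module_name}"
            )
        else:
            available.append(extension)
    return available


print(available_optional_formats())
```

The result depends on what is installed locally, so the printed list will vary between environments.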

UniversalDataLoader Class

class ipa.data_loader.UniversalDataLoader[source]

Bases: object

Universal data loader for multiple microscopy file formats.

Supports .mrc, .tif/.tiff, .lif, .czi, .nd2 and other formats with automatic format detection and optional dependency handling.

static load_data(filepath: str, channel: int | None = None, normalize: bool = False) → ndarray[source]

Load data from various microscopy file formats.

Parameters:

filepath – Path to the data file (supports absolute paths, relative paths, or indexed keys like ‘sxt/784_5/raw’)
channel – Channel number for multi-channel data (0-indexed)
normalize – Whether to normalize values to [0,1] range

Returns:

Data array

get_last_metadata() → Dict[source]

Get metadata from the last loaded file.

static get_supported_formats() → Dict[str, str][source]

Get supported file formats.

Returns:

Dictionary mapping file extensions to descriptions
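
One way to use such a dictionary is to filter a list of paths before loading. The sketch below uses a stand-in dictionary in place of the real return value; `split_by_support` is a hypothetical helper, and the exact key format returned by `get_supported_formats()` (with or without a leading dot) is an assumption, so the keys are normalized first.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def split_by_support(
    paths: Iterable[str], supported: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split paths into (loadable, skipped) by file extension."""
    # Normalize keys so ".mrc", "mrc" and ".MRC" are treated alike
    known = {ext.lower().lstrip(".") for ext in supported}
    loadable, skipped = [], []
    for path in paths:
        ext = Path(path).suffix.lower().lstrip(".")
        (loadable if ext in known else skipped).append(path)
    return loadable, skipped


# Stand-in for UniversalDataLoader.get_supported_formats() (assumption)
example_formats = {
    ".mrc": "Medical Research Council format",
    ".tif": "Tagged Image File Format",
}

loadable, skipped = split_by_support(
    ["a.mrc", "b.TIF", "notes.txt"], example_formats
)
print(loadable, skipped)
```

In practice the second argument would be the dictionary returned by `UniversalDataLoader.get_supported_formats()`.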

static batch_load(file_list: list, channel: int | None = None, normalize: bool = False) → Dict[str, ndarray][source]

Load multiple files in batch.

Parameters:

file_list – List of file paths to load
channel – Channel selection for all files
normalize – Whether to normalize all data

Returns:

Dictionary mapping filenames to data arrays

Core Methods

load_data

static UniversalDataLoader.load_data(filepath: str, channel: int | None = None, normalize: bool = False) → ndarray[source]

Loads microscopy data from a file with automatic format detection.

Parameters:

filepath (str): Path to the image file
channel (int, optional): Channel index for multi-channel data. Default: None (loads all channels)
normalize (bool, optional): Whether to normalize intensity to [0, 1]. Default: False
return_metadata (bool, optional): Whether to return metadata dictionary. Default: False

Returns:

If return_metadata=False: data (np.ndarray), the loaded image data
If return_metadata=True: Tuple of (data, metadata), where metadata is a dict containing:
    format (str): Detected file format
    shape (tuple): Data shape
    dtype (str): Data type
    channels (int): Number of channels (if applicable)

Example:

from ipa.data_loader import UniversalDataLoader

# Simple loading
data = UniversalDataLoader.load_data('sample.mrc')
print(f"Shape: {data.shape}, Dtype: {data.dtype}")

# Load specific channel with normalization
data = UniversalDataLoader.load_data(
    'multichannel.tif',
    channel=0,
    normalize=True
)

# Load with metadata
data, metadata = UniversalDataLoader.load_data(
    'experiment.czi',
    return_metadata=True
)
print(f"Format: {metadata['format']}")
print(f"Shape: {metadata['shape']}")
print(f"Channels: {metadata.get('channels', 'N/A')}")
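
To make the effect of `normalize=True` concrete, here is a minimal sketch of min-max scaling to [0, 1]. It assumes plain min-max normalization; this document does not state which scheme iPA actually applies (it could, for example, be percentile-based), and `minmax_normalize` is a hypothetical helper name.

```python
import numpy as np


def minmax_normalize(data: np.ndarray) -> np.ndarray:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    data = np.asarray(data, dtype=np.float32)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Constant image: avoid division by zero
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)


image = np.array([[2, 4], [6, 10]], dtype=np.uint16)
normalized = minmax_normalize(image)
print(normalized.min(), normalized.max())
```

Converting to floating point first avoids integer overflow and truncation when the input is an unsigned integer image.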

batch_load

static UniversalDataLoader.batch_load(file_list: list, channel: int | None = None, normalize: bool = False) → Dict[str, ndarray][source]

Loads multiple files efficiently with consistent parameters.

Parameters:

file_list (list): List of file paths to load
channel (int, optional): Channel index. Default: None
normalize (bool, optional): Whether to normalize. Default: False

Returns:

results (dict): Dictionary mapping filenames to loaded data arrays
    Key: filename (str)
    Value: np.ndarray or None (if loading failed)

Example:

from ipa.data_loader import UniversalDataLoader

# Batch load multiple files
file_list = [
    'exp_01.mrc',
    'exp_02.mrc',
    'exp_03.mrc'
]

results = UniversalDataLoader.batch_load(
    file_list,
    channel=0,
    normalize=True
)

# Process results
for filename, data in results.items():
    if data is not None:
        print(f"{filename}: {data.shape}")
    else:
        print(f"Failed to load {filename}")
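
The "None on failure" contract described above can be sketched as a loop that catches per-file errors instead of aborting the whole batch. This is an illustration of the pattern, not iPA's implementation: `batch_load_sketch` and `fake_loader` are hypothetical, and the loader is passed in as a callable so the example runs without any microscopy files.

```python
from typing import Callable, Dict, Iterable, Optional

import numpy as np


def batch_load_sketch(
    file_list: Iterable[str],
    loader: Callable[[str], np.ndarray],
) -> Dict[str, Optional[np.ndarray]]:
    """Load each path with `loader`; store None for files that fail."""
    results: Dict[str, Optional[np.ndarray]] = {}
    for path in file_list:
        try:
            results[path] = loader(path)
        except Exception as exc:  # broad on purpose: one bad file must not stop the batch
            print(f"Failed to load {path}: {exc}")
            results[path] = None
    return results


def fake_loader(path: str) -> np.ndarray:
    """Stand-in loader (assumption): fails for paths containing 'missing'."""
    if "missing" in path:
        raise FileNotFoundError(path)
    return np.zeros((2, 2))


results = batch_load_sketch(["exp_01.mrc", "missing_02.mrc"], fake_loader)
failed = [name for name, data in results.items() if data is None]
print(failed)
```

Keeping failed entries in the dictionary with a None value, rather than dropping them, lets the caller report exactly which files need attention.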

Complete Workflow Examples

Basic Data Loading

from ipa.data_loader import UniversalDataLoader

# Load MRC file (cryo-ET data)
volume = UniversalDataLoader.load_data('tomogram.mrc')
print(f"Loaded volume: {volume.shape}")

# Load TIFF with normalization (SIM/WFM data)
image = UniversalDataLoader.load_data(
    'sim_image.tif',
    normalize=True
)
print(f"Normalized image range: [{image.min():.3f}, {image.max():.3f}]")

Multi-channel Data Handling

from ipa.data_loader import UniversalDataLoader

# Load multi-channel ND2 file
# Channel 0: Nuclei (DAPI)
nuclei = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=0,
    normalize=True
)

# Channel 1: Actin (Phalloidin)
actin = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=1,
    normalize=True
)

# Channel 2: ISGs (Insulin)
isgs = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=2,
    normalize=True
)

print(f"Nuclei shape: {nuclei.shape}")
print(f"Actin shape: {actin.shape}")
print(f"ISGs shape: {isgs.shape}")
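
Once the channels are loaded separately, they can be combined into a single array for downstream analysis. The sketch below uses small synthetic arrays in place of the loaded data; `stack_channels` is a hypothetical helper, and placing the channel axis first is an assumed convention rather than something iPA prescribes.

```python
from typing import Sequence

import numpy as np


def stack_channels(channels: Sequence[np.ndarray]) -> np.ndarray:
    """Stack same-shaped channel arrays along a new leading axis."""
    shapes = {channel.shape for channel in channels}
    if len(shapes) != 1:
        raise ValueError(f"Channel shapes differ: {sorted(shapes)}")
    return np.stack(channels, axis=0)


# Synthetic stand-ins for the nuclei, actin and ISG channels
nuclei = np.zeros((4, 5), dtype=np.float32)
actin = np.ones((4, 5), dtype=np.float32)
isgs = np.full((4, 5), 0.5, dtype=np.float32)

stacked = stack_channels([nuclei, actin, isgs])
print(stacked.shape)
```

Checking the shapes up front gives a clearer error than the one `np.stack` raises when the inputs do not match.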

Batch Processing Pipeline

from ipa.data_loader import UniversalDataLoader
import os
from glob import glob

# Find all MRC files in directory
mrc_files = glob('data/*.mrc')
print(f"Found {len(mrc_files)} MRC files")

# Batch load all files
results = UniversalDataLoader.batch_load(
    mrc_files,
    normalize=True
)

# Process loaded data
for filepath, data in results.items():
    if data is not None:
        filename = os.path.basename(filepath)
        print(f"{filename}: shape={data.shape}, "
              f"mean={data.mean():.3f}, std={data.std():.3f}")
    else:
        print(f"Failed to load: {filepath}")

Integration with Partitioning Workflow

This example shows how data loading integrates with the partitioning module:

import numpy as np

from ipa.data_loader import UniversalDataLoader
from ipa.processing.partitioning import Partitioning

# Step 1: Load masks
pm_mask = UniversalDataLoader.load_data('plasma_membrane.mrc')
ne_mask = UniversalDataLoader.load_data('nuclear_envelope.mrc')

print(f"PM mask shape: {pm_mask.shape}")
print(f"NE mask shape: {ne_mask.shape}")

# Step 2: Create partitions
partitioner = Partitioning(root_dir="results/", n_slices=8)
center, ne_edge, pm_edge = partitioner.extract_ne_pm_edges(pm_mask, ne_mask)

partition_mask = partitioner.create_nepm_radial_partitions(
    ne_edge, pm_edge,
    shape=pm_mask.shape,
    n_slices=8,
    pm_mask=pm_mask,
    ne_mask=ne_mask
)

# NumPy arrays have no .unique() method, so count labels with np.unique;
# subtract 1 to exclude the background label
print(f"Created {len(np.unique(partition_mask)) - 1} partitions")

Logging System

The data_loader module includes a comprehensive logging system for tracking analysis workflows:

Log Module