Data Loader Module
The data loader module provides comprehensive support for loading multiple microscopy file formats commonly used in cellular imaging analysis. It offers a unified interface for handling various commercial and open-source imaging formats with automatic format detection and optional dependency management.
Overview
The UniversalDataLoader class serves as the primary entry point for all data loading operations in iPA. It automatically detects file formats based on extensions and routes to appropriate loaders, providing a consistent API regardless of the input format.
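The extension-based routing described above can be pictured as a lookup table from file suffix to reader function. The following is a minimal sketch under stated assumptions, not iPA's actual implementation: the `READERS` table, the `_read_*` stand-ins, and `dispatch` are hypothetical names introduced only for illustration.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical stand-ins for the real format readers; each one only
# reports which reader would handle the file.
def _read_mrc(path: str) -> str:
    return f"mrc reader <- {path}"

def _read_tiff(path: str) -> str:
    return f"tiff reader <- {path}"

def _read_npz(path: str) -> str:
    return f"npz reader <- {path}"

# Extension -> reader table (illustrative; not iPA's actual registry).
READERS: Dict[str, Callable[[str], str]] = {
    ".mrc": _read_mrc,
    ".tif": _read_tiff,
    ".tiff": _read_tiff,
    ".npz": _read_npz,
}

def dispatch(path: str) -> str:
    """Pick a reader from the lower-cased file extension."""
    ext = Path(path).suffix.lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file format: {ext!r}") from None
    return reader(path)

print(dispatch("data/Tomogram.MRC"))
```

Lower-casing the suffix means `.MRC` and `.mrc` route to the same reader, and an unknown suffix fails early with a clear error instead of reaching a reader that cannot parse it.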
Key Features
- Unified Interface: Single method for loading all supported formats
- Automatic Format Detection: Format identification based on file extension
- Multi-channel Support: Selective channel loading for multi-dimensional data
- Optional Normalization: Built-in intensity normalization
- Graceful Degradation: Clear warnings for missing optional dependencies
- Batch Processing: Loading of multiple files with a single call
Supported File Formats
Core Formats (Always Available)
These formats are supported by default with no additional dependencies:
| Extension | Description | Library |
|---|---|---|
| .mrc | Medical Research Council format | mrcfile |
| .tif/.tiff | Tagged Image File Format | tifffile |
| .npz | Compressed NumPy array format | numpy |
Optional Formats (Require Additional Packages)
These formats require installing optional dependencies:
| Extension | Description | Package |
|---|---|---|
| .lif | Leica Image Format | readlif |
| .czi | Carl Zeiss Image format | czifile |
| .nd2 | Nikon ND2 format | nd2reader |
To install optional dependencies:
pip install readlif # For Leica .lif files
pip install czifile # For Zeiss .czi files
pip install nd2reader # For Nikon .nd2 files
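To see which optional formats are usable in the current environment, one can probe for the packages without importing them. This is a sketch of one possible check using only the standard library; `OPTIONAL_READERS`, `is_available`, and `check_optional_formats` are hypothetical names, and the warning text is illustrative rather than what iPA actually prints.

```python
import importlib.util
import warnings

# Extension -> importable module name, taken from the table above.
OPTIONAL_READERS = {
    ".lif": "readlif",
    ".czi": "czifile",
    ".nd2": "nd2reader",
}

def is_available(module_name: str) -> bool:
    """Return True if the module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None

def check_optional_formats() -> dict:
    """Map each optional extension to whether its package is installed."""
    status = {}
    for ext, module_name in OPTIONAL_READERS.items():
        status[ext] = is_available(module_name)
        if not status[ext]:
            # Warn instead of raising so the core formats stay usable.
            warnings.warn(
                f"{ext} support disabled: install it with 'pip install {module_name}'"
            )
    return status

print(check_optional_formats())
```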
UniversalDataLoader Class
- class ipa.data_loader.UniversalDataLoader
Bases: object
Universal data loader for multiple microscopy file formats.
Supports .mrc, .tif/.tiff, .lif, .czi, .nd2 and other formats with automatic format detection and optional dependency handling.
- static load_data(filepath: str, channel: int | None = None, normalize: bool = False) -> ndarray
Load data from various microscopy file formats.
- Parameters:
filepath – Path to the data file (supports absolute paths, relative paths, or indexed keys like ‘sxt/784_5/raw’)
channel – Channel number for multi-channel data (0-indexed)
normalize – Whether to normalize values to [0,1] range
- Returns:
Data array
- static get_supported_formats() -> Dict[str, str]
Get supported file formats.
- Returns:
Dictionary mapping file extensions to descriptions
- static batch_load(file_list: list, channel: int | None = None, normalize: bool = False) -> Dict[str, ndarray]
Load multiple files in batch.
- Parameters:
file_list – List of file paths to load
channel – Channel selection for all files
normalize – Whether to normalize all data
- Returns:
Dictionary mapping filenames to data arrays
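Because `get_supported_formats()` returns a dictionary keyed by file extension, it can be used to pre-filter a file list before loading. The sketch below assumes nothing about whether the keys carry a leading dot, so it normalizes them; `split_by_support` and `example_formats` are hypothetical, and the stand-in dictionary only mimics the documented return type.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

def split_by_support(
    paths: Iterable[str], formats: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split paths into (supported, unsupported) by lower-cased extension."""
    # Normalize keys so both '.mrc' and 'mrc' style keys are accepted.
    known = {"." + key.lower().lstrip(".") for key in formats}
    supported, unsupported = [], []
    for path in paths:
        target = supported if Path(path).suffix.lower() in known else unsupported
        target.append(path)
    return supported, unsupported

# Stand-in for the dictionary returned by get_supported_formats().
example_formats = {
    ".mrc": "Medical Research Council format",
    ".tif": "Tagged Image File Format",
}
ok, skipped = split_by_support(["a.mrc", "b.TIF", "notes.txt"], example_formats)
print(ok, skipped)
```

In real use, `example_formats` would be replaced by `UniversalDataLoader.get_supported_formats()`.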
Core Methods
load_data
Loads microscopy data from a file with automatic format detection.
Parameters:
- filepath (str): Path to the image file
- channel (int, optional): Channel index for multi-channel data. Default: None (loads all channels)
- normalize (bool, optional): Whether to normalize intensity to [0, 1]. Default: False
- return_metadata (bool, optional): Whether to return a metadata dictionary. Default: False
Returns:
- If return_metadata=False: data (np.ndarray), the loaded image data
- If return_metadata=True: a tuple (data, metadata), where metadata is a dict containing:
  - format (str): Detected file format
  - shape (tuple): Data shape
  - dtype (str): Data type
  - channels (int): Number of channels (if applicable)
Example:
from ipa.data_loader import UniversalDataLoader

# Simple loading
data = UniversalDataLoader.load_data('sample.mrc')
print(f"Shape: {data.shape}, Dtype: {data.dtype}")

# Load specific channel with normalization
data = UniversalDataLoader.load_data(
    'multichannel.tif',
    channel=0,
    normalize=True
)

# Load with metadata
data, metadata = UniversalDataLoader.load_data(
    'experiment.czi',
    return_metadata=True
)
print(f"Format: {metadata['format']}")
print(f"Shape: {metadata['shape']}")
print(f"Channels: {metadata.get('channels', 'N/A')}")
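The documentation does not spell out how `channel` and `normalize` act on the array. As an illustration of what such options typically do, here is a sketch that assumes the channel axis comes first and that normalization is min-max scaling; both are assumptions, and `select_channel` and `normalize_unit_range` are hypothetical helpers, not iPA functions.

```python
import numpy as np

def select_channel(data: np.ndarray, channel=None, channel_axis: int = 0) -> np.ndarray:
    """Return one channel along channel_axis, or the data unchanged if channel is None."""
    if channel is None:
        return data
    return np.take(data, channel, axis=channel_axis)

def normalize_unit_range(data: np.ndarray) -> np.ndarray:
    """Min-max scale to [0, 1]; a constant array maps to all zeros."""
    data = data.astype(np.float32)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Avoid dividing by zero when every value is identical.
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)

stack = np.arange(24, dtype=np.uint16).reshape(2, 3, 4)  # assumed (channel, y, x)
first = normalize_unit_range(select_channel(stack, channel=0))
print(first.shape, first.min(), first.max())
```

Converting to a floating-point type before scaling matters for integer microscopy data, where integer division would otherwise collapse most values to zero.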
batch_load
Loads multiple files efficiently with consistent parameters.
Parameters:
- file_list (list): List of file paths to load
- channel (int, optional): Channel index. Default: None
- normalize (bool, optional): Whether to normalize. Default: False
Returns:
- results (dict): Dictionary mapping filenames to loaded data arrays
  - Key: filename (str)
  - Value: np.ndarray, or None if loading failed
Example:
from ipa.data_loader import UniversalDataLoader

# Batch load multiple files
file_list = [
    'exp_01.mrc',
    'exp_02.mrc',
    'exp_03.mrc'
]
results = UniversalDataLoader.batch_load(
    file_list,
    channel=0,
    normalize=True
)

# Process results
for filename, data in results.items():
    if data is not None:
        print(f"{filename}: {data.shape}")
    else:
        print(f"Failed to load {filename}")
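The "None on failure" contract described above can be sketched as a loop that catches per-file errors so that one bad file does not abort the batch. This is an assumption about the behavior, not the library's source; `batch_load_sketch` and `fake_loader` are hypothetical, and the loader is passed in as a parameter so the example runs without iPA installed.

```python
from typing import Any, Callable, Dict, Iterable, Optional

def batch_load_sketch(
    file_list: Iterable[str],
    load_one: Callable[[str], Any],
) -> Dict[str, Optional[Any]]:
    """Load each file with load_one; record None for files that fail."""
    results: Dict[str, Optional[Any]] = {}
    for path in file_list:
        try:
            results[path] = load_one(path)
        except Exception as exc:  # keep going so one bad file does not abort the batch
            print(f"Warning: could not load {path}: {exc}")
            results[path] = None
    return results

def fake_loader(path: str) -> str:
    """Hypothetical loader used only for this demonstration."""
    if not path.endswith(".mrc"):
        raise ValueError("unsupported extension")
    return f"array from {path}"

results = batch_load_sketch(["exp_01.mrc", "broken.xyz"], fake_loader)
print(results)
```

With iPA available, `load_one` could be something like `functools.partial(UniversalDataLoader.load_data, channel=0, normalize=True)`.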
Complete Workflow Examples
Basic Data Loading
from ipa.data_loader import UniversalDataLoader

# Load MRC file (cryo-ET data)
volume = UniversalDataLoader.load_data('tomogram.mrc')
print(f"Loaded volume: {volume.shape}")

# Load TIFF with normalization (SIM/WFM data)
image = UniversalDataLoader.load_data(
    'sim_image.tif',
    normalize=True
)
print(f"Normalized image range: [{image.min():.3f}, {image.max():.3f}]")
Multi-channel Data Handling
from ipa.data_loader import UniversalDataLoader

# Load multi-channel ND2 file
# Channel 0: Nuclei (DAPI)
nuclei = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=0,
    normalize=True
)

# Channel 1: Actin (Phalloidin)
actin = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=1,
    normalize=True
)

# Channel 2: ISGs (Insulin)
isgs = UniversalDataLoader.load_data(
    'cells.nd2',
    channel=2,
    normalize=True
)

print(f"Nuclei shape: {nuclei.shape}")
print(f"Actin shape: {actin.shape}")
print(f"ISGs shape: {isgs.shape}")
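When the same file is read once per channel, the three calls above can be folded into a loop over a name-to-index mapping. The sketch below takes the loading function as a parameter so it runs on its own; `CHANNELS`, `load_named_channels`, and `fake_load` are hypothetical, and the channel layout is simply the one assumed in the example above.

```python
from typing import Any, Callable, Dict

# Channel layout assumed from the example above; adjust to the actual acquisition.
CHANNELS = {"nuclei": 0, "actin": 1, "isgs": 2}

def load_named_channels(
    filepath: str,
    channels: Dict[str, int],
    load: Callable[..., Any],
) -> Dict[str, Any]:
    """Call load(filepath, channel=i, normalize=True) once per named channel."""
    return {
        name: load(filepath, channel=index, normalize=True)
        for name, index in channels.items()
    }

def fake_load(filepath, channel=None, normalize=False):
    """Hypothetical stand-in that records how it was called."""
    return (filepath, channel, normalize)

loaded = load_named_channels("cells.nd2", CHANNELS, fake_load)
print(loaded)
```

Passing `UniversalDataLoader.load_data` as `load` would yield a dictionary of arrays keyed by channel name.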
Batch Processing Pipeline
from ipa.data_loader import UniversalDataLoader
import os
from glob import glob

# Find all MRC files in directory
mrc_files = glob('data/*.mrc')
print(f"Found {len(mrc_files)} MRC files")

# Batch load all files
results = UniversalDataLoader.batch_load(
    mrc_files,
    normalize=True
)

# Process loaded data
for filepath, data in results.items():
    if data is not None:
        filename = os.path.basename(filepath)
        print(f"{filename}: shape={data.shape}, "
              f"mean={data.mean():.3f}, std={data.std():.3f}")
    else:
        print(f"Failed to load: {filepath}")
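The `glob` call above matches a single extension in one directory. If a dataset mixes formats or uses nested folders, a recursive, case-insensitive search may be more convenient. This is a sketch using only the standard library; `find_images` is a hypothetical helper and its default extension list is an illustrative choice.

```python
from pathlib import Path
from typing import Iterable, List

def find_images(
    root: str, extensions: Iterable[str] = (".mrc", ".tif", ".tiff")
) -> List[str]:
    """Return sorted paths under root whose lower-cased suffix is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

The returned list can be passed directly as the `file_list` argument of `batch_load`.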
Integration with Partitioning Workflow
This example shows how data loading integrates with the partitioning module:
import numpy as np
from ipa.data_loader import UniversalDataLoader
from ipa.processing.partitioning import Partitioning

# Step 1: Load masks
pm_mask = UniversalDataLoader.load_data('plasma_membrane.mrc')
ne_mask = UniversalDataLoader.load_data('nuclear_envelope.mrc')
print(f"PM mask shape: {pm_mask.shape}")
print(f"NE mask shape: {ne_mask.shape}")

# Step 2: Create partitions
partitioner = Partitioning(root_dir="results/", n_slices=8)
center, ne_edge, pm_edge = partitioner.extract_ne_pm_edges(pm_mask, ne_mask)
partition_mask = partitioner.create_nepm_radial_partitions(
    ne_edge, pm_edge,
    shape=pm_mask.shape,
    n_slices=8,
    pm_mask=pm_mask,
    ne_mask=ne_mask
)

# Count distinct labels, excluding one for the background
# (NumPy arrays have no .unique() method, so use np.unique)
print(f"Created {len(np.unique(partition_mask)) - 1} partitions")
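Before handing masks to the partitioning step, it can help to confirm that they are compatible. The checks below are a sketch under stated assumptions: that both masks are NumPy arrays on the same grid, and that label 0 in the partition mask is background. `check_masks` and `count_partitions` are hypothetical helpers, not part of iPA.

```python
import numpy as np

def check_masks(pm_mask: np.ndarray, ne_mask: np.ndarray) -> None:
    """Raise ValueError if the two masks cannot describe the same volume."""
    if pm_mask.shape != ne_mask.shape:
        raise ValueError(f"Shape mismatch: PM {pm_mask.shape} vs NE {ne_mask.shape}")
    if not pm_mask.any() or not ne_mask.any():
        raise ValueError("One of the masks is empty")

def count_partitions(partition_mask: np.ndarray, background: int = 0) -> int:
    """Count distinct labels, ignoring the assumed background label."""
    labels = np.unique(partition_mask)
    return int(np.count_nonzero(labels != background))

# Small synthetic mask used only to exercise the helpers.
pm = np.zeros((4, 4), dtype=np.uint8)
pm[1:3, 1:3] = 1
check_masks(pm, pm)
print(count_partitions(np.array([[0, 1], [2, 2]])))
```

Unlike subtracting one from the number of unique labels, `count_partitions` stays correct when the mask happens to contain no background pixels.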
Logging System
The data_loader module includes a comprehensive logging system for tracking analysis workflows: