Usage Guide#
xarray-ome provides seamless integration between OME-Zarr files and xarray’s DataTree/Dataset structures.
Installation#
pip install xarray-ome
Or with uv:
uv add xarray-ome
Sample Data#
All examples in this documentation use real OME-NGFF sample data from the Image Data Resource (IDR). You can try these examples yourself without downloading any data - they work directly with remote URLs!
Featured Sample Dataset:
URL:
https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarrDimensions: 4D (2 channels × 236 z-slices × 275 y × 271 x)
Size: Multi-resolution pyramid with 3 levels
Source: idr0062A - Drug treatment study
Browse more sample datasets at: OME-NGFF Samples
Basic Usage#
Opening OME-Zarr Files#
xarray-ome provides two ways to open OME-Zarr files:
Using dedicated functions (
open_ome_datatree,open_ome_dataset)Using xarray’s native backend (
xr.open_datatree,xr.open_datasetwithengine="ome-zarr")
Both approaches provide the same functionality - use whichever fits your workflow better.
Using Xarray Backend (Recommended)#
The most natural way to use xarray-ome is through xarray’s native functions:
import xarray as xr
# Open entire multiscale pyramid as DataTree
dt = xr.open_datatree("image.ome.zarr", engine="ome-zarr")
# Open single resolution level as Dataset (works for single or multi-resolution)
ds = xr.open_dataset("image.ome.zarr", engine="ome-zarr")
# Open specific resolution level in multi-resolution pyramid
ds_low = xr.open_dataset("image.ome.zarr", engine="ome-zarr", resolution=2)
# Single-resolution files work automatically with default resolution=0
ds_single = xr.open_dataset("single_scale.ome.zarr", engine="ome-zarr")
# Works with remote URLs too!
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
dt = xr.open_datatree(url, engine="ome-zarr")
Using Dedicated Functions#
Alternatively, use the dedicated functions for more explicit control:
from xarray_ome import open_ome_datatree, open_ome_dataset
# Open from local file
dt = open_ome_datatree("path/to/image.ome.zarr")
# Open from remote URL (using OME-NGFF sample data)
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
dt = open_ome_datatree(url)
# Access different resolution levels
print(dt.children.keys()) # dict_keys(['scale0', 'scale1', 'scale2'])
# Access highest resolution
high_res = dt["scale0"]
# Access lower resolution
low_res = dt["scale2"]
# Open specific resolution level
ds = open_ome_dataset("path/to/image.ome.zarr")
ds_low = open_ome_dataset("path/to/image.ome.zarr", resolution=2)
# Access data and coordinates
print(ds.dims) # Dimensions: (c, z, y, x)
print(ds.coords) # Physical coordinates in micrometers
print(ds.data_vars) # Data variables
Features#
Lazy Loading#
xarray-ome supports lazy loading with Dask, enabling efficient work with large datasets:
import xarray_ome
# Opening is fast - only metadata is read
ds = xarray_ome.open_ome_dataset("large_file.ome.zarr")
# Data is not loaded until needed
subset = ds.isel(z=slice(0, 10), y=slice(0, 100), x=slice(0, 100))
# Actually load the subset into memory
result = subset.compute()
Remote Access#
Read directly from remote URLs without downloading the entire file:
from xarray_ome import open_ome_datatree
# Works with HTTP/HTTPS - using IDR OME-NGFF sample data
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
dt = open_ome_datatree(url)
# Only the chunks you access are downloaded
subset = dt["scale0"].ds.isel(z=0).compute()
# This downloads only ~few MB instead of the full dataset
print(f"Shape: {subset.dims}")
Physical Coordinates#
OME-NGFF coordinate transformations are automatically converted to xarray coordinates:
ds = open_ome_dataset("image.ome.zarr")
# Coordinates represent physical space (e.g., micrometers)
print(ds.coords["x"]) # [0.0, 0.36, 0.72, ..., 97.31] micrometers
print(ds.coords["y"]) # [0.0, 0.36, 0.72, ..., 98.75] micrometers
print(ds.coords["z"]) # [0.0, 0.50, 1.00, ..., 117.5] micrometers
# Access metadata
print(ds.attrs["ome_scale"]) # Scale factors per dimension
print(ds.attrs["ome_translation"]) # Translation offsets
print(ds.attrs["ome_axes_units"]) # Physical units
Metadata Handling#
OME-NGFF metadata is preserved in the xarray attributes for round-tripping:
dt = open_ome_datatree("image.ome.zarr")
# Access full OME-NGFF metadata
metadata = dt.attrs["ome_ngff_metadata"]
print(metadata["axes"]) # Axis definitions
print(metadata["datasets"]) # Dataset paths and transforms
print(metadata["version"]) # OME-NGFF version
Working with DataTree#
The DataTree structure preserves the multiscale pyramid:
dt = open_ome_datatree("image.ome.zarr")
# Iterate through resolution levels
for name, child in dt.children.items():
ds = child.ds
print(f"{name}: {ds.dims}")
# Access as dictionary
scale0 = dt["scale0"].ds
scale1 = dt["scale1"].ds
# Use xarray operations on any level
mean_projection = scale0["image"].mean(dim="z")
Advanced Usage#
Validation#
Enable metadata validation against the OME-NGFF specification:
# Validate metadata when opening
dt = open_ome_datatree("image.ome.zarr", validate=True)
ds = open_ome_dataset("image.ome.zarr", validate=True)
Selecting Specific Regions#
Use xarray’s powerful indexing to work with subsets:
ds = open_ome_dataset("large_image.ome.zarr")
# Select by dimension
channel_0 = ds.sel(c=0)
# Select by coordinate range
region = ds.sel(x=slice(10, 50), y=slice(20, 60))
# Integer indexing
slice_z10 = ds.isel(z=10)
# Combine operations
subset = ds.sel(c=0).isel(z=slice(0, 10)).mean(dim="z")
Working with Dask#
Take advantage of Dask for parallel and out-of-core computation:
import dask
ds = open_ome_dataset("image.ome.zarr")
# Chain operations without loading data
result = (
ds["image"]
.sel(c=0)
.mean(dim="z")
.compute() # Only compute at the end
)
# Configure Dask for parallel processing
with dask.config.set(scheduler="threads", num_workers=4):
result = ds["image"].mean(dim=["y", "x"]).compute()
Limitations#
Current Limitations#
HCS/Plate structures not supported: Only simple multiscale images are currently supported. High Content Screening (HCS) plate/well structures will raise an informative error.
OME-NGFF v0.4+: Tested primarily with OME-NGFF versions 0.4 and 0.5 for writing (reading supports v0.1-v0.5).
Error Handling#
xarray-ome provides helpful error messages for unsupported formats:
# Attempting to open an HCS plate
try:
dt = open_ome_datatree("plate.ome.zarr")
except ValueError as e:
print(e)
# "The OME-Zarr store appears to be an HCS (High Content
# Screening) plate structure, which is not yet supported."
Examples#
Example: Multi-channel Z-stack#
from xarray_ome import open_ome_dataset
import matplotlib.pyplot as plt
# Open a multi-channel z-stack from IDR sample data
# This is a 4D image (c, z, y, x) from idr0062A
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
ds = open_ome_dataset(url)
# Create maximum intensity projection for first channel
mip = ds["image"].sel(c=0).max(dim="z")
# Plot
plt.imshow(mip, cmap="gray")
plt.title("Maximum Intensity Projection - Channel 0")
plt.xlabel("X (µm)")
plt.ylabel("Y (µm)")
plt.show()
Example: Working with Multiple Resolutions#
from xarray_ome import open_ome_datatree
# Open all resolution levels from IDR sample data
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
dt = open_ome_datatree(url)
# Compare resolutions
for name, child in dt.children.items():
ds = child.ds
shape = ds["image"].shape
print(f"{name}: {shape}")
# Output:
# scale0: (2, 236, 275, 271)
# scale1: (2, 118, 137, 135)
# scale2: (2, 59, 68, 67)
# Use lower resolution for quick preview
preview = dt["scale2"].ds["image"].sel(c=0, z=0).compute()
# Use high resolution for detailed analysis
high_res = dt["scale0"].ds["image"].sel(c=0, z=slice(0, 10))
Example: Remote Data Processing#
from xarray_ome import open_ome_dataset
# Open remote OME-Zarr from IDR (no download required!)
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
ds = open_ome_dataset(url)
# Process a subset without downloading the full file
# Only downloads the chunks needed for this computation
result = (
ds["image"]
.sel(c=0) # Select first channel
.isel(z=slice(0, 10)) # Select first 10 z-slices
.mean(dim="z") # Average across z
.compute() # Execute computation
)
# Only the required chunks were downloaded (~few MB instead of full dataset)
print(f"Result shape: {result.shape}") # (275, 271)
print(f"Result type: {type(result.values)}") # numpy array
API Reference#
Main Functions#
- open_ome_datatree(path, validate=False)#
Open an OME-Zarr store as an xarray DataTree.
- Parameters:
path (str or pathlib.Path) – Path or URL to the OME-Zarr store
validate (bool) – Validate metadata against OME-NGFF spec
- Returns:
DataTree with multiscale pyramid
- Return type:
- Raises:
ValueError – If store is not a supported type
- open_ome_dataset(path, resolution=0, validate=False)#
Open a single resolution level as an xarray Dataset.
- Parameters:
path (str or pathlib.Path) – Path or URL to the OME-Zarr store
resolution (int) – Resolution level to open (0 is highest)
validate (bool) – Validate metadata against OME-NGFF spec
- Returns:
Dataset with physical coordinates
- Return type:
- Raises:
ValueError – If store is not a supported type or resolution not found