OME-NGFF to xarray Mapping

This page documents how each component of the OME-NGFF metadata specification is mapped to xarray data structures.

Design Philosophy

Core Principle: Metadata that can be represented in xarray’s native data model (coordinates, dimension names) is stored there, not duplicated in attrs.

  • xarray coordinates: Represent actual data (physical positions, channel labels)

  • xarray dimensions: Represent axis names

  • xarray attrs: Store metadata that has no native xarray representation

This design ensures:

  • Natural xarray workflows (.sel(), .isel(), coordinate-based indexing)

  • No redundancy between coords and attrs

  • Full round-trip fidelity via preserved metadata dict


Metadata Mapping Reference

axes

OME-NGFF Spec: §2.1 Axes

OME-NGFF axes metadata describes the dimensions of the array.

| OME-NGFF Field | xarray Location | Notes |
| --- | --- | --- |
| axes[].name | Dataset.dims | Axis names become dimension names (e.g., ['c', 'z', 'y', 'x']) |
| axes[].type | attrs['ome_axes_types'] | List of types (e.g., ['channel', 'space', 'space', 'space']) |
| axes[].unit | attrs['ome_axes_units'] | Dict mapping axis name to unit (e.g., {'z': 'micrometer'}) |

Example:

import xarray as xr

ds = xr.open_dataset("image.ome.zarr", engine="ome-zarr")

# Axis names → dimensions (ds.sizes maps each dimension name to its length)
print(dict(ds.sizes))  # {'c': 2, 'z': 236, 'y': 275, 'x': 271}

# Axis types → attrs
print(ds.attrs['ome_axes_types'])  # ['channel', 'space', 'space', 'space']

# Axis units → attrs
print(ds.attrs['ome_axes_units'])  # {'z': 'micrometer', 'y': 'micrometer', 'x': 'micrometer'}
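The split described in the table can be sketched in plain Python. This is only an illustration of the mapping, not the library's implementation: `split_axes` is a hypothetical helper, and treating `type` and `unit` as optional keys is an assumption made here.

```python
from typing import Any, Optional


def split_axes(
    axes: list[dict[str, Any]],
) -> tuple[list[str], list[Optional[str]], dict[str, str]]:
    """Split an OME-NGFF ``axes`` list into dims, types, and units.

    Hypothetical helper for illustration only.
    """
    # Axis names become the xarray dimension names, in order.
    dims = [axis["name"] for axis in axes]
    # Types are kept as a list aligned with dims (None if a type is absent).
    types = [axis.get("type") for axis in axes]
    # Units are keyed by axis name; axes without a unit are simply omitted.
    units = {axis["name"]: axis["unit"] for axis in axes if "unit" in axis}
    return dims, types, units


axes = [
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
]
dims, types, units = split_axes(axes)
print(dims)   # ['c', 'z', 'y', 'x']
print(types)  # ['channel', 'space', 'space', 'space']
print(units)  # {'z': 'micrometer', 'y': 'micrometer', 'x': 'micrometer'}
```

Keeping units in a dict rather than a list means the channel axis, which has no unit, does not need a placeholder entry.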

coordinateTransformations

OME-NGFF Spec: §2.3 Coordinate Transformations

Coordinate transformations define the mapping from array indices to physical coordinates.

| OME-NGFF Field | xarray Location | Notes |
| --- | --- | --- |
| scale transformation | Dataset.coords | Converted to coordinate arrays via translation + scale * arange(size) |
| translation transformation | Dataset.coords | Offset applied to coordinate arrays |
| Original values | attrs['ome_scale'], attrs['ome_translation'] | Preserved for efficient round-tripping |

Example:

# OME-NGFF metadata:
# scale = {'z': 0.5, 'y': 0.36, 'x': 0.36}
# translation = {'z': 0.0, 'y': 0.0, 'x': 0.0}

ds = xr.open_dataset("image.ome.zarr", engine="ome-zarr")

# Coordinates derived from transforms
print(ds.coords['z'].values[:3])  # [0.0, 0.5, 1.0] (0 + 0.5 * [0,1,2,...])
print(ds.coords['y'].values[:3])  # [0.0, 0.36, 0.72]

# Original transforms preserved
print(ds.attrs['ome_scale'])  # {'z': 0.5, 'y': 0.36, 'x': 0.36}

Round-trip:

When writing, coords_to_transforms() extracts scale and translation from the coordinate arrays, or reuses the values stored in attrs['ome_scale'] and attrs['ome_translation'] for exact fidelity.
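The forward and inverse conversions can be sketched without any dependencies. The two functions below are hypothetical stand-ins, not the library's transforms_to_coords() and coords_to_transforms(); their signatures, the defaults for missing axes, and the assumption of evenly spaced, non-empty coordinates are all choices made for this illustration.

```python
def transforms_to_coords_sketch(scale, translation, sizes):
    """Build per-axis coordinate values as translation + scale * index."""
    coords = {}
    for dim, size in sizes.items():
        step = scale.get(dim, 1.0)          # assumed default: unit spacing
        offset = translation.get(dim, 0.0)  # assumed default: no offset
        coords[dim] = [offset + step * i for i in range(size)]
    return coords


def coords_to_transforms_sketch(coords):
    """Recover scale and translation from evenly spaced, non-empty coordinates."""
    scale, translation = {}, {}
    for dim, values in coords.items():
        translation[dim] = values[0]
        # A length-1 axis has no spacing to measure; fall back to 1.0 (assumption).
        scale[dim] = values[1] - values[0] if len(values) > 1 else 1.0
    return scale, translation


coords = transforms_to_coords_sketch(
    scale={"z": 0.5, "y": 0.36},
    translation={"z": 0.0, "y": 0.0},
    sizes={"z": 3, "y": 3},
)
print(coords["z"])  # [0.0, 0.5, 1.0]

scale, translation = coords_to_transforms_sketch(coords)
print(scale["z"], translation["z"])  # 0.5 0.0
```

Recovering the scale by subtracting neighbouring coordinates can pick up floating-point error when the translation is non-zero, which is one reason to prefer the stored values when they are available.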


multiscales

OME-NGFF Spec: §2.4 Multiscales

Multiscales metadata describes the image pyramid structure.

| OME-NGFF Field | xarray Location | Notes |
| --- | --- | --- |
| name | attrs['ome_name'] | Image identifier |
| version | attrs['ome_version'] | OME-NGFF spec version |
| type | Not currently mapped | Downscaling method |
| metadata | Not currently mapped | Additional downscaling info |
| datasets[].path | attrs['ome_multiscale_paths'] | List of resolution paths (e.g., ['0', '1', '2']) |
| Number of datasets | attrs['ome_num_resolutions'] | Count of resolution levels |
| coordinateTransformations | Dataset.coords (per dataset) | Applied per resolution level |

Example:

dt = xr.open_datatree("image.ome.zarr", engine="ome-zarr")

# Multiscale info in root attrs
print(dt.attrs['ome_name'])  # 'image'
print(dt.attrs['ome_version'])  # '0.4'
print(dt.attrs['ome_num_resolutions'])  # 3
print(dt.attrs['ome_multiscale_paths'])  # ['0', '1', '2']

# Each resolution as a separate DataTree node
print(list(dt.children.keys()))  # ['scale0', 'scale1', 'scale2']
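As a rough illustration of the table above, the attrs could be derived from a single multiscales entry as follows. `multiscale_attrs` is a hypothetical helper and the input dict is made-up sample data; the library's own extraction may differ.

```python
def multiscale_attrs(multiscale: dict) -> dict:
    """Derive the ome_* multiscale attrs from one ``multiscales`` entry.

    Hypothetical helper for illustration only.
    """
    paths = [dataset["path"] for dataset in multiscale.get("datasets", [])]
    attrs = {
        "ome_multiscale_paths": paths,
        "ome_num_resolutions": len(paths),
    }
    # name and version are copied only when present in the metadata.
    if "name" in multiscale:
        attrs["ome_name"] = multiscale["name"]
    if "version" in multiscale:
        attrs["ome_version"] = multiscale["version"]
    # "type" and "metadata" are deliberately skipped: the table lists them
    # as not currently mapped.
    return attrs


sample = {
    "name": "image",
    "version": "0.4",
    "type": "gaussian",
    "datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}],
}
attrs = multiscale_attrs(sample)
print(attrs["ome_multiscale_paths"])  # ['0', '1', '2']
print(attrs["ome_num_resolutions"])   # 3
```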

omero

OME-NGFF Spec: §2.5 OMERO Metadata (Transitional)

OMERO metadata provides channel information and rendering settings.

| OME-NGFF Field | xarray Location | Notes |
| --- | --- | --- |
| omero.channels[].label | Dataset.coords['c'] | Channel labels as coordinate values (string dtype) |
| omero.channels[].color | attrs['ome_channel_colors'] | List of hex color codes (e.g., ['0000FF', 'FFFF00']) |
| omero.channels[].window | attrs['ome_channel_windows'] | List of rendering window dicts |
| Other OMERO fields | attrs['ome_ngff_metadata']['omero'] | Preserved in full metadata dict |

Example:

ds = xr.open_dataset("image.ome.zarr", engine="ome-zarr")

# Channel labels → coordinates (PRIMARY LOCATION)
print(ds.coords['c'].values)  # ['LaminB1' 'Dapi']
print(ds.coords['c'].dtype)   # <U7  (Unicode string)

# Select by channel name
lamin_data = ds.sel(c='LaminB1')

# Channel colors → attrs
print(ds.attrs['ome_channel_colors'])  # ['0000FF', 'FFFF00']

# Rendering windows → attrs
print(ds.attrs['ome_channel_windows'][0])
# {'min': 0.0, 'max': 65535.0, 'start': 0.0, 'end': 1500.0}

Why channel labels are coordinates:

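Channel labels name the positions along the channel dimension, which is exactly the role of an xarray coordinate. Storing them there enables label-based operations (labels must match the stored strings exactly, including case):

# Coordinate-based selection (natural xarray API)
ds.sel(c='Dapi')

# Coordinate-based filtering
ds.where(ds.c.isin(['Dapi', 'LaminB1']), drop=True)

# Coordinate-based iteration (process() is a placeholder for your own function)
for channel in ds.coords['c'].values:
    process(ds.sel(c=channel))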
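The split between coordinate values and attrs for the omero block can be sketched in plain Python. `split_omero_channels` is a hypothetical helper, not the library's _extract_channel_labels(), and it assumes every channel carries a label.

```python
def split_omero_channels(omero: dict) -> tuple[list[str], dict]:
    """Separate channel labels (for coords) from rendering info (for attrs).

    Hypothetical helper for illustration only; assumes each channel has a label.
    """
    channels = omero.get("channels", [])
    # Labels become the values of the 'c' coordinate.
    labels = [channel["label"] for channel in channels]
    # Colors and windows have no native xarray representation, so they go to attrs.
    attrs = {
        "ome_channel_colors": [channel.get("color") for channel in channels],
        "ome_channel_windows": [channel.get("window") for channel in channels],
    }
    return labels, attrs


omero = {
    "channels": [
        {
            "label": "LaminB1",
            "color": "0000FF",
            "window": {"min": 0.0, "max": 65535.0, "start": 0.0, "end": 1500.0},
        },
        {
            "label": "Dapi",
            "color": "FFFF00",
            "window": {"min": 0.0, "max": 65535.0, "start": 0.0, "end": 1500.0},
        },
    ]
}
labels, channel_attrs = split_omero_channels(omero)
print(labels)                               # ['LaminB1', 'Dapi']
print(channel_attrs["ome_channel_colors"])  # ['0000FF', 'FFFF00']
```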

labels

OME-NGFF Spec: §2.6 Labels

Label images are not yet supported. When implemented, they will likely be stored as separate DataArrays or referenced paths.


plate / well

OME-NGFF Spec: §2.7 Plate | §2.8 Well

HCS (High Content Screening) plate structures are not yet supported. See TODO.md in the project root for implementation notes.


Complete Attribute Reference

Common Attributes (DataTree & Dataset)

These attributes are present in both DataTree root nodes and individual Datasets:

attrs = {
    # Basic metadata
    'ome_name': 'image',                    # Image name
    'ome_version': '0.4',                   # OME-NGFF version

    # Axes information
    'ome_axes_types': ['channel', 'space', ...],  # Axis types
    'ome_axes_units': {'z': 'micrometer', ...},   # Physical units (optional)
    'ome_axes_orientations': {...},               # Anatomical orientations (optional)

    # Multiscale info
    'ome_num_resolutions': 3,                     # Number of pyramid levels
    'ome_multiscale_paths': ['0', '1', '2'],      # Resolution paths

    # Channel metadata (if channels present)
    'ome_channel_colors': ['0000FF', 'FFFF00'],   # Hex colors
    'ome_channel_windows': [{...}, {...}],        # Rendering windows

    # Complete metadata for round-tripping
    'ome_ngff_metadata': {...},                   # Full OME-NGFF metadata dict
}

Dataset-Only Attributes

Datasets also contain coordinate transformation info:

attrs = {
    # Coordinate transforms (for efficient round-trip)
    'ome_scale': {'c': 1.0, 'z': 0.5, ...},       # Scale factors
    'ome_translation': {'c': 0.0, 'z': 0.0, ...}, # Translation offsets

    # Resolution level (only in open_ome_dataset())
    'ome_ngff_resolution': 0,                     # Resolution index
}

Round-Trip Fidelity

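Metadata survives a read, modify, write cycle as long as the Dataset's coords and attrs are carried through:

import xarray as xr

# Read
ds = xr.open_dataset("input.ome.zarr", engine="ome-zarr")

# Modify data. Coords are kept, but depending on the xarray version,
# arithmetic may drop attrs by default, so request keep_attrs explicitly.
with xr.set_options(keep_attrs=True):
    ds_modified = ds * 2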

# Write - metadata reconstructed from coords + attrs
from xarray_ome import write_ome_dataset
write_ome_dataset(ds_modified, "output.ome.zarr")

# Verify
ds2 = xr.open_dataset("output.ome.zarr", engine="ome-zarr")
assert ds2.attrs['ome_ngff_metadata'] == ds.attrs['ome_ngff_metadata']

The full OME-NGFF metadata dict is always preserved in attrs['ome_ngff_metadata'], ensuring that even unknown or future metadata fields survive round-tripping.
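One way unknown fields can survive is to start from a copy of the preserved dict and overwrite only the fields that have an xarray-side representation. The sketch below illustrates that idea; `rebuild_metadata` is hypothetical, and the assumed layout of attrs['ome_ngff_metadata'] (a top-level "multiscales" list) is not confirmed by this page.

```python
import copy


def rebuild_metadata(attrs: dict) -> dict:
    """Rebuild a metadata dict from attrs without losing unknown fields.

    Hypothetical helper; assumes a top-level "multiscales" list in the
    preserved dict.
    """
    # Deep-copy so the caller's attrs are never mutated.
    metadata = copy.deepcopy(attrs.get("ome_ngff_metadata", {}))
    multiscales = metadata.setdefault("multiscales", [])
    if not multiscales:
        multiscales.append({})
    # Overwrite only the fields that are mapped to dedicated attrs.
    if "ome_name" in attrs:
        multiscales[0]["name"] = attrs["ome_name"]
    return metadata


attrs = {
    "ome_name": "renamed",
    "ome_ngff_metadata": {
        "multiscales": [{"name": "image", "version": "0.4"}],
        "futureField": {"anything": 1},  # made-up unknown field
    },
}
rebuilt = rebuild_metadata(attrs)
print(rebuilt["multiscales"][0]["name"])  # renamed
print(rebuilt["futureField"])             # {'anything': 1}
```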


Implementation Details

Conversion Functions

Reading: OME-NGFF → xarray

  • metadata_to_xarray_attrs(): Extracts non-coordinate metadata to attrs

  • transforms_to_coords(): Converts scale/translation to coordinate arrays

  • _extract_channel_labels(): Gets channel labels from omero.channels

Writing: xarray → OME-NGFF

  • xarray_to_metadata(): Reconstructs OME-NGFF metadata from attrs

  • coords_to_transforms(): Extracts scale/translation from coordinates

See xarray_ome/metadata.py and xarray_ome/transforms.py for implementation.


Version Support

Reading: All OME-NGFF versions (v0.1 through v0.5) via ngff-zarr

Writing: v0.4 and v0.5 (current standard versions)

Version information is preserved in attrs['ome_version'].
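Because reading accepts more versions than writing, a caller may want to check attrs['ome_version'] before writing. The guard below is one possible policy, not the library's behavior: `resolve_write_version` and `WRITABLE_VERSIONS` are hypothetical names, and raising on unsupported versions is an assumption.

```python
# Versions this page lists as writable.
WRITABLE_VERSIONS = ("0.4", "0.5")


def resolve_write_version(attrs: dict) -> str:
    """Return the version to write, or raise if it is missing or unsupported.

    Hypothetical helper for illustration only.
    """
    version = attrs.get("ome_version")
    if version is None:
        raise ValueError("attrs has no 'ome_version'; choose a target version explicitly")
    version = str(version)
    if version not in WRITABLE_VERSIONS:
        raise ValueError(
            f"OME-NGFF version {version!r} is read-only here; "
            f"writable versions are {WRITABLE_VERSIONS}"
        )
    return version


print(resolve_write_version({"ome_version": "0.4"}))  # 0.4
```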