Calculating NDVI directly from xarray DataArrays

Calculating NDVI directly from xarray DataArrays transforms traditional raster band math into a labeled, coordinate-aware operation. Instead of managing raw NumPy indices and manual axis alignment, you apply the standard formula (NIR - Red) / (NIR + Red) using declarative xarray operators. The library automatically handles spatial broadcasting, lazy execution via Dask, and metadata propagation. This eliminates silent misalignment bugs, preserves CF-compliant metadata, and scales seamlessly from single-scene processing to multi-temporal continental stacks.

Core Implementation

import xarray as xr
import numpy as np

def calculate_ndvi_xarray(data: xr.DataArray | xr.Dataset) -> xr.DataArray:
    """Compute NDVI from xarray objects with safe division and metadata preservation."""
    # Extract bands regardless of input structure
    if isinstance(data, xr.Dataset):
        red = data["red"]
        nir = data["nir"]
    else:
        # Assumes a 'band' coordinate exists with named values
        red = data.sel(band="red")
        nir = data.sel(band="nir")

    # Prevent integer wrap-around (e.g., uint16 underflow) by casting to float32
    red = red.astype("float32")
    nir = nir.astype("float32")

    # Core NDVI calculation with safe division
    denominator = nir + red
    ndvi = (nir - red) / denominator

    # Mask pixels where denominator is zero (black borders, nodata regions)
    ndvi = ndvi.where(denominator != 0, other=np.nan)
    ndvi = ndvi.clip(-1, 1)

    # Attach CF-compliant metadata
    ndvi.attrs.update({
        "long_name": "Normalized Difference Vegetation Index",
        "standard_name": "normalized_difference_vegetation_index",
        "units": "1",
        "formula": "(NIR - RED) / (NIR + RED)",
        "valid_range": [-1.0, 1.0]
    })
    return ndvi

Step-by-Step Breakdown

1. Structure-Agnostic Band Extraction

The function accepts either an xr.Dataset (where bands are separate variables) or an xr.DataArray (where bands are stacked along a coordinate dimension). Using .sel(band="red") leverages xarray’s label-based indexing, removing the need to hardcode array positions like data[0] or data[1]. This is critical when processing multi-sensor archives where band ordering varies.

2. Safe Type Casting

Raw satellite products (Sentinel-2 L2A, Landsat Collection 2) typically ship as uint16. Performing subtraction on unsigned integers causes wrap-around underflow: 50 - 100 becomes 65486 instead of -50. Casting to float32 before arithmetic prevents this while keeping memory overhead minimal compared to float64. For NDVI’s output range of -1 to 1, float32 provides more than adequate precision.

3. Division-by-Zero Handling

Satellite scenes contain padding, scan gaps, or fill pixels where both NIR and Red bands equal zero. Direct division produces NaN or Inf, which can break downstream aggregations. The .where(denominator != 0, other=np.nan) call explicitly masks these invalid pixels while preserving the original coordinate grid. The subsequent .clip(-1, 1) enforces physical bounds, catching any floating-point drift or calibration artifacts.

4. Metadata Preservation

Attaching CF-compliant attributes ensures interoperability with GIS software, NetCDF exporters, and visualization libraries. The standard_name and valid_range keys are recognized by tools like rioxarray, MetPy, and Panoply, enabling automatic colormap scaling and unit-aware plotting without manual overrides.

Performance & Production Considerations

Dask Chunking Strategy

NDVI computation is memory-light but benefits from spatial locality. When working with large archives, chunk along spatial dimensions to maximize cache locality and minimize inter-worker communication. Optimal chunk sizes for (time, y, x) layouts are typically (1, 1024, 1024). Avoid chunking along the band dimension if you plan to compute multiple indices (EVI, SAVI, NDWI) from the same stack, as it fragments the task graph and increases scheduling overhead. Refer to the official xarray Dask integration guide for chunking diagnostics and memory profiling workflows.

Coordinate Alignment & Lazy Execution

xarray aligns arrays by coordinate labels, not positional order. If your red and NIR bands originate from separate files or different processing pipelines, xarray automatically reindexes them to a common grid before arithmetic. This prevents subtle georeferencing offsets that plague NumPy-based workflows. Because operations are lazy, calling calculate_ndvi_xarray() only builds a computational graph. Actual calculation triggers when you call .compute(), .to_netcdf(), or .rio.to_raster().

Memory Footprint & Data Types

For continental-scale stacks, monitor peak memory during the .compute() phase. If OOM errors occur, reduce spatial chunk sizes or process in temporal batches. Always write outputs to float32 GeoTIFF or Zarr to maintain precision without inflating storage costs. Avoid xr.set_options(enable_cftimeindex=False) in modern xarray (>=2023.0) — CF time indexes are handled automatically.

Pipeline Integration

This pattern integrates cleanly into broader Band Math Operations with Xarray workflows. Because xarray tracks spatial coordinates (x, y, crs) and temporal dimensions natively, you can chain NDVI computation directly after STAC catalog ingestion without manual array slicing or coordinate reordering.

A typical production pipeline looks like this:

  1. Query a STAC API using pystac-client to fetch item URIs.
  2. Open with stackstac or odc-stac to build a lazy xr.DataArray.
  3. Apply calculate_ndvi_xarray() to the stacked cube.
  4. Compute temporal aggregations (.mean(dim="time"), .quantile()).
  5. Export to cloud-optimized GeoTIFF or Zarr.

By keeping operations lazy until the final export step, you avoid materializing intermediate arrays in RAM. This approach scales to petabyte-scale archives when paired with distributed Dask clusters or cloud-native storage backends. For foundational concepts on coordinate systems, raster I/O, and STAC metadata mapping, review the Core Raster Fundamentals & STAC Mapping reference guide.