Aggregation

aggregate_to_shapes(ds, gdf, variable='DNB_BRDF-Corrected_NTL', agg_type='mean', is_valid_pct=False, valid_pct_threshold=None, geo_id_col='geonameid')

High-level helper to aggregate an xarray dataset to vector shapes.

Parameters:

Name Type Description Default
ds Dataset

The xarray Dataset containing the variable to aggregate.

required
gdf GeoDataFrame | DataFrame

The GeoDataFrame containing the vector shapes.

required
variable str

The variable name to aggregate.

'DNB_BRDF-Corrected_NTL'
agg_type Literal['mean', 'median']

Aggregation type ('mean' or 'median').

'mean'
is_valid_pct bool

Whether to calculate the percentage of non-nan pixels.

False
valid_pct_threshold Optional[float]

Minimum share of valid (non-NaN) pixels, given as a fraction from 0 to 1. Aggregated values for shapes below this share are set to np.nan.

None
geo_id_col str

The column in the GeoDataFrame identifying the shapes.

'geonameid'

Returns:

Type Description
Dataset

Dataset containing the aggregated spatial values.

get_agg_per_shape(ds, mask, variable, agg_type='mean', is_valid_pct=False, valid_pct_threshold=None, geo_id_col='geonameid')

Memory-safe aggregation using Dask and Zarr.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing the input variable.

required
mask Dataset

Dataset containing the shape mappings.

required
variable str

Name of the variable to aggregate.

required
agg_type Literal['mean', 'median']

Aggregation type to apply ('mean' or 'median').

'mean'
is_valid_pct bool

Whether to calculate the percentage of non-nan pixels.

False
valid_pct_threshold float | None

Minimum share of valid (non-NaN) pixels, given as a fraction from 0 to 1. Aggregated values for shapes below this share are set to np.nan.

None
geo_id_col str

Column name containing shape IDs.

'geonameid'

Returns:

Type Description
Dataset

Dataset containing the aggregated spatial values and optionally the percentage of valid pixels.
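The interaction between agg_type, is_valid_pct and valid_pct_threshold may be easier to follow on a tiny grid. The sketch below is a pure-Python illustration of the semantics described above, not the library's Dask/Zarr implementation. The function name aggregate_per_shape, the list-of-lists inputs, and the choice to compute the valid share over all pixels inside each shape are assumptions made for the example.

```python
import math
from statistics import mean, median


def aggregate_per_shape(values, mask, agg_type="mean", valid_pct_threshold=None):
    """Aggregate pixel values per shape ID, ignoring NaN pixels (hypothetical helper).

    values and mask are equally shaped 2-D lists. mask holds one shape ID per
    pixel, or None where the pixel belongs to no shape.
    Returns {shape_id: (aggregate, valid_fraction)}.
    """
    reducer = {"mean": mean, "median": median}[agg_type]

    # Collect every pixel value (NaN included) under its shape ID.
    pixels = {}
    for value_row, mask_row in zip(values, mask):
        for value, shape_id in zip(value_row, mask_row):
            if shape_id is not None:
                pixels.setdefault(shape_id, []).append(value)

    result = {}
    for shape_id, vals in pixels.items():
        valid = [v for v in vals if not math.isnan(v)]
        valid_fraction = len(valid) / len(vals)
        aggregate = reducer(valid) if valid else math.nan
        # Assumption: shapes with too few valid pixels are masked out as NaN.
        if valid_pct_threshold is not None and valid_fraction < valid_pct_threshold:
            aggregate = math.nan
        result[shape_id] = (aggregate, valid_fraction)
    return result


nan = math.nan
values = [[1.0, 3.0, 8.0],
          [5.0, nan, nan]]
mask = [[10, 10, 20],
        [10, 20, 20]]

result = aggregate_per_shape(values, mask, valid_pct_threshold=0.5)
```

Here shape 10 has three valid pixels and keeps its mean, while shape 20 has only one valid pixel out of three, so its aggregate is replaced by NaN once the 0.5 threshold is applied.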

get_gdf_mask_for_ds(ds, gdf, geo_id_col='geonameid')

Creates a spatial mask for an xarray Dataset based on a given GeoDataFrame.

Parameters:

Name Type Description Default
ds Dataset

The reference dataset to match the spatial grid.

required
gdf GeoDataFrame | DataFrame

The GeoDataFrame containing the vector shapes.

required
geo_id_col str

The column name in the GeoDataFrame that uniquely identifies each shape.

'geonameid'

Returns:

Type Description
Dataset

A rasterized dataset mask where pixel values correspond to the geometry IDs.
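To make "pixel values correspond to the geometry IDs" concrete, the following sketch rasterizes shapes onto a small grid in plain Python. It is an illustration only: the name rasterize_boxes is hypothetical, shapes are simplified to axis-aligned bounding boxes rather than real polygons, and a pixel is assigned to a shape when its centre falls inside the box. How the library itself rasterizes geometries is not stated here.

```python
def rasterize_boxes(xs, ys, boxes, fill=None):
    """Return a 2-D list of shape IDs (rows follow ys, columns follow xs).

    xs and ys are pixel-centre coordinates. boxes maps a shape ID to
    (xmin, ymin, xmax, ymax). Pixels outside every box get `fill`.
    """
    mask = [[fill for _ in xs] for _ in ys]
    for row, y in enumerate(ys):
        for col, x in enumerate(xs):
            for shape_id, (xmin, ymin, xmax, ymax) in boxes.items():
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    mask[row][col] = shape_id
                    break  # assumption: the first matching shape wins
    return mask


xs = [0.5, 1.5, 2.5]   # pixel centres along x
ys = [1.5, 0.5]        # pixel centres along y, top row first
boxes = {10: (0.0, 0.0, 2.0, 2.0), 20: (2.0, 0.0, 3.0, 1.0)}

mask = rasterize_boxes(xs, ys, boxes)
```

The resulting mask has the same grid as the reference coordinates, with each cell holding the ID of the shape that covers it, or None where no shape applies.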

get_spatial_dims(ds)

Dynamically identify the spatial dimensions (x/lon, y/lat) of a dataset.

Parameters:

Name Type Description Default
ds Union[Dataset, DataArray]

The input xarray Dataset or DataArray.

required

Returns:

Type Description
Tuple[str, str]

A tuple containing the names of the x and y dimensions.
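One plausible way to identify the spatial dimensions is to match dimension names against known candidates. The sketch below works on a plain tuple of names, such as the one returned by an xarray object's dims attribute. The function name guess_spatial_dims and the candidate lists (including "longitude" and "latitude") are assumptions for illustration, and the library's actual lookup may differ.

```python
X_NAMES = ("x", "lon", "longitude")   # assumed candidates for the x dimension
Y_NAMES = ("y", "lat", "latitude")    # assumed candidates for the y dimension


def guess_spatial_dims(dims):
    """Return (x_dim, y_dim) from an iterable of dimension names.

    Matching is case-insensitive, and the original spelling is returned.
    Raises ValueError when either dimension cannot be found.
    """
    lowered = {str(d).lower(): d for d in dims}
    x_dim = next((lowered[n] for n in X_NAMES if n in lowered), None)
    y_dim = next((lowered[n] for n in Y_NAMES if n in lowered), None)
    if x_dim is None or y_dim is None:
        raise ValueError(f"Could not identify spatial dimensions in {tuple(dims)}")
    return x_dim, y_dim


x_dim, y_dim = guess_spatial_dims(("time", "lat", "lon"))
```

Note the return order: the x dimension comes first and the y dimension second, regardless of the order in which they appear on the dataset.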