Skip to content

Pipeline

Orchestrate the sequential execution of NTL transformations.

aggregated_ds property

Returns the cached aggregated datasets.

preprocessed_ds property

Returns the final preprocessed dataset.

__init__(steps)

Initialize the pipeline with a sequence of processing steps.

Parameters:

Name Type Description Default
steps

A list of instantiated processing step objects.

required

aggregate(gdf, geo_id_col='geonameid', is_valid_pct=False, compute=False)

Aggregate cached intermediate datasets over the provided geometries.

Parameters:

Name Type Description Default
gdf GeoDataFrame | DataFrame

GeoDataFrame containing the regions to aggregate over.

required
geo_id_col str

The column name in the GeoDataFrame that uniquely identifies each shape.

'geonameid'
is_valid_pct bool

Whether to calculate the percentage of non-nan pixels. Defaults to False.

False
compute bool

If True, evaluates the Dask computation graph for all aggregated datasets and pulls the results into memory. Defaults to False.

False

Returns:

Type Description
List[Dataset]

A list of datasets, where each dataset contains the spatial aggregation

List[Dataset]

for a specific processing step. The step name is stored in the

List[Dataset]

.attrs['step'] of each returned dataset.

plot(variable='ntl', indexers=None, start_date=None, end_date=None, titles=None, y_max=None, title=None, font_scale=1.0, moving_average=None)

Plots the aggregated time-series data using the visualization module. Must run aggregate() before calling this method.

Parameters:

Name Type Description Default
variable str

The variable name to plot (default "ntl").

'ntl'
indexers dict | None

Optional dictionary of dimension coordinates to select specific data (e.g. {"geonameid": "Region_A"} or {"x": 10.5, "y": 20.1}).

None
start_date str | None

Optional start date string (e.g., '2022-01-01').

None
end_date str | None

Optional end date string (e.g., '2022-12-31').

None
titles list[str] | None

Optional list of titles corresponding to each subplot.

None
y_max float | None

Optional maximum limit for the y-axis.

None
title str | None

Optional string for the overall figure title.

None
font_scale float

The scaling factor for the plot's font sizes.

1.0
moving_average int | None

Optional moving average window size (days).

None

Returns:

Type Description

A matplotlib Figure.

run(ds, catalog=None, cache_intermediates=False)

Execute the pipeline sequentially over the input dataset.

Parameters:

Name Type Description Default
ds Dataset

The primary dataset to process.

required
catalog Optional[dict]

Dictionary containing auxiliary datasets.

None
cache_intermediates bool

If True, caches intermediate datasets for later aggregation.

False

Returns:

Type Description
Dataset

The fully processed dataset containing the execution history.

to_csv(file_path=None)

Convert the aggregated pipeline results into a long-format pandas DataFrame and optionally save it to a CSV file.

Parameters:

Name Type Description Default
file_path Optional[str | Path]

Optional path to save the CSV. If None, just return the DataFrame.

None

Returns:

Type Description
DataFrame

A pandas DataFrame containing the aggregated data across all regions

DataFrame

and preprocessing steps.