Pipeline¶

Orchestrate the sequential execution of NTL transformations.

`aggregated_ds` `property` ¶

Returns the cached aggregated datasets.

Returns the final preprocessed dataset.

Initialize the pipeline with a sequence of processing steps.

Parameters:

Name	Type	Description	Default
`steps`	`Optional[List[Any]]`	A list of instantiated processing step objects. Defaults to None.	`None`

Aggregate cached intermediate datasets over the provided geometries.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame \| DataFrame`	GeoDataFrame containing the regions to aggregate over.	required
`geo_id_col`	`str`	The column name in the GeoDataFrame that uniquely identifies each shape.	`'geonameid'`
`is_valid_pct`	`bool`	Whether to calculate the percentage of non-nan pixels. Defaults to False.	`False`
`compute`	`bool`	If True, evaluates the Dask computation graph for all aggregated datasets and pulls the results into memory. Defaults to False.	`False`

Returns:

Type	Description
`List[Dataset]`	A list of datasets, where each dataset contains the spatial aggregation
`List[Dataset]`	for a specific processing step. The step name is stored in the
`List[Dataset]`	`.attrs['step']` of each returned dataset.

Plots the aggregated time-series data using the visualization module. Must run aggregate() before calling this method.

Parameters:

Name	Type	Description	Default
`variable`	`str`	The variable name to plot (default "ntl").	`'ntl'`
`indexers`	`dict \| None`	Optional dictionary of dimension coordinates to select specific data (e.g. `{"geonameid": "Region_A"}` or `{"x": 10.5, "y": 20.1}`).	`None`
`start_date`	`str \| None`	Optional start date string (e.g., '2022-01-01').	`None`
`end_date`	`str \| None`	Optional end date string (e.g., '2022-12-31').	`None`
`titles`	`list[str] \| None`	Optional list of titles corresponding to each subplot.	`None`
`y_max`	`float \| None`	Optional maximum limit for the y-axis.	`None`
`title`	`str \| None`	Optional string for the overall figure title.	`None`
`font_scale`	`float`	The scaling factor for the plot's font sizes.	`1.0`
`moving_average`	`int \| None`	Optional moving average window size (days).	`None`

Returns:

Type	Description
	A matplotlib Figure.

Execute the pipeline sequentially over the input dataset.

Parameters:

Name	Type	Description	Default
`ds`	`Dataset`	The primary dataset to process.	required
`catalog`	`Optional[dict]`	Dictionary containing auxiliary datasets.	`None`
`cache_intermediates`	`bool`	If True, caches intermediate datasets for later aggregation.	`False`

Returns:

Type	Description
`Dataset`	The fully processed dataset containing the execution history.

Convert the aggregated pipeline results into a long-format pandas DataFrame and optionally save it to a CSV file.

Parameters:

Name	Type	Description	Default
`file_path`	`Optional[str \| Path]`	Optional path to save the CSV. If None, just return the DataFrame.	`None`

Returns:

Type	Description
`DataFrame`	A pandas DataFrame containing the aggregated data across all regions
`DataFrame`	and preprocessing steps.