Pipeline¶
Orchestrate the sequential execution of NTL transformations.
aggregated_ds
property
¶
Returns the cached aggregated datasets.
preprocessed_ds
property
¶
Returns the final preprocessed dataset.
__init__(steps)
¶
Initialize the pipeline with a sequence of processing steps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
steps
|
A list of instantiated processing step objects. |
required |
aggregate(gdf, geo_id_col='geonameid', is_valid_pct=False, compute=False)
¶
Aggregate cached intermediate datasets over the provided geometries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gdf
|
GeoDataFrame | DataFrame
|
GeoDataFrame containing the regions to aggregate over. |
required |
geo_id_col
|
str
|
The column name in the GeoDataFrame that uniquely identifies each shape. |
'geonameid'
|
is_valid_pct
|
bool
|
Whether to calculate the percentage of non-nan pixels. Defaults to False. |
False
|
compute
|
bool
|
If True, evaluates the Dask computation graph for all aggregated datasets and pulls the results into memory. Defaults to False. |
False
|
Returns:
| Type | Description |
|---|---|
List[Dataset]
|
A list of datasets, where each dataset contains the spatial aggregation |
List[Dataset]
|
for a specific processing step. The step name is stored in the |
List[Dataset]
|
|
plot(variable='ntl', indexers=None, start_date=None, end_date=None, titles=None, y_max=None, title=None, font_scale=1.0, moving_average=None)
¶
Plots the aggregated time-series data using the visualization module.
Must run aggregate() before calling this method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
variable
|
str
|
The variable name to plot (default "ntl"). |
'ntl'
|
indexers
|
dict | None
|
Optional dictionary of dimension coordinates to select specific data (e.g. |
None
|
start_date
|
str | None
|
Optional start date string (e.g., '2022-01-01'). |
None
|
end_date
|
str | None
|
Optional end date string (e.g., '2022-12-31'). |
None
|
titles
|
list[str] | None
|
Optional list of titles corresponding to each subplot. |
None
|
y_max
|
float | None
|
Optional maximum limit for the y-axis. |
None
|
title
|
str | None
|
Optional string for the overall figure title. |
None
|
font_scale
|
float
|
The scaling factor for the plot's font sizes. |
1.0
|
moving_average
|
int | None
|
Optional moving average window size (days). |
None
|
Returns:
| Type | Description |
|---|---|
|
A matplotlib Figure. |
run(ds, catalog=None, cache_intermediates=False)
¶
Execute the pipeline sequentially over the input dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ds
|
Dataset
|
The primary dataset to process. |
required |
catalog
|
Optional[dict]
|
Dictionary containing auxiliary datasets. |
None
|
cache_intermediates
|
bool
|
If True, caches intermediate datasets for later aggregation. |
False
|
Returns:
| Type | Description |
|---|---|
Dataset
|
The fully processed dataset containing the execution history. |
to_csv(file_path=None)
¶
Convert the aggregated pipeline results into a long-format pandas DataFrame and optionally save it to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
file_path
|
Optional[str | Path]
|
Optional path to save the CSV. If None, just return the DataFrame. |
None
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
A pandas DataFrame containing the aggregated data across all regions |
DataFrame
|
and preprocessing steps. |