Exporting

The blackmarble-ntl-toolkit uses xarray internally to manage multi-dimensional datasets, but many downstream workflows in data science and economics rely on tabular data. For users who prefer pandas or R for statistical analysis and plotting, the toolkit can convert aggregated results directly into a CSV file or a pandas DataFrame.

Exporting to CSV

After calling .aggregate() on your NTLPipeline, export the results with the .to_csv() method.

import geopandas as gpd
from blackmarble_toolkit.pipeline import NTLPipeline

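# `steps` and `raw_ds` are assumed to have been defined beforehand
pipeline = NTLPipeline(steps)
pipeline.run(ds=raw_ds, cache_intermediates=True) # (1)!

# Load the region geometries and give each one an integer identifier
regions = gpd.read_file("path/to/regions.geojson")
regions['numeric_id'] = range(len(regions))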

pipeline.aggregate(gdf=regions, geo_id_col="numeric_id") # (2)!

pipeline.to_csv("ntl_results.csv") # (3)!

(1) Run the pipeline and optionally cache intermediates.
(2) Aggregate the data over the regions.
(3) Save directly to a CSV file.
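
Reading the exported file back for analysis might look like the sketch below. It assumes the CSV uses the column layout listed under Understanding the Output; the inline sample rows are made up for illustration and stand in for the contents of ntl_results.csv.

```python
import io

import pandas as pd

# Made-up sample standing in for the contents of "ntl_results.csv".
# Column names follow the layout described in this guide; values are illustrative.
sample_csv = io.StringIO(
    "time,numeric_id,step_index,step,ntl\n"
    "2023-01-01,0,0,Raw,12.5\n"
    "2023-01-01,1,0,Raw,3.0\n"
    "2023-01-02,0,0,Raw,13.5\n"
    "2023-01-02,1,0,Raw,4.0\n"
)

# Parse the `time` column as datetimes so it sorts and plots correctly.
df = pd.read_csv(sample_csv, parse_dates=["time"])

# Mean radiance per region across the sampled dates.
mean_ntl = df.groupby("numeric_id")["ntl"].mean()
print(mean_ntl)
```

With a real export, replace sample_csv with the path passed to .to_csv().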

Understanding the Output

The to_csv() method converts the multi-dimensional dataset into a long-format table. It records the order of your preprocessing steps in the step_index column, so you can compare the effect of each filter.

The output will contain the following columns:

time: The date of the observation.
<geo_id_col>: The identifier for the vector shape (whatever you passed to aggregate()).
step_index: The numerical sequence of the preprocessing step (e.g., 0 for Raw, 1 for the first filter).
step: The string name of the preprocessing step applied.
ntl: The aggregated Nighttime Light radiance value.
valid_pct: The percentage of valid pixels for the region (if you enabled is_valid_pct=True during aggregation).
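
To compare steps side by side, the long-format table can be pivoted so that each step becomes its own column. The following is a minimal sketch on made-up data: the step name "ExampleFilter", the numeric_id column, and all values are hypothetical placeholders, not output from the toolkit.

```python
import pandas as pd

# Illustrative long-format rows; step names and values are made up.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"]
        ),
        "numeric_id": [0, 0, 0, 0],
        "step_index": [0, 1, 0, 1],
        "step": ["Raw", "ExampleFilter", "Raw", "ExampleFilter"],
        "ntl": [10.0, 9.0, 12.0, 11.5],
    }
)

# One row per (time, region) and one column per step_index.
wide = df.pivot(index=["time", "numeric_id"], columns="step_index", values="ntl")

# Change introduced by step 1 relative to the raw values (step 0).
delta = wide[1] - wide[0]
print(delta)
```

Pivoting on step_index rather than step keeps the columns in pipeline order.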

Returning a DataFrame

If you don't provide a file_path, the .to_csv() method returns the results as a pandas.DataFrame instead of writing a file. This is useful when you want to pass the data straight to libraries such as Seaborn, Statsmodels, or scikit-learn without touching the disk.

import seaborn as sns
import matplotlib.pyplot as plt

df = pipeline.to_csv() # (1)!

final_step_df = df[df['step'] == 'LinearInterpolationGapFilling'] # (2)!

sns.lineplot(data=final_step_df, x='time', y='ntl', hue='numeric_id') # (3)!
plt.show()

(1) Get the DataFrame without saving it to disk.
(2) Filter for the final processing step.
(3) Plot the time series using seaborn.
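
If the name of the last step is not known in advance, one option is to select it through step_index instead of a hard-coded string. This is a sketch on made-up data, assuming the highest step_index corresponds to the final preprocessing step; the step name "ExampleFilter" is a hypothetical placeholder.

```python
import pandas as pd

# Illustrative rows in the exported layout; names and values are made up.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"]
        ),
        "numeric_id": [0, 0, 0, 0],
        "step_index": [0, 1, 0, 1],
        "step": ["Raw", "ExampleFilter", "Raw", "ExampleFilter"],
        "ntl": [10.0, 9.0, 12.0, 11.5],
    }
)

# Keep only the rows from the step with the highest step_index.
final_idx = df["step_index"].max()
final_step_df = df[df["step_index"] == final_idx]
print(final_step_df)
```

The resulting final_step_df can be passed to sns.lineplot in the same way as in the example above.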