downScaleML: openEO-enabled Downscaling Pipeline for Climate Data

A climate downscaling module developed by EURAC Research, Italy, for the interTwin project; its output serves as input to the WFlow_SBM model.

This repository provides an openEO-based, Docker-compatible, reproducible testing framework for downscaling Earth Observation (EO) data. It supports a modular data-processing and machine-learning pipeline for climate data, powered by STAC, Dask, and LightGBM. The tests are designed to validate processing and modeling logic using both public and restricted datasets.


📦 Repository Features

  • openEO-compatible test pipelines using openeo-processes-dask
  • STAC-based data loading for ERA5, SEAS5, DEM, and EMO1 products
  • Hybrid Improved Precipitation downscaling framework using LightGBM
  • Docker-based isolated runtime with all dependencies
  • Flexible Makefile-based automation
  • AWS-secured workflows for SEAS5 datasets

📂 Dataset Summary

| Dataset   | Access       | Spatial Extent                     | Temporal Extent                        | Notes                           |
|-----------|--------------|------------------------------------|----------------------------------------|---------------------------------|
| ERA5      | Public       | 2°E–20°E, 40°N–52°N (Alps region)  | 2000–2020 (daily)                      | Used in open pytests            |
| SEAS5     | Requires AWS | Same as ERA5                       | 12 initializations (Aug 2021–Jul 2022) | Used in closed pytests only     |
| EMO1, DEM | Public       | Same as ERA5                       | 2000–2022                              | EMO1 contains downstream targets |

🚀 Quick Start

🐳 Run with Docker (Recommended)

1. Build the Docker Image and Automatically Run the Public Pytests

This builds the image from the Dockerfile and runs the pre-processing and downscaling pipeline using:

/app/tests/test_downscaleml_pipeline.py

2. Run in an Interactive Shell

You’ll drop into a Docker shell with the micromamba environment pre-activated.

3. Clean Docker Containers


πŸ” Running Closed Tests (with SEAS5 data)

Some tests require access to SEAS5 data through AWS-authenticated STAC endpoints. These tests will only work if valid AWS credentials are provided.

Precondition:

Set your AWS credentials as environment variables:

export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret

Then, run:

make run TEST_FILE=/app/tests/pytest_A.py
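Before invoking the closed suite, it can help to verify that the variables are actually set. A minimal Python guard of this kind (illustrative only; not necessarily how pytest_A.py handles missing credentials):

```python
import os

def has_aws_credentials() -> bool:
    # Both variables must be present and non-empty for the
    # AWS-authenticated SEAS5 STAC endpoints to work.
    return bool(os.getenv("AWS_ACCESS_KEY_ID")) and bool(
        os.getenv("AWS_SECRET_ACCESS_KEY")
    )
```

A check like this lets closed tests be skipped cleanly instead of failing mid-pipeline.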

🧪 Test Coverage

✅ test_downscaleml_pipeline.py

  • Loads open ERA5, DEM, and EMO1 datasets via STAC
  • Performs resampling, cube merging, and sin_cos_doy feature expansion
  • Saves output as .zarr and registers with raster2stac
  • Trains pixel-based LightGBM models for the target variable
  • Validates predictions and saves them as Zarr
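The sin_cos_doy expansion encodes day-of-year as two cyclic features, so that 31 December and 1 January are numerically adjacent. A minimal NumPy sketch of the idea (the registered openEO process may differ in signature and details):

```python
import numpy as np

def sin_cos_doy(doy, period=365.25):
    # Map day-of-year onto the unit circle; returns a (sin, cos) feature pair.
    angle = 2.0 * np.pi * np.asarray(doy, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Example: four days spread across the year.
s, c = sin_cos_doy([1, 91, 182, 274])
```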

πŸ” pytest_A.py (closed tests)

  • Same workflow as above, but includes SEAS5 datasets
  • Requires valid AWS credentials

πŸ› οΈ Components Used

  • downScaleML: core downscaling package
  • raster2stac: Zarr-to-STAC converter
  • openeo-processes-dask: local execution of openEO processes
  • LightGBM + scikit-learn: pixel-based regression models
  • Dask: parallel computation backend
  • Micromamba: lightweight conda environment manager

🧬 Environment Setup (outside Docker, optional)

micromamba env create -f environment.yml
micromamba activate openEO_downScaleML
pip install -r test_requirements.txt
pytest tests/test_downscaleml_pipeline.py

πŸ“ Project Structure

.
├── Dockerfile
├── Makefile
├── environment.yml
├── test_requirements.txt
├── tests/
│   ├── test_downscaleml_pipeline.py   # open test pipeline
│   ├── pytest_A.py                    # closed test pipeline (requires AWS)
│   └── ...
└── app/
    └── test_data/                     # Output and intermediate results

πŸ” Notes

  • The sin_cos_doy and raster2stac operations are registered as openEO processes via openeo-processes-dask.
  • The .zarr and STAC metadata outputs are stored in /app/test_data/.
  • pytest_A.py and any reference to SEAS5 require valid AWS credentials.

πŸ“ License

Distributed under an open-source license aligned with interTwin project guidelines.


🤝 Acknowledgements

This work is part of the interTwin project and integrates components from the broader openEO ecosystem.