# downScaleML: openEO-enabled Downscaling Pipeline for Climate Data
This repository provides an openEO-based, Docker-compatible, reproducible testing framework for downscaling Earth Observation (EO) data. It supports a modular data processing and machine learning pipeline for climate data, powered by STAC, Dask, and LightGBM. The tests validate processing and modeling logic using both public and restricted datasets.
## Repository Features
- openEO-compatible test pipelines using openeo-processes-dask
- STAC-based data loading for ERA5, SEAS5, DEM, and EMO1 products
- Hybrid Improved Precipitation downscaling framework using LightGBM
- Docker-based isolated runtime with all dependencies
- Flexible Makefile-based automation
- AWS-secured workflows for SEAS5 datasets
## Dataset Summary
| Dataset | Access | Spatial Extent | Temporal Extent | Notes |
|---|---|---|---|---|
| ERA5 | Public | 2°E–20°E, 40°N–52°N (Alps region) | 2000–2020 (daily) | Used in open tests |
| SEAS5 | Requires AWS credentials | Same as ERA5 | 12 initializations (Aug 2021 to Jul 2022) | Used in closed tests only |
| EMO1, DEM | Public | Same as ERA5 | 2000–2022 | EMO1 provides the prediction targets |
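
The spatial and temporal extents in the table map directly onto STAC API item-search filters. The sketch below builds such a request body, assuming a STAC API `/search` endpoint: `build_item_search` is a hypothetical helper, and the collection ID `"era5"` is a placeholder, not necessarily the collection name the tests use. Per the STAC API Item Search specification, `bbox` is `[min_lon, min_lat, max_lon, max_lat]` and `datetime` is an RFC 3339 interval `start/end`.

```python
from datetime import date

# Alps-region extent from the dataset table: 2°E–20°E, 40°N–52°N.
ALPS_BBOX = (2.0, 40.0, 20.0, 52.0)  # (min_lon, min_lat, max_lon, max_lat)


def build_item_search(collection, start, end, bbox=ALPS_BBOX, limit=100):
    """Build a STAC API item-search request body (hypothetical helper).

    The collection ID is a placeholder; the real IDs depend on the STAC
    catalogue the tests point to.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("bbox must be (min_lon, min_lat, max_lon, max_lat)")
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "collections": [collection],
        "bbox": [min_lon, min_lat, max_lon, max_lat],
        # Closed RFC 3339 interval, as accepted by the STAC API datetime filter
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }
```

For example, `build_item_search("era5", date(2000, 1, 1), date(2020, 12, 31))` yields a body with `"bbox": [2.0, 40.0, 20.0, 52.0]` covering the ERA5 period, which could be POSTed to a STAC API `/search` endpoint.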
## Quick Start
### Run with Docker (Recommended)
1. Build the Docker image and automatically run the public test
   This builds the image from the `Dockerfile` and runs the preprocessing and downscaling pipeline defined in:
   `/app/tests/test_downscaleml_pipeline.py`
2. Run in an interactive shell
   You'll drop into a Docker shell with the micromamba environment pre-activated.
3. Clean up Docker containers
## Running Closed Tests (with SEAS5 Data)
Some tests require access to SEAS5 data through AWS-authenticated STAC endpoints. These tests will only work if valid AWS credentials are provided.
Precondition: set your AWS credentials as environment variables:

    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret

Then run:

    make run TEST_FILE=/app/tests/pytest_A.py
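
A common way to keep closed tests from failing on machines without credentials is to skip them when the variables are unset. The sketch below shows one such check; `aws_credentials_present` is a hypothetical helper, and this README does not state how `pytest_A.py` actually guards its SEAS5 tests. It only checks that both variables are set, not that the key pair is valid.

```python
import os

# The two variables named in the precondition above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def aws_credentials_present(env=None):
    """Return True if both AWS credential variables are set and non-blank.

    Hypothetical helper: it checks presence only and cannot tell whether
    the credentials are accepted by the SEAS5 STAC endpoints.
    """
    env = os.environ if env is None else env
    return all(env.get(name, "").strip() for name in REQUIRED_AWS_VARS)
```

In a test module, this could be combined with pytest's marker, e.g. `pytest.mark.skipif(not aws_credentials_present(), reason="AWS credentials not set")`, so that the closed tests are reported as skipped rather than failed.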
## Test Coverage
### `test_downscaleml_pipeline.py` (open tests)
- Loads open ERA5, DEM, and EMO1 datasets via STAC
- Performs resampling, cube merging, and `sin_cos_doy` feature expansion
- Saves output as `.zarr` and registers it with `raster2stac`
- Trains pixel-based LightGBM models for the target variable
- Validates predictions and saves them as Zarr
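
The `sin_cos_doy` feature expansion encodes the day of year cyclically, so that 31 December and 1 January end up close together in feature space instead of 364 days apart. A minimal sketch of the usual encoding follows, assuming a fixed 365-day period and 1-based day-of-year; both are assumptions, not taken from the openEO process's source.

```python
import math
from datetime import date


def sin_cos_doy(day, period=365.0):
    """Encode a date's day of year as a point on the unit circle.

    Assumes 1-based day-of-year (date.timetuple().tm_yday) and a fixed
    period of 365 days; how leap days are handled is an assumption.
    """
    doy = day.timetuple().tm_yday
    angle = 2.0 * math.pi * doy / period
    return math.sin(angle), math.cos(angle)
```

In a data cube, this would typically add two predictor bands per time step, which the pixel-based models can then use alongside the meteorological inputs.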
### `pytest_A.py` (closed tests)
- Same workflow as above, but includes SEAS5 datasets
- Requires valid AWS credentials
## Components Used
- downScaleML: core downscaling package
- raster2stac: Zarr-to-STAC converter
- openeo-processes-dask: local execution of openEO processes
- LightGBM + scikit-learn: pixel-based regression models
- Dask: parallel computation backend
- Micromamba: lightweight conda environment manager
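
"Pixel-based" here means one independent regression model per grid cell, each trained on that cell's own samples. The sketch below illustrates only that structure: it deliberately swaps LightGBM for ordinary least squares on a single predictor so that it runs with the standard library alone, and the data layout (`{pixel: [(x, y), ...]}`) and function names are hypothetical simplifications of a data cube.

```python
from statistics import fmean


def fit_ols(samples):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance at this pixel")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in samples)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def fit_per_pixel(training_data):
    """Train one independent model per pixel (OLS stand-in for LightGBM)."""
    return {pixel: fit_ols(samples) for pixel, samples in training_data.items()}


def predict_per_pixel(models, inputs):
    """Apply each pixel's own model to that pixel's new predictor value."""
    return {
        pixel: models[pixel][0] * x + models[pixel][1]
        for pixel, x in inputs.items()
    }
```

The design choice this mirrors is that no parameters are shared between cells, so local relationships (e.g. terrain-driven precipitation biases) can differ from pixel to pixel, at the cost of training many small models.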
## Environment Setup (outside Docker, optional)

    micromamba env create -f environment.yml
    micromamba activate openEO_downScaleML
    pip install -r test_requirements.txt
    pytest tests/test_downscaleml_pipeline.py
## Project Structure

    .
    ├── Dockerfile
    ├── Makefile
    ├── environment.yml
    ├── test_requirements.txt
    ├── tests/
    │   ├── test_downscaleml_pipeline.py   # open test pipeline
    │   ├── pytest_A.py                    # closed test pipeline (requires AWS)
    │   └── ...
    └── app/
        └── test_data/                     # Output and intermediate results
## Notes
- The `sin_cos_doy` and `raster2stac` operations are registered openEO processes from `openeo-processes-dask`.
- The `.zarr` and STAC metadata outputs are stored in `/app/test_data/`.
- `pytest_A.py` and any workflow that uses SEAS5 data require valid AWS credentials.
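
To illustrate what STAC metadata for a Zarr output contains, the sketch below assembles a minimal STAC Item dictionary for one store by hand. It is not the `raster2stac` API: the helper name, item ID, asset key `"data"`, and STAC version `1.0.0` are assumptions, and `application/vnd+zarr` is the media type commonly used for Zarr in the STAC community rather than an IANA-registered one.

```python
def zarr_stac_item(item_id, zarr_href, bbox, start, end):
    """Build a minimal STAC Item (as a dict) describing one Zarr store.

    Hypothetical helper for illustration; raster2stac generates its own
    metadata and may use different fields.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    # Closed, counter-clockwise polygon ring covering the bounding box.
    footprint = [[
        [min_lon, min_lat], [max_lon, min_lat], [max_lon, max_lat],
        [min_lon, max_lat], [min_lon, min_lat],
    ]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": footprint},
        "bbox": [min_lon, min_lat, max_lon, max_lat],
        "properties": {
            # A time range is expressed as datetime = null plus start/end.
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
        },
        "assets": {
            "data": {
                "href": zarr_href,
                "type": "application/vnd+zarr",
                "roles": ["data"],
            }
        },
        "links": [],
    }
```

For instance, a hypothetical `/app/test_data/prediction.zarr` covering the Alps extent would get `bbox = [2.0, 40.0, 20.0, 52.0]` and a `start_datetime`/`end_datetime` pair instead of a single `datetime`.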
## License
Distributed under an open-source license aligned with interTwin project guidelines.
## Acknowledgements
This work is part of the interTwin project and integrates components from the broader openEO ecosystem.