GitHub - GIS-PuppetMaster/DistJoin: DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation

This is the source code for the paper "DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation".

Env setup

Install Python 3.12.
Install the required packages listed in requirements.txt: pip install -r requirements.txt
Install our sampler package, which is responsible for generating training data dynamically during training: python ./MySampler/setup.py install

Prepare Dataset

Put the JOB datasets into ./datasets/job. Every CSV table must have a header row.
To update the true cardinalities, first remove all {workload}.pkl files in ./queries, then run ./queries/ConvertMSCNTestWorkload.py. The script automatically calculates the true cardinalities and converts the test workloads to MSCN's format.
If needed, use ./queries/GetJoinWithoutPredicatesCard.py to pre-calculate the cardinalities of the queries' join schemas.
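The two manual checks above (header rows present, cached .pkl files removed) can be sketched in Python as follows. This is a minimal illustration, not part of the repository: the helper names first_rows and remove_cached_workloads are hypothetical, and the sketch assumes the CSV files sit directly inside the dataset directory and the .pkl files directly inside ./queries.

```python
import csv
from pathlib import Path


def first_rows(dataset_dir):
    """Map each CSV file name in dataset_dir to its first row.

    Hypothetical helper: it only reads the first row so you can confirm
    by eye that it contains column names rather than data.
    """
    rows = {}
    for path in sorted(Path(dataset_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as f:
            # next(..., []) yields an empty list for an empty file
            rows[path.name] = next(csv.reader(f), [])
    return rows


def remove_cached_workloads(queries_dir):
    """Delete every .pkl file directly inside queries_dir.

    Hypothetical helper: returns the names of the removed files so the
    caller can see which workloads will be recomputed.
    """
    removed = []
    for path in sorted(Path(queries_dir).glob("*.pkl")):
        path.unlink()
        removed.append(path.name)
    return removed
```

For example, calling first_rows("./datasets/job") before training and remove_cached_workloads("./queries") before running ./queries/ConvertMSCNTestWorkload.py mirrors the steps described above.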

Setup experiments

Edit ./Configs/IMDB/IMDB.yaml to configure the experiments, or keep the default file to reproduce the experiments from our paper.

Train DistJoin

Run python train.py
Copy the exp mark (a timestamp) from the output for later testing.

Test DistJoin

Run python eval-IMDB-all.py --config=IMDB --no_wandb and enter the exp mark to evaluate the workloads configured in the IMDB.yaml file. The evaluation covers all five join conditions on each workload.
Check the results in the output and in ./results/DistJoin.