This is the source code for the paper "DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation".
Env setup
- Install Python 3.12
- Install the required packages listed in requirements.txt
  pip install -r requirements.txt
- Install our sampler package, which is responsible for generating training data dynamically during training
  python ./MySampler/setup.py install
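Before installing the packages, it can help to confirm that the active interpreter is the version named in the steps above. The following is a minimal sketch, not part of the repository; the helper name `matches_required` is hypothetical, and it assumes an exact major.minor match is wanted, since the README does not say whether other versions work.

```python
import sys

# Version named in the setup steps; support for other versions is not stated.
REQUIRED = (3, 12)


def matches_required(version_info=sys.version_info, required=REQUIRED):
    """Return True if the major.minor pair of version_info equals required."""
    return tuple(version_info[:2]) == tuple(required)


def describe(version_info=sys.version_info, required=REQUIRED):
    """Build a short human-readable status line for the interpreter check."""
    found = ".".join(str(part) for part in version_info[:2])
    wanted = ".".join(str(part) for part in required)
    status = "OK" if matches_required(version_info, required) else "MISMATCH"
    return f"{status}: found Python {found}, README asks for {wanted}"
```

Calling `print(describe())` inside the environment you plan to train in reports whether it matches.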
Prepare Dataset
- Put the JOB datasets into ./datasets/job. All CSV tables must have headers
- You can update the true cardinalities by first removing all {workload}.pkl files in ./queries and then running ./queries/ConvertMSCNTestWorkload.py, which automatically calculates the true cardinalities and converts the test workloads to MSCN's format
- Use ./queries/GetJoinWithoutPredicatesCard.py to pre-calculate the cardinalities of the queries' join schemas if needed
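The dataset requirements above can be checked by hand with a small helper. This is a sketch under stated assumptions, not code from the repository: the function names are hypothetical, the CSV files are assumed to sit directly in the dataset directory, and UTF-8 decoding is assumed (undecodable bytes are replaced rather than raising). It only reports what it finds and deletes nothing.

```python
import csv
from pathlib import Path


def preview_csv_headers(dataset_dir="./datasets/job"):
    """Map each CSV file name to its first row so headers can be inspected."""
    headers = {}
    for path in sorted(Path(dataset_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as f:
            # An empty file yields an empty list instead of raising StopIteration.
            headers[path.name] = next(csv.reader(f), [])
    return headers


def list_cached_workloads(queries_dir="./queries"):
    """List the .pkl files present in queries_dir (dry run, nothing is removed)."""
    return sorted(p.name for p in Path(queries_dir).glob("*.pkl"))
```

If a first row looks like data rather than column names, that table is missing its header. Review the output of `list_cached_workloads()` before removing any file, since the directory may hold .pkl files you want to keep.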
Setup experiments
- Use ./Configs/IMDB/IMDB.yaml to configure the experiments, or keep the default settings to reproduce the experiments in the paper
Train DistJoin
- Run
  python train.py
- Copy the exp mark (a timestamp) printed in the output for later testing
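Because the exp mark is only printed in the training output, keeping a copy of that output makes it easier to find later. One possible approach is sketched below. The helper `run_and_log` and the log file name are hypothetical and not part of the repository, and the sketch does not try to parse the exp mark, since its exact format is not documented here.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echo its combined stdout/stderr, and save a copy to log_path.

    Returns the exit code of the child process.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # still show progress on the console
            log.write(line)
        return proc.wait()
```

For example, `run_and_log([sys.executable, "train.py"], "train.log")` runs training as usual and leaves the full output, including the exp mark, in train.log.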
Test DistJoin
- Run the following command and enter the exp mark to evaluate the workloads configured in the IMDB.yaml file; this covers all five join conditions on that workload
  python eval-IMDB-all.py --config=IMDB --no_wandb
- Check the results in the output and in ./results/DistJoin
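For unattended runs, the exp mark can be supplied programmatically instead of being typed. The sketch below assumes the evaluation script reads the exp mark as a single line from standard input; that is an assumption about the script, not something confirmed here. The helper name `run_with_exp_mark` is hypothetical.

```python
import subprocess
import sys


def run_with_exp_mark(cmd, exp_mark):
    """Run cmd and send exp_mark followed by a newline on standard input.

    Assumes the child reads the exp mark as one line from stdin.
    Returns the CompletedProcess with captured stdout and stderr.
    """
    return subprocess.run(
        cmd,
        input=f"{exp_mark}\n",
        text=True,
        capture_output=True,
    )
```

A call might look like `run_with_exp_mark([sys.executable, "eval-IMDB-all.py", "--config=IMDB", "--no_wandb"], mark)`, where `mark` is the string copied from the training output. The returned object's `stdout` then holds the evaluation output.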