DeepGraphGenerator
Awesome Deep Graph Generator
Source codes implementation of papers:
BTGAE: Divide and Conquer: A Topological Heterogeneity-based Framework for Scalable and Realistic Graph Generation. (Official PyTorch Implementation)VRDAG: Efficient Dynamic Attributed Graph Generation, in ICDE 2025.TGAE: Efficient Learning-based Graph Simulation for Temporal Graphs, in ICDE 2025.CPGAE: Efficient Learning-based Community-Preserving Graph Generation, in ICDE 2022. (GAE version of CPGAN)
Implementation of baselines:
GraphRNN: GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, in ICML 2018.
Usage
Data processing
- Run
python experiment/preprocess.pyto process a graph into a sparse matrix and save the matrix as an.npzfile or.pklfile.
Training
To test implementations of the methods, run
python main.py --method BTGAE python main.py --method VRDAG python main.py --method TGAE python main.py --method CPGAE python main.py --method GraphRNN python main.py --method GraphRNN-S
All predefined configurations are in config/. Dynamically override parameters using:
python train.py --method <name> --update <key1>=<value1> <key2>=<value2> ...
Key values shall exactly match (case-sensitive) the corresponding parameters defined in configuration files.
Examples:
python main.py --method BTGAE --update data=cora epochs=150 learning_rate=3e-3 python main.py --method TGAE --update epochs=100 lr=1e-3
Evaluaion
The evaluation tools are located in the experiment/graph_metrics directory. For usage instructions, see README.md.
Data Description
The following datasets are mainly from linqs and snap.
| Data | #Nodes | #Edges | $d_{mean}$ | GINI | PWE |
|---|---|---|---|---|---|
| citeseer | 3,327 | 4,732 | 2.774 | 0.435 | 2.420 |
| cora | 2,708 | 5,429 | 3.898 | 0.405 | 1.932 |
| pubmed | 19,717 | 44,338 | 4.496 | 0.604 | 2.176 |
| Epinions | 75,879 | 508,837 | 10.694 | 0.805 | 2.026 |
| 875,713 | 5,105,039 | 9.871 | 0.587 | 1.617 | |
| YelpChi | 45,954 | 3,846,979 | 167.427 | 0.322 | 1.205 |
$d_{mean}$: mean degree.
GINI: GINI index, which is a common measure for inequality in a degree distribution.
PWE: power-law exponent.
The following temporal network datasets are from snap and Network Repository. For more information, please refer to the VRDAG paper.
| Data | #Nodes | #Edges | T |
|---|---|---|---|
| Emails-DNC | 1,891 | 39,264 | 14 |
| Bitcoin-Alpha | 3,783 | 24,186 | 37 |
| Wiki-Vote | 7,115 | 103,689 | 43 |
Test Result
The performance of models tested on static graph datasets are listed as follows:
| Dataset | Method | Deg. dist. | Clus. dist. | Wedge count | Triangle count | LCC | PLE | Gini | Clus. coef. |
|---|---|---|---|---|---|---|---|---|---|
| Citeseer | CPGAE | 0.0035 | 0.0124 | 0.0812 | 0.0274 | 0.066 | 0.003 | 0.0026 | 0.1182 |
| BTGAE | 0.0029 | 0.0119 | 0.0402 | 0.335 | 0.0085 | 0.0711 | 0.0208 | 0.2473 | |
| Cora | CPGAE | 0.004 | 0.0088 | 0.1333 | 0.056 | 0.0193 | 0.0026 | 0.0252 | 0.2161 |
| BTGAE | 0.0061 | 0.0078 | 0.0026 | 0.054 | 0.0129 | 0.0028 | 0.0113 | 0.0376 | |
| Pubmed | CPGAE | 0.0156 | 0.0144 | 0.423 | 0.2835 | 0 | 0.0807 | 0.0994 | 1.2245 |
| BTGAE | 0.0164 | 0.0093 | 0.227 | 0.1541 | 0.0512 | 0.1233 | 0.0737 | 0.0307 | |
| Epinions | CPGAE | 0.0175 | 0.0362 | 0.6739 | 0.3723 | 0 | 0.1768 | 0.1058 | 0.9246 |
| BTGAE | 0.0189 | 0.0265 | 3.3051 | 3.3247 | 0.0538 | 0.1394 | 0.0597 | 0.0104 | |
| YelpChi | CPGAE | 0.0236 | 0.02 | 0.1487 | 0.133 | 0.0012 | 0.0416 | 0.3062 | 0.0186 |
| BTGAE | 0.0286 | 0.0396 | 0.0361 | 0.6578 | 0.0057 | 0.0037 | 0.1709 | 0.6429 |
The second table shows the test results of dynamic graph generation methods.
| Dataset | Method | Deg. dist. | Clus. dist. | Wedge count | Triangle count | LCC | PLE | Gini | Clus. coef. |
|---|---|---|---|---|---|---|---|---|---|
| Emails-DNC | VRDAG | 0.0084 | 0.0473 | 0.8118 | 0.8705 | 1.5973 | 0.0735 | 0.0837 | 0.2465 |
| TGAE | 0.0027 | 0.0144 | 0.4768 | 0.4273 | 0.2105 | 0.1353 | 0.0199 | 0.2641 | |
| Bitcoin-Alpha | VRDAG | 0.0052 | 0.0121 | 0.8572 | 0.7422 | 0.8909 | 0.1263 | 0.0652 | 0.9037 |
| TGAE | 0.0019 | 0.0035 | 0.3014 | 0.3636 | 0.2712 | 0.2009 | 0.0096 | 0.5274 | |
| Wiki-Vote | VRDAG | 0.0404 | 0.0656 | 4.2868 | 0.8584 | 12.2631 | 0.1463 | 0.3611 | 0.9466 |
| TGAE | 0.0037 | 0.0075 | 0.2768 | 0.2843 | 0.1786 | 0.4262 | 0.0024 | 0.1824 |
The evaluation metrics used in these tests are as follows:
-
Deg. dist.: It measures the degree distribution similarity between the generated graph and the real graph using Maximum Mean Discrepancy (MMD). A smaller MMD value indicates a closer degree distribution.
-
Clus. dist.: This metric focuses on the clustering coefficient distribution similarity using MMD.
-
Wedge count: It counts the number of wedges in the graph.
-
Triangle count: The number of triangles in the graph is counted.
-
LCC: It represents the size of the largest connected component in the graph.
-
PLE: It is the power-law exponent associated with the degree distribution of the graph.
-
Gini: GINI index, which is a common measure for inequality in a degree distribution.
-
Clus. coef.: The global clustering coefficient of the graph.
For temporal graphs, given a metric $f_m(\cdot)$, the real graph $\widetilde{G}$, and the synthetic one $\widetilde{G^{\prime}}$, we construct a sequence of snapshots $\widetilde{S}^t$ ($\widetilde{S^{\prime}}^t$), $t = 1, ...,T$, of $\widetilde{G}$ ($\widetilde{G^{\prime}}$) by aggregating edges from the initial timestamp to the current timestamp $t$. Then, we measure the average difference (in percentage) of the given metric $f_m(\cdot)$ between two graphs as follows:
$$ f_{avg}(\widetilde{G},\widetilde{G^{\prime}},f_m)=Mean_{t=1:T}(|\frac{f_m(\widetilde{S}^t)-f_m(\widetilde{S^{\prime}}^t)}{f_m(\widetilde{S}^t)}|) $$
Repo Structure
The repository is organized as follows:
main.py: organize all models.experiment/: preprocessing and evaluation.methods/: implementations of models.config/: configuration files for different models.models/: the checkpoints or the trained models for each method.data/: dataset files.requirements.txt: package dependencies.
Requirements
torch 2.3.1+cu121
networkx 2.8
scipy 1.14.1
scikit-learn 1.5.2
numpy 1.26.4
community 1.0.0b1
python-louvain 0.16
dgl 2.4.0+cu121
Contributors :
Citing
If you find GraphGenerator is useful for your research, please consider citing the following papers:
@inproceedings{li2025efficient, title={Efficient dynamic attributed graph generation}, author={Li, Fan and Wang, Xiaoyang and Cheng, Dawei and Chen, Cong and Zhang, Ying and Lin, Xuemin}, booktitle={2025 IEEE 41st International Conference on Data Engineering (ICDE)}, pages={1415--1428}, year={2025}, organization={IEEE} } @inproceedings{xiang2025efficient, title={Efficient learning-based graph simulation for temporal graphs}, author={Xiang, Sheng and Xu, Chenhao and Cheng, Dawei and Wang, Xiaoyang and Zhang, Ying}, booktitle={2025 IEEE 41st International Conference on Data Engineering (ICDE)}, pages={251--264}, year={2025}, organization={IEEE} } @inproceedings{xiang2022efficient, title={Efficient learning-based community-preserving graph generation}, author={Xiang, Sheng and Cheng, Dawei and Zhang, Jianfu and Ma, Zhenwei and Wang, Xiaoyang and Zhang, Ying}, booktitle={2022 IEEE 38th International Conference on Data Engineering (ICDE)}, pages={1982--1994}, year={2022}, organization={IEEE} } @article{xiang2022general, title={General graph generators: experiments, analyses, and improvements}, author={Xiang, Sheng and Wen, Dong and Cheng, Dawei and Zhang, Ying and Qin, Lu and Qian, Zhengping and Lin, Xuemin}, journal={The VLDB Journal}, pages={1--29}, year={2022}, publisher={Springer} }