SHISO is a method for mining log formats and retrieving log types and parameters in an online manner. By creating a structured tree using the nodes generated from log messages, SHISO refines log format continuously in realtime. We implemented SHISO using Python with a standard interface for benchmarking purpose.
Read more information about SHISO from the following paper:
- Masayoshi Mizutani. Incremental Mining of System Log Format, IEEE International Conference on Services Computing (SCC), 2013.
Running
The code has been tested in the following enviornment:
- python 3.7.6
- regex 2022.3.2
- pandas 1.0.1
- numpy 1.18.1
- scipy 1.4.1
- nltk 3.4.5
Run the following script to start the demo:
Run the following script to execute the benchmark:
Benchmark
Running the benchmark script on Loghub_2k datasets, you could obtain the following results.
| Dataset | F1_measure | Accuracy |
|---|---|---|
| HDFS | 0.999984 | 0.9975 |
| Hadoop | 0.997513 | 0.867 |
| Spark | 0.991526 | 0.906 |
| Zookeeper | 0.993337 | 0.66 |
| BGL | 0.99445 | 0.711 |
| HPC | 0.541336 | 0.3245 |
| Thunderbird | 0.911185 | 0.576 |
| Windows | 0.912983 | 0.7005 |
| Linux | 0.975457 | 0.6715 |
| Android | 0.843701 | 0.585 |
| HealthApp | 0.842471 | 0.397 |
| Apache | 1 | 1 |
| Proxifier | 0.77964 | 0.5165 |
| OpenSSH | 0.997639 | 0.619 |
| OpenStack | 0.993697 | 0.7215 |
| Mac | 0.959845 | 0.595 |
Citation
🔭 If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers.
- [ICSE'19] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. International Conference on Software Engineering (ICSE), 2019.
- [DSN'16] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.