GitHub - Dhanjith01/log-classification

🧠 Log Classifier API

This project is a FastAPI-based microservice that classifies log messages into target labels using three different strategies:

  • 🧾 Rule-based classification with regular expressions
  • 🤖 ML-based classification using BERT + Logistic Regression
  • 🧠 Contextual classification using a Large Language Model (LLM)

🧪 Classification Pipeline

  1. 🔍 Regex-based Classifier
     • Uses predefined regular expressions
     • Ideal for simple, pattern-matching use cases
  2. 🤖 BERT + Logistic Regression
     • Uses bert-base-uncased from HuggingFace for embeddings
     • A logistic regression classifier is trained on the resulting embedding vectors
     • Good balance of accuracy and speed
  3. 💬 LLM-based Classification
     • Leverages an LLM (e.g., OpenAI GPT, Azure OpenAI)
     • Best for nuanced, free-form logs with complex semantics
     • Requires API access and a secret key set via .env
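
As an illustration of how the stages listed above could be chained, the sketch below tries the regex stage first and falls back to later stages only when no pattern matches. This ordering is an assumption, not something the repository is confirmed to do. The patterns, label names, and the `classify_with_fallback` and `fake_ml_stage` helpers are hypothetical; the ML stage is a plain stand-in function so the example runs without model downloads or API keys.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical patterns and labels, for illustration only.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed|ended)", re.IGNORECASE), "System Notification"),
]

# A stage takes a log message and returns a label, or None if it cannot decide.
Classifier = Callable[[str], Optional[str]]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None


def classify_with_fallback(
    log_message: str,
    stages: Sequence[Classifier],
    default: str = "Unclassified",
) -> str:
    """Try each stage in order and return the first non-None label."""
    for stage in stages:
        label = stage(log_message)
        if label is not None:
            return label
    return default


def fake_ml_stage(log_message: str) -> Optional[str]:
    """Stand-in for the BERT + logistic regression stage (assumption)."""
    return "Error" if "exception" in log_message.lower() else None


stages = [classify_with_regex, fake_ml_stage]
print(classify_with_fallback("User alice42 logged in", stages))         # User Action
print(classify_with_fallback("Unhandled exception in worker", stages))  # Error
print(classify_with_fallback("???", stages))                            # Unclassified
```

Passing the stages as a list keeps the cheap regex check in front and makes it easy to swap the stand-in for a real model or LLM call.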

📁 Project Structure

├── server.py               # FastAPI app
├── classify.py             # Core classification logic
├── processor_regex.py      # Regex
├── processor_bert.py       # BERT
├── processor_llm.py        # LLM
├── models/
│   └── log_classifier.joblib
├── resources/              # Temporary storage for CSVs
│   └── test.csv
├── training/
│   ├── training.ipynb
│   └── dataset/
│       └── synthetic_logs.csv
├── requirements.txt
└── README.md


Installation

git clone https://github.com/Dhanjith01/log-classification.git
cd log-classification
pip install -r requirements.txt

Running the API

uvicorn server:app --reload

Request

Upload a .csv file with the following columns:

  • source: the source system/service
  • log_message: the log text to classify
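
A minimal sketch of checking an uploaded file for these two columns, using only the standard library. The `read_log_rows` helper and the sample row (including the `BillingSystem` source name) are made up for illustration and are not taken from the repository.

```python
import csv
import io

REQUIRED_COLUMNS = {"source", "log_message"}


def read_log_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text and check that the required columns are present."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required column(s): {sorted(missing)}")
    return list(reader)


# Hypothetical sample input with the expected header row.
sample = "source,log_message\nBillingSystem,User alice42 logged in\n"
rows = read_log_rows(sample)
print(rows[0]["source"], "|", rows[0]["log_message"])
```

Failing early on a missing column gives the caller a clear error instead of a `KeyError` deep inside the classification step.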

Response

Returns a new CSV file with a third column: target_label
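
The shape of that output can be sketched with the standard library as follows. This is an illustration under assumptions, not the code in `server.py`: `add_target_labels` and `dummy_classify` are hypothetical names, and the classifier is a trivial stand-in for the real pipeline.

```python
import csv
import io
from typing import Callable


def add_target_labels(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return CSV text with a target_label column appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["target_label"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The classifier receives both input columns and returns one label.
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()


def dummy_classify(source: str, log_message: str) -> str:
    """Trivial stand-in for the real classification pipeline (assumption)."""
    return "User Action" if "logged in" in log_message else "Unclassified"


result = add_target_labels(
    "source,log_message\nBillingSystem,User alice42 logged in\n",
    dummy_classify,
)
print(result)
# source,log_message,target_label
# BillingSystem,User alice42 logged in,User Action
```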