Log Classifier API
This project is a FastAPI-based microservice that classifies log messages into target labels using three different strategies:
- Rule-based classification with regular expressions
- ML-based classification using BERT + Logistic Regression
- Contextual classification using a Large Language Model (LLM)
Classification Pipeline
- Regex-based Classifier
  - Uses predefined regular expressions
  - Ideal for simple, pattern-matching use cases
- BERT + Logistic Regression
  - Uses bert-base-uncased from Hugging Face for embeddings
  - A logistic regression classifier is trained on the embedding vectors
  - Good balance of accuracy and speed
- LLM-based Classification
  - Uses a hosted LLM (e.g., OpenAI GPT, Azure OpenAI)
  - Best for nuanced, free-form logs with complex semantics
  - Requires API access and a secret key set via .env
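As a rough illustration of the regex-based strategy, the sketch below matches a message against an ordered list of compiled patterns and returns the first label that hits. The patterns, the labels, and the name `classify_with_regex` are assumptions made up for this example; the actual rules in processor_regex.py may look quite different.

```python
import re
from typing import Optional

# Hypothetical patterns and labels, chosen only for illustration.
REGEX_RULES = [
    (re.compile(r"user \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"backup (started|completed|ended)", re.IGNORECASE), "System Notification"),
    (re.compile(r"disk usage .* exceeded", re.IGNORECASE), "Resource Usage"),
]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None
```

Returning `None` for unmatched messages leaves room for a caller to try one of the other strategies, though how the project actually chooses between classifiers is not specified here.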
Project Structure
├── server.py               # FastAPI app
├── classify.py             # Core classification logic
├── processor_regex.py      # Regex
├── processor_bert.py       # BERT
├── processor_llm.py        # LLM
├── models/
│   └── log_classifier.joblib
├── resources/              # Temporary storage for CSVs
│   └── test.csv
├── training/
│   ├── training.ipynb
│   └── dataset/
│       └── synthetic_logs.csv
├── requirements.txt
└── README.md
Installation
git clone https://github.com/Dhanjith01/log-classification.git
cd log-classification
pip install -r requirements.txt
Running the API
uvicorn server:app --reload
Request
Upload a .csv file with the following columns:
- source: the source system/service
- log_message: the log text to classify
Response
Returns a new CSV file with a third column: target_label
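The request/response contract can be sketched with the standard library alone: check that the uploaded CSV has the two required columns, then write out the same rows with `target_label` appended. The function name `label_csv` and the `classify` callback are hypothetical stand-ins for whatever server.py and classify.py actually do.

```python
import csv
import io
from typing import Callable

REQUIRED_COLUMNS = ("source", "log_message")


def label_csv(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return csv_text with a target_label column appended to every row.

    `classify` is a placeholder taking (source, log_message) and returning a label.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["source", "log_message", "target_label"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "source": row["source"],
            "log_message": row["log_message"],
            "target_label": classify(row["source"], row["log_message"]),
        })
    return out.getvalue()
```

Rejecting a file with missing columns up front gives the caller a clear error instead of a failure partway through classification.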