GitHub - kangyangWHU/LLMAlignment: Official Code for S&P 25 paper "Alleviating the Fear of Losing Alignment in LLM Fine-tuning"

Official Code for S&P 25 paper "Alleviating the Fear of Losing Alignment in LLM Fine-tuning"

LLM Alignment Framework

A framework for evaluating and improving language model alignment through fine-tuning and parameter recovery techniques.

Overview

This project provides tools for:

Fine-tuning language models with alignment objectives
Recovering model parameters after harmful fine-tuning
Evaluating model performance and safety
Supporting multiple LLM architectures (Llama2, Gemma, Mistral, Qwen)

Core Components

1. Fine-tuning (`run_finetune_exp.py`)

Supports LoRA-based fine-tuning
Handles both benign and harmful training data
Configurable training parameters
Supports multiple LLM architectures

2. Parameter Recovery (`sgdg_rollback_final.py`)

Implements gradient-guided parameter recovery
Supports multi-GPU training
Features warmup steps and rollback mechanisms
Configurable recovery rates and thresholds

3. Evaluation (`run_eval_exp.py`)

Measures model performance on various tasks
Evaluates model safety and harmful behaviors
Supports multiple evaluation datasets
Tracks metrics across recovery steps

4. Results Analysis (`run_res.py`)

Analyzes experimental results
Processes metrics across different models and tasks
Generates comparative analysis

Supported Models

Llama2 (7B, 13B)
Gemma 2B
Mistral v2 7B
Qwen 7B
You can add more.

Tasks

SQL
Cheat detection
NL2Bash conversion
Text summarization
Toxicity detection

Installation

Clone the repository:

git clone https://github.com/kangyangWHU/LLMAlignment.git
cd LLMAlignment

Install dependencies:

conda create -n myenv python=3.9

# Step 2: Activate the environment
conda activate myenv

# install pytorch
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121

# Step 3: Install requirements via pip
pip install -r requirements.txt

If you use Gemma serires, you need also install FlashAttention, which requires cuda > 12.0.

pip install flash-attn==2.6.1

Usage

1. Fine-tuning a Model

python run_finetune_exp.py

2. Running Parameter Recovery

python run_recover_exp.py

3. Evaluating Results

4. Analyzing Results

Project Structure

LLMAlignment/
├── run_finetune_exp.py    # Fine-tuning experiments
├── sgdg_rollback_final.py # Parameter recovery implementation
├── run_recover_exp.py     # run parameter recovery experiments
├── run_eval_exp.py        # Evaluation pipeline
├── run_res.py             # Results analysis
├── utils/                 # Utility functions
│   ├── constant.py        # Constants and mappings
│   ├── inference_utils.py # Inference helpers
│   ├── lora_utils.py      # LoRA utilities
│   └── res_utils.py       # Results processing
├── dataset/               # datasets
└── cfg/                   # Configuration files

Key Features

Multi-GPU Support
- Distributed training and evaluation
- Efficient parameter recovery across multiple GPUs
Flexible Evaluation
- Support for multiple tasks
- Customizable evaluation metrics
- Safety evaluation
Parameter Recovery
- Gradient-guided recovery
- Configurable recovery strategies
- Progress tracking and checkpointing
Modular Design
- Easy to extend to new models
- Configurable components
- Reusable utilities

Citation

If you use this code in your research, please cite:

@INPROCEEDINGS {,
author = { Yang, Kang and Tao, Guanhong and Chen, Xun and Xu, Jun },
booktitle = { 2025 IEEE Symposium on Security and Privacy (SP) },
title = {{ Alleviating the Fear of Losing Alignment in LLM Fine-tuning }},
year = {2025},
volume = {},
ISSN = {2375-1207},
pages = {2004-2022},
keywords = {},
doi = {10.1109/SP61157.2025.00171},
url = {https://doi.ieeecomputersociety.org/10.1109/SP61157.2025.00171},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month =May}