Integrate SwanLab for offline/online experiment tracking for Accelerate by ShaohonChen · Pull Request #3605 · huggingface/accelerate

What does this PR do?

This PR introduces SwanLab, a lightweight open-source experiment tracking tool, as a new tracker in Accelerate. The integration supports both online (cloud) and offline tracking. Offline results can be viewed in a local dashboard.

SwanLab already supports:

Tracking in Transformers via the report_to parameter (documentation)
Accelerate training through external callbacks (documentation)

We've received many requests from the community to add native Accelerate support (see here). We're excited to officially integrate with this excellent project and give developers a more seamless experience. SwanLab can also run entirely offline, so this integration is particularly valuable for users in regions with limited network connectivity (such as China).

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. (see here)
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests? (I don't see any tests for any of the callbacks, but please let me know if I missed them somewhere.)

Who can review?

@SunMarc I noticed that you recently reviewed some related PRs—would you mind helping review my PR as well? Thank you! (I believe you also reviewed the Hugging Face Transformers integration previously—looking forward to collaborating again! 😄)

Additional information about this PR

Usage guideline

Step 0: Set up Accelerate and the example environment

This step follows Accelerate's official CV example (a pet image classification task):

# prepare the code and environment
git clone https://github.com/huggingface/accelerate
cd accelerate
pip install -e .
pip install timm     # used by the example
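Before running anything, it can help to confirm that the required packages are importable. This assumes the example creates its Accelerator with log_with="all" when tracking is enabled. In that mode, Accelerate only activates trackers whose packages are installed. The following is a stdlib-only sketch. The helper name missing_packages is hypothetical and not part of Accelerate.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    # find_spec returns None when no top-level module with that name is installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # accelerate and timm come from Step 0; swanlab is installed in Step 1.
    missing = missing_packages(["accelerate", "timm", "swanlab"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages needed by the example are importable.")
```

If swanlab is reported missing, the example will still run, but nothing will be logged to SwanLab.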

Step 1: Set up SwanLab online tracking

Install:

pip install swanlab

To use SwanLab's online tracking, log in to the SwanLab website and get your API key from the Settings page. Then authenticate with the following command:

swanlab login

If you prefer offline mode, skip the login and install the local dashboard instead:

pip install "swanlab[dashboard]"

Step 2: Download the Oxford-IIIT Pet dataset used by the example code

You can find the download link here.

Step 3: Run the official example script from the Accelerate repository

python examples/complete_cv_example.py  --data_dir <DOWNLOAD DATA PATH> --with_tracking

A visualization demo is available here.

Since my server is offline, I set pretrained=False in the create_model call to avoid downloading the pretrained weights. As a result, accuracy was very poor after just 3 epochs 😂.

Test suite status

Because my AI training server can't connect to Hugging Face, some tests failed during the automated test run. 😭