Added a check in no_sync() to avoid errors when using deepspeed zero2/3 by xliu0105 · Pull Request #3656 · huggingface/accelerate

What does this PR do?

When Accelerate is used with DeepSpeed ZeRO stage 2 or stage 3 and gradient accumulation is set to more than 1 step, training fails with: AssertionError: no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 2

This PR therefore adds a check to the no_sync() function so that no_sync no longer conflicts with the ZeRO stage 2 and stage 3 implementations.
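For illustration, a guard of this kind could be sketched as below. This is a minimal, self-contained approximation and not the actual diff: `no_sync_context`, its parameters, and `FakeEngine` are hypothetical names that stand in for the Accelerate and DeepSpeed objects, and the assertion inside `FakeEngine.no_sync` only mimics the error quoted above.

```python
import contextlib


def no_sync_context(model, use_distributed, is_deepspeed, zero_stage):
    """Return a context manager that skips gradient sync only when that is safe.

    Hypothetical helper: the argument names are assumptions for this sketch.
    """
    context = contextlib.nullcontext  # default: do nothing special
    if use_distributed:
        # Assumed guard: under ZeRO stage 2/3, gradients are partitioned, so
        # the model's own no_sync() must not be entered; keep the no-op instead.
        if not (is_deepspeed and zero_stage >= 2):
            context = getattr(model, "no_sync", context)
    return context()


class FakeEngine:
    """Stand-in for a wrapped model; it only imitates the failing assertion."""

    def __init__(self, zero_stage):
        self.zero_stage = zero_stage
        self.sync_skipped = False

    @contextlib.contextmanager
    def no_sync(self):
        assert self.zero_stage < 2, (
            "no_sync context manager is incompatible with gradient "
            f"partitioning logic of ZeRO stage {self.zero_stage}"
        )
        self.sync_skipped = True
        try:
            yield
        finally:
            self.sync_skipped = False


# With ZeRO stage 2 the guard falls back to a no-op, so nothing is raised.
engine = FakeEngine(zero_stage=2)
with no_sync_context(engine, use_distributed=True, is_deepspeed=True, zero_stage=2):
    pass
```

With a stage below 2, the same helper still returns the model's own `no_sync()`, so gradient synchronization is skipped during accumulation steps as before.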

Thanks @alisafaya

Fixes #3481

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@SunMarc @zach-huggingface