Added a check in no_sync() to avoid errors when using deepspeed zero2/3 by xliu0105 · Pull Request #3656 · huggingface/accelerate
Added a check in no_sync() to avoid errors when using deepspeed zero2/3
What does this PR do?
When using accelerate with DeepSpeed ZeRO stage 2 or 3 and gradient accumulation steps > 1, training fails with: AssertionError: no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 2
This PR therefore adds a check to the no_sync() function so that it no longer conflicts with the ZeRO stage 2 and stage 3 implementations.
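The idea can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual diff: `FakeZeroEngine`, `safe_no_sync`, and the `uses_deepspeed` / `zero_stage` parameters are hypothetical names made up for this example. The fake engine only mimics the assertion quoted above.

```python
import contextlib


class FakeZeroEngine:
    """Hypothetical stand-in for a DeepSpeed-wrapped model (not the real engine)."""

    def __init__(self, zero_stage):
        self.zero_stage = zero_stage
        self.sync_skipped = False

    @contextlib.contextmanager
    def no_sync(self):
        # Mimics the assertion quoted in this PR description: stages that
        # partition gradients (2 and 3) reject no_sync.
        assert self.zero_stage < 2, (
            "no_sync context manager is incompatible with gradient "
            f"partitioning logic of ZeRO stage {self.zero_stage}"
        )
        self.sync_skipped = True
        try:
            yield
        finally:
            self.sync_skipped = False


def safe_no_sync(model, uses_deepspeed, zero_stage):
    """Return a context manager that skips gradient sync only when it is safe.

    Assumption: the caller knows whether DeepSpeed is in use and which ZeRO
    stage is configured. Both arguments are illustrative, not accelerate's API.
    """
    if uses_deepspeed and zero_stage in (2, 3):
        # Gradient partitioning is active: fall back to a no-op context
        # instead of entering the model's own no_sync().
        return contextlib.nullcontext()
    no_sync = getattr(model, "no_sync", None)
    if no_sync is None:
        # Plain models without a no_sync() method also get a no-op context.
        return contextlib.nullcontext()
    return no_sync()


# Usage: with ZeRO stage 2 the block runs without raising the assertion.
engine = FakeZeroEngine(zero_stage=2)
with safe_no_sync(engine, uses_deepspeed=True, zero_stage=2):
    pass  # forward/backward for an accumulation step would go here
```

With this guard, accumulation steps under ZeRO stage 2 or 3 simply run without `no_sync`, while other setups keep skipping the gradient synchronization as before.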
Thanks @alisafaya
Fixes #3481
Before submitting
- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@SunMarc @zach-huggingface