add support for port 0 auto-selection in multi-GPU environments by hellobiondi · Pull Request #3501 · huggingface/accelerate
What does this PR do?
This PR implements support for port 0 auto-selection in multi-GPU environments (`prepare_multi_gpu_env()`). The documentation already mentions that setting port to 0 will automatically select the next available port, but this functionality wasn't actually implemented in the code.
When `main_process_port` is set to 0, the code now:
- Automatically finds an available port through socket binding
- Updates relevant arguments (`master_port`, `rdzv_endpoint`) with the selected port
- Provides a more seamless experience for users working in environments where specific ports might be occupied
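The port selection described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the helper names `find_free_port` and `resolve_main_process_port` are hypothetical, and the localhost-style `rdzv_endpoint` format is an assumption made for the example.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    Hypothetical helper for illustration; not part of accelerate's API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS pick any currently free port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


def resolve_main_process_port(main_process_port: int) -> int:
    """Return the port unchanged unless it is 0, in which case pick a free one."""
    if main_process_port == 0:
        return find_free_port()
    return main_process_port


# Example: propagate the resolved port into launcher-style arguments.
# The dictionary keys mirror the argument names mentioned above; the
# endpoint format is assumed for this sketch.
port = resolve_main_process_port(0)
launch_args = {
    "master_port": port,
    "rdzv_endpoint": f"127.0.0.1:{port}",
}
print(launch_args)
```

Note that the socket is closed before the port is handed to the launcher, so another process could in principle claim the port in between. For typical single-node launches this window is small, but it is worth keeping in mind on busy machines.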
Before submitting
- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the [contributor guideline](https://github.com/huggingface/accelerate/blob/main/CONTRIBUTING.md#submitting-a-pull-request-pr), Pull Request section?
- Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/accelerate/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/accelerate/tree/main/docs#writing-documentation---specification).
- Did you write any new necessary tests?
tests/test_launch.py
Who can review?
@SunMarc @zach-huggingface - This relates to the Command Line Interface and distributed training functionality.