enable big_model_inference on xpu by yao-matrix · Pull Request #3595 · huggingface/accelerate

added 2 commits

May 27, 2025 07:27

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

S1ro1 added a commit that referenced this pull request

Jun 10, 2025

commit 2f8fd72
Author: Simon <80467011+sorgfresser@users.noreply.github.com>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

commit 3a82b05
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit ab3c604
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix quality

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit 273799c
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * add deterministic

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit 43526c5
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix bug

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix review comments

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * format

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit ee2f48c
Author: Fanli Lin <fanli.lin@intel.com>
Date:   Tue May 27 17:16:42 2025 +0800

    [docs] no hard-coded cuda in the ddp documentation (#3589)

    * make device-agnostic

    * refactor

commit 4f3abb7
Author: jiqing-feng <jiqing.feng@intel.com>
Date:   Mon May 26 21:55:10 2025 +0800

    Set ccl and KMP param in simple launch (#3575)

    * Even a 1-CPU machine can also run multiple processes

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix ccl and KMP param setting

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix num process check

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix ccl args check

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    ---------

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

commit db536cb
Author: Yuanzhou Cai <80858000+yuanjua@users.noreply.github.com>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

    * fix style

    ---------

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

commit 4e9d0de
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <luiz.fernando0992@gmail.com>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifacts and figure tracking at MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initiation

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

commit 33967d4
Author: Francesco Laiti <25352428+laitifranz@users.noreply.github.com>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * remove print

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix ci issue

    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>
    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

commit f55f053
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

commit 1ec99f0
Author: Yao Matrix <yaoweifeng0301@126.com>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

S1ro1 added a commit that referenced this pull request

Jul 9, 2025


    Set ccl and KMP param in simple launch (#3575)

    * Even a 1-CPU machine can run multiple processes

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix ccl and KMP param setting

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix num process check

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix ccl args check

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    ---------

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

commit db536cb
Author: Yuanzhou Cai <80858000+yuanjua@users.noreply.github.com>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

    * fix style

    ---------

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

commit 4e9d0de
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <luiz.fernando0992@gmail.com>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifact and figure tracking to the MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initialization

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

commit 33967d4
Author: Francesco Laiti <25352428+laitifranz@users.noreply.github.com>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * remove print

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix ci issue

    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>
    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

commit f55f053
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

commit 1ec99f0
Author: Yao Matrix <yaoweifeng0301@126.com>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>