[megatron] qwen3.5 use megatron-core by Jintao-Huang · Pull Request #8126 · modelscope/ms-swift
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the megatron-core framework by integrating specific support for Qwen3.5 models. It introduces advanced configuration options for attention mechanisms, allowing for fine-grained control over the distribution of linear and scaled dot-product attention layers. These changes ensure that Qwen3.5 models can be effectively built and utilized within the Megatron ecosystem, leveraging its distributed training capabilities.
Highlights
- Qwen3.5 Integration: Enabled Qwen3.5 models to utilize `megatron-core` by configuring specific attention variants, layer normalization, and attention output gating.
- Flexible Linear Attention Frequency: Introduced a new `linear_attn_freq_type` function and `linear_attention_freq` configuration option, allowing highly customizable patterns of linear attention and scaled dot-product attention layers within the model architecture.
- Model Configuration Refinements: Updated `MegatronModelConfig` to include new parameters relevant to Qwen3.5's attention mechanisms, and adjusted the processing of frequency parameters to support string-based pattern definitions.
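The highlights describe `linear_attention_freq` as accepting either an interval or a string-based pattern, in the same spirit as Megatron's `moe_layer_freq`. A minimal sketch of how such a parser might behave (the function name comes from this PR; the exact semantics and implementation below are assumptions, not taken from the diff):

```python
from typing import List, Union


def linear_attn_freq_type(value: Union[int, str]) -> Union[int, List[int]]:
    """Hypothetical sketch: normalize a linear-attention frequency spec.

    Assumed convention: an integer (or digit string) is an interval, while any
    other string is a Python expression evaluating to a per-layer 0/1 list,
    where 1 marks a linear-attention layer and 0 a scaled dot-product
    attention layer, e.g. '([1]*3+[0])*12' -> every 4th of 48 layers is full
    attention.
    """
    if isinstance(value, str):
        try:
            # Pure digits are treated as an integer interval, e.g. '4' -> 4.
            return int(value)
        except ValueError:
            # Evaluate the pattern expression with builtins disabled.
            pattern = eval(value, {"__builtins__": {}}, {})
            if not (isinstance(pattern, list) and all(v in (0, 1) for v in pattern)):
                raise ValueError(f'Invalid linear_attention_freq pattern: {value!r}')
            return pattern
    return value
```

This mirrors how Megatron-style configs let a single CLI/string field describe heterogeneous layer layouts without a separate per-layer list argument.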
Changelog
- swift/megatron/model/gpts/qwen3_next.py
  - Removed a TODO comment related to `layernorm_zero_centered_gamma`.
  - Explicitly set the `enorm` submodule to `Qwen3NextRMSNorm` within `get_mtp_block_spec`.
- swift/megatron/model/model_config.py
  - Added a `linear_attn_freq_type` function with detailed documentation for defining linear attention layer patterns.
  - Changed the default type of `moe_layer_freq` from integer `1` to string `'1'`.
  - Reordered and added new configuration parameters for Qwen3_next/Qwen3_5 models in `MegatronModelConfig`, including `experimental_attention_variant`, `linear_attention_freq`, and `attention_output_gate`.
  - Removed the `layer_types` parameter from `MegatronModelConfig`.
  - Updated `_format_config` to process `linear_attention_freq` using the new `linear_attn_freq_type` function.
  - Modified `_HF_CONFIG_MAP` to correctly map `linear_attention_freq` to `full_attention_interval` and adjust other Qwen3_next related mappings.
  - Revised `convert_hf_config` for `qwen3_next` and `qwen3_5` models to directly set `experimental_attention_variant`, `layernorm_zero_centered_gamma`, `attention_output_gate`, and `linear_attention_freq`.
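The changelog mentions mapping the Hugging Face field `full_attention_interval` onto `linear_attention_freq` during HF-config conversion. A sketch of what that mapping could look like (field names are from the PR description; the conversion logic, the assumed value of `experimental_attention_variant`, and the 0/1 pattern convention are all illustrative assumptions):

```python
def convert_hf_config(hf_config: dict) -> dict:
    """Hypothetical sketch of the qwen3_next HF-config conversion.

    Assumes `full_attention_interval=N` means every N-th layer uses scaled
    dot-product attention and the remaining layers use linear attention,
    encoded as a 0/1 pattern string (1 = linear-attention layer).
    """
    res = {
        'experimental_attention_variant': 'qwen3_next',  # assumed value
        'layernorm_zero_centered_gamma': True,
        'attention_output_gate': True,
    }
    interval = hf_config.get('full_attention_interval')
    if interval is not None:
        num_layers = hf_config['num_hidden_layers']
        # e.g. interval=4, 48 layers -> '([1]*3+[0])*12'
        res['linear_attention_freq'] = (
            f'([1]*{interval - 1}+[0])*{num_layers // interval}'
        )
    return res
```

Encoding the interval as a pattern string keeps a single `linear_attention_freq` field able to express both the regular HF layout and arbitrary hand-written layer mixes.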