feat: second attempt to support DDS and NonZero op by zewenli98 · Pull Request #3388 · pytorch/TensorRT
Description
This PR adds a new path to support data-dependent shape (DDS) outputs and the NonZero op.
Static and dynamic shapes go through the original path; DDS outputs go through the new path, which uses IOutputAllocator.
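For context, TensorRT's IOutputAllocator lets the runtime call back into user code when an output's size is only known at execution time (as with NonZero). Below is a minimal pure-Python sketch of the grow-only allocation pattern; the method names mirror the TensorRT Python callback interface (`reallocate_output`, `notify_shape`), but this mock does not use the real `tensorrt` module and is not the code in this PR.

```python
import math


class GrowingOutputAllocator:
    """Sketch of a grow-only output allocator for DDS outputs.

    The runtime calls reallocate_output() when the actual output byte
    size becomes known, and notify_shape() with the final shape.
    """

    def __init__(self):
        self.buffers = {}  # tensor name -> bytearray standing in for device memory
        self.shapes = {}   # tensor name -> last reported shape

    def reallocate_output(self, tensor_name, current_memory, size, alignment):
        buf = self.buffers.get(tensor_name)
        if buf is None or len(buf) < size:
            # Round up to the alignment and grow; never shrink, so later
            # executions with smaller outputs can reuse the same buffer.
            aligned = alignment * math.ceil(size / alignment)
            buf = bytearray(aligned)
            self.buffers[tensor_name] = buf
        return buf

    def notify_shape(self, tensor_name, shape):
        # Called after execution with the actual (data-dependent) shape.
        self.shapes[tensor_name] = tuple(shape)


alloc = GrowingOutputAllocator()
alloc.reallocate_output("output0", None, 100, 256)  # allocates 256 aligned bytes
alloc.notify_shape("output0", [25, 1])
```

The grow-only policy avoids re-allocating on every enqueue when output sizes fluctuate between runs.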
Fixes #2516
Type of change
- New feature (non-breaking change which adds functionality)
Checklist:
- My code follows the style guidelines of this project (You can use the linters)
- I have performed a self-review of my own code
- I have commented my code, particularly in hard-to-understand areas and hacks
- I have made corresponding changes to the documentation
- I have added tests to verify my fix or my feature
- New and existing unit tests pass locally with my changes
- I have added the relevant labels to my PR so that relevant reviewers are notified
```python
if (
    node != output_node
    and len(node.users) == 0
    and len(node.all_input_nodes) > 0
```
Probably better to add an assert checking that it has only one input (and print the number in the message if it fails).
I previously reused the code from another lowering pass. It looks like we can directly remove unused ops, right?
```python
if (
    node != output_node
    and len(node.users) == 0
    and len(node.all_input_nodes) > 0
):
    gm.graph.erase_node(node)
gm = clean_up_graph_after_modifications(gm)
logger.debug(f"Removed [num_users=0] nodes:\n{gm.graph}")
```
Do you think there are any potential issues?
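The dead-node cleanup being discussed can be sketched against a toy graph structure; `Node` here is a hypothetical stand-in for `torch.fx.Node`, not the actual Torch-TensorRT code, and the condition mirrors the quoted snippet.

```python
class Node:
    """Toy stand-in for torch.fx.Node: tracks inputs and users."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.all_input_nodes = list(inputs)
        self.users = []
        for inp in inputs:
            inp.users.append(self)


def erase_dead_nodes(nodes, output_node):
    """Remove nodes with no users, except the output node and
    placeholder-like nodes that have no inputs."""
    removed = []
    # Iterate in reverse topological order so erasing one dead node can
    # expose its inputs as newly dead within the same pass.
    for node in reversed(nodes):
        if (
            node is not output_node
            and len(node.users) == 0
            and len(node.all_input_nodes) > 0
        ):
            for inp in node.all_input_nodes:
                inp.users.remove(node)
            removed.append(node.name)
    return removed


x = Node("x")             # placeholder: no inputs, kept
dead = Node("dead", [x])  # unused intermediate: removed
out = Node("out", [x])    # graph output: kept
print(erase_dead_nodes([x, dead, out], out))  # -> ['dead']
```

The `len(node.all_input_nodes) > 0` guard is what keeps placeholders alive even when nothing consumes them.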
```diff
     need_cudagraphs_reset,
 ) = self.runtime_states.set_runtime_states(
-    cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
+    self.cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
```
Is use_pre_allocated_outputs still valid now that you're adding the OutputAllocator feature?
I think the OutputAllocator feature will not affect use_pre_allocated_outputs, because I didn't change the behavior of CUDA Graphs, and use_pre_allocated_outputs has its own context manager as well.
```python
raise RuntimeError(
    "Both CUDA Graphs and OutputAllocator are enabled. Please disable either one."
)
if self.use_output_allocator_outputs:
```
How is use_output_allocator_outputs set? Is it set by the user via the with context manager?
Yes, it will be set by the user via the with context manager. If users don't set it, the runtime chooses standard execution or the OutputAllocator path according to the converter decorator.
LGTM
LGTM after minor change