feat: support tiling optimization as of TRT 10.8 by zewenli98 · Pull Request #3444 · pytorch/TensorRT
Description
Support tiling optimization, which is available as of TRT 10.8. For more details, see the TRT doc: https://docs.nvidia.com/deeplearning/tensorrt/10.9.0/inference-library/advanced.html#tiling-optimization
Fixes #3443
Type of change
- New feature (non-breaking change which adds functionality)
Checklist:
- My code follows the style guidelines of this project (You can use the linters)
- I have performed a self-review of my own code
- I have commented my code, particularly in hard-to-understand areas and hacks
- I have made corresponding changes to the documentation
- I have added tests to verify my fix or my feature
- New and existing unit tests pass locally with my changes
- I have added the relevant labels to my PR so that relevant reviewers are notified
if self.compilation_settings.enable_weight_streaming:
    builder_config.set_flag(trt.BuilderFlag.WEIGHT_STREAMING)

if version.parse(trt.__version__) >= version.parse("10.8"):
We can just drop 10.7 instead of having this piecemeal support.
Looks like we do it for some settings but not others, so we need to decide whether we want a versioned builder config or not.
My default stance is no, but if it's not too much work (outside of the 2.7 scope), then we might want to, in which case this can stay.
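The "versioned builder config" question can be illustrated with a minimal sketch, assuming a hypothetical table `MIN_TRT_VERSION` that records the first TensorRT release supporting each setting. Only the 10.8 entries come from this PR; `MIN_TRT_VERSION`, `parse_major_minor`, and `supported_settings` are illustrative names, not Torch-TensorRT APIs.

```python
# Hypothetical sketch: gate builder-config settings on the minimum TensorRT
# version that supports them. Only the 10.8 entries come from this PR; the
# table structure and helper names are assumptions for illustration.
from typing import Dict, Tuple

MIN_TRT_VERSION: Dict[str, Tuple[int, int]] = {
    "tiling_optimization_level": (10, 8),
    "l2_limit_for_tiling": (10, 8),
}


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '10.8.0.43'."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def supported_settings(
    trt_version: str, requested: Dict[str, object]
) -> Dict[str, object]:
    """Keep only the requested settings that the given TensorRT version supports.

    Settings missing from MIN_TRT_VERSION are treated as always supported.
    """
    current = parse_major_minor(trt_version)
    return {
        name: value
        for name, value in requested.items()
        if current >= MIN_TRT_VERSION.get(name, (0, 0))
    }
```

Comparing `(major, minor)` tuples instead of raw strings avoids treating "10.10" as older than "10.8". The PR itself uses `packaging`'s `version.parse`, which also handles this; the sketch sticks to the standard library.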
strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to True, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
enable_weight_streaming (bool): Enable weight streaming.
tiling_optimization_level (int): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. We currently support [0, 1, 2, 3]; the default is 0.
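The integer levels in the docstring above correspond to TensorRT's named tiling levels. A minimal sketch of validating them, assuming TensorRT's `TilingOptimizationLevel` members are NONE, FAST, MODERATE and FULL in that order (worth confirming against the TRT 10.8 API reference); `TILING_LEVEL_NAMES` and `tiling_level_name` are hypothetical helpers.

```python
# Hypothetical sketch: map the documented integer levels [0, 1, 2, 3] onto
# the assumed names of TensorRT's TilingOptimizationLevel members, ordered
# by increasing search effort.
TILING_LEVEL_NAMES = ("NONE", "FAST", "MODERATE", "FULL")


def tiling_level_name(level: int) -> str:
    """Validate an integer tiling level and return the matching level name."""
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(level, int) or isinstance(level, bool):
        raise TypeError(
            f"tiling_optimization_level must be an int, got {type(level).__name__}"
        )
    if not 0 <= level < len(TILING_LEVEL_NAMES):
        raise ValueError(
            "tiling_optimization_level must be in "
            f"[0, {len(TILING_LEVEL_NAMES) - 1}], got {level}"
        )
    return TILING_LEVEL_NAMES[level]
```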
Yeah, using the names is a good idea.
Comment on lines +354 to +356
builder_config.l2_limit_for_tiling = (
    self.compilation_settings.l2_limit_for_tiling
)
If you want to be really safe (once we remove version guarding), you can check whether self.compilation_settings.get("l2_limit_for_tiling", -1) != -1, or something similar.
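A minimal sketch of the sentinel idea in that suggestion, assuming -1 means "not set". `CompilationSettings` below is a hypothetical stand-in dataclass, not Torch-TensorRT's class, and `SimpleNamespace` stands in for the TensorRT builder config. Because a dataclass has no `.get`, `getattr` with a default plays that role.

```python
# Hypothetical sketch: only forward l2_limit_for_tiling to the builder config
# when the user chose a value; -1 is assumed to mean "not set".
from dataclasses import dataclass
from types import SimpleNamespace

L2_LIMIT_UNSET = -1


@dataclass
class CompilationSettings:
    """Stand-in for the real settings object (assumption)."""
    l2_limit_for_tiling: int = L2_LIMIT_UNSET


def apply_l2_limit(settings: object, builder_config: object) -> bool:
    """Copy l2_limit_for_tiling onto builder_config unless it is the sentinel.

    getattr with a default also skips settings objects that lack the field.
    Returns True if the attribute was set.
    """
    limit = getattr(settings, "l2_limit_for_tiling", L2_LIMIT_UNSET)
    if limit == L2_LIMIT_UNSET:
        return False
    builder_config.l2_limit_for_tiling = limit
    return True


# Example: an explicit 1 MiB limit is forwarded to the (stand-in) config.
example_config = SimpleNamespace()  # stand-in for a TensorRT builder config
applied = apply_l2_limit(CompilationSettings(l2_limit_for_tiling=1 << 20), example_config)
```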
@peri044 @HolyWu Thanks for your suggestions. I personally prefer using integers instead of strings because 1) it keeps consistent with another arg, optimization_level; it's kind of weird that one level accepts integers but the other accepts strings, and 2) I think people are better at remembering integers than strings: some people may use "moderate" while others may use "intermediate".
I don't think it matters that it should be consistent with the other optimization API. TensorRT made them different for some reason, but I don't think we need to fix that for them. It should be consistent with TensorRT's API. I think the appropriate choices are strings or an enum.
Got it. trtexec actually uses integers as well. Anyway, I'll change to strings.
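With strings agreed on, the earlier worry about users typing "intermediate" instead of "moderate" can be handled by rejecting unknown names with an error that lists the valid ones. A minimal sketch, assuming the accepted names mirror TensorRT's tiling level names lowercased; `VALID_TILING_LEVELS` and `normalize_tiling_level` are hypothetical.

```python
# Hypothetical sketch: accept tiling levels as strings and reject near-synonyms
# instead of guessing. The accepted names are an assumption mirroring
# TensorRT's tiling level names, lowercased.
from typing import Tuple

VALID_TILING_LEVELS: Tuple[str, ...] = ("none", "fast", "moderate", "full")


def normalize_tiling_level(level: str) -> str:
    """Return the canonical lowercase name, or raise listing the valid choices."""
    if not isinstance(level, str):
        raise TypeError(
            f"tiling_optimization_level must be a str, got {type(level).__name__}"
        )
    name = level.strip().lower()
    if name not in VALID_TILING_LEVELS:
        raise ValueError(
            f"Unknown tiling_optimization_level {level!r}; "
            f"expected one of: {', '.join(VALID_TILING_LEVELS)}"
        )
    return name
```

Listing the valid names in the error message addresses the memorization concern directly. Exposing an enum, the other option raised in the thread, would catch such typos even earlier, at attribute lookup.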
LGTM