feat: support tiling optimization as of TRT 10.8 by zewenli98 · Pull Request #3444 · pytorch/TensorRT

@zewenli98

Description

Support tiling optimization as of TRT 10.8. For more details, see the TRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/10.9.0/inference-library/advanced.html#tiling-optimization

Fixes #3443

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

narendasan

if self.compilation_settings.enable_weight_streaming:
    builder_config.set_flag(trt.BuilderFlag.WEIGHT_STREAMING)

if version.parse(trt.__version__) >= version.parse("10.8"):

We can just drop 10.7 instead of having this piecemeal support.

Looks like we do it for some settings but not others, so we need to decide whether we want a versioned builder config or not.

My default stance is no, but if it's not too much work (outside of 2.7 scope), then we might want to, in which case this can stay.
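For context on the version guard under discussion, here is a minimal sketch of why the check compares parsed versions rather than raw strings. The PR itself uses `version.parse`; the helper names below (`parse_major_minor`, `supports_tiling`, `TILING_MIN_VERSION`) are hypothetical and use only the standard library, so treat this as an illustration rather than the project's implementation.

```python
# Hypothetical stand-in for the version guard; the PR uses version.parse instead.
TILING_MIN_VERSION = (10, 8)  # tiling optimization is available as of TRT 10.8


def parse_major_minor(version_string: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '10.8.0.43'."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def supports_tiling(trt_version: str) -> bool:
    """Compare numerically so that 10.10 is correctly treated as newer than 10.8."""
    return parse_major_minor(trt_version) >= TILING_MIN_VERSION
```

A plain string comparison would get `"10.10.0" >= "10.8"` wrong, because it compares character by character. That is the main reason a guard like this parses the version first.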

peri044

strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
enable_weight_streaming (bool): Enable weight streaming.
tiling_optimization_level (int): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better optimization strategy. (We currently support [0, 1, 2, 3]; the default is 0.)

Yeah using the names is a good idea

Comment on lines +354 to +356

builder_config.l2_limit_for_tiling = (
    self.compilation_settings.l2_limit_for_tiling
)

If you want to be really safe (for when we remove version guarding), you can check whether `self.compilation_settings.get("l2_limit_for_tiling", -1) != -1`, or something similar.
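The defensive check suggested above could look roughly like the sketch below. It assumes the settings object exposes its fields as attributes, so it uses `getattr` where the comment writes `.get`. The `apply_l2_limit` helper, the `L2_LIMIT_UNSET` name, and the `SimpleNamespace` stand-ins for the builder config and settings are all hypothetical. The `-1` sentinel is taken from the comment.

```python
from types import SimpleNamespace

L2_LIMIT_UNSET = -1  # sentinel meaning "not configured", as suggested in the comment


def apply_l2_limit(builder_config, settings) -> bool:
    """Copy l2_limit_for_tiling onto the builder config only when it was set.

    Returns True if the limit was applied, False if it was missing or unset.
    """
    limit = getattr(settings, "l2_limit_for_tiling", L2_LIMIT_UNSET)
    if limit == L2_LIMIT_UNSET:
        return False
    builder_config.l2_limit_for_tiling = limit
    return True


# Usage with stand-in objects (not real TensorRT or Torch-TensorRT types)
config = SimpleNamespace()
applied = apply_l2_limit(config, SimpleNamespace(l2_limit_for_tiling=1024))
```

With this shape, a settings object that lacks the field or still holds the sentinel leaves the builder config untouched.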

@zewenli98

@peri044 @HolyWu Thanks for your suggestions. I personally prefer using integers instead of strings because 1) it stays consistent with the other arg, optimization_level. It's kind of weird that one level accepts integers but the other accepts strings. 2) I think people are better at memorizing integers than strings. Some people may use moderate but others may use intermediate.

@narendasan

I don't think it needs to be consistent with the other optimization API. TensorRT made them different for some reason, but I don't think we need to fix that for them. It should be consistent with TensorRT's API. I think the appropriate choices are strings or an enum.

@zewenli98

Got it. trtexec actually uses integers as well. Anyway, I'll change it to strings.
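As a rough illustration of the string-based option agreed on here, the sketch below resolves a level name to the integer range `[0, 1, 2, 3]` from the docstring. The names `none`, `fast`, `moderate`, and `full` are an assumption based on TensorRT's tiling level naming. The `resolve_tiling_level` helper and the `TILING_LEVELS` table are hypothetical and not taken from the PR.

```python
# Assumed level names; the integer values follow the [0, 1, 2, 3] range in the docstring.
TILING_LEVELS = {"none": 0, "fast": 1, "moderate": 2, "full": 3}


def resolve_tiling_level(name: str) -> int:
    """Map a case-insensitive level name to its integer level, rejecting unknown names."""
    key = name.strip().lower()
    if key not in TILING_LEVELS:
        valid = ", ".join(TILING_LEVELS)
        raise ValueError(f"Unknown tiling optimization level {name!r}; expected one of: {valid}")
    return TILING_LEVELS[key]
```

Rejecting unknown names with an explicit list of valid choices addresses the concern raised earlier that some users might type `intermediate` where `moderate` is expected.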

peri044

LGTM