fix: Add lowering pass to remove output repacking in `convert_method_to_trt_engine` calls by gs-olive · Pull Request #1945 · pytorch/TensorRT
- Automatically remove output repacking for `convert_method_to_trt_engine` calls, improving parity between models that can be converted directly to TRT engines and models that can be fully compiled
- Add a new internal `CompileSpec` argument for lowering that indicates whether the lowering passes originate from a `convert_method_to_trt_engine` call or a regular `compile` call; this determines whether the lowering pass is applied
- Regular TorchScript graphs cannot have this pass applied, as it can break the output graph: newer versions of Torch disallow graph outputs with 0 or 2+ arguments that are not packed in a struct
- The current lowering pass detects outputs that are flat Lists or Tuples of Tensors and returns the outputs as-is (direct from the TRT engine), so the entire model can be converted to a single TRT engine
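
To illustrate the idea behind the pass, here is a minimal sketch on a toy graph model in plain Python. It is not the Torch-TensorRT implementation (the actual lowering pass operates on the TorchScript IR), and the `Value`, `Node`, `Graph` classes, the `remove_output_repacking` function, and the `converting_to_trt_engine` flag are hypothetical names used only for illustration. The sketch assumes the packing node appears as `prim::TupleConstruct` or `prim::ListConstruct` and that the pass is gated on a flag carried by the compile settings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Value:
    name: str
    type: str  # simplified type tag, e.g. "Tensor", "Tuple", "List"


@dataclass
class Node:
    kind: str
    inputs: List[Value]
    output: Value


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    outputs: List[Value] = field(default_factory=list)


# Node kinds that pack several values into one collection (assumed names).
PACKING_KINDS = {"prim::TupleConstruct", "prim::ListConstruct"}


def remove_output_repacking(graph: Graph, converting_to_trt_engine: bool) -> bool:
    """Unpack a flat Tuple/List-of-Tensors graph output. Returns True if changed."""
    # Hypothetical gate: only apply when lowering for a direct engine conversion,
    # since a regular TorchScript graph must keep multiple outputs packed.
    if not converting_to_trt_engine:
        return False
    if len(graph.outputs) != 1:
        return False

    packed = graph.outputs[0]
    producer = next((n for n in graph.nodes if n.output is packed), None)
    if producer is None or producer.kind not in PACKING_KINDS:
        return False

    # Only flat collections of Tensors qualify; nested collections are left alone.
    if not all(v.type == "Tensor" for v in producer.inputs):
        return False

    # Leave the graph untouched if another node still consumes the packed value.
    if any(v is packed for n in graph.nodes for v in n.inputs):
        return False

    # Return the tensors directly and drop the now-unused packing node.
    graph.outputs = list(producer.inputs)
    graph.nodes = [n for n in graph.nodes if n is not producer]
    return True
```

With the flag set, a graph whose single output is a tuple of two tensors ends up with those two tensors as its outputs and no packing node; with the flag unset, the graph is left unchanged, mirroring the distinction between `convert_method_to_trt_engine` and `compile` described above.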