fix: Add lowering pass to remove output repacking in `convert_method_to_trt_engine` calls by gs-olive · Pull Request #1945 · pytorch/TensorRT

- Automatically remove output repacking for
`convert_method_to_trt_engine` calls, improving parity between models
that can be converted directly to TRT engines and models that can be
fully compiled
- Add a new internal `CompileSpec` argument for lowering that indicates
whether the lowering passes originate from a
`convert_method_to_trt_engine` call or a regular `compile` call; this
determines whether the new lowering pass is applied
- This pass cannot be applied to regular TorchScript graphs, as it can
break the output graph: newer versions of Torch disallow graph
outputs with 0 or 2+ arguments that are not packed in a struct
- The current lowering pass detects outputs that are flat Lists or Tuples
of Tensors and returns them as-is (directly from the TRT engine),
so the entire model can be converted to a single TRT engine
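
The behavior described in the bullets above can be sketched in plain Python. This is an illustrative toy model only, not the code in this PR: the `Value`, `Node`, and `Graph` classes, the `pack` helper, and the `converting_to_trt_engine` flag are all hypothetical stand-ins for the TorchScript graph and the new internal `CompileSpec` argument. The node kinds `prim::TupleConstruct` and `prim::ListConstruct` are the TorchScript operations that pack values into a tuple or list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Value:
    """Hypothetical stand-in for a graph value."""
    name: str
    type: str  # "Tensor", "Tuple", or "List" in this toy model
    producer: Optional["Node"] = None  # node that created this value, if any


@dataclass
class Node:
    """Hypothetical stand-in for a graph node."""
    kind: str
    inputs: List[Value]


@dataclass
class Graph:
    """Hypothetical stand-in for a graph; only outputs are modeled."""
    outputs: List[Value]


# TorchScript node kinds that pack values into a tuple or a list.
PACKING_KINDS = {"prim::TupleConstruct", "prim::ListConstruct"}


def pack(kind: str, values: List[Value], out_type: str) -> Value:
    """Build a packed output value produced by a packing node."""
    return Value("packed", out_type, producer=Node(kind, values))


def remove_output_repacking(graph: Graph, converting_to_trt_engine: bool) -> bool:
    """Unpack a flat List/Tuple-of-Tensors output; return True if changed."""
    # Assumption: the flag mirrors the new internal CompileSpec argument.
    # For a regular compile call the graph is left untouched.
    if not converting_to_trt_engine:
        return False
    if len(graph.outputs) != 1:
        return False
    node = graph.outputs[0].producer
    if node is None or node.kind not in PACKING_KINDS:
        return False
    # Only flat collections qualify: every packed element must be a Tensor.
    if any(v.type != "Tensor" for v in node.inputs):
        return False
    # Return the tensors as-is instead of the packed struct.
    graph.outputs = list(node.inputs)
    return True


a, b = Value("a", "Tensor"), Value("b", "Tensor")

# Engine conversion: the tuple is removed and the tensors become outputs.
engine_graph = Graph([pack("prim::TupleConstruct", [a, b], "Tuple")])
engine_changed = remove_output_repacking(engine_graph, True)

# Regular compile: the packed output is preserved.
compile_graph = Graph([pack("prim::TupleConstruct", [a, b], "Tuple")])
compile_changed = remove_output_repacking(compile_graph, False)

# Nested collection: not flat, so the pass does not apply.
inner = pack("prim::ListConstruct", [a], "List")
nested_graph = Graph([pack("prim::TupleConstruct", [inner, b], "Tuple")])
nested_changed = remove_output_repacking(nested_graph, True)
```

In this sketch, gating on the flag reflects the constraint in the third bullet: a regular TorchScript graph keeps its packed output, because returning 2+ unpacked values would produce a graph that newer versions of Torch reject.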