Add ONNX export support for baseline TRT benchmarking #17771
larryliu0820 wants to merge 27 commits into main
Conversation
Summary: Establish the foundation for integrating TensorRT as an ExecuTorch delegate backend. TensorRT enables GPU-accelerated inference on NVIDIA hardware. This initial commit sets up the project structure and build configuration to support both OSS (CMake) and internal (Buck) builds, without functional code. Differential Revision: D93275046
Summary: Add initial backend stub with empty preprocess implementation and TensorRT backend registration. Differential Revision: D93275053
Summary: Add partitioner stub that identifies and partitions TensorRT-compatible subgraphs. Differential Revision: D93275059
Summary: Add OperatorSupport class to define which operators the TensorRT backend can handle. Differential Revision: D93275052
Summary: Integrate the TensorRT partitioner with the export pipeline for graph partitioning. Differential Revision: D93275061
Summary: Add infrastructure for registering TensorRT converters for PyTorch operations. Differential Revision: D93275041
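The converter-registration infrastructure described above can be sketched as a simple decorator-based registry. This is an illustrative sketch only; the names (`register_converter`, `get_converter`, the registry dict) are assumptions, not the actual ExecuTorch/TensorRT API from the diff.

```python
# Hypothetical converter registry; names are illustrative, not the
# actual ExecuTorch/TensorRT backend API.
from typing import Callable, Dict

# Maps an operator name (e.g. "aten.add.Tensor") to its converter.
_CONVERTER_REGISTRY: Dict[str, Callable] = {}

def register_converter(op_name: str):
    """Decorator that records a converter for a PyTorch operator."""
    def wrap(fn: Callable) -> Callable:
        _CONVERTER_REGISTRY[op_name] = fn
        return fn
    return wrap

def get_converter(op_name: str) -> Callable:
    """Look up the converter for an operator, failing loudly if absent."""
    try:
        return _CONVERTER_REGISTRY[op_name]
    except KeyError:
        raise NotImplementedError(f"No TensorRT converter for {op_name}")

@register_converter("aten.add.Tensor")
def convert_add(network, args):
    # A real converter would emit TensorRT layers (e.g. an elementwise
    # sum) into `network`; this placeholder just echoes its inputs.
    return ("elementwise_sum", args)
```

A partitioner can then use `get_converter` (or a membership check on the registry) to decide which nodes the backend supports.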
Summary: Add utility functions for TensorRT converter implementations and the add operation converter. Differential Revision: D93275044
Summary: Add TensorRTCompileSpec dataclass to configure TensorRT compilation options. Differential Revision: D93275056
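A compile-spec dataclass like the one described typically bundles precision and shape options into one object. The sketch below is an assumption about its shape; the field names (`fp16`, `workspace_size`, `input_shapes`) are illustrative and may not match the actual `TensorRTCompileSpec` definition.

```python
# Illustrative sketch of a compile-spec dataclass; field names are
# assumptions, not the actual TensorRTCompileSpec from the diff.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TensorRTCompileSpec:
    # Whether the engine may run layers in FP16 precision.
    fp16: bool = False
    # Scratch memory TensorRT may allocate during engine build (bytes).
    workspace_size: int = 1 << 30
    # Optimization profiles: (min, opt, max) shapes per input tensor.
    input_shapes: List[Tuple[Tuple[int, ...], ...]] = field(default_factory=list)

    def to_backend_options(self) -> dict:
        """Flatten the spec into the key/value form a backend consumes."""
        return {
            "fp16": self.fp16,
            "workspace_size": self.workspace_size,
            "input_shapes": self.input_shapes,
        }
```

Using a frozen set of typed fields instead of a free-form dict keeps compilation options discoverable and validated at construction time.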
Summary: Implement the basic preprocess method for the TensorRT backend. Differential Revision: D93275049
Summary: Add blob serialization format for TensorRT engines with I/O binding metadata. Differential Revision: D93275062
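A blob format carrying an engine plus I/O-binding metadata can be sketched as a fixed header followed by JSON metadata and the raw engine bytes. This layout (the `TRTB` magic, the header struct, JSON metadata) is a hypothetical illustration, not the actual serialization format from the diff.

```python
# Hypothetical blob layout: fixed header (magic, metadata length),
# then JSON I/O-binding metadata, then raw engine bytes. The actual
# format in the diff may differ.
import json
import struct

MAGIC = b"TRTB"
HEADER = struct.Struct("<4sI")  # 4-byte magic + u32 metadata length

def pack_blob(engine: bytes, io_metadata: dict) -> bytes:
    """Serialize an engine and its I/O binding metadata into one blob."""
    meta = json.dumps(io_metadata).encode("utf-8")
    return HEADER.pack(MAGIC, len(meta)) + meta + engine

def unpack_blob(blob: bytes):
    """Split a blob back into (engine_bytes, io_metadata)."""
    magic, meta_len = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a TensorRT engine blob")
    meta_start = HEADER.size
    meta = json.loads(blob[meta_start:meta_start + meta_len])
    engine = blob[meta_start + meta_len:]
    return engine, meta
```

Keeping the binding metadata alongside the engine lets the runtime reconstruct input/output tensor order without re-inspecting the engine.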
Summary: Complete preprocess integration with blob serialization for TensorRT engine compilation. Differential Revision: D93275051
Summary: Add complete C++ runtime infrastructure for TensorRT backend. Differential Revision: D93275039
Summary: Add complete C++ runner example for TensorRT-accelerated model inference. Also sets up GitHub Actions CI workflow for automated builds on NVIDIA GPUs. Differential Revision: D93275050
Differential Revision: D93275045
Summary: Add converters for addmm (fused add + matrix multiply) and permute_copy operations to enable support for linear layers in neural networks. Differential Revision: D93275060
Summary: Add converters and optimizations to enable MobileNetV3 model support with the TensorRT backend. Differential Revision: D93275043
Summary: Add converters for embedding, expand, and upsample operations. These enable transformer-based models and upsampling layers. Differential Revision: D93275040
Summary: Add converters for layer_norm and pixel_shuffle operations. These enable transformer-based models and super-resolution models like EDSR. Differential Revision: D93275054
Summary: Add Scaled Dot-Product Attention (SDPA) converter to enable transformer-based attention layers. Differential Revision: D93275047
Summary: Add batch matrix multiplication (bmm) converter to enable transformer attention layers and batch matrix operations. Differential Revision: D93275048
Summary: Add converters for comparison and slicing operations. Differential Revision: D93275055
Summary: Add comprehensive correctness tests for TensorRT backend including model export and inference validation. Differential Revision: D93275042
Summary: Add benchmark runners and infrastructure for measuring TensorRT inference performance including latency and throughput metrics. Differential Revision: D93275038
Summary: Add converters for power and unary operations to support additional model architectures. Differential Revision: D93275058
Summary: Share a single CUDA stream across all TensorRT delegate instances instead of creating a per-delegate stream. This improves performance for serialized execution (the common case) by eliminating synchronization overhead between subgraphs. Differential Revision: D93778115
Summary: Bring torch-tensorrt 2.10.0 into fbsource as a pure-Python buck target for benchmarking torch-tensorrt vs ExecuTorch TensorRT backend. The Python sources are extracted from the PyPI wheel with minimal patches: - __init__.py: skip libtorchtrt.so loading (dynamo IR works without C++ runtime) - _TorchTensorRTModule.py: hardcode runtime constants (C++ ops not available) - _features.py: handle missing tensorrt._package_name attribute The library depends on //caffe2:torch and //deeplearning/trt/python:py_tensorrt (which provides TensorRT 10.3 Python bindings via the existing PACKAGE config). Also updates export.py to support custom models via --custom-model module:callable, and adds create_mdm_for_benchmark() factory to MDM utils for benchmarking. Differential Revision: D94608890
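Resolving a `--custom-model module:callable` argument can be done with `importlib`, splitting on the colon and importing the named module. The helper name below is a sketch under that assumption; the real export.py logic may differ.

```python
# Sketch of resolving a "--custom-model module:callable" argument;
# the function name and error handling are illustrative.
import importlib

def resolve_custom_model(spec: str):
    """Split 'pkg.module:callable' and return the named callable."""
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

For example, `resolve_custom_model("my_models:build_resnet")` would import `my_models` and return its `build_resnet` factory (both names hypothetical).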
Summary: Add --onnx flag to export.py that also exports models to ONNX format alongside the standard .pte export. This enables baseline TRT benchmarking via the benchmark binary which can compile and benchmark ONNX models directly using TensorRT's native ONNX parser. The ONNX export uses torch.onnx.export() which is built into PyTorch, so no additional dependencies are needed. Differential Revision: D94703568
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17771
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures, 2 Unrelated Failures as of commit c08953c with merge base 9f2f005.
NEW FAILURES: the following jobs have failed.
BROKEN TRUNK: the following jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94703568.