
Add ONNX export support for baseline TRT benchmarking #17771

Open

larryliu0820 wants to merge 27 commits into main from export-D94703568

Conversation

@larryliu0820
Contributor

Summary:
Add --onnx flag to export.py that also exports models to ONNX format
alongside the standard .pte export. This enables baseline TRT benchmarking
via the benchmark binary which can compile and benchmark ONNX models
directly using TensorRT's native ONNX parser.

The ONNX export uses torch.onnx.export() which is built into PyTorch,
so no additional dependencies are needed.

Differential Revision: D94703568
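
For reference, a minimal sketch of the kind of call the --onnx path relies on; the toy model, example inputs, output filename, and opset below are illustrative assumptions, not the actual export.py code.

```python
import torch

# Toy stand-in model; export.py exports the real benchmark models instead.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# torch.onnx.export() ships with PyTorch, so no extra dependency is needed.
torch.onnx.export(
    model,
    example_inputs,
    "model.onnx",          # written alongside the standard .pte artifact
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```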

Summary:
Establish the foundation for integrating TensorRT as an ExecuTorch delegate backend.

TensorRT enables GPU-accelerated inference on NVIDIA hardware. This initial
commit sets up the project structure and build configuration to support both
OSS (CMake) and internal (Buck) builds, without functional code.

Differential Revision: D93275046
Summary: Add initial backend stub with empty preprocess implementation and TensorRT backend registration.

Differential Revision: D93275053
Summary: Add partitioner stub that identifies and partitions TensorRT-compatible subgraphs.

Differential Revision: D93275059
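
A hedged sketch of how such a partitioner can be expressed on top of torch.fx's capability-based partitioning; the function name and the allows_single_node_partition setting are assumptions, not the stub's actual contents.

```python
import torch
from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner
from torch.fx.passes.operator_support import OperatorSupportBase


def propose_trt_partitions(gm: torch.fx.GraphModule, op_support: OperatorSupportBase):
    # Group maximal connected subgraphs whose nodes the backend claims to support.
    partitioner = CapabilityBasedPartitioner(
        gm, op_support, allows_single_node_partition=True
    )
    return partitioner.propose_partitions()
```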
Summary: Add OperatorSupport class to define which operators the TensorRT backend can handle.

Differential Revision: D93275052
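
A small illustrative version of such a class, built on torch.fx's OperatorSupport base; the class name and the allow-listed ops are placeholders, not the backend's real support set.

```python
import torch
from torch.fx.passes.operator_support import OperatorSupport


class TensorRTOperatorSupportSketch(OperatorSupport):
    # Placeholder allow-list; the backend's actual set is defined in the diff.
    _SUPPORTED = {
        torch.ops.aten.add.Tensor,
        torch.ops.aten.addmm.default,
        torch.ops.aten.permute_copy.default,
    }

    def is_node_supported(self, submodules, node: torch.fx.Node) -> bool:
        # Only call_function nodes map to converters; everything else stays
        # on the default (non-delegated) path.
        return node.op == "call_function" and node.target in self._SUPPORTED
```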
Summary: Integrate the TensorRT partitioner with the export pipeline for graph partitioning.

Differential Revision: D93275061
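
Roughly, the integration point follows the standard ExecuTorch lowering flow sketched below; the helper name is made up, and the partitioner instance is passed in rather than imported, since its module path isn't shown here.

```python
import torch
from executorch.exir import to_edge


def lower_with_trt(model: torch.nn.Module, example_inputs, partitioner):
    exported = torch.export.export(model.eval(), example_inputs)
    edge = to_edge(exported)
    edge = edge.to_backend(partitioner)  # delegate TRT-supported subgraphs
    return edge.to_executorch()          # produce the program serialized into .pte
```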
Summary: Add infrastructure for registering TensorRT converters for PyTorch operations.

Differential Revision: D93275041
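
One common shape for such infrastructure is a decorator-backed registry keyed by aten target; the names below are illustrative, not the diff's actual API.

```python
from typing import Callable, Dict

_CONVERTER_REGISTRY: Dict[object, Callable] = {}


def register_trt_converter(target) -> Callable:
    """Associate an aten target with a function that emits TensorRT layers."""

    def wrap(fn: Callable) -> Callable:
        _CONVERTER_REGISTRY[target] = fn
        return fn

    return wrap


def get_trt_converter(target):
    return _CONVERTER_REGISTRY.get(target)
```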
Summary: Add utility functions for TensorRT converter implementations and the add operation converter.

Differential Revision: D93275044
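
To give a flavor of what an add converter involves, here is a hedged sketch using TensorRT's Python network API; the function signature, and how tensors are resolved from the fx node, are assumptions.

```python
import tensorrt as trt


def convert_add(
    network: trt.INetworkDefinition, lhs: trt.ITensor, rhs: trt.ITensor
) -> trt.ITensor:
    # aten.add maps directly onto an element-wise SUM layer.
    layer = network.add_elementwise(lhs, rhs, trt.ElementWiseOperation.SUM)
    return layer.get_output(0)
```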
Summary: Add TensorRTCompileSpec dataclass to configure TensorRT compilation options.

Differential Revision: D93275056
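
An illustrative shape for such a dataclass; the field names and defaults here are assumptions, not the actual TensorRTCompileSpec definition.

```python
from dataclasses import dataclass


@dataclass
class TensorRTCompileSpecSketch:
    fp16: bool = False             # build the engine with FP16 kernels enabled
    workspace_size: int = 1 << 30  # builder workspace budget, in bytes
    max_batch_size: int = 1        # largest batch size the engine must support
```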
Summary: Implement the basic preprocess method for the TensorRT backend.

Differential Revision: D93275049
Summary: Add blob serialization format for TensorRT engines with I/O binding metadata.

Differential Revision: D93275062
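
A toy sketch of one way to frame an engine blob together with binding metadata: a fixed-size header, JSON-encoded binding info, then the raw serialized engine. The layout and field names are assumptions, not the PR's actual format.

```python
import json
import struct

_HEADER = "<QQ"  # metadata length, engine length (little-endian uint64s)


def pack_engine_blob(engine_bytes: bytes, bindings: dict) -> bytes:
    meta = json.dumps(bindings).encode("utf-8")
    return struct.pack(_HEADER, len(meta), len(engine_bytes)) + meta + engine_bytes


def unpack_engine_blob(blob: bytes):
    meta_len, engine_len = struct.unpack_from(_HEADER, blob, 0)
    offset = struct.calcsize(_HEADER)
    bindings = json.loads(blob[offset:offset + meta_len])
    engine = blob[offset + meta_len:offset + meta_len + engine_len]
    return engine, bindings
```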
Summary: Complete preprocess integration with blob serialization for TensorRT engine compilation.

Differential Revision: D93275051
Summary: Add complete C++ runtime infrastructure for TensorRT backend.

Differential Revision: D93275039
Summary: Add complete C++ runner example for TensorRT-accelerated model inference. Also sets up GitHub Actions CI workflow for automated builds on NVIDIA GPUs.

Differential Revision: D93275050
Summary: Add converters for addmm (fused add + matrix multiply) and permute_copy operations to enable support for linear layers in neural networks.

Differential Revision: D93275060
Summary: Add converters and optimizations to enable MobileNetV3 model support with the TensorRT backend.

Differential Revision: D93275043
Summary: Add converters for embedding, expand, and upsample operations. These enable transformer-based models and upsampling layers.

Differential Revision: D93275040
Summary: Add converters for layer_norm and pixel_shuffle operations. These enable transformer-based models and super-resolution models like EDSR.

Differential Revision: D93275054
Summary: Add Scaled Dot-Product Attention (SDPA) converter to enable transformer-based attention layers.

Differential Revision: D93275047
Summary: Add batch matrix multiplication (bmm) converter to enable transformer attention layers and batch matrix operations.

Differential Revision: D93275048
Summary: Add converters for comparison and slicing operations.

Differential Revision: D93275055
Summary: Add comprehensive correctness tests for TensorRT backend including model export and inference validation.

Differential Revision: D93275042
Summary: Add benchmark runners and infrastructure for measuring TensorRT inference performance including latency and throughput metrics.

Differential Revision: D93275038
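
For orientation, a toy latency/throughput loop of the kind such runners are built around; the warmup and iteration counts are arbitrary, and `run_once` stands in for whatever actually executes the model.

```python
import time


def benchmark(run_once, warmup: int = 10, iters: int = 100):
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000.0  # average per-inference latency
    throughput = iters / elapsed           # inferences per second
    return latency_ms, throughput
```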
Summary: Add converters for power and unary operations to support additional model architectures.

Differential Revision: D93275058
Summary: Share a single CUDA stream across all TensorRT delegate instances instead of creating a per-delegate stream. This improves performance for serialized execution (the common case) by eliminating synchronization overhead between subgraphs.

Differential Revision: D93778115
Summary:
Bring torch-tensorrt 2.10.0 into fbsource as a pure-Python buck target for
benchmarking torch-tensorrt vs ExecuTorch TensorRT backend.

The Python sources are extracted from the PyPI wheel with minimal patches:
- __init__.py: skip libtorchtrt.so loading (dynamo IR works without C++ runtime)
- _TorchTensorRTModule.py: hardcode runtime constants (C++ ops not available)
- _features.py: handle missing tensorrt._package_name attribute

The library depends on //caffe2:torch and //deeplearning/trt/python:py_tensorrt
(which provides TensorRT 10.3 Python bindings via the existing PACKAGE config).

Also updates export.py to support custom models via --custom-model module:callable,
and adds create_mdm_for_benchmark() factory to MDM utils for benchmarking.

Differential Revision: D94608890
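
A hedged sketch of how a module:callable spec can be resolved; the flag name comes from the summary above, while the helper and its behavior are illustrative assumptions.

```python
import importlib


def load_custom_model(spec: str):
    # spec looks like "my_package.my_module:build_model"
    module_name, _, attr = spec.partition(":")
    factory = getattr(importlib.import_module(module_name), attr)
    return factory()  # call the factory to obtain the model instance
```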
Summary:
Add --onnx flag to export.py that also exports models to ONNX format
alongside the standard .pte export. This enables baseline TRT benchmarking
via the benchmark binary which can compile and benchmark ONNX models
directly using TensorRT's native ONNX parser.

The ONNX export uses torch.onnx.export() which is built into PyTorch,
so no additional dependencies are needed.

Differential Revision: D94703568
@pytorch-bot

pytorch-bot bot commented Feb 28, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17771

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 2 Unrelated Failures

As of commit c08953c with merge base 9f2f005:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Feb 28, 2026
@meta-codesync
Contributor

meta-codesync bot commented Feb 28, 2026

@larryliu0820 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94703568.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track of your important work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


Labels

CLA Signed, fb-exported, fx, meta-exported
