Add ONNX export support for baseline TRT benchmarking #17771
larryliu0820 wants to merge 27 commits into main
Conversation
Summary: Establish the foundation for integrating TensorRT as an ExecuTorch delegate backend. TensorRT enables GPU-accelerated inference on NVIDIA hardware. This initial commit sets up the project structure and build configuration to support both OSS (CMake) and internal (Buck) builds, without functional code. Differential Revision: D93275046
Summary: Add initial backend stub with empty preprocess implementation and TensorRT backend registration. Differential Revision: D93275053
Summary: Add partitioner stub that identifies and partitions TensorRT-compatible subgraphs. Differential Revision: D93275059
Summary: Add OperatorSupport class to define which operators the TensorRT backend can handle. Differential Revision: D93275052
Summary: Integrate the TensorRT partitioner with the export pipeline for graph partitioning. Differential Revision: D93275061
Summary: Add infrastructure for registering TensorRT converters for PyTorch operations. Differential Revision: D93275041
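The converter-registration infrastructure described above can be sketched as a simple decorator-based registry. This is an illustrative sketch only; the names (`register_converter`, `get_converter`, the registry dict) are assumptions, not the actual ExecuTorch/TensorRT API from the diff.

```python
# Hypothetical converter registry; names are illustrative, not the
# actual ExecuTorch/TensorRT backend API.
from typing import Callable, Dict

# Maps an operator name (e.g. "aten.add.Tensor") to its converter.
_CONVERTER_REGISTRY: Dict[str, Callable] = {}

def register_converter(op_name: str):
    """Decorator that records a converter for a PyTorch operator."""
    def wrap(fn: Callable) -> Callable:
        _CONVERTER_REGISTRY[op_name] = fn
        return fn
    return wrap

def get_converter(op_name: str) -> Callable:
    """Look up the converter for an operator, failing loudly if absent."""
    try:
        return _CONVERTER_REGISTRY[op_name]
    except KeyError:
        raise NotImplementedError(f"No TensorRT converter for {op_name}")

@register_converter("aten.add.Tensor")
def convert_add(network, args):
    # A real converter would emit TensorRT layers (e.g. an elementwise
    # sum) into `network`; this placeholder just echoes its inputs.
    return ("elementwise_sum", args)
```

A partitioner can then use `get_converter` (or a membership check on the registry) to decide which nodes the backend supports.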
Summary: Add utility functions for TensorRT converter implementations and the add operation converter. Differential Revision: D93275044
Summary: Add TensorRTCompileSpec dataclass to configure TensorRT compilation options. Differential Revision: D93275056
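A compile-spec dataclass like the one described typically bundles precision and shape options into one object. The sketch below is an assumption about its shape; the field names (`fp16`, `workspace_size`, `input_shapes`) are illustrative and may not match the actual `TensorRTCompileSpec` definition.

```python
# Illustrative sketch of a compile-spec dataclass; field names are
# assumptions, not the actual TensorRTCompileSpec from the diff.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TensorRTCompileSpec:
    # Whether the engine may run layers in FP16 precision.
    fp16: bool = False
    # Scratch memory TensorRT may allocate during engine build (bytes).
    workspace_size: int = 1 << 30
    # Optimization profiles: (min, opt, max) shapes per input tensor.
    input_shapes: List[Tuple[Tuple[int, ...], ...]] = field(default_factory=list)

    def to_backend_options(self) -> dict:
        """Flatten the spec into the key/value form a backend consumes."""
        return {
            "fp16": self.fp16,
            "workspace_size": self.workspace_size,
            "input_shapes": self.input_shapes,
        }
```

Using a frozen set of typed fields instead of a free-form dict keeps compilation options discoverable and validated at construction time.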
Summary: Implement the basic preprocess method for the TensorRT backend. Differential Revision: D93275049
Summary: Add blob serialization format for TensorRT engines with I/O binding metadata. Differential Revision: D93275062
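A blob format carrying an engine plus I/O-binding metadata can be sketched as a fixed header followed by JSON metadata and the raw engine bytes. This layout (the `TRTB` magic, the header struct, JSON metadata) is a hypothetical illustration, not the actual serialization format from the diff.

```python
# Hypothetical blob layout: fixed header (magic, metadata length),
# then JSON I/O-binding metadata, then raw engine bytes. The actual
# format in the diff may differ.
import json
import struct

MAGIC = b"TRTB"
HEADER = struct.Struct("<4sI")  # 4-byte magic + u32 metadata length

def pack_blob(engine: bytes, io_metadata: dict) -> bytes:
    """Serialize an engine and its I/O binding metadata into one blob."""
    meta = json.dumps(io_metadata).encode("utf-8")
    return HEADER.pack(MAGIC, len(meta)) + meta + engine

def unpack_blob(blob: bytes):
    """Split a blob back into (engine_bytes, io_metadata)."""
    magic, meta_len = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a TensorRT engine blob")
    meta_start = HEADER.size
    meta = json.loads(blob[meta_start:meta_start + meta_len])
    engine = blob[meta_start + meta_len:]
    return engine, meta
```

Keeping the binding metadata alongside the engine lets the runtime reconstruct input/output tensor order without re-inspecting the engine.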
Summary: Complete preprocess integration with blob serialization for TensorRT engine compilation. Differential Revision: D93275051
Summary: Add complete C++ runtime infrastructure for TensorRT backend. Differential Revision: D93275039
Summary: Add complete C++ runner example for TensorRT-accelerated model inference. Also sets up GitHub Actions CI workflow for automated builds on NVIDIA GPUs. Differential Revision: D93275050
Differential Revision: D93275045
Summary: Add converters for addmm (fused add + matrix multiply) and permute_copy operations to enable support for linear layers in neural networks. Differential Revision: D93275060
Summary: Add converters and optimizations to enable MobileNetV3 model support with the TensorRT backend. Differential Revision: D93275043
Summary: Add converters for embedding, expand, and upsample operations. These enable transformer-based models and upsampling layers. Differential Revision: D93275040
Summary: Add converters for layer_norm and pixel_shuffle operations. These enable transformer-based models and super-resolution models like EDSR. Differential Revision: D93275054
Summary: Add Scaled Dot-Product Attention (SDPA) converter to enable transformer-based attention layers. Differential Revision: D93275047
Summary: Add batch matrix multiplication (bmm) converter to enable transformer attention layers and batch matrix operations. Differential Revision: D93275048
Summary: Add converters for comparison and slicing operations. Differential Revision: D93275055
Summary: Add comprehensive correctness tests for TensorRT backend including model export and inference validation. Differential Revision: D93275042
Summary: Add benchmark runners and infrastructure for measuring TensorRT inference performance including latency and throughput metrics. Differential Revision: D93275038
Summary: Add converters for power and unary operations to support additional model architectures. Differential Revision: D93275058
Summary: Share a single CUDA stream across all TensorRT delegate instances instead of creating a per-delegate stream. This improves performance for serialized execution (the common case) by eliminating synchronization overhead between subgraphs. Differential Revision: D93778115
Summary: Bring torch-tensorrt 2.10.0 into fbsource as a pure-Python buck target for benchmarking torch-tensorrt vs ExecuTorch TensorRT backend. The Python sources are extracted from the PyPI wheel with minimal patches: - __init__.py: skip libtorchtrt.so loading (dynamo IR works without C++ runtime) - _TorchTensorRTModule.py: hardcode runtime constants (C++ ops not available) - _features.py: handle missing tensorrt._package_name attribute The library depends on //caffe2:torch and //deeplearning/trt/python:py_tensorrt (which provides TensorRT 10.3 Python bindings via the existing PACKAGE config). Also updates export.py to support custom models via --custom-model module:callable, and adds create_mdm_for_benchmark() factory to MDM utils for benchmarking. Differential Revision: D94608890
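Resolving a `--custom-model module:callable` argument can be done with `importlib`, splitting on the colon and importing the named module. The helper name below is a sketch under that assumption; the real export.py logic may differ.

```python
# Sketch of resolving a "--custom-model module:callable" argument;
# the function name and error handling are illustrative.
import importlib

def resolve_custom_model(spec: str):
    """Split 'pkg.module:callable' and return the named callable."""
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

For example, `resolve_custom_model("my_models:build_resnet")` would import `my_models` and return its `build_resnet` factory (both names hypothetical).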
Summary: Add --onnx flag to export.py that also exports models to ONNX format alongside the standard .pte export. This enables baseline TRT benchmarking via the benchmark binary which can compile and benchmark ONNX models directly using TensorRT's native ONNX parser. The ONNX export uses torch.onnx.export() which is built into PyTorch, so no additional dependencies are needed. Differential Revision: D94703568
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17771
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures, 2 Unrelated Failures as of commit c08953c with merge base 9f2f005.
NEW FAILURES: the following jobs have failed.
BROKEN TRUNK: the following jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94703568.