Preview branch that addresses a number of reported bugs by narendasan · Pull Request #4138 · pytorch/TensorRT

narendasan · 2026-03-19T00:28:05Z

Description

Previews complex tensor support, RMS norm support, and the latest main, and ports the profiler from the legacy profiler to the new Kineto-based one in PyTorch.

Fixes #4137 #4135

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

…mplex numerics, including complex tensor I/O

Introduce new infrastructure in the replace complex pass to handle cases where simply unpacking complex tensors is not sufficient to support the numerics correctly. The pass now also captures metadata about the original call signature so that the original calling convention is preserved during graph construction and the runtimes do not need any specialization to support complex types.
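As a rough illustration of why unpacking alone can be insufficient, the sketch below represents each complex value as a (real, imag) pair and compares a naive elementwise product of the parts with true complex multiplication. This is a standalone toy in plain Python, not the pass itself; all function names here are hypothetical and do not come from the PR.

```python
from typing import List, Tuple

# Hypothetical illustration only; none of these names come from the PR.
Pair = Tuple[float, float]  # (real, imag): the "unpacked" form of one complex value


def unpack(values: List[complex]) -> List[Pair]:
    """Split each complex value into a (real, imag) pair."""
    return [(v.real, v.imag) for v in values]


def repack(pairs: List[Pair]) -> List[complex]:
    """Rebuild complex values from (real, imag) pairs."""
    return [complex(re, im) for re, im in pairs]


def naive_mul(a: List[Pair], b: List[Pair]) -> List[Pair]:
    """Multiply the unpacked parts elementwise. This is NOT complex multiplication."""
    return [(ar * br, ai * bi) for (ar, ai), (br, bi) in zip(a, b)]


def complex_mul(a: List[Pair], b: List[Pair]) -> List[Pair]:
    """(ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)."""
    return [(ar * br - ai * bi, ar * bi + ai * br) for (ar, ai), (br, bi) in zip(a, b)]


x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 0.5 + 4j]
expected = [p * q for p, q in zip(x, y)]

correct = repack(complex_mul(unpack(x), unpack(y)))
wrong = repack(naive_mul(unpack(x), unpack(y)))
print(correct == expected, wrong == expected)
```

Addition works on the unpacked parts unchanged, but multiplication needs the cross terms, which is the kind of case where an operator has to be rewritten and not just fed unpacked inputs.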

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_index_bool_split_aten.py	2026-03-19 00:28:21.633730+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_index_bool_split_aten.py	2026-03-19 00:28:42.141701+00:00
@@ -4,10 +4,11 @@
1. `index_has_bool_indices` validator correctly distinguishes bool vs int indices.
2. Integer-indexed `aten.index.Tensor` routes to the converter WITHOUT output allocator.
3. Boolean-indexed `aten.index.Tensor` routes to the converter WITH output allocator.
4. Both paths produce correct results.
"""
+
import unittest
from unittest.mock import MagicMock

import torch
import torch.nn as nn
@@ -58,13 +59,11 @@
        node = _make_index_node([None, torch.tensor([True, False])])
        self.assertTrue(index_has_bool_indices(node))

    def test_mixed_int_and_bool_returns_true(self):
        """If any index is bool, the function should return True."""
-        node = _make_index_node(
-            [torch.tensor([0, 1]), torch.tensor([True, False])]
-        )
+        node = _make_index_node([torch.tensor([0, 1]), torch.tensor([True, False])])
        self.assertTrue(index_has_bool_indices(node))

    def test_all_none_returns_false(self):
        node = _make_index_node([None, None])
        self.assertFalse(index_has_bool_indices(node))
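
The routing described in the test docstring above can be sketched without torch. This is a minimal stand-in that assumes indices are plain Python sequences or None; the real `index_has_bool_indices` operates on a node, and `has_bool_indices` and `choose_converter` are hypothetical names used only for illustration. Boolean indexing yields an output whose size depends on how many entries are True, which is presumably why that path is routed to the converter with an output allocator.

```python
from typing import Optional, Sequence, Union

# One index entry: None (dimension not indexed) or a sequence of ints or bools.
Index = Optional[Sequence[Union[bool, int]]]


def has_bool_indices(indices: Sequence[Index]) -> bool:
    """Return True if any non-None index holds boolean values.

    Hypothetical stand-in for the validator: it inspects plain Python
    sequences, not FX nodes or tensors.
    """
    for idx in indices:
        if idx is None:
            continue
        # bool is a subclass of int in Python, so test for bool explicitly.
        if any(isinstance(v, bool) for v in idx):
            return True
    return False


def choose_converter(indices: Sequence[Index]) -> str:
    """Pick a (hypothetical) converter label based on the index types."""
    if has_bool_indices(indices):
        return "with_output_allocator"
    return "without_output_allocator"


print(choose_converter([[0, 1]]))
print(choose_converter([[0, 1], [True, False]]))
```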

github-actions

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/execute_engine.cpp b/tmp/changes.txt
index 94264f0..fdc06a1 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/execute_engine.cpp
+++ b/tmp/changes.txt
@@ -412,24 +412,24 @@ std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intr
    { // Output Collection
      RECORD_USER_SCOPE("torch_tensorrt_execute_engine::OutputCollection");
      for (size_t i = 0; i < compiled_engine->out_binding_names.size(); i++) {
-      auto name = compiled_engine->out_binding_names[i];
-      auto dims = compiled_engine->output_allocator->getShapes().at(name);
-      auto dtype =
-          util::TRTDataTypeToScalarType(compiled_engine->exec_ctx->getEngine().getTensorDataType(name.c_str()));
-      at::Tensor output = compiled_engine->output_allocator->getBuffers().at(name).clone().detach();
-      int64_t prod = 1;
-      for (int i = 0; i < dims.nbDims; ++i) {
-        prod *= dims.d[i];
-      }
-      std::vector<int64_t> shape(dims.nbDims);
-      for (int i = 0; i < dims.nbDims; ++i) {
-        shape[i] = dims.d[i];
+        auto name = compiled_engine->out_binding_names[i];
+        auto dims = compiled_engine->output_allocator->getShapes().at(name);
+        auto dtype =
+            util::TRTDataTypeToScalarType(compiled_engine->exec_ctx->getEngine().getTensorDataType(name.c_str()));
+        at::Tensor output = compiled_engine->output_allocator->getBuffers().at(name).clone().detach();
+        int64_t prod = 1;
+        for (int i = 0; i < dims.nbDims; ++i) {
+          prod *= dims.d[i];
+        }
+        std::vector<int64_t> shape(dims.nbDims);
+        for (int i = 0; i < dims.nbDims; ++i) {
+          shape[i] = dims.d[i];
+        }
+        // When using the OutputAllocator, the allocated buffer might be larger than the size of the output,
+        // so we need to reshape the buffer to the output shape
+        output = output.reshape(-1).view(dtype).slice(0, 0, prod).reshape(shape);
+        outputs.push_back(output);
      }
-      // When using the OutputAllocator, the allocated buffer might be larger than the size of the output,
-      // so we need to reshape the buffer to the output shape
-      output = output.reshape(-1).view(dtype).slice(0, 0, prod).reshape(shape);
-      outputs.push_back(output);
-    }
    } // End Output Collection

    if (compiled_engine->profile_execution) {
ERROR: Some files do not conform to style guidelines
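
The output collection step in the C++ above flattens the allocator's buffer, reinterprets it as the output dtype, keeps the first `prod` elements, and reshapes to the reported dims. The sketch below mirrors that sequence with the Python standard library only, assuming a float32 output and strictly positive dimensions; `trim_and_reshape` and `_nest` are hypothetical helpers, not part of Torch-TensorRT.

```python
import struct
from math import prod
from typing import List, Sequence


def _nest(values: Sequence[float], shape: List[int]):
    """Reshape a flat sequence into nested lists (row-major order)."""
    if len(shape) <= 1:
        return list(values)
    step = len(values) // shape[0]
    return [_nest(values[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def trim_and_reshape(raw: bytes, fmt: str, shape: List[int]):
    """Reinterpret raw bytes as typed elements, keep prod(shape) of them, reshape.

    Assumes len(raw) is a multiple of the element size and every dim is > 0.
    """
    flat = memoryview(raw).cast(fmt)   # analogous to reshape(-1).view(dtype)
    count = prod(shape)                # number of valid output elements
    used = flat[:count].tolist()       # analogous to slice(0, 0, prod)
    return _nest(used, shape)          # analogous to reshape(shape)


# An oversized buffer: 8 float32 slots allocated, only 2 x 3 = 6 are valid output.
raw = struct.pack("8f", 1, 2, 3, 4, 5, 6, 0, 0)
result = trim_and_reshape(raw, "f", [2, 3])
print(result)
```

Slicing before reshaping matters here because the buffer may hold more elements than the output shape accounts for, so a direct reshape of the whole buffer would fail or include padding.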

cehongwang and others added 11 commits February 23, 2026 22:33

squash the commit

3873e31

support attn_bias for efficient sdpa

a88200f

feat(//py/torch_tensorrt/dynamo): Allow the refit system to cache com…

0b449a7

…plex numerics

docs: Add documentation on how complex numerics works

ed435c1

Instead of keying on shapes we add metadata prior to subgraph replace…

f1b04cd

…ment that marks nodes that are complex

docs: update for new metadata approach

3fcd86b

feat: Complex operations which are not supported will now fallback to…

e16b612

… pytorch rather than fail to build

chore: addressing PR AIs

00d0389

remove the old complex subgraph detection

fb4fc99

feat: Updating the profiler to the new kineto based one in PyTorch

4e9c0a6

meta-cla bot added the cla signed label Mar 19, 2026

github-actions bot requested a review from zewenli98 March 19, 2026 00:28

github-actions bot requested changes Mar 19, 2026

narendasan wants to merge 11 commits into main from push-xrxwyywvtqkz
