Arm backend: Fix rejection criteria of TRANSPOSE from VIEW#19044
Arm backend: Fix rejection criteria of TRANSPOSE from VIEW#19044gggekov wants to merge 1 commit intopytorch:mainfrom
Conversation
When delegating a VIEW for Ethos-U55, we were overly pessimistic whether we can delegate the TRANSPOSE that is needed for the NHWC -> NCHW or NCHW -> NHWC permutation. As a result, some RESHAPEs were left-over to the CPU when actually they could have been run on NPU. Signed-off-by: George Gekov <george.gekov@arm.com> Change-Id: I34cc3b38cf0dbb0ceee32ac5d0044805c4e1f085
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19044
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ You can merge normally! (3 Unrelated Failures)As of commit 415811f with merge base 63217d2 ( BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
This PR updates the Arm Ethos-U55 VIEW support checks to be less pessimistic about whether required layout transposes can be delegated, aiming to reduce CPU fallbacks (e.g., leftover reshapes) when NHWC↔NCHW permutations are needed.
Changes:
- Add new VIEW test cases intended to exercise delegation scenarios that require NHWC↔NCHW transposes.
- Update Ethos-U55 VIEW transpose feasibility checks from “full axis product” to a more targeted axis-product constraint.
- Tighten the U55 VIEW INT test to assert that
aten_view_copyis not left in EXIR after lowering/partitioning.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backends/arm/test/ops/test_view.py | Adds/adjusts VIEW test vectors and strengthens the U55 INT test expectations. |
| backends/arm/operator_support/ethos_u55_support.py | Refines U55 VIEW transpose eligibility checks using a new axis-product helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| rank_product_too_large = { | ||
| "rand_4d_large": lambda: (torch.rand(1, 49, 16, 128), (1, 16, 49, 128)), | ||
| "rand_5d_large": lambda: (torch.rand(2, 25, 16, 8, 64), (2, 16, 25, 8, 64)), | ||
| "rand_5d_large": lambda: (torch.rand(2, 256, 512, 8, 64), (2, 512, 256, 8, 64)), |
There was a problem hiding this comment.
rand_5d_large allocates a very large tensor (2×256×512×8×64 ≈ 134M elements ≈ 512MB in fp32), which is likely to cause OOMs or very slow CI runs. Consider using a much smaller shape that still violates the U55 transpose axis-product constraint (e.g. choose a rank-5 shape where at least one (rank-2) axis-product exceeds 65536 but total elements stay small).
| "rand_5d_large": lambda: (torch.rand(2, 256, 512, 8, 64), (2, 512, 256, 8, 64)), | |
| "rand_5d_large": lambda: (torch.rand(1, 257, 256, 1, 2), (1, 256, 257, 1, 2)), |
| "rand_5d_3d": lambda: (torch.rand(1, 1, 4, 5, 6), (2, 3, -1)), | ||
| "rand_3d_5d": lambda: (torch.rand(4, 5, 6), (1, 1, 2, -1, 3)), | ||
| "rank4_rank3_large": lambda: (torch.rand(1, 256, 6, 48), (6, 48, 256)), | ||
| "rank5_rank_4_large": lambda: (torch.rand(1, 256, 2, 3, 48), (1, 256, 6, 48)), |
There was a problem hiding this comment.
Test case key rank5_rank_4_large is inconsistent with the surrounding naming (rank4_rank3_large) and looks like a typo (rank_4). Consider renaming to something like rank5_rank4_large/rank5_to_rank4_large to keep the test matrix easy to read and search.
| "rank5_rank_4_large": lambda: (torch.rand(1, 256, 2, 3, 48), (1, 256, 6, 48)), | |
| "rank5_rank4_large": lambda: (torch.rand(1, 256, 2, 3, 48), (1, 256, 6, 48)), |
| def _max_product_axis(self, shape: shape_t): | ||
| """Return shape padded to rank4. | ||
| Args: | ||
| shape (shape_t): Shape. | ||
|
|
There was a problem hiding this comment.
_max_product_axis’s docstring is misleading/inaccurate: it says it “Return[s] shape padded to rank4” but the function actually returns a boolean, and it doesn’t pad the shape. Consider renaming the helper to reflect that it validates axis-product constraints (and add a -> bool return type), and update the docstring to describe the boolean semantics and the rule being enforced.
| # For TRANSPOSE originating from a VIEW, we know we will only do | ||
| # NHWC -> NCHW or NCHW -> NHWC permutations, hence we only need to validate | ||
| # these two TRANSPOSEs. For the general case of any permutation on TRANSPOSE, | ||
| # we reason via the checks in EthosU55TransposeCheck | ||
| if needs_input_transpose and not (self._max_product_axis(input_shape)): | ||
| self.reporter.report_reject( |
There was a problem hiding this comment.
The new _max_product_axis constraint for view-inserted transposes (checking all (rank-2)-axis products) is stricter than the existing U55 permute constraints encoded in EthosU55TransposeCheck for NHWC<->NCHW-like permutations (which only bounds N*H, W, and C). This divergence can lead to inconsistent accept/reject decisions between a VIEW-induced transpose and an explicit permute_copy. Consider reusing/factoring the transpose constraint logic so both paths enforce the same rule set for the relevant permutation(s).
When delegating a VIEW for Ethos-U55, we were overly pessimistic whether we can delegate the TRANSPOSE that is needed for the NHWC -> NCHW or NCHW -> NHWC permutation. As a result, some RESHAPEs were left-over to the CPU when actually they could have been run on NPU.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell