* [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
@ 2026-06-01 10:48 Ashish Mhetre
2026-06-01 10:48 ` [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Ashish Mhetre @ 2026-06-01 10:48 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
survive an invalidation that races with concurrent traffic targeting
the same entry. The hardware-recommended software workaround is to
issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
second issue must execute only after the first issue's CMD_SYNC has
completed, giving the sequence:
TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
ATC_INV is not affected and must not be doubled.
This series implements the workaround by hooking the duplication into
arm_smmu_cmdq_issue_cmdlist(), the single chokepoint that every
synchronous CMDQ submission flows through.
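The duplication hook described above can be illustrated with a small userspace model. This is a sketch, not driver code. `issue_once()` is a hypothetical stand-in for `__arm_smmu_cmdq_issue_cmdlist()` that appends opcodes to a hypothetical `struct trace`. The opcode values are assumed to mirror the `CMDQ_OP_*` defines in arm-smmu-v3.h, and the classifier is reduced to a range check that is only valid for the few opcodes used here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values assumed to mirror the CMDQ_OP_* defines in arm-smmu-v3.h. */
enum {
	OP_CFGI_STE   = 0x03,
	OP_TLBI_NH_VA = 0x12,
	OP_ATC_INV    = 0x40,
	OP_CMD_SYNC   = 0x46,
};

/* Hypothetical record of every opcode that reached the simulated queue. */
struct trace {
	uint8_t ops[64];
	size_t len;
};

/* Stand-in for __arm_smmu_cmdq_issue_cmdlist(): append the list, then a
 * CMD_SYNC if requested. Fails with -1 if the trace would overflow. */
static int issue_once(struct trace *t, const uint8_t *cmds, size_t n, bool sync)
{
	if (t->len + n + 1 > sizeof(t->ops))
		return -1;
	for (size_t i = 0; i < n; i++)
		t->ops[t->len++] = cmds[i];
	if (sync)
		t->ops[t->len++] = OP_CMD_SYNC;
	return 0;
}

/* Simplified classifier, valid only for the opcodes in this model. */
static bool needs_twice(bool erratum, uint8_t op)
{
	return erratum && op >= OP_CFGI_STE && op < OP_ATC_INV;
}

/* Same shape as the wrapper: issue once; if that succeeded, the list was
 * synced and its first command needs doubling, issue the whole list again. */
static int issue_cmdlist(struct trace *t, bool erratum, const uint8_t *cmds,
			 size_t n, bool sync)
{
	int ret = issue_once(t, cmds, n, sync);

	if (!ret && sync && n && needs_twice(erratum, cmds[0]))
		ret = issue_once(t, cmds, n, sync);
	return ret;
}
```

Issuing two TLBI_NH_VA commands with sync=true on an affected instance records TLBI, TLBI, CMD_SYNC, TLBI, TLBI, CMD_SYNC; an ATC_INV list, an unsynced list, or an unaffected instance is issued once. The `n` check is a choice made in the sketch so that `cmds[0]` is never read from an empty list.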
Patch 1 is a preparatory refactor that factors the existing batch
force-sync conditions out of arm_smmu_cmdq_batch_add_cmd_p() into a
new arm_smmu_cmdq_batch_force_sync() helper. No functional change.
Patch 2 detects affected instances using the existing
"nvidia,tegra264-smmu" compatible string, exposes the condition via a
new ARM_SMMU_OPT_TLBI_TWICE option bit, and adds a static-inline
arm_smmu_cmd_needs_tlbi_twice() classifier in arm-smmu-v3.h so that
both the in-tree CMDQ path and the iommufd VSMMU path can share a
single predicate.
Patch 3 wires the workaround in. arm_smmu_cmdq_issue_cmdlist() becomes
a thin wrapper that re-issues a synced cmdlist a second time when the
first command needs doubling. The Tegra264 condition is added to
arm_smmu_cmdq_batch_force_sync() so a full batch carrying CFGI/TLBI
commands flushes with sync=true and is then doubled. The iommufd
VSMMU path (arm_vsmmu_cache_invalidate()) is also taught to split the
user-supplied batch at every "needs doubling" / "doesn't need
doubling" transition via a new arm_vsmmu_can_batch_cmd() predicate,
since that path can otherwise mix CFGI/TLBI with ATC_INV in a single
submission.
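The iommufd splitting rule can be modelled on its own. The sketch below assumes the loop shape of arm_vsmmu_cache_invalidate() shown in patch 3 and uses bare opcode bytes in place of `struct arm_vsmmu_invalidation_cmd`. It also introduces a hypothetical `max_batch` parameter that plays the role of `CMDQ_BATCH_ENTRIES - 1`. Opcode values are assumed to mirror arm-smmu-v3.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values assumed to mirror the CMDQ_OP_* defines in arm-smmu-v3.h. */
enum {
	OP_CFGI_STE     = 0x03,
	OP_CFGI_CD      = 0x05,
	OP_TLBI_NH_ASID = 0x11,
	OP_ATC_INV      = 0x40,
};

/* Simplified classifier, valid only for the opcodes in this model. */
static bool needs_twice(bool erratum, uint8_t op)
{
	return erratum && op >= OP_CFGI_STE && op < OP_ATC_INV;
}

/*
 * Split ops[0..n) like the patched loop: close the current batch at the end
 * of input, when it holds max_batch commands, or when the next command's
 * "needs doubling" answer differs from that of the batch's first command.
 * Writes each batch length to lens[] (room for n entries) and returns the
 * number of batches.
 */
static size_t split_batches(bool erratum, const uint8_t *ops, size_t n,
			    size_t max_batch, size_t *lens)
{
	size_t nbatches = 0, last = 0, cur = 0;

	while (cur != n) {
		cur++;
		if (cur != n && cur - last != max_batch &&
		    needs_twice(erratum, ops[last]) ==
		    needs_twice(erratum, ops[cur]))
			continue;
		lens[nbatches++] = cur - last;
		last = cur;
	}
	return nbatches;
}
```

On an affected instance, the sequence TLBI_NH_ASID, TLBI_NH_ASID, ATC_INV, ATC_INV, CFGI_CD splits into batches of 2, 2 and 1. Every cmdlist handed to the doubling wrapper is therefore uniform, and checking its first command is enough.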
The series is based on Jason Gunthorpe's "Remove SMMUv3
struct arm_smmu_cmdq_ent" series [1], specifically commit 13428b0bf794
("iommu/arm-smmu-v3: Directly encode TLBI commands") which is the
final patch of that series in linux-next.
[1] https://lore.kernel.org/all/177919957385.1012282.14787407041669291032.b4-ty@kernel.org/
Changes since v2:
- Add new prep patch 1/3 that factors the existing force-sync
conditions into arm_smmu_cmdq_batch_force_sync() (from Nicolin).
- Move arm_smmu_cmd_needs_tlbi_twice() to arm-smmu-v3.h as static
inline taking (smmu, cmd*) and folding in the option check.
- Plug the Tegra264 condition into arm_smmu_cmdq_batch_force_sync()
instead of carrying a separate need_sync in batch_add_cmd_p().
- Fix iommufd batching: arm_vsmmu_cache_invalidate() can mix
CFGI/TLBI with ATC_INV in one batch. Split at the boundary via a
new arm_vsmmu_can_batch_cmd() predicate.
- Patch 2 wording: "next patch wires" -> "a subsequent change will
wire".
v2: https://lore.kernel.org/all/20260529140830.629738-1-amhetre@nvidia.com/
Nicolin Chen (1):
iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
Ashish Mhetre (2):
iommu/arm-smmu-v3: Detect Tegra264 erratum
iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 23 ++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 40 +++++++++++
3 files changed, 117 insertions(+), 12 deletions(-)
base-commit: 13428b0bf7947098daf9a1db14a74d33eb1b5079
--
2.50.1
* [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
2026-06-01 10:48 [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
@ 2026-06-01 10:48 ` Ashish Mhetre
2026-06-02 19:55 ` Will Deacon
2026-06-01 10:48 ` [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Ashish Mhetre @ 2026-06-01 10:48 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
From: Nicolin Chen <nicolinc@nvidia.com>
arm_smmu_cmdq_batch_add_cmd_p() carries two distinct reasons for
flushing the current batch with a CMD_SYNC before appending the
new command:
- The batch's pre-assigned cmdq does not support the new command.
- The Arm erratum 2812531 workaround (ARM_SMMU_OPT_CMDQ_FORCE_SYNC)
forces a SYNC at one entry before the batch is full.
Lift those checks into a new arm_smmu_cmdq_batch_force_sync() helper
so that adding another force-sync condition becomes a one-line
addition. No functional change.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++++++++-----
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9be589d14a3b..4d29bd343460 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -847,16 +847,30 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
}
+static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq_batch *cmds,
+ struct arm_smmu_cmd *cmd)
+{
+ if (!cmds->num)
+ return false;
+
+ /* The batch's pre-assigned cmdq doesn't support the new command */
+ if (!arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd))
+ return true;
+
+ /* Arm erratum 2812531 */
+ if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+ (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+ return true;
+
+ return false;
+}
+
static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq_batch *cmds,
struct arm_smmu_cmd *cmd)
{
- bool force_sync = (cmds->num == CMDQ_BATCH_ENTRIES - 1) &&
- (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC);
- bool unsupported_cmd;
-
- unsupported_cmd = !arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd);
- if (force_sync || unsupported_cmd) {
+ if (arm_smmu_cmdq_batch_force_sync(smmu, cmds, cmd)) {
arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
cmds->num, true);
arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
--
2.50.1
* [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-06-01 10:48 [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-06-01 10:48 ` [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
@ 2026-06-01 10:48 ` Ashish Mhetre
2026-06-02 20:13 ` Will Deacon
2026-06-01 10:48 ` [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
2026-06-02 16:31 ` [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Mostafa Saleh
3 siblings, 1 reply; 16+ messages in thread
From: Ashish Mhetre @ 2026-06-01 10:48 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
The Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
invalidation that races with concurrent traffic targeting the same
entry. The hardware-recommended software workaround is to issue every
CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
is guaranteed to evict the entry. ATC_INV is not affected and must not
be doubled.
The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
cannot be detected from the hardware ID registers. Tegra264 boots from
device tree only and has no ACPI/IORT support, so the erratum is
detected through the device tree compatible string.
Add the ARM_SMMU_OPT_TLBI_TWICE option and set it on instances matching
the existing "nvidia,tegra264-smmu" compatible. Also add a
static-inline arm_smmu_cmd_needs_tlbi_twice() classifier in
arm-smmu-v3.h so that subsequent changes wiring the workaround into the
CMDQ submission and iommufd batching paths can share a single
predicate.
No callers consume the option yet; a subsequent change will wire the
workaround into the CMDQ issue paths.
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 40 +++++++++++++++++++++
2 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4d29bd343460..08684bd40a6d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5243,8 +5243,10 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
if (of_dma_is_coherent(dev->of_node))
smmu->features |= ARM_SMMU_FEAT_COHERENCY;
- if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu"))
+ if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu")) {
tegra_cmdqv_dt_probe(dev->of_node, smmu);
+ smmu->options |= ARM_SMMU_OPT_TLBI_TWICE;
+ }
return ret;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 16353596e08a..106034c348a1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -928,6 +928,14 @@ struct arm_smmu_device {
#define ARM_SMMU_OPT_MSIPOLL (1 << 2)
#define ARM_SMMU_OPT_CMDQ_FORCE_SYNC (1 << 3)
#define ARM_SMMU_OPT_TEGRA241_CMDQV (1 << 4)
+/*
+ * Tegra264 erratum: a TLB entry can survive an invalidation that races
+ * with concurrent traffic targeting the same entry. The software
+ * workaround is to issue every CFGI/TLBI command twice, each followed
+ * by CMD_SYNC. The second issue is guaranteed to evict the entry.
+ * ATC_INV commands are not affected and must not be doubled.
+ */
+#define ARM_SMMU_OPT_TLBI_TWICE (1 << 5)
u32 options;
struct arm_smmu_cmdq cmdq;
@@ -1211,6 +1219,38 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
struct arm_smmu_cmd *cmds, int n,
bool sync);
+/*
+ * Returns true if @cmd is one of the CFGI_* or TLBI_* commands covered
+ * by the Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE) on an affected
+ * SMMU instance.
+ */
+static inline bool arm_smmu_cmd_needs_tlbi_twice(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmd *cmd)
+{
+ if (!(smmu->options & ARM_SMMU_OPT_TLBI_TWICE))
+ return false;
+
+ switch (FIELD_GET(CMDQ_0_OP, cmd->data[0])) {
+ case CMDQ_OP_CFGI_STE:
+ case CMDQ_OP_CFGI_ALL:
+ case CMDQ_OP_CFGI_CD:
+ case CMDQ_OP_CFGI_CD_ALL:
+ case CMDQ_OP_TLBI_NH_ALL:
+ case CMDQ_OP_TLBI_NH_ASID:
+ case CMDQ_OP_TLBI_NH_VA:
+ case CMDQ_OP_TLBI_NH_VAA:
+ case CMDQ_OP_TLBI_EL2_ALL:
+ case CMDQ_OP_TLBI_EL2_ASID:
+ case CMDQ_OP_TLBI_EL2_VA:
+ case CMDQ_OP_TLBI_S12_VMALL:
+ case CMDQ_OP_TLBI_S2_IPA:
+ case CMDQ_OP_TLBI_NSNH_ALL:
+ return true;
+ default:
+ return false;
+ }
+}
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
void arm_smmu_sva_notifier_synchronize(void);
--
2.50.1
* [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
2026-06-01 10:48 [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-06-01 10:48 ` [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
2026-06-01 10:48 ` [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
@ 2026-06-01 10:48 ` Ashish Mhetre
2026-06-01 18:37 ` Nicolin Chen
2026-06-02 20:22 ` Will Deacon
2026-06-02 16:31 ` [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Mostafa Saleh
3 siblings, 2 replies; 16+ messages in thread
From: Ashish Mhetre @ 2026-06-01 10:48 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
Apply the Tegra264 erratum workaround (ARM_SMMU_OPT_TLBI_TWICE) by
issuing every CFGI/TLBI cmdlist twice on affected SMMU instances, with
a CMD_SYNC after each. The erratum requires exactly this sequence:
TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
Rename the existing arm_smmu_cmdq_issue_cmdlist() to
__arm_smmu_cmdq_issue_cmdlist() and add a thin wrapper that, when
@sync is true and arm_smmu_cmd_needs_tlbi_twice() returns true for the
first command (which can only happen on affected SMMUs), re-issues the
same cmdlist a second time.
For the in-tree batching path, register the Tegra264 condition with
arm_smmu_cmdq_batch_force_sync() so that a full batch carrying
CFGI/TLBI commands is flushed with sync=true, and is therefore doubled,
instead of being flushed without a CMD_SYNC.
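A toy model shows why the extra force-sync condition matters. It assumes, as in the upstream driver, that a full batch is otherwise flushed with sync=false and that the final submit always syncs. `BATCH_ENTRIES`, `struct model` and its helpers are hypothetical, and only the Tegra264 force-sync reason is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_ENTRIES 4 /* hypothetical; the real CMDQ_BATCH_ENTRIES is larger */

/* Opcode values assumed to mirror the CMDQ_OP_* defines in arm-smmu-v3.h. */
enum { OP_CFGI_STE = 0x03, OP_TLBI_NH_VA = 0x12, OP_ATC_INV = 0x40 };

struct model {
	bool erratum;    /* ARM_SMMU_OPT_TLBI_TWICE is set */
	bool tegra_cond; /* the new force-sync condition is enabled */
	uint8_t cmds[BATCH_ENTRIES];
	size_t num;
	unsigned synced, unsynced, doubled;
};

static bool needs_twice(const struct model *m, uint8_t op)
{
	return m->erratum && op >= OP_CFGI_STE && op < OP_ATC_INV;
}

/* Stand-in for issuing the batch: count flushes and doublings, then reset. */
static void issue(struct model *m, bool sync)
{
	if (sync)
		m->synced++;
	else
		m->unsynced++;
	/* The wrapper doubles only synced lists whose first command qualifies. */
	if (sync && m->num && needs_twice(m, m->cmds[0]))
		m->doubled++;
	m->num = 0;
}

/* Only the Tegra264 reason of arm_smmu_cmdq_batch_force_sync() is modelled. */
static bool force_sync(const struct model *m)
{
	return m->tegra_cond && m->num == BATCH_ENTRIES &&
	       needs_twice(m, m->cmds[0]);
}

static void batch_add(struct model *m, uint8_t op)
{
	if (force_sync(m))
		issue(m, true);
	if (m->num == BATCH_ENTRIES)
		issue(m, false); /* assumed behaviour for a full batch */
	m->cmds[m->num++] = op;
}

static void batch_submit(struct model *m)
{
	issue(m, true); /* the final submit is assumed to always sync */
}
```

With the condition disabled, ten TLBI commands through a four-entry batch leave in two unsynced flushes of four commands each, which the wrapper never doubles, plus one doubled final submit. With it enabled, all three flushes are synced and doubled.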
For the iommufd VSMMU path, add an arm_vsmmu_can_batch_cmd() predicate
that splits the iommufd batch wherever the next command differs from
the batch's first command in whether it needs doubling.
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 23 +++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 36 ++++++++++++++++---
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 1e9f7d2de344..78c96a2b652b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -350,6 +350,26 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
return 0;
}
+/*
+ * On Tegra264, arm_smmu_cmdq_issue_cmdlist() doubles every CFGI/TLBI
+ * submission (see ARM_SMMU_OPT_TLBI_TWICE). The doubling decision is
+ * taken once per cmdlist based on the first command, so a single
+ * batch must not mix commands that need doubling with commands that
+ * do not. Split the iommufd batch whenever the next user command
+ * crosses that boundary.
+ */
+static bool arm_vsmmu_can_batch_cmd(struct arm_smmu_device *smmu,
+ struct arm_vsmmu_invalidation_cmd *last,
+ struct arm_vsmmu_invalidation_cmd *next)
+{
+ struct arm_smmu_cmd next_cmd = {
+ .data[0] = le64_to_cpu(next->ucmd.cmd[0]),
+ };
+
+ return arm_smmu_cmd_needs_tlbi_twice(smmu, &last->cmd) ==
+ arm_smmu_cmd_needs_tlbi_twice(smmu, &next_cmd);
+}
+
int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
struct iommu_user_data_array *array)
{
@@ -382,7 +402,8 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
/* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
cur++;
- if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+ if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1 &&
+ arm_vsmmu_can_batch_cmd(smmu, last, cur))
continue;
/* FIXME always uses the main cmdq rather than trying to group by type */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 08684bd40a6d..f38c21b56f28 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -698,10 +698,10 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq,
* insert their own list of commands then all of the commands from one
* CPU will appear before any of the commands from the other CPU.
*/
-int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
- struct arm_smmu_cmdq *cmdq,
- struct arm_smmu_cmd *cmds, int n,
- bool sync)
+static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq,
+ struct arm_smmu_cmd *cmds, int n,
+ bool sync)
{
struct arm_smmu_cmd cmd_sync;
u32 prod;
@@ -820,6 +820,26 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
return ret;
}
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq,
+ struct arm_smmu_cmd *cmds, int n,
+ bool sync)
+{
+ int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+ /*
+ * On Tegra264 (see ARM_SMMU_OPT_TLBI_TWICE) re-issue the same
+ * cmdlist with another CMD_SYNC to satisfy the erratum.
+ * Callers must ensure the batch carries a uniform opcode class
+ * so that checking the first command is enough; the iommufd
+ * VSMMU path enforces this with arm_vsmmu_can_batch_cmd().
+ */
+ if (!ret && sync && arm_smmu_cmd_needs_tlbi_twice(smmu, &cmds[0]))
+ ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+ return ret;
+}
+
static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
struct arm_smmu_cmd *cmd, bool sync)
{
@@ -863,6 +883,14 @@ static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
return true;
+ /*
+ * Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE). The batch holds
+ * a uniform opcode class, so checking the first command is enough.
+ */
+ if (cmds->num == CMDQ_BATCH_ENTRIES &&
+ arm_smmu_cmd_needs_tlbi_twice(smmu, &cmds->cmds[0]))
+ return true;
+
return false;
}
--
2.50.1
* Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
2026-06-01 10:48 ` [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
@ 2026-06-01 18:37 ` Nicolin Chen
2026-06-02 20:22 ` Will Deacon
1 sibling, 0 replies; 16+ messages in thread
From: Nicolin Chen @ 2026-06-01 18:37 UTC (permalink / raw)
To: Ashish Mhetre
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Mon, Jun 01, 2026 at 10:48:45AM +0000, Ashish Mhetre wrote:
> Apply the workaround for Tegra264 erratum ARM_SMMU_OPT_TLBI_TWICE by
> issuing every CFGI/TLBI cmdlist twice on affected SMMU instances, with
> CMD_SYNC after each. The erratum requires this exact sequencing:
>
> TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
>
> Rename the existing arm_smmu_cmdq_issue_cmdlist() to
> __arm_smmu_cmdq_issue_cmdlist() and add a thin wrapper that, on
> affected SMMUs and when @sync is true, re-issues the same cmdlist a
> second time when arm_smmu_cmd_needs_tlbi_twice() is true.
>
> For the in-tree batching path, register the Tegra264 condition with
> arm_smmu_cmdq_batch_force_sync() so that a full batch carrying
> CFGI/TLBI commands flushes with sync=true.
>
> For iommufd VSMMU path add an arm_vsmmu_can_batch_cmd() predicate that
> splits the iommufd batch at cmd which doesn't need doubling.
>
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* Re: [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
2026-06-01 10:48 [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
` (2 preceding siblings ...)
2026-06-01 10:48 ` [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
@ 2026-06-02 16:31 ` Mostafa Saleh
2026-06-02 18:23 ` Jason Gunthorpe
3 siblings, 1 reply; 16+ messages in thread
From: Mostafa Saleh @ 2026-06-02 16:31 UTC (permalink / raw)
To: Ashish Mhetre
Cc: will, robin.murphy, joro, jgg, nicolinc, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Mon, Jun 01, 2026 at 10:48:42AM +0000, Ashish Mhetre wrote:
> Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
> survive an invalidation that races with concurrent traffic targeting
> the same entry. The hardware-recommended software workaround is to
> issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
> second issue must execute only after the first issue's CMD_SYNC has
> completed, giving the sequence:
This seems quite intrusive, will the TLB entry survive if you push a
full invalidation instead?
Thanks,
Mostafa
>
> TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
>
> ATC_INV is not affected and must not be doubled.
>
> This series implements the workaround by hooking the duplication into
> arm_smmu_cmdq_issue_cmdlist(), the single chokepoint that every
> synchronous CMDQ submission flows through.
>
> Patch 1 is a preparatory refactor that factors the existing batch
> force-sync conditions out of arm_smmu_cmdq_batch_add_cmd_p() into a
> new arm_smmu_cmdq_batch_force_sync() helper. No functional change.
>
> Patch 2 detects affected instances using the existing
> "nvidia,tegra264-smmu" compatible string, exposes the condition via a
> new ARM_SMMU_OPT_TLBI_TWICE option bit, and adds a static-inline
> arm_smmu_cmd_needs_tlbi_twice() classifier in arm-smmu-v3.h so that
> both the in-tree CMDQ path and the iommufd VSMMU path can share a
> single predicate.
>
> Patch 3 wires the workaround in. arm_smmu_cmdq_issue_cmdlist() becomes
> a thin wrapper that re-issues a synced cmdlist a second time when the
> first command needs doubling. The Tegra264 condition is added to
> arm_smmu_cmdq_batch_force_sync() so a full batch carrying CFGI/TLBI
> commands flushes with sync=true and is then doubled. The iommufd
> VSMMU path (arm_vsmmu_cache_invalidate()) is also taught to split the
> user-supplied batch at every "needs doubling" / "doesn't need
> doubling" transition via a new arm_vsmmu_can_batch_cmd() predicate,
> since that path can otherwise mix CFGI/TLBI with ATC_INV in a single
> submission.
>
> The series is based on Jason Gunthorpe's "Remove SMMUv3
> struct arm_smmu_cmdq_ent" series [1], specifically commit 13428b0bf794
> ("iommu/arm-smmu-v3: Directly encode TLBI commands") which is the
> final patch of that series in linux-next.
>
> [1] https://lore.kernel.org/all/177919957385.1012282.14787407041669291032.b4-ty@kernel.org/
>
> Changes since v2:
> - Add new prep patch 1/3 that factors the existing force-sync
> conditions into arm_smmu_cmdq_batch_force_sync() (from Nicolin).
> - Move arm_smmu_cmd_needs_tlbi_twice() to arm-smmu-v3.h as static
> inline taking (smmu, cmd*) and folding in the option check.
> - Plug the Tegra264 condition into arm_smmu_cmdq_batch_force_sync()
> instead of carrying a separate need_sync in batch_add_cmd_p().
> - Fix iommufd batching: arm_vsmmu_cache_invalidate() can mix
> CFGI/TLBI with ATC_INV in one batch. Split at the boundary via a
> new arm_vsmmu_can_batch_cmd() predicate.
> - Patch 2 wording: "next patch wires" -> "a subsequent change will
> wire".
>
> v2: https://lore.kernel.org/all/20260529140830.629738-1-amhetre@nvidia.com/
>
>
> Nicolin Chen (1):
> iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
>
> Ashish Mhetre (2):
> iommu/arm-smmu-v3: Detect Tegra264 erratum
> iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
>
> .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 23 ++++++-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 40 +++++++++++
> 3 files changed, 117 insertions(+), 12 deletions(-)
>
>
> base-commit: 13428b0bf7947098daf9a1db14a74d33eb1b5079
> --
> 2.50.1
>
>
* Re: [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
2026-06-02 16:31 ` [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Mostafa Saleh
@ 2026-06-02 18:23 ` Jason Gunthorpe
0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2026-06-02 18:23 UTC (permalink / raw)
To: Mostafa Saleh
Cc: Ashish Mhetre, will, robin.murphy, joro, nicolinc,
linux-arm-kernel, iommu, linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 04:31:29PM +0000, Mostafa Saleh wrote:
> On Mon, Jun 01, 2026 at 10:48:42AM +0000, Ashish Mhetre wrote:
> > Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
> > survive an invalidation that races with concurrent traffic targeting
> > the same entry. The hardware-recommended software workaround is to
> > issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
> > second issue must execute only after the first issue's CMD_SYNC has
> > completed, giving the sequence:
>
> This seems quite intrusive, will the TLB entry survive if you push a
> full invalidation instead?
It's 36 lines and completely contained to the insides of the command
submission code??
Stuff like this is why I was guiding you to use the more code exactly
as is for pkvm. Historically there have been many invalidation related
errata, and invalidate twice seems to be a theme in fixing many of
them.
Jason
* Re: [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
2026-06-01 10:48 ` [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
@ 2026-06-02 19:55 ` Will Deacon
2026-06-02 20:08 ` Nicolin Chen
0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2026-06-02 19:55 UTC (permalink / raw)
To: Ashish Mhetre
Cc: robin.murphy, joro, jgg, nicolinc, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Mon, Jun 01, 2026 at 10:48:43AM +0000, Ashish Mhetre wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> arm_smmu_cmdq_batch_add_cmd_p() carries two distinct reasons for
> flushing the current batch with a CMD_SYNC before appending the
> new command:
>
> - The batch's pre-assigned cmdq does not support the new command.
> - The Arm erratum 2812531 workaround (ARM_SMMU_OPT_CMDQ_FORCE_SYNC)
> forces a SYNC at one entry before the batch is full.
>
> Lift those checks into a new arm_smmu_cmdq_batch_force_sync() helper
> so that adding another force-sync condition becomes a one-line
> addition. No functional change.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++++++++-----
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 9be589d14a3b..4d29bd343460 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -847,16 +847,30 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
> cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
> }
>
> +static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
> + struct arm_smmu_cmdq_batch *cmds,
> + struct arm_smmu_cmd *cmd)
> +{
> + if (!cmds->num)
> + return false;
This check seems new?
Will
* Re: [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
2026-06-02 19:55 ` Will Deacon
@ 2026-06-02 20:08 ` Nicolin Chen
2026-06-02 20:23 ` Will Deacon
0 siblings, 1 reply; 16+ messages in thread
From: Nicolin Chen @ 2026-06-02 20:08 UTC (permalink / raw)
To: Will Deacon
Cc: Ashish Mhetre, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 08:55:10PM +0100, Will Deacon wrote:
> On Mon, Jun 01, 2026 at 10:48:43AM +0000, Ashish Mhetre wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> >
> > arm_smmu_cmdq_batch_add_cmd_p() carries two distinct reasons for
> > flushing the current batch with a CMD_SYNC before appending the
> > new command:
> >
> > - The batch's pre-assigned cmdq does not support the new command.
> > - The Arm erratum 2812531 workaround (ARM_SMMU_OPT_CMDQ_FORCE_SYNC)
> > forces a SYNC at one entry before the batch is full.
> >
> > Lift those checks into a new arm_smmu_cmdq_batch_force_sync() helper
> > so that adding another force-sync condition becomes a one-line
> > addition. No functional change.
> >
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++++++++-----
> > 1 file changed, 20 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 9be589d14a3b..4d29bd343460 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -847,16 +847,30 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
> > cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
> > }
> >
> > +static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
> > + struct arm_smmu_cmdq_batch *cmds,
> > + struct arm_smmu_cmd *cmd)
> > +{
> > + if (!cmds->num)
> > + return false;
>
> This check seems new?
You are right. Maybe the commit message should have mentioned that
there is a slight behavior change to the unsupported_cmd path:
- Before: if cmds->num = 0 && unsupported_cmd, it would issue an
empty batch (one CMD_SYNC)
- After : if cmds->num = 0, the empty batch is not issued
With that, I think it's good to have?
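The behaviour change described here can be checked exhaustively. The sketch transcribes the pre-patch inline expression and the new arm_smmu_cmdq_batch_force_sync() as pure functions of three inputs: the batch size, the ARM_SMMU_OPT_CMDQ_FORCE_SYNC option, and whether the batch's cmdq supports the new command. `BATCH_ENTRIES` is a hypothetical stand-in for CMDQ_BATCH_ENTRIES.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_ENTRIES 8 /* hypothetical stand-in for CMDQ_BATCH_ENTRIES */

/* Pre-patch logic from arm_smmu_cmdq_batch_add_cmd_p(). */
static bool old_force_sync(size_t num, bool opt_force_sync, bool supported)
{
	bool force_sync = num == BATCH_ENTRIES - 1 && opt_force_sync;

	return force_sync || !supported;
}

/* Post-patch arm_smmu_cmdq_batch_force_sync(), same inputs. */
static bool new_force_sync(size_t num, bool opt_force_sync, bool supported)
{
	if (!num)
		return false;
	if (!supported)
		return true;
	if (num == BATCH_ENTRIES - 1 && opt_force_sync)
		return true;
	return false;
}

/*
 * Count the input combinations on which the two versions disagree and
 * report, via *all_empty, whether every disagreement has num == 0.
 */
static unsigned count_mismatches(bool *all_empty)
{
	unsigned mismatches = 0;

	*all_empty = true;
	for (size_t num = 0; num <= BATCH_ENTRIES; num++)
		for (int opt = 0; opt < 2; opt++)
			for (int sup = 0; sup < 2; sup++) {
				if (old_force_sync(num, opt, sup) ==
				    new_force_sync(num, opt, sup))
					continue;
				mismatches++;
				if (num != 0)
					*all_empty = false;
			}
	return mismatches;
}
```

The two versions disagree only when `num == 0` and the command is unsupported (with the option either set or clear), which is exactly the empty-batch CMD_SYNC the old code issued.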
Thanks
Nicolin
* Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-06-01 10:48 ` [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
@ 2026-06-02 20:13 ` Will Deacon
2026-06-02 20:31 ` Nicolin Chen
0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2026-06-02 20:13 UTC (permalink / raw)
To: Ashish Mhetre
Cc: robin.murphy, joro, jgg, nicolinc, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Mon, Jun 01, 2026 at 10:48:44AM +0000, Ashish Mhetre wrote:
> Tegra264 SMMU is affected by erratum where a TLB entry can survive an
> invalidation that races with concurrent traffic targeting the same
> entry. The hardware-recommended software workaround is to issue every
> CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
> is guaranteed to evict the entry. ATC_INV is not affected and must not
> be doubled.
>
> The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
> cannot be detected from hardware ID. Tegra264 boots from device tree
> only and has no ACPI/IORT support, so detection is through device
> tree only.
That seems odd to me -- whether the hardware has the erratum is
completely unrelated to whether it probes using DT or ACPI, so I find it
really weird to have the workaround enabled when booting with DT and not
when booting with ACPI. We should have consistent behaviour between the
two.
> Add the ARM_SMMU_OPT_TLBI_TWICE option and set it on instances matching
> the existing "nvidia,tegra264-smmu" compatible. Also add a
> static-inline arm_smmu_cmd_needs_tlbi_twice() classifier in
> arm-smmu-v3.h so that subsequent changes wiring the workaround into the
> CMDQ submission and iommufd batching paths can share a single
> predicate.
>
> No callers consume the option yet; a subsequent change will wire the
> workaround into the CMDQ issue paths.
>
> Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 40 +++++++++++++++++++++
> 2 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 4d29bd343460..08684bd40a6d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -5243,8 +5243,10 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
> if (of_dma_is_coherent(dev->of_node))
> smmu->features |= ARM_SMMU_FEAT_COHERENCY;
>
> - if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu"))
> + if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu")) {
> tegra_cmdqv_dt_probe(dev->of_node, smmu);
> + smmu->options |= ARM_SMMU_OPT_TLBI_TWICE;
> + }
>
> return ret;
> }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 16353596e08a..106034c348a1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -928,6 +928,14 @@ struct arm_smmu_device {
> #define ARM_SMMU_OPT_MSIPOLL (1 << 2)
> #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC (1 << 3)
> #define ARM_SMMU_OPT_TEGRA241_CMDQV (1 << 4)
> +/*
> + * Tegra264 erratum: a TLB entry can survive an invalidation that races
> + * with concurrent traffic targeting the same entry. The software
> + * workaround is to issue every CFGI/TLBI command twice, each followed
> + * by CMD_SYNC. The second issue is guaranteed to evict the entry.
> + * ATC_INV commands are not affected and must not be doubled.
> + */
> +#define ARM_SMMU_OPT_TLBI_TWICE (1 << 5)
nit: I think this should be named slightly differently as it covers CFGI
as well. Maybe ARM_SMMU_OPT_REPEAT_TLBI_CFGI ?
The comment can be simpler too and avoid being specific to Tegra264. The
main things to say are that it repeats {CFGI,TLBI}; SYNC sequences and
does not apply to ATC_INV.
> +/*
> + * Returns true if @cmd is one of the CFGI_* or TLBI_* commands covered
> + * by the Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE) on an affected
> + * SMMU instance.
> + */
(remove the comment)
> +static inline bool arm_smmu_cmd_needs_tlbi_twice(struct arm_smmu_device *smmu,
> + struct arm_smmu_cmd *cmd)
Rename the function to something like arm_smmu_erratum_cmd_needs_repeating()?
> +{
> + if (!(smmu->options & ARM_SMMU_OPT_TLBI_TWICE))
> + return false;
Maybe we should make this a static key?
> + switch (FIELD_GET(CMDQ_0_OP, cmd->data[0])) {
> + case CMDQ_OP_CFGI_STE:
> + case CMDQ_OP_CFGI_ALL:
> + case CMDQ_OP_CFGI_CD:
> + case CMDQ_OP_CFGI_CD_ALL:
> + case CMDQ_OP_TLBI_NH_ALL:
> + case CMDQ_OP_TLBI_NH_ASID:
> + case CMDQ_OP_TLBI_NH_VA:
> + case CMDQ_OP_TLBI_NH_VAA:
> + case CMDQ_OP_TLBI_EL2_ALL:
> + case CMDQ_OP_TLBI_EL2_ASID:
> + case CMDQ_OP_TLBI_EL2_VA:
> + case CMDQ_OP_TLBI_S12_VMALL:
> + case CMDQ_OP_TLBI_S2_IPA:
> + case CMDQ_OP_TLBI_NSNH_ALL:
> + return true;
Isn't this just everything < ATC_INV || >= CFGI_STE? Seems better than
enumerating everything.
Will
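The range check suggested above can be sketched as a small userspace C model. This is a hedged illustration, not driver code. The structures are minimal stand-ins. The opcode values are as recalled from arm-smmu-v3.h and should be checked against the header. ARM_SMMU_OPT_REPEAT_TLBI_CFGI and arm_smmu_erratum_cmd_needs_repeating() are the hypothetical names proposed in the review.

The condition has to be a conjunction: at or above CFGI_STE *and* below ATC_INV. Written with `||`, it would match every opcode. The range also covers any opcode the driver might later define between 0x3 and 0x40. It is therefore only equivalent to the enumerated list while no other opcode in that range is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode values as recalled from arm-smmu-v3.h; verify against the header. */
#define CMDQ_OP_PREFETCH_CFG	0x1
#define CMDQ_OP_CFGI_STE	0x3
#define CMDQ_OP_CFGI_ALL	0x4
#define CMDQ_OP_CFGI_CD		0x5
#define CMDQ_OP_CFGI_CD_ALL	0x6
#define CMDQ_OP_TLBI_NH_ALL	0x10
#define CMDQ_OP_TLBI_NH_ASID	0x11
#define CMDQ_OP_TLBI_NH_VA	0x12
#define CMDQ_OP_TLBI_NH_VAA	0x13
#define CMDQ_OP_TLBI_EL2_ALL	0x20
#define CMDQ_OP_TLBI_EL2_ASID	0x21
#define CMDQ_OP_TLBI_EL2_VA	0x22
#define CMDQ_OP_TLBI_S12_VMALL	0x28
#define CMDQ_OP_TLBI_S2_IPA	0x2a
#define CMDQ_OP_TLBI_NSNH_ALL	0x30
#define CMDQ_OP_ATC_INV		0x40
#define CMDQ_OP_PRI_RESP	0x41
#define CMDQ_OP_CMD_SYNC	0x46

/* CMDQ_0_OP covers bits [7:0] of the first command word. */
#define CMDQ_0_OP_MASK		0xffULL

/*
 * Hypothetical name from the review: repeat each {CFGI,TLBI}; CMD_SYNC
 * sequence after the first one has completed. Does not apply to ATC_INV.
 */
#define ARM_SMMU_OPT_REPEAT_TLBI_CFGI	(1 << 5)

/* The range check below relies on this opcode layout. */
static_assert(CMDQ_OP_PREFETCH_CFG < CMDQ_OP_CFGI_STE, "PREFETCH must sit below CFGI");
static_assert(CMDQ_OP_TLBI_NSNH_ALL < CMDQ_OP_ATC_INV, "TLBI must sit below ATC_INV");

/* Minimal userspace stand-ins for the driver structures. */
struct arm_smmu_device { uint32_t options; };
struct arm_smmu_cmd { uint64_t data[2]; };

static inline bool
arm_smmu_erratum_cmd_needs_repeating(const struct arm_smmu_device *smmu,
				     const struct arm_smmu_cmd *cmd)
{
	unsigned int op;

	if (!(smmu->options & ARM_SMMU_OPT_REPEAT_TLBI_CFGI))
		return false;

	/* CFGI_* and TLBI_* sit in [CFGI_STE, ATC_INV); PREFETCH is below. */
	op = cmd->data[0] & CMDQ_0_OP_MASK;
	return op >= CMDQ_OP_CFGI_STE && op < CMDQ_OP_ATC_INV;
}
```

In the driver itself, the option test could become the static key suggested above. This model keeps a plain bit test so that it stays runnable outside the kernel.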
* Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
2026-06-01 10:48 ` [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
2026-06-01 18:37 ` Nicolin Chen
@ 2026-06-02 20:22 ` Will Deacon
2026-06-02 20:35 ` Nicolin Chen
1 sibling, 1 reply; 16+ messages in thread
From: Will Deacon @ 2026-06-02 20:22 UTC (permalink / raw)
To: Ashish Mhetre
Cc: robin.murphy, joro, jgg, nicolinc, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Mon, Jun 01, 2026 at 10:48:45AM +0000, Ashish Mhetre wrote:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> index 1e9f7d2de344..78c96a2b652b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
> @@ -350,6 +350,26 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
> return 0;
> }
>
> +/*
> + * On Tegra264, arm_smmu_cmdq_issue_cmdlist() doubles every CFGI/TLBI
> + * submission (see ARM_SMMU_OPT_TLBI_TWICE). The doubling decision is
> + * taken once per cmdlist based on the first command, so a single
> + * batch must not mix commands that need doubling with commands that
> + * do not. Split the iommufd batch whenever the next user command
> + * crosses that boundary.
> + */
Again, I wouldn't bother with this comment. You probably _should_ update
Documentation/arch/arm64/silicon-errata.rst, however.
> +static bool arm_vsmmu_can_batch_cmd(struct arm_smmu_device *smmu,
> + struct arm_vsmmu_invalidation_cmd *last,
> + struct arm_vsmmu_invalidation_cmd *next)
> +{
> + struct arm_smmu_cmd next_cmd = {
> + .data[0] = le64_to_cpu(next->ucmd.cmd[0]),
> + };
> +
> + return arm_smmu_cmd_needs_tlbi_twice(smmu, &last->cmd) ==
> + arm_smmu_cmd_needs_tlbi_twice(smmu, &next_cmd);
> +}
> +
> int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
> struct iommu_user_data_array *array)
> {
> @@ -382,7 +402,8 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
>
> /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
> cur++;
> - if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
> + if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1 &&
> + arm_vsmmu_can_batch_cmd(smmu, last, cur))
> continue;
FYI: Sashiko is unhappy with the existing code here, so somebody should
check that out:
https://sashiko.dev/#/patchset/20260601104845.995005-2-amhetre@nvidia.com
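The splitting rule added to arm_vsmmu_cache_invalidate() can be modelled in userspace C to show which batches would reach arm_smmu_cmdq_issue_cmdlist(). This is a sketch under stated assumptions:

- split_batches(), needs_repeating() and BATCH_MAX are hypothetical.
- BATCH_MAX is a small stand-in for CMDQ_BATCH_ENTRIES - 1.
- The classifier uses the CFGI_STE..ATC_INV opcode range rather than the real helper.

Like the patch, it compares the next command against the first command of the current batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Small stand-in for CMDQ_BATCH_ENTRIES - 1 so the split is easy to see. */
#define BATCH_MAX 4

/* Models the opcode classifier: true for CFGI_* and TLBI_* opcodes. */
static bool needs_repeating(unsigned int op)
{
	return op >= 0x3 && op < 0x40;
}

/*
 * Walks ops[0..n) the way the iommufd loop walks user commands and records
 * the length of each submitted batch in lens[]. A batch is closed when the
 * input ends, when it reaches BATCH_MAX entries, or when the next command's
 * class differs from the batch's first command. Returns the batch count.
 */
static size_t split_batches(const unsigned int *ops, size_t n, size_t *lens)
{
	size_t last = 0, cur = 0, nbatches = 0;

	while (cur != n) {
		cur++;
		if (cur != n && cur - last != BATCH_MAX &&
		    needs_repeating(ops[last]) == needs_repeating(ops[cur]))
			continue;

		assert(cur - last <= BATCH_MAX);
		lens[nbatches++] = cur - last;
		last = cur;
	}
	return nbatches;
}
```

For the input {TLBI_NH_VA, TLBI_NH_VA, ATC_INV, ATC_INV, TLBI_NH_ASID} (opcodes 0x12, 0x12, 0x40, 0x40, 0x11), this yields batches of 2, 2 and 1 commands. Every batch is uniform, which is what lets the issue path check only the first command.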
> /* FIXME always uses the main cmdq rather than trying to group by type */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 08684bd40a6d..f38c21b56f28 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -698,10 +698,10 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq,
> * insert their own list of commands then all of the commands from one
> * CPU will appear before any of the commands from the other CPU.
> */
> -int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> - struct arm_smmu_cmdq *cmdq,
> - struct arm_smmu_cmd *cmds, int n,
> - bool sync)
> +static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> + struct arm_smmu_cmdq *cmdq,
> + struct arm_smmu_cmd *cmds, int n,
> + bool sync)
> {
> struct arm_smmu_cmd cmd_sync;
> u32 prod;
> @@ -820,6 +820,26 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> return ret;
> }
>
> +int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> + struct arm_smmu_cmdq *cmdq,
> + struct arm_smmu_cmd *cmds, int n,
> + bool sync)
> +{
> + int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
> +
> + /*
> + * On Tegra264 (see ARM_SMMU_OPT_TLBI_TWICE) re-issue the same
> + * cmdlist with another CMD_SYNC to satisfy the erratum.
> + * Callers must ensure the batch carries a uniform opcode class
> + * so that checking the first command is enough; the iommufd
> + * VSMMU path enforces this with arm_vsmmu_can_batch_cmd().
> + */
> + if (!ret && sync && arm_smmu_cmd_needs_tlbi_twice(smmu, &cmds[0]))
Can you move the arm_smmu_cmd_... part to the start of the conjunction,
please? If you make it a static key as I mentioned previously, then
hopefully that should mean everything else is moved out of line.
> + ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
Sashiko is also unhappy here if n == 0 because we probably shouldn't
be inspecting the command array in that case. Generally, it's a pity
that we can't handle this all a bit further up in the stack when we know
exactly what operation we're trying to perform, but I suppose with all
the different users of the invalidation commands that's hard to catch in
one place?
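A userspace model of the wrapper with both review points applied:

- The cheap option check comes first in the conjunction.
- cmds[0] is never read when n == 0.

Everything here is hypothetical. issue_cmdlist(), issue_cmdlist_once(), repeat_enabled and the call counter stand in for the driver functions and for the option bit or static key. The model only demonstrates the control flow; it does not drive a real CMDQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cmd { unsigned int op; };

/* Models the option bit (or the suggested static key). */
static bool repeat_enabled;

/* Counts submissions instead of touching a real command queue. */
static int issue_calls;

static int issue_cmdlist_once(const struct cmd *cmds, int n, bool sync)
{
	(void)cmds;
	(void)n;
	(void)sync;
	issue_calls++;
	return 0;
}

static bool op_needs_repeating(unsigned int op)
{
	return op >= 0x3 && op < 0x40;	/* CFGI_* and TLBI_* */
}

static int issue_cmdlist(const struct cmd *cmds, int n, bool sync)
{
	int ret = issue_cmdlist_once(cmds, n, sync);

	/*
	 * Cheapest, rarely-true test first, and never read cmds[0]
	 * when the list is empty.
	 */
	if (repeat_enabled && n > 0 && op_needs_repeating(cmds[0].op) &&
	    sync && !ret) {
		/* Callers promise a uniform batch; check that in this model. */
		for (int i = 1; i < n; i++)
			assert(op_needs_repeating(cmds[i].op));
		ret = issue_cmdlist_once(cmds, n, sync);
	}
	return ret;
}
```

Only synced, non-empty lists whose first command is CFGI/TLBI are submitted twice. ATC_INV lists and unsynced lists go out once.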
> +
> + return ret;
> +}
> +
> static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
> struct arm_smmu_cmd *cmd, bool sync)
> {
> @@ -863,6 +883,14 @@ static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
> (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
> return true;
>
> + /*
> + * Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE). The batch holds
> + * a uniform opcode class, so checking the first command is enough.
> + */
Again, please drop the Tegra264 mention and just refer to the option.
Will
* Re: [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
2026-06-02 20:08 ` Nicolin Chen
@ 2026-06-02 20:23 ` Will Deacon
0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2026-06-02 20:23 UTC (permalink / raw)
To: Nicolin Chen
Cc: Ashish Mhetre, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 01:08:26PM -0700, Nicolin Chen wrote:
> On Tue, Jun 02, 2026 at 08:55:10PM +0100, Will Deacon wrote:
> > On Mon, Jun 01, 2026 at 10:48:43AM +0000, Ashish Mhetre wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > >
> > > arm_smmu_cmdq_batch_add_cmd_p() carries two distinct reasons for
> > > flushing the current batch with a CMD_SYNC before appending the
> > > new command:
> > >
> > > - The batch's pre-assigned cmdq does not support the new command.
> > > - The Arm erratum 2812531 workaround (ARM_SMMU_OPT_CMDQ_FORCE_SYNC)
> > > forces a SYNC at one entry before the batch is full.
> > >
> > > Lift those checks into a new arm_smmu_cmdq_batch_force_sync() helper
> > > so that adding another force-sync condition becomes a one-line
> > > addition. No functional change.
> > >
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
> > > ---
> > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++++++++++-----
> > > 1 file changed, 20 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 9be589d14a3b..4d29bd343460 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -847,16 +847,30 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
> > > cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
> > > }
> > >
> > > +static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
> > > + struct arm_smmu_cmdq_batch *cmds,
> > > + struct arm_smmu_cmd *cmd)
> > > +{
> > > + if (!cmds->num)
> > > + return false;
> >
> > This check seems new?
>
> You are right. Maybe the commit message should have mentioned that
> there is a slight behavior change to the unsupported_cmd path:
>
> - Before: if cmds->num = 0 && unsupported_cmd, it would issue an
> empty batch (one CMD_SYNC)
> - After : if cmds->num = 0, no issue on the empty batch
>
> With that, I think it's good to have?
It just shouldn't be part of the refactoring patch with "no functional
change". Do it in the patch that needs it and explain why it's needed.
Will
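The behaviour change Nicolin describes can be pinned down with a small userspace model. It compares the pre-refactor inline condition with the helper posted in 1/3. Both functions are hypothetical reconstructions from the descriptions in this thread, and CMDQ_BATCH_ENTRIES is assumed to be 64 (BITS_PER_LONG on arm64).

The two agree for every non-empty batch. They differ only when cmds->num == 0 and the command is unsupported, which is the case that should be called out in whichever patch introduces the check.

```c
#include <assert.h>
#include <stdbool.h>

#define CMDQ_BATCH_ENTRIES 64	/* assumed: BITS_PER_LONG on arm64 */

/* Pre-refactor condition from arm_smmu_cmdq_batch_add_cmd_p(), modelled. */
static bool old_force_sync(int num, bool unsupported_cmd, bool opt_force_sync)
{
	bool force_sync = (num == CMDQ_BATCH_ENTRIES - 1) && opt_force_sync;

	return force_sync || unsupported_cmd;
}

/* Helper as posted in v3 1/3, modelled, including the new !num check. */
static bool new_force_sync(int num, bool unsupported_cmd, bool opt_force_sync)
{
	if (!num)
		return false;
	if (unsupported_cmd)
		return true;
	return (num == CMDQ_BATCH_ENTRIES - 1) && opt_force_sync;
}

/* Exhaustively confirm the two agree whenever the batch is non-empty. */
static void check_equivalent_when_nonempty(void)
{
	for (int num = 1; num <= CMDQ_BATCH_ENTRIES; num++)
		for (int u = 0; u < 2; u++)
			for (int o = 0; o < 2; o++)
				assert(old_force_sync(num, u, o) ==
				       new_force_sync(num, u, o));
}
```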
* Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-06-02 20:13 ` Will Deacon
@ 2026-06-02 20:31 ` Nicolin Chen
2026-06-02 20:59 ` Will Deacon
0 siblings, 1 reply; 16+ messages in thread
From: Nicolin Chen @ 2026-06-02 20:31 UTC (permalink / raw)
To: Will Deacon
Cc: Ashish Mhetre, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 09:13:39PM +0100, Will Deacon wrote:
> On Mon, Jun 01, 2026 at 10:48:44AM +0000, Ashish Mhetre wrote:
> > Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
> > invalidation that races with concurrent traffic targeting the same
> > entry. The hardware-recommended software workaround is to issue every
> > CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
> > is guaranteed to evict the entry. ATC_INV is not affected and must not
> > be doubled.
> >
> > The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
> > cannot be detected from hardware ID. Tegra264 boots from device tree
> > only and has no ACPI/IORT support, so detection is through device
> > tree only.
>
> That seems odd to me -- whether the hardware has the erratum is
> completely unrelated to whether it probes using DT or ACPI, so I find it
> really weird to have the workaround enabled when booting with DT and not
> when booting with ACPI. We should have consistent behaviour between the
> two.
That's a good point. Yet, for ACPI to detect the erratum, we would
need a new IORT model or flag, right? That would need to go through
the entire ACPI protocol to update SMMU's IORT spec and the header
accordingly, and we have neither a use case for that nor a way to test it.
What would you like us to do here for consistency?
Thanks
Nicolin
* Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
2026-06-02 20:22 ` Will Deacon
@ 2026-06-02 20:35 ` Nicolin Chen
0 siblings, 0 replies; 16+ messages in thread
From: Nicolin Chen @ 2026-06-02 20:35 UTC (permalink / raw)
To: Will Deacon
Cc: Ashish Mhetre, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 09:22:15PM +0100, Will Deacon wrote:
> On Mon, Jun 01, 2026 at 10:48:45AM +0000, Ashish Mhetre wrote:
> > @@ -382,7 +402,8 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
> >
> > /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
> > cur++;
> > - if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
> > + if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1 &&
> > + arm_vsmmu_can_batch_cmd(smmu, last, cur))
> > continue;
>
> FYI: Sashiko is unhappy with the existing code here, so somebody should
> check that out:
>
> https://sashiko.dev/#/patchset/20260601104845.995005-2-amhetre@nvidia.com
Ack. I will submit a separate patch to address this.
Nicolin
* Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-06-02 20:31 ` Nicolin Chen
@ 2026-06-02 20:59 ` Will Deacon
2026-06-02 21:06 ` Nicolin Chen
0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2026-06-02 20:59 UTC (permalink / raw)
To: Nicolin Chen
Cc: Ashish Mhetre, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 01:31:58PM -0700, Nicolin Chen wrote:
> On Tue, Jun 02, 2026 at 09:13:39PM +0100, Will Deacon wrote:
> > On Mon, Jun 01, 2026 at 10:48:44AM +0000, Ashish Mhetre wrote:
> > > Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
> > > invalidation that races with concurrent traffic targeting the same
> > > entry. The hardware-recommended software workaround is to issue every
> > > CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
> > > is guaranteed to evict the entry. ATC_INV is not affected and must not
> > > be doubled.
> > >
> > > The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
> > > cannot be detected from hardware ID. Tegra264 boots from device tree
> > > only and has no ACPI/IORT support, so detection is through device
> > > tree only.
> >
> > That seems odd to me -- whether the hardware has the erratum is
> > completely unrelated to whether it probes using DT or ACPI, so I find it
> > really weird to have the workaround enabled when booting with DT and not
> > when booting with ACPI. We should have consistent behaviour between the
> > two.
>
> That's a good point. Yet, for ACPI to detect the erratum, we would
> need a new IORT model or flag, right? That would need to go through
> the entire ACPI protocol to update SMMU's IORT spec and the header
> accordingly, and we have neither a use case for that nor a way to test it.
>
> What would you like us to do here for consistency?
Gah, I now realise I've mixed up Tegra264 and Tegra241 because
tegra_cmdqv_dt_probe() is only called if the compatible string matches
"nvidia,tegra264-smmu" yet all the cmdqv stuff talks only about
Tegra241. So I was surprised that the ACPI probing code for Tegra241
wasn't enabling the workaround.
But if you're saying that:
1. Tegra264 never uses ACPI
2. Tegra241 doesn't have the invalidation erratum
Then I'm less worried (even though it's a shame that we can't detect
the erratum from the hardware itself).
Will
* Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-06-02 20:59 ` Will Deacon
@ 2026-06-02 21:06 ` Nicolin Chen
0 siblings, 0 replies; 16+ messages in thread
From: Nicolin Chen @ 2026-06-02 21:06 UTC (permalink / raw)
To: Will Deacon
Cc: Ashish Mhetre, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Tue, Jun 02, 2026 at 09:59:39PM +0100, Will Deacon wrote:
> On Tue, Jun 02, 2026 at 01:31:58PM -0700, Nicolin Chen wrote:
> > On Tue, Jun 02, 2026 at 09:13:39PM +0100, Will Deacon wrote:
> > > On Mon, Jun 01, 2026 at 10:48:44AM +0000, Ashish Mhetre wrote:
> > > > Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
> > > > invalidation that races with concurrent traffic targeting the same
> > > > entry. The hardware-recommended software workaround is to issue every
> > > > CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
> > > > is guaranteed to evict the entry. ATC_INV is not affected and must not
> > > > be doubled.
> > > >
> > > > The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
> > > > cannot be detected from hardware ID. Tegra264 boots from device tree
> > > > only and has no ACPI/IORT support, so detection is through device
> > > > tree only.
> > >
> > > That seems odd to me -- whether the hardware has the erratum is
> > > completely unrelated to whether it probes using DT or ACPI, so I find it
> > > really weird to have the workaround enabled when booting with DT and not
> > > when booting with ACPI. We should have consistent behaviour between the
> > > two.
> >
> > That's a good point. Yet, for ACPI to detect the erratum, we would
> > need a new IORT model or flag, right? That would need to go through
> > the entire ACPI protocol to update SMMU's IORT spec and the header
> > accordingly, and we have neither a use case for that nor a way to test it.
> >
> > What would you like us to do here for consistency?
>
> Gah, I now realise I've mixed up Tegra264 and Tegra241 because
> tegra_cmdqv_dt_probe() is only called if the compatible string matches
> "nvidia,tegra264-smmu" yet all the cmdqv stuff talks only about
> Tegra241. So I was surprised that the ACPI probing code for Tegra241
> wasn't enabling the workaround.
I see. The numbers can indeed be confusing :)
> But if you're saying that:
>
> 1. Tegra264 never uses ACPI
> 2. Tegra241 doesn't have the invalidation erratum
Yes.
> Then I'm less worried (even though it's a shame that we can't detect
> the erratum from the hardware itself).
Agreed, though I think the compatible string is already a life
saver, as we don't need to go through the whole ACPI route.
Nicolin
end of thread
Thread overview: 16+ messages
2026-06-01 10:48 [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-06-01 10:48 ` [PATCH v3 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
2026-06-02 19:55 ` Will Deacon
2026-06-02 20:08 ` Nicolin Chen
2026-06-02 20:23 ` Will Deacon
2026-06-01 10:48 ` [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
2026-06-02 20:13 ` Will Deacon
2026-06-02 20:31 ` Nicolin Chen
2026-06-02 20:59 ` Will Deacon
2026-06-02 21:06 ` Nicolin Chen
2026-06-01 10:48 ` [PATCH v3 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
2026-06-01 18:37 ` Nicolin Chen
2026-06-02 20:22 ` Will Deacon
2026-06-02 20:35 ` Nicolin Chen
2026-06-02 16:31 ` [PATCH v3 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Mostafa Saleh
2026-06-02 18:23 ` Jason Gunthorpe