* [PATCH 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-05-28 10:16 [PATCH 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
@ 2026-05-28 10:16 ` Ashish Mhetre
2026-05-28 10:34 ` Robin Murphy
2026-05-28 10:16 ` [PATCH 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
2026-05-28 18:41 ` [PATCH 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround Nicolin Chen
2 siblings, 1 reply; 7+ messages in thread
From: Ashish Mhetre @ 2026-05-28 10:16 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
The Tegra264 SMMU is affected by an erratum in which a TLB entry can
survive an invalidation that races with concurrent traffic targeting the
same entry. The hardware-recommended software workaround is to issue
every CFGI/TLBI command twice, each time followed by a CMD_SYNC. The
second issue is guaranteed to evict the entry. ATC_INV is not affected
and must not be doubled.
Add the ARM_SMMU_OPT_TLBI_TWICE option and set it on instances matching
the existing "nvidia,tegra264-smmu" compatible. No callers consume the
option yet; the next patch wires the workaround into the CMDQ issue
paths.
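
The detection step can be pictured with a small userspace model. This is
only a sketch: struct model_smmu, model_dt_probe() and
model_needs_tlbi_twice() are hypothetical names, and strcmp() stands in
for the of_device_is_compatible() check. Only the bit value and the
compatible string are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit value as defined by the patch; everything else is a userspace model. */
#define ARM_SMMU_OPT_TLBI_TWICE (1u << 5)

/* Hypothetical stand-in for the options field of struct arm_smmu_device. */
struct model_smmu {
	uint32_t options;
};

/* Assumption: strcmp() models the device-tree compatible match. */
static void model_dt_probe(struct model_smmu *smmu, const char *compatible)
{
	if (strcmp(compatible, "nvidia,tegra264-smmu") == 0)
		smmu->options |= ARM_SMMU_OPT_TLBI_TWICE;
}

/* Later code would test the bit like this before doubling a command. */
static bool model_needs_tlbi_twice(const struct model_smmu *smmu)
{
	return (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) != 0;
}
```

Keeping the erratum behind an option bit means the issue path only has to
test smmu->options and never needs to look at the device tree again.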
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 +++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 ++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9be589d14a3b..88296c0a5337 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5229,8 +5229,10 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
 	if (of_dma_is_coherent(dev->of_node))
 		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
-	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu"))
+	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu")) {
 		tegra_cmdqv_dt_probe(dev->of_node, smmu);
+		smmu->options |= ARM_SMMU_OPT_TLBI_TWICE;
+	}
 
 	return ret;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 16353596e08a..08d1abaf31ae 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -928,6 +928,14 @@ struct arm_smmu_device {
#define ARM_SMMU_OPT_MSIPOLL (1 << 2)
#define ARM_SMMU_OPT_CMDQ_FORCE_SYNC (1 << 3)
#define ARM_SMMU_OPT_TEGRA241_CMDQV (1 << 4)
+/*
+ * Tegra264 erratum: a TLB entry can survive an invalidation that races
+ * with concurrent traffic targeting the same entry. The software
+ * workaround is to issue every CFGI/TLBI command twice, each followed
+ * by CMD_SYNC. The second issue is guaranteed to evict the entry.
+ * ATC_INV commands are not affected and must not be doubled.
+ */
+#define ARM_SMMU_OPT_TLBI_TWICE (1 << 5)
u32 options;
struct arm_smmu_cmdq cmdq;
--
2.50.1
* [PATCH 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
2026-05-28 10:16 [PATCH 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-05-28 10:16 ` [PATCH 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
@ 2026-05-28 10:16 ` Ashish Mhetre
2026-05-28 18:41 ` [PATCH 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround Nicolin Chen
2 siblings, 0 replies; 7+ messages in thread
From: Ashish Mhetre @ 2026-05-28 10:16 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
Apply the workaround for the Tegra264 erratum by issuing every CFGI/TLBI
command twice on affected SMMU instances, with a CMD_SYNC after each.
The erratum requires this exact sequencing:
TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
To get this sequence with minimal surgery, hook the workaround into
arm_smmu_cmdq_issue_cmdlist(). Rename the original function to
__arm_smmu_cmdq_issue_cmdlist() and add a thin wrapper that, on
affected SMMUs and when @sync is true, re-issues the same cmdlist a
second time.
A new arm_smmu_cmd_needs_tlbi_twice() helper classifies which opcodes
need the doubling: CFGI_* and TLBI_*.
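
The wrapper and the opcode classification described above can be
sketched as a userspace model that records what reaches the queue. All
names here (model_cmdq, model_issue_once(), model_issue(), the MODEL_OP_*
values) are hypothetical, and the log array stands in for the real
command queue. The n > 0 guard is an addition of this sketch and is not
part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode classes; the driver switches on CMDQ_OP_* values. */
enum model_op { MODEL_OP_CFGI, MODEL_OP_TLBI, MODEL_OP_ATC_INV, MODEL_OP_SYNC };

#define MODEL_LOG_MAX 64

struct model_cmdq {
	bool tlbi_twice;                  /* models ARM_SMMU_OPT_TLBI_TWICE */
	enum model_op log[MODEL_LOG_MAX]; /* commands in issue order */
	size_t len;
};

/* Only CFGI and TLBI need doubling; ATC_INV must not be doubled. */
static bool model_needs_twice(enum model_op op)
{
	return op == MODEL_OP_CFGI || op == MODEL_OP_TLBI;
}

/* Models the renamed inner function: queue the list, then an optional SYNC. */
static int model_issue_once(struct model_cmdq *q, const enum model_op *cmds,
			    size_t n, bool sync)
{
	if (q->len + n + (sync ? 1 : 0) > MODEL_LOG_MAX)
		return -1;
	for (size_t i = 0; i < n; i++)
		q->log[q->len++] = cmds[i];
	if (sync)
		q->log[q->len++] = MODEL_OP_SYNC;
	return 0;
}

/* Models the wrapper: repeat a synced CFGI/TLBI list on affected instances. */
static int model_issue(struct model_cmdq *q, const enum model_op *cmds,
		       size_t n, bool sync)
{
	int ret = model_issue_once(q, cmds, n, sync);

	/* Sketch-only guard: never read cmds[0] for a bare SYNC (n == 0). */
	if (!ret && sync && q->tlbi_twice && n > 0 &&
	    model_needs_twice(cmds[0]))
		ret = model_issue_once(q, cmds, n, sync);
	return ret;
}
```

Issuing two TLBI commands with sync set on an affected instance leaves
TLBI TLBI SYNC TLBI TLBI SYNC in the log, which matches the sequence the
erratum requires. An ATC_INV list, or any list issued without a SYNC, is
written once.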
For batches that exceed CMDQ_BATCH_ENTRIES commands,
arm_smmu_cmdq_batch_add_cmd_p() normally flushes the full buffer with
sync=false, deferring the SYNC to the eventual batch_submit(). On
affected SMMUs this would leave the first chunk's commands issued
only once, since the workaround hook in arm_smmu_cmdq_issue_cmdlist()
only fires on synced submissions. Force a SYNC on the capacity
rollover when the buffer carries CFGI/TLBI commands, so that every
flushed chunk is doubled.
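
The rollover problem can be checked with a counting model. This is a
sketch under stated assumptions: MODEL_BATCH_ENTRIES is a small
hypothetical capacity standing in for CMDQ_BATCH_ENTRIES, the
model_batch fields are invented for illustration, and the
force_sync_on_rollover flag exists only to compare the behaviour with
and without the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BATCH_ENTRIES 4 /* hypothetical; stands in for CMDQ_BATCH_ENTRIES */

struct model_batch {
	bool tlbi_twice;     /* models ARM_SMMU_OPT_TLBI_TWICE */
	bool invalidation;   /* batch holds CFGI/TLBI commands (uniform class) */
	size_t num;          /* commands currently buffered */
	size_t cmds_issued;  /* commands written to the queue so far */
	size_t syncs_issued; /* CMD_SYNCs written to the queue so far */
};

/* Models the wrapper: a synced invalidation list is written twice. */
static void model_issue(struct model_batch *b, size_t n, bool sync)
{
	size_t rounds = (sync && b->tlbi_twice && b->invalidation && n > 0) ? 2 : 1;

	b->cmds_issued += rounds * n;
	if (sync)
		b->syncs_issued += rounds;
}

/* Models the add path: flush a full buffer before queuing the new command. */
static void model_batch_add(struct model_batch *b, bool force_sync_on_rollover)
{
	if (b->num == MODEL_BATCH_ENTRIES) {
		bool need_sync = force_sync_on_rollover && b->tlbi_twice &&
				 b->invalidation;

		model_issue(b, b->num, need_sync);
		b->num = 0;
	}
	b->num++;
}

/* Models the final submit, which always carries a SYNC. */
static void model_batch_submit(struct model_batch *b)
{
	model_issue(b, b->num, true);
	b->num = 0;
}
```

With six TLBI commands and a capacity of four, the model without the
forced SYNC writes the first four commands once and only the last two
twice, eight in total. With the forced SYNC every chunk is doubled,
twelve commands and four SYNCs in total.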
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++++++--
1 file changed, 61 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88296c0a5337..38d45f175a2c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -698,10 +698,10 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq,
* insert their own list of commands then all of the commands from one
* CPU will appear before any of the commands from the other CPU.
*/
-int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
- struct arm_smmu_cmdq *cmdq,
- struct arm_smmu_cmd *cmds, int n,
- bool sync)
+static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq,
+ struct arm_smmu_cmd *cmds, int n,
+ bool sync)
{
struct arm_smmu_cmd cmd_sync;
u32 prod;
@@ -820,6 +820,52 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
return ret;
}
+/*
+ * Returns true if @opcode is a CFGI_* or TLBI_* command, i.e. one of the
+ * invalidations covered by Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE).
+ */
+static bool arm_smmu_cmd_needs_tlbi_twice(u8 opcode)
+{
+	switch (opcode) {
+	case CMDQ_OP_CFGI_STE:
+	case CMDQ_OP_CFGI_ALL:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
+	case CMDQ_OP_TLBI_NH_ALL:
+	case CMDQ_OP_TLBI_NH_ASID:
+	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_NH_VAA:
+	case CMDQ_OP_TLBI_EL2_ALL:
+	case CMDQ_OP_TLBI_EL2_ASID:
+	case CMDQ_OP_TLBI_EL2_VA:
+	case CMDQ_OP_TLBI_S12_VMALL:
+	case CMDQ_OP_TLBI_S2_IPA:
+	case CMDQ_OP_TLBI_NSNH_ALL:
+		return true;
+	default:
+		return false;
+	}
+}
+
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+				struct arm_smmu_cmdq *cmdq,
+				struct arm_smmu_cmd *cmds, int n,
+				bool sync)
+{
+	int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	/*
+	 * The driver's batch invariants keep a single submission's
+	 * opcode class uniform, so checking the first command is enough.
+	 */
+	if (!ret && sync && (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
+	    arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
+						    cmds[0].data[0])))
+		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	return ret;
+}
+
static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
struct arm_smmu_cmd *cmd, bool sync)
{
@@ -863,8 +909,18 @@ static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
 	}
 
 	if (cmds->num == CMDQ_BATCH_ENTRIES) {
+		/*
+		 * Force a SYNC only when the batch carries commands that
+		 * have to be doubled (see ARM_SMMU_OPT_TLBI_TWICE).
+		 * The batch holds a uniform opcode class, so checking
+		 * the first command is sufficient.
+		 */
+		bool need_sync = (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
+			arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
+						cmds->cmds[0].data[0]));
+
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
-					    cmds->num, false);
+					    cmds->num, need_sync);
 		arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
 	}
 
--
2.50.1
* Re: [PATCH 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround
2026-05-28 10:16 [PATCH 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-05-28 10:16 ` [PATCH 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
2026-05-28 10:16 ` [PATCH 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
@ 2026-05-28 18:41 ` Nicolin Chen
2 siblings, 0 replies; 7+ messages in thread
From: Nicolin Chen @ 2026-05-28 18:41 UTC (permalink / raw)
To: Ashish Mhetre
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Thu, May 28, 2026 at 10:16:15AM +0000, Ashish Mhetre wrote:
> Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
> survive an invalidation that races with concurrent traffic targeting
> the same entry. The hardware-recommended software workaround is to
> issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
> second issue must execute only after the first issue's CMD_SYNC has
> completed, giving the sequence:
>
> TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
>
> This series implements the workaround by hooking the duplication into
> the single chokepoint that every synchronous submission flows through:
> arm_smmu_cmdq_issue_cmdlist().
>
> Patch 1 detects affected instances using the existing
> "nvidia,tegra264-smmu" compatible string and exposes the condition
> via a new ARM_SMMU_OPT_TLBI_TWICE option bit.
>
> Patch 2 wires the option into the CMDQ submission path, which re-issues
> the cmdlist when @sync is true and the first command is a CFGI/TLBI.
Which base commit did you format the patches from?
Sashiko failed to apply them, so it could not run a review:
https://sashiko.dev/#/patchset/20260528101617.4068249-1-amhetre%40nvidia.com
Nicolin