* [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
@ 2026-06-09 7:32 Ashish Mhetre
2026-06-09 7:32 ` [PATCH v4 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Ashish Mhetre @ 2026-06-09 7:32 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

NVIDIA Tegra264 SMMUs are affected by an erratum where a TLB entry can
survive an invalidation that races with concurrent traffic targeting
the same entry. The hardware-recommended software workaround is to
issue every CFGI/TLBI command (each followed by CMD_SYNC) twice.
The second issue must execute only after the first issue's CMD_SYNC
has completed, giving the sequence:

    TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC

ATC_INV is not affected and must not be doubled.

The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
cannot be detected from hardware ID. Tegra264 is device-tree-only
(no ACPI/IORT support), so detection is purely by compatible string.

This series is structured as a small refactor + detect + apply
sequence so that each step is reviewable in isolation:

  1/3  Pure refactor (no functional change): lift the existing
       force-sync conditions out of arm_smmu_cmdq_batch_add_cmd_p()
       into a new arm_smmu_cmdq_batch_force_sync() helper, so that
       adding another condition (in patch 3) is a one-liner.
       Authored by Nicolin Chen.

  2/3  Detect the erratum and provide the classifier. Adds the
       ARM_SMMU_OPT_REPEAT_TLBI_CFGI per-instance option, a global
       arm_smmu_erratum_repeat_tlbi_cfgi_key static key, and the
       arm_smmu_erratum_cmd_needs_repeating() predicate. The static
       key means the wrapper compiles to a single tested branch on
       unaffected kernels.

  3/3  Apply the workaround: factor arm_smmu_cmdq_issue_cmdlist()
       into a thin wrapper around __arm_smmu_cmdq_issue_cmdlist()
       that re-issues the cmdlist a second time when the predicate
       fires; register the same condition with the batch helper so
       full batches of CFGI/TLBI flush with sync=true; and add
       arm_vsmmu_can_batch_cmd() so iommufd does not mix command
       classes inside a single batch. Also documents the erratum
       in silicon-errata.rst.

The series applies cleanly on linux-next/master (base-commit below).

Changes since v3:
 - Drop the cmds->num == 0 early-return so the refactor is
   truly "no functional change".
 - Rename ARM_SMMU_OPT_TLBI_TWICE -> ARM_SMMU_OPT_REPEAT_TLBI_CFGI
   and rephrase its kdoc to be hardware-agnostic.
 - Rename arm_smmu_cmd_needs_tlbi_twice() ->
   arm_smmu_erratum_cmd_needs_repeating() and drop the kdoc
   above it.
 - Replace the explicit opcode switch with a single range check
   opcode >= CMDQ_OP_CFGI_STE && opcode < CMDQ_OP_ATC_INV.
 - Introduce arm_smmu_erratum_repeat_tlbi_cfgi_key static key:
   the predicate gates on it first so unaffected kernels pay
   only a single static_branch_unlikely() check.
 - Drop the verbose Tegra264-specific comments above
   arm_vsmmu_can_batch_cmd() and inside the batch helper.
 - Document the erratum in
   Documentation/arch/arm64/silicon-errata.rst.
 - Guard the repeat path in arm_smmu_cmdq_issue_cmdlist() with
   an n > 0 check so we never inspect cmds[0] on the bare-SYNC
   flush emitted by arm_smmu_cmdq_batch_add_cmd_p() when the
   next command is unsupported by the batch's pre-selected
   cmdq.
 - Drop the carried Reviewed-by tags now that the patch
   shape has changed; re-review appreciated.

Changes since v2:
 - Split into a 3-patch series (refactor / detect / apply) to keep
   each step small and bisectable.
 - Move the classifier to arm-smmu-v3.h as static inline so the
   iommufd file can share it.
 - Add arm_vsmmu_can_batch_cmd() to split iommufd batches at
   "needs repeating" transitions so the per-batch decision based
   on the first command stays correct under mixed user input.
 - Spell out in the commit message why detection is via DT and
   not via IIDR/ACPI.

Changes since v1:
 - Detect the erratum from the existing "nvidia,tegra264-smmu"
   compatible instead of adding a new property.
 - Centralise the doubling at the CMDQ submission layer and only
   apply it to CFGI/TLBI (not ATC_INV).
 - Drop the binding/dtsi patches accordingly.

Ashish Mhetre (2):
  iommu/arm-smmu-v3: Detect Tegra264 erratum
  iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264

Nicolin Chen (1):
  iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions

 Documentation/arch/arm64/silicon-errata.rst |  2 +
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c   | 15 ++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 +++++++
 4 files changed, 94 insertions(+), 12 deletions(-)


base-commit: 7da7f07112610a520567421dd2ffcb51beaefbcc
--
2.50.1
* [PATCH v4 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
2026-06-09 7:32 [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
@ 2026-06-09 7:32 ` Ashish Mhetre
2026-06-09 7:32 ` [PATCH v4 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Ashish Mhetre @ 2026-06-09 7:32 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
From: Nicolin Chen <nicolinc@nvidia.com>

arm_smmu_cmdq_batch_add_cmd_p() carries two distinct reasons for
flushing the current batch with a CMD_SYNC before appending the
new command:

 - The batch's pre-assigned cmdq does not support the new command.
 - The Arm erratum 2812531 workaround (ARM_SMMU_OPT_CMDQ_FORCE_SYNC)
   forces a SYNC one entry before the batch is full.

Lift those checks into a new arm_smmu_cmdq_batch_force_sync() helper
so that adding another force-sync condition becomes a one-line
addition. No functional change.
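
For readers who want to poke at the two conditions outside the kernel,
they can be modelled in plain userspace C. This is only a sketch under
stated assumptions: struct model_batch, model_batch_force_sync(),
BATCH_ENTRIES, OPT_FORCE_SYNC and the cmdq_supports flag are hypothetical
stand-ins for the driver's arm_smmu_cmdq_batch, CMDQ_BATCH_ENTRIES,
ARM_SMMU_OPT_CMDQ_FORCE_SYNC and arm_smmu_cmdq_supports_cmd(); the batch
capacity of 64 is assumed, not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel definitions. */
#define BATCH_ENTRIES	64		/* assumed batch capacity */
#define OPT_FORCE_SYNC	(1u << 3)	/* mirrors ARM_SMMU_OPT_CMDQ_FORCE_SYNC */

struct model_batch {
	int num;		/* commands currently queued in the batch */
	bool cmdq_supports;	/* does the batch's cmdq accept the next command? */
};

/* Return true when the batch must be flushed with a CMD_SYNC first. */
static bool model_batch_force_sync(unsigned int options,
				   const struct model_batch *b)
{
	/* Condition 1: the pre-assigned cmdq cannot take the new command. */
	if (!b->cmdq_supports)
		return true;

	/* Condition 2: erratum 2812531, sync one entry before the batch fills. */
	if (b->num == BATCH_ENTRIES - 1 && (options & OPT_FORCE_SYNC))
		return true;

	return false;
}
```

In this model a caller would flush the batch with a CMD_SYNC whenever the
helper returns true and then start a fresh batch with the incoming
command; a further force-sync condition is one more "if" in the helper.
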

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a10affb483a4..76efe479e80f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -847,16 +847,27 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
 	cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
 }
 
+static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
+					   struct arm_smmu_cmdq_batch *cmds,
+					   struct arm_smmu_cmd *cmd)
+{
+	/* The batch's pre-assigned cmdq doesn't support the new command */
+	if (!arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd))
+		return true;
+
+	/* Arm erratum 2812531 */
+	if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+	    (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+		return true;
+
+	return false;
+}
+
 static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
 					  struct arm_smmu_cmdq_batch *cmds,
 					  struct arm_smmu_cmd *cmd)
 {
-	bool force_sync = (cmds->num == CMDQ_BATCH_ENTRIES - 1) &&
-			  (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC);
-	bool unsupported_cmd;
-
-	unsupported_cmd = !arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd);
-	if (force_sync || unsupported_cmd) {
+	if (arm_smmu_cmdq_batch_force_sync(smmu, cmds, cmd)) {
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
 					    cmds->num, true);
 		arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
--
2.50.1
* [PATCH v4 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
2026-06-09 7:32 [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-06-09 7:32 ` [PATCH v4 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
@ 2026-06-09 7:32 ` Ashish Mhetre
2026-06-09 7:32 ` [PATCH v4 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
2026-06-17 3:58 ` [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
3 siblings, 0 replies; 6+ messages in thread
From: Ashish Mhetre @ 2026-06-09 7:32 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

Tegra264 SMMU is affected by an erratum where a TLB entry can survive
an invalidation that races with concurrent traffic targeting the same
entry. The hardware-recommended software workaround is to issue every
CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
is guaranteed to evict the entry. ATC_INV is not affected and must
not be doubled.

The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
cannot be detected from hardware registers. Tegra264 boots from device
tree only and has no ACPI/IORT support, so detection is through device
tree only.

Add the ARM_SMMU_OPT_REPEAT_TLBI_CFGI option and set it on instances
matching the existing "nvidia,tegra264-smmu" compatible. Also add a
matching arm_smmu_erratum_repeat_tlbi_cfgi_key static key that DT
probe enables, so the inline classifier compiles down to a single
test+branch on unaffected kernels. Add an
arm_smmu_erratum_cmd_needs_repeating() helper in arm-smmu-v3.h that
gates on the static key first, then on the per-instance option, and
finally range-checks the opcode (CFGI_STE up to, but not including,
ATC_INV), so subsequent changes wiring the workaround into the CMDQ
submission and iommufd batching paths can share a single predicate.
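
To make the half-open opcode range concrete, here is a small userspace
sketch of the classifier. It is not the driver code: the function name
model_needs_repeating() and the erratum_present flag (standing in for
both the static key and the per-instance option) are hypothetical, and
the opcode values are my recollection of the SMMUv3 command encoding, so
please check them against arm-smmu-v3.h before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode values assumed from the SMMUv3 command encoding; verify locally. */
#define OP_PREFETCH_CFG		0x01
#define OP_CFGI_STE		0x03
#define OP_TLBI_NH_VA		0x12
#define OP_TLBI_NSNH_ALL	0x30
#define OP_ATC_INV		0x40
#define OP_CMD_SYNC		0x46

/*
 * erratum_present models "static key enabled and option set on this SMMU".
 * data0 models the first 64-bit word of a command; the opcode is assumed
 * to live in bits [7:0].
 */
static bool model_needs_repeating(bool erratum_present, uint64_t data0)
{
	uint8_t opcode;

	if (!erratum_present)
		return false;

	opcode = data0 & 0xff;
	/* Half-open range: CFGI_STE is included, ATC_INV is excluded. */
	return opcode >= OP_CFGI_STE && opcode < OP_ATC_INV;
}
```

With these assumed values every CFGI and TLBI opcode falls inside the
range, while PREFETCH_CFG (below it) and ATC_INV and CMD_SYNC (at or
above the upper bound) do not.
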

No callers consume the option yet. A subsequent change wires the
workaround into the CMDQ issue paths.

Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  7 +++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 +++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 76efe479e80f..599c835c50d8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -42,6 +42,8 @@ MODULE_PARM_DESC(disable_msipolling,
 static const struct iommu_ops arm_smmu_ops;
 static struct iommu_dirty_ops arm_smmu_dirty_ops;
 
+DEFINE_STATIC_KEY_FALSE(arm_smmu_erratum_repeat_tlbi_cfgi_key);
+
 enum arm_smmu_msi_index {
 	EVTQ_MSI_INDEX,
 	GERROR_MSI_INDEX,
@@ -5303,8 +5305,11 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
 	if (of_dma_is_coherent(dev->of_node))
 		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
-	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu"))
+	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu")) {
 		tegra_cmdqv_dt_probe(dev->of_node, smmu);
+		smmu->options |= ARM_SMMU_OPT_REPEAT_TLBI_CFGI;
+		static_branch_enable(&arm_smmu_erratum_repeat_tlbi_cfgi_key);
+	}
 
 	return ret;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index c909c9a88538..c6ea3b8dc761 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -11,6 +11,7 @@
 #include <linux/bitfield.h>
 #include <linux/iommu.h>
 #include <linux/iommufd.h>
+#include <linux/jump_label.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
@@ -928,6 +929,12 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 #define ARM_SMMU_OPT_TEGRA241_CMDQV	(1 << 4)
+/*
+ * Repeat every {CFGI,TLBI};CMD_SYNC command sequence so that the second
+ * issue executes only after the first issue's CMD_SYNC has completed.
+ * Does not apply to ATC_INV.
+ */
+#define ARM_SMMU_OPT_REPEAT_TLBI_CFGI	(1 << 5)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
@@ -1212,6 +1219,23 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 				struct arm_smmu_cmd *cmds, int n,
 				bool sync);
 
+DECLARE_STATIC_KEY_FALSE(arm_smmu_erratum_repeat_tlbi_cfgi_key);
+
+static inline bool
+arm_smmu_erratum_cmd_needs_repeating(struct arm_smmu_device *smmu,
+				     struct arm_smmu_cmd *cmd)
+{
+	u8 opcode;
+
+	if (!static_branch_unlikely(&arm_smmu_erratum_repeat_tlbi_cfgi_key))
+		return false;
+	if (!(smmu->options & ARM_SMMU_OPT_REPEAT_TLBI_CFGI))
+		return false;
+
+	opcode = FIELD_GET(CMDQ_0_OP, cmd->data[0]);
+	return opcode >= CMDQ_OP_CFGI_STE && opcode < CMDQ_OP_ATC_INV;
+}
+
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
 void arm_smmu_sva_notifier_synchronize(void);
--
2.50.1
* [PATCH v4 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
2026-06-09 7:32 [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
2026-06-09 7:32 ` [PATCH v4 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions Ashish Mhetre
2026-06-09 7:32 ` [PATCH v4 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
@ 2026-06-09 7:32 ` Ashish Mhetre
2026-06-17 3:58 ` [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
3 siblings, 0 replies; 6+ messages in thread
From: Ashish Mhetre @ 2026-06-09 7:32 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

Apply the workaround for Tegra264 erratum T264-SMMU-3 (gated by
ARM_SMMU_OPT_REPEAT_TLBI_CFGI) by issuing every CFGI/TLBI cmdlist
twice on affected SMMU instances, with CMD_SYNC after each. The
erratum requires this exact sequencing:

    TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC

Rename the existing arm_smmu_cmdq_issue_cmdlist() to
__arm_smmu_cmdq_issue_cmdlist() and add a thin wrapper that, when
@sync is true, @n > 0 and the first submission succeeded, re-issues
the same cmdlist if arm_smmu_erratum_cmd_needs_repeating() is true
for cmds[0]. The @n > 0 gate is needed because
arm_smmu_cmdq_batch_add_cmd_p() can call arm_smmu_cmdq_issue_cmdlist()
with @n == 0 and @sync == true to flush a bare CMD_SYNC when the next
command is not supported by the batch's pre-selected cmdq; the repeat
path must not inspect cmds[0] in that case. The static-key gate inside
the predicate means the wrapper compiles to a single tested branch on
unaffected kernels.
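
The wrapper's control flow can be tried out in isolation with a small
userspace model. This is a sketch, not the driver: struct model_smmu,
model_issue_once() (standing in for __arm_smmu_cmdq_issue_cmdlist()) and
model_issue_cmdlist() are hypothetical names, commands are reduced to
one 64-bit word with the opcode assumed in bits [7:0], and the opcode
values are assumptions to be checked against arm-smmu-v3.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMMUv3 opcode values; verify against arm-smmu-v3.h. */
#define OP_CFGI_STE	0x03
#define OP_TLBI_NH_VA	0x12
#define OP_ATC_INV	0x40

struct model_smmu {
	bool erratum;		/* models static key + per-instance option */
	int inject_error;	/* value returned by every submission */
	int submissions;	/* how many times a cmdlist was submitted */
};

/* Stand-in for the real submission path: only counts calls. */
static int model_issue_once(struct model_smmu *smmu, const uint64_t *cmds,
			    int n, bool sync)
{
	(void)cmds;
	(void)n;
	(void)sync;
	smmu->submissions++;
	return smmu->inject_error;
}

static bool model_needs_repeating(const struct model_smmu *smmu, uint64_t cmd0)
{
	uint8_t opcode = cmd0 & 0xff;

	return smmu->erratum && opcode >= OP_CFGI_STE && opcode < OP_ATC_INV;
}

/* Submit once; repeat only for a successful, synced, non-empty list. */
static int model_issue_cmdlist(struct model_smmu *smmu, const uint64_t *cmds,
			       int n, bool sync)
{
	int ret = model_issue_once(smmu, cmds, n, sync);

	/* A bare SYNC (n == 0) must never look at cmds[0]. */
	if (!n || ret || !sync)
		return ret;

	if (model_needs_repeating(smmu, cmds[0]))
		ret = model_issue_once(smmu, cmds, n, sync);

	return ret;
}
```

In this model a synced TLBI list is submitted twice, while an ATC_INV
list, an unsynced list, a failed first submission and the n == 0 bare
SYNC are each submitted once.
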

For the in-tree batching path, register the new condition with
arm_smmu_cmdq_batch_force_sync() so that a full batch carrying
CFGI/TLBI commands flushes with sync=true.

For the iommufd VSMMU path add an arm_vsmmu_can_batch_cmd() predicate
that splits the iommufd batch at every "needs repeating" transition,
so the wrapper's per-batch decision based on the first command stays
correct even when userspace mixes opcode classes.
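
The splitting rule can be illustrated with a userspace sketch that walks
an array of opcodes the way the invalidation loop walks user commands.
Everything here is hypothetical: model_split_batches(), the max_batch
parameter (the driver compares against CMDQ_BATCH_ENTRIES - 1) and the
assumption that the erratum is always present; the opcode values are
assumptions to be checked against arm-smmu-v3.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMMUv3 opcode values; verify against arm-smmu-v3.h. */
#define OP_CFGI_STE	0x03
#define OP_TLBI_NH_VA	0x12
#define OP_ATC_INV	0x40

/* Model of the classifier with the erratum assumed present. */
static bool model_needs_repeating(uint8_t opcode)
{
	return opcode >= OP_CFGI_STE && opcode < OP_ATC_INV;
}

/*
 * Split ops[0..n) into batches. A batch is closed when the input ends,
 * when it reaches max_batch entries, or when the next command's class
 * differs from the class of the batch's first command. The size of each
 * batch is written to sizes[], which must have room for n entries.
 * Returns the number of batches.
 */
static int model_split_batches(const uint8_t *ops, int n, int max_batch,
			       int *sizes)
{
	int nbatches = 0, last = 0, cur = 0;

	while (cur != n) {
		cur++;
		if (cur != n && (cur - last) != max_batch &&
		    model_needs_repeating(ops[last]) ==
		    model_needs_repeating(ops[cur]))
			continue;

		sizes[nbatches++] = cur - last;	/* issue ops[last..cur) */
		last = cur;
	}

	return nbatches;
}
```

Because every batch is homogeneous with respect to the classifier,
inspecting only its first command is enough to decide whether the whole
batch has to be issued twice.
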

Also document the erratum in Documentation/arch/arm64/silicon-errata.rst.

Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 Documentation/arch/arm64/silicon-errata.rst |  2 ++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c   | 15 +++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 ++++++++++++++++---
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 046a7fa47063..96050886a7d6 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -268,6 +268,8 @@ stable kernels.
 |                |                 | T241-MPAM-4,    |                             |
 |                |                 | T241-MPAM-6     |                             |
 +----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | T264 SMMU       | T264-SMMU-3     | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 1e9f7d2de344..11d22acae613 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -350,6 +350,18 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
 	return 0;
 }
 
+static bool arm_vsmmu_can_batch_cmd(struct arm_smmu_device *smmu,
+				    struct arm_vsmmu_invalidation_cmd *last,
+				    struct arm_vsmmu_invalidation_cmd *next)
+{
+	struct arm_smmu_cmd next_cmd = {
+		.data[0] = le64_to_cpu(next->ucmd.cmd[0]),
+	};
+
+	return arm_smmu_erratum_cmd_needs_repeating(smmu, &last->cmd) ==
+	       arm_smmu_erratum_cmd_needs_repeating(smmu, &next_cmd);
+}
+
 int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
 			       struct iommu_user_data_array *array)
 {
@@ -382,7 +394,8 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
 
 		/* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
 		cur++;
-		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1 &&
+		    arm_vsmmu_can_batch_cmd(smmu, last, cur))
 			continue;
 
 		/* FIXME always uses the main cmdq rather than trying to group by type */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 599c835c50d8..041e188b3b30 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -700,10 +700,10 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq,
  *   insert their own list of commands then all of the commands from one
  *   CPU will appear before any of the commands from the other CPU.
  */
-int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
-				struct arm_smmu_cmdq *cmdq,
-				struct arm_smmu_cmd *cmds, int n,
-				bool sync)
+static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+					 struct arm_smmu_cmdq *cmdq,
+					 struct arm_smmu_cmd *cmds, int n,
+					 bool sync)
 {
 	struct arm_smmu_cmd cmd_sync;
 	u32 prod;
@@ -822,6 +822,28 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	return ret;
 }
 
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+				struct arm_smmu_cmdq *cmdq,
+				struct arm_smmu_cmd *cmds, int n,
+				bool sync)
+{
+	int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	/*
+	 * arm_smmu_cmdq_batch_add_cmd_p() can flush its current batch with
+	 * sync=true and n=0 (bare SYNC) when the next command is not
+	 * supported by the batch's pre-selected cmdq, so the repeat path
+	 * must not inspect cmds[0].
+	 */
+	if (!n || ret || !sync)
+		return ret;
+
+	if (arm_smmu_erratum_cmd_needs_repeating(smmu, &cmds[0]))
+		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	return ret;
+}
+
 static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
 				     struct arm_smmu_cmd *cmd, bool sync)
 {
@@ -862,6 +884,11 @@ static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
 	    (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
 		return true;
 
+	/* See ARM_SMMU_OPT_REPEAT_TLBI_CFGI */
+	if (cmds->num == CMDQ_BATCH_ENTRIES &&
+	    arm_smmu_erratum_cmd_needs_repeating(smmu, &cmds->cmds[0]))
+		return true;
+
 	return false;
 }
 
--
2.50.1
* Re: [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
2026-06-09 7:32 [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
` (2 preceding siblings ...)
2026-06-09 7:32 ` [PATCH v4 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
@ 2026-06-17 3:58 ` Ashish Mhetre
2026-06-17 4:54 ` Nicolin Chen
3 siblings, 1 reply; 6+ messages in thread
From: Ashish Mhetre @ 2026-06-17 3:58 UTC (permalink / raw)
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra

On 6/9/2026 1:02 PM, Ashish Mhetre wrote:
> [...]

Hi all,
A gentle reminder to review the patches and share your comments.
Thanks,
Ashish Mhetre
* Re: [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
2026-06-17 3:58 ` [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround Ashish Mhetre
@ 2026-06-17 4:54 ` Nicolin Chen
0 siblings, 0 replies; 6+ messages in thread
From: Nicolin Chen @ 2026-06-17 4:54 UTC (permalink / raw)
To: Ashish Mhetre
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu,
linux-kernel, linux-tegra
On Wed, Jun 17, 2026 at 09:28:10AM +0530, Ashish Mhetre wrote:
> On 6/9/2026 1:02 PM, Ashish Mhetre wrote:
> Hi all,
>
> A gentle reminder to review the patches and share your comments.

https://docs.kernel.org/process/maintainer-tip.html
"
4.2.9 Merge window

Please do not expect patches to be reviewed or merged by tip
maintainers around or during the merge window. The trees are closed
to all but urgent fixes during this time. They reopen once the merge
window closes and a new -rc1 kernel has been released.
"

I would wait for rc1 to rebase and respin.

Nicolin