linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rc v3 0/4] : iommu/arm-smmu-v3: Fix hitless STE update in nesting cases
@ 2025-12-10  2:45 Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence Nicolin Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Nicolin Chen @ 2025-12-10  2:45 UTC (permalink / raw)
  To: jgg, will, robin.murphy
  Cc: joro, linux-arm-kernel, iommu, linux-kernel, skolothumtho, praan,
	xueshuai

Occasional C_BAD_STE errors were observed in nesting setups where a device
attached to a nested bypass/identity domain enables PASID.

This occurred when the physical STE was updated from S2-only mode to S1+S2
nesting mode, but the update failed to use the hitless routine that it was
supposed to use. Instead, it cleared STE.V bit to load the CD table, while
the default substream was still actively performing DMA.

It was later found that the diff algorithm in arm_smmu_entry_qword_diff()
enforced an additional critical word due to misaligned MEV and EATS fields
between S2-only and S1+S2 modes.

Both fields are either well-managed or non-critical, so move them to the
"ignored" list to relax the qword diff algorithm.

Additionally, add KUnit test coverage for these nesting STE cases.

This is on Github:
https://github.com/nicolinc/iommufd/commits/smmuv3_ste_fixes/

A host kernel must apply this to fix the bug.

Changelog
v3:
 * Add Reviewed-by from Shuai
 * Add an inline comments in nested test cases
 * Reuse arm_smmu_test_make_cdtable_ste() for nested test cases
v2:
 https://lore.kernel.org/all/cover.1765140287.git.nicolinc@nvidia.com/
 * Fix kunit tests
 * Update commit message and inline comments
 * Keep MEV/EATS in used list by masking them away using ignored_bits
v1:
 https://lore.kernel.org/all/cover.1764982046.git.nicolinc@nvidia.com/

Jason Gunthorpe (3):
  iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  iommu/arm-smmu-v3: Ignore STE MEV when computing the update sequence
  iommu/arm-smmu-v3: Ignore STE EATS when computing the update sequence

Nicolin Chen (1):
  iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  2 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 65 ++++++++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 43 ++++++++++--
 3 files changed, 102 insertions(+), 8 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-10  2:45 [PATCH rc v3 0/4] : iommu/arm-smmu-v3: Fix hitless STE update in nesting cases Nicolin Chen
@ 2025-12-10  2:45 ` Nicolin Chen
  2025-12-14 22:32   ` Mostafa Saleh
  2025-12-10  2:45 ` [PATCH rc v3 2/4] iommu/arm-smmu-v3: Ignore STE MEV when computing the " Nicolin Chen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Nicolin Chen @ 2025-12-10  2:45 UTC (permalink / raw)
  To: jgg, will, robin.murphy
  Cc: joro, linux-arm-kernel, iommu, linux-kernel, skolothumtho, praan,
	xueshuai

From: Jason Gunthorpe <jgg@nvidia.com>

C_BAD_STE was observed when updating nested STE from an S1-bypass mode to
an S1DSS-bypass mode. As both modes enabled S2, the used bit is slightly
different than the normal S1-bypass and S1DSS-bypass modes. As a result,
fields like MEV and EATS in S2's used list marked the word1 as a critical
word that requested a STE.V=0. This breaks a hitless update.

However, both MEV and EATS aren't critical in terms of STE update. One
controls the merge of the events and the other controls the ATS that is
managed by the driver at the same time via pci_enable_ats().

Add an arm_smmu_get_ste_ignored() to allow STE update algorithm to ignore
those fields, avoiding the STE update breakages.

Note that this change is required by both MEV and EATS fields, which were
introduced in different kernel versions. So add this get_ignored() first.
The MEV and EATS will be added in arm_smmu_get_ste_ignored() separately.

Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  2 ++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 19 ++++++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 24 +++++++++++++++----
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ae23aacc3840..d5f0e5407b9f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -900,6 +900,7 @@ struct arm_smmu_entry_writer {
 
 struct arm_smmu_entry_writer_ops {
 	void (*get_used)(const __le64 *entry, __le64 *used);
+	void (*get_ignored)(__le64 *ignored_bits);
 	void (*sync)(struct arm_smmu_entry_writer *writer);
 };
 
@@ -911,6 +912,7 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 
 #if IS_ENABLED(CONFIG_KUNIT)
 void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits);
+void arm_smmu_get_ste_ignored(__le64 *ignored_bits);
 void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur,
 			  const __le64 *target);
 void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
index d2671bfd3798..3556e65cf9ac 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
@@ -38,13 +38,16 @@ enum arm_smmu_test_master_feat {
 static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry,
 						const __le64 *used_bits,
 						const __le64 *target,
+						const __le64 *ignored,
 						unsigned int length)
 {
 	bool differs = false;
 	unsigned int i;
 
 	for (i = 0; i < length; i++) {
-		if ((entry[i] & used_bits[i]) != target[i])
+		__le64 used = used_bits[i] & ~ignored[i];
+
+		if ((entry[i] & used) != (target[i] & used))
 			differs = true;
 	}
 	return differs;
@@ -56,12 +59,18 @@ arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
 	struct arm_smmu_test_writer *test_writer =
 		container_of(writer, struct arm_smmu_test_writer, writer);
 	__le64 *entry_used_bits;
+	__le64 *ignored;
 
 	entry_used_bits = kunit_kzalloc(
 		test_writer->test, sizeof(*entry_used_bits) * NUM_ENTRY_QWORDS,
 		GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits);
 
+	ignored = kunit_kzalloc(test_writer->test,
+				sizeof(*ignored) * NUM_ENTRY_QWORDS,
+				GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test_writer->test, ignored);
+
 	pr_debug("STE value is now set to: ");
 	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8,
 			     test_writer->entry,
@@ -79,14 +88,17 @@ arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
 		 * configuration.
 		 */
 		writer->ops->get_used(test_writer->entry, entry_used_bits);
+		if (writer->ops->get_ignored)
+			writer->ops->get_ignored(ignored);
 		KUNIT_EXPECT_FALSE(
 			test_writer->test,
 			arm_smmu_entry_differs_in_used_bits(
 				test_writer->entry, entry_used_bits,
-				test_writer->init_entry, NUM_ENTRY_QWORDS) &&
+				test_writer->init_entry, ignored,
+				NUM_ENTRY_QWORDS) &&
 				arm_smmu_entry_differs_in_used_bits(
 					test_writer->entry, entry_used_bits,
-					test_writer->target_entry,
+					test_writer->target_entry, ignored,
 					NUM_ENTRY_QWORDS));
 	}
 }
@@ -106,6 +118,7 @@ arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer,
 static const struct arm_smmu_entry_writer_ops test_ste_ops = {
 	.sync = arm_smmu_test_writer_record_syncs,
 	.get_used = arm_smmu_get_ste_used,
+	.get_ignored = arm_smmu_get_ste_ignored,
 };
 
 static const struct arm_smmu_entry_writer_ops test_cd_ops = {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d16d35c78c06..e22c0890041b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1082,6 +1082,12 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 }
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_used);
 
+VISIBLE_IF_KUNIT
+void arm_smmu_get_ste_ignored(__le64 *ignored_bits)
+{
+}
+EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_ignored);
+
 /*
  * Figure out if we can do a hitless update of entry to become target. Returns a
  * bit mask where 1 indicates that qword needs to be set disruptively.
@@ -1094,13 +1100,22 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
 {
 	__le64 target_used[NUM_ENTRY_QWORDS] = {};
 	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
+	__le64 ignored[NUM_ENTRY_QWORDS] = {};
 	u8 used_qword_diff = 0;
 	unsigned int i;
 
 	writer->ops->get_used(entry, cur_used);
 	writer->ops->get_used(target, target_used);
+	if (writer->ops->get_ignored)
+		writer->ops->get_ignored(ignored);
 
 	for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
+		/*
+		 * Ignored is only used for bits that are used by both entries,
+		 * otherwise it is sequenced according to the unused entry.
+		 */
+		ignored[i] &= target_used[i] & cur_used[i];
+
 		/*
 		 * Check that masks are up to date, the make functions are not
 		 * allowed to set a bit to 1 if the used function doesn't say it
@@ -1109,6 +1124,7 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
 		WARN_ON_ONCE(target[i] & ~target_used[i]);
 
 		/* Bits can change because they are not currently being used */
+		cur_used[i] &= ~ignored[i];
 		unused_update[i] = (entry[i] & cur_used[i]) |
 				   (target[i] & ~cur_used[i]);
 		/*
@@ -1207,12 +1223,9 @@ void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
 		entry_set(writer, entry, target, 0, 1);
 	} else {
 		/*
-		 * No inuse bit changed. Sanity check that all unused bits are 0
-		 * in the entry. The target was already sanity checked by
-		 * compute_qword_diff().
+		 * No inuse bit changed, though ignored bits may have changed.
 		 */
-		WARN_ON_ONCE(
-			entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS));
+		entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
 	}
 }
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_write_entry);
@@ -1543,6 +1556,7 @@ static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
 static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
 	.sync = arm_smmu_ste_writer_sync_entry,
 	.get_used = arm_smmu_get_ste_used,
+	.get_ignored = arm_smmu_get_ste_ignored,
 };
 
 static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rc v3 2/4] iommu/arm-smmu-v3: Ignore STE MEV when computing the update sequence
  2025-12-10  2:45 [PATCH rc v3 0/4] : iommu/arm-smmu-v3: Fix hitless STE update in nesting cases Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence Nicolin Chen
@ 2025-12-10  2:45 ` Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 3/4] iommu/arm-smmu-v3: Ignore STE EATS " Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 4/4] iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage Nicolin Chen
  3 siblings, 0 replies; 12+ messages in thread
From: Nicolin Chen @ 2025-12-10  2:45 UTC (permalink / raw)
  To: jgg, will, robin.murphy
  Cc: joro, linux-arm-kernel, iommu, linux-kernel, skolothumtho, praan,
	xueshuai

From: Jason Gunthorpe <jgg@nvidia.com>

Nested CD tables set the MEV bit to try to reduce multi-fault spamming on
the hypervisor. Since MEV is in STE word 1 this causes a breaking update
sequence that is not required and impacts real workloads.

For the purposes of STE updates the value of MEV doesn't matter, if it is
set/cleared early or late it just results in a change to the fault reports
that must be supported by the kernel anyhow. The spec says:

 Note: Software must expect, and be able to deal with, coalesced fault
 records even when MEV == 0.

So ignore MEV when computing the update sequence to avoid creating a
breaking update.

Fixes: da0c56520e88 ("iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e22c0890041b..3e161d8298d9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1085,6 +1085,16 @@ EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_used);
 VISIBLE_IF_KUNIT
 void arm_smmu_get_ste_ignored(__le64 *ignored_bits)
 {
+	/*
+	 * MEV does not meaningfully impact the operation of the HW, it only
+	 * changes how many fault events are generated, thus we can ignore it
+	 * when computing the ordering. The spec notes the device can act like
+	 * MEV=1 anyhow:
+	 *
+	 *  Note: Software must expect, and be able to deal with, coalesced
+	 *  fault records even when MEV == 0.
+	 */
+	ignored_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
 }
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_ignored);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rc v3 3/4] iommu/arm-smmu-v3: Ignore STE EATS when computing the update sequence
  2025-12-10  2:45 [PATCH rc v3 0/4] : iommu/arm-smmu-v3: Fix hitless STE update in nesting cases Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 2/4] iommu/arm-smmu-v3: Ignore STE MEV when computing the " Nicolin Chen
@ 2025-12-10  2:45 ` Nicolin Chen
  2025-12-10  2:45 ` [PATCH rc v3 4/4] iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage Nicolin Chen
  3 siblings, 0 replies; 12+ messages in thread
From: Nicolin Chen @ 2025-12-10  2:45 UTC (permalink / raw)
  To: jgg, will, robin.murphy
  Cc: joro, linux-arm-kernel, iommu, linux-kernel, skolothumtho, praan,
	xueshuai

From: Jason Gunthorpe <jgg@nvidia.com>

If a VM wants to toggle EATS off at the same time as changing the CFG, the
hypervisor will see EATS change to 0 and insert a V=0 breaking update into
the STE even though the VM did not ask for that.

In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
not cause a problem until we have nested where CFG is always a variation of
S2 trans that does use EATS.

Relax the rules for EATS sequencing, we don't need it to be exact because
the enclosing code will always disable ATS at the PCI device if we are
changing EATS. This ensures there are no ATS transactions that can race
with an EATS change so we don't need to carefully sequence these bits.

Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3e161d8298d9..72ba41591fdb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_ignored(__le64 *ignored_bits)
 	 *  fault records even when MEV == 0.
 	 */
 	ignored_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
+
+	/*
+	 * EATS is used to reject and control the ATS behavior of the device. If
+	 * we are changing it away from 0 then we already trust the device to
+	 * use ATS properly and we have sequenced the device's ATS enable in PCI
+	 * config space to prevent it from issuing ATS while we are changing
+	 * EATS.
+	 */
+	ignored_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
 }
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_ignored);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rc v3 4/4] iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
  2025-12-10  2:45 [PATCH rc v3 0/4] : iommu/arm-smmu-v3: Fix hitless STE update in nesting cases Nicolin Chen
                   ` (2 preceding siblings ...)
  2025-12-10  2:45 ` [PATCH rc v3 3/4] iommu/arm-smmu-v3: Ignore STE EATS " Nicolin Chen
@ 2025-12-10  2:45 ` Nicolin Chen
  3 siblings, 0 replies; 12+ messages in thread
From: Nicolin Chen @ 2025-12-10  2:45 UTC (permalink / raw)
  To: jgg, will, robin.murphy
  Cc: joro, linux-arm-kernel, iommu, linux-kernel, skolothumtho, praan,
	xueshuai

STE in a nested case requires both S1 and S2 fields. And this makes the use
case different from the existing one.

Add coverage for previously failed cases shifting between S2-only and S1+S2
STEs.

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 46 +++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
index 3556e65cf9ac..ffa43e103692 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
@@ -33,8 +33,12 @@ static struct mm_struct sva_mm = {
 enum arm_smmu_test_master_feat {
 	ARM_SMMU_MASTER_TEST_ATS = BIT(0),
 	ARM_SMMU_MASTER_TEST_STALL = BIT(1),
+	ARM_SMMU_MASTER_TEST_NESTED = BIT(2),
 };
 
+static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste,
+				      enum arm_smmu_test_master_feat feat);
+
 static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry,
 						const __le64 *used_bits,
 						const __le64 *target,
@@ -198,6 +202,17 @@ static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste,
 	};
 
 	arm_smmu_make_cdtable_ste(ste, &master, ats_enabled, s1dss);
+	if (feat & ARM_SMMU_MASTER_TEST_NESTED) {
+		struct arm_smmu_ste s2ste;
+		int i;
+
+		arm_smmu_test_make_s2_ste(&s2ste, ARM_SMMU_MASTER_TEST_ATS);
+		ste->data[0] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_NESTED));
+		ste->data[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
+		for (i = 2; i < NUM_ENTRY_QWORDS; i++)
+			ste->data[i] = s2ste.data[i];
+	}
 }
 
 static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test)
@@ -555,6 +570,35 @@ static void arm_smmu_v3_write_ste_test_s2_to_s1_stall(struct kunit *test)
 						       NUM_EXPECTED_SYNCS(3));
 }
 
+static void
+arm_smmu_v3_write_ste_test_nested_s1dssbypass_to_s1bypass(struct kunit *test)
+{
+	struct arm_smmu_ste s1_ste;
+	struct arm_smmu_ste s2_ste;
+
+	arm_smmu_test_make_cdtable_ste(
+		&s1_ste, STRTAB_STE_1_S1DSS_BYPASS, fake_cdtab_dma_addr,
+		ARM_SMMU_MASTER_TEST_ATS | ARM_SMMU_MASTER_TEST_NESTED);
+	arm_smmu_test_make_s2_ste(&s2_ste, 0);
+	/* Expect an additional sync to unset ignored bits: EATS and MEV */
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &s1_ste, &s2_ste,
+						       NUM_EXPECTED_SYNCS(3));
+}
+
+static void
+arm_smmu_v3_write_ste_test_nested_s1bypass_to_s1dssbypass(struct kunit *test)
+{
+	struct arm_smmu_ste s1_ste;
+	struct arm_smmu_ste s2_ste;
+
+	arm_smmu_test_make_cdtable_ste(
+		&s1_ste, STRTAB_STE_1_S1DSS_BYPASS, fake_cdtab_dma_addr,
+		ARM_SMMU_MASTER_TEST_ATS | ARM_SMMU_MASTER_TEST_NESTED);
+	arm_smmu_test_make_s2_ste(&s2_ste, 0);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &s2_ste, &s1_ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
 static void arm_smmu_v3_write_cd_test_sva_clear(struct kunit *test)
 {
 	struct arm_smmu_cd cd = {};
@@ -601,6 +645,8 @@ static struct kunit_case arm_smmu_v3_test_cases[] = {
 	KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_change_asid),
 	KUNIT_CASE(arm_smmu_v3_write_ste_test_s1_to_s2_stall),
 	KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_s1_stall),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_nested_s1dssbypass_to_s1bypass),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_nested_s1bypass_to_s1dssbypass),
 	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear),
 	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_release),
 	{},
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-10  2:45 ` [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence Nicolin Chen
@ 2025-12-14 22:32   ` Mostafa Saleh
  2025-12-15 20:51     ` Nicolin Chen
  2025-12-16  0:09     ` Jason Gunthorpe
  0 siblings, 2 replies; 12+ messages in thread
From: Mostafa Saleh @ 2025-12-14 22:32 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

Hi Nicolin,

On Tue, Dec 09, 2025 at 06:45:16PM -0800, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> C_BAD_STE was observed when updating nested STE from an S1-bypass mode to
> an S1DSS-bypass mode. As both modes enabled S2, the used bit is slightly
> different than the normal S1-bypass and S1DSS-bypass modes. As a result,
> fields like MEV and EATS in S2's used list marked the word1 as a critical
> word that requested a STE.V=0. This breaks a hitless update.
> 
> However, both MEV and EATS aren't critical in terms of STE update. One
> controls the merge of the events and the other controls the ATS that is
> managed by the driver at the same time via pci_enable_ats().
> 
> Add an arm_smmu_get_ste_ignored() to allow STE update algorithm to ignore
> those fields, avoiding the STE update breakages.
> 
> Note that this change is required by both MEV and EATS fields, which were
> introduced in different kernel versions. So add this get_ignored() first.
> The MEV and EATS will be added in arm_smmu_get_ste_ignored() separately.
> 
> Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  2 ++
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 19 ++++++++++++---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 24 +++++++++++++++----
>  3 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index ae23aacc3840..d5f0e5407b9f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -900,6 +900,7 @@ struct arm_smmu_entry_writer {
>  
>  struct arm_smmu_entry_writer_ops {
>  	void (*get_used)(const __le64 *entry, __le64 *used);
> +	void (*get_ignored)(__le64 *ignored_bits);
>  	void (*sync)(struct arm_smmu_entry_writer *writer);
>  };
>  
> @@ -911,6 +912,7 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
>  
>  #if IS_ENABLED(CONFIG_KUNIT)
>  void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits);
> +void arm_smmu_get_ste_ignored(__le64 *ignored_bits);
>  void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur,
>  			  const __le64 *target);
>  void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
> index d2671bfd3798..3556e65cf9ac 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
> @@ -38,13 +38,16 @@ enum arm_smmu_test_master_feat {
>  static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry,
>  						const __le64 *used_bits,
>  						const __le64 *target,
> +						const __le64 *ignored,
>  						unsigned int length)
>  {
>  	bool differs = false;
>  	unsigned int i;
>  
>  	for (i = 0; i < length; i++) {
> -		if ((entry[i] & used_bits[i]) != target[i])
> +		__le64 used = used_bits[i] & ~ignored[i];
> +
> +		if ((entry[i] & used) != (target[i] & used))
>  			differs = true;
>  	}
>  	return differs;
> @@ -56,12 +59,18 @@ arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
>  	struct arm_smmu_test_writer *test_writer =
>  		container_of(writer, struct arm_smmu_test_writer, writer);
>  	__le64 *entry_used_bits;
> +	__le64 *ignored;
>  
>  	entry_used_bits = kunit_kzalloc(
>  		test_writer->test, sizeof(*entry_used_bits) * NUM_ENTRY_QWORDS,
>  		GFP_KERNEL);
>  	KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits);
>  
> +	ignored = kunit_kzalloc(test_writer->test,
> +				sizeof(*ignored) * NUM_ENTRY_QWORDS,
> +				GFP_KERNEL);
> +	KUNIT_ASSERT_NOT_NULL(test_writer->test, ignored);
> +
>  	pr_debug("STE value is now set to: ");
>  	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8,
>  			     test_writer->entry,
> @@ -79,14 +88,17 @@ arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
>  		 * configuration.
>  		 */
>  		writer->ops->get_used(test_writer->entry, entry_used_bits);
> +		if (writer->ops->get_ignored)
> +			writer->ops->get_ignored(ignored);
>  		KUNIT_EXPECT_FALSE(
>  			test_writer->test,
>  			arm_smmu_entry_differs_in_used_bits(
>  				test_writer->entry, entry_used_bits,
> -				test_writer->init_entry, NUM_ENTRY_QWORDS) &&
> +				test_writer->init_entry, ignored,
> +				NUM_ENTRY_QWORDS) &&
>  				arm_smmu_entry_differs_in_used_bits(
>  					test_writer->entry, entry_used_bits,
> -					test_writer->target_entry,
> +					test_writer->target_entry, ignored,
>  					NUM_ENTRY_QWORDS));
>  	}
>  }
> @@ -106,6 +118,7 @@ arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer,
>  static const struct arm_smmu_entry_writer_ops test_ste_ops = {
>  	.sync = arm_smmu_test_writer_record_syncs,
>  	.get_used = arm_smmu_get_ste_used,
> +	.get_ignored = arm_smmu_get_ste_ignored,
>  };
>  
>  static const struct arm_smmu_entry_writer_ops test_cd_ops = {
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index d16d35c78c06..e22c0890041b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1082,6 +1082,12 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
>  }
>  EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_used);
>  
> +VISIBLE_IF_KUNIT
> +void arm_smmu_get_ste_ignored(__le64 *ignored_bits)
> +{
> +}
> +EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_ignored);
> +
>  /*
>   * Figure out if we can do a hitless update of entry to become target. Returns a
>   * bit mask where 1 indicates that qword needs to be set disruptively.
> @@ -1094,13 +1100,22 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
>  {
>  	__le64 target_used[NUM_ENTRY_QWORDS] = {};
>  	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
> +	__le64 ignored[NUM_ENTRY_QWORDS] = {};

I think we can avoid extra stack allocation for another STE, if we make
the function update cur_used directly, but no strong opinion.

>  	u8 used_qword_diff = 0;
>  	unsigned int i;
>  
>  	writer->ops->get_used(entry, cur_used);
>  	writer->ops->get_used(target, target_used);
> +	if (writer->ops->get_ignored)
> +		writer->ops->get_ignored(ignored);
>  
>  	for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
> +		/*
> +		 * Ignored is only used for bits that are used by both entries,
> +		 * otherwise it is sequenced according to the unused entry.
> +		 */
> +		ignored[i] &= target_used[i] & cur_used[i];
> +
>  		/*
>  		 * Check that masks are up to date, the make functions are not
>  		 * allowed to set a bit to 1 if the used function doesn't say it
> @@ -1109,6 +1124,7 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
>  		WARN_ON_ONCE(target[i] & ~target_used[i]);
>  
>  		/* Bits can change because they are not currently being used */
> +		cur_used[i] &= ~ignored[i];
>  		unused_update[i] = (entry[i] & cur_used[i]) |
>  				   (target[i] & ~cur_used[i]);
>  		/*
> @@ -1207,12 +1223,9 @@ void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
>  		entry_set(writer, entry, target, 0, 1);
>  	} else {
>  		/*
> -		 * No inuse bit changed. Sanity check that all unused bits are 0
> -		 * in the entry. The target was already sanity checked by
> -		 * compute_qword_diff().
> +		 * No inuse bit changed, though ignored bits may have changed.
>  		 */
> -		WARN_ON_ONCE(
> -			entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS));
> +		entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);

After this change, no other caller uses the entry_set() return value, so it
can be changed to return void.

>  	}
>  }
>  EXPORT_SYMBOL_IF_KUNIT(arm_smmu_write_entry);
> @@ -1543,6 +1556,7 @@ static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
>  static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
>  	.sync = arm_smmu_ste_writer_sync_entry,
>  	.get_used = arm_smmu_get_ste_used,
> +	.get_ignored = arm_smmu_get_ste_ignored,
>  };
>  

I have some mixed feelings about this, having get_used(), then get_ignored()
with the same bits set seems confusing to me, specially the get_ignored()
loops back to update cur_used, which is set from get_used()

My initial though was just to remove this bit from get_used() + some changes
to checks setting bits that are not used would be enough, and the semantics
of get_used() can be something as:
“Return bits used by the updated translation regime that MUST be observed
atomically” and in that case we can ignore things as MEV as it doesn’t
impact the translation.

However, this approach makes it a bit explicit which bits are ignored, if we
keep this logic, I think changing the name of get_ignored() might help, to
something as "get_allowed_break()" or "get_update_safe()"?

Thanks,
Mostafa

>  static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-14 22:32   ` Mostafa Saleh
@ 2025-12-15 20:51     ` Nicolin Chen
  2025-12-16 22:49       ` Mostafa Saleh
  2025-12-16  0:09     ` Jason Gunthorpe
  1 sibling, 1 reply; 12+ messages in thread
From: Nicolin Chen @ 2025-12-15 20:51 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: jgg, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

Hi Mostafa,

On Sun, Dec 14, 2025 at 10:32:35PM +0000, Mostafa Saleh wrote:
> On Tue, Dec 09, 2025 at 06:45:16PM -0800, Nicolin Chen wrote:
> > @@ -1207,12 +1223,9 @@ void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
> >  		entry_set(writer, entry, target, 0, 1);
> >  	} else {
> >  		/*
> > -		 * No inuse bit changed. Sanity check that all unused bits are 0
> > -		 * in the entry. The target was already sanity checked by
> > -		 * compute_qword_diff().
> > +		 * No inuse bit changed, though ignored bits may have changed.
> >  		 */
> > -		WARN_ON_ONCE(
> > -			entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS));
> > +		entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
> 
> After this change, no other caller uses the entry_set() return value, so it
> can be changed to return void.

OK.

> >  	}
> >  }
> >  EXPORT_SYMBOL_IF_KUNIT(arm_smmu_write_entry);
> > @@ -1543,6 +1556,7 @@ static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
> >  static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
> >  	.sync = arm_smmu_ste_writer_sync_entry,
> >  	.get_used = arm_smmu_get_ste_used,
> > +	.get_ignored = arm_smmu_get_ste_ignored,
> >  };
> >  
> 
> I have some mixed feelings about this, having get_used(), then get_ignored()
> with the same bits set seems confusing to me, specially the get_ignored()
> loops back to update cur_used, which is set from get_used()
> 
> My initial though was just to remove this bit from get_used() + some changes
> to checks setting bits that are not used would be enough, and the semantics
> of get_used() can be something as:
> “Return bits used by the updated translation regime that MUST be observed
> atomically” and in that case we can ignore things as MEV as it doesn’t
> impact the translation.
> 
> However, this approach makes it a bit explicit which bits are ignored, if we
> keep this logic, I think changing the name of get_ignored() might help, to
> something as "get_allowed_break()" or "get_update_safe()"?

I think "ignored" itself is brief and understandable.. Instead,
perhaps we can add a kdocs to make it clearer:

/**
 * struct arm_smmu_entry_writer_ops - STE/CD entry writer operations
 * @get_used: Output to @used the bits used by the hardware corresponding to the
 *            configurations bits set in a given @entry
 * @get_ignored: Output to @ignored the bits that are listed in the "used" list
 *               but allowed to be ignored by arm_smmu_entry_qword_diff(). Each
 *               field (bits) must provide a reason to justify that the entries
 *               can be updated safely without breaking STE/CD configurations.
 * @sync: Operation to synchronize the updated STE/CD entries in the memory
 */
struct arm_smmu_entry_writer_ops {
	void (*get_used)(const __le64 *entry, __le64 *used);
	void (*get_ignored)(__le64 *ignored);
	void (*sync)(struct arm_smmu_entry_writer *writer);
};

?

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-14 22:32   ` Mostafa Saleh
  2025-12-15 20:51     ` Nicolin Chen
@ 2025-12-16  0:09     ` Jason Gunthorpe
  2025-12-16 20:46       ` Nicolin Chen
  2025-12-16 22:58       ` Mostafa Saleh
  1 sibling, 2 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2025-12-16  0:09 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: Nicolin Chen, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

On Sun, Dec 14, 2025 at 10:32:35PM +0000, Mostafa Saleh wrote:
> >   * Figure out if we can do a hitless update of entry to become target. Returns a
> >   * bit mask where 1 indicates that qword needs to be set disruptively.
> > @@ -1094,13 +1100,22 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
> >  {
> >  	__le64 target_used[NUM_ENTRY_QWORDS] = {};
> >  	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
> > +	__le64 ignored[NUM_ENTRY_QWORDS] = {};
> 
> I think we can avoid extra stack allocation for another STE, if we make
> the function update cur_used directly, but no strong opinion.

It does more than just mask cur_used, it also adjusts ignored:

> > +		/*
> > +		 * Ignored is only used for bits that are used by both entries,
> > +		 * otherwise it is sequenced according to the unused entry.
> > +		 */
> > +		ignored[i] &= target_used[i] & cur_used[i];

Which also explains this:

> I have some mixed feelings about this, having get_used(), then get_ignored()
> with the same bits set seems confusing to me, specially the get_ignored()
> loops back to update cur_used, which is set from get_used()

The same bits are set because of the above - we need to know what the
actual used bits are to decide if we need to rely on the ignored rule
to do the update.
 
> My initial though was just to remove this bit from get_used() + some changes
> to checks setting bits that are not used would be enough, and the semantics
> of get_used() can be something as:
> “Return bits used by the updated translation regime that MUST be observed
> atomically” and in that case we can ignore things as MEV as it doesn’t
> impact the translation.

Aside from the above this would cause problems with the validation
assertions, so it is not a great idea.

> However, this approach makes it a bit explicit which bits are ignored, if we
> keep this logic, I think changing the name of get_ignored() might help, to
> something as "get_allowed_break()" or "get_update_safe()"?

update_safe sounds good to me

Jason


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-16  0:09     ` Jason Gunthorpe
@ 2025-12-16 20:46       ` Nicolin Chen
  2025-12-16 22:58       ` Mostafa Saleh
  1 sibling, 0 replies; 12+ messages in thread
From: Nicolin Chen @ 2025-12-16 20:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mostafa Saleh, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

On Mon, Dec 15, 2025 at 08:09:52PM -0400, Jason Gunthorpe wrote:
> On Sun, Dec 14, 2025 at 10:32:35PM +0000, Mostafa Saleh wrote:
> > However, this approach makes it a bit explicit which bits are ignored, if we
> > keep this logic, I think changing the name of get_ignored() might help, to
> > something as "get_allowed_break()" or "get_update_safe()"?
> 
> update_safe sounds good to me

I have renamed the op and changed entry_set to void. I'll send v4
later today.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-15 20:51     ` Nicolin Chen
@ 2025-12-16 22:49       ` Mostafa Saleh
  0 siblings, 0 replies; 12+ messages in thread
From: Mostafa Saleh @ 2025-12-16 22:49 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

On Mon, Dec 15, 2025 at 8:51 PM Nicolin Chen <nicolinc@nvidia.com> wrote:
>
> Hi Mostafa,
>
> On Sun, Dec 14, 2025 at 10:32:35PM +0000, Mostafa Saleh wrote:
> > On Tue, Dec 09, 2025 at 06:45:16PM -0800, Nicolin Chen wrote:
> > > @@ -1207,12 +1223,9 @@ void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
> > >             entry_set(writer, entry, target, 0, 1);
> > >     } else {
> > >             /*
> > > -            * No inuse bit changed. Sanity check that all unused bits are 0
> > > -            * in the entry. The target was already sanity checked by
> > > -            * compute_qword_diff().
> > > +            * No inuse bit changed, though ignored bits may have changed.
> > >              */
> > > -           WARN_ON_ONCE(
> > > -                   entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS));
> > > +           entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
> >
> > After this change, no other caller uses the entry_set() return value, so it
> > can be changed to return void.
>
> OK.
>
> > >     }
> > >  }
> > >  EXPORT_SYMBOL_IF_KUNIT(arm_smmu_write_entry);
> > > @@ -1543,6 +1556,7 @@ static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
> > >  static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
> > >     .sync = arm_smmu_ste_writer_sync_entry,
> > >     .get_used = arm_smmu_get_ste_used,
> > > +   .get_ignored = arm_smmu_get_ste_ignored,
> > >  };
> > >
> >
> > I have some mixed feelings about this, having get_used(), then get_ignored()
> > with the same bits set seems confusing to me, specially the get_ignored()
> > loops back to update cur_used, which is set from get_used()
> >
> > My initial though was just to remove this bit from get_used() + some changes
> > to checks setting bits that are not used would be enough, and the semantics
> > of get_used() can be something as:
> > “Return bits used by the updated translation regime that MUST be observed
> > atomically” and in that case we can ignore things as MEV as it doesn’t
> > impact the translation.
> >
> > However, this approach makes it a bit explicit which bits are ignored, if we
> > keep this logic, I think changing the name of get_ignored() might help, to
> > something as "get_allowed_break()" or "get_update_safe()"?
>
> I think "ignored" itself is brief and understandable.. Instead,
> perhaps we can add a kdocs to make it clearer:
>
> /**
>  * struct arm_smmu_entry_writer_ops - STE/CD entry writer operations
>  * @get_used: Output to @used the bits used by the hardware corresponding to the
>  *            configurations bits set in a given @entry
>  * @get_ignored: Output to @ignored the bits that are listed in the "used" list
>  *               but allowed to be ignored by arm_smmu_entry_qword_diff(). Each
>  *               field (bits) must provide a reason to justify that the entries
>  *               can be updated safely without breaking STE/CD configurations.
>  * @sync: Operation to synchronize the updated STE/CD entries in the memory
>  */
> struct arm_smmu_entry_writer_ops {
>         void (*get_used)(const __le64 *entry, __le64 *used);
>         void (*get_ignored)(__le64 *ignored);
>         void (*sync)(struct arm_smmu_entry_writer *writer);
> };
>
> ?
>

A comment is indeed helpful, but my point was that "used" and
"ignored" make it seem that they are mutually exclusive.

Thanks,
Mostafa

> Thanks
> Nicolin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-16  0:09     ` Jason Gunthorpe
  2025-12-16 20:46       ` Nicolin Chen
@ 2025-12-16 22:58       ` Mostafa Saleh
  2025-12-17  0:23         ` Jason Gunthorpe
  1 sibling, 1 reply; 12+ messages in thread
From: Mostafa Saleh @ 2025-12-16 22:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

On Mon, Dec 15, 2025 at 08:09:52PM -0400, Jason Gunthorpe wrote:
> On Sun, Dec 14, 2025 at 10:32:35PM +0000, Mostafa Saleh wrote:
> > >   * Figure out if we can do a hitless update of entry to become target. Returns a
> > >   * bit mask where 1 indicates that qword needs to be set disruptively.
> > > @@ -1094,13 +1100,22 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
> > >  {
> > >  	__le64 target_used[NUM_ENTRY_QWORDS] = {};
> > >  	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
> > > +	__le64 ignored[NUM_ENTRY_QWORDS] = {};
> > 
> > I think we can avoid extra stack allocation for another STE, if we make
> > the function update cur_used directly, but no strong opinion.
> 
> It does more than just mask cur_used, it also adjusts ignored:
> 
> > > +		/*
> > > +		 * Ignored is only used for bits that are used by both entries,
> > > +		 * otherwise it is sequenced according to the unused entry.
> > > +		 */
> > > +		ignored[i] &= target_used[i] & cur_used[i];
> 
> Which also explains this:

I haven't tested that, but I was thinking about (applies on top of the patches)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 72ba41591fdb..9981eefcf0da 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1083,7 +1083,7 @@ void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_used);

 VISIBLE_IF_KUNIT
-void arm_smmu_get_ste_ignored(__le64 *ignored_bits)
+void arm_smmu_get_ste_ignored(__le64 *used)
 {
 	/*
 	 * MEV does not meaningfully impact the operation of the HW, it only
@@ -1093,17 +1093,14 @@ void arm_smmu_get_ste_ignored(__le64 *ignored_bits)
 	 *
 	 *  Note: Software must expect, and be able to deal with, coalesced
 	 *  fault records even when MEV == 0.
-	 */
-	ignored_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
-
-	/*
+	 *
 	 * EATS is used to reject and control the ATS behavior of the device. If
 	 * we are changing it away from 0 then we already trust the device to
 	 * use ATS properly and we have sequenced the device's ATS enable in PCI
 	 * config space to prevent it from issuing ATS while we are changing
 	 * EATS.
 	 */
-	ignored_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
+	used[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_MEV);
 }
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_ignored);

@@ -1119,22 +1116,15 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
 {
 	__le64 target_used[NUM_ENTRY_QWORDS] = {};
 	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
-	__le64 ignored[NUM_ENTRY_QWORDS] = {};
 	u8 used_qword_diff = 0;
 	unsigned int i;

 	writer->ops->get_used(entry, cur_used);
 	writer->ops->get_used(target, target_used);
-	if (writer->ops->get_ignored)
-		writer->ops->get_ignored(ignored);
+	if (writer->ops->filter_ignored)
+		writer->ops->filter_ignored(cur_used);

 	for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
-		/*
-		 * Ignored is only used for bits that are used by both entries,
-		 * otherwise it is sequenced according to the unused entry.
-		 */
-		ignored[i] &= target_used[i] & cur_used[i];
-
 		/*
 		 * Check that masks are up to date, the make functions are not
 		 * allowed to set a bit to 1 if the used function doesn't say it
@@ -1142,8 +1132,6 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
 		 */
 		WARN_ON_ONCE(target[i] & ~target_used[i]);

-		/* Bits can change because they are not currently being used */
-		cur_used[i] &= ~ignored[i];
 		unused_update[i] = (entry[i] & cur_used[i]) |
 				   (target[i] & ~cur_used[i]);
 		/*
@@ -1575,7 +1563,7 @@ static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
 static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
 	.sync = arm_smmu_ste_writer_sync_entry,
 	.get_used = arm_smmu_get_ste_used,
-	.get_ignored = arm_smmu_get_ste_ignored,
+	.filter_ignored = arm_smmu_get_ste_ignored,
 };

 static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d5f0e5407b9f..97b995974049 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -900,7 +900,7 @@ struct arm_smmu_entry_writer {

 struct arm_smmu_entry_writer_ops {
 	void (*get_used)(const __le64 *entry, __le64 *used);
-	void (*get_ignored)(__le64 *ignored_bits);
+	void (*filter_ignored)(__le64 *used);
 	void (*sync)(struct arm_smmu_entry_writer *writer);
 };


And we only clear the bits from cur_used, there is no need to for the
other mask ignored (ignored[i] &= target_used[i] & cur_used[i])
- If an ignored bit is not in cur_used it will not impact
  "cur_used[i] &= ~ignored[i]" as it must be already zero
- If an ignored bit is not in target_used, it doesn't really matter,
  we can ignore it anyway, as it is safe to do so.

Anyway, I am not fixated on that though, extra 64B on the stack is not that bad.

> 
> > I have some mixed feelings about this, having get_used(), then get_ignored()
> > with the same bits set seems confusing to me, specially the get_ignored()
> > loops back to update cur_used, which is set from get_used()
> 
> The same bits are set because of the above - we need to know what the
> actual used bits are to decide if we need to rely on the ignored rule
> to do the update.
>  
> > My initial though was just to remove this bit from get_used() + some changes
> > to checks setting bits that are not used would be enough, and the semantics
> > of get_used() can be something as:
> > “Return bits used by the updated translation regime that MUST be observed
> > atomically” and in that case we can ignore things as MEV as it doesn’t
> > impact the translation.
> 
> Aside from the above this would cause problems with the validation
> assertions, so it is not a great idea.

Yes, that's why I didn't like this, it had to hack the validation logic.

Thanks,
Mostafa

> 
> > However, this approach makes it a bit explicit which bits are ignored, if we
> > keep this logic, I think changing the name of get_ignored() might help, to
> > something as "get_allowed_break()" or "get_update_safe()"?
> 
> update_safe sounds good to me
> 
> Jason


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence
  2025-12-16 22:58       ` Mostafa Saleh
@ 2025-12-17  0:23         ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2025-12-17  0:23 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: Nicolin Chen, will, robin.murphy, joro, linux-arm-kernel, iommu,
	linux-kernel, skolothumtho, praan, xueshuai

On Tue, Dec 16, 2025 at 10:58:33PM +0000, Mostafa Saleh wrote:
>  	for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
> -		/*
> -		 * Ignored is only used for bits that are used by both entries,
> -		 * otherwise it is sequenced according to the unused entry.
> -		 */
> -		ignored[i] &= target_used[i] & cur_used[i];
> -

It is not functionally the same thing without this..

> And we only clear the bits from cur_used, there is no need to for the
> other mask ignored (ignored[i] &= target_used[i] & cur_used[i])
> - If an ignored bit is not in cur_used it will not impact
>   "cur_used[i] &= ~ignored[i]" as it must be already zero
> - If an ignored bit is not in target_used, it doesn't really matter,
>   we can ignore it anyway, as it is safe to do so.

That was an earlier version, it was switched away to this so as to be
less of a change though the reasoning is sound.

Jason


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-12-17  0:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-10  2:45 [PATCH rc v3 0/4] : iommu/arm-smmu-v3: Fix hitless STE update in nesting cases Nicolin Chen
2025-12-10  2:45 ` [PATCH rc v3 1/4] iommu/arm-smmu-v3: Add ignored bits to fix STE update sequence Nicolin Chen
2025-12-14 22:32   ` Mostafa Saleh
2025-12-15 20:51     ` Nicolin Chen
2025-12-16 22:49       ` Mostafa Saleh
2025-12-16  0:09     ` Jason Gunthorpe
2025-12-16 20:46       ` Nicolin Chen
2025-12-16 22:58       ` Mostafa Saleh
2025-12-17  0:23         ` Jason Gunthorpe
2025-12-10  2:45 ` [PATCH rc v3 2/4] iommu/arm-smmu-v3: Ignore STE MEV when computing the " Nicolin Chen
2025-12-10  2:45 ` [PATCH rc v3 3/4] iommu/arm-smmu-v3: Ignore STE EATS " Nicolin Chen
2025-12-10  2:45 ` [PATCH rc v3 4/4] iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage Nicolin Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).