linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH v6 0/3] Initial BBML2 support for contpte_convert()
@ 2025-04-28 15:35 Mikołaj Lenczewski
  2025-04-28 15:35 ` [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Mikołaj Lenczewski @ 2025-04-28 15:35 UTC (permalink / raw)
  To: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas, will,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu
  Cc: Mikołaj Lenczewski

Hi All,

This patch series adds initial support for eliding Break-Before-Make
requirements on systems that support BBML2 and additionally guarantee
to never raise a conflict abort.

This support reorders and elides both a TLB invalidation and a DSB in
contpte_convert(), when BBML2 is supported. This leads to a 12%
improvement when executing a microbenchmark designed to force the
pathological path where contpte_convert() gets called. This represents
an 80% reduction in the cost of calling contpte_convert().

We clarify both the correctness and performance benefits of this elision
with respect to the relevant Arm ARM passages, via substantial comments
in the contpte_convert() source.

This series is based on v6.15-rc3 (9c32cda43eb7).

Notes
======

Patch 1 implements an allow-list of cpus that support BBML2, but with
the additional constraint of never causing TLB conflict aborts. We
settled on this constraint because we will use the feature for kernel
mappings in the future, for which we cannot handle conflict aborts
safely.

Yang Shi has a series at [1] that aims to use BBML2 to enable splitting
the linear map at runtime. This series partially overlaps with it to add
the cpu feature. We believe this series is fully compatible with Yang's
requirements and could go first.

Due to constraints with the current design of the cpufeature framework
and the fact that our has_bbml2_noabort() check relies on both a MIDR
allowlist and the exposed MMFR2 register value, if an implementation
supports our desired BBML2+NOABORT semantics but fails to declare
support for BBML2 via the id_aa64mmfr2.bbm field, the check will fail.

Not declaring base support for BBML2 when supporting BBML2+NOABORT
should be considered an erratum [2], and a workaround can be applied in
__cpuinfo_store_cpu() to patch in support for BBML2 for the sanitised
register view used by SCOPE_SYSTEM. However, SCOPE_LOCAL_CPU bypasses
this sanitised view and reads the MSRs directly by design, and so an
additional workaround can be applied in __read_sysreg_by_encoding()
for the MMFR2 case.

For situations where support for BBML2+NOABORT is claimed by an
implementor and subsequently built into the kernel, but problems later
arise that require user damage control [3], we introduce a kernel
commandline parameter override for disabling all BBML2 support.

[1]:
  https://lore.kernel.org/linux-arm-kernel/20250304222018.615808-1-yang@os.amperecomputing.com/

[2]:
  https://lore.kernel.org/linux-arm-kernel/3bba7adb-392b-4024-984f-b6f0f0f88629@arm.com/

[3]:
  https://lore.kernel.org/all/0ac0f1f5-e4a0-46ae-8ea0-2eba7e21a7e1@arm.com/

Changelog
=========

v6:
  - clarify correctness and performance of elision of __tlb_flush_range()
  - rebase onto v6.15-rc3

v5:
  - https://lore.kernel.org/all/20250325093625.55184-1-miko.lenczewski@arm.com/
  - fixup coding style nits
  - document motivation for kernel commandline parameter

v4:
  - https://lore.kernel.org/all/20250319150533.37440-2-miko.lenczewski@arm.com/
  - rebase onto v6.14-rc5
  - switch from arm64 sw feature override to hw feature override
  - reintroduce has_cpuid_feature() check in addition to MIDR check

v3:
  - https://lore.kernel.org/all/20250313104111.24196-2-miko.lenczewski@arm.com/
  - rebase onto v6.14-rc4
  - add arm64.nobbml2 commandline override
  - squash "delay tlbi" and "elide tlbi" patches

v2:
  - https://lore.kernel.org/all/20250228182403.6269-2-miko.lenczewski@arm.com/
  - fix buggy MIDR check to properly account for all boot+late cpus
  - add smmu bbml2 feature check

v1:
  - https://lore.kernel.org/all/20250219143837.44277-3-miko.lenczewski@arm.com/
  - rebase onto v6.14-rc3
  - remove kvm bugfix patches from series
  - strip out conflict abort handler code
  - switch from blocklist to allowlist of bmml2+noabort implementations
  - remove has_cpuid_feature() in favour of MIDR check

rfc-v1:
  - https://lore.kernel.org/all/20241211154611.40395-1-miko.lenczewski@arm.com/
  - https://lore.kernel.org/all/20241211160218.41404-1-miko.lenczewski@arm.com/

Mikołaj Lenczewski (3):
  arm64: Add BBM Level 2 cpu feature
  iommu/arm: Add BBM Level 2 smmu feature
  arm64/mm: Elide tlbi in contpte_convert() under BBML2

 .../admin-guide/kernel-parameters.txt         |   3 +
 arch/arm64/Kconfig                            |  19 +++
 arch/arm64/include/asm/cpucaps.h              |   2 +
 arch/arm64/include/asm/cpufeature.h           |   5 +
 arch/arm64/kernel/cpufeature.c                |  71 +++++++++
 arch/arm64/kernel/pi/idreg-override.c         |   2 +
 arch/arm64/mm/contpte.c                       | 139 +++++++++++++++++-
 arch/arm64/tools/cpucaps                      |   1 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   3 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   3 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   4 +
 11 files changed, 251 insertions(+), 1 deletion(-)

-- 
2.49.0



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-04-28 15:35 [RESEND PATCH v6 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
@ 2025-04-28 15:35 ` Mikołaj Lenczewski
  2025-04-28 17:55   ` ALOK TIWARI
  2025-05-06 14:25   ` Will Deacon
  2025-04-28 15:35 ` [RESEND PATCH v6 2/3] iommu/arm: Add BBM Level 2 smmu feature Mikołaj Lenczewski
  2025-04-28 15:35 ` [RESEND PATCH v6 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
  2 siblings, 2 replies; 28+ messages in thread
From: Mikołaj Lenczewski @ 2025-04-28 15:35 UTC (permalink / raw)
  To: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas, will,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu
  Cc: Mikołaj Lenczewski

The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
and this commit adds a dedicated BBML2 cpufeature to test against
support for, as well as a kernel commandline parameter to optionally
disable BBML2 altogether.

This is a system feature as we might have a big.LITTLE architecture
where some cores support BBML2 and some don't, but we want all cores to
be available and BBM to default to level 0 (as opposed to having cores
without BBML2 not coming online).

To support BBML2 in as wide a range of contexts as we can, we want not
only the architectural guarantees that BBML2 makes, but additionally
want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
us having to prove that no recursive faults can be induced in any path
that uses BBML2, allowing its use for arbitrary kernel mappings.
Support detection of such CPUs.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../admin-guide/kernel-parameters.txt         |  3 +
 arch/arm64/Kconfig                            | 19 +++++
 arch/arm64/include/asm/cpucaps.h              |  2 +
 arch/arm64/include/asm/cpufeature.h           |  5 ++
 arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
 arch/arm64/kernel/pi/idreg-override.c         |  2 +
 arch/arm64/tools/cpucaps                      |  1 +
 7 files changed, 103 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d9fd26b95b34..2749c67a4f07 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -449,6 +449,9 @@
 	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
 			32 bit applications.
 
+	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
+			2 support
+
 	arm64.nobti	[ARM64] Unconditionally disable Branch Target
 			Identification support
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a182295e6f08..613b4925ca06 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2070,6 +2070,25 @@ config ARM64_TLB_RANGE
 	  The feature introduces new assembly instructions, and they were
 	  support when binutils >= 2.30.
 
+config ARM64_BBML2_NOABORT
+	bool "Enable support for Break-Before-Make Level 2 detection and usage"
+	default y
+	help
+	  FEAT_BBM provides detection of support levels for break-before-make
+	  sequences. If BBM level 2 is supported, some TLB maintenance requirements
+	  can be relaxed to improve performance. We additonally require the
+	  property that the implementation cannot ever raise TLB Conflict Aborts.
+	  Selecting N causes the kernel to fallback to BBM level 0 behaviour
+	  even if the system supports BBM level 2.
+
+	  To enable detection of BBML2 support, and to make use of it, say Y.
+
+	  Detection of and support for BBM level 2 can optionally be overridden
+	  at runtime via the use of the arm64.nobbml2 kernel commandline
+	  parameter. If your system claims support for BBML2, but is unstable
+	  with this option enabled, either say N or make use of the commandline
+	  parameter override to force BBML0.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 9d769291a306..413eec6e2438 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -23,6 +23,8 @@ cpucap_is_possible(const unsigned int cap)
 		return IS_ENABLED(CONFIG_ARM64_PAN);
 	case ARM64_HAS_EPAN:
 		return IS_ENABLED(CONFIG_ARM64_EPAN);
+	case ARM64_HAS_BBML2_NOABORT:
+		return IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT);
 	case ARM64_SVE:
 		return IS_ENABLED(CONFIG_ARM64_SVE);
 	case ARM64_SME:
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index c4326f1cb917..8f36ffa16b73 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -848,6 +848,11 @@ static inline bool system_supports_pmuv3(void)
 	return cpus_have_final_cap(ARM64_HAS_PMUV3);
 }
 
+static inline bool system_supports_bbml2_noabort(void)
+{
+	return alternative_has_cap_unlikely(ARM64_HAS_BBML2_NOABORT);
+}
+
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9c4d6d552b25..7a85a1bdc6e9 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
 	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
 }
 
+static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
+{
+	/*
+	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
+	 * as possible. This list is therefore an allow-list of known-good
+	 * implementations that both support bbml2 and additionally, fulfill the
+	 * extra constraint of never generating TLB conflict aborts when using
+	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
+	 * kernel contexts difficult to prove safe against recursive aborts).
+	 *
+	 * Note that implementations can only be considered "known-good" if their
+	 * implementors attest to the fact that the implementation never raises
+	 * TLBI conflict aborts for bbml2 mapping granularity changes.
+	 */
+	static const struct midr_range supports_bbml2_noabort_list[] = {
+		MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
+		MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
+		{}
+	};
+
+	return is_midr_in_range_list(cpu_midr, supports_bbml2_noabort_list);
+}
+
+static inline unsigned int cpu_read_midr(int cpu)
+{
+	WARN_ON_ONCE(!cpu_online(cpu));
+
+	return per_cpu(cpu_data, cpu).reg_midr;
+}
+
+static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
+{
+	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
+		return false;
+
+	if (scope & SCOPE_SYSTEM) {
+		int cpu;
+
+		/*
+		 * We are a boot CPU, and must verify that all enumerated boot
+		 * CPUs have MIDR values within our allowlist. Otherwise, we do
+		 * not allow the BBML2 feature to avoid potential faults when
+		 * the insufficient CPUs access memory regions using BBML2
+		 * semantics.
+		 */
+		for_each_online_cpu(cpu) {
+			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
+				return false;
+		}
+	} else if (scope & SCOPE_LOCAL_CPU) {
+		/*
+		 * We are a hot-plugged CPU, so must only check our MIDR.
+		 * If we have the correct MIDR, but the kernel booted on an
+		 * insufficient CPU, we will not use BBML2 (this is safe). If
+		 * we have an incorrect MIDR, but the kernel booted on a
+		 * sufficient CPU, we will not bring up this CPU.
+		 */
+		if (!cpu_has_bbml2_noabort(read_cpuid_id()))
+			return false;
+	}
+
+	return has_cpuid_feature(caps, scope);
+}
+
 #ifdef CONFIG_ARM64_PAN
 static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
 {
@@ -2960,6 +3024,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
 	},
+	{
+		.desc = "BBM Level 2 without conflict abort",
+		.capability = ARM64_HAS_BBML2_NOABORT,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_bbml2_noabort,
+		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
+	},
 	{
 		.desc = "52-bit Virtual Addressing for KVM (LPA2)",
 		.capability = ARM64_HAS_LPA2,
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index c6b185b885f7..803a0c99f7b4 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -102,6 +102,7 @@ static const struct ftr_set_desc mmfr2 __prel64_initconst = {
 	.override	= &id_aa64mmfr2_override,
 	.fields		= {
 		FIELD("varange", ID_AA64MMFR2_EL1_VARange_SHIFT, mmfr2_varange_filter),
+		FIELD("bbm", ID_AA64MMFR2_EL1_BBM_SHIFT, NULL),
 		{}
 	},
 };
@@ -246,6 +247,7 @@ static const struct {
 	{ "rodata=off",			"arm64_sw.rodataoff=1" },
 	{ "arm64.nolva",		"id_aa64mmfr2.varange=0" },
 	{ "arm64.no32bit_el0",		"id_aa64pfr0.el0=1" },
+	{ "arm64.nobbml2",		"id_aa64mmfr2.bbm=0" },
 };
 
 static int __init parse_hexdigit(const char *p, u64 *v)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 772c1b008e43..80ec2d9859a2 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -14,6 +14,7 @@ HAS_ADDRESS_AUTH_ARCH_QARMA5
 HAS_ADDRESS_AUTH_IMP_DEF
 HAS_AMU_EXTN
 HAS_ARMv8_4_TTL
+HAS_BBML2_NOABORT
 HAS_CACHE_DIC
 HAS_CACHE_IDC
 HAS_CNP
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RESEND PATCH v6 2/3] iommu/arm: Add BBM Level 2 smmu feature
  2025-04-28 15:35 [RESEND PATCH v6 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
  2025-04-28 15:35 ` [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
@ 2025-04-28 15:35 ` Mikołaj Lenczewski
  2025-05-06 14:19   ` Will Deacon
  2025-04-28 15:35 ` [RESEND PATCH v6 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
  2 siblings, 1 reply; 28+ messages in thread
From: Mikołaj Lenczewski @ 2025-04-28 15:35 UTC (permalink / raw)
  To: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas, will,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu
  Cc: Mikołaj Lenczewski

For supporting BBM Level 2 for userspace mappings, we want to ensure
that the smmu also supports its own version of BBM Level 2. Luckily, the
smmu spec (IHI 0070G 3.21.1.3) is stricter than the aarch64 spec (DDI
0487K.a D8.16.2), so already guarantees that no aborts are raised when
BBM level 2 is claimed.

Add the feature and testing for it under arm_smmu_sva_supported().

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c     | 3 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     | 4 ++++
 3 files changed, 10 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 9ba596430e7c..6ba182572788 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -222,6 +222,9 @@ bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 		feat_mask |= ARM_SMMU_FEAT_VAX;
 	}
 
+	if (system_supports_bbml2_noabort())
+		feat_mask |= ARM_SMMU_FEAT_BBML2;
+
 	if ((smmu->features & feat_mask) != feat_mask)
 		return false;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b4c21aaed126..a8b171372f17 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4430,6 +4430,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (FIELD_GET(IDR3_RIL, reg))
 		smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
+	if (FIELD_GET(IDR3_BBML, reg) == IDR3_BBML2)
+		smmu->features |= ARM_SMMU_FEAT_BBML2;
+
 	/* IDR5 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index dd1ad56ce863..e67653d342e1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -60,6 +60,9 @@ struct arm_smmu_device;
 #define ARM_SMMU_IDR3			0xc
 #define IDR3_FWB			(1 << 8)
 #define IDR3_RIL			(1 << 10)
+#define IDR3_BBML			GENMASK(12, 11)
+#define IDR3_BBML1			(1 << 11)
+#define IDR3_BBML2			(2 << 11)
 
 #define ARM_SMMU_IDR5			0x14
 #define IDR5_STALL_MAX			GENMASK(31, 16)
@@ -755,6 +758,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_HA		(1 << 21)
 #define ARM_SMMU_FEAT_HD		(1 << 22)
 #define ARM_SMMU_FEAT_S2FWB		(1 << 23)
+#define ARM_SMMU_FEAT_BBML2		(1 << 24)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RESEND PATCH v6 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2
  2025-04-28 15:35 [RESEND PATCH v6 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
  2025-04-28 15:35 ` [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
  2025-04-28 15:35 ` [RESEND PATCH v6 2/3] iommu/arm: Add BBM Level 2 smmu feature Mikołaj Lenczewski
@ 2025-04-28 15:35 ` Mikołaj Lenczewski
  2025-04-28 16:17   ` Ryan Roberts
  2 siblings, 1 reply; 28+ messages in thread
From: Mikołaj Lenczewski @ 2025-04-28 15:35 UTC (permalink / raw)
  To: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas, will,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu
  Cc: Mikołaj Lenczewski

When converting a region via contpte_convert() to use mTHP, we have two
different goals. We have to mark each entry as contiguous, and we would
like to smear the dirty and young (access) bits across all entries in
the contiguous block. Currently, we do this by first accumulating the
dirty and young bits in the block, using an atomic
__ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
performing a tlbi, and finally smearing the correct bits across the
block using __set_ptes().

This approach works fine for BBM level 0, but with support for BBM level
2 we are allowed to reorder the tlbi to after setting the pagetable
entries. We expect the time cost of a tlbi to be much greater than the
cost of clearing and resetting the PTEs. As such, this reordering of the
tlbi outside the window where our PTEs are invalid greatly reduces the
duration the PTE are visibly invalid for other threads. This reduces the
likelyhood of a concurrent page walk finding an invalid PTE, reducing
the likelyhood of a fault in other threads, and improving performance
(more so when there are more threads).

Because we support via allowlist only bbml2 implementations that never
raise conflict aborts and instead invalidate the tlb entries
automatically in hardware, we can avoid the final flush altogether.

However, avoiding the intermediate tlbi+dsb must be carefully considered
to ensure that we remain both correct and performant. We document our
reasoning and the expected interactions further in the contpte_convert()
source. To do so we rely on the aarch64 spec (DDI 0487L.a D8.7.1.1)
requirements RNGLXZ and RJQQTC to provide guarantees that the elision is
correct.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 arch/arm64/mm/contpte.c | 139 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index bcac4f55f9c1..203357061d0a 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -68,7 +68,144 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
 			pte = pte_mkyoung(pte);
 	}
 
-	__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
+	/*
+	 * On eliding the __tlb_flush_range() under BBML2+noabort:
+	 *
+	 * NOTE: Instead of using N=16 as the contiguous block length, we use
+	 *       N=4 for clarity.
+	 *
+	 * NOTE: 'n' and 'c' are used to denote the "contiguous bit" being
+	 *       unset and set, respectively.
+	 *
+	 * We worry about two cases where contiguous bit is used:
+	 *  - When folding N smaller non-contiguous ptes as 1 contiguous block.
+	 *  - When unfolding a contiguous block into N smaller non-contiguous ptes.
+	 *
+	 * Currently, the BBML0 folding case looks as follows:
+	 *
+	 *  0) Initial page-table layout:
+	 *
+	 *   +----+----+----+----+
+	 *   |RO,n|RO,n|RO,n|RW,n| <--- last page being set as RO
+	 *   +----+----+----+----+
+	 *
+	 *  1) Aggregate AF + dirty flags using __ptep_get_and_clear():
+	 *
+	 *   +----+----+----+----+
+	 *   |  0 |  0 |  0 |  0 |
+	 *   +----+----+----+----+
+	 *
+	 *  2) __flush_tlb_range():
+	 *
+	 *   |____ tlbi + dsb ____|
+	 *
+	 *  3) __set_ptes() to repaint contiguous block:
+	 *
+	 *   +----+----+----+----+
+	 *   |RO,c|RO,c|RO,c|RO,c|
+	 *   +----+----+----+----+
+	 *
+	 *  4) The kernel will eventually __flush_tlb() for changed page:
+	 *
+	 *                  |____| <--- tlbi + dsb
+	 *
+	 * As expected, the intermediate tlbi+dsb ensures that other PEs
+	 * only ever see an invalid (0) entry, or the new contiguous TLB entry.
+	 * The final tlbi+dsb will always throw away the newly installed
+	 * contiguous TLB entry, which is a micro-optimisation opportunity,
+	 * but does not affect correctness.
+	 *
+	 * In the BBML2 case, the change is avoiding the intermediate tlbi+dsb.
+	 * This means a few things, but notably other PEs will still "see" any
+	 * stale cached TLB entries. This could lead to a "contiguous bit
+	 * misprogramming" issue until the final tlbi+dsb of the changed page,
+	 * which would clear out both the stale (RW,n) entry and the new (RO,c)
+	 * contiguous entry installed in its place.
+	 *
+	 * What this is saying, is the following:
+	 *
+	 *  +----+----+----+----+
+	 *  |RO,n|RO,n|RO,n|RW,n| <--- old page tables, all non-contiguous
+	 *  +----+----+----+----+
+	 *
+	 *  +----+----+----+----+
+	 *  |RO,c|RO,c|RO,c|RO,c| <--- new page tables, all contiguous
+	 *  +----+----+----+----+
+	 *   /\
+	 *   ||
+	 *
+	 *  If both the old single (RW,n) and new contiguous (RO,c) TLB entries
+	 *  are present, and a write is made to this address, do we fault or
+	 *  is the write permitted (via amalgamation)?
+	 *
+	 * The relevant Arm ARM DDI 0487L.a requirements are RNGLXZ and RJQQTC,
+	 * and together state that when BBML1 or BBML2 are implemented, either
+	 * a TLB conflict abort is raised (which we expressly forbid), or will
+	 * "produce an OA, access permissions, and memory attributes that are
+	 * consistent with any of the programmed translation table values".
+	 *
+	 * That is to say, will either raise a TLB conflict, or produce one of
+	 * the cached TLB entries, but never amalgamate.
+	 *
+	 * Thus, as the page tables are only considered "consistent" after
+	 * the final tlbi+dsb (which evicts both the single stale (RW,n) TLB
+	 * entry as well as the new contiguous (RO,c) TLB entry), omitting the
+	 * initial tlbi+dsb is correct.
+	 *
+	 * It is also important to note that at the end of the BBML2 folding
+	 * case, we are still left with potentially all N TLB entries still
+	 * cached (the N-1 non-contiguous ptes, and the single contiguous
+	 * block). However, over time, natural TLB pressure will cause the
+	 * non-contiguous pte TLB entries to be flushed, leaving only the
+	 * contiguous block TLB entry. This means that omitting the tlbi+dsb is
+	 * not only correct, but also keeps our eventual performance benefits.
+	 *
+	 * For the unfolding case, BBML0 looks as follows:
+	 *
+	 *  0) Initial page-table layout:
+	 *
+	 *   +----+----+----+----+
+	 *   |RW,c|RW,c|RW,c|RW,c| <--- last page being set as RO
+	 *   +----+----+----+----+
+	 *
+	 *  1) Aggregate AF + dirty flags using __ptep_get_and_clear():
+	 *
+	 *   +----+----+----+----+
+	 *   |  0 |  0 |  0 |  0 |
+	 *   +----+----+----+----+
+	 *
+	 *  2) __flush_tlb_range():
+	 *
+	 *   |____ tlbi + dsb ____|
+	 *
+	 *  3) __set_ptes() to repaint as non-contiguous:
+	 *
+	 *   +----+----+----+----+
+	 *   |RW,n|RW,n|RW,n|RW,n|
+	 *   +----+----+----+----+
+	 *
+	 *  4) Update changed page permissions:
+	 *
+	 *   +----+----+----+----+
+	 *   |RW,n|RW,n|RW,n|RO,n| <--- last page permissions set
+	 *   +----+----+----+----+
+	 *
+	 *  5) The kernel will eventually __flush_tlb() for changed page:
+	 *
+	 *                  |____| <--- tlbi + dsb
+	 *
+	 * For BBML2, we again remove the intermediate tlbi+dsb. Here, there
+	 * are no issues, as the final tlbi+dsb covering the changed page is
+	 * guaranteed to remove the original large contiguous (RW,c) TLB entry,
+	 * as well as the intermediate (RW,n) TLB entry; the next access will
+	 * install the new (RO,n) TLB entry and the page tables are only
+	 * considered "consistent" after the final tlbi+dsb, so software must
+	 * be prepared for this inconsistency prior to finishing the mm dance
+	 * regardless.
+	 */
+
+	if (!system_supports_bbml2_noabort())
+		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
 
 	__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
 }
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2
  2025-04-28 15:35 ` [RESEND PATCH v6 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
@ 2025-04-28 16:17   ` Ryan Roberts
  0 siblings, 0 replies; 28+ messages in thread
From: Ryan Roberts @ 2025-04-28 16:17 UTC (permalink / raw)
  To: Mikołaj Lenczewski, suzuki.poulose, yang, corbet,
	catalin.marinas, will, jean-philippe, robin.murphy, joro, akpm,
	paulmck, mark.rutland, joey.gouly, maz, james.morse, broonie,
	oliver.upton, baohua, david, ioworker0, jgg, nicolinc, mshavit,
	jsnitsel, smostafa, kevin.tian, linux-doc, linux-kernel,
	linux-arm-kernel, iommu

On 28/04/2025 16:35, Mikołaj Lenczewski wrote:
> When converting a region via contpte_convert() to use mTHP, we have two
> different goals. We have to mark each entry as contiguous, and we would
> like to smear the dirty and young (access) bits across all entries in
> the contiguous block. Currently, we do this by first accumulating the
> dirty and young bits in the block, using an atomic
> __ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
> performing a tlbi, and finally smearing the correct bits across the
> block using __set_ptes().
> 
> This approach works fine for BBM level 0, but with support for BBM level
> 2 we are allowed to reorder the tlbi to after setting the pagetable
> entries. We expect the time cost of a tlbi to be much greater than the
> cost of clearing and resetting the PTEs. As such, this reordering of the
> tlbi outside the window where our PTEs are invalid greatly reduces the
> duration the PTE are visibly invalid for other threads. This reduces the
> likelyhood of a concurrent page walk finding an invalid PTE, reducing
> the likelyhood of a fault in other threads, and improving performance
> (more so when there are more threads).
> 
> Because we support via allowlist only bbml2 implementations that never
> raise conflict aborts and instead invalidate the tlb entries
> automatically in hardware, we can avoid the final flush altogether.
> 
> However, avoiding the intermediate tlbi+dsb must be carefully considered
> to ensure that we remain both correct and performant. We document our
> reasoning and the expected interactions further in the contpte_convert()
> source. To do so we rely on the aarch64 spec (DDI 0487L.a D8.7.1.1)
> requirements RNGLXZ and RJQQTC to provide guarantees that the elision is
> correct.
> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

It would be great to see this series merged this cycle!

Thanks,
Ryan

> ---
>  arch/arm64/mm/contpte.c | 139 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 138 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index bcac4f55f9c1..203357061d0a 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -68,7 +68,144 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
>  			pte = pte_mkyoung(pte);
>  	}
>  
> -	__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
> +	/*
> +	 * On eliding the __tlb_flush_range() under BBML2+noabort:
> +	 *
> +	 * NOTE: Instead of using N=16 as the contiguous block length, we use
> +	 *       N=4 for clarity.
> +	 *
> +	 * NOTE: 'n' and 'c' are used to denote the "contiguous bit" being
> +	 *       unset and set, respectively.
> +	 *
> +	 * We worry about two cases where contiguous bit is used:
> +	 *  - When folding N smaller non-contiguous ptes as 1 contiguous block.
> +	 *  - When unfolding a contiguous block into N smaller non-contiguous ptes.
> +	 *
> +	 * Currently, the BBML0 folding case looks as follows:
> +	 *
> +	 *  0) Initial page-table layout:
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |RO,n|RO,n|RO,n|RW,n| <--- last page being set as RO
> +	 *   +----+----+----+----+
> +	 *
> +	 *  1) Aggregate AF + dirty flags using __ptep_get_and_clear():
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |  0 |  0 |  0 |  0 |
> +	 *   +----+----+----+----+
> +	 *
> +	 *  2) __flush_tlb_range():
> +	 *
> +	 *   |____ tlbi + dsb ____|
> +	 *
> +	 *  3) __set_ptes() to repaint contiguous block:
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |RO,c|RO,c|RO,c|RO,c|
> +	 *   +----+----+----+----+
> +	 *
> +	 *  4) The kernel will eventually __flush_tlb() for changed page:
> +	 *
> +	 *                  |____| <--- tlbi + dsb
> +	 *
> +	 * As expected, the intermediate tlbi+dsb ensures that other PEs
> +	 * only ever see an invalid (0) entry, or the new contiguous TLB entry.
> +	 * The final tlbi+dsb will always throw away the newly installed
> +	 * contiguous TLB entry, which is a micro-optimisation opportunity,
> +	 * but does not affect correctness.
> +	 *
> +	 * In the BBML2 case, the change is avoiding the intermediate tlbi+dsb.
> +	 * This means a few things, but notably other PEs will still "see" any
> +	 * stale cached TLB entries. This could lead to a "contiguous bit
> +	 * misprogramming" issue until the final tlbi+dsb of the changed page,
> +	 * which would clear out both the stale (RW,n) entry and the new (RO,c)
> +	 * contiguous entry installed in its place.
> +	 *
> +	 * What this is saying, is the following:
> +	 *
> +	 *  +----+----+----+----+
> +	 *  |RO,n|RO,n|RO,n|RW,n| <--- old page tables, all non-contiguous
> +	 *  +----+----+----+----+
> +	 *
> +	 *  +----+----+----+----+
> +	 *  |RO,c|RO,c|RO,c|RO,c| <--- new page tables, all contiguous
> +	 *  +----+----+----+----+
> +	 *   /\
> +	 *   ||
> +	 *
> +	 *  If both the old single (RW,n) and new contiguous (RO,c) TLB entries
> +	 *  are present, and a write is made to this address, do we fault or
> +	 *  is the write permitted (via amalgamation)?
> +	 *
> +	 * The relevant Arm ARM DDI 0487L.a requirements are RNGLXZ and RJQQTC,
> +	 * and together state that when BBML1 or BBML2 are implemented, either
> +	 * a TLB conflict abort is raised (which we expressly forbid), or will
> +	 * "produce an OA, access permissions, and memory attributes that are
> +	 * consistent with any of the programmed translation table values".
> +	 *
> +	 * That is to say, will either raise a TLB conflict, or produce one of
> +	 * the cached TLB entries, but never amalgamate.
> +	 *
> +	 * Thus, as the page tables are only considered "consistent" after
> +	 * the final tlbi+dsb (which evicts both the single stale (RW,n) TLB
> +	 * entry as well as the new contiguous (RO,c) TLB entry), omitting the
> +	 * initial tlbi+dsb is correct.
> +	 *
> +	 * It is also important to note that at the end of the BBML2 folding
> +	 * case, we are still left with potentially all N TLB entries still
> +	 * cached (the N-1 non-contiguous ptes, and the single contiguous
> +	 * block). However, over time, natural TLB pressure will cause the
> +	 * non-contiguous pte TLB entries to be flushed, leaving only the
> +	 * contiguous block TLB entry. This means that omitting the tlbi+dsb is
> +	 * not only correct, but also keeps our eventual performance benefits.
> +	 *
> +	 * For the unfolding case, BBML0 looks as follows:
> +	 *
> +	 *  0) Initial page-table layout:
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |RW,c|RW,c|RW,c|RW,c| <--- last page being set as RO
> +	 *   +----+----+----+----+
> +	 *
> +	 *  1) Aggregate AF + dirty flags using __ptep_get_and_clear():
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |  0 |  0 |  0 |  0 |
> +	 *   +----+----+----+----+
> +	 *
> +	 *  2) __flush_tlb_range():
> +	 *
> +	 *   |____ tlbi + dsb ____|
> +	 *
> +	 *  3) __set_ptes() to repaint as non-contiguous:
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |RW,n|RW,n|RW,n|RW,n|
> +	 *   +----+----+----+----+
> +	 *
> +	 *  4) Update changed page permissions:
> +	 *
> +	 *   +----+----+----+----+
> +	 *   |RW,n|RW,n|RW,n|RO,n| <--- last page permissions set
> +	 *   +----+----+----+----+
> +	 *
> +	 *  5) The kernel will eventually __flush_tlb() for changed page:
> +	 *
> +	 *                  |____| <--- tlbi + dsb
> +	 *
> +	 * For BBML2, we again remove the intermediate tlbi+dsb. Here, there
> +	 * are no issues, as the final tlbi+dsb covering the changed page is
> +	 * guaranteed to remove the original large contiguous (RW,c) TLB entry,
> +	 * as well as the intermediate (RW,n) TLB entry; the next access will
> +	 * install the new (RO,n) TLB entry and the page tables are only
> +	 * considered "consistent" after the final tlbi+dsb, so software must
> +	 * be prepared for this inconsistency prior to finishing the mm dance
> +	 * regardless.
> +	 */
> +
> +	if (!system_supports_bbml2_noabort())
> +		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
>  
>  	__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
>  }



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-04-28 15:35 ` [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
@ 2025-04-28 17:55   ` ALOK TIWARI
  2025-05-06  8:36     ` Mikołaj Lenczewski
  2025-05-06 14:25   ` Will Deacon
  1 sibling, 1 reply; 28+ messages in thread
From: ALOK TIWARI @ 2025-04-28 17:55 UTC (permalink / raw)
  To: Mikołaj Lenczewski, ryan.roberts, suzuki.poulose, yang,
	corbet, catalin.marinas, will, jean-philippe, robin.murphy, joro,
	akpm, paulmck, mark.rutland, joey.gouly, maz, james.morse,
	broonie, oliver.upton, baohua, david, ioworker0, jgg, nicolinc,
	mshavit, jsnitsel, smostafa, kevin.tian, linux-doc, linux-kernel,
	linux-arm-kernel, iommu



On 28-04-2025 21:05, Mikołaj Lenczewski wrote:
> +config ARM64_BBML2_NOABORT
> +	bool "Enable support for Break-Before-Make Level 2 detection and usage"
> +	default y
> +	help
> +	  FEAT_BBM provides detection of support levels for break-before-make
> +	  sequences. If BBM level 2 is supported, some TLB maintenance requirements
> +	  can be relaxed to improve performance. We additonally require the

typo additonally  -> additionally

> +	  property that the implementation cannot ever raise TLB Conflict Aborts.
> +	  Selecting N causes the kernel to fallback to BBM level 0 behaviour
> +	  even if the system supports BBM level 2.
> +
> +	  To enable detection of BBML2 support, and to make use of it, say Y.
> +
> +	  Detection of and support for BBM level 2 can optionally be overridden
> +	  at runtime via the use of the arm64.nobbml2 kernel commandline
> +	  parameter. If your system claims support for BBML2, but is unstable
> +	  with this option enabled, either say N or make use of the commandline
> +	  parameter override to force BBML0.
> +
>   endmenu # "ARMv8.4 architectural features"
>   
[clip]
>   
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9c4d6d552b25..7a85a1bdc6e9 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
>   	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
>   }
>   
> +static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
> +{
> +	/*
> +	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
> +	 * as possible. This list is therefore an allow-list of known-good
> +	 * implementations that both support bbml2 and additionally, fulfill the
> +	 * extra constraint of never generating TLB conflict aborts when using
> +	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> +	 * kernel contexts difficult to prove safe against recursive aborts).
> +	 *
> +	 * Note that implementations can only be considered "known-good" if their
> +	 * implementors attest to the fact that the implementation never raises
> +	 * TLBI conflict aborts for bbml2 mapping granularity changes.
> +	 */

use bbml2 -> BBML2 to maintain consistency

> +	static const struct midr_range supports_bbml2_noabort_list[] = {
> +		MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> +		MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> +		{}
> +	};
> +
> +	return is_midr_in_range_list(cpu_midr, supports_bbml2_noabort_list);
> +}
> +
> +static inline unsigned int cpu_read_midr(int cpu)
> +{
> +	WARN_ON_ONCE(!cpu_online(cpu));
> +
> +	return per_cpu(cpu_data, cpu).reg_midr;
> +}
> +




Thanks
Alok



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-04-28 17:55   ` ALOK TIWARI
@ 2025-05-06  8:36     ` Mikołaj Lenczewski
  0 siblings, 0 replies; 28+ messages in thread
From: Mikołaj Lenczewski @ 2025-05-06  8:36 UTC (permalink / raw)
  To: ALOK TIWARI
  Cc: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas, will,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

Hi Alok,

Apologies for the delay.

On Mon, Apr 28, 2025 at 11:25:07PM +0530, ALOK TIWARI wrote:
> 
> 
> On 28-04-2025 21:05, Mikołaj Lenczewski wrote:
> > +config ARM64_BBML2_NOABORT
> > +	bool "Enable support for Break-Before-Make Level 2 detection and usage"
> > +	default y
> > +	help
> > +	  FEAT_BBM provides detection of support levels for break-before-make
> > +	  sequences. If BBM level 2 is supported, some TLB maintenance requirements
> > +	  can be relaxed to improve performance. We additonally require the
> 
> typo additonally  -> additionally
Yes, had missed this. Will fix.

> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 9c4d6d552b25..7a85a1bdc6e9 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
> >   	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
> >   }
> > +static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
> > +{
> > +	/*
> > +	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
> > +	 * as possible. This list is therefore an allow-list of known-good
> > +	 * implementations that both support bbml2 and additionally, fulfill the
> > +	 * extra constraint of never generating TLB conflict aborts when using
> > +	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> > +	 * kernel contexts difficult to prove safe against recursive aborts).
> > +	 *
> > +	 * Note that implementations can only be considered "known-good" if their
> > +	 * implementors attest to the fact that the implementation never raises
> > +	 * TLBI conflict aborts for bbml2 mapping granularity changes.
> > +	 */
> 
> use bbml2 -> BBML2 to maintain consistency
> 
OK, will go through and use BBML2 throughout for consistency.

-- 
Kind regards,
Mikołaj Lenczewski


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 2/3] iommu/arm: Add BBM Level 2 smmu feature
  2025-04-28 15:35 ` [RESEND PATCH v6 2/3] iommu/arm: Add BBM Level 2 smmu feature Mikołaj Lenczewski
@ 2025-05-06 14:19   ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2025-05-06 14:19 UTC (permalink / raw)
  To: Mikołaj Lenczewski
  Cc: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

On Mon, Apr 28, 2025 at 03:35:16PM +0000, Mikołaj Lenczewski wrote:
> For supporting BBM Level 2 for userspace mappings, we want to ensure
> that the smmu also supports its own version of BBM Level 2. Luckily, the
> smmu spec (IHI 0070G 3.21.1.3) is stricter than the aarch64 spec (DDI
> 0487K.a D8.16.2), so already guarantees that no aborts are raised when
> BBM level 2 is claimed.
> 
> Add the feature and testing for it under arm_smmu_sva_supported().
> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c     | 3 +++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     | 4 ++++
>  3 files changed, 10 insertions(+)

This looks fine to me but please note that it doesn't apply against
mainline (v6.15-rc5).

Will


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-04-28 15:35 ` [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
  2025-04-28 17:55   ` ALOK TIWARI
@ 2025-05-06 14:25   ` Will Deacon
  2025-05-06 14:51     ` Marc Zyngier
  2025-05-06 14:52     ` Ryan Roberts
  1 sibling, 2 replies; 28+ messages in thread
From: Will Deacon @ 2025-05-06 14:25 UTC (permalink / raw)
  To: Mikołaj Lenczewski
  Cc: ryan.roberts, suzuki.poulose, yang, corbet, catalin.marinas,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> and this commit adds a dedicated BBML2 cpufeature to test against
> support for, as well as a kernel commandline parameter to optionally
> disable BBML2 altogether.
> 
> This is a system feature as we might have a big.LITTLE architecture
> where some cores support BBML2 and some don't, but we want all cores to
> be available and BBM to default to level 0 (as opposed to having cores
> without BBML2 not coming online).
> 
> To support BBML2 in as wide a range of contexts as we can, we want not
> only the architectural guarantees that BBML2 makes, but additionally
> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
> us having to prove that no recursive faults can be induced in any path
> that uses BBML2, allowing its use for arbitrary kernel mappings.
> Support detection of such CPUs.
> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  3 +
>  arch/arm64/Kconfig                            | 19 +++++
>  arch/arm64/include/asm/cpucaps.h              |  2 +
>  arch/arm64/include/asm/cpufeature.h           |  5 ++
>  arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
>  arch/arm64/kernel/pi/idreg-override.c         |  2 +
>  arch/arm64/tools/cpucaps                      |  1 +
>  7 files changed, 103 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index d9fd26b95b34..2749c67a4f07 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -449,6 +449,9 @@
>  	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
>  			32 bit applications.
>  
> +	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
> +			2 support

Hmm, I'm not sure we really want this. It opens up the door for folks to
pass 'id_aa64mmfr2.bbm=2' without updating the allow-list which feels
like it's going to make crashes harder to reason about.

Is there a compelling reason to add this right now?

>  	arm64.nobti	[ARM64] Unconditionally disable Branch Target
>  			Identification support
>  
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a182295e6f08..613b4925ca06 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2070,6 +2070,25 @@ config ARM64_TLB_RANGE
>  	  The feature introduces new assembly instructions, and they were
>  	  support when binutils >= 2.30.
>  
> +config ARM64_BBML2_NOABORT
> +	bool "Enable support for Break-Before-Make Level 2 detection and usage"
> +	default y

I don't think we need a new Kconfig option for this. It's a
kernel-internal detail and I'd prefer not to fragment the testing base.

> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9c4d6d552b25..7a85a1bdc6e9 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
>  	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
>  }
>  
> +static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
> +{
> +	/*
> +	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
> +	 * as possible. This list is therefore an allow-list of known-good
> +	 * implementations that both support bbml2 and additionally, fulfill the
> +	 * extra constraint of never generating TLB conflict aborts when using
> +	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> +	 * kernel contexts difficult to prove safe against recursive aborts).
> +	 *
> +	 * Note that implementations can only be considered "known-good" if their
> +	 * implementors attest to the fact that the implementation never raises
> +	 * TLBI conflict aborts for bbml2 mapping granularity changes.
> +	 */
> +	static const struct midr_range supports_bbml2_noabort_list[] = {
> +		MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> +		MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> +		{}
> +	};
> +
> +	return is_midr_in_range_list(cpu_midr, supports_bbml2_noabort_list);

This doesn't compile against latest mainline as is_midr_in_range_list()
no longer takes the midr.

> +static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
> +{
> +	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
> +		return false;
> +
> +	if (scope & SCOPE_SYSTEM) {
> +		int cpu;
> +
> +		/*
> +		 * We are a boot CPU, and must verify that all enumerated boot
> +		 * CPUs have MIDR values within our allowlist. Otherwise, we do
> +		 * not allow the BBML2 feature to avoid potential faults when
> +		 * the insufficient CPUs access memory regions using BBML2
> +		 * semantics.
> +		 */
> +		for_each_online_cpu(cpu) {
> +			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
> +				return false;
> +		}

This penalises large homogeneous systems and it feels unnecessary given
that we have the ability to check this per-CPU. Can you use
ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
to solve this?

Will


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-06 14:25   ` Will Deacon
@ 2025-05-06 14:51     ` Marc Zyngier
  2025-05-06 14:57       ` Will Deacon
  2025-05-06 14:52     ` Ryan Roberts
  1 sibling, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2025-05-06 14:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mikołaj Lenczewski, ryan.roberts, suzuki.poulose, yang,
	corbet, catalin.marinas, jean-philippe, robin.murphy, joro, akpm,
	paulmck, mark.rutland, joey.gouly, james.morse, broonie,
	oliver.upton, baohua, david, ioworker0, jgg, nicolinc, mshavit,
	jsnitsel, smostafa, kevin.tian, linux-doc, linux-kernel,
	linux-arm-kernel, iommu

On Tue, 06 May 2025 15:25:09 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
> > The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> > and this commit adds a dedicated BBML2 cpufeature to test against
> > support for, as well as a kernel commandline parameter to optionally
> > disable BBML2 altogether.
> > 
> > This is a system feature as we might have a big.LITTLE architecture
> > where some cores support BBML2 and some don't, but we want all cores to
> > be available and BBM to default to level 0 (as opposed to having cores
> > without BBML2 not coming online).
> > 
> > To support BBML2 in as wide a range of contexts as we can, we want not
> > only the architectural guarantees that BBML2 makes, but additionally
> > want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
> > us having to prove that no recursive faults can be induced in any path
> > that uses BBML2, allowing its use for arbitrary kernel mappings.
> > Support detection of such CPUs.
> > 
> > Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> > Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  3 +
> >  arch/arm64/Kconfig                            | 19 +++++
> >  arch/arm64/include/asm/cpucaps.h              |  2 +
> >  arch/arm64/include/asm/cpufeature.h           |  5 ++
> >  arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
> >  arch/arm64/kernel/pi/idreg-override.c         |  2 +
> >  arch/arm64/tools/cpucaps                      |  1 +
> >  7 files changed, 103 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index d9fd26b95b34..2749c67a4f07 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -449,6 +449,9 @@
> >  	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
> >  			32 bit applications.
> >  
> > +	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
> > +			2 support
> 
> Hmm, I'm not sure we really want this. It opens up the door for folks to
> pass 'id_aa64mmfr2.bbm=2' without updating the allow-list which feels
> like it's going to make crashes harder to reason about.

Passing id_aa64mmfr2.bbm=2 shouldn't have any effect if the HW doesn't
advertise it already, as you can only downgrade features. Trying to
upgrade features should leave a nastygram in the kernel log.

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-06 14:25   ` Will Deacon
  2025-05-06 14:51     ` Marc Zyngier
@ 2025-05-06 14:52     ` Ryan Roberts
  2025-05-09 13:49       ` Will Deacon
  1 sibling, 1 reply; 28+ messages in thread
From: Ryan Roberts @ 2025-05-06 14:52 UTC (permalink / raw)
  To: Will Deacon, Mikołaj Lenczewski
  Cc: suzuki.poulose, yang, corbet, catalin.marinas, jean-philippe,
	robin.murphy, joro, akpm, paulmck, mark.rutland, joey.gouly, maz,
	james.morse, broonie, oliver.upton, baohua, david, ioworker0, jgg,
	nicolinc, mshavit, jsnitsel, smostafa, kevin.tian, linux-doc,
	linux-kernel, linux-arm-kernel, iommu

On 06/05/2025 15:25, Will Deacon wrote:
> On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
>> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
>> and this commit adds a dedicated BBML2 cpufeature to test against
>> support for, as well as a kernel commandline parameter to optionally
>> disable BBML2 altogether.
>>
>> This is a system feature as we might have a big.LITTLE architecture
>> where some cores support BBML2 and some don't, but we want all cores to
>> be available and BBM to default to level 0 (as opposed to having cores
>> without BBML2 not coming online).
>>
>> To support BBML2 in as wide a range of contexts as we can, we want not
>> only the architectural guarantees that BBML2 makes, but additionally
>> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
>> us having to prove that no recursive faults can be induced in any path
>> that uses BBML2, allowing its use for arbitrary kernel mappings.
>> Support detection of such CPUs.
>>
>> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
>> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  3 +
>>  arch/arm64/Kconfig                            | 19 +++++
>>  arch/arm64/include/asm/cpucaps.h              |  2 +
>>  arch/arm64/include/asm/cpufeature.h           |  5 ++
>>  arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
>>  arch/arm64/kernel/pi/idreg-override.c         |  2 +
>>  arch/arm64/tools/cpucaps                      |  1 +
>>  7 files changed, 103 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index d9fd26b95b34..2749c67a4f07 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -449,6 +449,9 @@
>>  	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
>>  			32 bit applications.
>>  
>> +	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
>> +			2 support
> 
> Hmm, I'm not sure we really want this. It opens up the door for folks to
> pass 'id_aa64mmfr2.bbm=2' without updating the allow-list which feels
> like it's going to make crashes harder to reason about.
> 
> Is there a compelling reason to add this right now?

I don't think there is a *compelling* reason. This came about from Suzuki's
feedback at [1]. He was keen to have a mechanism to disable BBML2 in case issues
were found.

But simpler is usually better; I'd be ok with removing.

[1] https://lore.kernel.org/all/0ac0f1f5-e4a0-46ae-8ea0-2eba7e21a7e1@arm.com/

> 
>>  	arm64.nobti	[ARM64] Unconditionally disable Branch Target
>>  			Identification support
>>  
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index a182295e6f08..613b4925ca06 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2070,6 +2070,25 @@ config ARM64_TLB_RANGE
>>  	  The feature introduces new assembly instructions, and they were
>>  	  support when binutils >= 2.30.
>>  
>> +config ARM64_BBML2_NOABORT
>> +	bool "Enable support for Break-Before-Make Level 2 detection and usage"
>> +	default y
> 
> I don't think we need a new Kconfig option for this. It's a
> kernel-internal detail and I'd prefer not to fragment the testing base.

Fair enough. This originated based on a similar argument to above. Let's just
remove then.

> 
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 9c4d6d552b25..7a85a1bdc6e9 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
>>  	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
>>  }
>>  
>> +static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
>> +{
>> +	/*
>> +	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
>> +	 * as possible. This list is therefore an allow-list of known-good
>> +	 * implementations that both support bbml2 and additionally, fulfill the
>> +	 * extra constraint of never generating TLB conflict aborts when using
>> +	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
>> +	 * kernel contexts difficult to prove safe against recursive aborts).
>> +	 *
>> +	 * Note that implementations can only be considered "known-good" if their
>> +	 * implementors attest to the fact that the implementation never raises
>> +	 * TLBI conflict aborts for bbml2 mapping granularity changes.
>> +	 */
>> +	static const struct midr_range supports_bbml2_noabort_list[] = {
>> +		MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
>> +		MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
>> +		{}
>> +	};
>> +
>> +	return is_midr_in_range_list(cpu_midr, supports_bbml2_noabort_list);
> 
> This doesn't compile against latest mainline as is_midr_in_range_list()
> no longer takes the midr.

Will ask Miko to fix.

> 
>> +static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
>> +{
>> +	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
>> +		return false;
>> +
>> +	if (scope & SCOPE_SYSTEM) {
>> +		int cpu;
>> +
>> +		/*
>> +		 * We are a boot CPU, and must verify that all enumerated boot
>> +		 * CPUs have MIDR values within our allowlist. Otherwise, we do
>> +		 * not allow the BBML2 feature to avoid potential faults when
>> +		 * the insufficient CPUs access memory regions using BBML2
>> +		 * semantics.
>> +		 */
>> +		for_each_online_cpu(cpu) {
>> +			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
>> +				return false;
>> +		}
> 
> This penalises large homogeneous systems and it feels unnecessary given
> that we have the ability to check this per-CPU. Can you use
> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
> to solve this?

We are trying to solve for the case where the boot CPU has BBML2 but a secondary
CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
little and does not advertise the feature. I can't remember if we proved there
are real systems with this config - I have vague recollection that we did but my
memory is poor...).

My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
has enabled this feature already, then every late CPU must have it". So that
would exclude any secondary CPUs without BBML2 from coming online?

Suzuki guided us towards this solution.

How do you see this working with ARM64_CPUCAP_BOOT_CPU_FEATURE? Or do you just
think that all systems will always be homogeneous with respect to FEAT_BBM?

Thanks,
Ryan

> 
> Will



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-06 14:51     ` Marc Zyngier
@ 2025-05-06 14:57       ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2025-05-06 14:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mikołaj Lenczewski, ryan.roberts, suzuki.poulose, yang,
	corbet, catalin.marinas, jean-philippe, robin.murphy, joro, akpm,
	paulmck, mark.rutland, joey.gouly, james.morse, broonie,
	oliver.upton, baohua, david, ioworker0, jgg, nicolinc, mshavit,
	jsnitsel, smostafa, kevin.tian, linux-doc, linux-kernel,
	linux-arm-kernel, iommu

On Tue, May 06, 2025 at 03:51:59PM +0100, Marc Zyngier wrote:
> On Tue, 06 May 2025 15:25:09 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
> > > The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> > > and this commit adds a dedicated BBML2 cpufeature to test against
> > > support for, as well as a kernel commandline parameter to optionally
> > > disable BBML2 altogether.
> > > 
> > > This is a system feature as we might have a big.LITTLE architecture
> > > where some cores support BBML2 and some don't, but we want all cores to
> > > be available and BBM to default to level 0 (as opposed to having cores
> > > without BBML2 not coming online).
> > > 
> > > To support BBML2 in as wide a range of contexts as we can, we want not
> > > only the architectural guarantees that BBML2 makes, but additionally
> > > want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
> > > us having to prove that no recursive faults can be induced in any path
> > > that uses BBML2, allowing its use for arbitrary kernel mappings.
> > > Support detection of such CPUs.
> > > 
> > > Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> > > Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> > > ---
> > >  .../admin-guide/kernel-parameters.txt         |  3 +
> > >  arch/arm64/Kconfig                            | 19 +++++
> > >  arch/arm64/include/asm/cpucaps.h              |  2 +
> > >  arch/arm64/include/asm/cpufeature.h           |  5 ++
> > >  arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
> > >  arch/arm64/kernel/pi/idreg-override.c         |  2 +
> > >  arch/arm64/tools/cpucaps                      |  1 +
> > >  7 files changed, 103 insertions(+)
> > > 
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index d9fd26b95b34..2749c67a4f07 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -449,6 +449,9 @@
> > >  	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
> > >  			32 bit applications.
> > >  
> > > +	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
> > > +			2 support
> > 
> > Hmm, I'm not sure we really want this. It opens up the door for folks to
> > pass 'id_aa64mmfr2.bbm=2' without updating the allow-list which feels
> > like it's going to make crashes harder to reason about.
> 
> Passing id_aa64mmfr2.bbm=2 shouldn't have any effect if the HW doesn't
> advertise it already, as you can only downgrade features. Trying to
> upgrade features should leave a nastygram in the kernel log.

Ah, thanks, I was playing around in QEMU and my CPU already had BBML2
so I didn't spot that. In any case, I'd prefer to avoid adding the
option unless we need it -- this thing is driven from an MIDR-based
list and that should be maintained.

Will


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-06 14:52     ` Ryan Roberts
@ 2025-05-09 13:49       ` Will Deacon
  2025-05-09 14:16         ` Ryan Roberts
  2025-05-09 16:04         ` Catalin Marinas
  0 siblings, 2 replies; 28+ messages in thread
From: Will Deacon @ 2025-05-09 13:49 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Mikołaj Lenczewski, suzuki.poulose, yang, corbet,
	catalin.marinas, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
> On 06/05/2025 15:25, Will Deacon wrote:
> > On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
> >> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> >> and this commit adds a dedicated BBML2 cpufeature to test against
> >> support for, as well as a kernel commandline parameter to optionally
> >> disable BBML2 altogether.
> >>
> >> This is a system feature as we might have a big.LITTLE architecture
> >> where some cores support BBML2 and some don't, but we want all cores to
> >> be available and BBM to default to level 0 (as opposed to having cores
> >> without BBML2 not coming online).
> >>
> >> To support BBML2 in as wide a range of contexts as we can, we want not
> >> only the architectural guarantees that BBML2 makes, but additionally
> >> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
> >> us having to prove that no recursive faults can be induced in any path
> >> that uses BBML2, allowing its use for arbitrary kernel mappings.
> >> Support detection of such CPUs.
> >>
> >> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> >> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >>  .../admin-guide/kernel-parameters.txt         |  3 +
> >>  arch/arm64/Kconfig                            | 19 +++++
> >>  arch/arm64/include/asm/cpucaps.h              |  2 +
> >>  arch/arm64/include/asm/cpufeature.h           |  5 ++
> >>  arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
> >>  arch/arm64/kernel/pi/idreg-override.c         |  2 +
> >>  arch/arm64/tools/cpucaps                      |  1 +
> >>  7 files changed, 103 insertions(+)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index d9fd26b95b34..2749c67a4f07 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -449,6 +449,9 @@
> >>  	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
> >>  			32 bit applications.
> >>  
> >> +	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
> >> +			2 support
> > 
> > Hmm, I'm not sure we really want this. It opens up the door for folks to
> > pass 'id_aa64mmfr2.bbm=2' without updating the allow-list which feels
> > like it's going to make crashes harder to reason about.
> > 
> > Is there a compelling reason to add this right now?
> 
> I don't think there is a *compelling* reason. This came about from Suzuki's
> feedback at [1]. He was keen to have a mechanism to disable BBML2 in case issues
> were found.
> 
> But simpler is usually better; I'd be ok with removing.

We can always add it back if we really need it, but adding an allowlist
*and* a mechanism to override the allowlist at the same time seems overly
pessimistic to me :)

> >> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> >> index 9c4d6d552b25..7a85a1bdc6e9 100644
> >> --- a/arch/arm64/kernel/cpufeature.c
> >> +++ b/arch/arm64/kernel/cpufeature.c
> >> @@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
> >>  	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
> >>  }
> >>  
> >> +static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
> >> +{
> >> +	/*
> >> +	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
> >> +	 * as possible. This list is therefore an allow-list of known-good
> >> +	 * implementations that both support bbml2 and additionally, fulfill the
> >> +	 * extra constraint of never generating TLB conflict aborts when using
> >> +	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> >> +	 * kernel contexts difficult to prove safe against recursive aborts).
> >> +	 *
> >> +	 * Note that implementations can only be considered "known-good" if their
> >> +	 * implementors attest to the fact that the implementation never raises
> >> +	 * TLBI conflict aborts for bbml2 mapping granularity changes.
> >> +	 */
> >> +	static const struct midr_range supports_bbml2_noabort_list[] = {
> >> +		MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> >> +		MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> >> +		{}
> >> +	};
> >> +
> >> +	return is_midr_in_range_list(cpu_midr, supports_bbml2_noabort_list);
> > 
> > This doesn't compile against latest mainline as is_midr_in_range_list()
> > no longer takes the midr.
> 
> Will ask Miko to fix.

Cheers. v6.15-rc1 is probably the right base to use as that's what I've
based for-next/mm on.

> >> +static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
> >> +{
> >> +	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
> >> +		return false;
> >> +
> >> +	if (scope & SCOPE_SYSTEM) {
> >> +		int cpu;
> >> +
> >> +		/*
> >> +		 * We are a boot CPU, and must verify that all enumerated boot
> >> +		 * CPUs have MIDR values within our allowlist. Otherwise, we do
> >> +		 * not allow the BBML2 feature to avoid potential faults when
> >> +		 * the insufficient CPUs access memory regions using BBML2
> >> +		 * semantics.
> >> +		 */
> >> +		for_each_online_cpu(cpu) {
> >> +			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
> >> +				return false;
> >> +		}
> > 
> > This penalises large homogeneous systems and it feels unnecessary given
> > that we have the ability to check this per-CPU. Can you use
> > ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
> > to solve this?
> 
> We are trying to solve for the case where the boot CPU has BBML2 but a secondary
> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
> little and does not advertise the feature. I can't remember if we proved there
> are real systems with this config - I have vague recollection that we did but my
> memory is poor...).
> 
> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
> has enabled this feature already, then every late CPU must have it". So that
> would exclude any secondary CPUs without BBML2 from coming online?

Damn, yes, you're right. However, it still feels horribly hacky to iterate
over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
has the ability to query features locally and we should be able to use
that. We're going to want that should the architecture eventually decide
on something like BBML3 for this.

What we have with BBML2_NOABORT seems similar to an hwcap in that we only
support the capability if all CPUs have it (rejecting late CPUs without it
in that case) but we can live without it if not all of the early CPUs
have it. Unlikely hwcaps, though, we shouldn't be advertising this to
userspace and we can't derive the capability solely from the sanitised
system registers.

I wonder if we could treat it like an erratum in some way instead? That
is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
wouldn't shout about). Then we should be able to say:

  - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
    would be enabled and we wouln't elide BBM.

  - If a late CPU doesn't have BBML2_NOABORT then it can't come online
    if the erratum isn't already enabled.

Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
some surgery for this.

> How do you see this working with ARM64_CPUCAP_BOOT_CPU_FEATURE? Or do you just
> think that all systems will always be homogeneous with respect to FEAT_BBM?

That's probably wishful thinking, sadly :(

Will


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-09 13:49       ` Will Deacon
@ 2025-05-09 14:16         ` Ryan Roberts
  2025-05-09 14:28           ` Will Deacon
  2025-05-09 16:04         ` Catalin Marinas
  1 sibling, 1 reply; 28+ messages in thread
From: Ryan Roberts @ 2025-05-09 14:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mikołaj Lenczewski, suzuki.poulose, yang, corbet,
	catalin.marinas, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On 09/05/2025 14:49, Will Deacon wrote:
> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
>> On 06/05/2025 15:25, Will Deacon wrote:
>>> On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
>>>> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
>>>> and this commit adds a dedicated BBML2 cpufeature to test against
>>>> support for, as well as a kernel commandline parameter to optionally
>>>> disable BBML2 altogether.
>>>>
>>>> This is a system feature as we might have a big.LITTLE architecture
>>>> where some cores support BBML2 and some don't, but we want all cores to
>>>> be available and BBM to default to level 0 (as opposed to having cores
>>>> without BBML2 not coming online).
>>>>
>>>> To support BBML2 in as wide a range of contexts as we can, we want not
>>>> only the architectural guarantees that BBML2 makes, but additionally
>>>> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
>>>> us having to prove that no recursive faults can be induced in any path
>>>> that uses BBML2, allowing its use for arbitrary kernel mappings.
>>>> Support detection of such CPUs.
>>>>
>>>> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
>>>> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  .../admin-guide/kernel-parameters.txt         |  3 +
>>>>  arch/arm64/Kconfig                            | 19 +++++
>>>>  arch/arm64/include/asm/cpucaps.h              |  2 +
>>>>  arch/arm64/include/asm/cpufeature.h           |  5 ++
>>>>  arch/arm64/kernel/cpufeature.c                | 71 +++++++++++++++++++
>>>>  arch/arm64/kernel/pi/idreg-override.c         |  2 +
>>>>  arch/arm64/tools/cpucaps                      |  1 +
>>>>  7 files changed, 103 insertions(+)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index d9fd26b95b34..2749c67a4f07 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -449,6 +449,9 @@
>>>>  	arm64.no32bit_el0 [ARM64] Unconditionally disable the execution of
>>>>  			32 bit applications.
>>>>  
>>>> +	arm64.nobbml2	[ARM64] Unconditionally disable Break-Before-Make Level
>>>> +			2 support
>>>
>>> Hmm, I'm not sure we really want this. It opens up the door for folks to
>>> pass 'id_aa64mmfr2.bbm=2' without updating the allow-list which feels
>>> like it's going to make crashes harder to reason about.
>>>
>>> Is there a compelling reason to add this right now?
>>
>> I don't think there is a *compelling* reason. This came about from Suzuki's
>> feedback at [1]. He was keen to have a mechanism to disable BBML2 in case issues
>> were found.
>>
>> But simpler is usually better; I'd be ok with removing.
> 
> We can always add it back if we really need it, but adding an allowlist
> *and* a mechanism to override the allowlist at the same time seems overly
> pessimistic to me :)

ACK

> 
>>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>>> index 9c4d6d552b25..7a85a1bdc6e9 100644
>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>> @@ -2200,6 +2200,70 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
>>>>  	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
>>>>  }
>>>>  
>>>> +static bool cpu_has_bbml2_noabort(unsigned int cpu_midr)
>>>> +{
>>>> +	/*
>>>> +	 * We want to allow usage of bbml2 in as wide a range of kernel contexts
>>>> +	 * as possible. This list is therefore an allow-list of known-good
>>>> +	 * implementations that both support bbml2 and additionally, fulfill the
>>>> +	 * extra constraint of never generating TLB conflict aborts when using
>>>> +	 * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
>>>> +	 * kernel contexts difficult to prove safe against recursive aborts).
>>>> +	 *
>>>> +	 * Note that implementations can only be considered "known-good" if their
>>>> +	 * implementors attest to the fact that the implementation never raises
>>>> +	 * TLBI conflict aborts for bbml2 mapping granularity changes.
>>>> +	 */
>>>> +	static const struct midr_range supports_bbml2_noabort_list[] = {
>>>> +		MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
>>>> +		MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
>>>> +		{}
>>>> +	};
>>>> +
>>>> +	return is_midr_in_range_list(cpu_midr, supports_bbml2_noabort_list);
>>>
>>> This doesn't compile against latest mainline as is_midr_in_range_list()
>>> no longer takes the midr.
>>
>> Will ask Miko to fix.
> 
> Cheers. v6.15-rc1 is probably the right base to use as that's what I've
> based for-next/mm on.

ACK

> 
>>>> +static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
>>>> +{
>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
>>>> +		return false;
>>>> +
>>>> +	if (scope & SCOPE_SYSTEM) {
>>>> +		int cpu;
>>>> +
>>>> +		/*
>>>> +		 * We are a boot CPU, and must verify that all enumerated boot
>>>> +		 * CPUs have MIDR values within our allowlist. Otherwise, we do
>>>> +		 * not allow the BBML2 feature to avoid potential faults when
>>>> +		 * the insufficient CPUs access memory regions using BBML2
>>>> +		 * semantics.
>>>> +		 */
>>>> +		for_each_online_cpu(cpu) {
>>>> +			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
>>>> +				return false;
>>>> +		}
>>>
>>> This penalises large homogeneous systems and it feels unnecessary given
>>> that we have the ability to check this per-CPU. 

In case you didn't spot it, cpu_read_midr() is not actually reading a remote
cpu's midr. It's reading the cached midr in a per-cpu data structure. So I don't
think this will be very expensive in reality. And it's only run once during boot...

static inline unsigned int cpu_read_midr(int cpu)
{
	WARN_ON_ONCE(!cpu_online(cpu));

	return per_cpu(cpu_data, cpu).reg_midr;
}


> Can you use
>>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
>>> to solve this?
>>
>> We are trying to solve for the case where the boot CPU has BBML2 but a secondary
>> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
>> little and does not advertise the feature. I can't remember if we proved there
>> are real systems with this config - I have vague recollection that we did but my
>> memory is poor...).
>>
>> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
>> has enabled this feature already, then every late CPU must have it". So that
>> would exclude any secondary CPUs without BBML2 from coming online?
> 
> Damn, yes, you're right. However, it still feels horribly hacky to iterate
> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
> has the ability to query features locally and we should be able to use
> that. We're going to want that should the architecture eventually decide
> on something like BBML3 for this.

For BBML3, we're looking for a minimum value in the BBM field of the FFMR, and
in that case the framework can handle it nicely with
ARM64_CPUCAP_SYSTEM_FEATURE. But the framework doesn't have any support for the
case where we need to look at all the midrs. So Suzuki came up with this method.

If/when we have BBML3 I was thinking we could retrospectively treat the CPUs in
the midr list as having an erratum that they don't advertise BBML3 when they
should (since the semantics are basically the same I expect/hope/have been
trying to ensure), so we can just implement workarounds to make it look like
they do have BBML3, then the framework does it's thing. Perhaps we can live with
this hack until we get to that point?

> 
> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
> support the capability if all CPUs have it (rejecting late CPUs without it
> in that case) but we can live without it if not all of the early CPUs
> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
> userspace and we can't derive the capability solely from the sanitised
> system registers.

Agreed.

> 
> I wonder if we could treat it like an erratum in some way instead? That
> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
> wouldn't shout about). Then we should be able to say:
> 
>   - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>     would be enabled and we wouln't elide BBM.
> 
>   - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>     if the erratum isn't already enabled.

That's exactly the policy that this cludge provides. But it's using the midr to
check if the CPU has BBML2_NOABORT. I'm not sure I follow your point about a
"BBM_CONFLICT_ABORT" erratum?

I'm also at a massive disadvantage because I find the whole cpufeatures
framework impenetrable :)

I'll ping Suzuki to see if he can chime in here.

Thanks,
Ryan

> 
> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
> some surgery for this.
> 
>> How do you see this working with ARM64_CPUCAP_BOOT_CPU_FEATURE? Or do you just
>> think that all systems will always be homogeneous with respect to FEAT_BBM?
> 
> That's probably wishful thinking, sadly :(
> 
> Will



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-09 14:16         ` Ryan Roberts
@ 2025-05-09 14:28           ` Will Deacon
  2025-05-09 14:58             ` Ryan Roberts
  2025-05-09 15:59             ` Catalin Marinas
  0 siblings, 2 replies; 28+ messages in thread
From: Will Deacon @ 2025-05-09 14:28 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Mikołaj Lenczewski, suzuki.poulose, yang, corbet,
	catalin.marinas, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On Fri, May 09, 2025 at 03:16:01PM +0100, Ryan Roberts wrote:
> On 09/05/2025 14:49, Will Deacon wrote:
> > On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
> >> On 06/05/2025 15:25, Will Deacon wrote:
> >>> On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
> >>>> +static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
> >>>> +{
> >>>> +	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
> >>>> +		return false;
> >>>> +
> >>>> +	if (scope & SCOPE_SYSTEM) {
> >>>> +		int cpu;
> >>>> +
> >>>> +		/*
> >>>> +		 * We are a boot CPU, and must verify that all enumerated boot
> >>>> +		 * CPUs have MIDR values within our allowlist. Otherwise, we do
> >>>> +		 * not allow the BBML2 feature to avoid potential faults when
> >>>> +		 * the insufficient CPUs access memory regions using BBML2
> >>>> +		 * semantics.
> >>>> +		 */
> >>>> +		for_each_online_cpu(cpu) {
> >>>> +			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
> >>>> +				return false;
> >>>> +		}
> >>>
> >>> This penalises large homogeneous systems and it feels unnecessary given
> >>> that we have the ability to check this per-CPU. 
> 
> In case you didn't spot it, cpu_read_midr() is not actually reading a remote
> cpu's midr. It's reading the cached midr in a per-cpu data structure. So I don't
> think this will be very expensive in reality. And it's only run once during boot...
> 
> static inline unsigned int cpu_read_midr(int cpu)
> {
> 	WARN_ON_ONCE(!cpu_online(cpu));
> 
> 	return per_cpu(cpu_data, cpu).reg_midr;
> }

I know it's not reading a remote MIDR, that would be crazy :)

But iterating over per-cpu variables sucks and we should be able to avoid
it because this code already knows how to detect features locally.

> 
> > Can you use
> >>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
> >>> to solve this?
> >>
> >> We are trying to solve for the case where the boot CPU has BBML2 but a secondary
> >> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
> >> little and does not advertise the feature. I can't remember if we proved there
> >> are real systems with this config - I have vague recollection that we did but my
> >> memory is poor...).
> >>
> >> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
> >> has enabled this feature already, then every late CPU must have it". So that
> >> would exclude any secondary CPUs without BBML2 from coming online?
> > 
> > Damn, yes, you're right. However, it still feels horribly hacky to iterate
> > over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
> > has the ability to query features locally and we should be able to use
> > that. We're going to want that should the architecture eventually decide
> > on something like BBML3 for this.
> 
> For BBML3, we're looking for a minimum value in the BBM field of the FFMR, and
> in that case the framework can handle it nicely with
> ARM64_CPUCAP_SYSTEM_FEATURE. But the framework doesn't have any support for the
> case where we need to look at all the midrs. So Suzuki came up with this method.
> 
> If/when we have BBML3 I was thinking we could retrospectively treat the CPUs in
> the midr list as having an erratum that they don't advertise BBML3 when they
> should (since the semantics are basically the same I expect/hope/have been
> trying to ensure), so we can just implement workarounds to make it look like
> they do have BBML3, then the framework does it's thing. Perhaps we can live with
> this hack until we get to that point?

I think if you want to go down that route, then this needs to be detected
locally on each CPU.

> > What we have with BBML2_NOABORT seems similar to an hwcap in that we only
> > support the capability if all CPUs have it (rejecting late CPUs without it
> > in that case) but we can live without it if not all of the early CPUs
> > have it. Unlikely hwcaps, though, we shouldn't be advertising this to
> > userspace and we can't derive the capability solely from the sanitised
> > system registers.
> 
> Agreed.
> 
> > 
> > I wonder if we could treat it like an erratum in some way instead? That
> > is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
> > considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
> > wouldn't shout about). Then we should be able to say:
> > 
> >   - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
> >     would be enabled and we wouln't elide BBM.
> > 
> >   - If a late CPU doesn't have BBML2_NOABORT then it can't come online
> >     if the erratum isn't already enabled.
> 
> That's exactly the policy that this cludge provides. But it's using the midr to
> check if the CPU has BBML2_NOABORT. I'm not sure I follow your point about a
> "BBM_CONFLICT_ABORT" erratum?

I was hoping that it would mean that each CPU can independently determine
whether or not they have the erratum and then enable it as soon as they
detect it. That way, there's no need to iterate over all the early cores.

> I'm also at a massive disadvantage because I find the whole cpufeatures
> framework impenetrable :)
> 
> I'll ping Suzuki to see if he can chime in here.

Thanks,

Will


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-09 14:28           ` Will Deacon
@ 2025-05-09 14:58             ` Ryan Roberts
  2025-05-09 15:59             ` Catalin Marinas
  1 sibling, 0 replies; 28+ messages in thread
From: Ryan Roberts @ 2025-05-09 14:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mikołaj Lenczewski, suzuki.poulose, yang, corbet,
	catalin.marinas, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On 09/05/2025 15:28, Will Deacon wrote:
> On Fri, May 09, 2025 at 03:16:01PM +0100, Ryan Roberts wrote:
>> On 09/05/2025 14:49, Will Deacon wrote:
>>> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
>>>> On 06/05/2025 15:25, Will Deacon wrote:
>>>>> On Mon, Apr 28, 2025 at 03:35:14PM +0000, Mikołaj Lenczewski wrote:
>>>>>> +static bool has_bbml2_noabort(const struct arm64_cpu_capabilities *caps, int scope)
>>>>>> +{
>>>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BBML2_NOABORT))
>>>>>> +		return false;
>>>>>> +
>>>>>> +	if (scope & SCOPE_SYSTEM) {
>>>>>> +		int cpu;
>>>>>> +
>>>>>> +		/*
>>>>>> +		 * We are a boot CPU, and must verify that all enumerated boot
>>>>>> +		 * CPUs have MIDR values within our allowlist. Otherwise, we do
>>>>>> +		 * not allow the BBML2 feature to avoid potential faults when
>>>>>> +		 * the insufficient CPUs access memory regions using BBML2
>>>>>> +		 * semantics.
>>>>>> +		 */
>>>>>> +		for_each_online_cpu(cpu) {
>>>>>> +			if (!cpu_has_bbml2_noabort(cpu_read_midr(cpu)))
>>>>>> +				return false;
>>>>>> +		}
>>>>>
>>>>> This penalises large homogeneous systems and it feels unnecessary given
>>>>> that we have the ability to check this per-CPU. 
>>
>> In case you didn't spot it, cpu_read_midr() is not actually reading a remote
>> cpu's midr. It's reading the cached midr in a per-cpu data structure. So I don't
>> think this will be very expensive in reality. And it's only run once during boot...
>>
>> static inline unsigned int cpu_read_midr(int cpu)
>> {
>> 	WARN_ON_ONCE(!cpu_online(cpu));
>>
>> 	return per_cpu(cpu_data, cpu).reg_midr;
>> }
> 
> I know it's not reading a remote MIDR, that would be crazy :)
> 
> But iterating over per-cpu variables sucks and we should be able to avoid
> it because this code already knows how to detect features locally.
> 
>>
>>> Can you use
>>>>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
>>>>> to solve this?
>>>>
>>>> We are trying to solve for the case where the boot CPU has BBML2 but a secondary
>>>> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
>>>> little and does not advertise the feature. I can't remember if we proved there
>>>> are real systems with this config - I have vague recollection that we did but my
>>>> memory is poor...).
>>>>
>>>> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
>>>> has enabled this feature already, then every late CPU must have it". So that
>>>> would exclude any secondary CPUs without BBML2 from coming online?
>>>
>>> Damn, yes, you're right. However, it still feels horribly hacky to iterate
>>> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
>>> has the ability to query features locally and we should be able to use
>>> that. We're going to want that should the architecture eventually decide
>>> on something like BBML3 for this.
>>
>> For BBML3, we're looking for a minimum value in the BBM field of the FFMR, and
>> in that case the framework can handle it nicely with
>> ARM64_CPUCAP_SYSTEM_FEATURE. But the framework doesn't have any support for the
>> case where we need to look at all the midrs. So Suzuki came up with this method.
>>
>> If/when we have BBML3 I was thinking we could retrospectively treat the CPUs in
>> the midr list as having an erratum that they don't advertise BBML3 when they
>> should (since the semantics are basically the same I expect/hope/have been
>> trying to ensure), so we can just implement workarounds to make it look like
>> they do have BBML3, then the framework does it's thing. Perhaps we can live with
>> this hack until we get to that point?
> 
> I think if you want to go down that route, then this needs to be detected
> locally on each CPU.

Yes that's my point; once we have BBML3 it will be detected locally for each CPU
because the framework can handle that for MMFR fields. But until we get there,
we are stuck with a midr list.

> 
>>> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
>>> support the capability if all CPUs have it (rejecting late CPUs without it
>>> in that case) but we can live without it if not all of the early CPUs
>>> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
>>> userspace and we can't derive the capability solely from the sanitised
>>> system registers.
>>
>> Agreed.
>>
>>>
>>> I wonder if we could treat it like an erratum in some way instead? That
>>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
>>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
>>> wouldn't shout about). Then we should be able to say:
>>>
>>>   - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>>>     would be enabled and we wouln't elide BBM.
>>>
>>>   - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>>>     if the erratum isn't already enabled.
>>
>> That's exactly the policy that this cludge provides. But it's using the midr to
>> check if the CPU has BBML2_NOABORT. I'm not sure I follow your point about a
>> "BBM_CONFLICT_ABORT" erratum?
> 
> I was hoping that it would mean that each CPU can independently determine
> whether or not they have the erratum and then enable it as soon as they
> detect it. That way, there's no need to iterate over all the early cores.

OK, I don't understand the framework well enough to fully understand your
suggestion; I'll talk to Suzuki and have a dig through the code.

Thanks for the review!
Ryan

> 
>> I'm also at a massive disadvantage because I find the whole cpufeatures
>> framework impenetrable :)
>>
>> I'll ping Suzuki to see if he can chime in here.
> 
> Thanks,
> 
> Will



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-09 14:28           ` Will Deacon
  2025-05-09 14:58             ` Ryan Roberts
@ 2025-05-09 15:59             ` Catalin Marinas
  1 sibling, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2025-05-09 15:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ryan Roberts, Mikołaj Lenczewski, suzuki.poulose, yang,
	corbet, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On Fri, May 09, 2025 at 03:28:53PM +0100, Will Deacon wrote:
> On Fri, May 09, 2025 at 03:16:01PM +0100, Ryan Roberts wrote:
> > On 09/05/2025 14:49, Will Deacon wrote:
> > > I wonder if we could treat it like an erratum in some way instead? That
> > > is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
> > > considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
> > > wouldn't shout about). Then we should be able to say:
> > > 
> > >   - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
> > >     would be enabled and we wouln't elide BBM.
> > > 
> > >   - If a late CPU doesn't have BBML2_NOABORT then it can't come online
> > >     if the erratum isn't already enabled.
> > 
> > That's exactly the policy that this cludge provides. But it's using the midr to
> > check if the CPU has BBML2_NOABORT. I'm not sure I follow your point about a
> > "BBM_CONFLICT_ABORT" erratum?
> 
> I was hoping that it would mean that each CPU can independently determine
> whether or not they have the erratum and then enable it as soon as they
> detect it. That way, there's no need to iterate over all the early cores.

But then we'll still have to disable the feature if one of the early
CPUs doesn't have it. As a local CPU feature as per the errata handling,
the feature (bug) is advertised if at least one CPU supports it. Here we
kind of need a combination of system (available an all early CPUs) and
local MIDR check.

We might be able to work with two features - one SCOPE_SYSTEM for the
sanitised ID reg and a _negative_ (deny-list) SCOPE_LOCAL_CPU for the
MIDR. Or probably a single CPU-local feature, but negative, that checks
both the ID reg and MIDR, let's call it ARM64_HAS_NO_BBML2_NOABORT. If a
single CPU does not have the ID reg or is on the MIDR deny-list, this
"feature" will be enabled. For late CPUs, if NO_BBML2_NOABORT has been
enabled, they are allowed to miss it (OPTIONAL_FOR_LATE_CPU). However,
if NO_BBML2_NOABORT is cleared, a late CPU is not allowed to have it.

Hmm, if I got the logic right, that's what
ARM64_CPUCAP_LOCAL_CPU_ERRATUM means but we need to negate the feature
to disable the optimisation if at least one CPU is on the deny-list or
missing BBML2.

(or maybe I haven't had enough coffee today)

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-09 13:49       ` Will Deacon
  2025-05-09 14:16         ` Ryan Roberts
@ 2025-05-09 16:04         ` Catalin Marinas
  2025-05-12 13:07           ` Ryan Roberts
  1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2025-05-09 16:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ryan Roberts, Mikołaj Lenczewski, suzuki.poulose, yang,
	corbet, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
> > On 06/05/2025 15:25, Will Deacon wrote:
> > > This penalises large homogeneous systems and it feels unnecessary given
> > > that we have the ability to check this per-CPU. Can you use
> > > ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
> > > to solve this?
> > 
> > We are trying to solve for the case where the boot CPU has BBML2 but a secondary
> > CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
> > little and does not advertise the feature. I can't remember if we proved there
> > are real systems with this config - I have vague recollection that we did but my
> > memory is poor...).
> > 
> > My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
> > has enabled this feature already, then every late CPU must have it". So that
> > would exclude any secondary CPUs without BBML2 from coming online?
> 
> Damn, yes, you're right. However, it still feels horribly hacky to iterate
> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
> has the ability to query features locally and we should be able to use
> that. We're going to want that should the architecture eventually decide
> on something like BBML3 for this.
> 
> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
> support the capability if all CPUs have it (rejecting late CPUs without it
> in that case) but we can live without it if not all of the early CPUs
> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
> userspace and we can't derive the capability solely from the sanitised
> system registers.
> 
> I wonder if we could treat it like an erratum in some way instead? That
> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
> wouldn't shout about). Then we should be able to say:
> 
>   - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>     would be enabled and we wouln't elide BBM.
> 
>   - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>     if the erratum isn't already enabled.
> 
> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
> some surgery for this.

Ah, I should have read this thread in order. I think we can treat this
as BBML2_NOABORT available as default based on ID regs and use the
allow/deny-list as an erratum.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-09 16:04         ` Catalin Marinas
@ 2025-05-12 13:07           ` Ryan Roberts
  2025-05-12 13:24             ` Suzuki K Poulose
  0 siblings, 1 reply; 28+ messages in thread
From: Ryan Roberts @ 2025-05-12 13:07 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Mikołaj Lenczewski, suzuki.poulose, yang, corbet,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

On 09/05/2025 17:04, Catalin Marinas wrote:
> On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
>> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
>>> On 06/05/2025 15:25, Will Deacon wrote:
>>>> This penalises large homogeneous systems and it feels unnecessary given
>>>> that we have the ability to check this per-CPU. Can you use
>>>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
>>>> to solve this?
>>>
>>> We are trying to solve for the case where the boot CPU has BBML2 but a secondary
>>> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
>>> little and does not advertise the feature. I can't remember if we proved there
>>> are real systems with this config - I have vague recollection that we did but my
>>> memory is poor...).
>>>
>>> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
>>> has enabled this feature already, then every late CPU must have it". So that
>>> would exclude any secondary CPUs without BBML2 from coming online?
>>
>> Damn, yes, you're right. However, it still feels horribly hacky to iterate
>> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
>> has the ability to query features locally and we should be able to use
>> that. We're going to want that should the architecture eventually decide
>> on something like BBML3 for this.
>>
>> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
>> support the capability if all CPUs have it (rejecting late CPUs without it
>> in that case) but we can live without it if not all of the early CPUs
>> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
>> userspace and we can't derive the capability solely from the sanitised
>> system registers.
>>
>> I wonder if we could treat it like an erratum in some way instead? That
>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
>> wouldn't shout about). Then we should be able to say:
>>
>>   - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>>     would be enabled and we wouln't elide BBM.
>>
>>   - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>>     if the erratum isn't already enabled.
>>
>> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
>> some surgery for this.
> 
> Ah, I should have read this thread in order. I think we can treat this
> as BBML2_NOABORT available as default based on ID regs and use the
> allow/deny-list as an erratum.
> 

Just to make sure I've understood all this, I think what you are both saying is
we can create a single capability called ARM64_HAS_NO_BBML2_NOABORT of type
ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Each CPU will then check it has BBML2 and is in
the MIDR allow list; If any of those conditions are not met, the CPU is
considered to have ARM64_HAS_NO_BBML2_NOABORT.

Then we have this helper:

static inline bool system_supports_bbml2_noabort(void)
{
	return system_capabilities_finalized() &&
	       !alternative_has_cap_unlikely(ARM64_HAS_NO_BBML2_NOABORT);
}

system_capabilities_finalized() is there to ensure an early call to this helper
returns false (i.e. the safe value before we have evaluated on all CPUs).
Because ARM64_HAS_NO_BBML2_NOABORT is inverted it would otherwise return true
prior to finalization.

I don't believe we need any second (SYSTEM or BOOT) feature. This is sufficient
on its own?

Thanks,
Ryan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-12 13:07           ` Ryan Roberts
@ 2025-05-12 13:24             ` Suzuki K Poulose
  2025-05-12 13:35               ` Ryan Roberts
  0 siblings, 1 reply; 28+ messages in thread
From: Suzuki K Poulose @ 2025-05-12 13:24 UTC (permalink / raw)
  To: Ryan Roberts, Catalin Marinas, Will Deacon
  Cc: Mikołaj Lenczewski, yang, corbet, jean-philippe,
	robin.murphy, joro, akpm, paulmck, mark.rutland, joey.gouly, maz,
	james.morse, broonie, oliver.upton, baohua, david, ioworker0, jgg,
	nicolinc, mshavit, jsnitsel, smostafa, kevin.tian, linux-doc,
	linux-kernel, linux-arm-kernel, iommu

On 12/05/2025 14:07, Ryan Roberts wrote:
> On 09/05/2025 17:04, Catalin Marinas wrote:
>> On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
>>> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
>>>> On 06/05/2025 15:25, Will Deacon wrote:
>>>>> This penalises large homogeneous systems and it feels unnecessary given
>>>>> that we have the ability to check this per-CPU. Can you use
>>>>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
>>>>> to solve this?
>>>>
>>>> We are trying to solve for the case where the boot CPU has BBML2 but a secondary
>>>> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
>>>> little and does not advertise the feature. I can't remember if we proved there
>>>> are real systems with this config - I have vague recollection that we did but my
>>>> memory is poor...).
>>>>
>>>> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
>>>> has enabled this feature already, then every late CPU must have it". So that
>>>> would exclude any secondary CPUs without BBML2 from coming online?
>>>
>>> Damn, yes, you're right. However, it still feels horribly hacky to iterate
>>> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
>>> has the ability to query features locally and we should be able to use
>>> that. We're going to want that should the architecture eventually decide
>>> on something like BBML3 for this.
>>>
>>> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
>>> support the capability if all CPUs have it (rejecting late CPUs without it
>>> in that case) but we can live without it if not all of the early CPUs
>>> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
>>> userspace and we can't derive the capability solely from the sanitised
>>> system registers.
>>>
>>> I wonder if we could treat it like an erratum in some way instead? That
>>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
>>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
>>> wouldn't shout about). Then we should be able to say:
>>>
>>>    - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>>>      would be enabled and we wouln't elide BBM.
>>>
>>>    - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>>>      if the erratum isn't already enabled.
>>>
>>> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
>>> some surgery for this.
>>
>> Ah, I should have read this thread in order. I think we can treat this
>> as BBML2_NOABORT available as default based on ID regs and use the
>> allow/deny-list as an erratum.
>>
> 
> Just to make sure I've understood all this, I think what you are both saying is
> we can create a single capability called ARM64_HAS_NO_BBML2_NOABORT of type
> ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Each CPU will then check it has BBML2 and is in
> the MIDR allow list; If any of those conditions are not met, the CPU is
> considered to have ARM64_HAS_NO_BBML2_NOABORT.

I guess we need two caps.

1. SYSTEM cap -> ARM64_HAS_BBML2. Based on the ID registers
2. An erratum -> ARM64_BBML2_ABORTS. Based on BBLM2==1 && !in_midr_list()


And then:


> 
> Then we have this helper:
> 
> static inline bool system_supports_bbml2_noabort(void)
> {
> 	return system_capabilities_finalized() &&
		alternative_has_cap_unlikely(ARM64_HAS_BBML2) &&
		!alternative_has_cap_unlikely(!ARM64_HAS_BBML2_ABORTS)

Without (1), we may enable BBML2 on a (system with) CPU that doesn't
have BBML2 feature.

And (1) can prevent any non-BBML2 capable CPUs from booting or (2) can 
prevent anything that aborts with BBML2.


Suzuki


> 	       !alternative_has_cap_unlikely(ARM64_HAS_NO_BBML2_NOABORT);


> }
> 
> system_capabilities_finalized() is there to ensure an early call to this helper
> returns false (i.e. the safe value before we have evaluated on all CPUs).
> Because ARM64_HAS_NO_BBML2_NOABORT is inverted it would otherwise return true
> prior to finalization.
> 
> I don't believe we need any second (SYSTEM or BOOT) feature. This is sufficient
> on its own?
> 
> Thanks,
> Ryan
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-12 13:24             ` Suzuki K Poulose
@ 2025-05-12 13:35               ` Ryan Roberts
  2025-05-12 16:33                 ` Catalin Marinas
  2025-05-12 17:17                 ` Suzuki K Poulose
  0 siblings, 2 replies; 28+ messages in thread
From: Ryan Roberts @ 2025-05-12 13:35 UTC (permalink / raw)
  To: Suzuki K Poulose, Catalin Marinas, Will Deacon
  Cc: Mikołaj Lenczewski, yang, corbet, jean-philippe,
	robin.murphy, joro, akpm, paulmck, mark.rutland, joey.gouly, maz,
	james.morse, broonie, oliver.upton, baohua, david, ioworker0, jgg,
	nicolinc, mshavit, jsnitsel, smostafa, kevin.tian, linux-doc,
	linux-kernel, linux-arm-kernel, iommu

On 12/05/2025 14:24, Suzuki K Poulose wrote:
> On 12/05/2025 14:07, Ryan Roberts wrote:
>> On 09/05/2025 17:04, Catalin Marinas wrote:
>>> On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
>>>> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
>>>>> On 06/05/2025 15:25, Will Deacon wrote:
>>>>>> This penalises large homogeneous systems and it feels unnecessary given
>>>>>> that we have the ability to check this per-CPU. Can you use
>>>>>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
>>>>>> to solve this?
>>>>>
>>>>> We are trying to solve for the case where the boot CPU has BBML2 but a
>>>>> secondary
>>>>> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
>>>>> little and does not advertise the feature. I can't remember if we proved there
>>>>> are real systems with this config - I have vague recollection that we did
>>>>> but my
>>>>> memory is poor...).
>>>>>
>>>>> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
>>>>> has enabled this feature already, then every late CPU must have it". So that
>>>>> would exclude any secondary CPUs without BBML2 from coming online?
>>>>
>>>> Damn, yes, you're right. However, it still feels horribly hacky to iterate
>>>> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
>>>> has the ability to query features locally and we should be able to use
>>>> that. We're going to want that should the architecture eventually decide
>>>> on something like BBML3 for this.
>>>>
>>>> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
>>>> support the capability if all CPUs have it (rejecting late CPUs without it
>>>> in that case) but we can live without it if not all of the early CPUs
>>>> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
>>>> userspace and we can't derive the capability solely from the sanitised
>>>> system registers.
>>>>
>>>> I wonder if we could treat it like an erratum in some way instead? That
>>>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
>>>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
>>>> wouldn't shout about). Then we should be able to say:
>>>>
>>>>    - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>>>>      would be enabled and we wouln't elide BBM.
>>>>
>>>>    - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>>>>      if the erratum isn't already enabled.
>>>>
>>>> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
>>>> some surgery for this.
>>>
>>> Ah, I should have read this thread in order. I think we can treat this
>>> as BBML2_NOABORT available as default based on ID regs and use the
>>> allow/deny-list as an erratum.
>>>
>>
>> Just to make sure I've understood all this, I think what you are both saying is
>> we can create a single capability called ARM64_HAS_NO_BBML2_NOABORT of type
>> ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Each CPU will then check it has BBML2 and is in
>> the MIDR allow list; If any of those conditions are not met, the CPU is
>> considered to have ARM64_HAS_NO_BBML2_NOABORT.
> 
> I guess we need two caps.
> 
> 1. SYSTEM cap -> ARM64_HAS_BBML2. Based on the ID registers
> 2. An erratum -> ARM64_BBML2_ABORTS. Based on BBLM2==1 && !in_midr_list()

I don't think we *need* two caps; I was suggesting to consider both of these
conditions for the single cap. You are suggesting to separate them. But I think
both approaches give the same result?

I'm easy either way, but keen to understand why 2 caps are preferred?

Perhaps for my version it would be better to refer to it as
ARM64_CPUCAP_BOOT_RESTRICTED_CPU_LOCAL_FEATURE instead of
ARM64_CPUCAP_LOCAL_CPU_ERRATUM (they both have the exact same semantics under
the hood AFAICT).

Thanks,
Ryan

> 
> 
> And then:
> 
> 
>>
>> Then we have this helper:
>>
>> static inline bool system_supports_bbml2_noabort(void)
>> {
>>     return system_capabilities_finalized() &&
>         alternative_has_cap_unlikely(ARM64_HAS_BBML2) &&
>         !alternative_has_cap_unlikely(!ARM64_HAS_BBML2_ABORTS)
> 
> Without (1), we may enable BBML2 on a (system with) CPU that doesn't
> have BBML2 feature.
> 
> And (1) can prevent any non-BBML2 capable CPUs from booting or (2) can prevent
> anything that aborts with BBML2.
> 
> 
> Suzuki
> 
> 
>>            !alternative_has_cap_unlikely(ARM64_HAS_NO_BBML2_NOABORT);
> 
> 
>> }
>>
>> system_capabilities_finalized() is there to ensure an early call to this helper
>> returns false (i.e. the safe value before we have evaluated on all CPUs).
>> Because ARM64_HAS_NO_BBML2_NOABORT is inverted it would otherwise return true
>> prior to finalization.
>>
>> I don't believe we need any second (SYSTEM or BOOT) feature. This is sufficient
>> on its own?
>>
>> Thanks,
>> Ryan
>>
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-12 13:35               ` Ryan Roberts
@ 2025-05-12 16:33                 ` Catalin Marinas
  2025-05-13  9:15                   ` Suzuki K Poulose
  2025-05-12 17:17                 ` Suzuki K Poulose
  1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2025-05-12 16:33 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Suzuki K Poulose, Will Deacon, Mikołaj Lenczewski, yang,
	corbet, jean-philippe, robin.murphy, joro, akpm, paulmck,
	mark.rutland, joey.gouly, maz, james.morse, broonie, oliver.upton,
	baohua, david, ioworker0, jgg, nicolinc, mshavit, jsnitsel,
	smostafa, kevin.tian, linux-doc, linux-kernel, linux-arm-kernel,
	iommu

On Mon, May 12, 2025 at 02:35:01PM +0100, Ryan Roberts wrote:
> On 12/05/2025 14:24, Suzuki K Poulose wrote:
> > On 12/05/2025 14:07, Ryan Roberts wrote:
> >> On 09/05/2025 17:04, Catalin Marinas wrote:
> >>> On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
> >>>> I wonder if we could treat it like an erratum in some way instead? That
> >>>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
> >>>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
> >>>> wouldn't shout about). Then we should be able to say:
> >>>>
> >>>>    - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
> >>>>      would be enabled and we wouln't elide BBM.
> >>>>
> >>>>    - If a late CPU doesn't have BBML2_NOABORT then it can't come online
> >>>>      if the erratum isn't already enabled.
> >>>>
> >>>> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
> >>>> some surgery for this.
> >>>
> >>> Ah, I should have read this thread in order. I think we can treat this
> >>> as BBML2_NOABORT available as default based on ID regs and use the
> >>> allow/deny-list as an erratum.
> >>
> >> Just to make sure I've understood all this, I think what you are both saying is
> >> we can create a single capability called ARM64_HAS_NO_BBML2_NOABORT of type
> >> ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Each CPU will then check it has BBML2 and is in
> >> the MIDR allow list; If any of those conditions are not met, the CPU is
> >> considered to have ARM64_HAS_NO_BBML2_NOABORT.
> > 
> > I guess we need two caps.
> > 
> > 1. SYSTEM cap -> ARM64_HAS_BBML2. Based on the ID registers
> > 2. An erratum -> ARM64_BBML2_ABORTS. Based on BBLM2==1 && !in_midr_list()
> 
> I don't think we *need* two caps; I was suggesting to consider both of these
> conditions for the single cap. You are suggesting to separate them. But I think
> both approaches give the same result?
> 
> I'm easy either way, but keen to understand why 2 caps are preferred?

I guess it's easier to reason about than a single, negated property but
the result should be identical. With two properties we can easily
implement the idreg override like nobbml2 since this works on the
sanitised ID regs. But we could also implement this differently, no need
to rely on the ID regs.

Stepping back a bit, we know that the MIDR allow-list implies
BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need
something like a SYSTEM_FEATURE which is the conjunction of all the
early CPUs. However, such system-level cap is only checked after all the
early CPUs booted _and_ only on the sanitised ID regs rather than MIDR.

We need a LOCAL_CPU feature behaviour to be called on each CPU but still
have the conjunction of early CPUs, more like the system one. It should
be permitted for late CPUs to have but not optional if already enabled.

So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
the boot CPU if it has it _but_ cleared by any secondary early CPU if it
doesn't (and never enabled by secondary CPUs). When the features are
finalised, we know if all early CPUs had it. In combination with
PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.

I think if we can get the above, it would be the cleaner option than
trying to bend our minds around double negations like !NO_BBLM2_NOABORT.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-12 13:35               ` Ryan Roberts
  2025-05-12 16:33                 ` Catalin Marinas
@ 2025-05-12 17:17                 ` Suzuki K Poulose
  1 sibling, 0 replies; 28+ messages in thread
From: Suzuki K Poulose @ 2025-05-12 17:17 UTC (permalink / raw)
  To: Ryan Roberts, Catalin Marinas, Will Deacon
  Cc: Mikołaj Lenczewski, yang, corbet, jean-philippe,
	robin.murphy, joro, akpm, paulmck, mark.rutland, joey.gouly, maz,
	james.morse, broonie, oliver.upton, baohua, david, ioworker0, jgg,
	nicolinc, mshavit, jsnitsel, smostafa, kevin.tian, linux-doc,
	linux-kernel, linux-arm-kernel, iommu

On 12/05/2025 14:35, Ryan Roberts wrote:
> On 12/05/2025 14:24, Suzuki K Poulose wrote:
>> On 12/05/2025 14:07, Ryan Roberts wrote:
>>> On 09/05/2025 17:04, Catalin Marinas wrote:
>>>> On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
>>>>> On Tue, May 06, 2025 at 03:52:59PM +0100, Ryan Roberts wrote:
>>>>>> On 06/05/2025 15:25, Will Deacon wrote:
>>>>>>> This penalises large homogeneous systems and it feels unnecessary given
>>>>>>> that we have the ability to check this per-CPU. Can you use
>>>>>>> ARM64_CPUCAP_BOOT_CPU_FEATURE instead of ARM64_CPUCAP_SYSTEM_FEATURE
>>>>>>> to solve this?
>>>>>>
>>>>>> We are trying to solve for the case where the boot CPU has BBML2 but a
>>>>>> secondary
>>>>>> CPU doesn't. (e.g. hetrogeneous system where boot CPU is big and secondary is
>>>>>> little and does not advertise the feature. I can't remember if we proved there
>>>>>> are real systems with this config - I have vague recollection that we did
>>>>>> but my
>>>>>> memory is poor...).
>>>>>>
>>>>>> My understanding is that for ARM64_CPUCAP_BOOT_CPU_FEATURE, "If the boot CPU
>>>>>> has enabled this feature already, then every late CPU must have it". So that
>>>>>> would exclude any secondary CPUs without BBML2 from coming online?
>>>>>
>>>>> Damn, yes, you're right. However, it still feels horribly hacky to iterate
>>>>> over the online CPUs in has_bbml2_noabort() -- the cpufeature framework
>>>>> has the ability to query features locally and we should be able to use
>>>>> that. We're going to want that should the architecture eventually decide
>>>>> on something like BBML3 for this.
>>>>>
>>>>> What we have with BBML2_NOABORT seems similar to an hwcap in that we only
>>>>> support the capability if all CPUs have it (rejecting late CPUs without it
>>>>> in that case) but we can live without it if not all of the early CPUs
>>>>> have it. Unlikely hwcaps, though, we shouldn't be advertising this to
>>>>> userspace and we can't derive the capability solely from the sanitised
>>>>> system registers.
>>>>>
>>>>> I wonder if we could treat it like an erratum in some way instead? That
>>>>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
>>>>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
>>>>> wouldn't shout about). Then we should be able to say:
>>>>>
>>>>>     - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>>>>>       would be enabled and we wouln't elide BBM.
>>>>>
>>>>>     - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>>>>>       if the erratum isn't already enabled.
>>>>>
>>>>> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
>>>>> some surgery for this.
>>>>
>>>> Ah, I should have read this thread in order. I think we can treat this
>>>> as BBML2_NOABORT available as default based on ID regs and use the
>>>> allow/deny-list as an erratum.
>>>>
>>>
>>> Just to make sure I've understood all this, I think what you are both saying is
>>> we can create a single capability called ARM64_HAS_NO_BBML2_NOABORT of type
>>> ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Each CPU will then check it has BBML2 and is in
>>> the MIDR allow list; If any of those conditions are not met, the CPU is
>>> considered to have ARM64_HAS_NO_BBML2_NOABORT.
>>
>> I guess we need two caps.
>>
>> 1. SYSTEM cap -> ARM64_HAS_BBML2. Based on the ID registers
>> 2. An erratum -> ARM64_BBML2_ABORTS. Based on BBLM2==1 && !in_midr_list()
> 
> I don't think we *need* two caps; I was suggesting to consider both of these
> conditions for the single cap. You are suggesting to separate them. But I think
> both approaches give the same result?

Ah, my bad. I think a single cap should work as long as it makes sure
we "no BBML2" doesn't est the erratum.

Just to confirm, you are proposing:

ARM64_HAS_NO_BBML2_NOABORT (or in other words, ARM64_NO_SAFE_BBML2)
as CPU_LOCAL_ERRATUM (remove the "description" field from the cap, so
that we don't report it, when we detect it).

matches() =>  !in_midr_list() ||  !ID_AA64MMFR*.BBML2

I think that should work with the inverted check.
I am wondering how we can plug in the AmpereOne missing the ID register
case. May be special case it in the matches() above for the same cap.

I think that should work.

Suzuki


> 
> I'm easy either way, but keen to understand why 2 caps are preferred?
> 
> Perhaps for my version it would be better to refer to it as
> ARM64_CPUCAP_BOOT_RESTRICTED_CPU_LOCAL_FEATURE instead of
> ARM64_CPUCAP_LOCAL_CPU_ERRATUM (they both have the exact same semantics under
> the hood AFAICT).
> 
> Thanks,
> Ryan
> 
>>
>>
>> And then:
>>
>>
>>>
>>> Then we have this helper:
>>>
>>> static inline bool system_supports_bbml2_noabort(void)
>>> {
>>>      return system_capabilities_finalized() &&
>>          alternative_has_cap_unlikely(ARM64_HAS_BBML2) &&
>>          !alternative_has_cap_unlikely(!ARM64_HAS_BBML2_ABORTS)
>>
>> Without (1), we may enable BBML2 on a (system with) CPU that doesn't
>> have BBML2 feature.
>>
>> And (1) can prevent any non-BBML2 capable CPUs from booting or (2) can prevent
>> anything that aborts with BBML2.
>>
>>
>> Suzuki
>>
>>
>>>             !alternative_has_cap_unlikely(ARM64_HAS_NO_BBML2_NOABORT);
>>
>>
>>> }
>>>
>>> system_capabilities_finalized() is there to ensure an early call to this helper
>>> returns false (i.e. the safe value before we have evaluated on all CPUs).
>>> Because ARM64_HAS_NO_BBML2_NOABORT is inverted it would otherwise return true
>>> prior to finalization.
>>>
>>> I don't believe we need any second (SYSTEM or BOOT) feature. This is sufficient
>>> on its own?
>>>
>>> Thanks,
>>> Ryan
>>>
>>
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-12 16:33                 ` Catalin Marinas
@ 2025-05-13  9:15                   ` Suzuki K Poulose
  2025-05-14 12:05                     ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Suzuki K Poulose @ 2025-05-13  9:15 UTC (permalink / raw)
  To: Catalin Marinas, Ryan Roberts
  Cc: Will Deacon, Mikołaj Lenczewski, yang, corbet, jean-philippe,
	robin.murphy, joro, akpm, paulmck, mark.rutland, joey.gouly, maz,
	james.morse, broonie, oliver.upton, baohua, david, ioworker0, jgg,
	nicolinc, mshavit, jsnitsel, smostafa, kevin.tian, linux-doc,
	linux-kernel, linux-arm-kernel, iommu

On 12/05/2025 17:33, Catalin Marinas wrote:
> On Mon, May 12, 2025 at 02:35:01PM +0100, Ryan Roberts wrote:
>> On 12/05/2025 14:24, Suzuki K Poulose wrote:
>>> On 12/05/2025 14:07, Ryan Roberts wrote:
>>>> On 09/05/2025 17:04, Catalin Marinas wrote:
>>>>> On Fri, May 09, 2025 at 02:49:05PM +0100, Will Deacon wrote:
>>>>>> I wonder if we could treat it like an erratum in some way instead? That
>>>>>> is, invert things so that CPUs which _don't_ have BBML2_NOABORT are
>>>>>> considered to have a "BBM_CONFLICT_ABORT" erratum (which we obviously
>>>>>> wouldn't shout about). Then we should be able to say:
>>>>>>
>>>>>>     - If any of the early CPUs don't have BBML2_NOABORT, then the erratum
>>>>>>       would be enabled and we wouln't elide BBM.
>>>>>>
>>>>>>     - If a late CPU doesn't have BBML2_NOABORT then it can't come online
>>>>>>       if the erratum isn't already enabled.
>>>>>>
>>>>>> Does that work? If not, then perhaps the cpufeature/cpuerrata code needs
>>>>>> some surgery for this.
>>>>>
>>>>> Ah, I should have read this thread in order. I think we can treat this
>>>>> as BBML2_NOABORT available as default based on ID regs and use the
>>>>> allow/deny-list as an erratum.
>>>>
>>>> Just to make sure I've understood all this, I think what you are both saying is
>>>> we can create a single capability called ARM64_HAS_NO_BBML2_NOABORT of type
>>>> ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Each CPU will then check it has BBML2 and is in
>>>> the MIDR allow list; If any of those conditions are not met, the CPU is
>>>> considered to have ARM64_HAS_NO_BBML2_NOABORT.
>>>
>>> I guess we need two caps.
>>>
>>> 1. SYSTEM cap -> ARM64_HAS_BBML2. Based on the ID registers
>>> 2. An erratum -> ARM64_BBML2_ABORTS. Based on BBLM2==1 && !in_midr_list()
>>
>> I don't think we *need* two caps; I was suggesting to consider both of these
>> conditions for the single cap. You are suggesting to separate them. But I think
>> both approaches give the same result?
>>
>> I'm easy either way, but keen to understand why 2 caps are preferred?
> 
> I guess it's easier to reason about than a single, negated property but
> the result should be identical. With two properties we can easily
> implement the idreg override like nobbml2 since this works on the
> sanitised ID regs. But we could also implement this differently, no need
> to rely on the ID regs.
> 
> Stepping back a bit, we know that the MIDR allow-list implies
> BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need

Please be aware that BBML2_NOABORT midr list may not always imply BBLM2 
in ID registers (e.g., AmpereOne. But the plan is to fixup the per cpu
ID register - struct cpuinfo_arm64 - for such cores at early boot,
individually, before it is used for sanitisation of the system wide
copy).


> something like a SYSTEM_FEATURE which is the conjunction of all the
> early CPUs. However, such system-level cap is only checked after all the
> early CPUs booted _and_ only on the sanitised ID regs rather than MIDR.
> 
> We need a LOCAL_CPU feature behaviour to be called on each CPU but still
> have the conjunction of early CPUs, more like the system one. It should
> be permitted for late CPUs to have but not optional if already enabled.
> 
> So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
> the boot CPU if it has it _but_ cleared by any secondary early CPU if it
> doesn't (and never enabled by secondary CPUs). When the features are
> finalised, we know if all early CPUs had it. In combination with
> PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.

That could work, but it introduces this "clearing" a capability, which
we don't do at the moment.

We had an offline discussion about this some time ago, with Mark
Rutland. The best way to deal with this is to change the way we compute
capabilities. i.e.,


1. Each boot CPU run through all the capabilities and maintain a per-cpu
    copy of the state.
2. System wide capabilities can then be constructed from the all early
    boot CPU capability state (e.g., ANDing all the state from all CPUs
    for SCOPE_SYSTEM or ORing for LOCAL_CPU).

But this requires a drastic change to the infrastructure.

> 
> I think if we can get the above, it would be the cleaner option than
> trying to bend our minds around double negations like !NO_BBLM2_NOABORT.

Agree, every time I come back to the thread, I have to write down the
check and stare at it for a minute to agree with what it does. That said
it may be ideal solution for the short term. Or stick to what we do in
the patch currently, until we implement per-cpu capability proposal.

Cheers
Suzuki

> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-13  9:15                   ` Suzuki K Poulose
@ 2025-05-14 12:05                     ` Catalin Marinas
  2025-05-19  9:45                       ` Suzuki K Poulose
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2025-05-14 12:05 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Ryan Roberts, Will Deacon, Mikołaj Lenczewski, yang, corbet,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

On Tue, May 13, 2025 at 10:15:49AM +0100, Suzuki K Poulose wrote:
> On 12/05/2025 17:33, Catalin Marinas wrote:
> > Stepping back a bit, we know that the MIDR allow-list implies
> > BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need
> 
> Please be aware that BBML2_NOABORT midr list may not always imply BBLM2 in
> ID registers (e.g., AmpereOne. But the plan is to fixup the per cpu
> ID register - struct cpuinfo_arm64 - for such cores at early boot,
> individually, before it is used for sanitisation of the system wide
> copy).

Ah, good point. We can then ignore BBML2 ID regs and only rely on MIDR
(and some future BBML3).

> > So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
> > the boot CPU if it has it _but_ cleared by any secondary early CPU if it
> > doesn't (and never enabled by secondary CPUs). When the features are
> > finalised, we know if all early CPUs had it. In combination with
> > PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.
> 
> That could work, but it introduces this "clearing" a capability, which
> we don't do at the moment.
> 
> We had an offline discussion about this some time ago, with Mark
> Rutland. The best way to deal with this is to change the way we compute
> capabilities. i.e.,
> 
> 
> 1. Each boot CPU run through all the capabilities and maintain a per-cpu
>    copy of the state.
> 2. System wide capabilities can then be constructed from the all early
>    boot CPU capability state (e.g., ANDing all the state from all CPUs
>    for SCOPE_SYSTEM or ORing for LOCAL_CPU).
> 
> But this requires a drastic change to the infrastructure.

I think it's a lot simpler to achieve the ANDing - set the (system)
capability if detected on the boot CPU, only clear it if missing on
subsequent CPUs. See below on an attempt to introduce this. For lack of
inspiration, I called it ARM64_CPUCAP_GLOBAL_CPU_FEATURE which has both
SCOPE_LOCAL and SCOPE_SYSTEM. It's permitted for late CPUs but not
optional if already enabled. The advantage of having both local&system
is that the match function will be called for both scopes. I added a
mask in to cpucap_default_scope() when calling matches() since so far
no cap had both.

---------------------8<-----------------------------------------
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index c4326f1cb917..0b0b26a6f27b 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -331,6 +331,15 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
 #define ARM64_CPUCAP_BOOT_CPU_FEATURE                  \
 	(ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU)
 
+/*
+ * CPU feature detected at boot time based on all CPUs. It is safe for a late
+ * CPU to have this feature even though the system hasn't enabled it, although
+ * the feature will not be used by Linux in this case. If the system has
+ * enabled this feature already, then every late CPU must have it.
+ */
+#define ARM64_CPUCAP_GLOBAL_CPU_FEATURE			\
+	 (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_SYSTEM_FEATURE)
+
 struct arm64_cpu_capabilities {
 	const char *desc;
 	u16 capability;
@@ -391,6 +400,11 @@ static inline int cpucap_default_scope(const struct arm64_cpu_capabilities *cap)
 	return cap->type & ARM64_CPUCAP_SCOPE_MASK;
 }
 
+static inline bool cpucap_global_scope(const struct arm64_cpu_capabilities *cap)
+{
+	return (cap->type & SCOPE_LOCAL_CPU) && (cap->type & SCOPE_SYSTEM);
+}
+
 /*
  * Generic helper for handling capabilities with multiple (match,enable) pairs
  * of call backs, sharing the same capability bit.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 5ba149c0c2ac..1a5a51090c0e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -3359,13 +3381,47 @@ static void update_cpu_capabilities(u16 scope_mask)
 
 	scope_mask &= ARM64_CPUCAP_SCOPE_MASK;
 	for (i = 0; i < ARM64_NCAPS; i++) {
+		bool global_cap = false;
+
 		caps = cpucap_ptrs[i];
-		if (!caps || !(caps->type & scope_mask) ||
-		    cpus_have_cap(caps->capability) ||
-		    !caps->matches(caps, cpucap_default_scope(caps)))
+		if (!caps || !(caps->type & scope_mask))
 			continue;
 
-		if (caps->desc && !caps->cpus)
+		global_cap = cpucap_global_scope(caps);
+
+		/*
+		 * If it's not a global CPU capability, avoid probing if
+		 * already detected.
+		 */
+		if (!global_cap && cpus_have_cap(caps->capability))
+			continue;
+
+		/*
+		 * Pass the actual scope we are probing to the match function.
+		 * This is important for the global CPU capabilities that are
+		 * checked both as a local CPU feature and as a system one.
+		 */
+		if (!caps->matches(caps,
+				   cpucap_default_scope(caps) & scope_mask)) {
+			/* All CPUs must have the global capability */
+			if (global_cap)
+				__clear_bit(caps->capability, system_cpucaps);
+			continue;
+		}
+
+		/*
+		 * A global capability is only set when probing the boot CPU.
+		 * It may be cleared subsequently if not detected on secondary
+		 * ones.
+		 */
+		if (global_cap && !(scope_mask & SCOPE_BOOT_CPU))
+			continue;
+
+		/*
+		 * Global CPU capabilities are logged later when the system
+		 * capabilities are finalised.
+		 */
+		if (!global_cap && caps->desc && !caps->cpus)
 			pr_info("detected: %s\n", caps->desc);
 
 		__set_bit(caps->capability, system_cpucaps);
@@ -3771,17 +3827,24 @@ static void __init setup_system_capabilities(void)
 	enable_cpu_capabilities(SCOPE_ALL & ~SCOPE_BOOT_CPU);
 	apply_alternatives_all();
 
-	/*
-	 * Log any cpucaps with a cpumask as these aren't logged by
-	 * update_cpu_capabilities().
-	 */
 	for (int i = 0; i < ARM64_NCAPS; i++) {
 		const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];
 
-		if (caps && caps->cpus && caps->desc &&
-			cpumask_any(caps->cpus) < nr_cpu_ids)
+		if (!caps || !caps->desc)
+			continue;
+
+		/*
+		 * Log any cpucaps with a cpumask as these aren't logged by
+		 * update_cpu_capabilities().
+		 */
+		if (caps->cpus && cpumask_any(caps->cpus) < nr_cpu_ids)
 			pr_info("detected: %s on CPU%*pbl\n",
 				caps->desc, cpumask_pr_args(caps->cpus));
+
+		/* Log global CPU capabilities */
+		if (cpucap_global_scope(caps) &&
+		    cpus_have_cap(caps->capability))
+			pr_info("detected: %s\n", caps->desc);
 	}
 
 	/*
---------------------8<-----------------------------------------

And an dummy test:

---------------------8<-----------------------------------------
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 5ba149c0c2ac..1a5a51090c0e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2480,6 +2480,21 @@ test_has_mpam_hcr(const struct arm64_cpu_capabilities *entry, int scope)
 	return idr & MPAMIDR_EL1_HAS_HCR;
 }
 
+static void
+cpu_enable_dummy_global(const struct arm64_cpu_capabilities *entry)
+{
+	pr_info("%s: %s: smp_processor_id() = %d", __func__, entry->desc, smp_processor_id());
+}
+
+static bool
+has_dummy_global(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	pr_info("%s: %s: scope = %x smp_processor_id() = %d", __func__, entry->desc, scope, smp_processor_id());
+	if (smp_processor_id() < 4)
+		return true;
+	return false;
+}
+
 static const struct arm64_cpu_capabilities arm64_features[] = {
 	{
 		.capability = ARM64_ALWAYS_BOOT,
@@ -3050,6 +3065,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_pmuv3,
 	},
 #endif
+	{
+		.desc = "Dummy test for global CPU feature",
+		.capability = ARM64_HAS_GLOBAL_CPU_TEST,
+		.type = ARM64_CPUCAP_GLOBAL_CPU_FEATURE,
+		.cpu_enable = cpu_enable_dummy_global,
+		.matches = has_dummy_global,
+	},
 	{},
 };
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 772c1b008e43..dbc5a3eb5b3d 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -37,6 +37,7 @@ HAS_GENERIC_AUTH_IMP_DEF
 HAS_GIC_CPUIF_SYSREGS
 HAS_GIC_PRIO_MASKING
 HAS_GIC_PRIO_RELAXED_SYNC
+HAS_GLOBAL_CPU_TEST
 HAS_HCR_NV1
 HAS_HCX
 HAS_LDAPR
---------------------8<-----------------------------------------

-- 
Catalin


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-14 12:05                     ` Catalin Marinas
@ 2025-05-19  9:45                       ` Suzuki K Poulose
  2025-05-22 15:23                         ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Suzuki K Poulose @ 2025-05-19  9:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ryan Roberts, Will Deacon, Mikołaj Lenczewski, yang, corbet,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

Hi Catalin,

On 14/05/2025 13:05, Catalin Marinas wrote:
> On Tue, May 13, 2025 at 10:15:49AM +0100, Suzuki K Poulose wrote:
>> On 12/05/2025 17:33, Catalin Marinas wrote:
>>> Stepping back a bit, we know that the MIDR allow-list implies
>>> BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need
>>
>> Please be aware that BBML2_NOABORT midr list may not always imply BBLM2 in
>> ID registers (e.g., AmpereOne. But the plan is to fixup the per cpu
>> ID register - struct cpuinfo_arm64 - for such cores at early boot,
>> individually, before it is used for sanitisation of the system wide
>> copy).
> 
> Ah, good point. We can then ignore BBML2 ID regs and only rely on MIDR
> (and some future BBML3).
> 
>>> So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
>>> the boot CPU if it has it _but_ cleared by any secondary early CPU if it
>>> doesn't (and never enabled by secondary CPUs). When the features are
>>> finalised, we know if all early CPUs had it. In combination with
>>> PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.
>>
>> That could work, but it introduces this "clearing" a capability, which
>> we don't do at the moment.
>>
>> We had an offline discussion about this some time ago, with Mark
>> Rutland. The best way to deal with this is to change the way we compute
>> capabilities. i.e.,
>>
>>
>> 1. Each boot CPU run through all the capabilities and maintain a per-cpu
>>     copy of the state.
>> 2. System wide capabilities can then be constructed from the all early
>>     boot CPU capability state (e.g., ANDing all the state from all CPUs
>>     for SCOPE_SYSTEM or ORing for LOCAL_CPU).
>>
>> But this requires a drastic change to the infrastructure.
> 
> I think it's a lot simpler to achieve the ANDing - set the (system)
> capability if detected on the boot CPU, only clear it if missing on
> subsequent CPUs. See below on an attempt to introduce this. For lack of
> inspiration, I called it ARM64_CPUCAP_GLOBAL_CPU_FEATURE which has both
> SCOPE_LOCAL and SCOPE_SYSTEM. It's permitted for late CPUs but not
> optional if already enabled. The advantage of having both local&system
> is that the match function will be called for both scopes. I added a
> mask in to cpucap_default_scope() when calling matches() since so far
> no cap had both.

Thanks, the change below does the trick. I am reasoning with the way
the scope has been defined (hacked ;-)).

SCOPE_LOCAL_CPU && SCOPE_SYSTEM

1. SCOPE_LOCAL_CPU : Because you need to run this on all the (early) CPUs.

2. SCOPE_SYSTEM: To check if the capability holds at the end of the
smp boot.

While, we really "detect" it on SCOPE_BOOT_CPU and only run the
cap checks, if that is available. But put another way, BOOT_CPU
is only used as an easy way to detect if this CPUs is the first
one to run the check vs at least one CPU has run and cleared the
cap.


i.e, as below.

 > +
 > +		/*
 > +		 * A global capability is only set when probing the boot CPU.
 > +		 * It may be cleared subsequently if not detected on secondary
 > +		 * ones.
 > +		 */
 > +		if (global_cap && !(scope_mask & SCOPE_BOOT_CPU))
 > +			continue;
 > +

I wonder if we could use some other flag to indicate the fact that,
a non-boot CPU is allowed to clear the capability explicitly, rather 
than implying it with SCOPE_SYSTEM && SCOPE_LOCAL_CPU. Or may be make
it explicit that the capability must be matched on ALL cpus and
finalized at the end ?


/*
  * When paired with SCOPE_LOCAL_CPU, all CPUs must satisfy the
  * condition. This is different from SCOPE_SYSTEM, where the check
  * is performed only once at the end of SMP boot. But SCOPE_SYSTEM
  * may not be sufficient in cases where the capability depends on
  * properties that are not "sanitised" (e.g., MIDR_EL1) and must be
  * satisfied by all the early SMP boot CPUs.
  */
#define ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS	((u16)BIT(7))

statici inline bool cpucap_match_all_cpus(struct arm64_capability *cap)
{
	return !!(cap->type & ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS);
}

Also, we already go through the capablity list to report the ones
with "cpumask" separately, and we could use that to also report
the ones with MATCH_ALL_CPUs. Something like:


diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9c4d6d552b25..14cbae51d802 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -3769,10 +3769,15 @@ static void __init setup_system_capabilities(void)
         for (int i = 0; i < ARM64_NCAPS; i++) {
                 const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];

-		if (caps && caps->cpus && caps->desc &&
-			cpumask_any(caps->cpus) < nr_cpu_ids)
+		if (!caps || !caps->desc)
+			continue;
+
+		if (caps->cpus && cpumask_any(caps->cpus) < nr_cpu_ids)
			pr_info("detected: %s on CPU%*pbl\n",
				caps->desc, cpumask_pr_args(caps->cpus));
+
+		/* Report capabilities that had to be matched on all CPUs */
+		if (capcpucap_match_all_cpus(caps) && cpus_have_cap(caps))
+			pr_info("detected: %s\n", caps->desc);
         }

         /*


> 
> ---------------------8<-----------------------------------------
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index c4326f1cb917..0b0b26a6f27b 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -331,6 +331,15 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
>   #define ARM64_CPUCAP_BOOT_CPU_FEATURE                  \
>   	(ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU)
>   
> +/*
> + * CPU feature detected at boot time based on all CPUs. It is safe for a late
> + * CPU to have this feature even though the system hasn't enabled it, although
> + * the feature will not be used by Linux in this case. If the system has
> + * enabled this feature already, then every late CPU must have it.
> + */
> +#define ARM64_CPUCAP_GLOBAL_CPU_FEATURE			

#define ARM64_CPUCAP_MATCH_ALL_CPU_FEATURE ?

\
> +	 (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_SYSTEM_FEATURE)

   (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS)


> +
>   struct arm64_cpu_capabilities {
>   	const char *desc;
>   	u16 capability;
> @@ -391,6 +400,11 @@ static inline int cpucap_default_scope(const struct arm64_cpu_capabilities *cap)
>   	return cap->type & ARM64_CPUCAP_SCOPE_MASK;
>   }
>   
> +static inline bool cpucap_global_scope(const struct arm64_cpu_capabilities *cap)

May be call it cpucap_match_all_cpus() ?

Thoughts ?

Suzuki


> +{
> +	return (cap->type & SCOPE_LOCAL_CPU) && (cap->type & SCOPE_SYSTEM);
> +}
> +
>   /*
>    * Generic helper for handling capabilities with multiple (match,enable) pairs
>    * of call backs, sharing the same capability bit.
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 5ba149c0c2ac..1a5a51090c0e 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -3359,13 +3381,47 @@ static void update_cpu_capabilities(u16 scope_mask)
>   
>   	scope_mask &= ARM64_CPUCAP_SCOPE_MASK;
>   	for (i = 0; i < ARM64_NCAPS; i++) {
> +		bool global_cap = false;
> +
>   		caps = cpucap_ptrs[i];
> -		if (!caps || !(caps->type & scope_mask) ||
> -		    cpus_have_cap(caps->capability) ||
> -		    !caps->matches(caps, cpucap_default_scope(caps)))
> +		if (!caps || !(caps->type & scope_mask))
>   			continue;
>   
> -		if (caps->desc && !caps->cpus)
> +		global_cap = cpucap_global_scope(caps);
> +
> +		/*
> +		 * If it's not a global CPU capability, avoid probing if
> +		 * already detected.
> +		 */
> +		if (!global_cap && cpus_have_cap(caps->capability))
> +			continue;
> +
> +		/*
> +		 * Pass the actual scope we are probing to the match function.
> +		 * This is important for the global CPU capabilities that are
> +		 * checked both as a local CPU feature and as a system one.
> +		 */
> +		if (!caps->matches(caps,
> +				   cpucap_default_scope(caps) & scope_mask)) {
> +			/* All CPUs must have the global capability */
> +			if (global_cap)
> +				__clear_bit(caps->capability, system_cpucaps);
> +			continue;
> +		}
> +
> +		/*
> +		 * A global capability is only set when probing the boot CPU.
> +		 * It may be cleared subsequently if not detected on secondary
> +		 * ones.
> +		 */
> +		if (global_cap && !(scope_mask & SCOPE_BOOT_CPU))
> +			continue;
> +
> +		/*
> +		 * Global CPU capabilities are logged later when the system
> +		 * capabilities are finalised.
> +		 */
> +		if (!global_cap && caps->desc && !caps->cpus)
>   			pr_info("detected: %s\n", caps->desc);
>   
>   		__set_bit(caps->capability, system_cpucaps);
> @@ -3771,17 +3827,24 @@ static void __init setup_system_capabilities(void)
>   	enable_cpu_capabilities(SCOPE_ALL & ~SCOPE_BOOT_CPU);
>   	apply_alternatives_all();
>   
> -	/*
> -	 * Log any cpucaps with a cpumask as these aren't logged by
> -	 * update_cpu_capabilities().
> -	 */
>   	for (int i = 0; i < ARM64_NCAPS; i++) {
>   		const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];
>   
> -		if (caps && caps->cpus && caps->desc &&
> -			cpumask_any(caps->cpus) < nr_cpu_ids)
> +		if (!caps || !caps->desc)
> +			continue;
> +
> +		/*
> +		 * Log any cpucaps with a cpumask as these aren't logged by
> +		 * update_cpu_capabilities().
> +		 */
> +		if (caps->cpus && cpumask_any(caps->cpus) < nr_cpu_ids)
>   			pr_info("detected: %s on CPU%*pbl\n",
>   				caps->desc, cpumask_pr_args(caps->cpus));
> +
> +		/* Log global CPU capabilities */
> +		if (cpucap_global_scope(caps) &&
> +		    cpus_have_cap(caps->capability))
> +			pr_info("detected: %s\n", caps->desc);
>   	}
>   
>   	/*
> ---------------------8<-----------------------------------------
> 
> And an dummy test:
> 
> ---------------------8<-----------------------------------------
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 5ba149c0c2ac..1a5a51090c0e 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2480,6 +2480,21 @@ test_has_mpam_hcr(const struct arm64_cpu_capabilities *entry, int scope)
>   	return idr & MPAMIDR_EL1_HAS_HCR;
>   }
>   
> +static void
> +cpu_enable_dummy_global(const struct arm64_cpu_capabilities *entry)
> +{
> +	pr_info("%s: %s: smp_processor_id() = %d", __func__, entry->desc, smp_processor_id());
> +}
> +
> +static bool
> +has_dummy_global(const struct arm64_cpu_capabilities *entry, int scope)
> +{
> +	pr_info("%s: %s: scope = %x smp_processor_id() = %d", __func__, entry->desc, scope, smp_processor_id());
> +	if (smp_processor_id() < 4)
> +		return true;
> +	return false;
> +}
> +
>   static const struct arm64_cpu_capabilities arm64_features[] = {
>   	{
>   		.capability = ARM64_ALWAYS_BOOT,
> @@ -3050,6 +3065,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>   		.matches = has_pmuv3,
>   	},
>   #endif
> +	{
> +		.desc = "Dummy test for global CPU feature",
> +		.capability = ARM64_HAS_GLOBAL_CPU_TEST,
> +		.type = ARM64_CPUCAP_GLOBAL_CPU_FEATURE,
> +		.cpu_enable = cpu_enable_dummy_global,
> +		.matches = has_dummy_global,
> +	},
>   	{},
>   };
>   
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 772c1b008e43..dbc5a3eb5b3d 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -37,6 +37,7 @@ HAS_GENERIC_AUTH_IMP_DEF
>   HAS_GIC_CPUIF_SYSREGS
>   HAS_GIC_PRIO_MASKING
>   HAS_GIC_PRIO_RELAXED_SYNC
> +HAS_GLOBAL_CPU_TEST
>   HAS_HCR_NV1
>   HAS_HCX
>   HAS_LDAPR
> ---------------------8<-----------------------------------------
> 



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-19  9:45                       ` Suzuki K Poulose
@ 2025-05-22 15:23                         ` Catalin Marinas
  2025-05-22 16:29                           ` Suzuki K Poulose
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2025-05-22 15:23 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Ryan Roberts, Will Deacon, Mikołaj Lenczewski, yang, corbet,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

Hi Suzuki,

Thanks for looking at this.

On Mon, May 19, 2025 at 10:45:31AM +0100, Suzuki K Poulose wrote:
> On 14/05/2025 13:05, Catalin Marinas wrote:
> > On Tue, May 13, 2025 at 10:15:49AM +0100, Suzuki K Poulose wrote:
> > > On 12/05/2025 17:33, Catalin Marinas wrote:
> > > > Stepping back a bit, we know that the MIDR allow-list implies
> > > > BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need
> > > 
> > > Please be aware that BBML2_NOABORT midr list may not always imply BBLM2 in
> > > ID registers (e.g., AmpereOne. But the plan is to fixup the per cpu
> > > ID register - struct cpuinfo_arm64 - for such cores at early boot,
> > > individually, before it is used for sanitisation of the system wide
> > > copy).
> > 
> > Ah, good point. We can then ignore BBML2 ID regs and only rely on MIDR
> > (and some future BBML3).
> > 
> > > > So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
> > > > the boot CPU if it has it _but_ cleared by any secondary early CPU if it
> > > > doesn't (and never enabled by secondary CPUs). When the features are
> > > > finalised, we know if all early CPUs had it. In combination with
> > > > PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.
> > > 
> > > That could work, but it introduces this "clearing" a capability, which
> > > we don't do at the moment.
> > > 
> > > We had an offline discussion about this some time ago, with Mark
> > > Rutland. The best way to deal with this is to change the way we compute
> > > capabilities. i.e.,
> > > 
> > > 
> > > 1. Each boot CPU run through all the capabilities and maintain a per-cpu
> > >     copy of the state.
> > > 2. System wide capabilities can then be constructed from the all early
> > >     boot CPU capability state (e.g., ANDing all the state from all CPUs
> > >     for SCOPE_SYSTEM or ORing for LOCAL_CPU).
> > > 
> > > But this requires a drastic change to the infrastructure.
> > 
> > I think it's a lot simpler to achieve the ANDing - set the (system)
> > capability if detected on the boot CPU, only clear it if missing on
> > subsequent CPUs. See below on an attempt to introduce this. For lack of
> > inspiration, I called it ARM64_CPUCAP_GLOBAL_CPU_FEATURE which has both
> > SCOPE_LOCAL and SCOPE_SYSTEM. It's permitted for late CPUs but not
> > optional if already enabled. The advantage of having both local&system
> > is that the match function will be called for both scopes. I added a
> > mask in to cpucap_default_scope() when calling matches() since so far
> > no cap had both.
> 
> Thanks, the change below does the trick. I am reasoning with the way
> the scope has been defined (hacked ;-)).
> 
> SCOPE_LOCAL_CPU && SCOPE_SYSTEM
> 
> 1. SCOPE_LOCAL_CPU : Because you need to run this on all the (early) CPUs.
> 
> 2. SCOPE_SYSTEM: To check if the capability holds at the end of the
> smp boot.
> 
> While, we really "detect" it on SCOPE_BOOT_CPU and only run the
> cap checks, if that is available. But put another way, BOOT_CPU
> is only used as an easy way to detect if this CPUs is the first
> one to run the check vs at least one CPU has run and cleared the
> cap.

Yes, we start with boot CPU and keep 'and-ing' new values onto it.

> I wonder if we could use some other flag to indicate the fact that,
> a non-boot CPU is allowed to clear the capability explicitly, rather than
> implying it with SCOPE_SYSTEM && SCOPE_LOCAL_CPU. Or may be make
> it explicit that the capability must be matched on ALL cpus and
> finalized at the end ?

I had such variant locally but then decided to reuse the SCOPE_SYSTEM
for this, more of a way to indicate that we want something system-wide
but checked per-CPU. We could add a new flag, though I was wondering
whether we can have a property that's checked both per-CPU and once more
system-wide. That's what actually happens with the above, then the probe
function can tell whether it was called in the CPU or system scope.

Alternatively, we can leave the local/system combining for later and
only add a flag to tell how they compose - "any" (default) vs "all".

> /*
>  * When paired with SCOPE_LOCAL_CPU, all CPUs must satisfy the
>  * condition. This is different from SCOPE_SYSTEM, where the check
>  * is performed only once at the end of SMP boot. But SCOPE_SYSTEM
>  * may not be sufficient in cases where the capability depends on
>  * properties that are not "sanitised" (e.g., MIDR_EL1) and must be
>  * satisfied by all the early SMP boot CPUs.
>  */
> #define ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS	((u16)BIT(7))
> 
> statici inline bool cpucap_match_all_cpus(struct arm64_capability *cap)
> {
> 	return !!(cap->type & ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS);
> }

Yes, something like this would work.

> Also, we already go through the capablity list to report the ones
> with "cpumask" separately, and we could use that to also report
> the ones with MATCH_ALL_CPUs. Something like:
> 
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9c4d6d552b25..14cbae51d802 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -3769,10 +3769,15 @@ static void __init setup_system_capabilities(void)
>         for (int i = 0; i < ARM64_NCAPS; i++) {
>                 const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];
> 
> -		if (caps && caps->cpus && caps->desc &&
> -			cpumask_any(caps->cpus) < nr_cpu_ids)
> +		if (!caps || !caps->desc)
> +			continue;
> +
> +		if (caps->cpus && cpumask_any(caps->cpus) < nr_cpu_ids)
> 			pr_info("detected: %s on CPU%*pbl\n",
> 				caps->desc, cpumask_pr_args(caps->cpus));
> +
> +		/* Report capabilities that had to be matched on all CPUs */
> +		if (capcpucap_match_all_cpus(caps) && cpus_have_cap(caps))
> +			pr_info("detected: %s\n", caps->desc);
>         }

Yeah, I hacked something similar with the 'global' proposal based on
SCOPE_SYSTEM.

> > ---------------------8<-----------------------------------------
> > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> > index c4326f1cb917..0b0b26a6f27b 100644
> > --- a/arch/arm64/include/asm/cpufeature.h
> > +++ b/arch/arm64/include/asm/cpufeature.h
> > @@ -331,6 +331,15 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
> >   #define ARM64_CPUCAP_BOOT_CPU_FEATURE                  \
> >   	(ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU)
> > +/*
> > + * CPU feature detected at boot time based on all CPUs. It is safe for a late
> > + * CPU to have this feature even though the system hasn't enabled it, although
> > + * the feature will not be used by Linux in this case. If the system has
> > + * enabled this feature already, then every late CPU must have it.
> > + */
> > +#define ARM64_CPUCAP_GLOBAL_CPU_FEATURE			
> 
> #define ARM64_CPUCAP_MATCH_ALL_CPU_FEATURE ?
> 
> \
> > +	 (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_SYSTEM_FEATURE)
> 
>   (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS)
> 
> 
> > +
> >   struct arm64_cpu_capabilities {
> >   	const char *desc;
> >   	u16 capability;
> > @@ -391,6 +400,11 @@ static inline int cpucap_default_scope(const struct arm64_cpu_capabilities *cap)
> >   	return cap->type & ARM64_CPUCAP_SCOPE_MASK;
> >   }
> > +static inline bool cpucap_global_scope(const struct arm64_cpu_capabilities *cap)
> 
> May be call it cpucap_match_all_cpus() ?

I can respin, the alternative looks good to me.

Now, we discussed offline of a different approach: for AmpereOne we'll
have to check MIDR early (as an erratum) and pretend it has BBML2,
populate the sanitised cpuid regs accordingly. We could do something
similar for the other CPUs, pretend it's something like BBML3 and get
the architects to commit to it (but this would delay the patchset).

TBH, I'd rather not hack this and only rely on the MIDR for BBM_NOABORT
(without any level) and the above MATCH_ALL_CPUS. My proposal is to
respin this with a MATCH_ALL_CPUS flag that only checks the MIDR. We can
later add a SCOPE_SYSTEM to the same capability that would 'or' in the
BBML3 cap (or just use two capabilities, though we end up with two many
branches or NOPs in the patched alternatives).

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature
  2025-05-22 15:23                         ` Catalin Marinas
@ 2025-05-22 16:29                           ` Suzuki K Poulose
  0 siblings, 0 replies; 28+ messages in thread
From: Suzuki K Poulose @ 2025-05-22 16:29 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ryan Roberts, Will Deacon, Mikołaj Lenczewski, yang, corbet,
	jean-philippe, robin.murphy, joro, akpm, paulmck, mark.rutland,
	joey.gouly, maz, james.morse, broonie, oliver.upton, baohua,
	david, ioworker0, jgg, nicolinc, mshavit, jsnitsel, smostafa,
	kevin.tian, linux-doc, linux-kernel, linux-arm-kernel, iommu

On 22/05/2025 16:23, Catalin Marinas wrote:
> Hi Suzuki,
> 
> Thanks for looking at this.
> 
> On Mon, May 19, 2025 at 10:45:31AM +0100, Suzuki K Poulose wrote:
>> On 14/05/2025 13:05, Catalin Marinas wrote:
>>> On Tue, May 13, 2025 at 10:15:49AM +0100, Suzuki K Poulose wrote:
>>>> On 12/05/2025 17:33, Catalin Marinas wrote:
>>>>> Stepping back a bit, we know that the MIDR allow-list implies
>>>>> BBML2_NOABORT (and at least BBML2 as in the ID regs). In theory, we need
>>>>
>>>> Please be aware that BBML2_NOABORT midr list may not always imply BBLM2 in
>>>> ID registers (e.g., AmpereOne. But the plan is to fixup the per cpu
>>>> ID register - struct cpuinfo_arm64 - for such cores at early boot,
>>>> individually, before it is used for sanitisation of the system wide
>>>> copy).
>>>
>>> Ah, good point. We can then ignore BBML2 ID regs and only rely on MIDR
>>> (and some future BBML3).
>>>
>>>>> So how about we introduce a WEAK_BOOT_CPU_FEATURE which gets enabled by
>>>>> the boot CPU if it has it _but_ cleared by any secondary early CPU if it
>>>>> doesn't (and never enabled by secondary CPUs). When the features are
>>>>> finalised, we know if all early CPUs had it. In combination with
>>>>> PERMITTED_FOR_LATE_CPU, we'd reject late CPUs that don't have it.
>>>>
>>>> That could work, but it introduces this "clearing" a capability, which
>>>> we don't do at the moment.
>>>>
>>>> We had an offline discussion about this some time ago, with Mark
>>>> Rutland. The best way to deal with this is to change the way we compute
>>>> capabilities. i.e.,
>>>>
>>>>
>>>> 1. Each boot CPU run through all the capabilities and maintain a per-cpu
>>>>      copy of the state.
>>>> 2. System wide capabilities can then be constructed from the all early
>>>>      boot CPU capability state (e.g., ANDing all the state from all CPUs
>>>>      for SCOPE_SYSTEM or ORing for LOCAL_CPU).
>>>>
>>>> But this requires a drastic change to the infrastructure.
>>>
>>> I think it's a lot simpler to achieve the ANDing - set the (system)
>>> capability if detected on the boot CPU, only clear it if missing on
>>> subsequent CPUs. See below on an attempt to introduce this. For lack of
>>> inspiration, I called it ARM64_CPUCAP_GLOBAL_CPU_FEATURE which has both
>>> SCOPE_LOCAL and SCOPE_SYSTEM. It's permitted for late CPUs but not
>>> optional if already enabled. The advantage of having both local&system
>>> is that the match function will be called for both scopes. I added a
>>> mask in to cpucap_default_scope() when calling matches() since so far
>>> no cap had both.
>>
>> Thanks, the change below does the trick. I am reasoning with the way
>> the scope has been defined (hacked ;-)).
>>
>> SCOPE_LOCAL_CPU && SCOPE_SYSTEM
>>
>> 1. SCOPE_LOCAL_CPU : Because you need to run this on all the (early) CPUs.
>>
>> 2. SCOPE_SYSTEM: To check if the capability holds at the end of the
>> smp boot.
>>
>> While, we really "detect" it on SCOPE_BOOT_CPU and only run the
>> cap checks, if that is available. But put another way, BOOT_CPU
>> is only used as an easy way to detect if this CPUs is the first
>> one to run the check vs at least one CPU has run and cleared the
>> cap.
> 
> Yes, we start with boot CPU and keep 'and-ing' new values onto it.
> 
>> I wonder if we could use some other flag to indicate the fact that,
>> a non-boot CPU is allowed to clear the capability explicitly, rather than
>> implying it with SCOPE_SYSTEM && SCOPE_LOCAL_CPU. Or may be make
>> it explicit that the capability must be matched on ALL cpus and
>> finalized at the end ?
> 
> I had such variant locally but then decided to reuse the SCOPE_SYSTEM
> for this, more of a way to indicate that we want something system-wide
> but checked per-CPU. We could add a new flag, though I was wondering
> whether we can have a property that's checked both per-CPU and once more
> system-wide. That's what actually happens with the above, then the probe
> function can tell whether it was called in the CPU or system scope.
> 
> Alternatively, we can leave the local/system combining for later and
> only add a flag to tell how they compose - "any" (default) vs "all".
> 
>> /*
>>   * When paired with SCOPE_LOCAL_CPU, all CPUs must satisfy the
>>   * condition. This is different from SCOPE_SYSTEM, where the check
>>   * is performed only once at the end of SMP boot. But SCOPE_SYSTEM
>>   * may not be sufficient in cases where the capability depends on
>>   * properties that are not "sanitised" (e.g., MIDR_EL1) and must be
>>   * satisfied by all the early SMP boot CPUs.
>>   */
>> #define ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS	((u16)BIT(7))
>>
>> statici inline bool cpucap_match_all_cpus(struct arm64_capability *cap)
>> {
>> 	return !!(cap->type & ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS);
>> }
> 
> Yes, something like this would work.
> 
>> Also, we already go through the capablity list to report the ones
>> with "cpumask" separately, and we could use that to also report
>> the ones with MATCH_ALL_CPUs. Something like:
>>
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 9c4d6d552b25..14cbae51d802 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -3769,10 +3769,15 @@ static void __init setup_system_capabilities(void)
>>          for (int i = 0; i < ARM64_NCAPS; i++) {
>>                  const struct arm64_cpu_capabilities *caps = cpucap_ptrs[i];
>>
>> -		if (caps && caps->cpus && caps->desc &&
>> -			cpumask_any(caps->cpus) < nr_cpu_ids)
>> +		if (!caps || !caps->desc)
>> +			continue;
>> +
>> +		if (caps->cpus && cpumask_any(caps->cpus) < nr_cpu_ids)
>> 			pr_info("detected: %s on CPU%*pbl\n",
>> 				caps->desc, cpumask_pr_args(caps->cpus));
>> +
>> +		/* Report capabilities that had to be matched on all CPUs */
>> +		if (capcpucap_match_all_cpus(caps) && cpus_have_cap(caps))
>> +			pr_info("detected: %s\n", caps->desc);
>>          }
> 
> Yeah, I hacked something similar with the 'global' proposal based on
> SCOPE_SYSTEM.
> 
>>> ---------------------8<-----------------------------------------
>>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>>> index c4326f1cb917..0b0b26a6f27b 100644
>>> --- a/arch/arm64/include/asm/cpufeature.h
>>> +++ b/arch/arm64/include/asm/cpufeature.h
>>> @@ -331,6 +331,15 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
>>>    #define ARM64_CPUCAP_BOOT_CPU_FEATURE                  \
>>>    	(ARM64_CPUCAP_SCOPE_BOOT_CPU | ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU)
>>> +/*
>>> + * CPU feature detected at boot time based on all CPUs. It is safe for a late
>>> + * CPU to have this feature even though the system hasn't enabled it, although
>>> + * the feature will not be used by Linux in this case. If the system has
>>> + * enabled this feature already, then every late CPU must have it.
>>> + */
>>> +#define ARM64_CPUCAP_GLOBAL_CPU_FEATURE			
>>
>> #define ARM64_CPUCAP_MATCH_ALL_CPU_FEATURE ?
>>
>> \
>>> +	 (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_SYSTEM_FEATURE)
>>
>>    (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS)
>>
>>
>>> +
>>>    struct arm64_cpu_capabilities {
>>>    	const char *desc;
>>>    	u16 capability;
>>> @@ -391,6 +400,11 @@ static inline int cpucap_default_scope(const struct arm64_cpu_capabilities *cap)
>>>    	return cap->type & ARM64_CPUCAP_SCOPE_MASK;
>>>    }
>>> +static inline bool cpucap_global_scope(const struct arm64_cpu_capabilities *cap)
>>
>> May be call it cpucap_match_all_cpus() ?
> 
> I can respin, the alternative looks good to me.
> 
> Now, we discussed offline of a different approach: for AmpereOne we'll
> have to check MIDR early (as an erratum) and pretend it has BBML2,
> populate the sanitised cpuid regs accordingly. We could do something
> similar for the other CPUs, pretend it's something like BBML3 and get
> the architects to commit to it (but this would delay the patchset).
> 
> TBH, I'd rather not hack this and only rely on the MIDR for BBM_NOABORT
> (without any level) and the above MATCH_ALL_CPUS. My proposal is to
> respin this with a MATCH_ALL_CPUS flag that only checks the MIDR. We can
> later add a SCOPE_SYSTEM to the same capability that would 'or' in the
> BBML3 cap (or just use two capabilities, though we end up with two many
> branches or NOPs in the patched alternatives).

Sounds good to me. Thanks for taking care of this.

Cheers
Suzuki

> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-05-22 16:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-28 15:35 [RESEND PATCH v6 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2025-04-28 15:35 ` [RESEND PATCH v6 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2025-04-28 17:55   ` ALOK TIWARI
2025-05-06  8:36     ` Mikołaj Lenczewski
2025-05-06 14:25   ` Will Deacon
2025-05-06 14:51     ` Marc Zyngier
2025-05-06 14:57       ` Will Deacon
2025-05-06 14:52     ` Ryan Roberts
2025-05-09 13:49       ` Will Deacon
2025-05-09 14:16         ` Ryan Roberts
2025-05-09 14:28           ` Will Deacon
2025-05-09 14:58             ` Ryan Roberts
2025-05-09 15:59             ` Catalin Marinas
2025-05-09 16:04         ` Catalin Marinas
2025-05-12 13:07           ` Ryan Roberts
2025-05-12 13:24             ` Suzuki K Poulose
2025-05-12 13:35               ` Ryan Roberts
2025-05-12 16:33                 ` Catalin Marinas
2025-05-13  9:15                   ` Suzuki K Poulose
2025-05-14 12:05                     ` Catalin Marinas
2025-05-19  9:45                       ` Suzuki K Poulose
2025-05-22 15:23                         ` Catalin Marinas
2025-05-22 16:29                           ` Suzuki K Poulose
2025-05-12 17:17                 ` Suzuki K Poulose
2025-04-28 15:35 ` [RESEND PATCH v6 2/3] iommu/arm: Add BBM Level 2 smmu feature Mikołaj Lenczewski
2025-05-06 14:19   ` Will Deacon
2025-04-28 15:35 ` [RESEND PATCH v6 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2025-04-28 16:17   ` Ryan Roberts

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).