* [PATCH v1 0/3] Initial BBML2 support for contpte_convert()
@ 2025-02-19 14:38 Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-19 14:38 UTC (permalink / raw)
To: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
Cc: Mikołaj Lenczewski
Hi All,
This patch series adds initial support for eliding
break-before-make requirements on systems that support BBML2 and
additionally guarantee to never raise a conflict abort.
This support reorders and optionally elides a TLB invalidation in
contpte_convert(). The elision of said invalidation leads to a 12%
improvement when executing a microbenchmark designed to force the
pathological path where contpte_convert() gets called. This
represents an 80% reduction in the cost of calling contpte_convert().
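As a rough consistency check on the two figures above (an inference, not a measurement taken from this series): assume the 12% improvement refers to total benchmark runtime, and write f for the fraction of the original runtime spent in contpte_convert(). An 80% reduction in that cost then gives:

```latex
\[
  T_{\text{new}} = T_{\text{old}}\,(1 - 0.8 f), \qquad
  0.8 f = 0.12 \;\Rightarrow\; f = 0.15
\]
```

Under that assumption, roughly 15% of the microbenchmark's original runtime was spent in contpte_convert(). If the 12% instead refers to throughput, the implied fraction is slightly lower.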
However, even without the elision, the reordering represents a
performance improvement due to reducing thread contention, as there is
a smaller time window for racing threads to see an invalid pagetable
entry (especially if they already have a cached entry in their TLB
that they are working off of).
This series is based on v6.14-rc3 (0ad2507d5d93).
Patch 1 implements an allow-list of cpus that support BBML2, but with
the additional constraint of never causing TLB conflict aborts. We
settled on this constraint because we will use the feature for kernel
mappings in the future, for which we cannot handle conflict aborts
safely.
Yang Shi has a series at [1] that aims to use BBML2 to enable splitting
the linear map at runtime. This series partially overlaps with it to add
the cpu feature. We believe this series is fully compatible with Yang's
requirements and could go first, given there is still a lot of discussion
around the best way to manage the mechanics of splitting/collapsing the
linear map.
[1]:
https://lore.kernel.org/linux-arm-kernel/20250103011822.1257189-1-yang@os.amperecomputing.com/
Mikołaj Lenczewski (3):
arm64: Add BBM Level 2 cpu feature
arm64/mm: Delay tlbi in contpte_convert() under BBML2
arm64/mm: Elide tlbi in contpte_convert() under BBML2
arch/arm64/Kconfig | 9 ++++++++
arch/arm64/include/asm/cpufeature.h | 5 +++++
arch/arm64/kernel/cpufeature.c | 32 +++++++++++++++++++++++++++++
arch/arm64/mm/contpte.c | 3 ++-
arch/arm64/tools/cpucaps | 1 +
5 files changed, 49 insertions(+), 1 deletion(-)
--
2.45.3
* [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 14:38 [PATCH v1 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
@ 2025-02-19 14:38 ` Mikołaj Lenczewski
2025-02-19 15:39 ` Robin Murphy
` (2 more replies)
2025-02-19 14:38 ` [PATCH v1 2/3] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 3/3] arm64/mm: Elide " Mikołaj Lenczewski
2 siblings, 3 replies; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-19 14:38 UTC (permalink / raw)
To: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
Cc: Mikołaj Lenczewski
The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
and this commit adds a dedicated BBML2 cpufeature that callers can
test for.
This is a system feature as we might have a big.LITTLE architecture
where some cores support BBML2 and some don't, but we want all cores to
be available and BBM to default to level 0 (as opposed to having cores
without BBML2 not coming online).
To support BBML2 in as wide a range of contexts as we can, we want not
only the architectural guarantees that BBML2 makes, but additionally
want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
us having to prove that no recursive faults can be induced in any path
that uses BBML2, allowing its use for arbitrary kernel mappings.
Support detection of such CPUs.
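The intent described above can be modelled in user space. The following is a minimal sketch, not the kernel's cpufeature machinery: it decodes a MIDR_EL1 value, checks it against a revision range allow-list, and reports the feature as system-wide only if every CPU matches. The struct, helper names and the MAKE_MIDR() macro are hypothetical, and the part numbers are illustrative values assumed to correspond to the two cores named in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields: implementer [31:24], variant [23:20],
 * part number [15:4], revision [3:0]. */
#define MIDR_IMPLEMENTER(m) (((m) >> 24) & 0xffu)
#define MIDR_VARIANT(m)     (((m) >> 20) & 0xfu)
#define MIDR_PARTNUM(m)     (((m) >> 4) & 0xfffu)
#define MIDR_REVISION(m)    ((m) & 0xfu)

/* Hypothetical helper for building example values (implementer 0x41 = Arm). */
#define MAKE_MIDR(part, var, rev) \
	((0x41u << 24) | ((uint32_t)(var) << 20) | (0xfu << 16) | \
	 ((uint32_t)(part) << 4) | (uint32_t)(rev))

struct rev_range {
	uint32_t implementer;
	uint32_t part;
	uint32_t variant;          /* single variant, as with MIDR_REV_RANGE() */
	uint32_t rev_min, rev_max; /* inclusive revision bounds */
};

/* Illustrative entries only; part numbers are assumptions, not authoritative. */
static const struct rev_range allow_list[] = {
	{ 0x41, 0xd82, 0, 3, 0xf },
	{ 0x41, 0xd84, 0, 2, 0xf },
};

/* True if this one CPU's MIDR falls inside any allow-list range. */
static bool cpu_in_allow_list(uint32_t midr)
{
	size_t i;

	for (i = 0; i < sizeof(allow_list) / sizeof(allow_list[0]); i++) {
		const struct rev_range *r = &allow_list[i];

		if (MIDR_IMPLEMENTER(midr) == r->implementer &&
		    MIDR_PARTNUM(midr) == r->part &&
		    MIDR_VARIANT(midr) == r->variant &&
		    MIDR_REVISION(midr) >= r->rev_min &&
		    MIDR_REVISION(midr) <= r->rev_max)
			return true;
	}
	return false;
}

/* System-wide view: the feature is usable only if every CPU qualifies. */
static bool system_has_feature(const uint32_t *midrs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!cpu_in_allow_list(midrs[i]))
			return false;
	return n > 0;
}
```

With a mixed set of cores, a single CPU outside the allow-list makes system_has_feature() return false, which corresponds to falling back to BBM level 0 behaviour while still letting all cores come online.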
Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
arch/arm64/Kconfig | 9 ++++++++
arch/arm64/include/asm/cpufeature.h | 5 +++++
arch/arm64/kernel/cpufeature.c | 32 +++++++++++++++++++++++++++++
arch/arm64/tools/cpucaps | 1 +
4 files changed, 47 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..84be2c5976f0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2057,6 +2057,15 @@ config ARM64_TLB_RANGE
The feature introduces new assembly instructions, and they were
support when binutils >= 2.30.
+config ARM64_ENABLE_BBML2
+ bool "Enable support for Break-Before-Make Level 2 detection and usage"
+ default y
+ help
+ FEAT_BBM provides detection of support levels for break-before-make
+ sequences. If BBM level 2 is supported, some TLB maintenance requirements
+ can be relaxed to improve performance. Selecting N causes the kernel to
+ fallback to BBM level 0 behaviour even if the system supports BBM level 2.
+
endmenu # "ARMv8.4 architectural features"
menu "ARMv8.5 architectural features"
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e0e4478f5fb5..2da872035f2e 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -866,6 +866,11 @@ static __always_inline bool system_supports_mpam_hcr(void)
return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
}
+static inline bool system_supports_bbml2_noconflict(void)
+{
+ return alternative_has_cap_unlikely(ARM64_HAS_BBML2_NOCONFLICT);
+}
+
int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d561cf3b8ac7..8c337bd95ef7 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2176,6 +2176,31 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
}
+static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
+ int scope)
+{
+ if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
+ return false;
+
+ /* We want to allow usage of bbml2 in as wide a range of kernel contexts
+ * as possible. This list is therefore an allow-list of known-good
+ * implementations that both support bbml2 and additionally, fulfil the
+ * extra constraint of never generating TLB conflict aborts when using
+ * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
+ * kernel contexts difficult to prove safe against recursive aborts).
+ */
+ static const struct midr_range supports_bbml2_without_abort_list[] = {
+ MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
+ MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
+ {}
+ };
+
+ if (!is_midr_in_range_list(read_cpuid_id(), supports_bbml2_without_abort_list))
+ return false;
+
+ return true;
+}
+
#ifdef CONFIG_ARM64_PAN
static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
{
@@ -2926,6 +2951,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature,
ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
},
+ {
+ .desc = "BBM Level 2 without conflict abort",
+ .capability = ARM64_HAS_BBML2_NOCONFLICT,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_bbml2_noconflict,
+ ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
+ },
{
.desc = "52-bit Virtual Addressing for KVM (LPA2)",
.capability = ARM64_HAS_LPA2,
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1e65f2fb45bd..8d67bb4448c5 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -26,6 +26,7 @@ HAS_ECV
HAS_ECV_CNTPOFF
HAS_EPAN
HAS_EVT
+HAS_BBML2_NOCONFLICT
HAS_FPMR
HAS_FGT
HAS_FPSIMD
--
2.45.3
* [PATCH v1 2/3] arm64/mm: Delay tlbi in contpte_convert() under BBML2
2025-02-19 14:38 [PATCH v1 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
@ 2025-02-19 14:38 ` Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 3/3] arm64/mm: Elide " Mikołaj Lenczewski
2 siblings, 0 replies; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-19 14:38 UTC (permalink / raw)
To: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
Cc: Mikołaj Lenczewski
When converting a region via contpte_convert() to use mTHP, we have two
different goals. We have to mark each entry as contiguous, and we would
like to smear the dirty and young (access) bits across all entries in
the contiguous block. Currently, we do this by first accumulating the
dirty and young bits in the block, using an atomic
__ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
performing a tlbi, and finally smearing the correct bits across the
block using __set_ptes().
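The accumulate-then-smear sequence described above can be illustrated with a toy model. This is a sketch under simplifying assumptions: struct toy_pte and toy_contpte_fold() are hypothetical names, real arm64 PTEs are 64-bit descriptors rather than structs of booleans, and CONT_PTES is assumed to be 16 (the 4K page case).

```c
#include <stdbool.h>
#include <stddef.h>

#define CONT_PTES 16 /* entries per contiguous block; assumes 4K pages */

/* Toy page table entry, for illustration only. */
struct toy_pte {
	bool valid;
	bool dirty;
	bool young;
	bool cont;
};

static void toy_contpte_fold(struct toy_pte ptes[CONT_PTES])
{
	bool dirty = false, young = false;
	size_t i;

	/* Pass 1: clear each entry and accumulate its dirty/young state. */
	for (i = 0; i < CONT_PTES; i++) {
		dirty |= ptes[i].dirty;
		young |= ptes[i].young;
		ptes[i] = (struct toy_pte){ 0 };
	}

	/* In the current kernel code the TLB invalidation sits here. */

	/* Pass 2: write every entry back with the accumulated bits and
	 * the contiguous hint set. */
	for (i = 0; i < CONT_PTES; i++)
		ptes[i] = (struct toy_pte){
			.valid = true,
			.dirty = dirty,
			.young = young,
			.cont = true,
		};
}
```

If only one entry in the block was dirty and another was young before the call, every entry is valid, dirty, young and contiguous afterwards.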
This approach works fine for BBM level 0, but with support for BBM level
2 we are allowed to reorder the tlbi to after setting the pagetable
entries. This reordering means that other threads will not see an
invalid pagetable entry, instead operating on stale data, until we have
performed our smearing and issued the invalidation. Avoiding this
invalid entry reduces faults in other threads, and thus improves
performance marginally (more so when there are more threads).
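The effect of the reordering on the window in which entries are invalid can be sketched as a trace of abstract operations. This is a user-space model only; the enum, struct and function names are hypothetical, and the real code clears and sets CONT_PTES entries rather than performing one step each.

```c
#include <stdbool.h>
#include <stddef.h>

enum op { OP_CLEAR, OP_TLBI, OP_SET };

struct op_trace {
	enum op ops[3];
	size_t len;
};

/* Order of operations with and without the relaxed BBML2 semantics. */
static struct op_trace model_convert_order(bool bbml2_noconflict)
{
	struct op_trace t = { .len = 0 };

	t.ops[t.len++] = OP_CLEAR;        /* entries become invalid */
	if (!bbml2_noconflict)
		t.ops[t.len++] = OP_TLBI; /* BBM level 0: flush before set */
	t.ops[t.len++] = OP_SET;          /* entries become valid again */
	if (bbml2_noconflict)
		t.ops[t.len++] = OP_TLBI; /* BBM level 2: flush after set */
	return t;
}

/* Number of operations executed while the entries are invalid. */
static size_t ops_in_invalid_window(const struct op_trace *t)
{
	size_t clear_idx = 0, set_idx = 0, i;

	for (i = 0; i < t->len; i++) {
		if (t->ops[i] == OP_CLEAR)
			clear_idx = i;
		else if (t->ops[i] == OP_SET)
			set_idx = i;
	}
	return set_idx - clear_idx - 1;
}
```

In this model the BBM level 0 ordering has one operation (the tlbi) inside the invalid window, while the BBML2 ordering has none, which is the shorter window for racing threads that the commit message refers to.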
Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/mm/contpte.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..e26e8f8cfb9b 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -68,9 +68,13 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
pte = pte_mkyoung(pte);
}
- __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
+ if (!system_supports_bbml2_noconflict())
+ __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
+
+ if (system_supports_bbml2_noconflict())
+ __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
}
void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
--
2.45.3
* [PATCH v1 3/3] arm64/mm: Elide tlbi in contpte_convert() under BBML2
2025-02-19 14:38 [PATCH v1 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 2/3] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
@ 2025-02-19 14:38 ` Mikołaj Lenczewski
2 siblings, 0 replies; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-19 14:38 UTC (permalink / raw)
To: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
Cc: Mikołaj Lenczewski
If we support bbml2 without conflict aborts, we can avoid the final
flush and have hardware manage the tlb entries for us. Avoiding flushes
is a win.
Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/mm/contpte.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index e26e8f8cfb9b..26a86248f897 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -72,9 +72,6 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
-
- if (system_supports_bbml2_noconflict())
- __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
}
void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
--
2.45.3
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
@ 2025-02-19 15:39 ` Robin Murphy
2025-02-19 15:43 ` Ryan Roberts
2025-02-19 23:34 ` Oliver Upton
2025-02-20 1:25 ` Yang Shi
2 siblings, 1 reply; 14+ messages in thread
From: Robin Murphy @ 2025-02-19 15:39 UTC (permalink / raw)
To: Mikołaj Lenczewski, ryan.roberts, yang, catalin.marinas,
will, joey.gouly, broonie, mark.rutland, james.morse, yangyicong,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
Hi Miko,
On 2025-02-19 2:38 pm, Mikołaj Lenczewski wrote:
> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> and this commit adds a dedicated BBML2 cpufeature to test against
> support for.
>
> This is a system feature as we might have a big.LITTLE architecture
> where some cores support BBML2 and some don't, but we want all cores to
> be available and BBM to default to level 0 (as opposed to having cores
> without BBML2 not coming online).
>
> To support BBML2 in as wide a range of contexts as we can, we want not
> only the architectural guarantees that BBML2 makes, but additionally
> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
> us having to prove that no recursive faults can be induced in any path
> that uses BBML2, allowing its use for arbitrary kernel mappings.
> Support detection of such CPUs.
If this may be used for splitting/compacting userspace mappings, then
similarly to 6e192214c6c8 ("iommu/arm-smmu-v3: Document SVA interaction
with new pagetable features"), strictly we'll also want a check in
arm_smmu_sva_supported() to make sure that the SMMU is OK with BBML2
behaviour too, and disallow SVA if not. Note that the corresponding
SMMUv3.2-BBML2 feature is already strict about TLB conflict aborts, so
is comparatively nice and straightforward.
Thanks,
Robin.
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 15:39 ` Robin Murphy
@ 2025-02-19 15:43 ` Ryan Roberts
2025-02-19 16:25 ` Robin Murphy
0 siblings, 1 reply; 14+ messages in thread
From: Ryan Roberts @ 2025-02-19 15:43 UTC (permalink / raw)
To: Robin Murphy, Mikołaj Lenczewski, yang, catalin.marinas,
will, joey.gouly, broonie, mark.rutland, james.morse, yangyicong,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
On 19/02/2025 15:39, Robin Murphy wrote:
> Hi Miko,
>
> On 2025-02-19 2:38 pm, Mikołaj Lenczewski wrote:
>> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
>> and this commit adds a dedicated BBML2 cpufeature to test against
>> support for.
>>
>> This is a system feature as we might have a big.LITTLE architecture
>> where some cores support BBML2 and some don't, but we want all cores to
>> be available and BBM to default to level 0 (as opposed to having cores
>> without BBML2 not coming online).
>>
>> To support BBML2 in as wide a range of contexts as we can, we want not
>> only the architectural guarantees that BBML2 makes, but additionally
>> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
>> us having to prove that no recursive faults can be induced in any path
>> that uses BBML2, allowing its use for arbitrary kernel mappings.
>> Support detection of such CPUs.
>
> If this may be used for splitting/compacting userspace mappings, then similarly
> to 6e192214c6c8 ("iommu/arm-smmu-v3: Document SVA interaction with new pagetable
> features"), strictly we'll also want a check in arm_smmu_sva_supported() to make
> sure that the SMMU is OK with BBML2 behaviour too, and disallow SVA if not. Note
> that the corresponding SMMUv3.2-BBML2 feature is already strict about TLB
> conflict aborts, so is comparatively nice and straightforward.
Thanks for catching this, Robin, as I completely forgot to pass this onto Miko
yesterday after our conversation. I suggest we tack a commit on to the end of
this series to cover that?
I think that strictly this is not needed for Yang's series since that only uses
BBML2 for kernel mappings, and those pgtables would never be directly shared
with the SMMU.
>
> Thanks,
> Robin.
>
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 15:43 ` Ryan Roberts
@ 2025-02-19 16:25 ` Robin Murphy
2025-02-20 9:33 ` Mikołaj Lenczewski
0 siblings, 1 reply; 14+ messages in thread
From: Robin Murphy @ 2025-02-19 16:25 UTC (permalink / raw)
To: Ryan Roberts, Mikołaj Lenczewski, yang, catalin.marinas,
will, joey.gouly, broonie, mark.rutland, james.morse, yangyicong,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
On 2025-02-19 3:43 pm, Ryan Roberts wrote:
> On 19/02/2025 15:39, Robin Murphy wrote:
>> Hi Miko,
>>
>> On 2025-02-19 2:38 pm, Mikołaj Lenczewski wrote:
>>> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
>>> and this commit adds a dedicated BBML2 cpufeature to test against
>>> support for.
>>>
>>> This is a system feature as we might have a big.LITTLE architecture
>>> where some cores support BBML2 and some don't, but we want all cores to
>>> be available and BBM to default to level 0 (as opposed to having cores
>>> without BBML2 not coming online).
>>>
>>> To support BBML2 in as wide a range of contexts as we can, we want not
>>> only the architectural guarantees that BBML2 makes, but additionally
>>> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
>>> us having to prove that no recursive faults can be induced in any path
>>> that uses BBML2, allowing its use for arbitrary kernel mappings.
>>> Support detection of such CPUs.
>>
>> If this may be used for splitting/compacting userspace mappings, then similarly
>> to 6e192214c6c8 ("iommu/arm-smmu-v3: Document SVA interaction with new pagetable
>> features"), strictly we'll also want a check in arm_smmu_sva_supported() to make
>> sure that the SMMU is OK with BBML2 behaviour too, and disallow SVA if not. Note
>> that the corresponding SMMUv3.2-BBML2 feature is already strict about TLB
>> conflict aborts, so is comparatively nice and straightforward.
>
> Thanks for catching this, Robin, as I completely forgot to pass this onto Miko
> yesterday after our conversation. I suggest we tack a commit on to the end of
> this series to cover that?
>
> I think that strictly this is not needed for Yang's series since that only uses
> BBML2 for kernel mappings, and those pgtables would never be directly shared
> with the SMMU.
Yup, it's really more just a theoretical correctness concern - certainly
Arm's implementations from MMU-700 onwards do support BBML2, while
MMU-600 is now sufficiently old that nobody is likely to pair it with
new BBML-capable CPUs anyway - so it's just to cover the gap that in
principle there may be 3rd-party implementations which might get confused.
Cheers,
Robin.
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2025-02-19 15:39 ` Robin Murphy
@ 2025-02-19 23:34 ` Oliver Upton
2025-02-19 23:57 ` Oliver Upton
2025-02-20 1:25 ` Yang Shi
2 siblings, 1 reply; 14+ messages in thread
From: Oliver Upton @ 2025-02-19 23:34 UTC (permalink / raw)
To: Mikołaj Lenczewski
Cc: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, linux-arm-kernel, linux-kernel
Hi Miko,
On Wed, Feb 19, 2025 at 02:38:38PM +0000, Mikołaj Lenczewski wrote:
> +config ARM64_ENABLE_BBML2
nit: consider calling this ARM64_BBML2_NOABORT or similar, since this
assumes behavior that exceeds the BBML2 baseline.
> + bool "Enable support for Break-Before-Make Level 2 detection and usage"
> + default y
> + help
> + FEAT_BBM provides detection of support levels for break-before-make
> + sequences. If BBM level 2 is supported, some TLB maintenance requirements
> + can be relaxed to improve performance. Selecting N causes the kernel to
> + fallback to BBM level 0 behaviour even if the system supports BBM level 2.
> +
[...]
> +static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
> + int scope)
> +{
> + if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
> + return false;
> +
> + /* We want to allow usage of bbml2 in as wide a range of kernel contexts
> + * as possible. This list is therefore an allow-list of known-good
> + * implementations that both support bbml2 and additionally, fulfil the
typo: fullfill
> + * extra constraint of never generating TLB conflict aborts when using
> + * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> + * kernel contexts difficult to prove safe against recursive aborts).
> + */
We should be *very* specific about what qualifies as a 'known-good'
implementation here. Implementations shouldn't be added to this list
based on the observed behavior, only if *the implementer* states their
design will not generate conflict aborts for BBML2 mapping granularity
changes.
> + static const struct midr_range supports_bbml2_without_abort_list[] = {
> + MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> + MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> + {}
> + };
> +
> + if (!is_midr_in_range_list(read_cpuid_id(), supports_bbml2_without_abort_list))
> + return false;
> +
> + return true;
> +}
> +
> #ifdef CONFIG_ARM64_PAN
> static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> {
> @@ -2926,6 +2951,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .matches = has_cpuid_feature,
> ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
> },
> + {
> + .desc = "BBM Level 2 without conflict abort",
> + .capability = ARM64_HAS_BBML2_NOCONFLICT,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_bbml2_noconflict,
> + ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
> + },
> {
> .desc = "52-bit Virtual Addressing for KVM (LPA2)",
> .capability = ARM64_HAS_LPA2,
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 1e65f2fb45bd..8d67bb4448c5 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -26,6 +26,7 @@ HAS_ECV
> HAS_ECV_CNTPOFF
> HAS_EPAN
> HAS_EVT
> +HAS_BBML2_NOCONFLICT
Please add this cap to cpucap_is_possible() test for the config option.
Thanks,
Oliver
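One reading of the cpucap_is_possible() request, modelled in user space: a capability whose config option is disabled is reported as impossible, so any check built on it reduces to a constant false regardless of what was detected at boot. All toy_ names and the TOY_CONFIG_ macro below are hypothetical stand-ins, not kernel symbols.

```c
#include <stdbool.h>

/* Stand-in for the Kconfig choice; set to 0 to model selecting N. */
#define TOY_CONFIG_ARM64_ENABLE_BBML2 1

enum toy_cpucap {
	TOY_HAS_EVT,
	TOY_HAS_BBML2_NOCONFLICT,
};

/* A capability is "possible" only if its config option is built in. */
static inline bool toy_cpucap_is_possible(enum toy_cpucap cap)
{
	switch (cap) {
	case TOY_HAS_BBML2_NOCONFLICT:
		return TOY_CONFIG_ARM64_ENABLE_BBML2;
	default:
		return true; /* no config dependency modelled */
	}
}

/* The runtime result only matters when the capability is possible. */
static inline bool toy_system_supports_bbml2_noconflict(bool detected_at_boot)
{
	return toy_cpucap_is_possible(TOY_HAS_BBML2_NOCONFLICT) &&
	       detected_at_boot;
}
```

With the option folded into the possibility check, the match function would no longer need its own config test, since the capability could never be reported as present when configured out.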
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 23:34 ` Oliver Upton
@ 2025-02-19 23:57 ` Oliver Upton
2025-02-20 9:37 ` Mikołaj Lenczewski
0 siblings, 1 reply; 14+ messages in thread
From: Oliver Upton @ 2025-02-19 23:57 UTC (permalink / raw)
To: Mikołaj Lenczewski
Cc: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, linux-arm-kernel, linux-kernel
On Wed, Feb 19, 2025 at 03:34:12PM -0800, Oliver Upton wrote:
> Hi Miko,
>
> On Wed, Feb 19, 2025 at 02:38:38PM +0000, Mikołaj Lenczewski wrote:
> > +config ARM64_ENABLE_BBML2
>
> nit: consider calling this ARM64_BBML2_NOABORT or similar, since this
> assumes behavior that exceeds the BBML2 baseline.
>
> > + bool "Enable support for Break-Before-Make Level 2 detection and usage"
> > + default y
> > + help
> > + FEAT_BBM provides detection of support levels for break-before-make
> > + sequences. If BBM level 2 is supported, some TLB maintenance requirements
> > + can be relaxed to improve performance. Selecting N causes the kernel to
> > + fallback to BBM level 0 behaviour even if the system supports BBM level 2.
> > +
>
> [...]
>
> > +static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
> > + int scope)
> > +{
> > + if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
> > + return false;
> > +
> > + /* We want to allow usage of bbml2 in as wide a range of kernel contexts
> > + * as possible. This list is therefore an allow-list of known-good
> > + * implementations that both support bbml2 and additionally, fulfil the
>
> typo: fullfill
I can't spell either ;-)
> > + * extra constraint of never generating TLB conflict aborts when using
> > + * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> > + * kernel contexts difficult to prove safe against recursive aborts).
> > + */
>
> We should be *very* specific of what qualifies a 'known-good'
> implementation here. Implementations shouldn't be added to this list
> based on the observed behavior, only if *the implementer* states their
> design will not generate conflict aborts for BBML2 mapping granularity
> changes.
>
> > + static const struct midr_range supports_bbml2_without_abort_list[] = {
> > + MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> > + MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> > + {}
> > + };
> > +
> > + if (!is_midr_in_range_list(read_cpuid_id(), supports_bbml2_without_abort_list))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > #ifdef CONFIG_ARM64_PAN
> > static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> > {
> > @@ -2926,6 +2951,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> > .matches = has_cpuid_feature,
> > ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
> > },
> > + {
> > + .desc = "BBM Level 2 without conflict abort",
> > + .capability = ARM64_HAS_BBML2_NOCONFLICT,
> > + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> > + .matches = has_bbml2_noconflict,
> > + ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
> > + },
> > {
> > .desc = "52-bit Virtual Addressing for KVM (LPA2)",
> > .capability = ARM64_HAS_LPA2,
> > diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> > index 1e65f2fb45bd..8d67bb4448c5 100644
> > --- a/arch/arm64/tools/cpucaps
> > +++ b/arch/arm64/tools/cpucaps
> > @@ -26,6 +26,7 @@ HAS_ECV
> > HAS_ECV_CNTPOFF
> > HAS_EPAN
> > HAS_EVT
> > +HAS_BBML2_NOCONFLICT
>
> Please add this cap to cpucap_is_possible() test for the config option.
>
> Thanks,
> Oliver
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2025-02-19 15:39 ` Robin Murphy
2025-02-19 23:34 ` Oliver Upton
@ 2025-02-20 1:25 ` Yang Shi
2025-02-20 10:34 ` Mikołaj Lenczewski
2 siblings, 1 reply; 14+ messages in thread
From: Yang Shi @ 2025-02-20 1:25 UTC (permalink / raw)
To: Mikołaj Lenczewski, ryan.roberts, catalin.marinas, will,
joey.gouly, broonie, mark.rutland, james.morse, yangyicong,
robin.murphy, anshuman.khandual, maz, liaochang1, akpm, david,
baohua, ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
On 2/19/25 6:38 AM, Mikołaj Lenczewski wrote:
> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> and this commit adds a dedicated BBML2 cpufeature to test against
> support for.
>
> This is a system feature as we might have a big.LITTLE architecture
> where some cores support BBML2 and some don't, but we want all cores to
> be available and BBM to default to level 0 (as opposed to having cores
> without BBML2 not coming online).
>
> To support BBML2 in as wide a range of contexts as we can, we want not
> only the architectural guarantees that BBML2 makes, but additionally
> want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
> us having to prove that no recursive faults can be induced in any path
> that uses BBML2, allowing its use for arbitrary kernel mappings.
> Support detection of such CPUs.
>
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> ---
> arch/arm64/Kconfig | 9 ++++++++
> arch/arm64/include/asm/cpufeature.h | 5 +++++
> arch/arm64/kernel/cpufeature.c | 32 +++++++++++++++++++++++++++++
> arch/arm64/tools/cpucaps | 1 +
> 4 files changed, 47 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 940343beb3d4..84be2c5976f0 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2057,6 +2057,15 @@ config ARM64_TLB_RANGE
> The feature introduces new assembly instructions, and they were
> support when binutils >= 2.30.
>
> +config ARM64_ENABLE_BBML2
> + bool "Enable support for Break-Before-Make Level 2 detection and usage"
> + default y
> + help
> + FEAT_BBM provides detection of support levels for break-before-make
> + sequences. If BBM level 2 is supported, some TLB maintenance requirements
> + can be relaxed to improve performance. Selecting N causes the kernel to
> + fallback to BBM level 0 behaviour even if the system supports BBM level 2.
> +
> endmenu # "ARMv8.4 architectural features"
>
> menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index e0e4478f5fb5..2da872035f2e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -866,6 +866,11 @@ static __always_inline bool system_supports_mpam_hcr(void)
> return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
> }
>
> +static inline bool system_supports_bbml2_noconflict(void)
> +{
> + return alternative_has_cap_unlikely(ARM64_HAS_BBML2_NOCONFLICT);
> +}
> +
> int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
> bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index d561cf3b8ac7..8c337bd95ef7 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2176,6 +2176,31 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
> return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
> }
>
> +static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
> + int scope)
> +{
> + if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
> + return false;
> +
> + /* We want to allow usage of bbml2 in as wide a range of kernel contexts
> + * as possible. This list is therefore an allow-list of known-good
> + * implementations that both support bbml2 and additionally, fulfil the
> + * extra constraint of never generating TLB conflict aborts when using
> + * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> + * kernel contexts difficult to prove safe against recursive aborts).
> + */
> + static const struct midr_range supports_bbml2_without_abort_list[] = {
> + MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> + MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> + {}
> + };
> +
> + if (!is_midr_in_range_list(read_cpuid_id(), supports_bbml2_without_abort_list))
> + return false;
> +
> + return true;
> +}
> +
> #ifdef CONFIG_ARM64_PAN
> static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> {
> @@ -2926,6 +2951,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .matches = has_cpuid_feature,
> ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
> },
> + {
> + .desc = "BBM Level 2 without conflict abort",
> + .capability = ARM64_HAS_BBML2_NOCONFLICT,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_bbml2_noconflict,
> + ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
Hi Miko,
Thanks for cc'ing me on this series. Ryan and I discussed how to
advertise BBML2 properly in my thread
(https://lore.kernel.org/linux-arm-kernel/4c44cf6e-98de-47bb-b430-2b1331114904@os.amperecomputing.com/).
IIUC, this may not work as expected.
The boot CPU initializes boot_cpu_data, then the secondary CPUs update
it, which produces the "sanitized" register values. For example, the
TLB range capability is determined by ISAR0_EL1. If all the CPUs have
this feature, the "sanitized" register value will show it as present;
otherwise it will show it as absent.
BBML2 can be determined from MMFR2_EL1. If we could rely on that alone,
a system feature would work. But the problem is that some
implementations may advertise BBML2 in MMFR2_EL1 yet not be able to
handle TLB conflicts. We can't rely on it solely, so we check the MIDR
in the .matches callback instead of MMFR2_EL1. But a system feature's
.matches callback is called just once, on the boot CPU, because it is
supposed to read the sanitized register value. So, if I read the code
correctly, you actually only check the MIDR of the boot CPU in the
.matches callback.
I'm not very familiar with the cpufeature details, so please feel free
to correct me if I'm wrong.
Yang
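
To make the distinction above concrete, here is a small stand-alone model (not kernel code). `struct cpu_model`, `sanitized_bbm()` and `boot_cpu_on_allow_list()` are hypothetical names, and the field and MIDR values used with them are made up. It assumes a "lower value is the safe one" policy for the ID-register field, so the sanitized value is the minimum across CPUs, whereas a MIDR check that runs only on the boot CPU never examines the others.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU view: a BBM-like ID field and a MIDR-like ID. */
struct cpu_model {
	unsigned int bbm_field;	/* 0, 1 or 2 */
	uint32_t midr;
};

/* Sanitized field: lowest value reported by any CPU (cpus[0] is boot). */
static unsigned int sanitized_bbm(const struct cpu_model *cpus, size_t n)
{
	unsigned int val = cpus[0].bbm_field;

	for (size_t i = 1; i < n; i++)
		if (cpus[i].bbm_field < val)
			val = cpus[i].bbm_field;
	return val;
}

/* MIDR check as seen from the boot CPU only: other CPUs are not examined. */
static bool boot_cpu_on_allow_list(const struct cpu_model *cpus,
				   const uint32_t *allow, size_t n_allow)
{
	for (size_t i = 0; i < n_allow; i++)
		if (cpus[0].midr == allow[i])
			return true;
	return false;
}
```

If the boot CPU is on the list and a secondary CPU is not, while both report level 2 in the ID field, the sanitized field is still 2 and the boot-CPU check still passes, even though the secondary was never vetted.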
> + },
> {
> .desc = "52-bit Virtual Addressing for KVM (LPA2)",
> .capability = ARM64_HAS_LPA2,
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 1e65f2fb45bd..8d67bb4448c5 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -26,6 +26,7 @@ HAS_ECV
> HAS_ECV_CNTPOFF
> HAS_EPAN
> HAS_EVT
> +HAS_BBML2_NOCONFLICT
> HAS_FPMR
> HAS_FGT
> HAS_FPSIMD
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 16:25 ` Robin Murphy
@ 2025-02-20 9:33 ` Mikołaj Lenczewski
0 siblings, 0 replies; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-20 9:33 UTC (permalink / raw)
To: Robin Murphy
Cc: Ryan Roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, anshuman.khandual, maz,
liaochang1, akpm, david, baohua, ioworker0, oliver.upton,
linux-arm-kernel, linux-kernel
On Wed, Feb 19, 2025 at 04:25:56PM +0000, Robin Murphy wrote:
> > On 19/02/2025 15:39, Robin Murphy wrote:
> >> ...
> >>
> >> If this may be used for splitting/compacting userspace mappings, then similarly
> >> to 6e192214c6c8 ("iommu/arm-smmu-v3: Document SVA interaction with new pagetable
> >> features"), strictly we'll also want a check in arm_smmu_sva_supported() to make
> >> sure that the SMMU is OK with BBML2 behaviour too, and disallow SVA if not. Note
> >> that the corresponding SMMUv3.2-BBML2 feature is already strict about TLB
> >> conflict aborts, so is comparatively nice and straightforward.
>
> Yup, it's really more just a theoretical correctness concern - certainly
> Arm's implementations from MMU-700 onwards do support BBML2, while
> MMU-600 is now sufficiently old that nobody is likely to pair it with
> new BBML-capable CPUs anyway - so it's just to cover the gap that in
> principle there may be 3rd-party implementations which might get confused.
>
> Cheers,
> Robin.
Hi Robin,
Thank you for taking the time to review these patches. I will add the
check in the next patch series.
--
Kind regards,
Mikołaj Lenczewski
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-19 23:57 ` Oliver Upton
@ 2025-02-20 9:37 ` Mikołaj Lenczewski
0 siblings, 0 replies; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-20 9:37 UTC (permalink / raw)
To: Oliver Upton
Cc: ryan.roberts, yang, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, linux-arm-kernel, linux-kernel
Hi Oliver,
Thank you for taking the time to review this patch series.
On Wed, Feb 19, 2025 at 03:57:43PM -0800, Oliver Upton wrote:
> On Wed, Feb 19, 2025 at 03:34:12PM -0800, Oliver Upton wrote:
> > Hi Miko,
> >
> > On Wed, Feb 19, 2025 at 02:38:38PM +0000, Mikołaj Lenczewski wrote:
> > > +config ARM64_ENABLE_BBML2
> >
> > nit: consider calling this ARM64_BBML2_NOABORT or similar, since this
> > assumes behavior that exceeds the BBML2 baseline.
> >
That is a better phrasing, will change this.
> > > + bool "Enable support for Break-Before-Make Level 2 detection and usage"
> > > + default y
> > > + help
> > > + FEAT_BBM provides detection of support levels for break-before-make
> > > + sequences. If BBM level 2 is supported, some TLB maintenance requirements
> > > + can be relaxed to improve performance. Selecting N causes the kernel to
> > > + fallback to BBM level 0 behaviour even if the system supports BBM level 2.
> > > +
> >
> > [...]
> >
I will assume you mean to add the comment about this technically
exceeding the BBML2 baseline to the docs here as well? Or am I
misunderstanding?
> > > +static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
> > > + int scope)
> > > +{
> > > + if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
> > > + return false;
> > > +
> > > + /* We want to allow usage of bbml2 in as wide a range of kernel contexts
> > > + * as possible. This list is therefore an allow-list of known-good
> > > + * implementations that both support bbml2 and additionally, fulfil the
> >
> > typo: fullfill
>
> I can't spell either ;-)
Spelling is hard, will fix :)
> > > + * extra constraint of never generating TLB conflict aborts when using
> > > + * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> > > + * kernel contexts difficult to prove safe against recursive aborts).
> > > + */
> >
> > We should be *very* specific of what qualifies a 'known-good'
> > implementation here. Implementations shouldn't be added to this list
> > based on the observed behavior, only if *the implementer* states their
> > design will not generate conflict aborts for BBML2 mapping granularity
> > changes.
> >
Understood, will clarify.
> > > diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> > > index 1e65f2fb45bd..8d67bb4448c5 100644
> > > --- a/arch/arm64/tools/cpucaps
> > > +++ b/arch/arm64/tools/cpucaps
> > > @@ -26,6 +26,7 @@ HAS_ECV
> > > HAS_ECV_CNTPOFF
> > > HAS_EPAN
> > > HAS_EVT
> > > +HAS_BBML2_NOCONFLICT
> >
> > Please add this cap to cpucap_is_possible() test for the config option.
> >
Sure, will do so.
--
Kind regards,
Mikołaj Lenczewski
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-20 1:25 ` Yang Shi
@ 2025-02-20 10:34 ` Mikołaj Lenczewski
2025-02-20 20:01 ` Yang Shi
0 siblings, 1 reply; 14+ messages in thread
From: Mikołaj Lenczewski @ 2025-02-20 10:34 UTC (permalink / raw)
To: Yang Shi
Cc: ryan.roberts, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
On Wed, Feb 19, 2025 at 05:25:00PM -0800, Yang Shi wrote:
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index d561cf3b8ac7..8c337bd95ef7 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2176,6 +2176,31 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
> > return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
> > }
> >
> > +static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
> > + int scope)
> > +{
> > + if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
> > + return false;
> > +
> > + /* We want to allow usage of bbml2 in as wide a range of kernel contexts
> > + * as possible. This list is therefore an allow-list of known-good
> > + * implementations that both support bbml2 and additionally, fulfil the
> > + * extra constraint of never generating TLB conflict aborts when using
> > + * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
> > + * kernel contexts difficult to prove safe against recursive aborts).
> > + */
> > + static const struct midr_range supports_bbml2_without_abort_list[] = {
> > + MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
> > + MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
> > + {}
> > + };
> > +
> > + if (!is_midr_in_range_list(read_cpuid_id(), supports_bbml2_without_abort_list))
> > + return false;
> > +
> > + return true;
> > +}
> Hi Miko,
>
> Thanks for cc'ing me this series. I and Ryan discussed about how to
> advertise BBML2 properly in my thread
> (https://lore.kernel.org/linux-arm-kernel/4c44cf6e-98de-47bb-b430-2b1331114904@os.amperecomputing.com/).
> IIUC, this may not work as expected.
>
> The boot cpu initializes the boot_cpu_data, then the secondary cpus need
> to update it, the "sanitized" register value will be generated. For
> example, TLB range capability is determined by ISAR0_EL1. If all the
> cpus have this feature, the "sanitized" register value will show true
> otherwise it will show false.
>
> BBML2 can be determined by MMFR2_EL1. If we can rely on it then system
> feature does work. But the problem is some implementations may have
> MMFR2_EL1 set, but they may not be able to handle TLB conflict. We can't
> rely on it solely so we check MIDR in .matches callback instead of
> MMFR2_EL1. But system feature .matches callback is just called once on
> boot CPU because it is supposed to read the sanitized register value. So
> you actually just checked the MIDR on boot CPU in .matches callback if I
> read the code correctly.
>
> I'm not quite familiar with cpufeature details, if I'm wrong please feel
> free to correct me.
>
> Yang
Hi Yang,
Thank you for taking the time to review this patch series.
Thank you also for spotting this. I am very much not an expert on
cpufeatures, but I think you are correct. IIUC, currently the .matches
check would go through as long as the boot CPU executing the
.matches function has the correct MIDR value, and as long as
CONFIG_ARM64_ENABLE_BBML2 is set.
If, as you point out, another CPU has a MIDR that is not on this list
and which was not checked (because .matches only executes on a single
boot CPU), then .matches would still go through (and we could run into
problems when said other CPU executes any BBML2-aware code).
Please let me know if I have understood what you are saying correctly.
From re-reading `include/asm/cpufeature.h`, making it a SCOPE_LOCAL_CPU
feature seems to be close to what we want. We want to have each CPU test
its cpuid against the allowed MIDR list, and the feature overall to
only be considered present if *all* boot cpus returned true (not as
SCOPE_LOCAL_CPU puts it "... detect if at least one matches.").
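
The semantics described above can be sketched as follows. This is a stand-alone illustration, not a proposal for the cpufeature code: `midr_on_allow_list()`, `any_cpu_matches()` and `all_cpus_match()` are hypothetical helpers, and any MIDR values passed to them are placeholders. It contrasts the "at least one CPU matches" behaviour with the "every CPU must match" behaviour wanted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is this MIDR-like value on the allow-list? */
static bool midr_on_allow_list(uint32_t midr, const uint32_t *allow,
			       size_t n_allow)
{
	for (size_t i = 0; i < n_allow; i++)
		if (allow[i] == midr)
			return true;
	return false;
}

/* "At least one CPU matches": not strict enough for BBML2. */
static bool any_cpu_matches(const uint32_t *midrs, size_t n_cpus,
			    const uint32_t *allow, size_t n_allow)
{
	for (size_t i = 0; i < n_cpus; i++)
		if (midr_on_allow_list(midrs[i], allow, n_allow))
			return true;
	return false;
}

/* "Every CPU matches": one CPU off the list disables the feature. */
static bool all_cpus_match(const uint32_t *midrs, size_t n_cpus,
			   const uint32_t *allow, size_t n_allow)
{
	for (size_t i = 0; i < n_cpus; i++)
		if (!midr_on_allow_list(midrs[i], allow, n_allow))
			return false;
	return true;
}
```

On a mixed system where only some cores are on the list, `any_cpu_matches()` reports true while `all_cpus_match()` reports false, which is the difference that matters for BBML2.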
I will see if this can be hacked around with the current system, and if
not, we might have to extend the current cpucaps scopes / machinery like
you suggest in your patch series comments.
--
Kind regards,
Mikołaj Lenczewski
* Re: [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature
2025-02-20 10:34 ` Mikołaj Lenczewski
@ 2025-02-20 20:01 ` Yang Shi
0 siblings, 0 replies; 14+ messages in thread
From: Yang Shi @ 2025-02-20 20:01 UTC (permalink / raw)
To: Mikołaj Lenczewski
Cc: ryan.roberts, catalin.marinas, will, joey.gouly, broonie,
mark.rutland, james.morse, yangyicong, robin.murphy,
anshuman.khandual, maz, liaochang1, akpm, david, baohua,
ioworker0, oliver.upton, linux-arm-kernel, linux-kernel
On 2/20/25 2:34 AM, Mikołaj Lenczewski wrote:
> On Wed, Feb 19, 2025 at 05:25:00PM -0800, Yang Shi wrote:
>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>> index d561cf3b8ac7..8c337bd95ef7 100644
>>> --- a/arch/arm64/kernel/cpufeature.c
>>> +++ b/arch/arm64/kernel/cpufeature.c
>>> @@ -2176,6 +2176,31 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
>>> return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
>>> }
>>>
>>> +static bool has_bbml2_noconflict(const struct arm64_cpu_capabilities *entry,
>>> + int scope)
>>> +{
>>> + if (!IS_ENABLED(CONFIG_ARM64_ENABLE_BBML2))
>>> + return false;
>>> +
>>> + /* We want to allow usage of bbml2 in as wide a range of kernel contexts
>>> + * as possible. This list is therefore an allow-list of known-good
>>> + * implementations that both support bbml2 and additionally, fulfil the
>>> + * extra constraint of never generating TLB conflict aborts when using
>>> + * the relaxed bbml2 semantics (such aborts make use of bbml2 in certain
>>> + * kernel contexts difficult to prove safe against recursive aborts).
>>> + */
>>> + static const struct midr_range supports_bbml2_without_abort_list[] = {
>>> + MIDR_REV_RANGE(MIDR_CORTEX_X4, 0, 3, 0xf),
>>> + MIDR_REV_RANGE(MIDR_NEOVERSE_V3, 0, 2, 0xf),
>>> + {}
>>> + };
>>> +
>>> + if (!is_midr_in_range_list(read_cpuid_id(), supports_bbml2_without_abort_list))
>>> + return false;
>>> +
>>> + return true;
>>> +}
>> Hi Miko,
>>
>> Thanks for cc'ing me this series. I and Ryan discussed about how to
>> advertise BBML2 properly in my thread
>> (https://lore.kernel.org/linux-arm-kernel/4c44cf6e-98de-47bb-b430-2b1331114904@os.amperecomputing.com/).
>> IIUC, this may not work as expected.
>>
>> The boot cpu initializes the boot_cpu_data, then the secondary cpus need
>> to update it, the "sanitized" register value will be generated. For
>> example, TLB range capability is determined by ISAR0_EL1. If all the
>> cpus have this feature, the "sanitized" register value will show true
>> otherwise it will show false.
>>
>> BBML2 can be determined by MMFR2_EL1. If we can rely on it then system
>> feature does work. But the problem is some implementations may have
>> MMFR2_EL1 set, but they may not be able to handle TLB conflict. We can't
>> rely on it solely so we check MIDR in .matches callback instead of
>> MMFR2_EL1. But system feature .matches callback is just called once on
>> boot CPU because it is supposed to read the sanitized register value. So
>> you actually just checked the MIDR on boot CPU in .matches callback if I
>> read the code correctly.
>>
>> I'm not quite familiar with cpufeature details, if I'm wrong please feel
>> free to correct me.
>>
>> Yang
> Hi Yang,
>
> Thank you for taking the time to review this patch series.
>
> Thank you also for the spot. I am very much not an expert on
> cpufeatures, but I think you are correct. IIUC, currently the .matches
> check would go through as long as the boot CPU executing the
> .matches function has the correct MIDR value, and as long as
> CONFIG_ARM64_ENABLE_BBML2 is set.
>
> If, as you point out, another CPU has a MIDR that is not on this list
> and which was not checked (because .matches only executes on a single
> boot CPU), then .matches should still go through (and we could run into
> problems when said other CPU executes any BBML2 aware code).
>
> Please let me know if I have understood what you are saying correctly.
Yes, that is exactly what I meant.
>
> From re-reading `include/asm/cpufeature.h`, making it a SCOPE_LOCAL_CPU
> feature seems to be close to what we want. We want to have each CPU test
> its cpuid against the allowed MIDR list, and the feature overall to
> only be considered present if *all* boot cpus returned true (not as
> SCOPE_LOCAL_CPU puts it "... detect if at least one matches.").
Yeah, a LOCAL_CPU feature may work, but I'm not sure whether it can
satisfy this use case in its current shape.
>
> I will see if this can be hacked around with the current system, and if
> not, we might have to extend the current cpucaps scopes / machinery like
> you suggest in your patch series comments.
Thank you for looking into it.
Yang
>
Thread overview: 14+ messages
2025-02-19 14:38 [PATCH v1 0/3] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 1/3] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2025-02-19 15:39 ` Robin Murphy
2025-02-19 15:43 ` Ryan Roberts
2025-02-19 16:25 ` Robin Murphy
2025-02-20 9:33 ` Mikołaj Lenczewski
2025-02-19 23:34 ` Oliver Upton
2025-02-19 23:57 ` Oliver Upton
2025-02-20 9:37 ` Mikołaj Lenczewski
2025-02-20 1:25 ` Yang Shi
2025-02-20 10:34 ` Mikołaj Lenczewski
2025-02-20 20:01 ` Yang Shi
2025-02-19 14:38 ` [PATCH v1 2/3] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2025-02-19 14:38 ` [PATCH v1 3/3] arm64/mm: Elide " Mikołaj Lenczewski