linux-arm-kernel.lists.infradead.org archive mirror
* [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert()
@ 2024-12-11 16:01 Mikołaj Lenczewski
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC (permalink / raw)
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

Resending as I had the wrong addresses for linux-doc and kvmarm.
Apologies for the spam.

Hi All,

This patch series seeks to gather feedback on adding initial support
for level 2 of the arm64 Break-Before-Make (BBM) architectural feature,
specifically in contpte_convert().

This support reorders a TLB invalidation in contpte_convert(), and
optionally elides that invalidation entirely. The elision leads to a 12%
improvement when executing a microbenchmark designed to force the
pathological path in which contpte_convert() gets called, which
represents an 80% reduction in the cost of calling contpte_convert().

However, the elision of the invalidation is still pending review to
confirm that it is architecturally valid. Even without it, the
reordering alone is a performance improvement because it reduces thread
contention: there is a smaller window in which racing threads can see an
invalid pagetable entry (especially if they are already working from an
entry cached in their TLB).

This series is based on v6.13-rc2 (fac04efc5c79).

Break-Before-Make Level 2
=========================

Break-Before-Make (BBM) sequences ensure a consistent view of the
page tables: they avoid TLB multi-hits and provide atomicity and
ordering guarantees. BBM level 0 simply describes the sequence
currently required. When you want to change certain bits in a live
pte, you need to (a minimal sketch of this sequence follows the list):

- clear the pte
- dsb()
- issue a tlbi for the pte
- dsb()
- repaint the pte
- dsb()
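
In kernel terms, a minimal sketch of that level 0 sequence for a single
pte might look as follows. This is illustrative only, not actual kernel
code: bbm0_repaint_pte() is a made-up name, the helpers mirror the arm64
ones used later in this series, the dsb()s around the tlbi are issued
internally by __flush_tlb_range(), and the trailing dsb(), locking and
the contpte dirty/young accumulation are all omitted.

static void bbm0_repaint_pte(struct vm_area_struct *vma, struct mm_struct *mm,
			     unsigned long addr, pte_t *ptep, pte_t new_pte)
{
	/* break: clear the pte so no new translation can be cached */
	__ptep_get_and_clear(mm, addr, ptep);

	/* invalidate any TLB entry covering this address */
	__flush_tlb_range(vma, addr, addr + PAGE_SIZE, PAGE_SIZE, true, 3);

	/* make: repaint the pte with the new bits */
	__set_ptes(mm, addr, ptep, new_pte, 1);
}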

When changing the block size, or toggling the contiguous bit, we
currently use this BBM level 0 sequence. With BBM level 2 support,
however, we can relax the sequence and benefit from a performance
improvement: the hardware will then either handle the stale TLB
entries automatically, or take a TLB Conflict Abort Exception.

This abort can be taken as either a stage 1 or a stage 2 exception,
depending on whether stage 2 translation is in use. The architecture
currently mandates a worst-case invalidation of `tlbi vmalle1` when
stage 2 translation is not in use, or `tlbi vmalls12e1` when it is.
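
For contpte_convert() specifically, the relaxation proposed in patches 4
and 5 amounts roughly to the shape below. Again, this is only a hedged
sketch of the resulting flow, not the patches themselves:
bbml2_repaint_pte() is a made-up name, and system_supports_bbml2() is
the helper introduced in patch 2.

static void bbml2_repaint_pte(struct vm_area_struct *vma, struct mm_struct *mm,
			      unsigned long addr, pte_t *ptep, pte_t new_pte)
{
	__ptep_get_and_clear(mm, addr, ptep);

	/* BBM level 0: the invalidation must happen before the repaint */
	if (!system_supports_bbml2())
		__flush_tlb_range(vma, addr, addr + PAGE_SIZE, PAGE_SIZE, true, 3);

	__set_ptes(mm, addr, ptep, new_pte, 1);

	/* patch 4 defers the invalidation to here; patch 5 elides it entirely */
	if (system_supports_bbml2())
		__flush_tlb_range(vma, addr, addr + PAGE_SIZE, PAGE_SIZE, true, 3);
}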

Outstanding Questions and Remaining TODOs
=========================================

Patch 4 moves the tlbi so that the window where the pte is invalid is
significantly smaller. This reduces the chances of racing threads
accessing the memory during the window and taking a fault. This is
confirmed to be architecturally sound.

Patch 5 removes the tlbi entirely. This has the benefit of
significantly reducing the cost of contpte_convert(). While testing
has demonstrated that this works as expected on Arm-designed CPUs, we
are still in the process of confirming whether it is architecturally
correct. I am requesting review while that process is ongoing. Patch 5
would be dropped if it turns out to be architecturally unsound.

Note also that the stage 2 TLB conflict handling is included as patch 1
of this series. That patch could (and probably should) be sent
separately, as it may be useful outside this series, but it is included
here for reference.

Thanks,
Miko

Mikołaj Lenczewski (5):
  arm64: Add TLB Conflict Abort Exception handler to KVM
  arm64: Add BBM Level 2 cpu feature
  arm64: Add errata and workarounds for systems with broken BBML2
  arm64/mm: Delay tlbi in contpte_convert() under BBML2
  arm64/mm: Elide tlbi in contpte_convert() under BBML2

 Documentation/arch/arm64/silicon-errata.rst |  32 ++++
 arch/arm64/Kconfig                          | 164 ++++++++++++++++++++
 arch/arm64/include/asm/cpufeature.h         |  14 ++
 arch/arm64/include/asm/esr.h                |   8 +
 arch/arm64/kernel/cpufeature.c              |  37 +++++
 arch/arm64/kvm/mmu.c                        |   6 +
 arch/arm64/mm/contpte.c                     |   3 +-
 arch/arm64/mm/fault.c                       |  27 +++-
 arch/arm64/tools/cpucaps                    |   1 +
 9 files changed, 290 insertions(+), 2 deletions(-)

-- 
2.45.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM
  2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
@ 2024-12-11 16:01 ` Mikołaj Lenczewski
  2024-12-11 17:40   ` Marc Zyngier
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC (permalink / raw)
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

Currently, KVM does not handle the case of a stage 2 TLB conflict abort
exception. The Arm ARM specifies that the worst-case handling of such an
exception requires a `tlbi vmalls12e1`. Perform such an invalidation
when this exception is encountered.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 arch/arm64/include/asm/esr.h | 8 ++++++++
 arch/arm64/kvm/mmu.c         | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d1b1a33f9a8b..8a66f81ca291 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -121,6 +121,7 @@
 #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
 #define ESR_ELx_FSC_SECC	(0x18)
 #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
+#define ESR_ELx_FSC_TLBABT	(0x30)
 
 /* Status codes for individual page table levels */
 #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
@@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
 	       (esr == ESR_ELx_FSC_ACCESS_L(0));
 }
 
+static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
+{
+	esr = esr & ESR_ELx_FSC;
+
+	return esr == ESR_ELx_FSC_TLBABT;
+}
+
 /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
 static inline bool esr_iss_is_eretax(unsigned long esr)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9d46ad57e52..c8c6f5a97a1b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
+	if (esr_fsc_is_tlb_conflict_abort(esr)) {
+		// does a `tlbi vmalls12e1is`
+		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+		return 1;
+	}
+
 	if (esr_fsc_is_translation_fault(esr)) {
 		/* Beyond sanitised PARange (which is the IPA limit) */
 		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
@ 2024-12-11 16:01 ` Mikołaj Lenczewski
  2024-12-12  8:25   ` Marc Zyngier
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2 Mikołaj Lenczewski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC (permalink / raw)
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
and this commit adds a dedicated BBML2 cpufeature to test against
support for.

In supporting BBM level 2, we open ourselves up to potential TLB
Conflict Abort Exceptions during expected execution, instead of only
in exceptional circumstances. In the case of an abort, it is
implementation defined at what stage the abort is generated, and
the minimal set of required invalidations is also implementation
defined. The maximal set of invalidations is to do a `tlbi vmalle1`
or `tlbi vmalls12e1`, depending on the stage.

Such aborts should not occur on Arm hardware, and were not seen in
benchmarked systems, so unless performance concerns arise, implementing
the abort handlers with the worst-case invalidations seems like an
alright hack.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 14 ++++++++++++++
 arch/arm64/kernel/cpufeature.c      |  7 +++++++
 arch/arm64/mm/fault.c               | 27 ++++++++++++++++++++++++++-
 arch/arm64/tools/cpucaps            |  1 +
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b4e5a3cd24c..a9f2ac335392 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -866,6 +866,20 @@ static __always_inline bool system_supports_mpam_hcr(void)
 	return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
 }
 
+static inline bool system_supports_bbml2(void)
+{
+	/* currently, BBM is only relied on by code touching the userspace page
+	 * tables, and as such we are guaranteed that caps have been finalised.
+	 *
+	 * if later we want to use BBM for kernel mappings, particularly early
+	 * in the kernel, this may return 0 even if BBML2 is actually supported,
+	 * which means unnecessary break-before-make sequences, but is still
+	 * correct
+	 */
+
+	return alternative_has_cap_unlikely(ARM64_HAS_BBML2);
+}
+
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed8..7cc94bd5da24 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2917,6 +2917,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
 	},
+	{
+		.desc = "BBM Level 2 Support",
+		.capability = ARM64_HAS_BBML2,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
+	},
 	{
 		.desc = "52-bit Virtual Addressing for KVM (LPA2)",
 		.capability = ARM64_HAS_LPA2,
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index ef63651099a9..dc119358cbc1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -844,6 +844,31 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
 	return 0;
 }
 
+static int do_conflict_abort(unsigned long far, unsigned long esr,
+			     struct pt_regs *regs)
+{
+	if (!system_supports_bbml2())
+		return do_bad(far, esr, regs);
+
+	/* if we receive a TLB conflict abort, we know that there are multiple
+	 * TLB entries that translate the same address range. the minimum set
+	 * of invalidations to clear these entries is implementation defined.
+	 * the maximum set is defined as either tlbi(vmalls12e1) or tlbi(alle1).
+	 *
+	 * if el2 is enabled and stage 2 translation enabled, this may be
+	 * raised as a stage 2 abort. if el2 is enabled but stage 2 translation
+	 * disabled, or if el2 is disabled, it will be raised as a stage 1
+	 * abort.
+	 *
+	 * local_flush_tlb_all() does a tlbi(vmalle1), which is enough to
+	 * handle a stage 1 abort.
+	 */
+
+	local_flush_tlb_all();
+
+	return 0;
+}
+
 static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
@@ -893,7 +918,7 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 45"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 46"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 47"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"TLB conflict abort"		},
+	{ do_conflict_abort,	SIGKILL, SI_KERNEL,	"TLB conflict abort"		},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"Unsupported atomic hardware update fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 50"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 51"			},
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index eb17f59e543c..4ee0fbb7765b 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -26,6 +26,7 @@ HAS_ECV
 HAS_ECV_CNTPOFF
 HAS_EPAN
 HAS_EVT
+HAS_BBML2
 HAS_FPMR
 HAS_FGT
 HAS_FPSIMD
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2
  2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
@ 2024-12-11 16:01 ` Mikołaj Lenczewski
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
  4 siblings, 0 replies; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC (permalink / raw)
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

There are systems which claim support for BBML2, but whose
implementation of this support is broken. Add a Kconfig erratum for each
of these systems, and a cpufeature workaround that forces the supported
BBM level on these systems to 0.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 Documentation/arch/arm64/silicon-errata.rst |  32 ++++
 arch/arm64/Kconfig                          | 164 ++++++++++++++++++++
 arch/arm64/kernel/cpufeature.c              |  32 +++-
 3 files changed, 227 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index b42fea07c5ce..4b4c1dd9b671 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -126,16 +126,26 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A76      | #3324349        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A76      | #3696297        | ARM64_ERRATUM_3696297       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A77      | #1491015        | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A77      | #3324348        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A77      | #3696294        | ARM64_ERRATUM_3696294       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A78      | #3324344        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A78      | #3696287        | ARM64_ERRATUM_3696287       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A78C     | #3324346,3324347| ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A78C     | #3696291        | ARM64_ERRATUM_3696291       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A78C     | #3696292        | ARM64_ERRATUM_3696292       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
@@ -144,6 +154,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #3324338        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A710     | #3696244        | ARM64_ERRATUM_3696244       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A715     | #2645198        | ARM64_ERRATUM_2645198       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A715     | #3456084        | ARM64_ERRATUM_3194386       |
@@ -156,6 +168,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X1       | #3324344        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X1       | #3696287        | ARM64_ERRATUM_3696287       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X1C      | #3324346        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X2       | #2119858        | ARM64_ERRATUM_2119858       |
@@ -164,10 +178,18 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X2       | #3324338        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X2       | #3696244        | ARM64_ERRATUM_3696244       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X3       | #3324335        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X3       | #3696239        | ARM64_ERRATUM_3696239       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X4       | #3043263        | ARM64_ERRATUM_3043263       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X4       | #3194386        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X925     | #3056274        | ARM64_ERRATUM_3056274       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X925     | #3324334        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
@@ -180,6 +202,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #3324349        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N1     | #3696297        | ARM64_ERRATUM_3696297       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #2067961        | ARM64_ERRATUM_2067961       |
@@ -188,14 +212,22 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #3324339        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N2     | #3696250        | ARM64_ERRATUM_3696250       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N3     | #3456111        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-V1     | #1619801        | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-V1     | #3324341        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-V1     | #3696285        | ARM64_ERRATUM_3696285       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-V2     | #3324336        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-V2     | #3696242        | ARM64_ERRATUM_3696242       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-V3     | #3053180        | ARM64_ERRATUM_3053180       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-V3     | #3312417        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 100570a048c5..9ef8418e8410 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1127,6 +1127,170 @@ config ARM64_ERRATUM_3194386
 
 	  If unsure, say Y.
 
+config ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	bool
+
+config ARM64_ERRATUM_3696250
+	bool "Neoverse-N2: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Neoverse-N2 cores (r0p0, r0p1, r0p2, r0p3) declare
+	  break-before-make level 2 support, but changing the block size
+	  without utilising a break-before-make sequence, or mis-programming
+	  the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696244
+	bool "Cortex-A710/Cortex-X2: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-A710 and Cortex-X2 cores (r0p0, r1p0, r2p0, r2p1)
+	  declare break-before-make level 2 support, but changing the block
+	  size without utilising a break-before-make sequence, or
+	  mis-programming the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696297
+	bool "Cortex-A76/Neoverse-N1: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  This option adds a workaround for ARM Cortex-A76/Neoverse-N1 erratum
+	  3696297.
+
+	  Affected Cortex-A76 and Neoverse-N1 cores (r0p0, r1p0, r2p0, r3p0,
+	  r3p1, r4p0, r4p1) declare break-before-make level 2 support, but
+	  changing the block size without utilising a break-before-make sequence,
+	  or mis-programming the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696294
+	bool "Cortex-A77: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  This option adds a workaround for ARM Cortex-A77 erratum 3696294.
+
+	  Affected Cortex-A77 cores (r0p0, r1p0, r1p1) declare break-before-make
+	  level 2 support, but changing the block size without utilising a
+	  break-before-make sequence, or mis-programming the contiguous hint
+	  bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696239
+	bool "Cortex-X3: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-X3 cores (r0p0, r1p0, r1p1, r1p2) declare
+	  break-before-make level 2 support, but changing the block size
+	  without utilising a break-before-make sequence, or mis-programming
+	  the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696242
+	bool "Neoverse-V2: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Neoverse-V2 cores (r0p0, r0p1, r0p2) declare
+	  break-before-make level 2 support, but changing the block size
+	  without utilising a break-before-make sequence, or mis-programming
+	  the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696285
+	bool "Neoverse-V1: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Neoverse-V1 cores (r0p0, r1p0, r1p1, r1p2) declare
+	  break-before-make level 2 support, but changing the block size
+	  without utilising a break-before-make sequence, or mis-programming
+	  the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696287
+	bool "Cortex-A78/Cortex-X1: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-A78 and Cortex-X1 cores (r0p0, r1p0, r1p1, r1p2)
+	  declare break-before-make level 2 support, but changing the block
+	  size without utilising a break-before-make sequence, or
+	  mis-programming the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696291
+	bool "Cortex-A78C: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-A78C cores (r0p0, r0p1, r0p2) declare
+	  break-before-make level 2 support, but changing the block size
+	  without utilising a break-before-make sequence, or mis-programming
+	  the contiguous hint bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3696292
+	bool "Cortex-A78C: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-A78C cores (r0p1, r0p2) declare break-before-make
+	  level 2 support, but changing the block size without utilising a
+	  break-before-make sequence, or mis-programming the contiguous hint
+	  bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3056274
+	bool "Cortex-X925: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-X925 cores (r0p0, r0p1) declare break-before-make
+	  level 2 support, but changing the block size without utilising a
+	  break-before-make sequence, or mis-programming the contiguous hint
+	  bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3043263
+	bool "Cortex-X4: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Cortex-X4 cores (r0p0, r0p1, r0p2) declare break-before-make
+	  level 2 support, but changing the block size without utilising a
+	  break-before-make sequence, or mis-programming the contiguous hint
+	  bit can lead to a livelock.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_3053180
+	bool "Neoverse-V3: workaround for broken BBM level 2 support"
+	default y
+	select ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT
+	help
+	  Affected Neoverse-V3 cores (r0p0, r0p1) declare break-before-make
+	  level 2 support, but changing the block size without utilising a
+	  break-before-make sequence, or mis-programming the contiguous hint
+	  bit can lead to a livelock.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7cc94bd5da24..e6c05b330e0f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2167,6 +2167,36 @@ static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
 	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
 }
 
+static bool has_bbml2(const struct arm64_cpu_capabilities *entry,
+		      int scope)
+{
+	if (IS_ENABLED(CONFIG_ARM64_WORKAROUND_BROKEN_BBML2_SUPPORT)) {
+		static const struct midr_range broken_bbml2_list[] = {
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_X3),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_X4),
+			MIDR_ALL_VERSIONS(MIDR_CORTEX_X925),
+			MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
+			MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+			MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
+			MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V2),
+			MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V3),
+			{}
+		};
+
+		if (is_midr_in_range_list(read_cpuid_id(), broken_bbml2_list))
+			return false;
+	}
+
+	return has_cpuid_feature(entry, scope);
+}
+
 #ifdef CONFIG_ARM64_PAN
 static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
 {
@@ -2921,7 +2951,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.desc = "BBM Level 2 Support",
 		.capability = ARM64_HAS_BBML2,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_bbml2,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
 	},
 	{
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2
  2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
                   ` (2 preceding siblings ...)
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2 Mikołaj Lenczewski
@ 2024-12-11 16:01 ` Mikołaj Lenczewski
  2024-12-19 16:36   ` Will Deacon
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
  4 siblings, 1 reply; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC (permalink / raw)
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

When converting a region via contpte_convert() to use mTHP, we have two
different goals. We have to mark each entry as contiguous, and we would
like to smear the dirty and young (access) bits across all entries in
the contiguous block. Currently, we do this by first accumulating the
dirty and young bits in the block, using an atomic
__ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
performing a tlbi, and finally smearing the correct bits across the
block using __set_ptes().

This approach works fine for BBM level 0, but with support for BBM level
2 we are allowed to reorder the tlbi to after setting the pagetable
entries. This reordering means that other threads will not see an
invalid pagetable entry; instead they operate on stale (but valid) data
until we have performed our smearing and issued the invalidation.
Avoiding the invalid entry reduces faults in other threads, and thus
improves performance marginally (more so when there are more threads).

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 arch/arm64/mm/contpte.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..fc927be800ee 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -68,9 +68,13 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
 			pte = pte_mkyoung(pte);
 	}
 
-	__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
+	if (!system_supports_bbml2())
+		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
 
 	__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
+
+	if (system_supports_bbml2())
+		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
 }
 
 void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RESEND RFC PATCH v1 5/5] arm64/mm: Elide tlbi in contpte_convert() under BBML2
  2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
                   ` (3 preceding siblings ...)
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
@ 2024-12-11 16:01 ` Mikołaj Lenczewski
  2024-12-19 16:37   ` Will Deacon
  4 siblings, 1 reply; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC (permalink / raw)
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

If we support BBM level 2, we can potentially avoid an intermediate
TLB invalidation, as hardware is capable of managing the TLB itself
in this situation. Hardware will either silently clear out the
offending entry, or will take a TLB Conflict Abort Exception.

Note that such aborts should not occur on Arm hardware and indeed
were not seen on any of the benchmarked systems.

Eliding the invalidation results in a 12% improvement on a
microbenchmark targeting the worst case of contpte_convert(), which
represents an 80% reduction in the overhead of contpte_convert().

Note also that this patch is pending review to ensure that it is
architecturally valid; we are working with Arm architects to confirm
this.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 arch/arm64/mm/contpte.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index fc927be800ee..009690770415 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -72,9 +72,6 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
 		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
 
 	__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
-
-	if (system_supports_bbml2())
-		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
 }
 
 void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
@ 2024-12-11 17:40   ` Marc Zyngier
  2024-12-12  9:23     ` Ryan Roberts
  2024-12-13 16:24     ` Mikołaj Lenczewski
  0 siblings, 2 replies; 27+ messages in thread
From: Marc Zyngier @ 2024-12-11 17:40 UTC (permalink / raw)
  To: Mikołaj Lenczewski
  Cc: ryan.roberts, catalin.marinas, will, corbet, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

On Wed, 11 Dec 2024 16:01:37 +0000,
Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
> 
> Currently, KVM does not handle the case of a stage 2 TLB conflict abort
> exception. The Arm ARM specifies that the worst-case handling of such an
> exception requires a `tlbi vmalls12e1`.

Not quite. It says (I_JCCRT):

<quote>
* For the EL1&0 translation regime, when stage 2 translations are in
  use, either VMALLS12E1 or ALLE1.
</quote>

> Perform such an invalidation when this exception is encountered.

What you fail to describe is *why* this is needed. You know it, I know
it, but not everybody does. A reference to the ARM ARM would
definitely be helpful.

> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> ---
>  arch/arm64/include/asm/esr.h | 8 ++++++++
>  arch/arm64/kvm/mmu.c         | 6 ++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index d1b1a33f9a8b..8a66f81ca291 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -121,6 +121,7 @@
>  #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
>  #define ESR_ELx_FSC_SECC	(0x18)
>  #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
> +#define ESR_ELx_FSC_TLBABT	(0x30)
>  
>  /* Status codes for individual page table levels */
>  #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
> @@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
>  	       (esr == ESR_ELx_FSC_ACCESS_L(0));
>  }
>  
> +static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
> +{
> +	esr = esr & ESR_ELx_FSC;
> +
> +	return esr == ESR_ELx_FSC_TLBABT;
> +}
> +
>  /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
>  static inline bool esr_iss_is_eretax(unsigned long esr)
>  {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9d46ad57e52..c8c6f5a97a1b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
> +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
> +		// does a `tlbi vmalls12e1is`

nit: this isn't a very useful comment.

> +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> +		return 1;
> +	}

That's not enough, unfortunately. A nested VM has *many* VMIDs (the
flattening of all translation contexts that the guest uses).

So you can either iterate over all the valid VMIDs owned by this
guest, or more simply issue a TLBI ALLE1, which will do the trick in a
much more efficient way.

The other thing is that you are using an IS invalidation, which is
farther reaching than necessary. Why would you invalidate the TLBs for
CPUs that are only innocent bystanders? A non-shareable invalidation
seems preferable to me.

> +
>  	if (esr_fsc_is_translation_fault(esr)) {
>  		/* Beyond sanitised PARange (which is the IPA limit) */
>  		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {

But it also begs the question: why only KVM, and not the host? This
handler will only take effect for a TLB Conflict abort delivered from
an EL1 guest to EL2.

However, it doesn't seem to me that the host is equipped to deal with
this sort of exception for itself. Shouldn't you start with that?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
@ 2024-12-12  8:25   ` Marc Zyngier
  2024-12-12 10:55     ` Ryan Roberts
  2024-12-13 16:49     ` Mikołaj Lenczewski
  0 siblings, 2 replies; 27+ messages in thread
From: Marc Zyngier @ 2024-12-12  8:25 UTC (permalink / raw)
  To: Mikołaj Lenczewski
  Cc: ryan.roberts, catalin.marinas, will, corbet, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

Ah, so this is where this is hiding. I missed it in my review of patch
#1 yesterday.

On Wed, 11 Dec 2024 16:01:38 +0000,
Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
> 
> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> and this commit adds a dedicated BBML2 cpufeature to test against
> support for.
> 
> In supporting BBM level 2, we open ourselves up to potential TLB
> Conflict Abort Exceptions during expected execution, instead of only
> in exceptional circumstances. In the case of an abort, it is
> implementation defined at what stage the abort is generated, and

*IF* stage-2 is enabled. Also, in the case of the EL2&0 translation
regime, no stage-2 applies, so it can only be a stage-1 abort.

> the minimal set of required invalidations is also implementation
> defined. The maximal set of invalidations is to do a `tlbi vmalle1`
> or `tlbi vmalls12e1`, depending on the stage.
> 
> Such aborts should not occur on Arm hardware, and were not seen in
> benchmarked systems, so unless performance concerns arise, implementing

Which systems? Given that you have deny-listed *all* half recent ARM
Ltd implementations, I'm a bit puzzled.

> the abort handlers with the worst-case invalidations seems like an
> alright hack.
> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h | 14 ++++++++++++++
>  arch/arm64/kernel/cpufeature.c      |  7 +++++++
>  arch/arm64/mm/fault.c               | 27 ++++++++++++++++++++++++++-
>  arch/arm64/tools/cpucaps            |  1 +
>  4 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 8b4e5a3cd24c..a9f2ac335392 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -866,6 +866,20 @@ static __always_inline bool system_supports_mpam_hcr(void)
>  	return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
>  }
>  
> +static inline bool system_supports_bbml2(void)
> +{
> +	/* currently, BBM is only relied on by code touching the userspace page
> +	 * tables, and as such we are guaranteed that caps have been finalised.
> +	 *
> +	 * if later we want to use BBM for kernel mappings, particularly early
> +	 * in the kernel, this may return 0 even if BBML2 is actually supported,
> +	 * which means unnecessary break-before-make sequences, but is still
> +	 * correct

Comment style, capitalisation, punctuation.

> +	 */
> +
> +	return alternative_has_cap_unlikely(ARM64_HAS_BBML2);
> +}
> +
>  int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>  bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6ce71f444ed8..7cc94bd5da24 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2917,6 +2917,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>  		.matches = has_cpuid_feature,
>  		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
>  	},
> +	{
> +		.desc = "BBM Level 2 Support",
> +		.capability = ARM64_HAS_BBML2,
> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
> +		.matches = has_cpuid_feature,
> +		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
> +	},
>  	{
>  		.desc = "52-bit Virtual Addressing for KVM (LPA2)",
>  		.capability = ARM64_HAS_LPA2,
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index ef63651099a9..dc119358cbc1 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -844,6 +844,31 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
>  	return 0;
>  }
>  
> +static int do_conflict_abort(unsigned long far, unsigned long esr,
> +			     struct pt_regs *regs)
> +{
> +	if (!system_supports_bbml2())
> +		return do_bad(far, esr, regs);
> +
> +	/* if we receive a TLB conflict abort, we know that there are multiple
> +	 * TLB entries that translate the same address range. the minimum set
> +	 * of invalidations to clear these entries is implementation defined.
> +	 * the maximum set is defined as either tlbi(vmalls12e1) or tlbi(alle1).
> +	 *
> +	 * if el2 is enabled and stage 2 translation enabled, this may be
> +	 * raised as a stage 2 abort. if el2 is enabled but stage 2 translation
> +	 * disabled, or if el2 is disabled, it will be raised as a stage 1
> +	 * abort.
> +	 *
> +	 * local_flush_tlb_all() does a tlbi(vmalle1), which is enough to
> +	 * handle a stage 1 abort.

Same comment about comments.

> +	 */
> +
> +	local_flush_tlb_all();

The elephant in the room: if TLBs are in such a sorry state, what
guarantees we can make it this far?

I honestly don't think you can reliably handle a TLB Conflict abort in
the same translation regime as the original fault, given that we don't
know the scope of that fault. You are probably making an educated
guess that it is good enough on the CPUs you know of, but I don't see
anything in the architecture that indicates the "blast radius" of a
TLB conflict.

Which makes me think that your KVM patch is equally broken on nVHE and
hVHE. Such fault should probably be handled while at EL2, not after
returning to EL1.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM
  2024-12-11 17:40   ` Marc Zyngier
@ 2024-12-12  9:23     ` Ryan Roberts
  2024-12-12  9:57       ` Marc Zyngier
  2024-12-13 16:24     ` Mikołaj Lenczewski
  1 sibling, 1 reply; 27+ messages in thread
From: Ryan Roberts @ 2024-12-12  9:23 UTC (permalink / raw)
  To: Marc Zyngier, Mikołaj Lenczewski
  Cc: catalin.marinas, will, corbet, oliver.upton, joey.gouly,
	suzuki.poulose, yuzenghui, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

On 11/12/2024 17:40, Marc Zyngier wrote:
> On Wed, 11 Dec 2024 16:01:37 +0000,
> Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
>>
>> Currently, KVM does not handle the case of a stage 2 TLB conflict abort
>> exception. The Arm ARM specifies that the worst-case handling of such an
>> exception requires a `tlbi vmalls12e1`.
> 
> Not quite. It says (I_JCCRT):
> 
> <quote>
> * For the EL1&0 translation regime, when stage 2 translations are in
>   use, either VMALLS12E1 or ALLE1.
> </quote>
> 
>> Perform such an invalidation when this exception is encountered.
> 
> What you fail to describe is *why* this is needed. You know it, I know
> it, but not everybody does. A reference to the ARM ARM would
> definitely be helpful.
> 
>>
>> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
>> ---
>>  arch/arm64/include/asm/esr.h | 8 ++++++++
>>  arch/arm64/kvm/mmu.c         | 6 ++++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index d1b1a33f9a8b..8a66f81ca291 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -121,6 +121,7 @@
>>  #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
>>  #define ESR_ELx_FSC_SECC	(0x18)
>>  #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
>> +#define ESR_ELx_FSC_TLBABT	(0x30)
>>  
>>  /* Status codes for individual page table levels */
>>  #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
>> @@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
>>  	       (esr == ESR_ELx_FSC_ACCESS_L(0));
>>  }
>>  
>> +static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
>> +{
>> +	esr = esr & ESR_ELx_FSC;
>> +
>> +	return esr == ESR_ELx_FSC_TLBABT;
>> +}
>> +
>>  /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
>>  static inline bool esr_iss_is_eretax(unsigned long esr)
>>  {
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c9d46ad57e52..c8c6f5a97a1b 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>>  
>> +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
>> +		// does a `tlbi vmalls12e1is`
> 
> nit: this isn't a very useful comment.
> 
>> +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
>> +		return 1;
>> +	}
> 
> That's not enough, unfortunately. A nested VM has *many* VMIDs (the
> flattening of all translation contexts that the guest uses).
> 
> So you can either iterate over all the valid VMIDs owned by this
> guest, or more simply issue a TLBI ALLE1, which will do the trick in a
> much more efficient way.
> 
> The other thing is that you are using an IS invalidation, which is
> farther reaching than necessary. Why would you invalidate the TLBs for
> CPUs that are only innocent bystanders? A non-shareable invalidation
> seems preferable to me.
> 
>> +
>>  	if (esr_fsc_is_translation_fault(esr)) {
>>  		/* Beyond sanitised PARange (which is the IPA limit) */
>>  		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> 
> But it also begs the question: why only KVM, and not the host? This
> handler will only take effect for a TLB Conflict abort delivered from
> an EL1 guest to EL2.

Hi Marc,

I believe the intent of this patch is to protect the host/KVM against a guest
that is using BBML2. The host/KVM always assumes BBML0 and therefore doesn't do
any operations that are allowed by the arch to cause a conflict abort. Therefore
the host doesn't need to handle it. But a guest could be taking advantage of
BBML2 and therefore it's architecturally possible for a conflict abort to be
raised to EL2. I think today that would take down the host?

So really I think this could be considered a stand-alone KVM hardening improvement?

> 
> However, it doesn't seem to me that the host is equipped to deal with
> this sort of exception for itself. Shouldn't you start with that?

If the host isn't doing any BBML2 operations it doesn't need to handle it, I
don't think? Obviously that changes later in the series and Miko is adding the
required handling to the host.

Thanks,
Ryan

> 
> Thanks,
> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM
  2024-12-12  9:23     ` Ryan Roberts
@ 2024-12-12  9:57       ` Marc Zyngier
  2024-12-12 10:37         ` Ryan Roberts
  0 siblings, 1 reply; 27+ messages in thread
From: Marc Zyngier @ 2024-12-12  9:57 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Mikołaj Lenczewski, catalin.marinas, will, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

Hi Ryan,

On Thu, 12 Dec 2024 09:23:20 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> Hi Marc,
> 
> I believe the intent of this patch is to protect the host/KVM against a guest
> that is using BBML2. The host/KVM always assumes BBML0 and therefore doesn't do
> any operations that are allowed by the arch to cause a conflict abort. Therefore
> the host doesn't need to handle it. But a guest could be taking advantage of
> BBML2 and therefore it's architiecturally possible for a conflict abort to be
> raised to EL2. I think today that would take down the host?
> 
> So really I think this could be considered a stand-alone KVM
> hardening improvement?

I'm not disputing the need for a TLB Conflict abort handler. It will
be a good addition once we agree on what needs to be done.

> > However, it doesn't seem to me that the host is equipped to deal with
> > this sort of exception for itself. Shouldn't you start with that?
> 
> If the host isn't doing any BBML2 operations it doesn't need to handle it, I
> don't think? Obviously that changes later in the series and Miko is adding the
> required handling to the host.

Yes, and that's what I overlooked yesterday, and I replied to that
change this morning.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM
  2024-12-12  9:57       ` Marc Zyngier
@ 2024-12-12 10:37         ` Ryan Roberts
  0 siblings, 0 replies; 27+ messages in thread
From: Ryan Roberts @ 2024-12-12 10:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mikołaj Lenczewski, catalin.marinas, will, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

On 12/12/2024 09:57, Marc Zyngier wrote:
> Hi Ryan,
> 
> On Thu, 12 Dec 2024 09:23:20 +0000,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi Marc,
>>
>> I believe the intent of this patch is to protect the host/KVM against a guest
>> that is using BBML2. The host/KVM always assumes BBML0 and therefore doesn't do
>> any operations that are allowed by the arch to cause a conflict abort. Therefore
>> the host doesn't need to handle it. But a guest could be taking advantage of
>> BBML2 and therefore it's architiecturally possible for a conflict abort to be
>> raised to EL2. I think today that would take down the host?
>>
>> So really I think this could be considered a stand-alone KVM
>> hardening improvement?
> 
> I'm not disputing the need for a TLB Conflict abort handler. It will
> be a good addition once we agree on what needs to be done.

OK great, glad we are on the same page. I'll leave Miko to work through the details.

> 
>>> However, it doesn't seem to me that the host is equipped to deal with
>>> this sort of exception for itself. Shouldn't you start with that?
>>
>> If the host isn't doing any BBML2 operations it doesn't need to handle it, I
>> don't think? Obviously that changes later in the series and Miko is adding the
>> required handling to the host.
> 
> Yes, and that's what I overlooked yesterday, and I replied to that
> change this morning.
> 
> Thanks,
> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12  8:25   ` Marc Zyngier
@ 2024-12-12 10:55     ` Ryan Roberts
  2024-12-12 14:26       ` Marc Zyngier
  2024-12-13 16:49     ` Mikołaj Lenczewski
  1 sibling, 1 reply; 27+ messages in thread
From: Ryan Roberts @ 2024-12-12 10:55 UTC (permalink / raw)
  To: Marc Zyngier, Mikołaj Lenczewski
  Cc: catalin.marinas, will, corbet, oliver.upton, joey.gouly,
	suzuki.poulose, yuzenghui, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

On 12/12/2024 08:25, Marc Zyngier wrote:
> Ah, so this is where this is hiding. I missed it in my review of patch
> #1 yesterday.
> 
> On Wed, 11 Dec 2024 16:01:38 +0000,
> Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
>>
>> The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
>> and this commit adds a dedicated BBML2 cpufeature to test against
>> support for.
>>
>> In supporting BBM level 2, we open ourselves up to potential TLB
>> Conflict Abort Exceptions during expected execution, instead of only
>> in exceptional circumstances. In the case of an abort, it is
>> implementation defined at what stage the abort is generated, and
> 
> *IF* stage-2 is enabled. Also, in the case of the EL2&0 translation
> regime, no stage-2 applies, so it can only be a stage-1 abort.
> 
>> the minimal set of required invalidations is also implementation
>> defined. The maximal set of invalidations is to do a `tlbi vmalle1`
>> or `tlbi vmalls12e1`, depending on the stage.
>>
>> Such aborts should not occur on Arm hardware, and were not seen in
>> benchmarked systems, so unless performance concerns arise, implementing
> 
> Which systems? Given that you have deny-listed *all* half recent ARM
> Ltd implementations, I'm a bit puzzled.
> 
>> the abort handlers with the worst-case invalidations seems like an
>> alright hack.
>>
>> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
>> ---
>>  arch/arm64/include/asm/cpufeature.h | 14 ++++++++++++++
>>  arch/arm64/kernel/cpufeature.c      |  7 +++++++
>>  arch/arm64/mm/fault.c               | 27 ++++++++++++++++++++++++++-
>>  arch/arm64/tools/cpucaps            |  1 +
>>  4 files changed, 48 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 8b4e5a3cd24c..a9f2ac335392 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -866,6 +866,20 @@ static __always_inline bool system_supports_mpam_hcr(void)
>>  	return alternative_has_cap_unlikely(ARM64_MPAM_HCR);
>>  }
>>  
>> +static inline bool system_supports_bbml2(void)
>> +{
>> +	/* currently, BBM is only relied on by code touching the userspace page
>> +	 * tables, and as such we are guaranteed that caps have been finalised.
>> +	 *
>> +	 * if later we want to use BBM for kernel mappings, particularly early
>> +	 * in the kernel, this may return 0 even if BBML2 is actually supported,
>> +	 * which means unnecessary break-before-make sequences, but is still
>> +	 * correct
> 
> Comment style, capitalisation, punctuation.
> 
>> +	 */
>> +
>> +	return alternative_has_cap_unlikely(ARM64_HAS_BBML2);
>> +}
>> +
>>  int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>>  bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>>  
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 6ce71f444ed8..7cc94bd5da24 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -2917,6 +2917,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>>  		.matches = has_cpuid_feature,
>>  		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
>>  	},
>> +	{
>> +		.desc = "BBM Level 2 Support",
>> +		.capability = ARM64_HAS_BBML2,
>> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
>> +		.matches = has_cpuid_feature,
>> +		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, BBM, 2)
>> +	},
>>  	{
>>  		.desc = "52-bit Virtual Addressing for KVM (LPA2)",
>>  		.capability = ARM64_HAS_LPA2,
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index ef63651099a9..dc119358cbc1 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -844,6 +844,31 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
>>  	return 0;
>>  }
>>  
>> +static int do_conflict_abort(unsigned long far, unsigned long esr,
>> +			     struct pt_regs *regs)
>> +{
>> +	if (!system_supports_bbml2())
>> +		return do_bad(far, esr, regs);
>> +
>> +	/* if we receive a TLB conflict abort, we know that there are multiple
>> +	 * TLB entries that translate the same address range. the minimum set
>> +	 * of invalidations to clear these entries is implementation defined.
>> +	 * the maximum set is defined as either tlbi(vmalls12e1) or tlbi(alle1).
>> +	 *
>> +	 * if el2 is enabled and stage 2 translation enabled, this may be
>> +	 * raised as a stage 2 abort. if el2 is enabled but stage 2 translation
>> +	 * disabled, or if el2 is disabled, it will be raised as a stage 1
>> +	 * abort.
>> +	 *
>> +	 * local_flush_tlb_all() does a tlbi(vmalle1), which is enough to
>> +	 * handle a stage 1 abort.
> 
> Same comment about comments.
> 
>> +	 */
>> +
>> +	local_flush_tlb_all();
> 
> The elephant in the room: if TLBs are in such a sorry state, what
> guarantees we can make it this far?

I'll leave Miko to respond to your other comments, but I wanted to address this
one, since it's pretty fundamental. We went around this loop internally and
concluded that what we are doing is architecturally sound.

The expectation is that a conflict abort can only be generated as a result of
the change in patch 4 (and patch 5). That change makes it possible for the TLB
to end up with a multihit. But crucially that can only happen for user space
memory because that change only operates on user memory. And while the TLB may
detect the conflict at any time, the conflict abort is only permitted to be
reported when an architectural access is prevented by the conflict. So we never
do anything that would allow a conflict for a kernel memory access and a user
memory conflict abort can never be triggered as a result of accessing kernel memory.

Copy/pasting comment from AlexC on the topic, which explains it better than I can:

"""
The intent is certainly that in cases where the hardware detects a TLB conflict
abort, it is only permitted to report it (by generating an exception) if it
applies to an access that is being attempted architecturally. ... that property
can be built from the following two properties:

1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort

2. Those two exception types must be reported synchronously and precisely.
"""

> 
> I honestly don't think you can reliably handle a TLB Conflict abort in
> the same translation regime as the original fault, given that we don't
> know the scope of that fault. You are probably making an educated
> guess that it is good enough on the CPUs you know of, but I don't see
> anything in the architecture that indicates the "blast radius" of a
> TLB conflict.

OK, so I'm claiming that the blast radius is limited to the region of memory
that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
re-read the ARM and come back with the specific statements that led us to that
conclusion?

Thanks,
Ryan

> 
> Which makes me think that your KVM patch is equally broken on nVHE and
> hVHE. Such fault should probably be handled while at EL2, not after
> returning to EL1.
> 
> Thanks,
> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12 10:55     ` Ryan Roberts
@ 2024-12-12 14:26       ` Marc Zyngier
  2024-12-12 15:05         ` Ryan Roberts
  0 siblings, 1 reply; 27+ messages in thread
From: Marc Zyngier @ 2024-12-12 14:26 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Mikołaj Lenczewski, catalin.marinas, will, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

On Thu, 12 Dec 2024 10:55:45 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 12/12/2024 08:25, Marc Zyngier wrote:
> >> +
> >> +	local_flush_tlb_all();
> > 
> > The elephant in the room: if TLBs are in such a sorry state, what
> > guarantees we can make it this far?
> 
> I'll leave Miko to respond to your other comments, but I wanted to address this
> one, since it's pretty fundamental. We went around this loop internally and
> concluded that what we are doing is architecturally sound.
> 
> The expectation is that a conflict abort can only be generated as a result of
> the change in patch 4 (and patch 5). That change makes it possible for the TLB
> to end up with a multihit. But crucially that can only happen for user space
> memory because that change only operates on user memory. And while the TLB may
> detect the conflict at any time, the conflict abort is only permitted to be
> reported when an architectural access is prevented by the conflict. So we never
> do anything that would allow a conflict for a kernel memory access and a user
> memory conflict abort can never be triggered as a result of accessing kernel memory.
> 
> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
> 
> """
> The intent is certainly that in cases where the hardware detects a TLB conflict
> abort, it is only permitted to report it (by generating an exception) if it
> applies to an access that is being attempted architecturally. ... that property
> can be built from the following two properties:
> 
> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
> 
> 2. Those two exception types must be reported synchronously and precisely.
> """

I totally agree with this. The issue is that nothing says that the
abort is in any way related to userspace.

> > 
> > I honestly don't think you can reliably handle a TLB Conflict abort in
> > the same translation regime as the original fault, given that we don't
> > know the scope of that fault. You are probably making an educated
> > guess that it is good enough on the CPUs you know of, but I don't see
> > anything in the architecture that indicates the "blast radius" of a
> > TLB conflict.
> 
> OK, so I'm claiming that the blast radius is limited to the region of memory
> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
> re-read the ARM and come back with the specific statements that led us to that
> conclusion?

But we don't know for sure what caused this conflict by the time we
arrive in the handler. It could equally be because we have a glaring
bug somewhere on the kernel side, even if you are *now* only concerned
with userspace.

If anything, this should absolutely check for FAR_EL1 and assert that
this is indeed caused by such change.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12 14:26       ` Marc Zyngier
@ 2024-12-12 15:05         ` Ryan Roberts
  2024-12-12 15:48           ` Marc Zyngier
  0 siblings, 1 reply; 27+ messages in thread
From: Ryan Roberts @ 2024-12-12 15:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mikołaj Lenczewski, catalin.marinas, will, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

On 12/12/2024 14:26, Marc Zyngier wrote:
> On Thu, 12 Dec 2024 10:55:45 +0000,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 12/12/2024 08:25, Marc Zyngier wrote:
>>>> +
>>>> +	local_flush_tlb_all();
>>>
>>> The elephant in the room: if TLBs are in such a sorry state, what
>>> guarantees we can make it this far?
>>
>> I'll leave Miko to respond to your other comments, but I wanted to address this
>> one, since it's pretty fundamental. We went around this loop internally and
>> concluded that what we are doing is architecturally sound.
>>
>> The expectation is that a conflict abort can only be generated as a result of
>> the change in patch 4 (and patch 5). That change makes it possible for the TLB
>> to end up with a multihit. But crucially that can only happen for user space
>> memory because that change only operates on user memory. And while the TLB may
>> detect the conflict at any time, the conflict abort is only permitted to be
>> reported when an architectural access is prevented by the conflict. So we never
>> do anything that would allow a conflict for a kernel memory access and a user
>> memory conflict abort can never be triggered as a result of accessing kernel memory.
>>
>> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
>>
>> """
>> The intent is certainly that in cases where the hardware detects a TLB conflict
>> abort, it is only permitted to report it (by generating an exception) if it
>> applies to an access that is being attempted architecturally. ... that property
>> can be built from the following two properties:
>>
>> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
>>
>> 2. Those two exception types must be reported synchronously and precisely.
>> """
> 
> I totally agree with this. The issue is that nothing says that the
> abort is in any way related to userspace.
> 
>>>
>>> I honestly don't think you can reliably handle a TLB Conflict abort in
>>> the same translation regime as the original fault, given that we don't
>>> know the scope of that fault. You are probably making an educated
>>> guess that it is good enough on the CPUs you know of, but I don't see
>>> anything in the architecture that indicates the "blast radius" of a
>>> TLB conflict.
>>
>> OK, so I'm claiming that the blast radius is limited to the region of memory
>> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
>> re-read the ARM and come back with the specific statements that led us to that
>> conclusion?

From the ARM:
"""
RFCPSG: If level 1 or level 2 is supported and the Contiguous bit in a set of
Block descriptors or Page descriptors is changed, then a TLB conflict abort can
be generated because multiple translation table entries might exist within a TLB
that translates the same IA.
"""

Although I guess it's not totally explicit, I've interpreted that as saying
that conflicting TLB entries can only arise for the IA range for which the
contiguous bits have been modified in the translation tables.

Given we are only fiddling with the contiguous bits for user space mappings in
this way, that's why I'm asserting we will only get a conflict abort for user
space mappings... assuming the absence of kernel bugs, anyway...

> 
> But we don't know for sure what caused this conflict by the time we
> arrive in the handler. It could equally be because we have a glaring
> bug somewhere on the kernel side, even if you are *now* only concerned
> with userspace.

OK I see what you are saying; previously a conflict abort would have led to
calling do_bad(), which returns 1, which causes do_mem_abort() to either kill
the kernel or the process depending on the origin of the abort. (although if it
came from kernel due to bug, we're just hoping that the conflict doesn't affect
the path through the handler). With this change, we always assume we can fix it
with the TLBI.

How about this change to ensure we still die for issues originating from the kernel?

if (!user_mode(regs) || !system_supports_bbml2())
		return do_bad(far, esr, regs);

> 
> If anything, this should absolutely check for FAR_EL1 and assert that
> this is indeed caused by such change.

I'm not really sure how we would check this reliably? Without patch 5, the
problem is somewhat constrained; we could have as many changes in flight as
there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
being modified. But if patch 5 is confirmed to be architecturally sound, then
there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
VA-range}'s that could legitimately cause a conflict abort.

Thanks,
Ryan

> 
> Thanks,
> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12 15:05         ` Ryan Roberts
@ 2024-12-12 15:48           ` Marc Zyngier
  2024-12-12 16:03             ` Ryan Roberts
  2024-12-13 16:53             ` Mikołaj Lenczewski
  0 siblings, 2 replies; 27+ messages in thread
From: Marc Zyngier @ 2024-12-12 15:48 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Mikołaj Lenczewski, catalin.marinas, will, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

On Thu, 12 Dec 2024 15:05:24 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 12/12/2024 14:26, Marc Zyngier wrote:
> > On Thu, 12 Dec 2024 10:55:45 +0000,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 12/12/2024 08:25, Marc Zyngier wrote:
> >>>> +
> >>>> +	local_flush_tlb_all();
> >>>
> >>> The elephant in the room: if TLBs are in such a sorry state, what
> >>> guarantees we can make it this far?
> >>
> >> I'll leave Miko to respond to your other comments, but I wanted to address this
> >> one, since it's pretty fundamental. We went around this loop internally and
> >> concluded that what we are doing is architecturally sound.
> >>
> >> The expectation is that a conflict abort can only be generated as a result of
> >> the change in patch 4 (and patch 5). That change makes it possible for the TLB
> >> to end up with a multihit. But crucially that can only happen for user space
> >> memory because that change only operates on user memory. And while the TLB may
> >> detect the conflict at any time, the conflict abort is only permitted to be
> >> reported when an architectural access is prevented by the conflict. So we never
> >> do anything that would allow a conflict for a kernel memory access and a user
> >> memory conflict abort can never be triggered as a result of accessing kernel memory.
> >>
> >> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
> >>
> >> """
> >> The intent is certainly that in cases where the hardware detects a TLB conflict
> >> abort, it is only permitted to report it (by generating an exception) if it
> >> applies to an access that is being attempted architecturally. ... that property
> >> can be built from the following two properties:
> >>
> >> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
> >>
> >> 2. Those two exception types must be reported synchronously and precisely.
> >> """
> > 
> > I totally agree with this. The issue is that nothing says that the
> > abort is in any way related to userspace.
> > 
> >>>
> >>> I honestly don't think you can reliably handle a TLB Conflict abort in
> >>> the same translation regime as the original fault, given that we don't
> >>> know the scope of that fault. You are probably making an educated
> >>> guess that it is good enough on the CPUs you know of, but I don't see
> >>> anything in the architecture that indicates the "blast radius" of a
> >>> TLB conflict.
> >>
> >> OK, so I'm claiming that the blast radius is limited to the region of memory
> >> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
> >> re-read the ARM and come back with the specific statements that led us to that
> >> conclusion?
> 
> From the ARM:
> """
> RFCPSG: If level 1 or level 2 is supported and the Contiguous bit in a set of
> Block descriptors or Page descriptors is changed, then a TLB conflict abort can
> be generated because multiple translation table entries might exist within a TLB
> that translates the same IA.
> """
> 
> Although I guess it's not totally explicit, I've interpreted that as saying
> that conflicting TLB entries can only arise for the IA range for which the
> contiguous bits have been modified in the translation tables.

Right, that's reassuring, thanks for digging that one.

> Given we are only fiddling with the contiguous bits for user space mappings in
> this way, that's why I'm asserting we will only get a conflict abort for user
> space mappings... assuming the absence of kernel bugs, anyway...

For now. But if you dare scanning the list, you'll find a lot of
people willing to do far more than just that. Including changing the
shape of the linear map.

>
> > 
> > But we don't know for sure what caused this conflict by the time we
> > arrive in the handler. It could equally be because we have a glaring
> > bug somewhere on the kernel side, even if you are *now* only concerned
> > with userspace.
> 
> OK I see what you are saying; previously a conflict abort would have led to
> calling do_bad(), which returns 1, which causes do_mem_abort() to either kill
> the kernel or the process depending on the origin of the abort. (although if it
> came from kernel due to bug, we're just hoping that the conflict doesn't affect
> the path through the handler). With this change, we always assume we can fix it
> with the TLBI.
> 
> How about this change to ensure we still die for issues originating from the kernel?
> 
> if (!user_mode(regs) || !system_supports_bbml2())
> 		return do_bad(far, esr, regs);

That wouldn't catch a TLB conflict on get_user(), would it?

> > If anything, this should absolutely check for FAR_EL1 and assert that
> > this is indeed caused by such change.
> 
> I'm not really sure how we would check this reliably? Without patch 5, the
> problem is somewhat constrained; we could have as many changes in flight as
> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> being modified. But if patch 5 is confirmed to be architecturally sound, then
> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> VA-range}'s that could legitimately cause a conflict abort.

I didn't mean to imply that we should identify the exact cause of the
abort. I was hoping to simply check that FAR_EL1 reports a userspace
VA. Why wouldn't that work?

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12 15:48           ` Marc Zyngier
@ 2024-12-12 16:03             ` Ryan Roberts
  2024-12-19 16:45               ` Will Deacon
  2024-12-13 16:53             ` Mikołaj Lenczewski
  1 sibling, 1 reply; 27+ messages in thread
From: Ryan Roberts @ 2024-12-12 16:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mikołaj Lenczewski, catalin.marinas, will, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

On 12/12/2024 15:48, Marc Zyngier wrote:
> On Thu, 12 Dec 2024 15:05:24 +0000,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 12/12/2024 14:26, Marc Zyngier wrote:
>>> On Thu, 12 Dec 2024 10:55:45 +0000,
>>> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> On 12/12/2024 08:25, Marc Zyngier wrote:
>>>>>> +
>>>>>> +	local_flush_tlb_all();
>>>>>
>>>>> The elephant in the room: if TLBs are in such a sorry state, what
>>>>> guarantees we can make it this far?
>>>>
>>>> I'll leave Miko to respond to your other comments, but I wanted to address this
>>>> one, since it's pretty fundamental. We went around this loop internally and
>>>> concluded that what we are doing is architecturally sound.
>>>>
>>>> The expectation is that a conflict abort can only be generated as a result of
>>>> the change in patch 4 (and patch 5). That change makes it possible for the TLB
>>>> to end up with a multihit. But crucially that can only happen for user space
>>>> memory because that change only operates on user memory. And while the TLB may
>>>> detect the conflict at any time, the conflict abort is only permitted to be
>>>> reported when an architectural access is prevented by the conflict. So we never
>>>> do anything that would allow a conflict for a kernel memory access and a user
>>>> memory conflict abort can never be triggered as a result of accessing kernel memory.
>>>>
>>>> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
>>>>
>>>> """
>>>> The intent is certainly that in cases where the hardware detects a TLB conflict
>>>> abort, it is only permitted to report it (by generating an exception) if it
>>>> applies to an access that is being attempted architecturally. ... that property
>>>> can be built from the following two properties:
>>>>
>>>> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
>>>>
>>>> 2. Those two exception types must be reported synchronously and precisely.
>>>> """
>>>
>>> I totally agree with this. The issue is that nothing says that the
>>> abort is in any way related to userspace.
>>>
>>>>>
>>>>> I honestly don't think you can reliably handle a TLB Conflict abort in
>>>>> the same translation regime as the original fault, given that we don't
>>>>> know the scope of that fault. You are probably making an educated
>>>>> guess that it is good enough on the CPUs you know of, but I don't see
>>>>> anything in the architecture that indicates the "blast radius" of a
>>>>> TLB conflict.
>>>>
>>>> OK, so I'm claiming that the blast radius is limited to the region of memory
>>>> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
>>>> re-read the ARM and come back with the specific statements that led us to that
>>>> conclusion?
>>
>> From the ARM:
>> """
>> RFCPSG: If level 1 or level 2 is supported and the Contiguous bit in a set of
>> Block descriptors or Page descriptors is changed, then a TLB conflict abort can
>> be generated because multiple translation table entries might exist within a TLB
>> that translates the same IA.
>> """
>>
>> Although I guess it's not totally explicit, I've interpreted that as saying
>> that conflicting TLB entries can only arise for the IA range for which the
>> contiguous bits have been modified in the translation tables.
> 
> Right, that's reassuring, thanks for digging that one.
> 
>> Given we are only fiddling with the contiguous bits for user space mappings in
>> this way, that's why I'm asserting we will only get a conflict abort for user
>> space mappings... assuming the absence of kernel bugs, anyway...
> 
> For now. But if you dare scanning the list, you'll find a lot of
> people willing to do far more than just that. Including changing the
> shape of the linear map.

Ahh. Sorry, I don't do a good job of monitoring the lists. But I was just having a
conversation with Catalin about exactly this.

> 
>>
>>>
>>> But we don't know for sure what caused this conflict by the time we
>>> arrive in the handler. It could equally be because we have a glaring
>>> bug somewhere on the kernel side, even if you are *now* only concerned
>>> with userspace.
>>
>> OK I see what you are saying; previously a conflict abort would have led to
>> calling do_bad(), which returns 1, which causes do_mem_abort() to either kill
>> the kernel or the process depending on the origin of the abort. (although if it
>> came from kernel due to bug, we're just hoping that the conflict doesn't affect
>> the path through the handler). With this change, we always assume we can fix it
>> with the TLBI.
>>
>> How about this change to ensure we still die for issues originating from the kernel?
>>
>> if (!user_mode(regs) || !system_supports_bbml2())
>> 		return do_bad(far, esr, regs);
> 
> That wouldn't catch a TLB conflict on get_user(), would it?

Oh good point.

> 
>>> If anything, this should absolutely check for FAR_EL1 and assert that
>>> this is indeed caused by such change.
>>
>> I'm not really sure how we would check this reliably? Without patch 5, the
>> problem is somewhat constrained; we could have as many changes in flight as
>> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
>> being modified. But if patch 5 is confirmed to be architecturally sound, then
>> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
>> VA-range}'s that could legitimately cause a conflict abort.
> 
> I didn't mean to imply that we should identify the exact cause of the
> abort. I was hoping to simply check that FAR_EL1 reports a userspace
> VA. Why wouldn't that work?

Ahh gottya! Yes agreed, this sounds like the right approach.
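
Something like the below, perhaps? (Untested sketch, just to check we mean the
same thing; it keeps the handler shape from patch 2 and reuses the existing
is_ttbr0_addr()/untagged_addr() helpers already used in fault.c.)

static int do_conflict_abort(unsigned long far, unsigned long esr,
			     struct pt_regs *regs)
{
	/*
	 * Only attempt to recover when BBML2 is supported and the conflict
	 * is reported for a userspace (TTBR0) VA; anything else is
	 * unexpected and stays fatal, as before.
	 */
	if (!system_supports_bbml2() || !is_ttbr0_addr(untagged_addr(far)))
		return do_bad(far, esr, regs);

	/* tlbi(vmalle1) is the worst-case fix for a stage 1 conflict. */
	local_flush_tlb_all();

	return 0;
}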

> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM
  2024-12-11 17:40   ` Marc Zyngier
  2024-12-12  9:23     ` Ryan Roberts
@ 2024-12-13 16:24     ` Mikołaj Lenczewski
  1 sibling, 0 replies; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-13 16:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: ryan.roberts, catalin.marinas, will, corbet, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

Apologies again for spam (replied instead of group-replied).

On Wed, Dec 11, 2024 at 05:40:36PM +0000, Marc Zyngier wrote:
> On Wed, 11 Dec 2024 16:01:37 +0000,
> Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
> > 
> > Currently, KVM does not handle the case of a stage 2 TLB conflict abort
> > exception. The Arm ARM specifies that the worst-case handling of such an
> > exception requires a `tlbi vmalls12e1`.
> 
> Not quite. It says (I_JCCRT):
> 
> <quote>
> * For the EL1&0 translation regime, when stage 2 translations are in
>   use, either VMALLS12E1 or ALLE1.
> </quote>
> 
> > Perform such an invalidation when this exception is encountered.
> 
> What you fail to describe is *why* this is needed. You know it, I know
> it, but not everybody does. A reference to the ARM ARM would
> definitely be helpful.
> 

You are correct. Will update the commit message.

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c9d46ad57e52..c8c6f5a97a1b 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> >  
> > +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
> > +		// does a `tlbi vmalls12e1is`
> 
> nit: this isn't a very useful comment.
> 

Will remove it.

> > +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> > +		return 1;
> > +	}
> 
> That's not enough, unfortunately. A nested VM has *many* VMIDs (the
> flattening of all translation contexts that the guest uses).
> 
> So you can either iterate over all the valid VMIDs owned by this
> guest, or more simply issue a TLBI ALLE1, which will do the trick in a
> much more efficient way.
> 
> The other thing is that you are using an IS invalidation, which is
> farther reaching than necessary. Why would you invalidate the TLBs for
> CPUs that are only innocent bystanders? A non-shareable invalidation
> seems preferable to me.
>

You are completely correct here. I had forgotten about nested VMs, and
agree that issuing a `tlbi alle1` is simpler and more efficient. I agree
also on not using an IS invalidation.
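
Concretely, for v2 I am thinking of something along these lines (an untested
sketch; a local, non-shareable invalidation covering all VMIDs, as you suggest):

	if (esr_fsc_is_tlb_conflict_abort(esr)) {
		/*
		 * Worst-case invalidation when stage 2 is in use: cover all
		 * VMIDs (a nested guest uses many), local CPU only.
		 */
		__tlbi(alle1);
		dsb(nsh);
		isb();
		return 1;
	}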


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12  8:25   ` Marc Zyngier
  2024-12-12 10:55     ` Ryan Roberts
@ 2024-12-13 16:49     ` Mikołaj Lenczewski
  1 sibling, 0 replies; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-13 16:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: ryan.roberts, catalin.marinas, will, corbet, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

On Thu, Dec 12, 2024 at 08:25:53AM +0000, Marc Zyngier wrote:
> Ah, so this is where this is hiding. I missed it in my review of patch
> #1 yesterday.
> 
> On Wed, 11 Dec 2024 16:01:38 +0000,
> Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
> > 
> > The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
> > and this commit adds a dedicated BBML2 cpufeature to test against
> > support for.
> > 
> > In supporting BBM level 2, we open ourselves up to potential TLB
> > Conflict Abort Exceptions during expected execution, instead of only
> > in exceptional circumstances. In the case of an abort, it is
> > implementation defined at what stage the abort is generated, and
> 
> *IF* stage-2 is enabled. Also, in the case of the EL2&0 translation
> regime, no stage-2 applies, so it can only be a stage-1 abort.
> 
> > the minimal set of required invalidations is also implementation
> > defined. The maximal set of invalidations is to do a `tlbi vmalle1`
> > or `tlbi vmalls12e1`, depending on the stage.
> > 
> > Such aborts should not occur on Arm hardware, and were not seen in
> > benchmarked systems, so unless performance concerns arise, implementing
> 
> Which systems? Given that you have deny-listed *all* half recent ARM
> Ltd implementations, I'm a bit puzzled.
> 

I had tested on an earlier series of the patchset that didn't introduce
the MIDR checks (has_bbml2() only read the claimed level of support),
but otherwise had the same implementation.

> >
> > +static inline bool system_supports_bbml2(void)
> > +{
> > +	/* currently, BBM is only relied on by code touching the userspace page
> > +	 * tables, and as such we are guaranteed that caps have been finalised.
> > +	 *
> > +	 * if later we want to use BBM for kernel mappings, particularly early
> > +	 * in the kernel, this may return 0 even if BBML2 is actually supported,
> > +	 * which means unnecessary break-before-make sequences, but is still
> > +	 * correct
> 
> Comment style, capitalisation, punctuation.
> 
> > +	if (!system_supports_bbml2())
> > +		return do_bad(far, esr, regs);
> > +
> > +	/* if we receive a TLB conflict abort, we know that there are multiple
> > +	 * TLB entries that translate the same address range. the minimum set
> > +	 * of invalidations to clear these entries is implementation defined.
> > +	 * the maximum set is defined as either tlbi(vmalls12e1) or tlbi(alle1).
> > +	 *
> > +	 * if el2 is enabled and stage 2 translation enabled, this may be
> > +	 * raised as a stage 2 abort. if el2 is enabled but stage 2 translation
> > +	 * disabled, or if el2 is disabled, it will be raised as a stage 1
> > +	 * abort.
> > +	 *
> > +	 * local_flush_tlb_all() does a tlbi(vmalle1), which is enough to
> > +	 * handle a stage 1 abort.
> 
> Same comment about comments.
> 

Will fix.

Kind regards,
Mikołaj


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12 15:48           ` Marc Zyngier
  2024-12-12 16:03             ` Ryan Roberts
@ 2024-12-13 16:53             ` Mikołaj Lenczewski
  1 sibling, 0 replies; 27+ messages in thread
From: Mikołaj Lenczewski @ 2024-12-13 16:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ryan Roberts, catalin.marinas, will, corbet, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

> > 
> > From the ARM:
> > """
> > RFCPSG: If level 1 or level 2 is supported and the Contiguous bit in a set of
> > Block descriptors or Page descriptors is changed, then a TLB conflict abort can
> > be generated because multiple translation table entries might exist within a TLB
> > that translates the same IA.
> > """
> > 
> > Although I guess it's not totally explicit, I've interpreted that as saying
> > that conflicting TLB entries can only arise for the IA range for which the
> > contiguous bits have been modified in the translation tables.
> 
> Right, that's reassuring, thanks for digging that one.
> 

Thanks Ryan, will improve the comment and commit messages to make this
clearer and to include references to this information.

> > > If anything, this should absolutely check for FAR_EL1 and assert that
> > > this is indeed caused by such change.
> > 
> > I'm not really sure how we would check this reliably? Without patch 5, the
> > problem is somewhat constrained; we could have as many changes in flight as
> > there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> > being modified. But if patch 5 is confirmed to be architecturally sound, then
> > there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> > VA-range}'s that could legitimately cause a conflict abort.
> 
> I didn't mean to imply that we should identify the exact cause of the
> abort. I was hoping to simply check that FAR_EL1 reports a userspace
> VA. Why wouldn't that work?
> 

Will do so.

Kind regards,
Mikołaj


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
@ 2024-12-19 16:36   ` Will Deacon
  0 siblings, 0 replies; 27+ messages in thread
From: Will Deacon @ 2024-12-19 16:36 UTC (permalink / raw)
  To: Mikołaj Lenczewski
  Cc: ryan.roberts, catalin.marinas, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

On Wed, Dec 11, 2024 at 04:01:40PM +0000, Mikołaj Lenczewski wrote:
> When converting a region via contpte_convert() to use mTHP, we have two
> different goals. We have to mark each entry as contiguous, and we would
> like to smear the dirty and young (access) bits across all entries in
> the contiguous block. Currently, we do this by first accumulating the
> dirty and young bits in the block, using an atomic
> __ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
> performing a tlbi, and finally smearing the correct bits across the
> block using __set_ptes().
> 
> This approach works fine for BBM level 0, but with support for BBM level
> 2 we are allowed to reorder the tlbi to after setting the pagetable
> entries. This reordering means that other threads will not see an
> invalid pagetable entry, instead operating on stale data, until we have
> performed our smearing and issued the invalidation. Avoiding this
> invalid entry reduces faults in other threads, and thus improves
> performance marginally (more so when there are more threads).

Please provide the performance data.

Will


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 5/5] arm64/mm: Elide tlbi in contpte_convert() under BBML2
  2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
@ 2024-12-19 16:37   ` Will Deacon
  0 siblings, 0 replies; 27+ messages in thread
From: Will Deacon @ 2024-12-19 16:37 UTC (permalink / raw)
  To: Mikołaj Lenczewski
  Cc: ryan.roberts, catalin.marinas, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui, linux-arm-kernel,
	linux-doc, linux-kernel, kvmarm

On Wed, Dec 11, 2024 at 04:01:41PM +0000, Mikołaj Lenczewski wrote:
> If we support BBM level 2, we can potentially avoid an intermediate
> TLB invalidation, as hardware is capable of managing the TLB itself
> in this situation. Hardware will either silently clear out the
> offending entry, or will take a TLB Conflict Abort Exception.
> 
> Note that such aborts should not occur on Arm hardware and indeed
> were not seen on any of the benchmarked systems.
> 
> Eliding the invalidation results in a 12% improvement on a
> microbenchmark which targeted the worst case of contpte_convert(), which
> represents an 80% reduction in the overhead of contpte_convert().

Can you run something more indicative of real world performance than a
targetted microbenchmark please?

Will


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-12 16:03             ` Ryan Roberts
@ 2024-12-19 16:45               ` Will Deacon
  2025-01-02 12:07                 ` Jonathan Cameron
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2024-12-19 16:45 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Mikołaj Lenczewski, catalin.marinas, corbet,
	oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm

On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:
> >>> If anything, this should absolutely check for FAR_EL1 and assert that
> >>> this is indeed caused by such change.
> >>
> >> I'm not really sure how we would check this reliably? Without patch 5, the
> >> problem is somewhat constrained; we could have as many changes in flight as
> >> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> >> being modified. But if patch 5 is confirmed to be architecturally sound, then
> >> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> >> VA-range}'s that could legitimately cause a conflict abort.
> > 
> > I didn't mean to imply that we should identify the exact cause of the
> > abort. I was hoping to simply check that FAR_EL1 reports a userspace
> > VA. Why wouldn't that work?
> 
> Ahh gottya! Yes agreed, this sounds like the right approach.

Please, can we just not bother handling conflict aborts at all outside of
KVM? This is all dead code, it's complicated and it doesn't scale to the
in-kernel use-cases that others want. There's also not been any attempt
to add the pKVM support for handling host-side conflict aborts from what
I can tell.

For now, I would suggest limiting this series just to the KVM support
for handling a broken/malicious guest. If the contpte performance
improvements are worthwhile (I've asked for data), then let's add support
for the CPUs that handle the conflict in hardware (I believe this is far
more common than reporting the abort) so that the in-kernel users can
benefit whilst keeping the code manageable at the same time.

Will


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2024-12-19 16:45               ` Will Deacon
@ 2025-01-02 12:07                 ` Jonathan Cameron
  2025-01-02 12:30                   ` Marc Zyngier
  0 siblings, 1 reply; 27+ messages in thread
From: Jonathan Cameron @ 2025-01-02 12:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ryan Roberts, Marc Zyngier, Mikołaj Lenczewski,
	catalin.marinas, corbet, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, linux-arm-kernel, linux-doc, linux-kernel, kvmarm,
	yangyicong, guohanjun, wangkefeng.wang, liaochang1, sunnanyong,
	yangyicong, linuxarm

On Thu, 19 Dec 2024 16:45:28 +0000
Will Deacon <will@kernel.org> wrote:

> On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:
> > >>> If anything, this should absolutely check for FAR_EL1 and assert that
> > >>> this is indeed caused by such change.  
> > >>
> > >> I'm not really sure how we would check this reliably? Without patch 5, the
> > >> problem is somewhat constrained; we could have as many changes in flight as
> > >> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> > >> being modified. But if patch 5 is confirmed to be architecturally sound, then
> > >> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> > >> VA-range}'s that could legitimately cause a conflict abort.  
> > > 
> > > I didn't mean to imply that we should identify the exact cause of the
> > > abort. I was hoping to simply check that FAR_EL1 reports a userspace
> > > VA. Why wouldn't that work?  
> > 
> > Ahh gottya! Yes agreed, this sounds like the right approach.  
> 
> Please, can we just not bother handling conflict aborts at all outside of
> KVM? This is all dead code, it's complicated and it doesn't scale to the
> in-kernel use-cases that others want. There's also not been any attempt
> to add the pKVM support for handling host-side conflict aborts from what
> I can tell.
> 
> For now, I would suggest limiting this series just to the KVM support
> for handling a broken/malicious guest. If the contpte performance
> improvements are worthwhile (I've asked for data), then let's add support
> for the CPUs that handle the conflict in hardware (I believe this is far
> more common than reporting the abort) so that the in-kernel users can
> benefit whilst keeping the code manageable at the same time.
> 

Hi All,

Given the direction the discussion is going, it's time to raise a hand.

Huawei has implementations that support BBML2 and might report a TLB conflict
abort after changing block size directly, until an appropriate TLB invalidation
instruction completes; this Implementation Choice is architecturally compliant.

Jonathan

> Will
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2025-01-02 12:07                 ` Jonathan Cameron
@ 2025-01-02 12:30                   ` Marc Zyngier
  2025-01-03 15:35                     ` Will Deacon
  0 siblings, 1 reply; 27+ messages in thread
From: Marc Zyngier @ 2025-01-02 12:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Will Deacon, Ryan Roberts, Mikołaj Lenczewski,
	catalin.marinas, corbet, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, linux-arm-kernel, linux-doc, linux-kernel, kvmarm,
	yangyicong, guohanjun, wangkefeng.wang, liaochang1, sunnanyong,
	linuxarm

Hi Jonathan,

On Thu, 02 Jan 2025 12:07:04 +0000,
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> On Thu, 19 Dec 2024 16:45:28 +0000
> Will Deacon <will@kernel.org> wrote:
> 
> > On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:
> > > >>> If anything, this should absolutely check for FAR_EL1 and assert that
> > > >>> this is indeed caused by such change.  
> > > >>
> > > >> I'm not really sure how we would check this reliably? Without patch 5, the
> > > >> problem is somewhat constrained; we could have as many changes in flight as
> > > >> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> > > >> being modified. But if patch 5 is confirmed to be architecturally sound, then
> > > >> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> > > >> VA-range}'s that could legitimately cause a conflict abort.  
> > > > 
> > > > I didn't mean to imply that we should identify the exact cause of the
> > > > abort. I was hoping to simply check that FAR_EL1 reports a userspace
> > > > VA. Why wouldn't that work?  
> > > 
> > > Ahh gottya! Yes agreed, this sounds like the right approach.  
> > 
> > Please, can we just not bother handling conflict aborts at all outside of
> > KVM? This is all dead code, it's complicated and it doesn't scale to the
> > in-kernel use-cases that others want. There's also not been any attempt
> > to add the pKVM support for handling host-side conflict aborts from what
> > I can tell.
> > 
> > For now, I would suggest limiting this series just to the KVM support
> > for handling a broken/malicious guest. If the contpte performance
> > improvements are worthwhile (I've asked for data), then let's add support
> > for the CPUs that handle the conflict in hardware (I believe this is far
> > more common than reporting the abort) so that the in-kernel users can
> > benefit whilst keeping the code manageable at the same time.
> > 
> 
> Hi All,
> 
> Given the direction the discussion is going, it's time to raise a hand.
> 
> Huawei has implementations that support BBML2 and might report a TLB conflict
> abort after changing block size directly, until an appropriate TLB invalidation
> instruction completes; this Implementation Choice is architecturally compliant.

Compliant, absolutely. That's the letter of the spec. The usefulness
aspect is, however, more debatable, and this is what Will is pointing
out.

Dealing with TLB Conflict aborts is an absolute pain if you need
to handle it within the same Translation Regime and using the same
TTBR as the one that has generated the fault. So at least for the time
being, it might be preferable to only worry about the implementations
that will promise to never generate such an abort and quietly perform
an invalidation behind the kernel's back.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2025-01-02 12:30                   ` Marc Zyngier
@ 2025-01-03 15:35                     ` Will Deacon
  2025-01-03 16:00                       ` Ryan Roberts
  0 siblings, 1 reply; 27+ messages in thread
From: Will Deacon @ 2025-01-03 15:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Jonathan Cameron, Ryan Roberts, Mikołaj Lenczewski,
	catalin.marinas, corbet, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, linux-arm-kernel, linux-doc, linux-kernel, kvmarm,
	yangyicong, guohanjun, wangkefeng.wang, liaochang1, sunnanyong,
	linuxarm

On Thu, Jan 02, 2025 at 12:30:34PM +0000, Marc Zyngier wrote:
> On Thu, 02 Jan 2025 12:07:04 +0000,
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > On Thu, 19 Dec 2024 16:45:28 +0000
> > Will Deacon <will@kernel.org> wrote:
> > > On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:
> > > > >>> If anything, this should absolutely check for FAR_EL1 and assert that
> > > > >>> this is indeed caused by such change.  
> > > > >>
> > > > >> I'm not really sure how we would check this reliably? Without patch 5, the
> > > > >> problem is somewhat constrained; we could have as many changes in flight as
> > > > >> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> > > > >> being modified. But if patch 5 is confirmed to be architecturally sound, then
> > > > >> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> > > > >> VA-range}'s that could legitimately cause a conflict abort.  
> > > > > 
> > > > > I didn't mean to imply that we should identify the exact cause of the
> > > > > abort. I was hoping to simply check that FAR_EL1 reports a userspace
> > > > > VA. Why wouldn't that work?  
> > > > 
> > > > Ahh gottya! Yes agreed, this sounds like the right approach.  
> > > 
> > > Please, can we just not bother handling conflict aborts at all outside of
> > > KVM? This is all dead code, it's complicated and it doesn't scale to the
> > > in-kernel use-cases that others want. There's also not been any attempt
> > > to add the pKVM support for handling host-side conflict aborts from what
> > > I can tell.
> > > 
> > > For now, I would suggest limiting this series just to the KVM support
> > > for handling a broken/malicious guest. If the contpte performance
> > > improvements are worthwhile (I've asked for data), then let's add support
> > > for the CPUs that handle the conflict in hardware (I believe this is far
> > > more common than reporting the abort) so that the in-kernel users can
> > > benefit whilst keeping the code manageable at the same time.
> > > 
> > 
> > Given the direction the discussion is going, it's time to raise a hand.
> > 
> > Huawei has implementations that support BBML2 and might report a TLB conflict
> > abort after changing block size directly, until an appropriate TLB invalidation
> > instruction completes; this Implementation Choice is architecturally compliant.
> 
> Compliant, absolutely. That's the letter of the spec. The usefulness
> aspect is, however, more debatable, and this is what Will is pointing
> out.
> 
> Dealing with TLB Conflict aborts is an absolute pain if you need
> to handle it within the same Translation Regime and using the same
> TTBR as the one that has generated the fault. So at least for the time
> being, it might be preferable to only worry about the implementations
> that will promise to never generate such an abort and quietly perform
> an invalidation behind the kernel's back.

Agreed. We're not dropping support for CPUs that don't give us what we'd
like here, we're just not bending over to port and maintain new
optimisations for them. I think that's a reasonable compromise?

That said, thanks for raising this, Jonathan. It's a useful data point
to know that TLB conflict aborts exist in the wild!

Will


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2025-01-03 15:35                     ` Will Deacon
@ 2025-01-03 16:00                       ` Ryan Roberts
  2025-01-03 18:18                         ` Jonathan Cameron
  0 siblings, 1 reply; 27+ messages in thread
From: Ryan Roberts @ 2025-01-03 16:00 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier
  Cc: Jonathan Cameron, Mikołaj Lenczewski, catalin.marinas,
	corbet, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	linux-arm-kernel, linux-doc, linux-kernel, kvmarm, yangyicong,
	guohanjun, wangkefeng.wang, liaochang1, sunnanyong, linuxarm

On 03/01/2025 15:35, Will Deacon wrote:
> On Thu, Jan 02, 2025 at 12:30:34PM +0000, Marc Zyngier wrote:
>> On Thu, 02 Jan 2025 12:07:04 +0000,
>> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>> On Thu, 19 Dec 2024 16:45:28 +0000
>>> Will Deacon <will@kernel.org> wrote:
>>>> On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:
>>>>>>>> If anything, this should absolutely check for FAR_EL1 and assert that
>>>>>>>> this is indeed caused by such change.  
>>>>>>>
>>>>>>> I'm not really sure how we would check this reliably? Without patch 5, the
>>>>>>> problem is somewhat constrained; we could have as many changes in flight as
>>>>>>> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
>>>>>>> being modified. But if patch 5 is confirmed to be architecturally sound, then
>>>>>>> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
>>>>>>> VA-range}'s that could legitimately cause a conflict abort.  
>>>>>>
>>>>>> I didn't mean to imply that we should identify the exact cause of the
>>>>>> abort. I was hoping to simply check that FAR_EL1 reports a userspace
>>>>>> VA. Why wouldn't that work?  
>>>>>
>>>>> Ahh gottya! Yes agreed, this sounds like the right approach.  
>>>>
>>>> Please, can we just not bother handling conflict aborts at all outside of
>>>> KVM? This is all dead code, it's complicated and it doesn't scale to the
>>>> in-kernel use-cases that others want. There's also not been any attempt
>>>> to add the pKVM support for handling host-side conflict aborts from what
>>>> I can tell.
>>>>
>>>> For now, I would suggest limiting this series just to the KVM support
>>>> for handling a broken/malicious guest. If the contpte performance
>>>> improvements are worthwhile (I've asked for data), then let's add support
>>>> for the CPUs that handle the conflict in hardware (I believe this is far
>>>> more common than reporting the abort) so that the in-kernel users can
>>>> benefit whilst keeping the code manageable at the same time.
>>>>
>>>
>>> Given the direction the discussion is going, it's time to raise a hand.
>>>
>>> Huawei has implementations that support BBML2 and might report a TLB conflict
>>> abort after changing block size directly, until an appropriate TLB invalidation
>>> instruction completes; this Implementation Choice is architecturally compliant.
>>
>> Compliant, absolutely. That's the letter of the spec. The usefulness
>> aspect is, however, more debatable, and this is what Will is pointing
>> out.
>>
>> Dealing with TLB Conflict aborts is an absolute pain if you need
>> to handle it within the same Translation Regime and using the same
>> TTBR as the one that has generated the fault. So at least for the time
>> being, it might be preferable to only worry about the implementations
>> that will promise to never generate such an abort and quietly perform
>> an invalidation behind the kernel's back.
> 
> Agreed. We're not dropping support for CPUs that don't give us what we'd
> like here, we're just not bending over to port and maintain new
> optimisations for them. I think that's a reasonable compromise?
> 
> That said, thanks for raising this, Jonathan. It's a useful data point
> to know that TLB conflict aborts exist in the wild!

Indeed. Just to make it explicit; if we were to support all architecturally
compliant BBML2 implementations, we would need to drop the final patch in this
series. But since it sounds like we will be taking the approach of only allowing
these optimizations for implementations that never raise conflict abort and
handle it all in HW, it should be safe to keep the optimization in that final
patch. I'll work with Miko to get this bashed into shape and reposted.

Thanks,
Ryan

> 
> Will



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
  2025-01-03 16:00                       ` Ryan Roberts
@ 2025-01-03 18:18                         ` Jonathan Cameron
  0 siblings, 0 replies; 27+ messages in thread
From: Jonathan Cameron @ 2025-01-03 18:18 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Will Deacon, Marc Zyngier, Mikołaj Lenczewski,
	catalin.marinas, corbet, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, linux-arm-kernel, linux-doc, linux-kernel, kvmarm,
	yangyicong, guohanjun, wangkefeng.wang, liaochang1, sunnanyong,
	linuxarm

On Fri, 3 Jan 2025 16:00:59 +0000
Ryan Roberts <ryan.roberts@arm.com> wrote:

> On 03/01/2025 15:35, Will Deacon wrote:
> > On Thu, Jan 02, 2025 at 12:30:34PM +0000, Marc Zyngier wrote:  
> >> On Thu, 02 Jan 2025 12:07:04 +0000,
> >> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:  
> >>> On Thu, 19 Dec 2024 16:45:28 +0000
> >>> Will Deacon <will@kernel.org> wrote:  
> >>>> On Thu, Dec 12, 2024 at 04:03:52PM +0000, Ryan Roberts wrote:  
> >>>>>>>> If anything, this should absolutely check for FAR_EL1 and assert that
> >>>>>>>> this is indeed caused by such change.    
> >>>>>>>
> >>>>>>> I'm not really sure how we would check this reliably? Without patch 5, the
> >>>>>>> problem is somewhat constrained; we could have as many changes in flight as
> >>>>>>> there are CPUs so we could keep a list of all the {mm_struct, VA-range} that are
> >>>>>>> being modified. But if patch 5 is confirmed to be architecturally sound, then
> >>>>>>> there is no "terminating tlbi" so there is no bound on the set of {mm_struct,
> >>>>>>> VA-range}'s that could legitimately cause a conflict abort.    
> >>>>>>
> >>>>>> I didn't mean to imply that we should identify the exact cause of the
> >>>>>> abort. I was hoping to simply check that FAR_EL1 reports a userspace
> >>>>>> VA. Why wouldn't that work?    
> >>>>>
> >>>>> Ahh gottya! Yes agreed, this sounds like the right approach.    
> >>>>
> >>>> Please, can we just not bother handling conflict aborts at all outside of
> >>>> KVM? This is all dead code, it's complicated and it doesn't scale to the
> >>>> in-kernel use-cases that others want. There's also not been any attempt
> >>>> to add the pKVM support for handling host-side conflict aborts from what
> >>>> I can tell.
> >>>>
> >>>> For now, I would suggest limiting this series just to the KVM support
> >>>> for handling a broken/malicious guest. If the contpte performance
> >>>> improvements are worthwhile (I've asked for data), then let's add support
> >>>> for the CPUs that handle the conflict in hardware (I believe this is far
> >>>> more common than reporting the abort) so that the in-kernel users can
> >>>> benefit whilst keeping the code manageable at the same time.
> >>>>  
> >>>
> >>> Given the direction the discussion is going in, it's time to raise a hand.
> >>>
> >>> Huawei has implementations that support BBML2 and that may report a TLB
> >>> conflict abort after directly changing block size, until an appropriate TLB
> >>> invalidation instruction completes; this Implementation Choice is
> >>> architecturally compliant.  
> >>
> >> Compliant, absolutely. That's the letter of the spec. The usefulness
> >> aspect is, however, more debatable, and this is what Will is pointing
> >> out.
> >>
> >> Dealing with TLB Conflict aborts is an absolute pain if you need
> >> to handle it within the same Translation Regime and using the same
> >> TTBR as the one that has generated the fault. So at least for the time
> >> being, it might be preferable to only worry about the implementations
> >> that will promise to never generate such an abort and quietly perform
> >> an invalidation behind the kernel's back.  
> > 
> > Agreed. We're not dropping support for CPUs that don't give us what we'd
> > like here, we're just not bending over to port and maintain new
> > optimisations for them. I think that's a reasonable compromise?

Subject to the usual maintainability vs performance questions, sure.
Given we are the ones with the implementation, it's perhaps up to us
to prove the added complexity for a given optimization is worth the hassle
(maybe leaning on Arm to help out ;)  We have some activity going on
around this, but are unfortunately not ready to share.

> > 
> > That said, thanks for raising this, Jonathan. It's a useful data point
> > to know that TLB conflict aborts exist in the wild!  

My work here is done ;)

> 
> Indeed. Just to make it explicit: if we were to support all architecturally
> compliant BBML2 implementations, we would need to drop the final patch in this
> series. But since it sounds like we will be taking the approach of only
> allowing these optimizations on implementations that never raise a conflict
> abort and handle it all in HW, it should be safe to keep the optimization in
> that final patch. I'll work with Miko to get this bashed into shape and
> reposted.

Obviously I'd want perf numbers to justify it (working on that), but I'd
like to keep on the table the option of patch 5 being the only part that
depends on non-conflict-aborting hardware. I think even that is a
performance question rather than a correctness one - it simply widens the
window in which we might see a fault and have to dump the TLB. (I may well
have missed something, though.)
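
To spell out how I'm reading the ordering question, a pseudocode sketch
(not the actual patches; all of the helpers below are made-up names, and
the DSBs are omitted for brevity):

static void contpte_convert_sketch(struct mm_struct *mm, unsigned long addr,
                                   pte_t *ptep, pte_t pte)
{
        /* Break: clear the old, non-contiguous entries. */
        clear_cont_ptes(mm, addr, ptep);

        if (!system_supports_bbml2()) {
                /* BBM level 0: invalidate before the new entries are visible. */
                flush_cont_pte_range(mm, addr);
                write_cont_ptes(mm, addr, ptep, pte);
                return;
        }

        /* BBML2: new entries may briefly coexist with stale TLB entries. */
        write_cont_ptes(mm, addr, ptep, pte);

        /*
         * Patch 4 keeps this invalidation, just later, which shrinks the
         * window with no valid entry installed. Patch 5 elides it entirely
         * and relies on the hardware to resolve (or report) any conflict.
         */
        if (!cpu_never_raises_conflict_abort())
                flush_cont_pte_range(mm, addr);
}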

As a side note on that last patch, it is easy to conceive of a BBML2
implementation that doesn't raise conflict aborts but for which not
flushing is still a performance nightmare. As a fictional example: where
our CPUs raise a conflict abort, an implementation could instead stall the
core, push the abort info to a management controller, and flush the whole
TLB (plus probably the CPU pipeline) to resolve it. If conflicts are
sufficiently rare, that's not a totally stupid design (subject to some
optimizations). It is basically offloading to hardware what we would
otherwise do in software on a conflict abort, at a somewhat similar
performance cost, which makes widening the window a very bad idea.

So the proposed allow list might need to be rather more nuanced than "can we get
a fault?"  We all love per-uarch, performance-related opt-ins.

Jonathan

> 
> Thanks,
> Ryan
> 
> > 
> > Will  
> 
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-01-03 18:25 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
2024-12-11 17:40   ` Marc Zyngier
2024-12-12  9:23     ` Ryan Roberts
2024-12-12  9:57       ` Marc Zyngier
2024-12-12 10:37         ` Ryan Roberts
2024-12-13 16:24     ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2024-12-12  8:25   ` Marc Zyngier
2024-12-12 10:55     ` Ryan Roberts
2024-12-12 14:26       ` Marc Zyngier
2024-12-12 15:05         ` Ryan Roberts
2024-12-12 15:48           ` Marc Zyngier
2024-12-12 16:03             ` Ryan Roberts
2024-12-19 16:45               ` Will Deacon
2025-01-02 12:07                 ` Jonathan Cameron
2025-01-02 12:30                   ` Marc Zyngier
2025-01-03 15:35                     ` Will Deacon
2025-01-03 16:00                       ` Ryan Roberts
2025-01-03 18:18                         ` Jonathan Cameron
2024-12-13 16:53             ` Mikołaj Lenczewski
2024-12-13 16:49     ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2 Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2024-12-19 16:36   ` Will Deacon
2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
2024-12-19 16:37   ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).