public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH v2 0/3] KVM: arm64: Fix SPE and TRBE nVHE world switch
@ 2026-02-27 21:21 Will Deacon
  2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Will Deacon @ 2026-02-27 21:21 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, linux-arm-kernel, Will Deacon, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba,
	Alexandru Elisei, Yabin Cui

Hi folks,

This is version two of the patch that I originally sent here:

  https://lore.kernel.org/r/20260216130959.19317-1-will@kernel.org

Changes since v1 include:

  * Add missing memory barriers and comments for each step
  * Work around known CPU errata in TRBE draining / enabling sequences
  * Fix SPE as well!
  * Minor clean up to BRBE so that it smells the same as SPE and TRBE

As before, I've not been able to test this as-is.

Cheers,

Will

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Yabin Cui <yabinc@google.com>

--->8

Will Deacon (3):
  KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest
    context
  KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines

 arch/arm64/include/asm/kvm_host.h  |   2 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 118 +++++++++++++++++++++++------
 arch/arm64/kvm/hyp/nvhe/switch.c   |   2 +-
 3 files changed, 96 insertions(+), 26 deletions(-)

-- 
2.53.0.473.g4a7958ca14-goog



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
  2026-02-27 21:21 [PATCH v2 0/3] KVM: arm64: Fix SPE and TRBE nVHE world switch Will Deacon
@ 2026-02-27 21:21 ` Will Deacon
  2026-03-03  9:23   ` Suzuki K Poulose
                     ` (2 more replies)
  2026-02-27 21:21 ` [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer " Will Deacon
  2026-02-27 21:21 ` [PATCH v2 3/3] KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines Will Deacon
  2 siblings, 3 replies; 14+ messages in thread
From: Will Deacon @ 2026-02-27 21:21 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, linux-arm-kernel, Will Deacon, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba,
	Alexandru Elisei, Yabin Cui

The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
generation in guest context when self-hosted TRBE is in use by the host.

Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
per R_YCHKJ the Trace Buffer Unit will still be enabled if
TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
Trace Buffer Unit can perform address translation for the "owning
exception level" even when it is out of context.

Consequently, we can end up in a state where TRBE performs speculative
page-table walks for a host VA/IPA in guest/hypervisor context depending
on the value of MDCR_EL2.E2TB, which changes over world-switch. The
potential result appears to be a heady mixture of SErrors, data
corruption and hardware lockups.

Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
draining the buffer, restoring the register on return to the host. This
unfortunately means we need to tackle CPU errata #2064142 and #2038923
which add additional synchronisation requirements around manipulations
of the limit register. Hopefully this doesn't need to be fast.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h  |  1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 73 ++++++++++++++++++++++++++----
 arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
 3 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5d5a3bbdb95e..1532ad2b2ec2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -770,6 +770,7 @@ struct kvm_host_data {
 		u64 pmscr_el1;
 		/* Self-hosted trace */
 		u64 trfcr_el1;
+		u64 trblimitr_el1;
 		/* Values of trap registers for the host before guest entry. */
 		u64 mdcr_el2;
 		u64 brbcr_el1;
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 2a1c0f49792b..3dbdee1148d3 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -57,12 +57,56 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
 	write_sysreg_el1(new_trfcr, SYS_TRFCR);
 }
 
-static bool __trace_needs_drain(void)
+static void __trace_drain_and_disable(void)
 {
-	if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
-		return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
+	u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
 
-	return host_data_test_flag(TRBE_ENABLED);
+	*trblimitr_el1 = 0;
+
+	if (is_protected_kvm_enabled()) {
+		if (!host_data_test_flag(HAS_TRBE))
+			return;
+	} else {
+		if (!host_data_test_flag(TRBE_ENABLED))
+			return;
+	}
+
+	*trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
+	if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
+		/*
+		 * The host has enabled the Trace Buffer Unit so we have
+		 * to beat the CPU with a stick until it stops accessing
+		 * memory.
+		 */
+
+		/* First, ensure that our prior write to TRFCR has stuck. */
+		isb();
+
+		/* Now synchronise with the trace and drain the buffer. */
+		tsb_csync();
+		dsb(nsh);
+
+		/*
+		 * With no more trace being generated, we can disable the
+		 * Trace Buffer Unit.
+		 */
+		write_sysreg_s(0, SYS_TRBLIMITR_EL1);
+		if (cpus_have_final_cap(ARM64_WORKAROUND_2064142)) {
+			/*
+			 * Some CPUs are so good, we have to drain 'em
+			 * twice.
+			 */
+			tsb_csync();
+			dsb(nsh);
+		}
+
+		/*
+		 * Ensure that the Trace Buffer Unit is disabled before
+		 * we start mucking with the stage-2 and trap
+		 * configuration.
+		 */
+		isb();
+	}
 }
 
 static bool __trace_needs_switch(void)
@@ -79,15 +123,26 @@ static void __trace_switch_to_guest(void)
 
 	__trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
 			  *host_data_ptr(trfcr_while_in_guest));
-
-	if (__trace_needs_drain()) {
-		isb();
-		tsb_csync();
-	}
+	__trace_drain_and_disable();
 }
 
 static void __trace_switch_to_host(void)
 {
+	u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
+
+	if (trblimitr_el1 & TRBLIMITR_EL1_E) {
+		/* Re-enable the Trace Buffer Unit for the host. */
+		write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
+		isb();
+		if (cpus_have_final_cap(ARM64_WORKAROUND_2038923)) {
+			/*
+			 * Make sure the unit is re-enabled before we
+			 * poke TRFCR.
+			 */
+			isb();
+		}
+	}
+
 	__trace_do_switch(host_data_ptr(trfcr_while_in_guest),
 			  *host_data_ptr(host_debug_state.trfcr_el1));
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 779089e42681..f00688e69d88 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -278,7 +278,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * We're about to restore some new MMU state. Make sure
 	 * ongoing page-table walks that have started before we
 	 * trapped to EL2 have completed. This also synchronises the
-	 * above disabling of BRBE, SPE and TRBE.
+	 * above disabling of BRBE and SPE.
 	 *
 	 * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
 	 * rule R_LFHQG and subsequent information statements.
-- 
2.53.0.473.g4a7958ca14-goog




* [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  2026-02-27 21:21 [PATCH v2 0/3] KVM: arm64: Fix SPE and TRBE nVHE world switch Will Deacon
  2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
@ 2026-02-27 21:21 ` Will Deacon
  2026-03-03  9:48   ` Suzuki K Poulose
                     ` (2 more replies)
  2026-02-27 21:21 ` [PATCH v2 3/3] KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines Will Deacon
  2 siblings, 3 replies; 14+ messages in thread
From: Will Deacon @ 2026-02-27 21:21 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, linux-arm-kernel, Will Deacon, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba,
	Alexandru Elisei, Yabin Cui

The nVHE world-switch code relies on zeroing PMSCR_EL1 to disable
profiling data generation in guest context when SPE is in use by the
host.

Unfortunately, this may leave PMBLIMITR_EL1.E set and consequently we
can end up running in guest/hypervisor context with the Profiling Buffer
enabled. The current "known issues" document for Rev M.a of the Arm ARM
states that this can lead to speculative, out-of-context translations:

  | 2.18 D23136:
  |
  | When the Profiling Buffer is enabled, profiling is not stopped, and
  | Discard mode is not enabled, the Statistical Profiling Unit might
  | cause speculative translations for the owning translation regime,
  | including when the owning translation regime is out-of-context.

In a similar fashion to TRBE, ensure that the Profiling Buffer is
disabled during the nVHE world switch before we start messing with the
stage-2 MMU and trap configuration.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Fixes: f85279b4bd48 ("arm64: KVM: Save/restore the host SPE state when entering/leaving a VM")
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h  |  1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++----------
 arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1532ad2b2ec2..d527c77977dd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -768,6 +768,7 @@ struct kvm_host_data {
 		struct kvm_guest_debug_arch regs;
 		/* Statistical profiling extension */
 		u64 pmscr_el1;
+		u64 pmblimitr_el1;
 		/* Self-hosted trace */
 		u64 trfcr_el1;
 		u64 trblimitr_el1;
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 3dbdee1148d3..75158a9cd06a 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -14,20 +14,20 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
-static void __debug_save_spe(u64 *pmscr_el1)
+static void __debug_save_spe(void)
 {
-	u64 reg;
+	u64 *pmscr_el1, *pmblimitr_el1;
 
-	/* Clear pmscr in case of early return */
-	*pmscr_el1 = 0;
+	pmscr_el1 = host_data_ptr(host_debug_state.pmscr_el1);
+	pmblimitr_el1 = host_data_ptr(host_debug_state.pmblimitr_el1);
 
 	/*
 	 * At this point, we know that this CPU implements
 	 * SPE and is available to the host.
 	 * Check if the host is actually using it ?
 	 */
-	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
-	if (!(reg & BIT(PMBLIMITR_EL1_E_SHIFT)))
+	*pmblimitr_el1 = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (!(*pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
 		return;
 
 	/* Yes; save the control register and disable data generation */
@@ -37,18 +37,29 @@ static void __debug_save_spe(u64 *pmscr_el1)
 
 	/* Now drain all buffered data to memory */
 	psb_csync();
+	dsb(nsh);
+
+	/* And disable the profiling buffer */
+	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+	isb();
 }
 
-static void __debug_restore_spe(u64 pmscr_el1)
+static void __debug_restore_spe(void)
 {
-	if (!pmscr_el1)
+	u64 pmblimitr_el1 = *host_data_ptr(host_debug_state.pmblimitr_el1);
+
+	if (!(pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
 		return;
 
 	/* The host page table is installed, but not yet synchronised */
 	isb();
 
+	/* Re-enable the profiling buffer. */
+	write_sysreg_s(pmblimitr_el1, SYS_PMBLIMITR_EL1);
+	isb();
+
 	/* Re-enable data generation */
-	write_sysreg_el1(pmscr_el1, SYS_PMSCR);
+	write_sysreg_el1(*host_data_ptr(host_debug_state.pmscr_el1), SYS_PMSCR);
 }
 
 static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
@@ -177,7 +188,7 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
 	/* Disable and flush SPE data generation */
 	if (host_data_test_flag(HAS_SPE))
-		__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
+		__debug_save_spe();
 
 	/* Disable BRBE branch records */
 	if (host_data_test_flag(HAS_BRBE))
@@ -195,7 +206,7 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
 	if (host_data_test_flag(HAS_SPE))
-		__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
+		__debug_restore_spe();
 	if (host_data_test_flag(HAS_BRBE))
 		__debug_restore_brbe(*host_data_ptr(host_debug_state.brbcr_el1));
 	if (__trace_needs_switch())
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index f00688e69d88..9b6e87dac3b9 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -278,7 +278,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * We're about to restore some new MMU state. Make sure
 	 * ongoing page-table walks that have started before we
 	 * trapped to EL2 have completed. This also synchronises the
-	 * above disabling of BRBE and SPE.
+	 * above disabling of BRBE.
 	 *
 	 * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
 	 * rule R_LFHQG and subsequent information statements.
-- 
2.53.0.473.g4a7958ca14-goog




* [PATCH v2 3/3] KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines
  2026-02-27 21:21 [PATCH v2 0/3] KVM: arm64: Fix SPE and TRBE nVHE world switch Will Deacon
  2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
  2026-02-27 21:21 ` [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer " Will Deacon
@ 2026-02-27 21:21 ` Will Deacon
  2026-03-25 19:28   ` Fuad Tabba
  2 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2026-02-27 21:21 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, linux-arm-kernel, Will Deacon, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba,
	Alexandru Elisei, Yabin Cui

Now that the SPE and TRBE nVHE world-switch routines operate on the
host_debug_state directly, tweak the BRBE code to do the same for
consistency.

This is purely cosmetic.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 75158a9cd06a..06ae122a7761 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -158,9 +158,9 @@ static void __trace_switch_to_host(void)
 			  *host_data_ptr(host_debug_state.trfcr_el1));
 }
 
-static void __debug_save_brbe(u64 *brbcr_el1)
+static void __debug_save_brbe(void)
 {
-	*brbcr_el1 = 0;
+	u64 *brbcr_el1 = host_data_ptr(host_debug_state.brbcr_el1);
 
 	/* Check if the BRBE is enabled */
 	if (!(read_sysreg_el1(SYS_BRBCR) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
@@ -175,8 +175,10 @@ static void __debug_save_brbe(u64 *brbcr_el1)
 	write_sysreg_el1(0, SYS_BRBCR);
 }
 
-static void __debug_restore_brbe(u64 brbcr_el1)
+static void __debug_restore_brbe(void)
 {
+	u64 brbcr_el1 = *host_data_ptr(host_debug_state.brbcr_el1);
+
 	if (!brbcr_el1)
 		return;
 
@@ -192,7 +194,7 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 
 	/* Disable BRBE branch records */
 	if (host_data_test_flag(HAS_BRBE))
-		__debug_save_brbe(host_data_ptr(host_debug_state.brbcr_el1));
+		__debug_save_brbe();
 
 	if (__trace_needs_switch())
 		__trace_switch_to_guest();
@@ -208,7 +210,7 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 	if (host_data_test_flag(HAS_SPE))
 		__debug_restore_spe();
 	if (host_data_test_flag(HAS_BRBE))
-		__debug_restore_brbe(*host_data_ptr(host_debug_state.brbcr_el1));
+		__debug_restore_brbe();
 	if (__trace_needs_switch())
 		__trace_switch_to_host();
 }
-- 
2.53.0.473.g4a7958ca14-goog




* Re: [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
  2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
@ 2026-03-03  9:23   ` Suzuki K Poulose
  2026-03-03 17:39   ` Leo Yan
  2026-03-25 19:27   ` Fuad Tabba
  2 siblings, 0 replies; 14+ messages in thread
From: Suzuki K Poulose @ 2026-03-03  9:23 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: mark.rutland, linux-arm-kernel, Marc Zyngier, Oliver Upton,
	James Clark, Leo Yan, Fuad Tabba, Alexandru Elisei, Yabin Cui

On 27/02/2026 21:21, Will Deacon wrote:
> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> generation in guest context when self-hosted TRBE is in use by the host.
> 
> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> Trace Buffer Unit can perform address translation for the "owning
> exception level" even when it is out of context.
> 
> Consequently, we can end up in a state where TRBE performs speculative
> page-table walks for a host VA/IPA in guest/hypervisor context depending
> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> potential result appears to be a heady mixture of SErrors, data
> corruption and hardware lockups.
> 
> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> draining the buffer, restoring the register on return to the host. This
> unfortunately means we need to tackle CPU errata #2064142 and #2038923
> which add additional synchronisation requirements around manipulations
> of the limit register. Hopefully this doesn't need to be fast.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> Signed-off-by: Will Deacon <will@kernel.org>

Thanks for fixing this. Looks correct to me,

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>






* Re: [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  2026-02-27 21:21 ` [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer " Will Deacon
@ 2026-03-03  9:48   ` Suzuki K Poulose
  2026-03-03 14:39     ` Will Deacon
  2026-03-25 16:34   ` Alexandru Elisei
  2026-03-25 19:28   ` Fuad Tabba
  2 siblings, 1 reply; 14+ messages in thread
From: Suzuki K Poulose @ 2026-03-03  9:48 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: mark.rutland, linux-arm-kernel, Marc Zyngier, Oliver Upton,
	James Clark, Leo Yan, Fuad Tabba, Alexandru Elisei, Yabin Cui

On 27/02/2026 21:21, Will Deacon wrote:
> The nVHE world-switch code relies on zeroing PMSCR_EL1 to disable
> profiling data generation in guest context when SPE is in use by the
> host.
> 
> Unfortunately, this may leave PMBLIMITR_EL1.E set and consequently we
> can end up running in guest/hypervisor context with the Profiling Buffer
> enabled. The current "known issues" document for Rev M.a of the Arm ARM
> states that this can lead to speculative, out-of-context translations:
> 
>    | 2.18 D23136:
>    |
>    | When the Profiling Buffer is enabled, profiling is not stopped, and
>    | Discard mode is not enabled, the Statistical Profiling Unit might
>    | cause speculative translations for the owning translation regime,
>    | including when the owning translation regime is out-of-context.
> 
> In a similar fashion to TRBE, ensure that the Profiling Buffer is
> disabled during the nVHE world switch before we start messing with the
> stage-2 MMU and trap configuration.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: f85279b4bd48 ("arm64: KVM: Save/restore the host SPE state when entering/leaving a VM")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_host.h  |  1 +
>   arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++----------
>   arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
>   3 files changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 1532ad2b2ec2..d527c77977dd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -768,6 +768,7 @@ struct kvm_host_data {
>   		struct kvm_guest_debug_arch regs;
>   		/* Statistical profiling extension */
>   		u64 pmscr_el1;
> +		u64 pmblimitr_el1;
>   		/* Self-hosted trace */
>   		u64 trfcr_el1;
>   		u64 trblimitr_el1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 3dbdee1148d3..75158a9cd06a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -14,20 +14,20 @@
>   #include <asm/kvm_hyp.h>
>   #include <asm/kvm_mmu.h>
>   
> -static void __debug_save_spe(u64 *pmscr_el1)
> +static void __debug_save_spe(void)
>   {
> -	u64 reg;
> +	u64 *pmscr_el1, *pmblimitr_el1;
>   
> -	/* Clear pmscr in case of early return */
> -	*pmscr_el1 = 0;
> +	pmscr_el1 = host_data_ptr(host_debug_state.pmscr_el1);
> +	pmblimitr_el1 = host_data_ptr(host_debug_state.pmblimitr_el1);
>   
>   	/*
>   	 * At this point, we know that this CPU implements
>   	 * SPE and is available to the host.
>   	 * Check if the host is actually using it ?
>   	 */
> -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> -	if (!(reg & BIT(PMBLIMITR_EL1_E_SHIFT)))
> +	*pmblimitr_el1 = read_sysreg_s(SYS_PMBLIMITR_EL1);
> +	if (!(*pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>   		return;
>   
>   	/* Yes; save the control register and disable data generation */
> @@ -37,18 +37,29 @@ static void __debug_save_spe(u64 *pmscr_el1)
>   
>   	/* Now drain all buffered data to memory */
>   	psb_csync();
> +	dsb(nsh);
> +
> +	/* And disable the profiling buffer */
> +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> +	isb();
>   }
>   
> -static void __debug_restore_spe(u64 pmscr_el1)
> +static void __debug_restore_spe(void)
>   {
> -	if (!pmscr_el1)
> +	u64 pmblimitr_el1 = *host_data_ptr(host_debug_state.pmblimitr_el1);
> +
> +	if (!(pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>   		return;
>   
>   	/* The host page table is installed, but not yet synchronised */
>   	isb();
>   

minor nit: This seems buried deep down in a helper (with no context of
what else could have happened since the host context has been restored).
For now it looks correct, but it is prone to inadvertent changes causing
issues or making this obsolete. Wouldn't the isb() following the LIMITR
write be sufficient?


Otherwise, looks good to me

Suzuki






* Re: [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  2026-03-03  9:48   ` Suzuki K Poulose
@ 2026-03-03 14:39     ` Will Deacon
  2026-03-03 15:01       ` Suzuki K Poulose
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2026-03-03 14:39 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Fuad Tabba, Alexandru Elisei,
	Yabin Cui

On Tue, Mar 03, 2026 at 09:48:06AM +0000, Suzuki K Poulose wrote:
> On 27/02/2026 21:21, Will Deacon wrote:
> > diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > index 3dbdee1148d3..75158a9cd06a 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > @@ -14,20 +14,20 @@
> >   #include <asm/kvm_hyp.h>
> >   #include <asm/kvm_mmu.h>
> > -static void __debug_save_spe(u64 *pmscr_el1)
> > +static void __debug_save_spe(void)
> >   {
> > -	u64 reg;
> > +	u64 *pmscr_el1, *pmblimitr_el1;
> > -	/* Clear pmscr in case of early return */
> > -	*pmscr_el1 = 0;
> > +	pmscr_el1 = host_data_ptr(host_debug_state.pmscr_el1);
> > +	pmblimitr_el1 = host_data_ptr(host_debug_state.pmblimitr_el1);
> >   	/*
> >   	 * At this point, we know that this CPU implements
> >   	 * SPE and is available to the host.
> >   	 * Check if the host is actually using it ?
> >   	 */
> > -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > -	if (!(reg & BIT(PMBLIMITR_EL1_E_SHIFT)))
> > +	*pmblimitr_el1 = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > +	if (!(*pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
> >   		return;
> >   	/* Yes; save the control register and disable data generation */
> > @@ -37,18 +37,29 @@ static void __debug_save_spe(u64 *pmscr_el1)
> >   	/* Now drain all buffered data to memory */
> >   	psb_csync();
> > +	dsb(nsh);
> > +
> > +	/* And disable the profiling buffer */
> > +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> > +	isb();
> >   }
> > -static void __debug_restore_spe(u64 pmscr_el1)
> > +static void __debug_restore_spe(void)
> >   {
> > -	if (!pmscr_el1)
> > +	u64 pmblimitr_el1 = *host_data_ptr(host_debug_state.pmblimitr_el1);
> > +
> > +	if (!(pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
> >   		return;
> >   	/* The host page table is installed, but not yet synchronised */
> >   	isb();
> 
> minor nit: This seems buried deep down in a helper (with no context of what
> else could have happened since the host context has been restored) and for
> now it looks correct, but it is prone to inadvertent changes causing issues
> or making it obsolete. With the isb() following LIMITR, wouldn't that be
> sufficient?

I'm just inheriting this from the existing upstream code -- see the isb()
in the existing implementation of __debug_restore_spe().

The isb() is needed to ensure that SPE can't start making out-of-context
translation table walks (which can occur once PMBLIMITR_EL1.E is set)
before the stage-2 MMU is restored back to the host configuration (e.g.
by clearing HCR_EL2.VM for nVHE or by restoring VTCR and VTTBR for
pKVM). We want to predicate it on SPE being enabled, otherwise it's
unconditional overhead, so I don't think we can move it.

Will



* Re: [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  2026-03-03 14:39     ` Will Deacon
@ 2026-03-03 15:01       ` Suzuki K Poulose
  0 siblings, 0 replies; 14+ messages in thread
From: Suzuki K Poulose @ 2026-03-03 15:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Fuad Tabba, Alexandru Elisei,
	Yabin Cui

On 03/03/2026 14:39, Will Deacon wrote:
> On Tue, Mar 03, 2026 at 09:48:06AM +0000, Suzuki K Poulose wrote:
>> On 27/02/2026 21:21, Will Deacon wrote:
>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> index 3dbdee1148d3..75158a9cd06a 100644
>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> @@ -14,20 +14,20 @@
>>>    #include <asm/kvm_hyp.h>
>>>    #include <asm/kvm_mmu.h>
>>> -static void __debug_save_spe(u64 *pmscr_el1)
>>> +static void __debug_save_spe(void)
>>>    {
>>> -	u64 reg;
>>> +	u64 *pmscr_el1, *pmblimitr_el1;
>>> -	/* Clear pmscr in case of early return */
>>> -	*pmscr_el1 = 0;
>>> +	pmscr_el1 = host_data_ptr(host_debug_state.pmscr_el1);
>>> +	pmblimitr_el1 = host_data_ptr(host_debug_state.pmblimitr_el1);
>>>    	/*
>>>    	 * At this point, we know that this CPU implements
>>>    	 * SPE and is available to the host.
>>>    	 * Check if the host is actually using it ?
>>>    	 */
>>> -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
>>> -	if (!(reg & BIT(PMBLIMITR_EL1_E_SHIFT)))
>>> +	*pmblimitr_el1 = read_sysreg_s(SYS_PMBLIMITR_EL1);
>>> +	if (!(*pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>>>    		return;
>>>    	/* Yes; save the control register and disable data generation */
>>> @@ -37,18 +37,29 @@ static void __debug_save_spe(u64 *pmscr_el1)
>>>    	/* Now drain all buffered data to memory */
>>>    	psb_csync();
>>> +	dsb(nsh);
>>> +
>>> +	/* And disable the profiling buffer */
>>> +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
>>> +	isb();
>>>    }
>>> -static void __debug_restore_spe(u64 pmscr_el1)
>>> +static void __debug_restore_spe(void)
>>>    {
>>> -	if (!pmscr_el1)
>>> +	u64 pmblimitr_el1 = *host_data_ptr(host_debug_state.pmblimitr_el1);
>>> +
>>> +	if (!(pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>>>    		return;
>>>    	/* The host page table is installed, but not yet synchronised */
>>>    	isb();
>>
>> minor nit: This seems buried deep down in a helper (with no context of what
>> else could have happened since the host context has been restored) and for
>> now it looks correct, but it is prone to inadvertent changes causing issues
>> or making it obsolete. With the isb() following LIMITR, wouldn't that be
>> sufficient?
> 
> I'm just inheriting this from the existing upstream code -- see the isb()
> in the existing implementation of __debug_restore_spe().

Of course, it was. I should have added "not from your change".


Now we have:

__kvm_vcpu_run() {

    ..guest_exit..

    // load_host_stage2() if required -> pkvm.

    __sysreg_restore_state_nvhe() -> __sysreg_restore_el1_state()
        // we restore the "host MMU context" here
    ...
    __debug_restore_host_buffers_nvhe() -> __debug_restore_spe()
        // issue isb() here to sync the host context
}

and the isb() is placed in the SPE restore code. All of this is correct
and I do understand why it is there. My comment was that this is not
easily relatable and is prone to errors from inadvertent changes (e.g.,
moving the SPE restore around). But I agree that avoiding the
unconditional overhead is worth the cost in code readability.

Cheers
Suzuki

> 
> The isb() is needed to ensure that SPE can't start making out-of-context
> translation table walks (which can occur once PMBLIMITR_EL1.E is set)
> before the stage-2 MMU is restored back to the host configuration (e.g.
> by clearing HCR_EL2.VM for nVHE or by restoring VTCR and VTTBR for
> pKVM). We want to predicate it on SPE being enabled, otherwise it's
> unconditional overhead, so I don't think we can move it.
> 
> Will




* Re: [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
  2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
  2026-03-03  9:23   ` Suzuki K Poulose
@ 2026-03-03 17:39   ` Leo Yan
  2026-03-25 19:27   ` Fuad Tabba
  2 siblings, 0 replies; 14+ messages in thread
From: Leo Yan @ 2026-03-03 17:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Suzuki K Poulose, Fuad Tabba,
	Alexandru Elisei, Yabin Cui

On Fri, Feb 27, 2026 at 09:21:33PM +0000, Will Deacon wrote:
> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> generation in guest context when self-hosted TRBE is in use by the host.
> 
> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> Trace Buffer Unit can perform address translation for the "owning
> exception level" even when it is out of context.
> 
> Consequently, we can end up in a state where TRBE performs speculative
> page-table walks for a host VA/IPA in guest/hypervisor context depending
> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> potential result appears to be a heady mixture of SErrors, data
> corruption and hardware lockups.
> 
> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> draining the buffer, restoring the register on return to the host. This
> unfortunately means we need to tackle CPU errata #2064142 and #2038923
> which add additional synchronisation requirements around manipulations
> of the limit register. Hopefully this doesn't need to be fast.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> Signed-off-by: Will Deacon <will@kernel.org>

I tested this on my Orion6 board in nVHE mode (kvm-arm.mode=nvhe).

I launched a VM with several threads running sleep 0.1 in a loop inside
the VM shell.  Then, I collected TRBE trace data on the host side:

  $ perf record -e cs_etm// -a -- sleep 100
  [ perf record: Woken up 74 times to write data ]
  Warning:
  Processed 4798137 events and lost 4 chunks!

  Check IO/CPU overload!

  Warning:
  Processed 9608 samples and lost 100.00%!

  Failed to open /proc/schedstat
  [ perf record: Captured and wrote 42401.333 MB perf.data ]

Tested-by: Leo Yan <leo.yan@arm.com>



* Re: [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  2026-02-27 21:21 ` [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer " Will Deacon
  2026-03-03  9:48   ` Suzuki K Poulose
@ 2026-03-25 16:34   ` Alexandru Elisei
  2026-03-25 19:28   ` Fuad Tabba
  2 siblings, 0 replies; 14+ messages in thread
From: Alexandru Elisei @ 2026-03-25 16:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba,
	Yabin Cui

Hi Will,

The patch looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Also tested the series for about 24h on an Orion O6 board with SPE enabled by
profiling two host processes spread across all CPUs with SPE, and at the same
time profiling a virtual machine with the VCPUs similarly spread across the CPUs
with SPE:

Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

On Fri, Feb 27, 2026 at 09:21:34PM +0000, Will Deacon wrote:
> The nVHE world-switch code relies on zeroing PMSCR_EL1 to disable
> profiling data generation in guest context when SPE is in use by the
> host.
> 
> Unfortunately, this may leave PMBLIMITR_EL1.E set and consequently we
> can end up running in guest/hypervisor context with the Profiling Buffer
> enabled. The current "known issues" document for Rev M.a of the Arm ARM
> states that this can lead to speculative, out-of-context translations:
> 
>   | 2.18 D23136:
>   |
>   | When the Profiling Buffer is enabled, profiling is not stopped, and
>   | Discard mode is not enabled, the Statistical Profiling Unit might
>   | cause speculative translations for the owning translation regime,
>   | including when the owning translation regime is out-of-context.
> 
> In a similar fashion to TRBE, ensure that the Profiling Buffer is
> disabled during the nVHE world switch before we start messing with the
> stage-2 MMU and trap configuration.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: f85279b4bd48 ("arm64: KVM: Save/restore the host SPE state when entering/leaving a VM")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h  |  1 +
>  arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++----------
>  arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
>  3 files changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 1532ad2b2ec2..d527c77977dd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -768,6 +768,7 @@ struct kvm_host_data {
>  		struct kvm_guest_debug_arch regs;
>  		/* Statistical profiling extension */
>  		u64 pmscr_el1;
> +		u64 pmblimitr_el1;
>  		/* Self-hosted trace */
>  		u64 trfcr_el1;
>  		u64 trblimitr_el1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 3dbdee1148d3..75158a9cd06a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -14,20 +14,20 @@
>  #include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>  
> -static void __debug_save_spe(u64 *pmscr_el1)
> +static void __debug_save_spe(void)
>  {
> -	u64 reg;
> +	u64 *pmscr_el1, *pmblimitr_el1;
>  
> -	/* Clear pmscr in case of early return */
> -	*pmscr_el1 = 0;
> +	pmscr_el1 = host_data_ptr(host_debug_state.pmscr_el1);
> +	pmblimitr_el1 = host_data_ptr(host_debug_state.pmblimitr_el1);
>  
>  	/*
>  	 * At this point, we know that this CPU implements
>  	 * SPE and is available to the host.
>  	 * Check if the host is actually using it ?
>  	 */
> -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> -	if (!(reg & BIT(PMBLIMITR_EL1_E_SHIFT)))
> +	*pmblimitr_el1 = read_sysreg_s(SYS_PMBLIMITR_EL1);
> +	if (!(*pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>  		return;
>  
>  	/* Yes; save the control register and disable data generation */
> @@ -37,18 +37,29 @@ static void __debug_save_spe(u64 *pmscr_el1)
>  
>  	/* Now drain all buffered data to memory */
>  	psb_csync();
> +	dsb(nsh);
> +
> +	/* And disable the profiling buffer */
> +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> +	isb();
>  }
>  
> -static void __debug_restore_spe(u64 pmscr_el1)
> +static void __debug_restore_spe(void)
>  {
> -	if (!pmscr_el1)
> +	u64 pmblimitr_el1 = *host_data_ptr(host_debug_state.pmblimitr_el1);
> +
> +	if (!(pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>  		return;
>  
>  	/* The host page table is installed, but not yet synchronised */
>  	isb();
>  
> +	/* Re-enable the profiling buffer. */
> +	write_sysreg_s(pmblimitr_el1, SYS_PMBLIMITR_EL1);
> +	isb();
> +
>  	/* Re-enable data generation */
> -	write_sysreg_el1(pmscr_el1, SYS_PMSCR);
> +	write_sysreg_el1(*host_data_ptr(host_debug_state.pmscr_el1), SYS_PMSCR);
>  }
>  
>  static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> @@ -177,7 +188,7 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	/* Disable and flush SPE data generation */
>  	if (host_data_test_flag(HAS_SPE))
> -		__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
> +		__debug_save_spe();
>  
>  	/* Disable BRBE branch records */
>  	if (host_data_test_flag(HAS_BRBE))
> @@ -195,7 +206,7 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
>  void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	if (host_data_test_flag(HAS_SPE))
> -		__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
> +		__debug_restore_spe();
>  	if (host_data_test_flag(HAS_BRBE))
>  		__debug_restore_brbe(*host_data_ptr(host_debug_state.brbcr_el1));
>  	if (__trace_needs_switch())
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index f00688e69d88..9b6e87dac3b9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -278,7 +278,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  	 * We're about to restore some new MMU state. Make sure
>  	 * ongoing page-table walks that have started before we
>  	 * trapped to EL2 have completed. This also synchronises the
> -	 * above disabling of BRBE and SPE.
> +	 * above disabling of BRBE.
>  	 *
>  	 * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
>  	 * rule R_LFHQG and subsequent information statements.
> -- 
> 2.53.0.473.g4a7958ca14-goog
> 



* Re: [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
  2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
  2026-03-03  9:23   ` Suzuki K Poulose
  2026-03-03 17:39   ` Leo Yan
@ 2026-03-25 19:27   ` Fuad Tabba
  2026-03-26 12:49     ` Will Deacon
  2 siblings, 1 reply; 14+ messages in thread
From: Fuad Tabba @ 2026-03-25 19:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose,
	Alexandru Elisei, Yabin Cui

Hi Will,

Now that I have backported this to various Android versions....

On Fri, 27 Feb 2026 at 21:22, Will Deacon <will@kernel.org> wrote:
>
> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> generation in guest context when self-hosted TRBE is in use by the host.
>
> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> Trace Buffer Unit can perform address translation for the "owning
> exception level" even when it is out of context.
>
> Consequently, we can end up in a state where TRBE performs speculative
> page-table walks for a host VA/IPA in guest/hypervisor context depending
> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> potential result appears to be a heady mixture of SErrors, data
> corruption and hardware lockups.
>
> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> draining the buffer, restoring the register on return to the host. This
> unfortunately means we need to tackle CPU errata #2064142 and #2038923
> which add additional synchronisation requirements around manipulations
> of the limit register. Hopefully this doesn't need to be fast.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h  |  1 +
>  arch/arm64/kvm/hyp/nvhe/debug-sr.c | 73 ++++++++++++++++++++++++++----
>  arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
>  3 files changed, 66 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 5d5a3bbdb95e..1532ad2b2ec2 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -770,6 +770,7 @@ struct kvm_host_data {
>                 u64 pmscr_el1;
>                 /* Self-hosted trace */
>                 u64 trfcr_el1;
> +               u64 trblimitr_el1;
>                 /* Values of trap registers for the host before guest entry. */
>                 u64 mdcr_el2;
>                 u64 brbcr_el1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 2a1c0f49792b..3dbdee1148d3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -57,12 +57,56 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>         write_sysreg_el1(new_trfcr, SYS_TRFCR);
>  }
>
> -static bool __trace_needs_drain(void)
> +static void __trace_drain_and_disable(void)
>  {
> -       if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> -               return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> +       u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>
> -       return host_data_test_flag(TRBE_ENABLED);
> +       *trblimitr_el1 = 0;
> +
> +       if (is_protected_kvm_enabled()) {
> +               if (!host_data_test_flag(HAS_TRBE))
> +                       return;
> +       } else {
> +               if (!host_data_test_flag(TRBE_ENABLED))
> +                       return;
> +       }

Can we simplify this? e.g.,

+      bool needs_drain = is_protected_kvm_enabled() ?
+                         host_data_test_flag(HAS_TRBE) :
+                         host_data_test_flag(TRBE_ENABLED);
....

That said:

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> +
> +       *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> +       if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> +               /*
> +                * The host has enabled the Trace Buffer Unit so we have
> +                * to beat the CPU with a stick until it stops accessing
> +                * memory.
> +                */
> +
> +               /* First, ensure that our prior write to TRFCR has stuck. */
> +               isb();
> +
> +               /* Now synchronise with the trace and drain the buffer. */
> +               tsb_csync();
> +               dsb(nsh);
> +
> +               /*
> +                * With no more trace being generated, we can disable the
> +                * Trace Buffer Unit.
> +                */
> +               write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> +               if (cpus_have_final_cap(ARM64_WORKAROUND_2064142)) {
> +                       /*
> +                        * Some CPUs are so good, we have to drain 'em
> +                        * twice.
> +                        */
> +                       tsb_csync();
> +                       dsb(nsh);
> +               }
> +
> +               /*
> +                * Ensure that the Trace Buffer Unit is disabled before
> +                * we start mucking with the stage-2 and trap
> +                * configuration.
> +                */
> +               isb();
> +       }
>  }
>
>  static bool __trace_needs_switch(void)
> @@ -79,15 +123,26 @@ static void __trace_switch_to_guest(void)
>
>         __trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
>                           *host_data_ptr(trfcr_while_in_guest));
> -
> -       if (__trace_needs_drain()) {
> -               isb();
> -               tsb_csync();
> -       }
> +       __trace_drain_and_disable();
>  }
>
>  static void __trace_switch_to_host(void)
>  {
> +       u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
> +
> +       if (trblimitr_el1 & TRBLIMITR_EL1_E) {
> +               /* Re-enable the Trace Buffer Unit for the host. */
> +               write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
> +               isb();
> +               if (cpus_have_final_cap(ARM64_WORKAROUND_2038923)) {
> +                       /*
> +                        * Make sure the unit is re-enabled before we
> +                        * poke TRFCR.
> +                        */
> +                       isb();
> +               }
> +       }
> +
>         __trace_do_switch(host_data_ptr(trfcr_while_in_guest),
>                           *host_data_ptr(host_debug_state.trfcr_el1));
>  }
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index 779089e42681..f00688e69d88 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -278,7 +278,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>          * We're about to restore some new MMU state. Make sure
>          * ongoing page-table walks that have started before we
>          * trapped to EL2 have completed. This also synchronises the
> -        * above disabling of BRBE, SPE and TRBE.
> +        * above disabling of BRBE and SPE.
>          *
>          * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
>          * rule R_LFHQG and subsequent information statements.
> --
> 2.53.0.473.g4a7958ca14-goog
>



* Re: [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer when running in guest context
  2026-02-27 21:21 ` [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer " Will Deacon
  2026-03-03  9:48   ` Suzuki K Poulose
  2026-03-25 16:34   ` Alexandru Elisei
@ 2026-03-25 19:28   ` Fuad Tabba
  2 siblings, 0 replies; 14+ messages in thread
From: Fuad Tabba @ 2026-03-25 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose,
	Alexandru Elisei, Yabin Cui

On Fri, 27 Feb 2026 at 21:22, Will Deacon <will@kernel.org> wrote:
>
> The nVHE world-switch code relies on zeroing PMSCR_EL1 to disable
> profiling data generation in guest context when SPE is in use by the
> host.
>
> Unfortunately, this may leave PMBLIMITR_EL1.E set and consequently we
> can end up running in guest/hypervisor context with the Profiling Buffer
> enabled. The current "known issues" document for Rev M.a of the Arm ARM
> states that this can lead to speculative, out-of-context translations:
>
>   | 2.18 D23136:
>   |
>   | When the Profiling Buffer is enabled, profiling is not stopped, and
>   | Discard mode is not enabled, the Statistical Profiling Unit might
>   | cause speculative translations for the owning translation regime,
>   | including when the owning translation regime is out-of-context.
>
> In a similar fashion to TRBE, ensure that the Profiling Buffer is
> disabled during the nVHE world switch before we start messing with the
> stage-2 MMU and trap configuration.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Fixes: f85279b4bd48 ("arm64: KVM: Save/restore the host SPE state when entering/leaving a VM")
> Signed-off-by: Will Deacon <will@kernel.org>

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/include/asm/kvm_host.h  |  1 +
>  arch/arm64/kvm/hyp/nvhe/debug-sr.c | 33 ++++++++++++++++++++----------
>  arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
>  3 files changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 1532ad2b2ec2..d527c77977dd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -768,6 +768,7 @@ struct kvm_host_data {
>                 struct kvm_guest_debug_arch regs;
>                 /* Statistical profiling extension */
>                 u64 pmscr_el1;
> +               u64 pmblimitr_el1;
>                 /* Self-hosted trace */
>                 u64 trfcr_el1;
>                 u64 trblimitr_el1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 3dbdee1148d3..75158a9cd06a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -14,20 +14,20 @@
>  #include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>
> -static void __debug_save_spe(u64 *pmscr_el1)
> +static void __debug_save_spe(void)
>  {
> -       u64 reg;
> +       u64 *pmscr_el1, *pmblimitr_el1;
>
> -       /* Clear pmscr in case of early return */
> -       *pmscr_el1 = 0;
> +       pmscr_el1 = host_data_ptr(host_debug_state.pmscr_el1);
> +       pmblimitr_el1 = host_data_ptr(host_debug_state.pmblimitr_el1);
>
>         /*
>          * At this point, we know that this CPU implements
>          * SPE and is available to the host.
>          * Check if the host is actually using it ?
>          */
> -       reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> -       if (!(reg & BIT(PMBLIMITR_EL1_E_SHIFT)))
> +       *pmblimitr_el1 = read_sysreg_s(SYS_PMBLIMITR_EL1);
> +       if (!(*pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>                 return;
>
>         /* Yes; save the control register and disable data generation */
> @@ -37,18 +37,29 @@ static void __debug_save_spe(u64 *pmscr_el1)
>
>         /* Now drain all buffered data to memory */
>         psb_csync();
> +       dsb(nsh);
> +
> +       /* And disable the profiling buffer */
> +       write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> +       isb();
>  }
>
> -static void __debug_restore_spe(u64 pmscr_el1)
> +static void __debug_restore_spe(void)
>  {
> -       if (!pmscr_el1)
> +       u64 pmblimitr_el1 = *host_data_ptr(host_debug_state.pmblimitr_el1);
> +
> +       if (!(pmblimitr_el1 & BIT(PMBLIMITR_EL1_E_SHIFT)))
>                 return;
>
>         /* The host page table is installed, but not yet synchronised */
>         isb();
>
> +       /* Re-enable the profiling buffer. */
> +       write_sysreg_s(pmblimitr_el1, SYS_PMBLIMITR_EL1);
> +       isb();
> +
>         /* Re-enable data generation */
> -       write_sysreg_el1(pmscr_el1, SYS_PMSCR);
> +       write_sysreg_el1(*host_data_ptr(host_debug_state.pmscr_el1), SYS_PMSCR);
>  }
>
>  static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> @@ -177,7 +188,7 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>  {
>         /* Disable and flush SPE data generation */
>         if (host_data_test_flag(HAS_SPE))
> -               __debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
> +               __debug_save_spe();
>
>         /* Disable BRBE branch records */
>         if (host_data_test_flag(HAS_BRBE))
> @@ -195,7 +206,7 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
>  void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>  {
>         if (host_data_test_flag(HAS_SPE))
> -               __debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
> +               __debug_restore_spe();
>         if (host_data_test_flag(HAS_BRBE))
>                 __debug_restore_brbe(*host_data_ptr(host_debug_state.brbcr_el1));
>         if (__trace_needs_switch())
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index f00688e69d88..9b6e87dac3b9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -278,7 +278,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>          * We're about to restore some new MMU state. Make sure
>          * ongoing page-table walks that have started before we
>          * trapped to EL2 have completed. This also synchronises the
> -        * above disabling of BRBE and SPE.
> +        * above disabling of BRBE.
>          *
>          * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
>          * rule R_LFHQG and subsequent information statements.
> --
> 2.53.0.473.g4a7958ca14-goog
>



* Re: [PATCH v2 3/3] KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines
  2026-02-27 21:21 ` [PATCH v2 3/3] KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines Will Deacon
@ 2026-03-25 19:28   ` Fuad Tabba
  0 siblings, 0 replies; 14+ messages in thread
From: Fuad Tabba @ 2026-03-25 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose,
	Alexandru Elisei, Yabin Cui

On Fri, 27 Feb 2026 at 21:22, Will Deacon <will@kernel.org> wrote:
>
> Now that the SPE and BRBE nVHE world-switch routines operate on the
> host_debug_state directly, tweak the BRBE code to do the same for
> consistency.
>
> This is purely cosmetic.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 75158a9cd06a..06ae122a7761 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -158,9 +158,9 @@ static void __trace_switch_to_host(void)
>                           *host_data_ptr(host_debug_state.trfcr_el1));
>  }
>
> -static void __debug_save_brbe(u64 *brbcr_el1)
> +static void __debug_save_brbe(void)
>  {
> -       *brbcr_el1 = 0;
> +       u64 *brbcr_el1 = host_data_ptr(host_debug_state.brbcr_el1);
>
>         /* Check if the BRBE is enabled */
>         if (!(read_sysreg_el1(SYS_BRBCR) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
> @@ -175,8 +175,10 @@ static void __debug_save_brbe(u64 *brbcr_el1)
>         write_sysreg_el1(0, SYS_BRBCR);
>  }
>
> -static void __debug_restore_brbe(u64 brbcr_el1)
> +static void __debug_restore_brbe(void)
>  {
> +       u64 brbcr_el1 = *host_data_ptr(host_debug_state.brbcr_el1);
> +
>         if (!brbcr_el1)
>                 return;
>
> @@ -192,7 +194,7 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>
>         /* Disable BRBE branch records */
>         if (host_data_test_flag(HAS_BRBE))
> -               __debug_save_brbe(host_data_ptr(host_debug_state.brbcr_el1));
> +               __debug_save_brbe();
>
>         if (__trace_needs_switch())
>                 __trace_switch_to_guest();
> @@ -208,7 +210,7 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>         if (host_data_test_flag(HAS_SPE))
>                 __debug_restore_spe();
>         if (host_data_test_flag(HAS_BRBE))
> -               __debug_restore_brbe(*host_data_ptr(host_debug_state.brbcr_el1));
> +               __debug_restore_brbe();
>         if (__trace_needs_switch())
>                 __trace_switch_to_host();
>  }
> --
> 2.53.0.473.g4a7958ca14-goog
>



* Re: [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
  2026-03-25 19:27   ` Fuad Tabba
@ 2026-03-26 12:49     ` Will Deacon
  0 siblings, 0 replies; 14+ messages in thread
From: Will Deacon @ 2026-03-26 12:49 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose,
	Alexandru Elisei, Yabin Cui

On Wed, Mar 25, 2026 at 07:27:32PM +0000, Fuad Tabba wrote:
> On Fri, 27 Feb 2026 at 21:22, Will Deacon <will@kernel.org> wrote:
> >
> > The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> > generation in guest context when self-hosted TRBE is in use by the host.
> >
> > Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> > TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> > per R_YCHKJ the Trace Buffer Unit will still be enabled if
> > TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> > Trace Buffer Unit can perform address translation for the "owning
> > exception level" even when it is out of context.
> >
> > Consequently, we can end up in a state where TRBE performs speculative
> > page-table walks for a host VA/IPA in guest/hypervisor context depending
> > on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> > potential result appears to be a heady mixture of SErrors, data
> > corruption and hardware lockups.
> >
> > Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> > draining the buffer, restoring the register on return to the host. This
> > unfortunately means we need to tackle CPU errata #2064142 and #2038923
> > which add additional synchronisation requirements around manipulations
> > of the limit register. Hopefully this doesn't need to be fast.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Oliver Upton <oupton@kernel.org>
> > Cc: James Clark <james.clark@linaro.org>
> > Cc: Leo Yan <leo.yan@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Fuad Tabba <tabba@google.com>
> > Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> > Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h  |  1 +
> >  arch/arm64/kvm/hyp/nvhe/debug-sr.c | 73 ++++++++++++++++++++++++++----
> >  arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
> >  3 files changed, 66 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 5d5a3bbdb95e..1532ad2b2ec2 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -770,6 +770,7 @@ struct kvm_host_data {
> >                 u64 pmscr_el1;
> >                 /* Self-hosted trace */
> >                 u64 trfcr_el1;
> > +               u64 trblimitr_el1;
> >                 /* Values of trap registers for the host before guest entry. */
> >                 u64 mdcr_el2;
> >                 u64 brbcr_el1;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > index 2a1c0f49792b..3dbdee1148d3 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > @@ -57,12 +57,56 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> >         write_sysreg_el1(new_trfcr, SYS_TRFCR);
> >  }
> >
> > -static bool __trace_needs_drain(void)
> > +static void __trace_drain_and_disable(void)
> >  {
> > -       if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> > -               return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> > +       u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> >
> > -       return host_data_test_flag(TRBE_ENABLED);
> > +       *trblimitr_el1 = 0;
> > +
> > +       if (is_protected_kvm_enabled()) {
> > +               if (!host_data_test_flag(HAS_TRBE))
> > +                       return;
> > +       } else {
> > +               if (!host_data_test_flag(TRBE_ENABLED))
> > +                       return;
> > +       }
> 
> Can we simplify this? e.g.,
> 
> +       bool needs_drain = is_protected_kvm_enabled() ?
> +                          host_data_test_flag(HAS_TRBE) :
> +                          host_data_test_flag(TRBE_ENABLED);
> ....

Good idea. I tend to avoid 'bool's as they often make the code less
readable in my experience, but in this case it would be a lot better
than the nested conditionals I have. I'll spin a v3!

> That said:
> 
> Tested-by: Fuad Tabba <tabba@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,

Will



end of thread, other threads:[~2026-03-26 12:50 UTC | newest]

Thread overview: 14+ messages
2026-02-27 21:21 [PATCH v2 0/3] KVM: arm64: Fix SPE and TRBE nVHE world switch Will Deacon
2026-02-27 21:21 ` [PATCH v2 1/3] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
2026-03-03  9:23   ` Suzuki K Poulose
2026-03-03 17:39   ` Leo Yan
2026-03-25 19:27   ` Fuad Tabba
2026-03-26 12:49     ` Will Deacon
2026-02-27 21:21 ` [PATCH v2 2/3] KVM: arm64: Disable SPE Profiling Buffer " Will Deacon
2026-03-03  9:48   ` Suzuki K Poulose
2026-03-03 14:39     ` Will Deacon
2026-03-03 15:01       ` Suzuki K Poulose
2026-03-25 16:34   ` Alexandru Elisei
2026-03-25 19:28   ` Fuad Tabba
2026-02-27 21:21 ` [PATCH v2 3/3] KVM: arm64: Don't pass host_debug_state to BRBE world-switch routines Will Deacon
2026-03-25 19:28   ` Fuad Tabba
