* [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
@ 2026-02-16 13:09 Will Deacon
2026-02-16 14:29 ` Marc Zyngier
2026-02-16 15:13 ` James Clark
0 siblings, 2 replies; 31+ messages in thread
From: Will Deacon @ 2026-02-16 13:09 UTC
To: kvmarm
Cc: mark.rutland, linux-arm-kernel, Will Deacon, Marc Zyngier,
Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba
The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
generation in guest context when self-hosted TRBE is in use by the host.

Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
per R_YCHKJ the Trace Buffer Unit will still be enabled if
TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
Trace Buffer Unit can perform address translation for the "owning
exception level" even when it is out of context.

Consequently, we can end up in a state where TRBE performs speculative
page-table walks for a host VA/IPA in guest/hypervisor context depending
on the value of MDCR_EL2.E2TB, which changes over world-switch. The
result appears to be a heady mixture of data corruption and hardware
lockups.

Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
draining the buffer, restoring the register on return to the host.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
Signed-off-by: Will Deacon <will@kernel.org>
---
NOTE: This is *untested* as I don't have a TRBE-capable device that can
run upstream but I noticed this by inspection when triaging occasional
hardware lockups on systems using a 6.12-based kernel with TRBE running
at the same time as a vCPU is loaded. This code has changed quite a bit
over time, so stable backports are not entirely straightforward.
Hopefully James/Leo/Suzuki can help us test if folks agree with the
general approach taken here.
 arch/arm64/include/asm/kvm_host.h  |  1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ac7f970c7883..a932cf043b83 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -746,6 +746,7 @@ struct kvm_host_data {
 		u64 pmscr_el1;
 		/* Self-hosted trace */
 		u64 trfcr_el1;
+		u64 trblimitr_el1;
 		/* Values of trap registers for the host before guest entry. */
 		u64 mdcr_el2;
 		u64 brbcr_el1;
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 2a1c0f49792b..fd389a26bc59 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
 	write_sysreg_el1(new_trfcr, SYS_TRFCR);
 }
 
-static bool __trace_needs_drain(void)
+static void __trace_drain_and_disable(void)
 {
-	if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
-		return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
+	u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
 
-	return host_data_test_flag(TRBE_ENABLED);
+	*trblimitr_el1 = 0;
+
+	if (is_protected_kvm_enabled()) {
+		if (!host_data_test_flag(HAS_TRBE))
+			return;
+	} else {
+		if (!host_data_test_flag(TRBE_ENABLED))
+			return;
+	}
+
+	*trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
+	if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
+		isb();
+		tsb_csync();
+		write_sysreg_s(0, SYS_TRBLIMITR_EL1);
+		isb();
+	}
 }
 
 static bool __trace_needs_switch(void)
@@ -79,15 +94,18 @@ static void __trace_switch_to_guest(void)
 
 	__trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
 			  *host_data_ptr(trfcr_while_in_guest));
-
-	if (__trace_needs_drain()) {
-		isb();
-		tsb_csync();
-	}
+	__trace_drain_and_disable();
 }
 
 static void __trace_switch_to_host(void)
 {
+	u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
+
+	if (trblimitr_el1 & TRBLIMITR_EL1_E) {
+		write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
+		isb();
+	}
+
 	__trace_do_switch(host_data_ptr(trfcr_while_in_guest),
 			  *host_data_ptr(host_debug_state.trfcr_el1));
 }
--
2.53.0.273.g2a3d683680-goog
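
For readers following the diff, the save/disable/restore sequence it
introduces can be sketched as a small user-space model. This is an
illustration only, not kernel code: struct model, the MODEL_ macro and the
model_* functions are hypothetical names, the system register is replaced
by a plain variable, and the isb()/tsb_csync() barriers appear only as
comments. The only architectural detail assumed is that TRBLIMITR_EL1.E is
bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: models TRBLIMITR_EL1.E, taken to be bit 0 of the register. */
#define MODEL_TRBLIMITR_E (UINT64_C(1) << 0)

/* Hypothetical stand-ins for the hardware register and per-CPU host state. */
struct model {
	uint64_t hw_trblimitr;    /* models the TRBLIMITR_EL1 system register */
	uint64_t saved_trblimitr; /* models host_debug_state.trblimitr_el1 */
	bool protected_kvm;       /* models is_protected_kvm_enabled() */
	bool has_trbe;            /* models the HAS_TRBE host flag */
	bool trbe_enabled;        /* models the TRBE_ENABLED host flag */
};

/* Mirrors the control flow of __trace_drain_and_disable() in the patch. */
static void model_switch_to_guest(struct model *m)
{
	m->saved_trblimitr = 0;

	if (m->protected_kvm) {
		if (!m->has_trbe)
			return;
	} else {
		if (!m->trbe_enabled)
			return;
	}

	m->saved_trblimitr = m->hw_trblimitr;
	if (m->saved_trblimitr & MODEL_TRBLIMITR_E) {
		/* Kernel code: isb(); tsb_csync(); before the write, isb(); after. */
		m->hw_trblimitr = 0;
	}
}

/* Mirrors the TRBLIMITR_EL1 restore added to __trace_switch_to_host(). */
static void model_switch_to_host(struct model *m)
{
	if (m->saved_trblimitr & MODEL_TRBLIMITR_E)
		m->hw_trblimitr = m->saved_trblimitr;
}

/* One guest entry/exit round trip with TRBE active on the host. */
static void model_demo(void)
{
	struct model m = {
		.hw_trblimitr = 0x1000 | MODEL_TRBLIMITR_E,
		.trbe_enabled = true,
	};

	model_switch_to_guest(&m);
	assert(!(m.hw_trblimitr & MODEL_TRBLIMITR_E)); /* off while in guest */

	model_switch_to_host(&m);
	assert(m.hw_trblimitr == (0x1000 | MODEL_TRBLIMITR_E)); /* restored */
}
```

Calling model_demo() from a test harness exercises one round trip; the
model says nothing about barrier ordering or about how the hardware behaves
while the unit is disabled.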
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 13:09 [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
@ 2026-02-16 14:29 ` Marc Zyngier
2026-02-16 15:05 ` James Clark
` (2 more replies)
2026-02-16 15:13 ` James Clark
1 sibling, 3 replies; 31+ messages in thread
From: Marc Zyngier @ 2026-02-16 14:29 UTC
To: Will Deacon
Cc: kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton, James Clark,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, 16 Feb 2026 13:09:59 +0000,
Will Deacon <will@kernel.org> wrote:
>
> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> generation in guest context when self-hosted TRBE is in use by the host.
>
> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> Trace Buffer Unit can perform address translation for the "owning
> exception level" even when it is out of context.
Great. So TRBE violates all the principles that we hold true in the
architecture. Does SPE suffer from the same level of brokenness?
> Consequently, we can end up in a state where TRBE performs speculative
> page-table walks for a host VA/IPA in guest/hypervisor context depending
> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> result appears to be a heady mixture of data corruption and hardware
> lockups.
>
> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> draining the buffer, restoring the register on return to the host.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>
> NOTE: This is *untested* as I don't have a TRBE-capable device that can
> run upstream but I noticed this by inspection when triaging occasional
> hardware lockups on systems using a 6.12-based kernel with TRBE running
> at the same time as a vCPU is loaded. This code has changed quite a bit
> over time, so stable backports are not entirely straightforward.
> Hopefully James/Leo/Suzuki can help us test if folks agree with the
> general approach taken here.
>
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> 2 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index ac7f970c7883..a932cf043b83 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -746,6 +746,7 @@ struct kvm_host_data {
> u64 pmscr_el1;
> /* Self-hosted trace */
> u64 trfcr_el1;
> + u64 trblimitr_el1;
> /* Values of trap registers for the host before guest entry. */
> u64 mdcr_el2;
> u64 brbcr_el1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 2a1c0f49792b..fd389a26bc59 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> write_sysreg_el1(new_trfcr, SYS_TRFCR);
> }
>
> -static bool __trace_needs_drain(void)
> +static void __trace_drain_and_disable(void)
> {
> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>
> - return host_data_test_flag(TRBE_ENABLED);
> + *trblimitr_el1 = 0;
> +
> + if (is_protected_kvm_enabled()) {
> + if (!host_data_test_flag(HAS_TRBE))
> + return;
> + } else {
> + if (!host_data_test_flag(TRBE_ENABLED))
> + return;
> + }
> +
> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> + isb();
> + tsb_csync();
> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> + isb();
> + }
Doesn't this mean we should be able to get rid of most of the TRFCR
messing about that litters the entry/exit code and leave that to VHE
only? And even then, I'm tempted to simply get rid of any sort of
guest-only tracing, given that TRBE is not capable of representing
exceptions that are synthesised by the host, making the resulting
traces useless.
I'm still trying to get my hands on a TRBE-enabled system that has
some actual firmware tables (my O6 seems to have the HW, but no
description of the required coresight infra).
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 14:29 ` Marc Zyngier
@ 2026-02-16 15:05 ` James Clark
2026-02-16 15:51 ` Marc Zyngier
2026-02-16 18:14 ` Will Deacon
2026-02-16 15:53 ` Alexandru Elisei
2026-02-16 17:32 ` Will Deacon
2 siblings, 2 replies; 31+ messages in thread
From: James Clark @ 2026-02-16 15:05 UTC
To: Marc Zyngier, Will Deacon
Cc: kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton, Leo Yan,
Suzuki K Poulose, Fuad Tabba
On 16/02/2026 2:29 pm, Marc Zyngier wrote:
> On Mon, 16 Feb 2026 13:09:59 +0000,
> Will Deacon <will@kernel.org> wrote:
>>
>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
>> generation in guest context when self-hosted TRBE is in use by the host.
>>
>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
>> Trace Buffer Unit can perform address translation for the "owning
>> exception level" even when it is out of context.
>
> Great. So TRBE violates all the principles that we hold true in the
> architecture. Does SPE suffer from the same level of brokenness?
>
>> Consequently, we can end up in a state where TRBE performs speculative
>> page-table walks for a host VA/IPA in guest/hypervisor context depending
>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
>> result appears to be a heady mixture of data corruption and hardware
>> lockups.
>>
>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
>> draining the buffer, restoring the register on return to the host.
>>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Oliver Upton <oupton@kernel.org>
>> Cc: James Clark <james.clark@linaro.org>
>> Cc: Leo Yan <leo.yan@arm.com>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Cc: Fuad Tabba <tabba@google.com>
>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
>> Signed-off-by: Will Deacon <will@kernel.org>
>> ---
>>
>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
>> run upstream but I noticed this by inspection when triaging occasional
>> hardware lockups on systems using a 6.12-based kernel with TRBE running
>> at the same time as a vCPU is loaded. This code has changed quite a bit
>> over time, so stable backports are not entirely straightforward.
>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
>> general approach taken here.
>>
>> arch/arm64/include/asm/kvm_host.h | 1 +
>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
>> 2 files changed, 28 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index ac7f970c7883..a932cf043b83 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -746,6 +746,7 @@ struct kvm_host_data {
>> u64 pmscr_el1;
>> /* Self-hosted trace */
>> u64 trfcr_el1;
>> + u64 trblimitr_el1;
>> /* Values of trap registers for the host before guest entry. */
>> u64 mdcr_el2;
>> u64 brbcr_el1;
>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> index 2a1c0f49792b..fd389a26bc59 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>> write_sysreg_el1(new_trfcr, SYS_TRFCR);
>> }
>>
>> -static bool __trace_needs_drain(void)
>> +static void __trace_drain_and_disable(void)
>> {
>> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
>> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
>> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>>
>> - return host_data_test_flag(TRBE_ENABLED);
>> + *trblimitr_el1 = 0;
>> +
>> + if (is_protected_kvm_enabled()) {
>> + if (!host_data_test_flag(HAS_TRBE))
>> + return;
>> + } else {
>> + if (!host_data_test_flag(TRBE_ENABLED))
>> + return;
>> + }
>> +
>> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
>> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
>> + isb();
>> + tsb_csync();
>> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>> + isb();
The TRBE driver might do an extra drain here as a workaround. Hard to
tell if it's actually required in this case (seems like probably not)
but it might be worth doing it anyway to avoid hitting the issue.
Especially if we add guest support later where some of the affected
registers might start being used. See:
if (trbe_needs_drain_after_disable(cpudata))
trbe_drain_buffer();
>> + }
>
> Doesn't this mean we should be able to get rid of most of the TRFCR
> messing about that litters the entry/exit code and leave that to VHE
Technically you could have ETMs that are connected to sinks other
than TRBE. Unless you somehow switch off those sinks you still need to
do the TRFCR switching stuff.
> only? And even then, I'm tempted to simply get rid of any sort of
> guest-only tracing, given that TRBE is not capable of representing
> exceptions that are synthesised by the host, making the resulting
> traces useless.
I haven't heard of anyone tracing a guest from the host, but until we
add support for guests to be able to trace themselves it's the only way
of doing it, so it could be useful. Although all the messing around with
TRFCR before was from a request to disable guest trace rather than
enable it, as it was always on in nVHE.
>
> I'm still trying to get my hands on a TRBE-enabled system that has
> some actual firmware tables (my O6 seems to have the HW, but no
> description of the required coresight infra).
Leo has a guide, I haven't tried it myself yet but it should be working.
I'll send it to you.
James
>
> Thanks,
>
> M.
>
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 13:09 [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
2026-02-16 14:29 ` Marc Zyngier
@ 2026-02-16 15:13 ` James Clark
2026-02-16 17:05 ` Will Deacon
1 sibling, 1 reply; 31+ messages in thread
From: James Clark @ 2026-02-16 15:13 UTC
To: Will Deacon, kvmarm
Cc: mark.rutland, linux-arm-kernel, Marc Zyngier, Oliver Upton,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On 16/02/2026 1:09 pm, Will Deacon wrote:
> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> generation in guest context when self-hosted TRBE is in use by the host.
>
> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> Trace Buffer Unit can perform address translation for the "owning
> exception level" even when it is out of context.
>
> Consequently, we can end up in a state where TRBE performs speculative
> page-table walks for a host VA/IPA in guest/hypervisor context depending
> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> result appears to be a heady mixture of data corruption and hardware
> lockups.
>
> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> draining the buffer, restoring the register on return to the host.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Oliver Upton <oupton@kernel.org>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Leo Yan <leo.yan@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Fuad Tabba <tabba@google.com>
> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>
> NOTE: This is *untested* as I don't have a TRBE-capable device that can
> run upstream but I noticed this by inspection when triaging occasional
> hardware lockups on systems using a 6.12-based kernel with TRBE running
> at the same time as a vCPU is loaded. This code has changed quite a bit
> over time, so stable backports are not entirely straightforward.
> Hopefully James/Leo/Suzuki can help us test if folks agree with the
> general approach taken here.
>
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> 2 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index ac7f970c7883..a932cf043b83 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -746,6 +746,7 @@ struct kvm_host_data {
> u64 pmscr_el1;
> /* Self-hosted trace */
> u64 trfcr_el1;
> + u64 trblimitr_el1;
> /* Values of trap registers for the host before guest entry. */
> u64 mdcr_el2;
> u64 brbcr_el1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> index 2a1c0f49792b..fd389a26bc59 100644
> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> write_sysreg_el1(new_trfcr, SYS_TRFCR);
> }
>
> -static bool __trace_needs_drain(void)
> +static void __trace_drain_and_disable(void)
> {
> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>
> - return host_data_test_flag(TRBE_ENABLED);
> + *trblimitr_el1 = 0;
> +
> + if (is_protected_kvm_enabled()) {
> + if (!host_data_test_flag(HAS_TRBE))
> + return;
> + } else {
> + if (!host_data_test_flag(TRBE_ENABLED))
> + return;
> + }
> +
> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> + isb();
> + tsb_csync();
> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> + isb();
> + }
> }
>
> static bool __trace_needs_switch(void)
> @@ -79,15 +94,18 @@ static void __trace_switch_to_guest(void)
>
> __trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
> *host_data_ptr(trfcr_while_in_guest));
> -
> - if (__trace_needs_drain()) {
> - isb();
> - tsb_csync();
> - }
> + __trace_drain_and_disable();
> }
>
> static void __trace_switch_to_host(void)
> {
> + u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
> +
> + if (trblimitr_el1 & TRBLIMITR_EL1_E) {
> + write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
Will this restore a stale value if you do kvm_enable_trbe() then later
kvm_disable_trbe()? Looks like the read and save will be skipped unless
host_data_test_flag(TRBE_ENABLED) is true, so it will never save a
disabled value.
kvm_disable_trbe() might need to clear host_debug_state.trblimitr_el1.
> + isb();
> + }
> +
> __trace_do_switch(host_data_ptr(trfcr_while_in_guest),
> *host_data_ptr(host_debug_state.trfcr_el1));
> }
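
The stale-value question raised above can be explored with a small
user-space model. This is a sketch under stated assumptions, not kernel
code: struct trbe_sim and the sim_* functions are hypothetical names, only
the non-protected (TRBE_ENABLED) path is modelled, and both the guest-entry
and guest-exit hooks are assumed to run on every world switch, which
depends on callers not shown in full in this thread. The reset_first
parameter selects whether the saved slot is zeroed before the flag check,
as the posted __trace_drain_and_disable() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: models TRBLIMITR_EL1.E, taken to be bit 0 of the register. */
#define SIM_TRBLIMITR_E (UINT64_C(1) << 0)

struct trbe_sim {
	uint64_t hw_limitr;    /* models the TRBLIMITR_EL1 register */
	uint64_t saved_limitr; /* models host_debug_state.trblimitr_el1 */
	bool trbe_enabled;     /* models the TRBE_ENABLED host flag */
};

/* Guest entry: optionally clear the saved slot, then save and disable. */
static void sim_enter_guest(struct trbe_sim *s, bool reset_first)
{
	if (reset_first)
		s->saved_limitr = 0;

	if (!s->trbe_enabled)
		return; /* the read-and-save step is skipped */

	s->saved_limitr = s->hw_limitr;
	if (s->saved_limitr & SIM_TRBLIMITR_E)
		s->hw_limitr = 0;
}

/* Guest exit: restore only if the saved value had the enable bit set. */
static void sim_exit_guest(struct trbe_sim *s)
{
	if (s->saved_limitr & SIM_TRBLIMITR_E)
		s->hw_limitr = s->saved_limitr;
}

/*
 * Sequence: enable TRBE, run a guest, disable TRBE, run a guest again.
 * Returns the register value the host observes at the end.
 */
static uint64_t sim_enable_then_disable(bool reset_first)
{
	struct trbe_sim s = { 0 };

	s.trbe_enabled = true; /* models kvm_enable_trbe() */
	s.hw_limitr = 0x2000 | SIM_TRBLIMITR_E;
	sim_enter_guest(&s, reset_first);
	sim_exit_guest(&s);

	s.trbe_enabled = false; /* models kvm_disable_trbe() */
	s.hw_limitr = 0;        /* host driver has turned the unit off */
	sim_enter_guest(&s, reset_first);
	sim_exit_guest(&s);

	return s.hw_limitr;
}

/* Compares the two variants of the model. */
static void sim_selfcheck(void)
{
	/* Slot zeroed up front: nothing stale is written back. */
	assert(sim_enable_then_disable(true) == 0);
	/* Slot never zeroed: the old enabled value is written back. */
	assert(sim_enable_then_disable(false) == (0x2000 | SIM_TRBLIMITR_E));
}
```

In this model the up-front reset is what prevents a stale restore; if the
entry hook were not reached after kvm_disable_trbe(), the saved slot would
keep its old value, which is the case the suggestion to clear it in
kvm_disable_trbe() would cover.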
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 15:05 ` James Clark
@ 2026-02-16 15:51 ` Marc Zyngier
2026-02-16 16:10 ` James Clark
2026-02-16 18:14 ` Will Deacon
1 sibling, 1 reply; 31+ messages in thread
From: Marc Zyngier @ 2026-02-16 15:51 UTC
To: James Clark
Cc: Will Deacon, kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, 16 Feb 2026 15:05:10 +0000,
James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 16/02/2026 2:29 pm, Marc Zyngier wrote:
> > On Mon, 16 Feb 2026 13:09:59 +0000,
> > Will Deacon <will@kernel.org> wrote:
> >>
> >> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> >> generation in guest context when self-hosted TRBE is in use by the host.
> >>
> >> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> >> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> >> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> >> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> >> Trace Buffer Unit can perform address translation for the "owning
> >> exception level" even when it is out of context.
> >
> > Great. So TRBE violates all the principles that we hold true in the
> > architecture. Does SPE suffer from the same level of brokenness?
> >
> >> Consequently, we can end up in a state where TRBE performs speculative
> >> page-table walks for a host VA/IPA in guest/hypervisor context depending
> >> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> >> result appears to be a heady mixture of data corruption and hardware
> >> lockups.
> >>
> >> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> >> draining the buffer, restoring the register on return to the host.
> >>
> >> Cc: Marc Zyngier <maz@kernel.org>
> >> Cc: Oliver Upton <oupton@kernel.org>
> >> Cc: James Clark <james.clark@linaro.org>
> >> Cc: Leo Yan <leo.yan@arm.com>
> >> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> >> Cc: Fuad Tabba <tabba@google.com>
> >> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> >> Signed-off-by: Will Deacon <will@kernel.org>
> >> ---
> >>
> >> NOTE: This is *untested* as I don't have a TRBE-capable device that can
> >> run upstream but I noticed this by inspection when triaging occasional
> >> hardware lockups on systems using a 6.12-based kernel with TRBE running
> >> at the same time as a vCPU is loaded. This code has changed quite a bit
> >> over time, so stable backports are not entirely straightforward.
> >> Hopefully James/Leo/Suzuki can help us test if folks agree with the
> >> general approach taken here.
> >>
> >> arch/arm64/include/asm/kvm_host.h | 1 +
> >> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> >> 2 files changed, 28 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index ac7f970c7883..a932cf043b83 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -746,6 +746,7 @@ struct kvm_host_data {
> >> u64 pmscr_el1;
> >> /* Self-hosted trace */
> >> u64 trfcr_el1;
> >> + u64 trblimitr_el1;
> >> /* Values of trap registers for the host before guest entry. */
> >> u64 mdcr_el2;
> >> u64 brbcr_el1;
> >> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> >> index 2a1c0f49792b..fd389a26bc59 100644
> >> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> >> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> >> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> >> write_sysreg_el1(new_trfcr, SYS_TRFCR);
> >> }
> >> -static bool __trace_needs_drain(void)
> >> +static void __trace_drain_and_disable(void)
> >> {
> >> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> >> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> >> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> >> - return host_data_test_flag(TRBE_ENABLED);
> >> + *trblimitr_el1 = 0;
> >> +
> >> + if (is_protected_kvm_enabled()) {
> >> + if (!host_data_test_flag(HAS_TRBE))
> >> + return;
> >> + } else {
> >> + if (!host_data_test_flag(TRBE_ENABLED))
> >> + return;
> >> + }
> >> +
> >> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> >> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> >> + isb();
> >> + tsb_csync();
> >> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> >> + isb();
>
> The TRBE driver might do an extra drain here as a workaround. Hard to
> tell if it's actually required in this case (seems like probably not)
> but it might be worth doing it anyway to avoid hitting the
> issue. Especially if we add guest support later where some of the
> affected registers might start being used.
Just to set the expectations: guest TRBE support is not happening
until the architecture is fixed. It cannot reliably give a trace that
includes emulated exceptions, and until then, no TRBE for you.
> See:
>
> if (trbe_needs_drain_after_disable(cpudata))
> trbe_drain_buffer();
>
>
> >> + }
> >
> > Doesn't this mean we should be able to get rid of most of the TRFCR
> > messing about that litters the entry/exit code and leave that to VHE
>
> Technically you could have ETMs that are connected to sinks other
> than TRBE. Unless you somehow switch off those sinks you still need to
> do the TRFCR switching stuff.
>
> > only? And even then, I'm tempted to simply get rid of any sort of
> > guest-only tracing, given that TRBE is not capable of representing
> > exceptions that are synthesised by the host, making the resulting
> > traces useless.
>
> I haven't heard of anyone tracing a guest from the host, but until we
> add support for guests to be able to trace themselves it's the only
> way of doing it, so it could be useful.
But that's *not* working. If you trace EL1 only, even with a VHE host,
the result is not usable.
> Although all the messing
> around with TRFCR before was from a request to disable guest trace
> rather than enable it, as it was always on in nVHE.
I can't see how it could ever be "always on", unless you were happy to
randomly corrupt memory?
> > I'm still trying to get my hands on a TRBE-enabled system that has
> > some actual firmware tables (my O6 seems to have the HW, but no
> > description of the required coresight infra).
>
> Leo has a guide, I haven't tried it myself yet but it should be
> working. I'll send it to you.
Thanks. But why isn't that stuff publicly documented? Sight...
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 14:29 ` Marc Zyngier
2026-02-16 15:05 ` James Clark
@ 2026-02-16 15:53 ` Alexandru Elisei
2026-02-16 17:10 ` Will Deacon
2026-02-16 17:32 ` Will Deacon
2 siblings, 1 reply; 31+ messages in thread
From: Alexandru Elisei @ 2026-02-16 15:53 UTC
To: Marc Zyngier
Cc: Will Deacon, kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton,
James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba
Hi,
On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
> On Mon, 16 Feb 2026 13:09:59 +0000,
> Will Deacon <will@kernel.org> wrote:
> >
> > The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> > generation in guest context when self-hosted TRBE is in use by the host.
> >
> > Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> > TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> > per R_YCHKJ the Trace Buffer Unit will still be enabled if
> > TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> > Trace Buffer Unit can perform address translation for the "owning
> > exception level" even when it is out of context.
>
> Great. So TRBE violates all the principles that we hold true in the
> architecture. Does SPE suffer from the same level of brokenness?
I think not currently - I_JZRDG from DDI0487M.a.a says that after a PSB + DSB
'no new memory accesses using the lower Exception level translation table
entries occur'.
But looks like the behaviour will be changed so that it will be similar to TRBE,
according to the Arm known issues document [1], added in D23136:
'When the Profiling Buffer is enabled, profiling is not stopped, and Discard mode
is not enabled, the Statistical Profiling Unit might cause speculative
translations for the owning translation regime, including when the owning
translation regime is out-of-context'.
[1] https://developer.arm.com/documentation/102105/latest/
Thanks,
Alex
>
> > Consequently, we can end up in a state where TRBE performs speculative
> > page-table walks for a host VA/IPA in guest/hypervisor context depending
> > on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> > result appears to be a heady mixture of data corruption and hardware
> > lockups.
> >
> > Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> > draining the buffer, restoring the register on return to the host.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Oliver Upton <oupton@kernel.org>
> > Cc: James Clark <james.clark@linaro.org>
> > Cc: Leo Yan <leo.yan@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Fuad Tabba <tabba@google.com>
> > Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >
> > NOTE: This is *untested* as I don't have a TRBE-capable device that can
> > run upstream but I noticed this by inspection when triaging occasional
> > hardware lockups on systems using a 6.12-based kernel with TRBE running
> > at the same time as a vCPU is loaded. This code has changed quite a bit
> > over time, so stable backports are not entirely straightforward.
> > Hopefully James/Leo/Suzuki can help us test if folks agree with the
> > general approach taken here.
> >
> > arch/arm64/include/asm/kvm_host.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> > 2 files changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index ac7f970c7883..a932cf043b83 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -746,6 +746,7 @@ struct kvm_host_data {
> > u64 pmscr_el1;
> > /* Self-hosted trace */
> > u64 trfcr_el1;
> > + u64 trblimitr_el1;
> > /* Values of trap registers for the host before guest entry. */
> > u64 mdcr_el2;
> > u64 brbcr_el1;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > index 2a1c0f49792b..fd389a26bc59 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> > write_sysreg_el1(new_trfcr, SYS_TRFCR);
> > }
> >
> > -static bool __trace_needs_drain(void)
> > +static void __trace_drain_and_disable(void)
> > {
> > - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> > - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> > + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> >
> > - return host_data_test_flag(TRBE_ENABLED);
> > + *trblimitr_el1 = 0;
> > +
> > + if (is_protected_kvm_enabled()) {
> > + if (!host_data_test_flag(HAS_TRBE))
> > + return;
> > + } else {
> > + if (!host_data_test_flag(TRBE_ENABLED))
> > + return;
> > + }
> > +
> > + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> > + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> > + isb();
> > + tsb_csync();
> > + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > + isb();
> > + }
>
> Doesn't this mean we should be able to get rid of most of the TRFCR
> messing about that litters the entry/exit code and leave that to VHE
> only? And even then, I'm tempted to simply get rid of any sort of
> guest-only tracing, given that TRBE is not capable of representing
> > exceptions that are synthesised by the host, making the resulting
> traces useless.
>
> I'm still trying to get my hands on a TRBE-enabled system that has
> some actual firmware tables (my O6 seems to have the HW, but no
> description of the required coresight infra).
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 15:51 ` Marc Zyngier
@ 2026-02-16 16:10 ` James Clark
2026-02-16 16:49 ` Marc Zyngier
0 siblings, 1 reply; 31+ messages in thread
From: James Clark @ 2026-02-16 16:10 UTC (permalink / raw)
To: Marc Zyngier
Cc: Will Deacon, kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On 16/02/2026 3:51 pm, Marc Zyngier wrote:
> On Mon, 16 Feb 2026 15:05:10 +0000,
> James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 16/02/2026 2:29 pm, Marc Zyngier wrote:
>>> On Mon, 16 Feb 2026 13:09:59 +0000,
>>> Will Deacon <will@kernel.org> wrote:
>>>>
>>>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
>>>> generation in guest context when self-hosted TRBE is in use by the host.
>>>>
>>>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
>>>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
>>>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
>>>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
>>>> Trace Buffer Unit can perform address translation for the "owning
>>>> exception level" even when it is out of context.
>>>
>>> Great. So TRBE violates all the principles that we hold true in the
>>> architecture. Does SPE suffer from the same level of brokenness?
>>>
>>>> Consequently, we can end up in a state where TRBE performs speculative
>>>> page-table walks for a host VA/IPA in guest/hypervisor context depending
>>>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
>>>> result appears to be a heady mixture of data corruption and hardware
>>>> lockups.
>>>>
>>>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
>>>> draining the buffer, restoring the register on return to the host.
>>>>
>>>> Cc: Marc Zyngier <maz@kernel.org>
>>>> Cc: Oliver Upton <oupton@kernel.org>
>>>> Cc: James Clark <james.clark@linaro.org>
>>>> Cc: Leo Yan <leo.yan@arm.com>
>>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>> Cc: Fuad Tabba <tabba@google.com>
>>>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>>> ---
>>>>
>>>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
>>>> run upstream but I noticed this by inspection when triaging occasional
>>>> hardware lockups on systems using a 6.12-based kernel with TRBE running
>>>> at the same time as a vCPU is loaded. This code has changed quite a bit
>>>> over time, so stable backports are not entirely straightforward.
>>>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
>>>> general approach taken here.
>>>>
>>>> arch/arm64/include/asm/kvm_host.h | 1 +
>>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
>>>> 2 files changed, 28 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index ac7f970c7883..a932cf043b83 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -746,6 +746,7 @@ struct kvm_host_data {
>>>> u64 pmscr_el1;
>>>> /* Self-hosted trace */
>>>> u64 trfcr_el1;
>>>> + u64 trblimitr_el1;
>>>> /* Values of trap registers for the host before guest entry. */
>>>> u64 mdcr_el2;
>>>> u64 brbcr_el1;
>>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> index 2a1c0f49792b..fd389a26bc59 100644
>>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>>> write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>>> }
>>>> -static bool __trace_needs_drain(void)
>>>> +static void __trace_drain_and_disable(void)
>>>> {
>>>> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
>>>> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
>>>> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>>>> - return host_data_test_flag(TRBE_ENABLED);
>>>> + *trblimitr_el1 = 0;
>>>> +
>>>> + if (is_protected_kvm_enabled()) {
>>>> + if (!host_data_test_flag(HAS_TRBE))
>>>> + return;
>>>> + } else {
>>>> + if (!host_data_test_flag(TRBE_ENABLED))
>>>> + return;
>>>> + }
>>>> +
>>>> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
>>>> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
>>>> + isb();
>>>> + tsb_csync();
>>>> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>>> + isb();
>>
>> The TRBE driver might do an extra drain here as a workaround. Hard to
>> tell if it's actually required in this case (seems like probably not)
>> but it might be worth doing it anyway to avoid hitting the
>> issue. Especially if we add guest support later where some of the
>> affected registers might start being used.
>
> Just to set the expectations: guest TRBE support is not happening
> until the architecture is fixed. It cannot reliably give a trace that
> includes emulated exceptions, and until then, no TRBE for you.
>
>> See:
>>
>> if (trbe_needs_drain_after_disable(cpudata))
>> trbe_drain_buffer();
>>
>>
>>>> + }
>>>
>>> Doesn't this mean we should be able to get rid of most of the TRFCR
>>> messing about that litters the entry/exit code and leave that to VHE
>>
>> Technically you could have ETMs that are connected to sinks other
>> than TRBE. Unless you somehow switch off those sinks you still need to
>> do the TRFCR switching stuff.
>>
>>> only? And even then, I'm tempted to simply get rid of any sort of
>>> guest-only tracing, given that TRBE is not capable of representing
>>> exceptions that are synthesised by the host, making the resulting
>>> traces useless.
>>
>> I haven't heard of anyone tracing a guest from the host, but until we
>> add support for guests to be able to trace themselves it's the only
>> way of doing it, so it could be useful.
>
> But that's *not* working. If you trace EL1 only, even with a VHE host,
> the result is not usable.
>
Do you mean not working because of the missing exceptions? I did a bit
of testing before and the trace did seem somewhat usable to me. It had
EL1 and EL0 atoms in there. All you need is the mmap records from the
guest which you can get by running Perf in the guest and it's possible
to decode it. Maybe it's not complete but I don't think all use cases
require complete trace. AutoFDO for example just needs lots of small
snippets of execution history.
>> Although all the messing
>> around with TRFCR before was from a request to disable guest trace
>> rather than enable it, as it was always on in nVHE.
>
> I can't see how it could ever be "always on", unless you were happy to
> randomly corrupt memory?
>
It was, and it was specifically the reason to add the TRFCR switching
stuff. With a non-TRBE sink in nVHE mode.
VHE was always off and nVHE was always on. So we fixed both at the same
time and made the include/exclude settings of the event respected
wherever possible.
>>> I'm still trying to get my hands on a TRBE-enabled system that has
>>> some actual firmware tables (my O6 seems to have the HW, but no
>>> description of the required coresight infra).
>>
>> Leo has a guide, I haven't tried it myself yet but it should be
>> working. I'll send it to you.
>
> Thanks. But why isn't that stuff publicly documented? Sight...
>
> M.
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 16:10 ` James Clark
@ 2026-02-16 16:49 ` Marc Zyngier
2026-02-20 11:42 ` James Clark
2026-02-20 15:48 ` Leo Yan
0 siblings, 2 replies; 31+ messages in thread
From: Marc Zyngier @ 2026-02-16 16:49 UTC (permalink / raw)
To: James Clark
Cc: Will Deacon, kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, 16 Feb 2026 16:10:14 +0000,
James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 16/02/2026 3:51 pm, Marc Zyngier wrote:
> > On Mon, 16 Feb 2026 15:05:10 +0000,
> > James Clark <james.clark@linaro.org> wrote:
> >>
> >>
> >>
> >> On 16/02/2026 2:29 pm, Marc Zyngier wrote:
> >>> On Mon, 16 Feb 2026 13:09:59 +0000,
> >>> Will Deacon <will@kernel.org> wrote:
> >>>>
> >>>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> >>>> generation in guest context when self-hosted TRBE is in use by the host.
> >>>>
> >>>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> >>>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> >>>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
> >>>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> >>>> Trace Buffer Unit can perform address translation for the "owning
> >>>> exception level" even when it is out of context.
> >>>
> >>> Great. So TRBE violates all the principles that we hold true in the
> >>> architecture. Does SPE suffer from the same level of brokenness?
> >>>
> >>>> Consequently, we can end up in a state where TRBE performs speculative
> >>>> page-table walks for a host VA/IPA in guest/hypervisor context depending
> >>>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> >>>> result appears to be a heady mixture of data corruption and hardware
> >>>> lockups.
> >>>>
> >>>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> >>>> draining the buffer, restoring the register on return to the host.
> >>>>
> >>>> Cc: Marc Zyngier <maz@kernel.org>
> >>>> Cc: Oliver Upton <oupton@kernel.org>
> >>>> Cc: James Clark <james.clark@linaro.org>
> >>>> Cc: Leo Yan <leo.yan@arm.com>
> >>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>>> Cc: Fuad Tabba <tabba@google.com>
> >>>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> >>>> Signed-off-by: Will Deacon <will@kernel.org>
> >>>> ---
> >>>>
> >>>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
> >>>> run upstream but I noticed this by inspection when triaging occasional
> >>>> hardware lockups on systems using a 6.12-based kernel with TRBE running
> >>>> at the same time as a vCPU is loaded. This code has changed quite a bit
> >>>> over time, so stable backports are not entirely straightforward.
> >>>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
> >>>> general approach taken here.
> >>>>
> >>>> arch/arm64/include/asm/kvm_host.h | 1 +
> >>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> >>>> 2 files changed, 28 insertions(+), 9 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>> index ac7f970c7883..a932cf043b83 100644
> >>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>> @@ -746,6 +746,7 @@ struct kvm_host_data {
> >>>> u64 pmscr_el1;
> >>>> /* Self-hosted trace */
> >>>> u64 trfcr_el1;
> >>>> + u64 trblimitr_el1;
> >>>> /* Values of trap registers for the host before guest entry. */
> >>>> u64 mdcr_el2;
> >>>> u64 brbcr_el1;
> >>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> >>>> index 2a1c0f49792b..fd389a26bc59 100644
> >>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> >>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> >>>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> >>>> write_sysreg_el1(new_trfcr, SYS_TRFCR);
> >>>> }
> >>>> -static bool __trace_needs_drain(void)
> >>>> +static void __trace_drain_and_disable(void)
> >>>> {
> >>>> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> >>>> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> >>>> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> >>>> - return host_data_test_flag(TRBE_ENABLED);
> >>>> + *trblimitr_el1 = 0;
> >>>> +
> >>>> + if (is_protected_kvm_enabled()) {
> >>>> + if (!host_data_test_flag(HAS_TRBE))
> >>>> + return;
> >>>> + } else {
> >>>> + if (!host_data_test_flag(TRBE_ENABLED))
> >>>> + return;
> >>>> + }
> >>>> +
> >>>> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> >>>> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> >>>> + isb();
> >>>> + tsb_csync();
> >>>> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> >>>> + isb();
> >>
> >> The TRBE driver might do an extra drain here as a workaround. Hard to
> >> tell if it's actually required in this case (seems like probably not)
> >> but it might be worth doing it anyway to avoid hitting the
> >> issue. Especially if we add guest support later where some of the
> >> affected registers might start being used.
> >
> > Just to set the expectations: guest TRBE support is not happening
> > until the architecture is fixed. It cannot reliably give a trace that
> > includes emulated exceptions, and until then, no TRBE for you.
> >
> >> See:
> >>
> >> if (trbe_needs_drain_after_disable(cpudata))
> >> trbe_drain_buffer();
> >>
> >>
> >>>> + }
> >>>
> >>> Doesn't this mean we should be able to get rid of most of the TRFCR
> >>> messing about that litters the entry/exit code and leave that to VHE
> >>
> >> Technically you could have ETMs that are connected to sinks other
> >> than TRBE. Unless you somehow switch off those sinks you still need to
> >> do the TRFCR switching stuff.
> >>
> >>> only? And even then, I'm tempted to simply get rid of any sort of
> >>> guest-only tracing, given that TRBE is not capable of representing
> >>> exceptions that are synthesised by the host, making the resulting
> >>> traces useless.
> >>
> >> I haven't heard of anyone tracing a guest from the host, but until we
> >> add support for guests to be able to trace themselves it's the only
> >> way of doing it, so it could be useful.
> >
> > But that's *not* working. If you trace EL1 only, even with a VHE host,
> > the result is not usable.
> >
>
> Do you mean not working because of the missing exceptions? I did a bit
> of testing before and the trace did seem somewhat usable to me. It had
> EL1 and EL0 atoms in there.
Sure. Now try to look at what that means for NV, where all the
EL1->EL2 exceptions are emulated, where all the EL2->EL1 exception
returns are emulated.
What does it give you? A bag of nonsense.
Same thing for EL2->EL0, by the way, so you can't even correctly
profile an EL0 program that performs a syscall, or that gets
interrupted. And while without NV, these exceptions are rare, having a
trace that is unreliable has the potential of being worse than no
trace at all.
Until the architecture grows a way for KVM to inject the missing
information into the trace, TRBE support for guest will stay out.
> All you need is the mmap records from the
> guest which you can get by running Perf in the guest and it's possible
> to decode it. Maybe it's not complete but I don't think all use cases
> require complete trace. AutoFDO for example just needs lots of small
> snippets of execution history.
I don't think it is OK to feed an FDO with traces that are known to be
incomplete. Maybe that goes under the radar today, but my crystal ball
is telling me things could be very different in the future, and I'm
not going to take any bet.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 15:13 ` James Clark
@ 2026-02-16 17:05 ` Will Deacon
2026-02-17 9:18 ` James Clark
0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-16 17:05 UTC (permalink / raw)
To: James Clark
Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
Oliver Upton, Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 03:13:54PM +0000, James Clark wrote:
>
>
> On 16/02/2026 1:09 pm, Will Deacon wrote:
> > The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> > generation in guest context when self-hosted TRBE is in use by the host.
> >
> > Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> > TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> > per R_YCHKJ the Trace Buffer Unit will still be enabled if
> > TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> > Trace Buffer Unit can perform address translation for the "owning
> > exception level" even when it is out of context.
> >
> > Consequently, we can end up in a state where TRBE performs speculative
> > page-table walks for a host VA/IPA in guest/hypervisor context depending
> > on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> > result appears to be a heady mixture of data corruption and hardware
> > lockups.
> >
> > Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> > draining the buffer, restoring the register on return to the host.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Oliver Upton <oupton@kernel.org>
> > Cc: James Clark <james.clark@linaro.org>
> > Cc: Leo Yan <leo.yan@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Fuad Tabba <tabba@google.com>
> > Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >
> > NOTE: This is *untested* as I don't have a TRBE-capable device that can
> > run upstream but I noticed this by inspection when triaging occasional
> > hardware lockups on systems using a 6.12-based kernel with TRBE running
> > at the same time as a vCPU is loaded. This code has changed quite a bit
> > over time, so stable backports are not entirely straightforward.
> > Hopefully James/Leo/Suzuki can help us test if folks agree with the
> > general approach taken here.
> >
> > arch/arm64/include/asm/kvm_host.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> > 2 files changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index ac7f970c7883..a932cf043b83 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -746,6 +746,7 @@ struct kvm_host_data {
> > u64 pmscr_el1;
> > /* Self-hosted trace */
> > u64 trfcr_el1;
> > + u64 trblimitr_el1;
> > /* Values of trap registers for the host before guest entry. */
> > u64 mdcr_el2;
> > u64 brbcr_el1;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > index 2a1c0f49792b..fd389a26bc59 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> > write_sysreg_el1(new_trfcr, SYS_TRFCR);
> > }
> > -static bool __trace_needs_drain(void)
> > +static void __trace_drain_and_disable(void)
> > {
> > - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> > - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> > + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> > - return host_data_test_flag(TRBE_ENABLED);
> > + *trblimitr_el1 = 0;
> > +
> > + if (is_protected_kvm_enabled()) {
> > + if (!host_data_test_flag(HAS_TRBE))
> > + return;
> > + } else {
> > + if (!host_data_test_flag(TRBE_ENABLED))
> > + return;
> > + }
> > +
> > + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> > + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> > + isb();
> > + tsb_csync();
> > + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > + isb();
> > + }
> > }
> > static bool __trace_needs_switch(void)
> > @@ -79,15 +94,18 @@ static void __trace_switch_to_guest(void)
> > __trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
> > *host_data_ptr(trfcr_while_in_guest));
> > -
> > - if (__trace_needs_drain()) {
> > - isb();
> > - tsb_csync();
> > - }
> > + __trace_drain_and_disable();
> > }
> > static void __trace_switch_to_host(void)
> > {
> > + u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
> > +
> > + if (trblimitr_el1 & TRBLIMITR_EL1_E) {
> > + write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
>
> Will this restore a stale value if you do kvm_enable_trbe() then later
> kvm_disable_trbe()? Looks like the read and save will be skipped unless
> host_data_test_flag(TRBE_ENABLED) is true, so it will never save a disabled
> value.
__trace_drain_and_disable() sets the saved limit to 0 if TRBE_ENABLED is
not set, so this shouldn't do anything in that case. Or did I
misunderstand your scenario?
> kvm_disable_trbe() might need to clear host_debug_state.trblimitr_el1.
pKVM can't rely on that thing being called, so the context switch still
needs to be self-contained there.
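
The save/restore behaviour described above can be modelled in plain C.
This is only a sketch of the patch's logic, not hypervisor code: the
struct, the model_*() helpers and the scenario function are hypothetical
names, the register is a plain variable, and the drain and barriers are
left out. It shows that the saved slot is reset on every guest entry, so
a value saved while TRBE was enabled is not restored after TRBE has been
disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TRBLIMITR_EL1.E, modelled here as bit 0 (assumption for illustration). */
#define TRBLIMITR_EL1_E (UINT64_C(1) << 0)

/* Hypothetical stand-in for the per-CPU host state and the hardware register. */
struct trbe_model {
	uint64_t hw_trblimitr;    /* models the TRBLIMITR_EL1 system register */
	uint64_t saved_trblimitr; /* models host_debug_state.trblimitr_el1 */
	bool protected_mode;      /* models is_protected_kvm_enabled() */
	bool has_trbe;            /* models the HAS_TRBE flag */
	bool trbe_enabled;        /* models the TRBE_ENABLED flag */
};

static void model_switch_to_guest(struct trbe_model *m)
{
	/* Always reset the slot first so no stale value can survive. */
	m->saved_trblimitr = 0;

	if (m->protected_mode ? !m->has_trbe : !m->trbe_enabled)
		return;

	m->saved_trblimitr = m->hw_trblimitr;
	if (m->saved_trblimitr & TRBLIMITR_EL1_E)
		m->hw_trblimitr = 0; /* drain and barriers omitted in this model */
}

static void model_switch_to_host(struct trbe_model *m)
{
	/* Only restore if the buffer was enabled when we entered the guest. */
	if (m->saved_trblimitr & TRBLIMITR_EL1_E)
		m->hw_trblimitr = m->saved_trblimitr;
}

/* Enable TRBE, run a guest, disable TRBE, run a guest again. */
static uint64_t run_enable_then_disable_scenario(void)
{
	struct trbe_model m = {
		.hw_trblimitr = UINT64_C(0x1000) | TRBLIMITR_EL1_E,
		.trbe_enabled = true,
	};

	/* First guest run while the host is tracing. */
	model_switch_to_guest(&m);
	assert(m.hw_trblimitr == 0);
	model_switch_to_host(&m);

	/* Host stops tracing: register cleared and flag dropped. */
	m.hw_trblimitr = 0;
	m.trbe_enabled = false;

	/* Second guest run must not resurrect the old limit value. */
	model_switch_to_guest(&m);
	model_switch_to_host(&m);
	return m.hw_trblimitr;
}
```

In the protected case the model keys off has_trbe and the live register
value only, which mirrors the point that pKVM cannot depend on the host
calling kvm_disable_trbe().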
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 15:53 ` Alexandru Elisei
@ 2026-02-16 17:10 ` Will Deacon
2026-02-17 12:13 ` Will Deacon
0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-16 17:10 UTC (permalink / raw)
To: Alexandru Elisei
Cc: Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 03:53:54PM +0000, Alexandru Elisei wrote:
> Hi,
>
> On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
> > On Mon, 16 Feb 2026 13:09:59 +0000,
> > Will Deacon <will@kernel.org> wrote:
> > >
> > > The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> > > generation in guest context when self-hosted TRBE is in use by the host.
> > >
> > > Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> > > TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> > > per R_YCHKJ the Trace Buffer Unit will still be enabled if
> > > TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> > > Trace Buffer Unit can perform address translation for the "owning
> > > exception level" even when it is out of context.
> >
> > Great. So TRBE violates all the principles that we hold true in the
> > architecture. Does SPE suffer from the same level of brokenness?
>
> I think not currently - I_JZRDG from DDI0487M.a.a says that after a PSB + DSB
> 'no new memory accesses using the lower Exception level translation table
> entries occur'.
>
> But looks like the behaviour will be changed so that it will be similar to TRBE,
> according to the Arm known issues document [1], added in D23136:
>
> 'When the Profiling Buffer is enabled, profiling is not stopped, and Discard mode
> is not enabled, the Statistical Profiling Unit might cause speculative
> translations for the owning translation regime, including when the owning
> translation regime is out-of-context'.
I think SPE is ok, as __debug_save_spe() clears PMSCR_EL1 and (unlike
TRBE) PMSCR_EL1.ExSPE _are_ factored into whether or not profiling is
"enabled".
So there's a funny asymmetry between SPE and TRBE, which I assume is due
to the coresight muck associated with the latter.
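
The asymmetry can be written down as two predicates. This is a heavily
simplified sketch of the reasoning in this thread, not a statement of
the architecture: the function names are hypothetical, the bit positions
are assumptions for illustration, and the real enable conditions also
depend on the current Exception level and EL2 controls that are ignored
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumed, see the lead-in). */
#define TRBLIMITR_EL1_E   (UINT64_C(1) << 0)
#define PMBLIMITR_EL1_E   (UINT64_C(1) << 0)
#define PMSCR_EL1_E0SPE   (UINT64_C(1) << 0)
#define PMSCR_EL1_E1SPE   (UINT64_C(1) << 1)

/*
 * TRBE, as described in the patch: zeroing TRFCR_EL1 prohibits trace
 * generation, but the Trace Buffer Unit stays enabled while
 * TRBLIMITR_EL1.E is set.
 */
static bool model_trbe_buffer_unit_enabled(uint64_t trblimitr, uint64_t trfcr)
{
	(void)trfcr; /* deliberately unused: TRFCR_EL1 does not gate the unit */
	return (trblimitr & TRBLIMITR_EL1_E) != 0;
}

/*
 * SPE, as described in this message: the PMSCR_EL1 enable bits are
 * factored in, so clearing PMSCR_EL1 is enough to count as disabled.
 */
static bool model_spe_profiling_enabled(uint64_t pmblimitr, uint64_t pmscr)
{
	return (pmblimitr & PMBLIMITR_EL1_E) != 0 &&
	       (pmscr & (PMSCR_EL1_E0SPE | PMSCR_EL1_E1SPE)) != 0;
}
```

With both control registers zeroed and both limit registers left
enabled, the model reports TRBE as still enabled and SPE as disabled,
which is the difference that makes the extra TRBLIMITR_EL1 write
necessary.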
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 14:29 ` Marc Zyngier
2026-02-16 15:05 ` James Clark
2026-02-16 15:53 ` Alexandru Elisei
@ 2026-02-16 17:32 ` Will Deacon
2026-02-17 12:20 ` James Clark
2 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-16 17:32 UTC (permalink / raw)
To: Marc Zyngier
Cc: kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton, James Clark,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
> On Mon, 16 Feb 2026 13:09:59 +0000,
> Will Deacon <will@kernel.org> wrote:
> >
> > The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> > generation in guest context when self-hosted TRBE is in use by the host.
> >
> > Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> > TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> > per R_YCHKJ the Trace Buffer Unit will still be enabled if
> > TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> > Trace Buffer Unit can perform address translation for the "owning
> > exception level" even when it is out of context.
>
> Great. So TRBE violates all the principles that we hold true in the
> architecture. Does SPE suffer from the same level of brokenness?
>
> > Consequently, we can end up in a state where TRBE performs speculative
> > page-table walks for a host VA/IPA in guest/hypervisor context depending
> > on the value of MDCR_EL2.E2TB, which changes over world-switch. The
> > result appears to be a heady mixture of data corruption and hardware
> > lockups.
> >
> > Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
> > draining the buffer, restoring the register on return to the host.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Oliver Upton <oupton@kernel.org>
> > Cc: James Clark <james.clark@linaro.org>
> > Cc: Leo Yan <leo.yan@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Fuad Tabba <tabba@google.com>
> > Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >
> > NOTE: This is *untested* as I don't have a TRBE-capable device that can
> > run upstream but I noticed this by inspection when triaging occasional
> > hardware lockups on systems using a 6.12-based kernel with TRBE running
> > at the same time as a vCPU is loaded. This code has changed quite a bit
> > over time, so stable backports are not entirely straightforward.
> > Hopefully James/Leo/Suzuki can help us test if folks agree with the
> > general approach taken here.
> >
> > arch/arm64/include/asm/kvm_host.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
> > 2 files changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index ac7f970c7883..a932cf043b83 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -746,6 +746,7 @@ struct kvm_host_data {
> > u64 pmscr_el1;
> > /* Self-hosted trace */
> > u64 trfcr_el1;
> > + u64 trblimitr_el1;
> > /* Values of trap registers for the host before guest entry. */
> > u64 mdcr_el2;
> > u64 brbcr_el1;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > index 2a1c0f49792b..fd389a26bc59 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> > write_sysreg_el1(new_trfcr, SYS_TRFCR);
> > }
> >
> > -static bool __trace_needs_drain(void)
> > +static void __trace_drain_and_disable(void)
> > {
> > - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> > - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> > + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> >
> > - return host_data_test_flag(TRBE_ENABLED);
> > + *trblimitr_el1 = 0;
> > +
> > + if (is_protected_kvm_enabled()) {
> > + if (!host_data_test_flag(HAS_TRBE))
> > + return;
> > + } else {
> > + if (!host_data_test_flag(TRBE_ENABLED))
> > + return;
> > + }
> > +
> > + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> > + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> > + isb();
> > + tsb_csync();
> > + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > + isb();
> > + }
>
> Doesn't this mean we should be able to get rid of most of the TRFCR
> messing about that litters the entry/exit code and leave that to VHE
> only?
I'm not sure we can get rid of an awful lot: if the host is using TRBE,
then we still need to stop trace generation, drain the buffer and
disable the buffer. Or are you thinking of some other TRFCR accesses?
Looking at the TRBE driver, I _think_ the idea is that the trace
hardware can generate trace to ETM/Coresight instead of memory in some
cases and so you can enable it at boot time or via sysfs and then
profile the whole machine, presumably using an expensive external box +
cable or via some other coresight "sink" component. But I'm really
guessing based on the driver; James and Leo will know for sure.
I've tried (and failed) to reconcile the above with what is written in
the Arm ARM regarding self-hosted trace with TRBE.
> And even then, I'm tempted to simply get rid of any sort of
> guest-only tracing, given that TRBE is not capable of representing
> exceptions that are synthesised by the host, making the resulting
> traces useless.
I think that effectively means reverting the series merged from here:
https://lore.kernel.org/all/20250106142446.628923-1-james.clark@linaro.org/
but then we still need to clear TRBLIMITR_EL1.E.
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 15:05 ` James Clark
2026-02-16 15:51 ` Marc Zyngier
@ 2026-02-16 18:14 ` Will Deacon
2026-02-17 14:19 ` Leo Yan
1 sibling, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-16 18:14 UTC (permalink / raw)
To: James Clark
Cc: Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 03:05:10PM +0000, James Clark wrote:
> On 16/02/2026 2:29 pm, Marc Zyngier wrote:
> > On Mon, 16 Feb 2026 13:09:59 +0000,
> > Will Deacon <will@kernel.org> wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > > index 2a1c0f49792b..fd389a26bc59 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
> > > @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
> > > write_sysreg_el1(new_trfcr, SYS_TRFCR);
> > > }
> > > -static bool __trace_needs_drain(void)
> > > +static void __trace_drain_and_disable(void)
> > > {
> > > - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
> > > - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
> > > + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
> > > - return host_data_test_flag(TRBE_ENABLED);
> > > + *trblimitr_el1 = 0;
> > > +
> > > + if (is_protected_kvm_enabled()) {
> > > + if (!host_data_test_flag(HAS_TRBE))
> > > + return;
> > > + } else {
> > > + if (!host_data_test_flag(TRBE_ENABLED))
> > > + return;
> > > + }
> > > +
> > > + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
> > > + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
> > > + isb();
> > > + tsb_csync();
> > > + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > > + isb();
>
> The TRBE driver might do an extra drain here as a workaround. Hard to tell
> if it's actually required in this case (seems like probably not) but it
> might be worth doing it anyway to avoid hitting the issue. Especially if we
> add guest support later where some of the affected registers might start
> being used. See:
>
> if (trbe_needs_drain_after_disable(cpudata))
> trbe_drain_buffer();
Oh great, this thing sucks even more than I realised!
But thanks for pointing that out... this is presumably erratum #2064142,
but we probably need to look at #2038923 as well :/
I can't find any public documentation for the problems, but based on the
kconfig text then I think we care about #2064142 so that the TRBE
register writes when restoring the host context are effective and we
care about #2038923 to avoid corrupting trace when re-enabling for the
host.
It also looks like we can't rely on the dsb(nsh) in the vcpu_run()
path if that needs to be before the write to TRBLIMITR_EL1.
In which case, the host->guest part becomes something hideous like:
	isb();
	tsb_csync();	// Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
	dsb(nsh);	// I missed this in my patch
	write_sysreg_s(0, SYS_TRBLIMITR_EL1);
	if (2064142) {
		tsb_csync();
		dsb(nsh);
	}
	isb();
and then the guest->host part is:
	write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
	isb();
	if (2038923)
		isb();
Does that look right to you?
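For reference, the two sequences above can be modelled in plain C outside the kernel. This is only a sketch under stated assumptions: `struct cpu_model`, `enum op`, `record()` and the `erratum_*` flags are hypothetical stand-ins, the barriers are logged rather than executed, and the E bit is assumed to be bit 0 of TRBLIMITR_EL1. It checks the ordering of the steps, not the hardware behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRBLIMITR_E	(UINT64_C(1) << 0)	/* assumed position of the E bit */
#define MAX_OPS		16

/* Hypothetical operation log: barriers are recorded, not executed. */
enum op { OP_ISB, OP_TSB_CSYNC, OP_DSB_NSH, OP_WRITE_TRBLIMITR };

struct cpu_model {
	uint64_t trblimitr;	/* stand-in for TRBLIMITR_EL1 */
	uint64_t saved;		/* stand-in for host_debug_state.trblimitr_el1 */
	bool erratum_2064142;
	bool erratum_2038923;
	enum op log[MAX_OPS];
	size_t nr_ops;
};

static void record(struct cpu_model *c, enum op o)
{
	if (c->nr_ops < MAX_OPS)
		c->log[c->nr_ops++] = o;
}

static void write_trblimitr(struct cpu_model *c, uint64_t val)
{
	c->trblimitr = val;
	record(c, OP_WRITE_TRBLIMITR);
}

/* host->guest: drain, disable the buffer unit, then the extra drain. */
static void model_switch_to_guest(struct cpu_model *c)
{
	c->saved = c->trblimitr;
	if (!(c->saved & TRBLIMITR_E))
		return;

	record(c, OP_ISB);
	record(c, OP_TSB_CSYNC);
	record(c, OP_DSB_NSH);
	write_trblimitr(c, 0);
	if (c->erratum_2064142) {
		record(c, OP_TSB_CSYNC);
		record(c, OP_DSB_NSH);
	}
	record(c, OP_ISB);
}

/* guest->host: restore the saved limit, with a second ISB for 2038923. */
static void model_switch_to_host(struct cpu_model *c)
{
	if (!(c->saved & TRBLIMITR_E))
		return;

	write_trblimitr(c, c->saved);
	record(c, OP_ISB);
	if (c->erratum_2038923)
		record(c, OP_ISB);
}
```

With both errata flags set, the model logs seven operations on the way into the guest and three on the way back, which matches the two listings above.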
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 17:05 ` Will Deacon
@ 2026-02-17 9:18 ` James Clark
0 siblings, 0 replies; 31+ messages in thread
From: James Clark @ 2026-02-17 9:18 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, mark.rutland, linux-arm-kernel, Marc Zyngier,
Oliver Upton, Leo Yan, Suzuki K Poulose, Fuad Tabba
On 16/02/2026 5:05 pm, Will Deacon wrote:
> On Mon, Feb 16, 2026 at 03:13:54PM +0000, James Clark wrote:
>>
>>
>> On 16/02/2026 1:09 pm, Will Deacon wrote:
>>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
>>> generation in guest context when self-hosted TRBE is in use by the host.
>>>
>>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
>>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
>>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
>>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
>>> Trace Buffer Unit can perform address translation for the "owning
>>> exception level" even when it is out of context.
>>>
>>> Consequently, we can end up in a state where TRBE performs speculative
>>> page-table walks for a host VA/IPA in guest/hypervisor context depending
>>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
>>> result appears to be a heady mixture of data corruption and hardware
>>> lockups.
>>>
>>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
>>> draining the buffer, restoring the register on return to the host.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Oliver Upton <oupton@kernel.org>
>>> Cc: James Clark <james.clark@linaro.org>
>>> Cc: Leo Yan <leo.yan@arm.com>
>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Cc: Fuad Tabba <tabba@google.com>
>>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>
>>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
>>> run upstream but I noticed this by inspection when triaging occasional
>>> hardware lockups on systems using a 6.12-based kernel with TRBE running
>>> at the same time as a vCPU is loaded. This code has changed quite a bit
>>> over time, so stable backports are not entirely straightforward.
>>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
>>> general approach taken here.
>>>
>>> arch/arm64/include/asm/kvm_host.h | 1 +
>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
>>> 2 files changed, 28 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index ac7f970c7883..a932cf043b83 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -746,6 +746,7 @@ struct kvm_host_data {
>>> u64 pmscr_el1;
>>> /* Self-hosted trace */
>>> u64 trfcr_el1;
>>> + u64 trblimitr_el1;
>>> /* Values of trap registers for the host before guest entry. */
>>> u64 mdcr_el2;
>>> u64 brbcr_el1;
>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> index 2a1c0f49792b..fd389a26bc59 100644
>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>> write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>> }
>>> -static bool __trace_needs_drain(void)
>>> +static void __trace_drain_and_disable(void)
>>> {
>>> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
>>> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
>>> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>>> - return host_data_test_flag(TRBE_ENABLED);
>>> + *trblimitr_el1 = 0;
>>> +
>>> + if (is_protected_kvm_enabled()) {
>>> + if (!host_data_test_flag(HAS_TRBE))
>>> + return;
>>> + } else {
>>> + if (!host_data_test_flag(TRBE_ENABLED))
>>> + return;
>>> + }
>>> +
>>> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
>>> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
>>> + isb();
>>> + tsb_csync();
>>> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>> + isb();
>>> + }
>>> }
>>> static bool __trace_needs_switch(void)
>>> @@ -79,15 +94,18 @@ static void __trace_switch_to_guest(void)
>>> __trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
>>> *host_data_ptr(trfcr_while_in_guest));
>>> -
>>> - if (__trace_needs_drain()) {
>>> - isb();
>>> - tsb_csync();
>>> - }
>>> + __trace_drain_and_disable();
>>> }
>>> static void __trace_switch_to_host(void)
>>> {
>>> + u64 trblimitr_el1 = *host_data_ptr(host_debug_state.trblimitr_el1);
>>> +
>>> + if (trblimitr_el1 & TRBLIMITR_EL1_E) {
>>> + write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
>>
>> Will this restore a stale value if you do kvm_enable_trbe() then later
>> kvm_disable_trbe()? Looks like the read and save will be skipped unless
>> host_data_test_flag(TRBE_ENABLED) is true, so it will never save a disabled
>> value.
>
> __trace_drain_and_disable() sets the saved limit to 0 if TRBE_ENABLE is
> not set, so this shouldn't do anything in that case. Or did I
> misunderstand your scenario?
>
No, you're right. I saw the early return for !TRBE_ENABLED and thought
things only happened after that. But the zeroing is before, so it's ok.
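To make that ordering concrete, here is a small userspace model of the save/restore pair. It is a sketch only: `struct host_state` and the `model_*` helpers are hypothetical, a single `trbe_flag` stands in for HAS_TRBE or TRBE_ENABLED depending on the mode, barriers are left out, and the E bit is assumed to be bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRBLIMITR_E	(UINT64_C(1) << 0)	/* assumed position of the E bit */

/* Hypothetical stand-ins for the per-CPU host data and the hardware register. */
struct host_state {
	bool trbe_flag;			/* models HAS_TRBE or TRBE_ENABLED */
	uint64_t hw_trblimitr;		/* models TRBLIMITR_EL1 */
	uint64_t saved_trblimitr;	/* models host_debug_state.trblimitr_el1 */
};

/* Same shape as __trace_drain_and_disable(): zero first, then bail out. */
static void model_drain_and_disable(struct host_state *h)
{
	h->saved_trblimitr = 0;		/* always runs, so no stale value survives */

	if (!h->trbe_flag)
		return;

	h->saved_trblimitr = h->hw_trblimitr;
	if (h->saved_trblimitr & TRBLIMITR_E)
		h->hw_trblimitr = 0;	/* barriers omitted in this model */
}

/* Same shape as __trace_switch_to_host(): restore only if E was saved. */
static bool model_restore(struct host_state *h)
{
	if (!(h->saved_trblimitr & TRBLIMITR_E))
		return false;

	h->hw_trblimitr = h->saved_trblimitr;
	return true;
}
```

Running an enable, switch, disable, switch cycle through this model leaves the saved slot at zero on the second pass, so the restore path does not write anything.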
>> kvm_disable_trbe() might need to clear host_debug_state.trblimitr_el1.
>
> pKVM can't rely on that thing being called, so the context switch still
> needs to be self-contained there.
>
> Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 17:10 ` Will Deacon
@ 2026-02-17 12:13 ` Will Deacon
0 siblings, 0 replies; 31+ messages in thread
From: Will Deacon @ 2026-02-17 12:13 UTC (permalink / raw)
To: Alexandru Elisei
Cc: Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, James Clark, Leo Yan, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 05:10:17PM +0000, Will Deacon wrote:
> On Mon, Feb 16, 2026 at 03:53:54PM +0000, Alexandru Elisei wrote:
> > Hi,
> >
> > On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
> > > On Mon, 16 Feb 2026 13:09:59 +0000,
> > > Will Deacon <will@kernel.org> wrote:
> > > >
> > > > The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
> > > > generation in guest context when self-hosted TRBE is in use by the host.
> > > >
> > > > Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
> > > > TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
> > > > per R_YCHKJ the Trace Buffer Unit will still be enabled if
> > > > TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
> > > > Trace Buffer Unit can perform address translation for the "owning
> > > > exception level" even when it is out of context.
> > >
> > > Great. So TRBE violates all the principles that we hold true in the
> > > architecture. Does SPE suffer from the same level of brokenness?
> >
> > I think not currently - I_JZRDG from DDI0487M.a.a says that after a PSB + DSB
> > 'no new memory accesses using the lower Exception level translation table
> > entries occur'.
> >
> > But looks like the behaviour will be changed so that it will be similar to TRBE,
> > according to the Arm known issues document [1], added in D23136:
> >
> > 'When the Profiling Buffer is enabled, profiling is not stopped, and Discard mode
> > is not enabled, the Statistical Profiling Unit might cause speculative
> > translations for the owning translation regime, including when the owning
> > translation regime is out-of-context'.
>
> I think SPE is ok, as __debug_save_spe() clears PMSCR_EL1 and (unlike
> TRBE) PMSCR_EL1.ExSPE _are_ factored into whether or not profiling is
> "enabled".
>
> So there's a funny asymmetry between SPE and TRBE, which I assume is due
> to the coresight much associated with the latter.
Urgh...
Alex and I spoke a bit about this on irc and we think SPE might suffer
from the same problem after all. It just uses different terminology, so
it's not as obvious to shake out from the TRBE example.
With TRBE the problem is that we have:
* TRFCR_EL1 controls whether or not trace generation is "prohibited"
* TRBLIMITR controls whether or not the trace buffer unit is "enabled"
With SPE we have:
* PMSCR_EL1 controls whether or not *profiling* is "enabled"
* PMBLIMITR controls whether or not the *profiling buffer* is "enabled"
Unlike TRBE, the SPE spec doesn't talk concretely about address
translation but I think it's safest to assume that the profiling buffer
being enabled means that it can translate out of context, similarly to
TRBE.
So we'll need to zap PMBLIMITR as well. I'll cook something for v2 but,
as before, I'm going to struggle to test this (I think SPE got whacked
at EL3 for a bunch of errata on the devices I have).
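The split between the two controls can be written down as a toy model. This is a sketch under assumptions, not a statement of the architecture: `struct unit_model` and its helpers are hypothetical, any non-zero filter value is treated as "generation allowed" (cruder than the real per-EL bits), and E is assumed to be bit 0 of both TRBLIMITR_EL1 and PMBLIMITR_EL1.

```c
#include <stdbool.h>
#include <stdint.h>

#define LIMITR_E	(UINT64_C(1) << 0)	/* assumed: E is bit 0 in both limit registers */

/* Hypothetical model of one unit (TRBE or SPE) with its two controls. */
struct unit_model {
	uint64_t filter;	/* TRFCR_EL1 or PMSCR_EL1 */
	uint64_t limitr;	/* TRBLIMITR_EL1 or PMBLIMITR_EL1 */
};

/* New data is only produced when the filter allows it and the buffer is on. */
static bool generates_data(const struct unit_model *u)
{
	return u->filter != 0 && (u->limitr & LIMITR_E) != 0;
}

/*
 * Conservative assumption from the discussion: the buffer unit may
 * translate whenever it is enabled, regardless of the filter control.
 */
static bool may_translate_out_of_context(const struct unit_model *u)
{
	return (u->limitr & LIMITR_E) != 0;
}

/* What the world switch has to do under that assumption. */
static void model_enter_guest(struct unit_model *u)
{
	u->filter = 0;
	u->limitr &= ~LIMITR_E;
}
```

In this model, zeroing only the filter stops data generation but leaves `may_translate_out_of_context()` true, which is the gap that clearing the limit register's E bit closes.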
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 17:32 ` Will Deacon
@ 2026-02-17 12:20 ` James Clark
2026-02-17 12:26 ` Will Deacon
0 siblings, 1 reply; 31+ messages in thread
From: James Clark @ 2026-02-17 12:20 UTC (permalink / raw)
To: Will Deacon, Marc Zyngier
Cc: kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton, Leo Yan,
Suzuki K Poulose, Fuad Tabba
On 16/02/2026 5:32 pm, Will Deacon wrote:
> On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
>> On Mon, 16 Feb 2026 13:09:59 +0000,
>> Will Deacon <will@kernel.org> wrote:
>>>
>>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
>>> generation in guest context when self-hosted TRBE is in use by the host.
>>>
>>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
>>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
>>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
>>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
>>> Trace Buffer Unit can perform address translation for the "owning
>>> exception level" even when it is out of context.
>>
>> Great. So TRBE violates all the principles that we hold true in the
>> architecture. Does SPE suffer from the same level of brokenness?
>>
>>> Consequently, we can end up in a state where TRBE performs speculative
>>> page-table walks for a host VA/IPA in guest/hypervisor context depending
>>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
>>> result appears to be a heady mixture of data corruption and hardware
>>> lockups.
>>>
>>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
>>> draining the buffer, restoring the register on return to the host.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Oliver Upton <oupton@kernel.org>
>>> Cc: James Clark <james.clark@linaro.org>
>>> Cc: Leo Yan <leo.yan@arm.com>
>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Cc: Fuad Tabba <tabba@google.com>
>>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>
>>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
>>> run upstream but I noticed this by inspection when triaging occasional
>>> hardware lockups on systems using a 6.12-based kernel with TRBE running
>>> at the same time as a vCPU is loaded. This code has changed quite a bit
>>> over time, so stable backports are not entirely straightforward.
>>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
>>> general approach taken here.
>>>
>>> arch/arm64/include/asm/kvm_host.h | 1 +
>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
>>> 2 files changed, 28 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index ac7f970c7883..a932cf043b83 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -746,6 +746,7 @@ struct kvm_host_data {
>>> u64 pmscr_el1;
>>> /* Self-hosted trace */
>>> u64 trfcr_el1;
>>> + u64 trblimitr_el1;
>>> /* Values of trap registers for the host before guest entry. */
>>> u64 mdcr_el2;
>>> u64 brbcr_el1;
>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> index 2a1c0f49792b..fd389a26bc59 100644
>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>> write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>> }
>>>
>>> -static bool __trace_needs_drain(void)
>>> +static void __trace_drain_and_disable(void)
>>> {
>>> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
>>> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
>>> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>>>
>>> - return host_data_test_flag(TRBE_ENABLED);
>>> + *trblimitr_el1 = 0;
>>> +
>>> + if (is_protected_kvm_enabled()) {
>>> + if (!host_data_test_flag(HAS_TRBE))
>>> + return;
>>> + } else {
>>> + if (!host_data_test_flag(TRBE_ENABLED))
>>> + return;
>>> + }
>>> +
>>> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
>>> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
>>> + isb();
>>> + tsb_csync();
>>> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>> + isb();
>>> + }
>>
>> Doesn't this mean we should be able to get rid of most of the TRFCR
>> messing about that litters the entry/exit code and leave that to VHE
>> only?
>
> I'm not sure we can get rid of an awful lot: if the host is using TRBE,
> then we still need to stop trace generation, drain the buffer and
> disable the buffer. Or are you thinking of some other TRFCR accesses?
>
> Looking at the TRBE driver, I _think_ the idea is that the trace
> hardware can generate trace to ETM/Coresight instead of memory in some
> cases and so you can enable it at boot time or via sysfs and then
> profile the whole machine, presumably using an expensive external box +
> cable or via some other coresight "sink" component. But I'm really
> guessing based on the driver; James and Leo will know for sure.
Exactly, there are other sink types. For example, ETF has SRAM and ETR
uses physical addresses.
>
> I've tried (and failed) to reconcile the above with what is written in
> the Arm ARM regarding self-hosted trace with TRBE.
>
>> And even then, I'm tempted to simply get rid of any sort of
>> guest-only tracing, given that TRBE is not capable of representing
>> exceptions that are synthesised by the host, making it the resulting
>> traces useless.
>
> I think that effectively means reverting the series merged from here:
>
> https://lore.kernel.org/all/20250106142446.628923-1-james.clark@linaro.org/
>
> but then we still need to clear TRBLIMITR_EL1.E.
>
> Will
Removing that series would actually have the effect of turning guest
trace on in nVHE for non-TRBE sinks. The reason for implementing the
filtering was to turn guest trace off because a user didn't want to see it.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-17 12:20 ` James Clark
@ 2026-02-17 12:26 ` Will Deacon
2026-02-17 13:58 ` James Clark
0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-17 12:26 UTC (permalink / raw)
To: James Clark
Cc: Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Leo Yan, Suzuki K Poulose, Fuad Tabba
On Tue, Feb 17, 2026 at 12:20:14PM +0000, James Clark wrote:
> On 16/02/2026 5:32 pm, Will Deacon wrote:
> > On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
> > > And even then, I'm tempted to simply get rid of any sort of
> > > guest-only tracing, given that TRBE is not capable of representing
> > > exceptions that are synthesised by the host, making it the resulting
> > > traces useless.
> >
> > I think that effectively means reverting the series merged from here:
> >
> > https://lore.kernel.org/all/20250106142446.628923-1-james.clark@linaro.org/
> >
> > but then we still need to clear TRBLIMITR_EL1.E.
> >
>
> Removing that series would actually have the effect of turning guest trace
> on in nVHE for non-TRBE sinks. The reason for implementing the filtering was
> to turn guest trace off because a user didn't want to see it.
What I meant was, revert that series and then also ensure that both TRFCR
and TRBLIMITR are always zero while running in the guest. Is that not
sufficient?
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-17 12:26 ` Will Deacon
@ 2026-02-17 13:58 ` James Clark
0 siblings, 0 replies; 31+ messages in thread
From: James Clark @ 2026-02-17 13:58 UTC (permalink / raw)
To: Will Deacon
Cc: Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Leo Yan, Suzuki K Poulose, Fuad Tabba
On 17/02/2026 12:26 pm, Will Deacon wrote:
> On Tue, Feb 17, 2026 at 12:20:14PM +0000, James Clark wrote:
>> On 16/02/2026 5:32 pm, Will Deacon wrote:
>>> On Mon, Feb 16, 2026 at 02:29:31PM +0000, Marc Zyngier wrote:
>>>> And even then, I'm tempted to simply get rid of any sort of
>>>> guest-only tracing, given that TRBE is not capable of representing
>>>> exceptions that are synthesised by the host, making it the resulting
>>>> traces useless.
>>>
>>> I think that effectively means reverting the series merged from here:
>>>
>>> https://lore.kernel.org/all/20250106142446.628923-1-james.clark@linaro.org/
>>>
>>> but then we still need to clear TRBLIMITR_EL1.E.
>>>
>>
>> Removing that series would actually have the effect of turning guest trace
>> on in nVHE for non-TRBE sinks. The reason for implementing the filtering was
>> to turn guest trace off because a user didn't want to see it.
>
> What I meant was, revert that series and then also ensure that both TRFCR
> and TRBLIMITR are always zero while running in the guest. Is that not
> sufficient?
>
> Will
Yes, that would work. Although if someone is currently tracing guests
from the host then obviously it will break that use case. I'm still not
convinced that tracing guests from the host is broken enough to be
completely useless. But there is a chance nobody is doing it, or waiting
for them to scream might help in understanding their use case.
It doesn't seem like the combination of the TRFCR filtering stuff and
zeroing TRBLIMITR is too messy either though. I suppose you could also
skip the TRFCR switch if you've disabled TRBE if you wanted to save a
register write, but that makes it slightly more complicated.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 18:14 ` Will Deacon
@ 2026-02-17 14:19 ` Leo Yan
2026-02-17 14:52 ` Will Deacon
0 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2026-02-17 14:19 UTC (permalink / raw)
To: Will Deacon
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 06:14:11PM +0000, Will Deacon wrote:
[...]
> > The TRBE driver might do an extra drain here as a workaround. Hard to tell
> > if it's actually required in this case (seems like probably not) but it
> > might be worth doing it anyway to avoid hitting the issue. Especially if we
> > add guest support later where some of the affected registers might start
> > being used. See:
> >
> > if (trbe_needs_drain_after_disable(cpudata))
> > trbe_drain_buffer();
>
> Oh great, this thing sucks even more than I realised!
>
> But thanks for pointing that out... this is presumably erratum #2064142,
> but we probably need to look at #2038923 as well :/
>
> I can't find any public documentation for the problems, but based on the
> kconfig text then I think we care about #2064142 so that the TRBE
> register writes when restoring the host context are effective and we
> care about #2038923 to avoid corrupting trace when re-enabling for the
> host.
Seems to me, this is correct.
> It also looks like we can't rely on the dsb(nsh) in the vcpu_run()
> path if that needs to be before the write to TRBLIMITR_EL1.
>
> In which case, the host->guest something hideous like:
>
> isb();
> tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
> dsb(nsh); // I missed this in my patch
> write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> if (2064142) {
> tsb_csync();
> dsb(nsh);
> }
> isb();
As I_QXJZX suggests, section K10.5.10 "Context switching" gives the
flow. I'd suggest aligning the VM context switch with the description
in S_VKHHY as well.
When switching from host to guest, we need to clear TRCPRGCTLR.EN to
zero. As the doc states "ETE trace compression logic is stateful,
and disabling the ETE resets this compression state".
> and then the guest->host part is:
>
> write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
> isb();
> if (2038923)
> isb();
>
> Does that look right to you?
S_PKLXF gives the flow for switching in.
Thanks,
Leo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-17 14:19 ` Leo Yan
@ 2026-02-17 14:52 ` Will Deacon
2026-02-17 19:01 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-17 14:52 UTC (permalink / raw)
To: Leo Yan
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Tue, Feb 17, 2026 at 02:19:17PM +0000, Leo Yan wrote:
> On Mon, Feb 16, 2026 at 06:14:11PM +0000, Will Deacon wrote:
>
> [...]
>
> > > The TRBE driver might do an extra drain here as a workaround. Hard to tell
> > > if it's actually required in this case (seems like probably not) but it
> > > might be worth doing it anyway to avoid hitting the issue. Especially if we
> > > add guest support later where some of the affected registers might start
> > > being used. See:
> > >
> > > if (trbe_needs_drain_after_disable(cpudata))
> > > trbe_drain_buffer();
> >
> > Oh great, this thing sucks even more than I realised!
> >
> > But thanks for pointing that out... this is presumably erratum #2064142,
> > but we probably need to look at #2038923 as well :/
> >
> > I can't find any public documentation for the problems, but based on the
> > kconfig text then I think we care about #2064142 so that the TRBE
> > register writes when restoring the host context are effective and we
> > care about #2038923 to avoid corrupting trace when re-enabling for the
> > host.
>
> Seems to me, this is correct.
>
> > It also looks like we can't rely on the dsb(nsh) in the vcpu_run()
> > path if that needs to be before the write to TRBLIMITR_EL1.
> >
> > In which case, the host->guest something hideous like:
> >
> > isb();
> > tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
> > dsb(nsh); // I missed this in my patch
> > write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > if (2064142) {
> > tsb_csync();
> > dsb(nsh);
> > }
> > isb();
>
> As I_QXJZX suggests, the section K10.5.10 "Context switching" gives
> the flow. I'd suggest the VM context switch is also aligned to the
> description in S_VKHHY.
I honestly have a hard time believing the sequence in S_VKHHY as the DSB
seems to be in the wrong place which means the TSB CSYNC can float. It
also isn't aligned with what the EL1 driver does...
> When switching from host to guest, we need to clear TRCPRGCTLR.EN to
> zero. As the doc states "ETE trace compression logic is stateful,
> and disabling the ETE resets this compression state".
>
> > and then the guest->host part is:
> >
> > write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
> > isb();
> > if (2038923)
> > isb();
> >
> > Does that look right to you?
>
> S_PKLXF gives the flow for switching in.
Well, modulo errata, sure. I don't have access to the errata document so
I was more interested in whether I got that right...
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-17 14:52 ` Will Deacon
@ 2026-02-17 19:01 ` Leo Yan
2026-02-19 13:54 ` Will Deacon
0 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2026-02-17 19:01 UTC (permalink / raw)
To: Will Deacon
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Tue, Feb 17, 2026 at 02:52:32PM +0000, Will Deacon wrote:
[...]
> > > It also looks like we can't rely on the dsb(nsh) in the vcpu_run()
> > > path if that needs to be before the write to TRBLIMITR_EL1.
> > >
> > > In which case, the host->guest something hideous like:
> > >
> > > isb();
> > > tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
> > > dsb(nsh); // I missed this in my patch
> > > write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > > if (2064142) {
> > > tsb_csync();
> > > dsb(nsh);
> > > }
> > > isb();
> >
> > As I_QXJZX suggests, the section K10.5.10 "Context switching" gives
> > the flow. I'd suggest the VM context switch is also aligned to the
> > description in S_VKHHY.
>
> I honestly have a hard time believing the sequence in S_VKHHY as the DSB
> seems to be in the wrong place which means the TSB CSYNC can float. It
> also isn't aligned with what the EL1 driver does...
Sorry for the confusion. I am checking internally on the flow suggested
in S_VKHHY.
> > When switching from host to guest, we need to clear TRCPRGCTLR.EN to
> > zero. As the doc states "ETE trace compression logic is stateful,
> > and disabling the ETE resets this compression state".
> >
> > > and then the guest->host part is:
> > >
> > > write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
> > > isb();
> > > if (2038923)
> > > isb();
> > >
> > > Does that look right to you?
> >
> > S_PKLXF gives the flow for switching in.
>
> Well, modulo errata, sure. I don't have access to the errata document so
> I was more interested in whether I got that right...
Please see the doc:
https://developer.arm.com/documentation/SDEN-1873351/1900/?lang=en
Thanks,
Leo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-17 19:01 ` Leo Yan
@ 2026-02-19 13:54 ` Will Deacon
2026-02-19 18:58 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-19 13:54 UTC (permalink / raw)
To: Leo Yan
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Tue, Feb 17, 2026 at 07:01:21PM +0000, Leo Yan wrote:
> On Tue, Feb 17, 2026 at 02:52:32PM +0000, Will Deacon wrote:
>
> [...]
>
> > > > It also looks like we can't rely on the dsb(nsh) in the vcpu_run()
> > > > path if that needs to be before the write to TRBLIMITR_EL1.
> > > >
> > > > In which case, the host->guest something hideous like:
> > > >
> > > > isb();
> > > > tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
> > > > dsb(nsh); // I missed this in my patch
> > > > write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > > > if (2064142) {
> > > > tsb_csync();
> > > > dsb(nsh);
> > > > }
> > > > isb();
> > >
> > > As I_QXJZX suggests, the section K10.5.10 "Context switching" gives
> > > the flow. I'd suggest the VM context switch is also aligned to the
> > > description in S_VKHHY.
> >
> > I honestly have a hard time believing the sequence in S_VKHHY as the DSB
> > seems to be in the wrong place which means the TSB CSYNC can float. It
> > also isn't aligned with what the EL1 driver does...
>
> Sorry for confusion. I am checking internally for the flow suggested
> in S_VKHHY.
>
> > > When switching from host to guest, we need to clear TRCPRGCTLR.EN to
> > > zero. As the doc states "ETE trace compression logic is stateful,
> > > and disabling the ETE resets this compression state".
> > >
> > > > and then the guest->host part is:
> > > >
> > > > write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
> > > > isb();
> > > > if (2038923)
> > > > isb();
> > > >
> > > > Does that look right to you?
> > >
> > > S_PKLXF gives the flow for switching in.
> >
> > Well, modulo errata, sure. I don't have access to the errata document so
> > I was more interested in whether I got that right...
>
> Please see the doc:
> https://developer.arm.com/documentation/SDEN-1873351/1900/?lang=en
Aha, thank you, Leo!
I swear you used to be able to google the erratum number and get the doc,
but that doesn't seem to be the case any more. In fact, if you type the
erratum number into the search box on developer.arm.com it doesn't even
work, so cheers for pointing me to the right place.
Will
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-19 13:54 ` Will Deacon
@ 2026-02-19 18:58 ` Leo Yan
2026-02-19 19:06 ` Leo Yan
2026-02-25 12:09 ` Leo Yan
0 siblings, 2 replies; 31+ messages in thread
From: Leo Yan @ 2026-02-19 18:58 UTC (permalink / raw)
To: Will Deacon
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Thu, Feb 19, 2026 at 01:54:19PM +0000, Will Deacon wrote:
[...]
> > > > > In which case, the host->guest something hideous like:
> > > > >
> > > > > isb();
> > > > > tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
> > > > > dsb(nsh); // I missed this in my patch
> > > > > write_sysreg_s(0, SYS_TRBLIMITR_EL1);
> > > > > if (2064142) {
> > > > > tsb_csync();
> > > > > dsb(nsh);
> > > > > }
> > > > > isb();
> > > >
> > > > As I_QXJZX suggests, the section K10.5.10 "Context switching" gives
> > > > the flow. I'd suggest the VM context switch is also aligned to the
> > > > description in S_VKHHY.
> > >
> > > I honestly have a hard time believing the sequence in S_VKHHY as the DSB
> > > seems to be in the wrong place which means the TSB CSYNC can float. It
> > > also isn't aligned with what the EL1 driver does...
> >
> > Sorry for confusion. I am checking internally for the flow suggested
> > in S_VKHHY.
After internal review, we conclude that S_VKHHY is valid. The intent of
S_VKHHY is to use the minimal number of barriers during a context switch.
As for the concern that the "DSB seems to be in the wrong place which
means the TSB CSYNC can float", this is fine for two reasons:
1) As described in B2.6.8:
| The following situations are synchronized using a TSB operation:
|
| * A direct write B to a System register is ordered after an indirect
| read or indirect write of the same register by a trace operation of
| a traced instruction A, if all of the following are true:
| - A is executed in program order before a Context synchronization event C.
| - C appears in program order before a TSB operation T.
| - B is executed in program order after T.
If trace operations indirectly read or write system registers, the TSB
ensures that these indirect accesses are complete before any direct
writes to the same register are performed in program order after the TSB.
So in S_VKHHY, we don't expect clearing TRCPRGCTLR.EN (step 4) and
clearing TRBLIMITR_EL1.E (step 5) to take effect prior to the TSB.
2) A DSB executed after the TSB ensures that the trace data writes are
complete, so that it is then safe to read trace data from memory. However,
in the context switch case we don't read trace data, so a DSB for
"publishing" the data is not required.
Based on these conclusions, let me summarize the flow:
// Prohibit trace
TRFCR_EL1 = 0;
// No new program-flow trace
isb();
// Trace operation and trace unit are flushed
tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
// Disable trace unit
TRCPRGCTLR.EN = 0b0
// Disable trace buffer unit
TRBLIMITR_EL1.E = 0b0
if (2064142) {
tsb_csync();
dsb(nsh);
}
// Ensure trace disable takes effect and indirect writes are visible;
// Ensure 2064142 is done before affected sysreg write.
isb();
> > > > When switching from host to guest, we need to clear TRCPRGCTLR.EN to
> > > > zero. As the doc states "ETE trace compression logic is stateful,
> > > > and disabling the ETE resets this compression state".
> > > >
> > > > > and then the guest->host part is:
> > > > >
> > > > > write_sysreg_s(trblimitr_el1, SYS_TRBLIMITR_EL1);
> > > > > isb();
> > > > > if (2038923)
> > > > > isb();
> > > > >
> > > > > Does that look right to you?
> > > >
> > > > S_PKLXF gives the flow for switching in.
> > >
> > > Well, modulo errata, sure. I don't have access to the errata document so
> > > I was more interested in whether I got that right...
> >
> > Please see the doc:
> > https://developer.arm.com/documentation/SDEN-1873351/1900/?lang=en
>
> Aha, thank you, Leo!
You are welcome!
> I swear you used to be able to google the erratum number and get the doc,
> but that doesn't seem to be the case any more. In fact, if you type the
> erratum number into the search box on developer.arm.com it doesn't even
> work, so cheers for pointing me to the right place.
I cannot find the doc via Google either. When I checked internally, this
seems to be a known issue with developer.arm.com. Thanks a lot for
reporting it.
The guest->host flow should be the easy one. Anyway, let me summarize it
for review:
Restore TRFCR_EL1;
// Ensure restored sysreg is visible
isb();
// Enable trace buffer
TRBLIMITR_EL1.E = 0b1
// Enable trace unit
TRCPRGCTLR.EN = 0b1
if (2038923)
isb();
// eret works as an extra context synchronization
Thanks,
Leo
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-19 18:58 ` Leo Yan
@ 2026-02-19 19:06 ` Leo Yan
2026-02-25 12:09 ` Leo Yan
1 sibling, 0 replies; 31+ messages in thread
From: Leo Yan @ 2026-02-19 19:06 UTC (permalink / raw)
To: Will Deacon
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Thu, Feb 19, 2026 at 06:58:03PM +0000, Leo Yan wrote:
[...]
> The flow for guest->host might be easy one, anyway, I try to summary for
> review:
>
> Restore TRFCR_EL1;
>
> // Ensure restored sysreg is visible
> isb();
>
> // Enable trace buffer
> TRBLIMITR_EL1.E = 0b1
>
> // Enable trace unit
> TRCPRGCTLR.EN = 0b1
>
> if (2038923)
> isb();
To strictly follow the SDEN, this ISB should be moved to immediately
after enabling the TRBLIMITR_EL1.E bit and before enabling the
TRCPRGCTLR.EN bit.
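For illustration only, the corrected guest->host ordering can be modelled
in user space. This is a sketch, not kernel code: step_log, record() and
trace_restore_host() are hypothetical names, the register writes and
barriers are only logged rather than executed, and has_erratum_2038923
stands in for whatever capability check the kernel would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each step of the sequence is appended to a log. */
enum step {
	STEP_RESTORE_TRFCR,	/* restore the host's TRFCR_EL1 */
	STEP_ISB,
	STEP_ENABLE_TRBE,	/* TRBLIMITR_EL1.E = 1 */
	STEP_ENABLE_TRACE_UNIT,	/* TRCPRGCTLR.EN = 1 */
};

#define MAX_STEPS 8

struct step_log {
	enum step steps[MAX_STEPS];
	size_t count;
};

static void record(struct step_log *slog, enum step s)
{
	assert(slog->count < MAX_STEPS);
	slog->steps[slog->count++] = s;
}

/*
 * Guest->host order with the erratum ISB placed between enabling the
 * Trace Buffer Unit and enabling the trace unit.
 */
static void trace_restore_host(struct step_log *slog, bool has_erratum_2038923)
{
	record(slog, STEP_RESTORE_TRFCR);
	record(slog, STEP_ISB);			/* restored TRFCR_EL1 is visible */
	record(slog, STEP_ENABLE_TRBE);
	if (has_erratum_2038923)
		record(slog, STEP_ISB);		/* before TRCPRGCTLR.EN is set */
	record(slog, STEP_ENABLE_TRACE_UNIT);
}
```

The only point of the model is the position of the conditional ISB: it
is logged before STEP_ENABLE_TRACE_UNIT, not after it.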
Sorry for spamming.
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 16:49 ` Marc Zyngier
@ 2026-02-20 11:42 ` James Clark
2026-02-24 11:19 ` Marc Zyngier
2026-02-20 15:48 ` Leo Yan
1 sibling, 1 reply; 31+ messages in thread
From: James Clark @ 2026-02-20 11:42 UTC (permalink / raw)
To: Marc Zyngier
Cc: Will Deacon, kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On 16/02/2026 4:49 pm, Marc Zyngier wrote:
> On Mon, 16 Feb 2026 16:10:14 +0000,
> James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 16/02/2026 3:51 pm, Marc Zyngier wrote:
>>> On Mon, 16 Feb 2026 15:05:10 +0000,
>>> James Clark <james.clark@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 16/02/2026 2:29 pm, Marc Zyngier wrote:
>>>>> On Mon, 16 Feb 2026 13:09:59 +0000,
>>>>> Will Deacon <will@kernel.org> wrote:
>>>>>>
>>>>>> The nVHE world-switch code relies on zeroing TRFCR_EL1 to disable trace
>>>>>> generation in guest context when self-hosted TRBE is in use by the host.
>>>>>>
>>>>>> Per D3.2.1 ("Controls to prohibit trace at Exception levels"), clearing
>>>>>> TRFCR_EL1 means that trace generation is prohibited at EL1 and EL0 but
>>>>>> per R_YCHKJ the Trace Buffer Unit will still be enabled if
>>>>>> TRBLIMITR_EL1.E is set. R_SJFRQ goes on to state that, when enabled, the
>>>>>> Trace Buffer Unit can perform address translation for the "owning
>>>>>> exception level" even when it is out of context.
>>>>>
>>>>> Great. So TRBE violates all the principles that we hold true in the
>>>>> architecture. Does SPE suffer from the same level of brokenness?
>>>>>
>>>>>> Consequently, we can end up in a state where TRBE performs speculative
>>>>>> page-table walks for a host VA/IPA in guest/hypervisor context depending
>>>>>> on the value of MDCR_EL2.E2TB, which changes over world-switch. The
>>>>>> result appears to be a heady mixture of data corruption and hardware
>>>>>> lockups.
>>>>>>
>>>>>> Extend the TRBE world-switch code to clear TRBLIMITR_EL1.E after
>>>>>> draining the buffer, restoring the register on return to the host.
>>>>>>
>>>>>> Cc: Marc Zyngier <maz@kernel.org>
>>>>>> Cc: Oliver Upton <oupton@kernel.org>
>>>>>> Cc: James Clark <james.clark@linaro.org>
>>>>>> Cc: Leo Yan <leo.yan@arm.com>
>>>>>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>>>> Cc: Fuad Tabba <tabba@google.com>
>>>>>> Fixes: a1319260bf62 ("arm64: KVM: Enable access to TRBE support for host")
>>>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>>>>> ---
>>>>>>
>>>>>> NOTE: This is *untested* as I don't have a TRBE-capable device that can
>>>>>> run upstream but I noticed this by inspection when triaging occasional
>>>>>> hardware lockups on systems using a 6.12-based kernel with TRBE running
>>>>>> at the same time as a vCPU is loaded. This code has changed quite a bit
>>>>>> over time, so stable backports are not entirely straightforward.
>>>>>> Hopefully James/Leo/Suzuki can help us test if folks agree with the
>>>>>> general approach taken here.
>>>>>>
>>>>>> arch/arm64/include/asm/kvm_host.h | 1 +
>>>>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 36 ++++++++++++++++++++++--------
>>>>>> 2 files changed, 28 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>>>> index ac7f970c7883..a932cf043b83 100644
>>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>>> @@ -746,6 +746,7 @@ struct kvm_host_data {
>>>>>> u64 pmscr_el1;
>>>>>> /* Self-hosted trace */
>>>>>> u64 trfcr_el1;
>>>>>> + u64 trblimitr_el1;
>>>>>> /* Values of trap registers for the host before guest entry. */
>>>>>> u64 mdcr_el2;
>>>>>> u64 brbcr_el1;
>>>>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>>>> index 2a1c0f49792b..fd389a26bc59 100644
>>>>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>>>> @@ -57,12 +57,27 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>>>>> write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>>>>> }
>>>>>> -static bool __trace_needs_drain(void)
>>>>>> +static void __trace_drain_and_disable(void)
>>>>>> {
>>>>>> - if (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRBE))
>>>>>> - return read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E;
>>>>>> + u64 *trblimitr_el1 = host_data_ptr(host_debug_state.trblimitr_el1);
>>>>>> - return host_data_test_flag(TRBE_ENABLED);
>>>>>> + *trblimitr_el1 = 0;
>>>>>> +
>>>>>> + if (is_protected_kvm_enabled()) {
>>>>>> + if (!host_data_test_flag(HAS_TRBE))
>>>>>> + return;
>>>>>> + } else {
>>>>>> + if (!host_data_test_flag(TRBE_ENABLED))
>>>>>> + return;
>>>>>> + }
>>>>>> +
>>>>>> + *trblimitr_el1 = read_sysreg_s(SYS_TRBLIMITR_EL1);
>>>>>> + if (*trblimitr_el1 & TRBLIMITR_EL1_E) {
>>>>>> + isb();
>>>>>> + tsb_csync();
>>>>>> + write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>>>>> + isb();
>>>>
>>>> The TRBE driver might do an extra drain here as a workaround. Hard to
>>>> tell if it's actually required in this case (seems like probably not)
>>>> but it might be worth doing it anyway to avoid hitting the
>>>> issue. Especially if we add guest support later where some of the
>>>> affected registers might start being used.
>>>
>>> Just to set the expectations: guest TRBE support is not happening
>>> until the architecture is fixed. It cannot reliably give a trace that
>>> includes emulated exceptions, and until then, no TRBE for you.
>>>
>>>> See:
>>>>
>>>> if (trbe_needs_drain_after_disable(cpudata))
>>>> trbe_drain_buffer();
>>>>
>>>>
>>>>>> + }
>>>>>
>>>>> Doesn't this mean we should be able to get rid of most of the TRFCR
>>>>> messing about that litters the entry/exit code and leave that to VHE
>>>>
>>>> Technically you could have ETMs that and are connected to sinks other
>>>> than TRBE. Unless you somehow switch off those sinks you still need to
>>>> do the TRFCR switching stuff.
>>>>
>>>>> only? And even then, I'm tempted to simply get rid of any sort of
>>>>> guest-only tracing, given that TRBE is not capable of representing
>>>>> exceptions that are synthesised by the host, making it the resulting
>>>>> traces useless.
>>>>
>>>> I haven't heard of anyone tracing a guest from the host, but until we
>>>> add support for guests to be able to trace themselves it's the only
>>>> way of doing it, so it could be useful.
>>>
>>> But that's *not* working. If you trace EL1 only, even with a VHE host,
>>> the result is not usable.
>>>
>>
>> Do you mean not working because of the missing exceptions? I did a bit
>> of testing before and the trace did seem somewhat usable to me. It had
>> EL1 and EL0 atoms in there.
>
> Sure. Now try to look at what that means for NV, where all the
> EL1->EL2 exceptions are emulated, where all the EL2->EL1 exception
> returns are emulated.
>
> What does it give you? A bag of nonsense.
>
> Same thing for EL2->EL0, by the way, so you can't even correctly
> profile an EL0 program that performs a syscall, or that gets
> interrupted. And while without NV, these exceptions are rare, having a
> trace that is unreliable has the potential of being worse than no
> trace at all.
If there are issues with NV perhaps we can skip it for the initial trace
virtualisation implementation? I'm not familiar with it but isn't NV
still an experimental feature anyway? I can't imagine actual users who
want to do tracing in guests would accept that they can't do tracing on
a non-NV guest because there is something that doesn't work in NV.
Also do you have an example of these exceptions that you mean without NV
so I can have a look? I have a hack that allows basic use of ETE/TRBE in
VHE mode and did some recordings of syscalls and they end up looking ok
in the decoded trace:
$ perf record -e cs_etm/timestamp=0/u -C 0 perf bench syscall basic
$ perf script
Results in:
bench_syscall_common+0xb4 => aaaaaffcafe0 getppid@plt+0x0
getppid@plt+0xc => ffffa78980c0 getppid+0x0 (libc.so.6)
getppid+0x8 (libc.so.6) => 0 [unknown] ([unknown])
[unknown] ([unknown]) => ffffa78980cc getppid+0xc (libc.so.6)
getppid+0xc (libc.so.6) => aaaab0076564 bench_syscall_common+0xb8
bench_syscall_common+0xb8 => aaaab00765d8 bench_syscall_common+0x12c
Which shows jumping from the bench function to getppid(), then doing the
syscall into the kernel which is "0 [unknown]" because I recorded with
/u. Then back to the bench loop again.
>
> Until the architecture grows a way for KVM to inject the missing
> information into the trace, TRBE support for guest will stay out.
>
>> All you need is the mmap records from the
>> guest which you can get by running Perf in the guest and it's possible
>> to decode it. Maybe it's not complete but I don't think all use cases
>> require complete trace. AutoFDO for example just needs lots of small
>> snippets of execution history.
>
> I don't think it is OK to feed an FDO with traces that are known to be
> incomplete. Maybe that goes under the radar today, but my crystal ball
> is telling me things could be very different in the future, and I'm
> not going to take any bet.
The preset we added for AutoFDO
(drivers/hwtracing/coresight/coresight-cfg-afdo.c) specifically turns
tracing on and off to give small incomplete snippets distributed across
the whole process but while reducing the total amount of trace. I think
that is one way to do AutoFDO and the compiler can handle it. Anyway,
AutoFDO is just one use case for trace, and an example where incomplete
trace is better than nothing.
In addition to that, the way the ETR and TRBE buffers are currently used,
they're pretty bad at actually recording everything without gaps.
Although in theory with TRBE it's possible to record everything without
dropping anything, it's still something Leo is working on.
>
> Thanks,
>
> M.
>
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-16 16:49 ` Marc Zyngier
2026-02-20 11:42 ` James Clark
@ 2026-02-20 15:48 ` Leo Yan
2026-02-24 11:22 ` Marc Zyngier
1 sibling, 1 reply; 31+ messages in thread
From: Leo Yan @ 2026-02-20 15:48 UTC (permalink / raw)
To: Marc Zyngier
Cc: James Clark, Will Deacon, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Mon, Feb 16, 2026 at 04:49:02PM +0000, Marc Zyngier wrote:
[...]
> > > But that's *not* working. If you trace EL1 only, even with a VHE host,
> > > the result is not usable.
> > >
> >
> > Do you mean not working because of the missing exceptions? I did a bit
> > of testing before and the trace did seem somewhat usable to me. It had
> > EL1 and EL0 atoms in there.
>
> Sure. Now try to look at what that means for NV, where all the
> EL1->EL2 exceptions are emulated, where all the EL2->EL1 exception
> returns are emulated.
>
> What does it give you? A bag of nonsense.
Sorry for jumping in.
If we enable TRBE in a VM, whether nested or not, why is it necessary to
capture trace data for the exception transition between the VM and its
higher level host(s)?
Seems to me, regardless of what happens during exception emulation, once
the VM is switched out, tracing will be stopped, and then re-enabled
when the VM is switched back. In that case, we should be able to record
the complete trace data for whatever occurs while the VM is running.
On the other hand, when a trace is launched within a VM, I think we should
not trace the higher-level hypervisor or hosts, since that would raise
information-leak concerns.
> Same thing for EL2->EL0, by the way, so you can't even correctly
> profile an EL0 program that performs a syscall, or that gets
> interrupted. And while without NV, these exceptions are rare, having a
> trace that is unreliable has the potential of being worse than no
> trace at all.
>
> Until the architecture grows a way for KVM to inject the missing
> information into the trace, TRBE support for guest will stay out.
I agree we need to understand what the actual blocking issues are for TRBE
virtualization.
Essentially, I'd like to confirm the methodology for trace virtualization.
I assume it allows a higher-privilege OS to trace a lower-privilege OS,
but not the other way around.
Thanks,
Leo
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-20 11:42 ` James Clark
@ 2026-02-24 11:19 ` Marc Zyngier
0 siblings, 0 replies; 31+ messages in thread
From: Marc Zyngier @ 2026-02-24 11:19 UTC (permalink / raw)
To: James Clark
Cc: Will Deacon, kvmarm, mark.rutland, linux-arm-kernel, Oliver Upton,
Leo Yan, Suzuki K Poulose, Fuad Tabba
On Fri, 20 Feb 2026 11:42:11 +0000,
James Clark <james.clark@linaro.org> wrote:
[...]
> > Sure. Now try to look at what that means for NV, where all the
> > EL1->EL2 exceptions are emulated, where all the EL2->EL1 exception
> > returns are emulated.
> >
> > What does it give you? A bag of nonsense.
> >
> > Same thing for EL2->EL0, by the way, so you can't even correctly
> > profile an EL0 program that performs a syscall, or that gets
> > interrupted. And while without NV, these exceptions are rare, having a
> > trace that is unreliable has the potential of being worse than no
> > trace at all.
>
> If there are issues with NV perhaps we can skip it for the initial
> trace virtualisation implementation?
No. This is broken for *any* hypervisor-generated exception.
> I'm not familiar with it but isn't NV still an experimental feature
> anyway?
Let me give you a clue: if I have to choose between TRBE and NV, it's
not TRBE I'm going to pick.
> I can't imagine actual users who want to do tracing in guests would
> accept that they can't do tracing on a non-NV guest because there is
> something that doesn't work in NV.
But that's the thing: they are not getting a trace. They are getting
nonsense.
> Also do you have an example of these exceptions that you mean without
> NV so I can have a look?
Anything that ends up in arch/arm64/kvm/hyp/exception.c, where the
exception is emulated by changing PC.
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-20 15:48 ` Leo Yan
@ 2026-02-24 11:22 ` Marc Zyngier
0 siblings, 0 replies; 31+ messages in thread
From: Marc Zyngier @ 2026-02-24 11:22 UTC (permalink / raw)
To: Leo Yan
Cc: James Clark, Will Deacon, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Fri, 20 Feb 2026 15:48:20 +0000,
Leo Yan <leo.yan@arm.com> wrote:
>
> On Mon, Feb 16, 2026 at 04:49:02PM +0000, Marc Zyngier wrote:
>
> [...]
>
> > > > But that's *not* working. If you trace EL1 only, even with a VHE host,
> > > > the result is not usable.
> > > >
> > >
> > > Do you mean not working because of the missing exceptions? I did a bit
> > > of testing before and the trace did seem somewhat usable to me. It had
> > > EL1 and EL0 atoms in there.
> >
> > Sure. Now try to look at what that means for NV, where all the
> > EL1->EL2 exceptions are emulated, where all the EL2->EL1 exception
> > returns are emulated.
> >
> > What does it give you? A bag of nonsense.
>
> Sorry for jumping in.
>
> If we enable TRBE in a VM, whether nested or not, why is it necessary to
> capture trace data for the exception transition between the VM and its
> higher level host(s)?
Because that's what the architecture guarantees. If you can't honour
what the architecture guarantees, then you don't have an
implementation.
> Seems to me, regardless of what happens during exception emulation, once
> the VM is switched out, tracing will be stopped, and then re-enabled
> when the VM is switched back. In that case, we should be able to record
> the complete trace data for whatever occurs while the VM is running.
And that's breaking the architecture when the exception return is
emulated.
Really, I'm getting tired of having to argue this.
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-19 18:58 ` Leo Yan
2026-02-19 19:06 ` Leo Yan
@ 2026-02-25 12:09 ` Leo Yan
2026-02-27 18:07 ` Will Deacon
1 sibling, 1 reply; 31+ messages in thread
From: Leo Yan @ 2026-02-25 12:09 UTC (permalink / raw)
To: Will Deacon, yabinc
Cc: James Clark, Marc Zyngier, kvmarm, mark.rutland, linux-arm-kernel,
Oliver Upton, Suzuki K Poulose, Fuad Tabba
Hi Will,
[ + Yabin ]
Thanks to Suzuki for the reminder: I should mention that Yabin reported
another lockup issue caused by missing CPU PM support in the TRBE driver.
We have a patch series to fix the issue:
https://lore.kernel.org/linux-arm-kernel/20251119-arm_coresight_path_power_management_improvement-v5-16-f615a301ad0b@arm.com/
Besides your fix for the translation regime issue, I'd also suggest
applying the CoreSight PM patch series to fix the lockup caused by CPU idle.
I have an addendum on the context switch, please see the comment below.
On Thu, Feb 19, 2026 at 06:58:03PM +0000, Leo Yan wrote:
[...]
> Based on these conclusions, let me summarize the flow:
>
> // Prohibit trace
> TRFCR_EL1 = 0;
>
> // No new program-flow trace
> isb();
>
> // Trace operation and trace unit are flushed
> tsb_csync(); // Executes twice if ARM64_WORKAROUND_TSB_FLUSH_FAILURE!
>
> // Disable trace unit
> TRCPRGCTLR.EN = 0b0
We conclude that there is no need to disable and re-enable the trace unit
(TRCPRGCTLR.EN) during a KVM context switch.
Here are the details:
I initially proposed controlling the TRCPRGCTLR.EN bit during the switch.
This would make the trace unit generate ASYNC packets, which I assumed
would be convenient for decoding, since the decoder could resynchronize on
ASYNC packets rather than having to be reset whenever a discontinuity
occurs.
After review: across a VM context switch, the trace unit guarantees a
single continuous stream when switching back to the host, so there is no
discontinuity in the trace stream. Therefore, we don't need to touch the
TRCPRGCTLR.EN bit to generate ASYNC packets.
I hope this is reasonable to you.
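To make the resulting host->guest order concrete, here is a user-space
model of the flow summarized earlier with the TRCPRGCTLR.EN step dropped.
It is a sketch under assumptions, not the patch: cpu_model,
host_trace_state, emit() and trace_disable_for_guest() are hypothetical
names, barriers are only logged, and the two bool parameters stand in for
the kernel's checks for the TSB flush failure workaround and erratum
2064142. It also runs the sequence unconditionally, whereas the patch
under discussion only does so when TRBLIMITR_EL1.E was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRBLIMITR_EL1_E	(UINT64_C(1) << 0)	/* Trace Buffer Unit enable */
#define MAX_OPS		16

enum op { OP_WRITE_TRFCR, OP_ISB, OP_TSB_CSYNC, OP_WRITE_TRBLIMITR, OP_DSB_NSH };

/* Hypothetical model of the registers involved plus a log of operations. */
struct cpu_model {
	uint64_t trfcr_el1;
	uint64_t trblimitr_el1;
	enum op ops[MAX_OPS];
	size_t nr_ops;
};

/* Host values to be restored on the guest->host path. */
struct host_trace_state {
	uint64_t trfcr_el1;
	uint64_t trblimitr_el1;
};

static void emit(struct cpu_model *cpu, enum op op)
{
	assert(cpu->nr_ops < MAX_OPS);
	cpu->ops[cpu->nr_ops++] = op;
}

static void trace_disable_for_guest(struct cpu_model *cpu,
				    struct host_trace_state *saved,
				    bool tsb_flush_failure,
				    bool erratum_2064142)
{
	saved->trfcr_el1 = cpu->trfcr_el1;
	saved->trblimitr_el1 = cpu->trblimitr_el1;

	cpu->trfcr_el1 = 0;			/* prohibit trace */
	emit(cpu, OP_WRITE_TRFCR);
	emit(cpu, OP_ISB);			/* no new program-flow trace */

	emit(cpu, OP_TSB_CSYNC);		/* flush trace operations */
	if (tsb_flush_failure)
		emit(cpu, OP_TSB_CSYNC);	/* workaround: issue it twice */

	cpu->trblimitr_el1 &= ~TRBLIMITR_EL1_E;	/* disable Trace Buffer Unit */
	emit(cpu, OP_WRITE_TRBLIMITR);

	if (erratum_2064142) {
		emit(cpu, OP_TSB_CSYNC);
		emit(cpu, OP_DSB_NSH);
	}

	emit(cpu, OP_ISB);			/* disable takes effect */
}
```

Saving TRBLIMITR_EL1 before clearing E is what lets the return path put
the host's value back unchanged.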
Thanks,
Leo
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-25 12:09 ` Leo Yan
@ 2026-02-27 18:07 ` Will Deacon
2026-03-03 10:36 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2026-02-27 18:07 UTC (permalink / raw)
To: Leo Yan
Cc: yabinc, James Clark, Marc Zyngier, kvmarm, mark.rutland,
linux-arm-kernel, Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Wed, Feb 25, 2026 at 12:09:56PM +0000, Leo Yan wrote:
> [ + Yabin ]
>
> Thanks for Suzuki's reminding, I should mention that Yabin reported
> another lockup issue caused by missing CPU PM support in TRBE driver.
>
> We have a patch series to fix the issue:
> https://lore.kernel.org/linux-arm-kernel/20251119-arm_coresight_path_power_management_improvement-v5-16-f615a301ad0b@arm.com/
Two nits on that series:
1. It seems a bit weird to me for the ETE driver to manage TRFCR but for
the TRBE driver to manage the other registers
2. Are you sure you don't need to save/restore the TRBE state when
LIMITR.E is clear? Maybe the driver is fine with that, but I'm worried
that we could suspend in a half-programmed state and lose some of that
configuration.
> Besides your fix the translation regime issue, I'd also suggest applying
> the CoreSight PM patch series to fix lockup caused by CPU idle.
Yes, we definitely need something like that in the android kernel trees.
I've previously bodged a hack into the ETE PM notifiers, but if you have
backports of your series to 6.12, 6.6 and 6.1 then we should merge them
into Android. As it stands, I don't have a TRBE-capable device running
mainline.
Will
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-02-27 18:07 ` Will Deacon
@ 2026-03-03 10:36 ` Leo Yan
2026-03-03 10:47 ` Suzuki K Poulose
0 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2026-03-03 10:36 UTC (permalink / raw)
To: Will Deacon
Cc: yabinc, James Clark, Marc Zyngier, kvmarm, mark.rutland,
linux-arm-kernel, Oliver Upton, Suzuki K Poulose, Fuad Tabba
On Fri, Feb 27, 2026 at 06:07:44PM +0000, Will Deacon wrote:
> On Wed, Feb 25, 2026 at 12:09:56PM +0000, Leo Yan wrote:
> > [ + Yabin ]
> >
> > Thanks for Suzuki's reminding, I should mention that Yabin reported
> > another lockup issue caused by missing CPU PM support in TRBE driver.
> >
> > We have a patch series to fix the issue:
> > https://lore.kernel.org/linux-arm-kernel/20251119-arm_coresight_path_power_management_improvement-v5-16-f615a301ad0b@arm.com/
>
> Two nits on that series:
>
> 1. It seems a bit weird to me for the ETE driver to manage TRFCR but for
> the TRBE driver to manage the other registers
TRFCR_ELx is introduced by FEAT_TRF, which is a separate feature from
TRBE, and it can be used for other sinks (like ETR). I think this is
the main reason that it is implemented in ETE driver rather than TRBE
driver.
> 2. Are you sure you don't need to save/restore the TRBE state when
> LIMITR.E is clear? Maybe the driver is fine with that, but I'm worried
> that we could suspend in a half-programmed state and lose some of that
> configuration.
If the TRBLIMITR_EL1.E bit is clear while the CPU is idle, then the
next time the TRBE trace buffer is enabled, trbe_enable_hw() must be
called to reconfigure the TRBE registers (including TRBSR_EL1).
One concern is that after a CPU power cycle, some fields in the TRBE
registers may be in the following state:
"On a cold reset, this field resets to an architecturally UNKNOWN value."
I will change the series to always save/restore the TRBE state. Thanks
for the suggestions.
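As a rough illustration of the "always save/restore" idea, here is a
minimal user-space model in C. It is a sketch, not the code from the
series: struct trbe_regs, struct trbe_pm_ctx, trbe_pm_save(),
trbe_pm_restore() and model_power_cycle() are hypothetical names, the
registers are plain struct fields rather than system registers, and the
drain/barrier sequence a real driver needs is left out. The only
architectural detail assumed is that TRBLIMITR_EL1.E is bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TRBLIMITR_EL1.E, the Trace Buffer Unit enable bit (bit 0). */
#define TRBLIMITR_E (UINT64_C(1) << 0)

/* Hypothetical model of the per-CPU TRBE registers. */
struct trbe_regs {
	uint64_t trblimitr;
	uint64_t trbptr;
	uint64_t trbbaser;
	uint64_t trbsr;
};

/* Hypothetical per-CPU save area used across a power cycle. */
struct trbe_pm_ctx {
	struct trbe_regs saved;
	bool valid;
};

/* Models a power cycle that leaves every field with an UNKNOWN value. */
void model_power_cycle(struct trbe_regs *hw)
{
	hw->trblimitr = UINT64_C(0xdeadbeefdeadbeef);
	hw->trbptr = UINT64_C(0xdeadbeefdeadbeef);
	hw->trbbaser = UINT64_C(0xdeadbeefdeadbeef);
	hw->trbsr = UINT64_C(0xdeadbeefdeadbeef);
}

/*
 * Snapshot the registers unconditionally, i.e. even when E is clear,
 * then make sure the unit is disabled before the CPU powers down.
 */
void trbe_pm_save(struct trbe_pm_ctx *ctx, struct trbe_regs *hw)
{
	ctx->saved = *hw;
	ctx->valid = true;
	hw->trblimitr &= ~TRBLIMITR_E;
}

/*
 * Reprogram everything with E clear first, and only write the saved
 * E bit back once the other registers hold known values again.
 */
void trbe_pm_restore(const struct trbe_pm_ctx *ctx, struct trbe_regs *hw)
{
	assert(ctx->valid);
	hw->trblimitr = ctx->saved.trblimitr & ~TRBLIMITR_E;
	hw->trbbaser = ctx->saved.trbbaser;
	hw->trbptr = ctx->saved.trbptr;
	hw->trbsr = ctx->saved.trbsr;
	hw->trblimitr = ctx->saved.trblimitr;
}
```

Saving unconditionally means a buffer that was programmed but not yet
enabled comes back with the same base, pointer and status, instead of
whatever the reset left behind.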
> > Besides your fix the translation regime issue, I'd also suggest applying
> > the CoreSight PM patch series to fix lockup caused by CPU idle.
>
> Yes, we definitely need something like that in the android kernel trees.
> I've previously bodged a hack into the ETE PM notifiers, but if you have
> backports of your series to 6.12, 6.6 and 6.1 then we should merge them
> into Android. As it stands, I don't have a TRBE-capable device running
> mainline.
Let us first merge the series into master :)
After that, we can consider backporting (I assume Yabin already has a
plan for this). I'd be happy to help with backporting to v6.12.
However, I cannot commit to backporting to v6.6 or v6.1 at this stage,
as many dependencies are likely to be involved.
Thanks,
Leo
* Re: [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context
2026-03-03 10:36 ` Leo Yan
@ 2026-03-03 10:47 ` Suzuki K Poulose
0 siblings, 0 replies; 31+ messages in thread
From: Suzuki K Poulose @ 2026-03-03 10:47 UTC (permalink / raw)
To: Leo Yan, Will Deacon
Cc: yabinc, James Clark, Marc Zyngier, kvmarm, mark.rutland,
linux-arm-kernel, Oliver Upton, Fuad Tabba
On 03/03/2026 10:36, Leo Yan wrote:
> On Fri, Feb 27, 2026 at 06:07:44PM +0000, Will Deacon wrote:
>> On Wed, Feb 25, 2026 at 12:09:56PM +0000, Leo Yan wrote:
>>> [ + Yabin ]
>>>
>>> Thanks for Suzuki's reminding, I should mention that Yabin reported
>>> another lockup issue caused by missing CPU PM support in TRBE driver.
>>>
>>> We have a patch series to fix the issue:
>>> https://lore.kernel.org/linux-arm-kernel/20251119-arm_coresight_path_power_management_improvement-v5-16-f615a301ad0b@arm.com/
>>
>> Two nits on that series:
>>
>> 1. It seems a bit weird to me for the ETE driver to manage TRFCR but for
>> the TRBE driver to manage the other registers
>
> TRFCR_ELx is introduced by FEAT_TRF, which is a separate feature from
> TRBE, and it can be used for other sinks (like ETR). I think this is
> the main reason that it is implemented in ETE driver rather than TRBE
> driver.
That's correct. TRFCR is more closely tied to the ETE/ETM (e.g., for
filtering trace for various ELs depending on the event configuration,
and also for changing the ETM/ETE states). That said, we also use it to
prohibit trace while we do maintenance on the TRBE.
>
>> 2. Are you sure you don't need to save/restore the TRBE state when
>> LIMITR.E is clear? Maybe the driver is fine with that, but I'm worried
>> that we could suspend in a half-programmed state and lose some of that
>> configuration.
>
> If the TRBLIMITR_EL1.E bit is cleared during the CPU is idle, then the
> next time the TRBE trace buffer is re-enabled, trbe_enable_hw() must be
> called to reconfigure the TRBE registers (including TRBSR_EL1).
We don't leave the registers in a half-baked state. We always program
all the TRBE registers when we enable them. That said, we do now have
a case, with the fix for disabling the TRBE (TRBLIMITR.E == 0) for
nVHE, where the rest of the TRBE registers are retained. There is no
chance of us going into idle without restoring TRBLIMITR to the host
value. But we could save/restore the registers to be safe.
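To make that nVHE case concrete, here is a small user-space model in C
of the sequence as I read it from the patch description: zero TRFCR_EL1,
drain, clear TRBLIMITR_EL1.E on the way into the guest, and put the host
values back on return. It is only a sketch under those assumptions.
struct cpu_model, struct host_trace_ctx, model_enter_guest() and
model_exit_guest() are hypothetical names, the drain is a comment rather
than real barriers, and the restore order shown is a guess, not
necessarily what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TRBLIMITR_EL1.E, the Trace Buffer Unit enable bit (bit 0). */
#define TRBLIMITR_E (UINT64_C(1) << 0)

/* Hypothetical model of the two registers touched at world switch. */
struct cpu_model {
	uint64_t trfcr_el1;
	uint64_t trblimitr_el1;
};

/* Hypothetical host save area, filled on entry to the guest. */
struct host_trace_ctx {
	uint64_t trfcr_el1;
	uint64_t trblimitr_el1;
};

bool model_trbe_enabled(const struct cpu_model *cpu)
{
	return (cpu->trblimitr_el1 & TRBLIMITR_E) != 0;
}

void model_enter_guest(struct cpu_model *cpu, struct host_trace_ctx *host)
{
	assert(cpu && host);

	/* Remember the host view of both registers. */
	host->trfcr_el1 = cpu->trfcr_el1;
	host->trblimitr_el1 = cpu->trblimitr_el1;

	/* Prohibit trace generation at EL1 and EL0. */
	cpu->trfcr_el1 = 0;

	/* Real code would drain the trace buffer here before disabling. */

	/* Disable the Trace Buffer Unit; the other TRBE state is retained. */
	cpu->trblimitr_el1 &= ~TRBLIMITR_E;
}

void model_exit_guest(struct cpu_model *cpu, const struct host_trace_ctx *host)
{
	assert(cpu && host);

	/* Assumed order: re-enable the buffer, then allow trace again. */
	cpu->trblimitr_el1 = host->trblimitr_el1;
	cpu->trfcr_el1 = host->trfcr_el1;
}
```

Between model_enter_guest() and model_exit_guest() only the E bit
differs from the host programming, which is the "rest of the registers
are retained" situation described above.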
Suzuki
>
> One concern is that after a CPU power cycle, some fields in the TRBE
> registers may be in the following state:
>
> "On a cold reset, this field resets to an architecturally UNKNOWN value."
>
> I will change to always save/restore TRBE state. Thanks for
> suggestions.
>
>>> Besides your fix the translation regime issue, I'd also suggest applying
>>> the CoreSight PM patch series to fix lockup caused by CPU idle.
>>
>> Yes, we definitely need something like that in the android kernel trees.
>> I've previously bodged a hack into the ETE PM notifiers, but if you have
>> backports of your series to 6.12, 6.6 and 6.1 then we should merge them
>> into Android. As it stands, I don't have a TRBE-capable device running
>> mainline.
>
> Let us first merge the series on the master :)
>
> After that, we can consider backporting (I assume Yabin already has a
> plan for this). I'd be happy to help with backporting to v6.12.
> However, I cannot commit to backporting to v6.6 or v6.1 at this stage,
> as many dependencies are likely to be involved.
>
> Thanks,
> Leo
Thread overview: 31+ messages
2026-02-16 13:09 [PATCH] KVM: arm64: Disable TRBE Trace Buffer Unit when running in guest context Will Deacon
2026-02-16 14:29 ` Marc Zyngier
2026-02-16 15:05 ` James Clark
2026-02-16 15:51 ` Marc Zyngier
2026-02-16 16:10 ` James Clark
2026-02-16 16:49 ` Marc Zyngier
2026-02-20 11:42 ` James Clark
2026-02-24 11:19 ` Marc Zyngier
2026-02-20 15:48 ` Leo Yan
2026-02-24 11:22 ` Marc Zyngier
2026-02-16 18:14 ` Will Deacon
2026-02-17 14:19 ` Leo Yan
2026-02-17 14:52 ` Will Deacon
2026-02-17 19:01 ` Leo Yan
2026-02-19 13:54 ` Will Deacon
2026-02-19 18:58 ` Leo Yan
2026-02-19 19:06 ` Leo Yan
2026-02-25 12:09 ` Leo Yan
2026-02-27 18:07 ` Will Deacon
2026-03-03 10:36 ` Leo Yan
2026-03-03 10:47 ` Suzuki K Poulose
2026-02-16 15:53 ` Alexandru Elisei
2026-02-16 17:10 ` Will Deacon
2026-02-17 12:13 ` Will Deacon
2026-02-16 17:32 ` Will Deacon
2026-02-17 12:20 ` James Clark
2026-02-17 12:26 ` Will Deacon
2026-02-17 13:58 ` James Clark
2026-02-16 15:13 ` James Clark
2026-02-16 17:05 ` Will Deacon
2026-02-17 9:18 ` James Clark