Re: Broken udelay() on KVM host with a vcpu loaded

public inbox for kvmarm@lists.cs.columbia.edu
 help / color / mirror / Atom feed

From: Marc Zyngier <maz@kernel.org>
To: Quentin Perret <qperret@google.com>
Cc: kvmarm@lists.linux.dev, oupton@kernel.org, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com,
	catalin.marinas@arm.com, will@kernel.org
Subject: Re: Broken udelay() on KVM host with a vcpu loaded
Date: Tue, 10 Feb 2026 12:52:21 +0000	[thread overview]
Message-ID: <86qzqsbw1m.wl-maz@kernel.org> (raw)
In-Reply-To: <ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od>

On Tue, 10 Feb 2026 12:27:48 +0000,
Quentin Perret <qperret@google.com> wrote:
> 
> Hi all,
> 
> I have just received a report from a partner of udelay misbehaving when
> running on the host whilst a vCPU is loaded. This hardware has FEAT_WFxT
> and uses the matching implementation of udelay. Interestingly, WFIT
> triggers using CNTVCT_EL0 unconditionally, but with KVM the host/guest
> switch for that happens from the preempt notifiers/vcpu_put which aren't
> invoked when e.g. handling an IRQ. Interestingly, udelay reads the arch
> timer to set the waiting time for WFIT using an absolute value, and that
> gets compared to CNTVCT_EL0 which in the aforementioned
> IRQ-with-vCPU-loaded case uses the _guest's_ CNTVCT_EL0.

Well, the underlying issue is that get_cycle(), as used by  __delay(),
is *either* using CNTVCT_EL0 (when booted at EL1) or CNTPCT_EL0 (when
booted at EL2).

> 
> I can think of two approaches to address the problem:
>   1. have KVM context switch cntvoff proactively prior to re-enabling
>   preemption when handling a guest exit;
>   2. modify the WFIT-based udelay implementation to read from CNTVCT_EL0
>   instead of the arch_timer to be a bit more self-consitent;
> 
> Other ideas welcome!

(1) is a real nightmare, and would force a complete redesign of the
life cycle of guest timers (switching from load/put to enter/exit for
the context switch, but only on !VHE). I'd rather avoid that, as this
is a pretty large performance penalty.

(2) is much more palatable, and easily hacked, see below. Can you
please five it a go?

Thanks,

	M.

From b1b45d591aed3e5276ff857dbc6cfa3bce181766 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Tue, 10 Feb 2026 12:43:07 +0000
Subject: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()

Quentin reports an interesting problem with the use of WFxT in __delay()
when a vcpu is loaded and that KVM is *not* in VHE mode.

In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
state of the guest virtual counter. At the same time, __delay() is
using get_cycles() to read the counter value, which is indirected to
reading CNTPCT_EL0.

The core of the issue is that WFxT is using the *virtual* counter,
while the kernel is using the physical counter, and that the offset
introduces a really bad discrepancy between the two.

Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
irrespective of the value of CNTVOFF_EL2.

Reported-by: Quentin Perret <qperret@google.com>
Fixes: 7d26b0516a0df ("arm64: Use WFxT for __delay() when possible")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
Cc: stable@vger.kernel.org
---
 arch/arm64/lib/delay.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
index cb2062e7e2340..26a39bb301ef6 100644
--- a/arch/arm64/lib/delay.c
+++ b/arch/arm64/lib/delay.c
@@ -23,9 +23,16 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
 	return (xloops * loops_per_jiffy * HZ) >> 32;
 }
 
+/*
+ * Force the use of CNTVCT_EL0 in order to have the same base as
+ * WFxT. This avoids some annoying issues when CNTVOFF_EL2 is not
+ * reset 0 on a KVM host until we do a vcpu_put() on the vcpu.
+ */
+#define __delay_cycles()	__arch_counter_get_cntvct_stable()
+
 void __delay(unsigned long cycles)
 {
-	cycles_t start = get_cycles();
+	cycles_t start = __delay_cycles();
 
 	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) {
 		u64 end = start + cycles;
@@ -35,17 +42,17 @@ void __delay(unsigned long cycles)
 		 * early, use a WFET loop to complete the delay.
 		 */
 		wfit(end);
-		while ((get_cycles() - start) < cycles)
+		while ((__delay_cycles() - start) < cycles)
 			wfet(end);
 	} else 	if (arch_timer_evtstrm_available()) {
 		const cycles_t timer_evt_period =
 			USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);
 
-		while ((get_cycles() - start + timer_evt_period) < cycles)
+		while ((__delay_cycles() - start + timer_evt_period) < cycles)
 			wfe();
 	}
 
-	while ((get_cycles() - start) < cycles)
+	while ((__delay_cycles() - start) < cycles)
 		cpu_relax();
 }
 EXPORT_SYMBOL(__delay);
-- 
2.47.3


-- 
Without deviation from the norm, progress is not possible.

next prev parent reply	other threads:[~2026-02-10 12:52 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 12:27 Broken udelay() on KVM host with a vcpu loaded Quentin Perret
2026-02-10 12:52 ` Marc Zyngier [this message]
2026-02-10 15:34   ` Will Deacon
2026-02-10 15:58     ` Quentin Perret
2026-02-10 19:54       ` Marc Zyngier
2026-02-13 11:50         ` Will Deacon
2026-02-13 13:52           ` Marc Zyngier
2026-02-13 14:05         ` Quentin Perret
2026-02-10 19:46     ` Marc Zyngier

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:cb2062e7e234 dfblob:26a39bb301ef )
 OR (
bs:"arm64: Force the use of CNTVCT_EL0 in __delay()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86qzqsbw1m.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=oupton@kernel.org \
    --cc=qperret@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox