All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Horgan <ben.horgan@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oupton@kernel.org>,
	Zenghui Yu <yuzenghui@huawei.com>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Hyesoo Yu <hyesoo.yu@samsung.com>,
	Quentin Perret <qperret@google.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
Date: Mon, 23 Feb 2026 11:16:32 +0000	[thread overview]
Message-ID: <aZw3EGs4rbQvbAzV@e134344.arm.com> (raw)
In-Reply-To: <20260213141619.1791283-1-maz@kernel.org>

Hi Marc,

On Fri, Feb 13, 2026 at 02:16:19PM +0000, Marc Zyngier wrote:
> Quentin forwards a report from Hyesoo Yu, describing an interesting
> problem with the use of WFxT in __delay() when a vcpu is loaded and
> that KVM is *not* in VHE mode (either nVHE or hVHE).
> 
> In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
> state of the guest virtual counter. At the same time, __delay() is
> using get_cycles() to read the counter value, which is indirected to
> reading CNTPCT_EL0.
> 
> The core of the issue is that WFxT is using the *virtual* counter,
> while the kernel is using the physical counter, and that the offset
> introduces a really bad discrepancy between the two.
> 
> Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
> irrespective of the value of CNTVOFF_EL2.
> 
> Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
> Reported-by: Quentin Perret <qperret@google.com>
> Reviewed-by: Quentin Perret <qperret@google.com>
> Fixes: 7d26b0516a0df ("arm64: Use WFxT for __delay() when possible")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/lib/delay.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> index cb2062e7e2340..d02341303899e 100644
> --- a/arch/arm64/lib/delay.c
> +++ b/arch/arm64/lib/delay.c
> @@ -23,9 +23,20 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
>  	return (xloops * loops_per_jiffy * HZ) >> 32;
>  }
>  
> +/*
> + * Force the use of CNTVCT_EL0 in order to have the same base as WFxT.
> + * This avoids some annoying issues when CNTVOFF_EL2 is not reset 0 on a
> + * KVM host running at EL1 until we do a vcpu_put() on the vcpu. When
> + * running at EL2, the effective offset is always 0.
> + *
> + * Note that userspace cannot change the offset behind our back either,
> + * as the vcpu mutex is held as long as KVM_RUN is in progress.
> + */
> +#define __delay_cycles()	__arch_counter_get_cntvct_stable()

I'm seeing this CONFIG_DEBUG_PREEMPT warning, see below, when running 7.0-rc1 on
FVP Base RevC. I haven't tried bisecting but it looks to be introduced by this
change.

The calls are:

__this_cpu_read()
erratum_handler()
arch_timer_reg_read_stable()
__arch_counter_get_cntvct_stable()
__delay()

This silences the warning:

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index f5794d50f51d..f07e4efa0d2b 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -24,14 +24,14 @@
 #define has_erratum_handler(h)                                         \
        ({                                                              \
                const struct arch_timer_erratum_workaround *__wa;       \
-               __wa = __this_cpu_read(timer_unstable_counter_workaround); \
+               __wa = raw_cpu_read(timer_unstable_counter_workaround); \
                (__wa && __wa->h);                                      \
        })
 
 #define erratum_handler(h)                                             \
        ({                                                              \
                const struct arch_timer_erratum_workaround *__wa;       \
-               __wa = __this_cpu_read(timer_unstable_counter_workaround); \
+               __wa = raw_cpu_read(timer_unstable_counter_workaround); \
                (__wa && __wa->h) ? ({ isb(); __wa->h;}) : arch_timer_##h; \
        })


[    0.116471] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc1 #787 PREEMPT
[    0.116475] Hardware name: FVP Base RevC (DT)
[    0.116476] Call trace:
[    0.116477]  show_stack+0x18/0x24 (C)
[    0.116481]  dump_stack_lvl+0x74/0x8c
[    0.116486]  dump_stack+0x18/0x24
[    0.116490]  check_preemption_disabled+0xd8/0xf8
[    0.116496]  __this_cpu_preempt_check+0x1c/0x28
[    0.116501]  __delay+0x114/0x1d4
[    0.116507]  __const_udelay+0x2c/0x38
[    0.116513]  its_wait_for_range_completion+0x54/0xe4
[    0.116518]  its_send_single_command+0xd0/0x150
[    0.116525]  its_create_device+0x260/0x3a4
[    0.116528]  its_msi_prepare+0x100/0x168
[    0.116532]  its_pci_msi_prepare+0x120/0x168
[    0.116538]  msi_create_device_irq_domain+0x224/0x290
[    0.116542]  pci_setup_msix_device_domain+0x90/0xcc
[    0.116548]  __pci_enable_msix_range+0x1b8/0x620
[    0.116554]  pci_alloc_irq_vectors_affinity+0xc4/0x144
[    0.116559]  pci_alloc_irq_vectors+0x14/0x20
[    0.116564]  ahci_init_one+0x568/0xea8
[    0.116568]  local_pci_probe+0x40/0xa8
[    0.116574]  pci_device_probe+0xd4/0x208
[    0.116580]  really_probe+0xbc/0x29c
[    0.116584]  __driver_probe_device+0x78/0x12c
[    0.116588]  driver_probe_device+0x3c/0x15c
[    0.116592]  __driver_attach+0xc8/0x1d4
[    0.116597]  bus_for_each_dev+0x7c/0xe0
[    0.116602]  driver_attach+0x24/0x30
[    0.116606]  bus_add_driver+0xe4/0x208
[    0.116610]  driver_register+0x5c/0x124
[    0.116614]  __pci_register_driver+0x4c/0x58
[    0.116619]  ahci_pci_driver_init+0x24/0x30
[    0.116624]  do_one_initcall+0x58/0x3e8
[    0.116629]  kernel_init_freeable+0x1e4/0x530
[    0.116634]  kernel_init+0x24/0x1e4
[    0.116640]  ret_from_fork+0x10/0x20



> +
>  void __delay(unsigned long cycles)
>  {
> -	cycles_t start = get_cycles();
> +	cycles_t start = __delay_cycles();
>  
>  	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) {
>  		u64 end = start + cycles;
> @@ -35,17 +46,17 @@ void __delay(unsigned long cycles)
>  		 * early, use a WFET loop to complete the delay.
>  		 */
>  		wfit(end);
> -		while ((get_cycles() - start) < cycles)
> +		while ((__delay_cycles() - start) < cycles)
>  			wfet(end);
>  	} else 	if (arch_timer_evtstrm_available()) {
>  		const cycles_t timer_evt_period =
>  			USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);
>  
> -		while ((get_cycles() - start + timer_evt_period) < cycles)
> +		while ((__delay_cycles() - start + timer_evt_period) < cycles)
>  			wfe();
>  	}
>  
> -	while ((get_cycles() - start) < cycles)
> +	while ((__delay_cycles() - start) < cycles)
>  		cpu_relax();
>  }
>  EXPORT_SYMBOL(__delay);
> -- 
> 2.47.3
> 
> 

Thanks,

Ben

  parent reply	other threads:[~2026-02-23 11:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-13 14:16 [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay() Marc Zyngier
2026-02-19 13:27 ` Will Deacon
2026-02-23 11:16 ` Ben Horgan [this message]
2026-02-23 14:31   ` Marc Zyngier
2026-02-23 15:14     ` Ben Horgan
2026-02-25 22:36     ` Will Deacon
2026-02-26  8:16       ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZw3EGs4rbQvbAzV@e134344.arm.com \
    --to=ben.horgan@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=hyesoo.yu@samsung.com \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=oupton@kernel.org \
    --cc=qperret@google.com \
    --cc=stable@vger.kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.