* [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
@ 2026-02-13 14:16 Marc Zyngier
2026-02-19 13:27 ` Will Deacon
2026-02-23 11:16 ` Ben Horgan
0 siblings, 2 replies; 7+ messages in thread
From: Marc Zyngier @ 2026-02-13 14:16 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm
Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Will Deacon, Catalin Marinas, Hyesoo Yu, Quentin Perret, stable
Quentin forwards a report from Hyesoo Yu, describing an interesting
problem with the use of WFxT in __delay() when a vcpu is loaded and
that KVM is *not* in VHE mode (either nVHE or hVHE).
In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
state of the guest virtual counter. At the same time, __delay() is
using get_cycles() to read the counter value, which is indirected to
reading CNTPCT_EL0.
The core of the issue is that WFxT is using the *virtual* counter,
while the kernel is using the physical counter, and that the offset
introduces a really bad discrepancy between the two.
Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
irrespective of the value of CNTVOFF_EL2.
Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reported-by: Quentin Perret <qperret@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Fixes: 7d26b0516a0df ("arm64: Use WFxT for __delay() when possible")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
Cc: stable@vger.kernel.org
---
arch/arm64/lib/delay.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
index cb2062e7e2340..d02341303899e 100644
--- a/arch/arm64/lib/delay.c
+++ b/arch/arm64/lib/delay.c
@@ -23,9 +23,20 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
 	return (xloops * loops_per_jiffy * HZ) >> 32;
 }
 
+/*
+ * Force the use of CNTVCT_EL0 in order to have the same base as WFxT.
+ * This avoids some annoying issues when CNTVOFF_EL2 is not reset 0 on a
+ * KVM host running at EL1 until we do a vcpu_put() on the vcpu. When
+ * running at EL2, the effective offset is always 0.
+ *
+ * Note that userspace cannot change the offset behind our back either,
+ * as the vcpu mutex is held as long as KVM_RUN is in progress.
+ */
+#define __delay_cycles() __arch_counter_get_cntvct_stable()
+
 void __delay(unsigned long cycles)
 {
-	cycles_t start = get_cycles();
+	cycles_t start = __delay_cycles();
 
 	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) {
 		u64 end = start + cycles;
@@ -35,17 +46,17 @@ void __delay(unsigned long cycles)
 		 * early, use a WFET loop to complete the delay.
 		 */
 		wfit(end);
-		while ((get_cycles() - start) < cycles)
+		while ((__delay_cycles() - start) < cycles)
 			wfet(end);
 	} else if (arch_timer_evtstrm_available()) {
 		const cycles_t timer_evt_period =
 			USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);
 
-		while ((get_cycles() - start + timer_evt_period) < cycles)
+		while ((__delay_cycles() - start + timer_evt_period) < cycles)
 			wfe();
 	}
 
-	while ((get_cycles() - start) < cycles)
+	while ((__delay_cycles() - start) < cycles)
 		cpu_relax();
 }
 EXPORT_SYMBOL(__delay);
--
2.47.3
^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
  2026-02-13 14:16 [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay() Marc Zyngier
@ 2026-02-19 13:27 ` Will Deacon
  2026-02-23 11:16 ` Ben Horgan
  1 sibling, 0 replies; 7+ messages in thread

From: Will Deacon @ 2026-02-19 13:27 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, Marc Zyngier
Cc: catalin.marinas, kernel-team, Will Deacon, Joey Gouly,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu, Hyesoo Yu,
	Quentin Perret, stable

On Fri, 13 Feb 2026 14:16:19 +0000, Marc Zyngier wrote:
> Quentin forwards a report from Hyesoo Yu, describing an interesting
> problem with the use of WFxT in __delay() when a vcpu is loaded and
> that KVM is *not* in VHE mode (either nVHE or hVHE).
>
> In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
> state of the guest virtual counter. At the same time, __delay() is
> using get_cycles() to read the counter value, which is indirected to
> reading CNTPCT_EL0.
>
> [...]

Applied to arm64 (for-next/core), thanks!

[1/1] arm64: Force the use of CNTVCT_EL0 in __delay()
      https://git.kernel.org/arm64/c/29cc0f3aa7c6

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
  2026-02-13 14:16 [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay() Marc Zyngier
  2026-02-19 13:27 ` Will Deacon
@ 2026-02-23 11:16 ` Ben Horgan
  2026-02-23 14:31   ` Marc Zyngier
  1 sibling, 1 reply; 7+ messages in thread

From: Ben Horgan @ 2026-02-23 11:16 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvmarm, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Catalin Marinas,
	Hyesoo Yu, Quentin Perret, stable

Hi Marc,

On Fri, Feb 13, 2026 at 02:16:19PM +0000, Marc Zyngier wrote:
> Quentin forwards a report from Hyesoo Yu, describing an interesting
> problem with the use of WFxT in __delay() when a vcpu is loaded and
> that KVM is *not* in VHE mode (either nVHE or hVHE).
>
> In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
> state of the guest virtual counter. At the same time, __delay() is
> using get_cycles() to read the counter value, which is indirected to
> reading CNTPCT_EL0.
>
> The core of the issue is that WFxT is using the *virtual* counter,
> while the kernel is using the physical counter, and that the offset
> introduces a really bad discrepancy between the two.
>
> Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
> irrespective of the value of CNTVOFF_EL2.
>
> Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
> Reported-by: Quentin Perret <qperret@google.com>
> Reviewed-by: Quentin Perret <qperret@google.com>
> Fixes: 7d26b0516a0df ("arm64: Use WFxT for __delay() when possible")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/lib/delay.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> index cb2062e7e2340..d02341303899e 100644
> --- a/arch/arm64/lib/delay.c
> +++ b/arch/arm64/lib/delay.c
> @@ -23,9 +23,20 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
>  	return (xloops * loops_per_jiffy * HZ) >> 32;
>  }
>
> +/*
> + * Force the use of CNTVCT_EL0 in order to have the same base as WFxT.
> + * This avoids some annoying issues when CNTVOFF_EL2 is not reset 0 on a
> + * KVM host running at EL1 until we do a vcpu_put() on the vcpu. When
> + * running at EL2, the effective offset is always 0.
> + *
> + * Note that userspace cannot change the offset behind our back either,
> + * as the vcpu mutex is held as long as KVM_RUN is in progress.
> + */
> +#define __delay_cycles() __arch_counter_get_cntvct_stable()

I'm seeing this CONFIG_DEBUG_PREEMPT warning, see below, when running
7.0-rc1 on FVP Base RevC. I haven't tried bisecting but it looks to be
introduced by this change.

The calls are:

__this_cpu_read()
erratum_handler()
arch_timer_reg_read_stable()
__arch_counter_get_cntvct_stable()
__delay()

This silences the warning:

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index f5794d50f51d..f07e4efa0d2b 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -24,14 +24,14 @@
 #define has_erratum_handler(h)						\
 	({								\
 		const struct arch_timer_erratum_workaround *__wa;	\
-		__wa = __this_cpu_read(timer_unstable_counter_workaround); \
+		__wa = raw_cpu_read(timer_unstable_counter_workaround);	\
 		(__wa && __wa->h);					\
 	})
 
 #define erratum_handler(h)						\
 	({								\
 		const struct arch_timer_erratum_workaround *__wa;	\
-		__wa = __this_cpu_read(timer_unstable_counter_workaround); \
+		__wa = raw_cpu_read(timer_unstable_counter_workaround);	\
 		(__wa && __wa->h) ? ({ isb(); __wa->h;}) : arch_timer_##h; \
 	})

[    0.116471] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc1 #787 PREEMPT
[    0.116475] Hardware name: FVP Base RevC (DT)
[    0.116476] Call trace:
[    0.116477]  show_stack+0x18/0x24 (C)
[    0.116481]  dump_stack_lvl+0x74/0x8c
[    0.116486]  dump_stack+0x18/0x24
[    0.116490]  check_preemption_disabled+0xd8/0xf8
[    0.116496]  __this_cpu_preempt_check+0x1c/0x28
[    0.116501]  __delay+0x114/0x1d4
[    0.116507]  __const_udelay+0x2c/0x38
[    0.116513]  its_wait_for_range_completion+0x54/0xe4
[    0.116518]  its_send_single_command+0xd0/0x150
[    0.116525]  its_create_device+0x260/0x3a4
[    0.116528]  its_msi_prepare+0x100/0x168
[    0.116532]  its_pci_msi_prepare+0x120/0x168
[    0.116538]  msi_create_device_irq_domain+0x224/0x290
[    0.116542]  pci_setup_msix_device_domain+0x90/0xcc
[    0.116548]  __pci_enable_msix_range+0x1b8/0x620
[    0.116554]  pci_alloc_irq_vectors_affinity+0xc4/0x144
[    0.116559]  pci_alloc_irq_vectors+0x14/0x20
[    0.116564]  ahci_init_one+0x568/0xea8
[    0.116568]  local_pci_probe+0x40/0xa8
[    0.116574]  pci_device_probe+0xd4/0x208
[    0.116580]  really_probe+0xbc/0x29c
[    0.116584]  __driver_probe_device+0x78/0x12c
[    0.116588]  driver_probe_device+0x3c/0x15c
[    0.116592]  __driver_attach+0xc8/0x1d4
[    0.116597]  bus_for_each_dev+0x7c/0xe0
[    0.116602]  driver_attach+0x24/0x30
[    0.116606]  bus_add_driver+0xe4/0x208
[    0.116610]  driver_register+0x5c/0x124
[    0.116614]  __pci_register_driver+0x4c/0x58
[    0.116619]  ahci_pci_driver_init+0x24/0x30
[    0.116624]  do_one_initcall+0x58/0x3e8
[    0.116629]  kernel_init_freeable+0x1e4/0x530
[    0.116634]  kernel_init+0x24/0x1e4
[    0.116640]  ret_from_fork+0x10/0x20

> +
>  void __delay(unsigned long cycles)
>  {
> -	cycles_t start = get_cycles();
> +	cycles_t start = __delay_cycles();
>
>  	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) {
>  		u64 end = start + cycles;
> @@ -35,17 +46,17 @@ void __delay(unsigned long cycles)
>  		 * early, use a WFET loop to complete the delay.
>  		 */
>  		wfit(end);
> -		while ((get_cycles() - start) < cycles)
> +		while ((__delay_cycles() - start) < cycles)
>  			wfet(end);
>  	} else if (arch_timer_evtstrm_available()) {
>  		const cycles_t timer_evt_period =
>  			USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);
>
> -		while ((get_cycles() - start + timer_evt_period) < cycles)
> +		while ((__delay_cycles() - start + timer_evt_period) < cycles)
>  			wfe();
>  	}
>
> -	while ((get_cycles() - start) < cycles)
> +	while ((__delay_cycles() - start) < cycles)
>  		cpu_relax();
>  }
>  EXPORT_SYMBOL(__delay);
> --
> 2.47.3
>
>

Thanks,

Ben

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
  2026-02-23 11:16 ` Ben Horgan
@ 2026-02-23 14:31   ` Marc Zyngier
  2026-02-23 15:14     ` Ben Horgan
  2026-02-25 22:36     ` Will Deacon
  0 siblings, 2 replies; 7+ messages in thread

From: Marc Zyngier @ 2026-02-23 14:31 UTC (permalink / raw)
To: Ben Horgan
Cc: linux-arm-kernel, kvmarm, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Catalin Marinas,
	Hyesoo Yu, Quentin Perret, stable

Hi Ben,

On Mon, 23 Feb 2026 11:16:32 +0000,
Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Marc,
>
> On Fri, Feb 13, 2026 at 02:16:19PM +0000, Marc Zyngier wrote:
> > Quentin forwards a report from Hyesoo Yu, describing an interesting
> > problem with the use of WFxT in __delay() when a vcpu is loaded and
> > that KVM is *not* in VHE mode (either nVHE or hVHE).
> >
> > In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
> > state of the guest virtual counter. At the same time, __delay() is
> > using get_cycles() to read the counter value, which is indirected to
> > reading CNTPCT_EL0.
> >
> > The core of the issue is that WFxT is using the *virtual* counter,
> > while the kernel is using the physical counter, and that the offset
> > introduces a really bad discrepancy between the two.
> >
> > Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
> > irrespective of the value of CNTVOFF_EL2.
> >
> > Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
> > Reported-by: Quentin Perret <qperret@google.com>
> > Reviewed-by: Quentin Perret <qperret@google.com>
> > Fixes: 7d26b0516a0df ("arm64: Use WFxT for __delay() when possible")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/arm64/lib/delay.c | 19 +++++++++++++++----
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> > index cb2062e7e2340..d02341303899e 100644
> > --- a/arch/arm64/lib/delay.c
> > +++ b/arch/arm64/lib/delay.c
> > @@ -23,9 +23,20 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
> >  	return (xloops * loops_per_jiffy * HZ) >> 32;
> >  }
> >
> > +/*
> > + * Force the use of CNTVCT_EL0 in order to have the same base as WFxT.
> > + * This avoids some annoying issues when CNTVOFF_EL2 is not reset 0 on a
> > + * KVM host running at EL1 until we do a vcpu_put() on the vcpu. When
> > + * running at EL2, the effective offset is always 0.
> > + *
> > + * Note that userspace cannot change the offset behind our back either,
> > + * as the vcpu mutex is held as long as KVM_RUN is in progress.
> > + */
> > +#define __delay_cycles() __arch_counter_get_cntvct_stable()
>
> I'm seeing this CONFIG_DEBUG_PREEMPT warning, see below, when running 7.0-rc1 on
> FVP Base RevC. I haven't tried bisecting but it looks to be introduced by this
> change.
>
> The calls are:
>
> __this_cpu_read()
> erratum_handler()
> arch_timer_reg_read_stable()
> __arch_counter_get_cntvct_stable()
> __delay()
>
> This silences the warning:
>
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index f5794d50f51d..f07e4efa0d2b 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -24,14 +24,14 @@
>  #define has_erratum_handler(h)						\
>  	({								\
>  		const struct arch_timer_erratum_workaround *__wa;	\
> -		__wa = __this_cpu_read(timer_unstable_counter_workaround); \
> +		__wa = raw_cpu_read(timer_unstable_counter_workaround);	\
>  		(__wa && __wa->h);					\
>  	})
>
>  #define erratum_handler(h)						\
>  	({								\
>  		const struct arch_timer_erratum_workaround *__wa;	\
> -		__wa = __this_cpu_read(timer_unstable_counter_workaround); \
> +		__wa = raw_cpu_read(timer_unstable_counter_workaround);	\
>  		(__wa && __wa->h) ? ({ isb(); __wa->h;}) : arch_timer_##h; \
>  	})

It does indeed silence it, but that's IMO the wrong thing to do since
you can end-up calling a workaround helper on the wrong CPU if
preempted. If you look at how things were done before this patch, we
had:

get_cycles() -> arch_timer_read_counter() -> arch_counter_get_cntvct_stable()

Crucially, arch_counter_get_cntvct_stable() does disable preemption,
and we should preserve it. Something like this:

diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
index d02341303899e..25fb593f95b0c 100644
--- a/arch/arm64/lib/delay.c
+++ b/arch/arm64/lib/delay.c
@@ -32,7 +32,16 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
  * Note that userspace cannot change the offset behind our back either,
  * as the vcpu mutex is held as long as KVM_RUN is in progress.
  */
-#define __delay_cycles() __arch_counter_get_cntvct_stable()
+static cycles_t __delay_cycles(void)
+{
+	cycles_t val;
+
+	preempt_disable();
+	val = __arch_counter_get_cntvct_stable();
+	preenpt_enable();
+
+	return val;
+}
 
 void __delay(unsigned long cycles)
 {

The question is whether there is a material benefit in replicating the
arch_timer_read_counter() indirection for the virtual counter in order
to not pay the price of preempt_disable() when we're on a non-broken
system (hopefully the vast majority of implementations).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
  2026-02-23 14:31   ` Marc Zyngier
@ 2026-02-23 15:14     ` Ben Horgan
  2026-02-25 22:36     ` Will Deacon
  1 sibling, 0 replies; 7+ messages in thread

From: Ben Horgan @ 2026-02-23 15:14 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvmarm, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Catalin Marinas,
	Hyesoo Yu, Quentin Perret, stable

Hi Marc,

On 2/23/26 14:31, Marc Zyngier wrote:
> Hi Ben,
>
> On Mon, 23 Feb 2026 11:16:32 +0000,
> Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi Marc,
>>
>> On Fri, Feb 13, 2026 at 02:16:19PM +0000, Marc Zyngier wrote:
>>> Quentin forwards a report from Hyesoo Yu, describing an interesting
>>> problem with the use of WFxT in __delay() when a vcpu is loaded and
>>> that KVM is *not* in VHE mode (either nVHE or hVHE).
>>>
>>> In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
>>> state of the guest virtual counter. At the same time, __delay() is
>>> using get_cycles() to read the counter value, which is indirected to
>>> reading CNTPCT_EL0.
>>>
>>> The core of the issue is that WFxT is using the *virtual* counter,
>>> while the kernel is using the physical counter, and that the offset
>>> introduces a really bad discrepancy between the two.
>>>
>>> Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
>>> irrespective of the value of CNTVOFF_EL2.
>>>
>>> Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
>>> Reported-by: Quentin Perret <qperret@google.com>
>>> Reviewed-by: Quentin Perret <qperret@google.com>
>>> Fixes: 7d26b0516a0df ("arm64: Use WFxT for __delay() when possible")
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>> Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
>>> Cc: stable@vger.kernel.org
>>> ---
>>>  arch/arm64/lib/delay.c | 19 +++++++++++++++----
>>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
>>> index cb2062e7e2340..d02341303899e 100644
>>> --- a/arch/arm64/lib/delay.c
>>> +++ b/arch/arm64/lib/delay.c
>>> @@ -23,9 +23,20 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
>>>  	return (xloops * loops_per_jiffy * HZ) >> 32;
>>>  }
>>>
>>> +/*
>>> + * Force the use of CNTVCT_EL0 in order to have the same base as WFxT.
>>> + * This avoids some annoying issues when CNTVOFF_EL2 is not reset 0 on a
>>> + * KVM host running at EL1 until we do a vcpu_put() on the vcpu. When
>>> + * running at EL2, the effective offset is always 0.
>>> + *
>>> + * Note that userspace cannot change the offset behind our back either,
>>> + * as the vcpu mutex is held as long as KVM_RUN is in progress.
>>> + */
>>> +#define __delay_cycles() __arch_counter_get_cntvct_stable()
>>
>> I'm seeing this CONFIG_DEBUG_PREEMPT warning, see below, when running 7.0-rc1 on
>> FVP Base RevC. I haven't tried bisecting but it looks to be introduced by this
>> change.
>>
>> The calls are:
>>
>> __this_cpu_read()
>> erratum_handler()
>> arch_timer_reg_read_stable()
>> __arch_counter_get_cntvct_stable()
>> __delay()
>>
>> This silences the warning:
>>
>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>> index f5794d50f51d..f07e4efa0d2b 100644
>> --- a/arch/arm64/include/asm/arch_timer.h
>> +++ b/arch/arm64/include/asm/arch_timer.h
>> @@ -24,14 +24,14 @@
>>  #define has_erratum_handler(h)						\
>>  	({								\
>>  		const struct arch_timer_erratum_workaround *__wa;	\
>> -		__wa = __this_cpu_read(timer_unstable_counter_workaround); \
>> +		__wa = raw_cpu_read(timer_unstable_counter_workaround);	\
>>  		(__wa && __wa->h);					\
>>  	})
>>
>>  #define erratum_handler(h)						\
>>  	({								\
>>  		const struct arch_timer_erratum_workaround *__wa;	\
>> -		__wa = __this_cpu_read(timer_unstable_counter_workaround); \
>> +		__wa = raw_cpu_read(timer_unstable_counter_workaround);	\
>>  		(__wa && __wa->h) ? ({ isb(); __wa->h;}) : arch_timer_##h; \
>>  	})
>
> It does indeed silence it, but that's IMO the wrong thing to do since
> you can end-up calling a workaround helper on the wrong CPU if

Agreed. This just hides the problem.

> preempted. If you look at how things were done before this patch, we
> had:
>
> get_cycles() -> arch_timer_read_counter() -> arch_counter_get_cntvct_stable()
>
> Crucially, arch_counter_get_cntvct_stable() does disable preemption,
> and we should preserve it. Something like this:
>
> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> index d02341303899e..25fb593f95b0c 100644
> --- a/arch/arm64/lib/delay.c
> +++ b/arch/arm64/lib/delay.c
> @@ -32,7 +32,16 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
>   * Note that userspace cannot change the offset behind our back either,
>   * as the vcpu mutex is held as long as KVM_RUN is in progress.
>   */
> -#define __delay_cycles() __arch_counter_get_cntvct_stable()
> +static cycles_t __delay_cycles(void)
> +{
> +	cycles_t val;
> +
> +	preempt_disable();
> +	val = __arch_counter_get_cntvct_stable();
> +	preenpt_enable();
> +
> +	return val;
> +}

Modulo the typo (preenpt) this looks to be correct and I see no warnings.

>
>  void __delay(unsigned long cycles)
>  {
>
> The question is whether there is a material benefit in replicating the
> arch_timer_read_counter() indirection for the virtual counter in order
> to not pay the price of preempt_disable() when we're on a non-broken
> system (hopefully the vast majority of implementations).

I'm unsure of the tradeoffs here.

>
> Thanks,
>
> M.
>

Thanks,

Ben

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
  2026-02-23 14:31   ` Marc Zyngier
  2026-02-23 15:14     ` Ben Horgan
@ 2026-02-25 22:36     ` Will Deacon
  2026-02-26  8:16       ` Marc Zyngier
  1 sibling, 1 reply; 7+ messages in thread

From: Will Deacon @ 2026-02-25 22:36 UTC (permalink / raw)
To: Marc Zyngier
Cc: Ben Horgan, linux-arm-kernel, kvmarm, Joey Gouly,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu, Catalin Marinas,
	Hyesoo Yu, Quentin Perret, stable

On Mon, Feb 23, 2026 at 02:31:44PM +0000, Marc Zyngier wrote:
> Crucially, arch_counter_get_cntvct_stable() does disable preemption,
> and we should preserve it. Something like this:
>
> diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> index d02341303899e..25fb593f95b0c 100644
> --- a/arch/arm64/lib/delay.c
> +++ b/arch/arm64/lib/delay.c
> @@ -32,7 +32,16 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
>   * Note that userspace cannot change the offset behind our back either,
>   * as the vcpu mutex is held as long as KVM_RUN is in progress.
>   */
> -#define __delay_cycles() __arch_counter_get_cntvct_stable()
> +static cycles_t __delay_cycles(void)
> +{
> +	cycles_t val;
> +
> +	preempt_disable();
> +	val = __arch_counter_get_cntvct_stable();
> +	preenpt_enable();
> +
> +	return val;
> +}

(nit: arch_counter_get_cntvct_stable() uses the _notrace() variants of
the preempt disable/enable helpers)

>  void __delay(unsigned long cycles)
>  {
>
> The question is whether there is a material benefit in replicating the
> arch_timer_read_counter() indirection for the virtual counter in order
> to not pay the price of preempt_disable() when we're on a non-broken
> system (hopefully the vast majority of implementations).

That sounds nice, especially as we can assume (for now) that CPUs
implementing WFIT don't need the cntvct workarounds. However, I can't
really figure out how to implement it after reminding myself of all the
fun we had trying to use a static key for these workarounds in the past.

If a CPU being onlined has a timer erratum, we wouldn't be able to
migrate any tasks in the middle of a preempt-enabled delay loop onto
it. :/

Will

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay()
  2026-02-25 22:36     ` Will Deacon
@ 2026-02-26  8:16       ` Marc Zyngier
  0 siblings, 0 replies; 7+ messages in thread

From: Marc Zyngier @ 2026-02-26 8:16 UTC (permalink / raw)
To: Will Deacon
Cc: Ben Horgan, linux-arm-kernel, kvmarm, Joey Gouly,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu, Catalin Marinas,
	Hyesoo Yu, Quentin Perret, stable

On Wed, 25 Feb 2026 22:36:07 +0000,
Will Deacon <will@kernel.org> wrote:
>
> On Mon, Feb 23, 2026 at 02:31:44PM +0000, Marc Zyngier wrote:
> > Crucially, arch_counter_get_cntvct_stable() does disable preemption,
> > and we should preserve it. Something like this:
> >
> > diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
> > index d02341303899e..25fb593f95b0c 100644
> > --- a/arch/arm64/lib/delay.c
> > +++ b/arch/arm64/lib/delay.c
> > @@ -32,7 +32,16 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops)
> >   * Note that userspace cannot change the offset behind our back either,
> >   * as the vcpu mutex is held as long as KVM_RUN is in progress.
> >   */
> > -#define __delay_cycles() __arch_counter_get_cntvct_stable()
> > +static cycles_t __delay_cycles(void)
> > +{
> > +	cycles_t val;
> > +
> > +	preempt_disable();
> > +	val = __arch_counter_get_cntvct_stable();
> > +	preenpt_enable();
> > +
> > +	return val;
> > +}
>
> (nit: arch_counter_get_cntvct_stable() uses the _notrace() variants of
> the preempt disable/enable helpers)

That's because arch_counter_get_cntvct_stable() itself is notrace. I'm
not sure we need this function to be notrace as well, but I'll change
that and we can revisit it.

>
> >  void __delay(unsigned long cycles)
> >  {
> >
> > The question is whether there is a material benefit in replicating the
> > arch_timer_read_counter() indirection for the virtual counter in order
> > to not pay the price of preempt_disable() when we're on a non-broken
> > system (hopefully the vast majority of implementations).
>
> That sounds nice, especially as we can assume (for now) that CPUs
> implementing WFIT don't need the cntvct workarounds. However, I can't
> really figure out how to implement it after reminding myself of all the
> fun we had trying to use a static key for these workarounds in the past.
>
> If a CPU being onlined has a timer erratum, we wouldn't be able to
> migrate any tasks in the middle of a preempt-enabled delay loop onto
> it. :/

Why would it be any different than, say, sched_clock(), which
ultimately uses arch_timer_read_counter()? We already rely on this
indirection to do the right thing everywhere, and I don't recall we
have any issue with this.

Anyway, I'll shortly post what I have and we can discuss whether this
is the correct approach.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-02-26  8:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-13 14:16 [PATCH] arm64: Force the use of CNTVCT_EL0 in __delay() Marc Zyngier
2026-02-19 13:27 ` Will Deacon
2026-02-23 11:16 ` Ben Horgan
2026-02-23 14:31   ` Marc Zyngier
2026-02-23 15:14     ` Ben Horgan
2026-02-25 22:36     ` Will Deacon
2026-02-26  8:16       ` Marc Zyngier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox