* [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown @ 2023-02-07 1:14 lirongqing 2023-02-08 1:04 ` Michael Kelley (LINUX) 2024-08-01 14:21 ` Thomas Gleixner 0 siblings, 2 replies; 33+ messages in thread From: lirongqing @ 2023-02-07 1:14 UTC (permalink / raw) To: seanjc, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel From: Li RongQing <lirongqing@baidu.com> Zeroing the counter register in pit_shutdown() isn't actually supposed to stop it from counting, will causes the PIT to start running again, From the spec: The largest possible initial count is 0; this is equivalent to 216 for binary counting and 104 for BCD counting. The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the Counter "wraps around" to the highest count, either FFFF hex for binary count- ing or 9999 for BCD counting, and continues counting. Mode 0 is typically used for event counting. After the Control Word is written, OUT is initially low, and will remain low until the Counter reaches zero. OUT then goes high and remains high until a new count or a new Mode 0 Control Word is written into the Counter. Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/ i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v, so delete the zero timer counter register in shutdown, and delete PIT shutdown quirk for Hyper-v Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> --- arch/x86/kernel/cpu/mshyperv.c | 11 ----------- drivers/clocksource/i8253.c | 12 ------------ include/linux/i8253.h | 1 - 3 files changed, 24 deletions(-) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 46668e2..f788889 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -16,7 +16,6 @@ #include <linux/interrupt.h> #include <linux/irq.h> #include <linux/kexec.h> -#include <linux/i8253.h> #include <linux/random.h> #include <linux/swiotlb.h> #include <asm/processor.h> @@ -399,16 +398,6 @@ static void __init ms_hyperv_init_platform(void) if (efi_enabled(EFI_BOOT)) x86_platform.get_nmi_reason = hv_get_nmi_reason; - /* - * Hyper-V VMs have a PIT emulation quirk such that zeroing the - * counter register during PIT shutdown restarts the PIT. So it - * continues to interrupt @18.2 HZ. Setting i8253_clear_counter - * to false tells pit_shutdown() not to zero the counter so that - * the PIT really is shutdown. Generation 2 VMs don't have a PIT, - * and setting this value has no effect. - */ - i8253_clear_counter_on_shutdown = false; - #if IS_ENABLED(CONFIG_HYPERV) /* * Setup the hook to get control post apic initialization. diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c index d4350bb..169474d 100644 --- a/drivers/clocksource/i8253.c +++ b/drivers/clocksource/i8253.c @@ -20,13 +20,6 @@ DEFINE_RAW_SPINLOCK(i8253_lock); EXPORT_SYMBOL(i8253_lock); -/* - * Handle PIT quirk in pit_shutdown() where zeroing the counter register - * restarts the PIT, negating the shutdown. On platforms with the quirk, - * platform specific code can set this to false. - */ -bool i8253_clear_counter_on_shutdown __ro_after_init = true; - #ifdef CONFIG_CLKSRC_I8253 /* * Since the PIT overflows every tick, its not very useful @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt) outb_p(0x30, PIT_MODE); - if (i8253_clear_counter_on_shutdown) { - outb_p(0, PIT_CH0); - outb_p(0, PIT_CH0); - } - raw_spin_unlock(&i8253_lock); return 0; } diff --git a/include/linux/i8253.h b/include/linux/i8253.h index 8336b2f..e6bb36a 100644 --- a/include/linux/i8253.h +++ b/include/linux/i8253.h @@ -21,7 +21,6 @@ #define PIT_LATCH ((PIT_TICK_RATE + HZ/2) / HZ) extern raw_spinlock_t i8253_lock; -extern bool i8253_clear_counter_on_shutdown; extern struct clock_event_device i8253_clockevent; extern void clockevent_i8253_init(bool oneshot); -- 2.9.4 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2023-02-07 1:14 [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown lirongqing @ 2023-02-08 1:04 ` Michael Kelley (LINUX) 2023-02-08 15:04 ` Sean Christopherson 2024-08-01 9:00 ` David Woodhouse 2024-08-01 14:21 ` Thomas Gleixner 1 sibling, 2 replies; 33+ messages in thread From: Michael Kelley (LINUX) @ 2023-02-08 1:04 UTC (permalink / raw) To: lirongqing@baidu.com, seanjc@google.com, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM > > Zeroing the counter register in pit_shutdown() isn't actually supposed to > stop it from counting, will causes the PIT to start running again, > From the spec: > > The largest possible initial count is 0; this is equivalent to 216 for > binary counting and 104 for BCD counting. > > The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the > Counter "wraps around" to the highest count, either FFFF hex for binary > count- ing or 9999 for BCD counting, and continues counting. > > Mode 0 is typically used for event counting. After the Control Word is > written, OUT is initially low, and will remain low until the Counter > reaches zero. OUT then goes high and remains high until a new count or a > new Mode 0 Control Word is written into the Counter. > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/ > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v, > so delete the zero timer counter register in shutdown, and delete PIT shutdown > quirk for Hyper-v From the standpoint of Hyper-V, I'm good with this change. But there's a risk that old hardware might not be compliant with the spec, and needs the zero'ing for some reason. The experts in the x86 space will be in the best position to assess the risk. At the time, the quirk approach was taken so the change applied only to Hyper-V, and any such risk was avoided. For Hyper-V, Reviewed-by: Michael Kelley <mikelley@microsoft.com> > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Li RongQing <lirongqing@baidu.com> > --- > arch/x86/kernel/cpu/mshyperv.c | 11 ----------- > drivers/clocksource/i8253.c | 12 ------------ > include/linux/i8253.h | 1 - > 3 files changed, 24 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c > index 46668e2..f788889 100644 > --- a/arch/x86/kernel/cpu/mshyperv.c > +++ b/arch/x86/kernel/cpu/mshyperv.c > @@ -16,7 +16,6 @@ > #include <linux/interrupt.h> > #include <linux/irq.h> > #include <linux/kexec.h> > -#include <linux/i8253.h> > #include <linux/random.h> > #include <linux/swiotlb.h> > #include <asm/processor.h> > @@ -399,16 +398,6 @@ static void __init ms_hyperv_init_platform(void) > if (efi_enabled(EFI_BOOT)) > x86_platform.get_nmi_reason = hv_get_nmi_reason; > > - /* > - * Hyper-V VMs have a PIT emulation quirk such that zeroing the > - * counter register during PIT shutdown restarts the PIT. So it > - * continues to interrupt @18.2 HZ. Setting i8253_clear_counter > - * to false tells pit_shutdown() not to zero the counter so that > - * the PIT really is shutdown. Generation 2 VMs don't have a PIT, > - * and setting this value has no effect. > - */ > - i8253_clear_counter_on_shutdown = false; > - > #if IS_ENABLED(CONFIG_HYPERV) > /* > * Setup the hook to get control post apic initialization. > diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c > index d4350bb..169474d 100644 > --- a/drivers/clocksource/i8253.c > +++ b/drivers/clocksource/i8253.c > @@ -20,13 +20,6 @@ > DEFINE_RAW_SPINLOCK(i8253_lock); > EXPORT_SYMBOL(i8253_lock); > > -/* > - * Handle PIT quirk in pit_shutdown() where zeroing the counter register > - * restarts the PIT, negating the shutdown. On platforms with the quirk, > - * platform specific code can set this to false. > - */ > -bool i8253_clear_counter_on_shutdown __ro_after_init = true; > - > #ifdef CONFIG_CLKSRC_I8253 > /* > * Since the PIT overflows every tick, its not very useful > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt) > > outb_p(0x30, PIT_MODE); > > - if (i8253_clear_counter_on_shutdown) { > - outb_p(0, PIT_CH0); > - outb_p(0, PIT_CH0); > - } > - > raw_spin_unlock(&i8253_lock); > return 0; > } > diff --git a/include/linux/i8253.h b/include/linux/i8253.h > index 8336b2f..e6bb36a 100644 > --- a/include/linux/i8253.h > +++ b/include/linux/i8253.h > @@ -21,7 +21,6 @@ > #define PIT_LATCH ((PIT_TICK_RATE + HZ/2) / HZ) > > extern raw_spinlock_t i8253_lock; > -extern bool i8253_clear_counter_on_shutdown; > extern struct clock_event_device i8253_clockevent; > extern void clockevent_i8253_init(bool oneshot); > > -- > 2.9.4 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2023-02-08 1:04 ` Michael Kelley (LINUX) @ 2023-02-08 15:04 ` Sean Christopherson 2023-02-24 9:45 ` Li,Rongqing [not found] ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com> 2024-08-01 9:00 ` David Woodhouse 1 sibling, 2 replies; 33+ messages in thread From: Sean Christopherson @ 2023-02-08 15:04 UTC (permalink / raw) To: Michael Kelley (LINUX) Cc: lirongqing@baidu.com, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Wed, Feb 08, 2023, Michael Kelley (LINUX) wrote: > From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM > > > > Zeroing the counter register in pit_shutdown() isn't actually supposed to > > stop it from counting, will causes the PIT to start running again, > > From the spec: > > > > The largest possible initial count is 0; this is equivalent to 216 for > > binary counting and 104 for BCD counting. > > > > The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the > > Counter "wraps around" to the highest count, either FFFF hex for binary > > count- ing or 9999 for BCD counting, and continues counting. > > > > Mode 0 is typically used for event counting. After the Control Word is > > written, OUT is initially low, and will remain low until the Counter > > reaches zero. OUT then goes high and remains high until a new count or a > > new Mode 0 Control Word is written into the Counter. > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/ > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v, > > so delete the zero timer counter register in shutdown, and delete PIT shutdown > > quirk for Hyper-v > > From the standpoint of Hyper-V, I'm good with this change. But there's a > risk that old hardware might not be compliant with the spec, and needs the > zero'ing for some reason. The experts in the x86 space will be in the best > position to assess the risk. Yep, my feeling exactly. My input is purely from reading those crusty old specs. ^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2023-02-08 15:04 ` Sean Christopherson @ 2023-02-24 9:45 ` Li,Rongqing [not found] ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com> 1 sibling, 0 replies; 33+ messages in thread From: Li,Rongqing @ 2023-02-24 9:45 UTC (permalink / raw) To: Sean Christopherson, Michael Kelley (LINUX), tglx@linutronix.de Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 > > > "(clockevents/drivers/ > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 > > > drivers, not Hyper-v, so delete the zero timer counter register in > > > shutdown, and delete PIT shutdown quirk for Hyper-v > > > > From the standpoint of Hyper-V, I'm good with this change. But > > there's a risk that old hardware might not be compliant with the spec, > > and needs the zero'ing for some reason. The experts in the x86 space > > will be in the best position to assess the risk. > > Yep, my feeling exactly. My input is purely from reading those crusty old specs. Hi Thomas: Could you give some suggestion about this patch? Thanks -Li ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <3b8496c071214bda9e5ecfa048f18ab9@baidu.com>]
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown [not found] ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com> @ 2023-04-13 1:28 ` Wei Liu [not found] ` <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2> 1 sibling, 0 replies; 33+ messages in thread From: Wei Liu @ 2023-04-13 1:28 UTC (permalink / raw) To: Li,Rongqing Cc: Sean Christopherson, Michael Kelley (LINUX), KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Wed, Apr 12, 2023 at 12:02:04PM +0000, Li,Rongqing wrote: > > > > -----Original Message----- > > From: Sean Christopherson <seanjc@google.com> > > Sent: Wednesday, February 8, 2023 11:04 PM > > To: Michael Kelley (LINUX) <mikelley@microsoft.com> > > Cc: Li,Rongqing <lirongqing@baidu.com>; KY Srinivasan <kys@microsoft.com>; > > Haiyang Zhang <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui > > <decui@microsoft.com>; tglx@linutronix.de; mingo@redhat.com; > > bp@alien8.de; dave.hansen@linux.intel.com; x86@kernel.org; > > linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org > > Subject: Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in > > shutdown > > > > On Wed, Feb 08, 2023, Michael Kelley (LINUX) wrote: > > > From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, > > > February 6, 2023 5:15 PM > > > > > > > > Zeroing the counter register in pit_shutdown() isn't actually > > > > supposed to stop it from counting, will causes the PIT to start > > > > running again, From the spec: > > > > > > > > The largest possible initial count is 0; this is equivalent to 216 for > > > > binary counting and 104 for BCD counting. > > > > > > > > The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 > > the > > > > Counter "wraps around" to the highest count, either FFFF hex for binary > > > > count- ing or 9999 for BCD counting, and continues counting. > > > > > > > > Mode 0 is typically used for event counting. After the Control Word is > > > > written, OUT is initially low, and will remain low until the Counter > > > > reaches zero. OUT then goes high and remains high until a new count or a > > > > new Mode 0 Control Word is written into the Counter. > > > > > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 > > > > "(clockevents/drivers/ > > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 > > > > drivers, not Hyper-v, so delete the zero timer counter register in > > > > shutdown, and delete PIT shutdown quirk for Hyper-v > > > > > > From the standpoint of Hyper-V, I'm good with this change. But > > > there's a risk that old hardware might not be compliant with the spec, > > > and needs the zero'ing for some reason. The experts in the x86 space > > > will be in the best position to assess the risk. > > > > Yep, my feeling exactly. My input is purely from reading those crusty old specs. > > > > Ping I guess you want Thomas Gleixner and Daniel Lezcano's attention. > > Thanks > > -Li ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2>]
* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown [not found] ` <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2> @ 2023-04-14 5:17 ` Li,Rongqing 0 siblings, 0 replies; 33+ messages in thread From: Li,Rongqing @ 2023-04-14 5:17 UTC (permalink / raw) To: Wei Liu, tglx@linutronix.de, daniel.lezcano@linaro.org Cc: Sean Christopherson, Michael Kelley (LINUX), KY Srinivasan, Haiyang Zhang, Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org > > > > > > Ping > > I guess you want Thomas Gleixner and Daniel Lezcano's attention. > Yes, Thanks you -Li ^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2023-02-08 1:04 ` Michael Kelley (LINUX) 2023-02-08 15:04 ` Sean Christopherson @ 2024-08-01 9:00 ` David Woodhouse 2024-08-01 15:22 ` David Woodhouse 1 sibling, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-01 9:00 UTC (permalink / raw) To: mikelley, kvm Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel, lirongqing, mingo, seanjc, tglx, wei.liu, x86 [-- Attachment #1: Type: text/plain, Size: 2161 bytes --] On 2023-02-08 at 01:04:19 +0000, "Michael Kelley (LINUX)" <mikelley@microsoft.com> said: > From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM > > > > Zeroing the counter register in pit_shutdown() isn't actually supposed to > > stop it from counting, will causes the PIT to start running again, > > From the spec: > > > > The largest possible initial count is 0; this is equivalent to 216 for > > binary counting and 104 for BCD counting. > > > > The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the > > Counter "wraps around" to the highest count, either FFFF hex for binary > > count- ing or 9999 for BCD counting, and continues counting. > > > > Mode 0 is typically used for event counting. After the Control Word is > > written, OUT is initially low, and will remain low until the Counter > > reaches zero. OUT then goes high and remains high until a new count or a > > new Mode 0 Control Word is written into the Counter. > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/ > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v, > > so delete the zero timer counter register in shutdown, and delete PIT shutdown > > quirk for Hyper-v > > From the standpoint of Hyper-V, I'm good with this change. But there's a > risk that old hardware might not be compliant with the spec, and needs the > zero'ing for some reason. The experts in the x86 space will be in the best > position to assess the risk. At the time, the quirk approach was taken so > the change applied only to Hyper-V, and any such risk was avoided. > > For Hyper-V, > Reviewed-by: Michael Kelley <mikelley@microsoft.com> It's not entirely clear why we zero it at all. What was it supposed to achieve? I suppose if we want to be ultra-careful, perhaps we could do the zeroing only on real hardware, if hypervisor_is_type(X86_HYPER_NATIVE)? This does seem to be making VMMs spend a bunch of time stealing cycles from the guest to emulate an unwanted timer, so it would be good to fix it. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 9:00 ` David Woodhouse @ 2024-08-01 15:22 ` David Woodhouse 2024-08-01 21:07 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-01 15:22 UTC (permalink / raw) To: mikelley, kvm Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel, lirongqing, mingo, seanjc, tglx, wei.liu, x86 [-- Attachment #1: Type: text/plain, Size: 4465 bytes --] On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote: > > > > > > > > On 2023-02-08 at 01:04:19 +0000, "Michael Kelley (LINUX)" <mikelley@microsoft.com> said: > > > > From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM > > > > > > > > > > > > Zeroing the counter register in pit_shutdown() isn't actually supposed to > > > > > > stop it from counting, will causes the PIT to start running again, > > > > > > From the spec: > > > > > > > > > > > > The largest possible initial count is 0; this is equivalent to 216 for > > > > > > binary counting and 104 for BCD counting. > > > > > > > > > > > > The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the > > > > > > Counter "wraps around" to the highest count, either FFFF hex for binary > > > > > > count- ing or 9999 for BCD counting, and continues counting. > > > > > > > > > > > > Mode 0 is typically used for event counting. After the Control Word is > > > > > > written, OUT is initially low, and will remain low until the Counter > > > > > > reaches zero. OUT then goes high and remains high until a new count or a > > > > > > new Mode 0 Control Word is written into the Counter. > > > > > > > > > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/ > > > > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v, > > > > > > so delete the zero timer counter register in shutdown, and delete PIT shutdown > > > > > > quirk for Hyper-v > > > > > > > > From the standpoint of Hyper-V, I'm good with this change. But there's a > > > > risk that old hardware might not be compliant with the spec, and needs the > > > > zero'ing for some reason. The experts in the x86 space will be in the best > > > > position to assess the risk. At the time, the quirk approach was taken so > > > > the change applied only to Hyper-V, and any such risk was avoided. > > > > > > > > For Hyper-V, > > > > Reviewed-by: Michael Kelley <mikelley@microsoft.com> > > > > It's not entirely clear why we zero it at all. What was it supposed to > > achieve? Answering my own question here, it seems that *some* implementations don't apply the mode change until the counter is subsequently written. I think QEMU applies it immediately. But the KVM in-kernel one (and the AWS Nitro Hypervisor) do not. So the interrupt doesn't shut up until you write the counter. I don't see any justification for writing the counter causing any more than *one* further interrupt though, in single-shot mode. If that's what Hyper-V does, that seems like a bug (which is worked around already). I do see the VMM wasting a bunch of time emulating a PIT IRQ that goes nowhere, but that seems to be because the bootloader or BIOS turns it on, and nothing in Linux ever turns it back *off* again unless it's registered at boot and then pit_shutdown() is called when we switch to something better. I'll look at an optimisation in the VMM which can stop firing the timer if the PIT IRQ is already masked or pending in the PIC and IOAPIC; that's just a waste of steal time. But the guest kernel should probably do something like this... diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c index 2b7999a1a50a..d13fd793254e 100644 --- a/arch/x86/kernel/i8253.c +++ b/arch/x86/kernel/i8253.c @@ -8,6 +8,7 @@ #include <linux/timex.h> #include <linux/i8253.h> +#include <asm/hypervisor.h> #include <asm/apic.h> #include <asm/hpet.h> #include <asm/time.h> @@ -39,8 +40,29 @@ static bool __init use_pit(void) bool __init pit_timer_init(void) { - if (!use_pit()) + if (!use_pit()) { + if (!hypervisor_is_type(X86_HYPER_NATIVE)) { + /* + * Don't just ignore the PIT. Ensure it's stopped, + * because VMMs otherwise steal CPU time just to + * pointlessly waggle the (masked) IRQ. + */ + raw_spin_lock(&i8253_lock); + outb_p(0x30, PIT_MODE); + + /* + * It's not entirely clear from the datasheet, but some + * virtual implementations don't stop until the counter + * is actually written. + */ + if (i8253_clear_counter_on_shutdown) { + outb_p(0, PIT_CH0); + outb_p(0, PIT_CH0); + } + raw_spin_unlock(&i8253_lock); + } return false; + } clockevent_i8253_init(true); global_clock_event = &i8253_clockevent; > [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 15:22 ` David Woodhouse @ 2024-08-01 21:07 ` Thomas Gleixner 2024-08-01 21:10 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 21:07 UTC (permalink / raw) To: David Woodhouse, mikelley, kvm Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel, lirongqing, mingo, seanjc, wei.liu, x86 On Thu, Aug 01 2024 at 16:22, David Woodhouse wrote: > On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote: > bool __init pit_timer_init(void) > { > - if (!use_pit()) > + if (!use_pit()) { > + if (!hypervisor_is_type(X86_HYPER_NATIVE)) { > + /* > + * Don't just ignore the PIT. Ensure it's stopped, > + * because VMMs otherwise steal CPU time just to > + * pointlessly waggle the (masked) IRQ. > + */ > + raw_spin_lock(&i8253_lock); > + outb_p(0x30, PIT_MODE); > + > + /* > + * It's not entirely clear from the datasheet, but some > + * virtual implementations don't stop until the counter > + * is actually written. > + */ > + if (i8253_clear_counter_on_shutdown) { > + outb_p(0, PIT_CH0); > + outb_p(0, PIT_CH0); > + } > + raw_spin_unlock(&i8253_lock); > + } > return false; > + } That's just wrong. What we want is to have the underlying problem fixed in the driver and then make: > clockevent_i8253_init(true); bool clockevent_i8253_init(bool enable, bool oneshot); so it can invoke the shutdown sequence instead of registering the pile: if (!enable) { shutdown(); return false; } ... return true; and the call site becomes: if (!clockevent_i8253_init(use_pit(), true)) return false; No? Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 21:07 ` Thomas Gleixner @ 2024-08-01 21:10 ` David Woodhouse 0 siblings, 0 replies; 33+ messages in thread From: David Woodhouse @ 2024-08-01 21:10 UTC (permalink / raw) To: Thomas Gleixner, mikelley, kvm Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel, lirongqing, mingo, seanjc, wei.liu, x86 On 1 August 2024 22:07:25 BST, Thomas Gleixner <tglx@linutronix.de> wrote: >On Thu, Aug 01 2024 at 16:22, David Woodhouse wrote: >> On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote: >> bool __init pit_timer_init(void) >> { >> - if (!use_pit()) >> + if (!use_pit()) { >> + if (!hypervisor_is_type(X86_HYPER_NATIVE)) { >> + /* >> + * Don't just ignore the PIT. Ensure it's stopped, >> + * because VMMs otherwise steal CPU time just to >> + * pointlessly waggle the (masked) IRQ. >> + */ >> + raw_spin_lock(&i8253_lock); >> + outb_p(0x30, PIT_MODE); >> + >> + /* >> + * It's not entirely clear from the datasheet, but some >> + * virtual implementations don't stop until the counter >> + * is actually written. >> + */ >> + if (i8253_clear_counter_on_shutdown) { >> + outb_p(0, PIT_CH0); >> + outb_p(0, PIT_CH0); >> + } >> + raw_spin_unlock(&i8253_lock); >> + } >> return false; >> + } > >That's just wrong. What we want is to have the underlying problem >fixed in the driver and then make: > >> clockevent_i8253_init(true); > >bool clockevent_i8253_init(bool enable, bool oneshot); > >so it can invoke the shutdown sequence instead of registering the pile: > > if (!enable) { > shutdown(); > return false; > } > ... > return true; > >and the call site becomes: > > if (!clockevent_i8253_init(use_pit(), true)) > return false; > >No? Yes. Well, kind of. The way I actually did it was by exposing the shutdown function instead of an "init" function which optionally did the opposite. But yes, I left the hardware-bashing in precisely once place, in the driver. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2023-02-07 1:14 [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown lirongqing 2023-02-08 1:04 ` Michael Kelley (LINUX) @ 2024-08-01 14:21 ` Thomas Gleixner 2024-08-01 16:14 ` Michael Kelley 2024-08-01 17:49 ` David Woodhouse 1 sibling, 2 replies; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 14:21 UTC (permalink / raw) To: lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote: > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt) > > outb_p(0x30, PIT_MODE); > > - if (i8253_clear_counter_on_shutdown) { > - outb_p(0, PIT_CH0); > - outb_p(0, PIT_CH0); > - } > - The stop sequence is wrong: When there is a count in progress, writing a new LSB before the counter has counted down to 0 and rolled over to FFFFh, WILL stop the counter. However, if the LSB is loaded AFTER the counter has rolled over to FFFFh, so that an MSB now exists in the counter, then the counter WILL NOT stop. The original i8253 datasheet says: 1) Write 1st byte stops the current counting 2) Write 2nd byte starts the new count The above does not make sure it actually has not rolled over and it obviously is initiating the new count by writing the MSB too. So it does not work on real hardware either. The proper sequence is: // Switch to mode 0 outb_p(0x30, PIT_MODE); // Load the maximum value to ensure there is no rollover outb_p(0xff, PIT_CH0); // Writing MSB starts the counter from 0xFFFF and clears rollover outb_p(0xff, PIT_CH0); // Stop the counter by writing only LSB outb_p(0xff, PIT_CH0); That works on real hardware and fails on KVM... So much for the claim that KVM follows the spec :) So yes, the code is buggy, but instead of deleting it, we rather fix it, no? User space test program below. Thanks, tglx --- #include <stdio.h> #include <unistd.h> #include <sys/io.h> typedef unsigned char u8; typedef unsigned short u16; #define BUILDIO(bwl, bw, type) \ static __always_inline void __out##bwl(type value, u16 port) \ { \ asm volatile("out" #bwl " %" #bw "0, %w1" \ : : "a"(value), "Nd"(port)); \ } \ \ static __always_inline type __in##bwl(u16 port) \ { \ type value; \ asm volatile("in" #bwl " %w1, %" #bw "0" \ : "=a"(value) : "Nd"(port)); \ return value; \ } BUILDIO(b, b, u8) #define inb __inb #define outb __outb #define PIT_MODE 0x43 #define PIT_CH0 0x40 #define PIT_CH2 0x42 static int is8254; static void dump_pit(void) { if (is8254) { // Latch and output counter and status outb(0xC2, PIT_MODE); printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0)); } else { // Latch and output counter outb(0x0, PIT_MODE); printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0)); } } int main(int argc, char* argv[]) { if (argc > 1) is8254 = 1; if (ioperm(0x40, 4, 1) != 0) return 1; dump_pit(); printf("Set periodic\n"); outb(0x34, PIT_MODE); outb(0x00, PIT_CH0); outb(0x0F, PIT_CH0); dump_pit(); usleep(1000); dump_pit(); printf("Set oneshot\n"); outb(0x38, PIT_MODE); outb(0x00, PIT_CH0); outb(0x0F, PIT_CH0); dump_pit(); usleep(1000); dump_pit(); printf("Set stop\n"); outb(0x30, PIT_MODE); outb(0xFF, PIT_CH0); outb(0xFF, PIT_CH0); outb(0xFF, PIT_CH0); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); return 0; } ^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 14:21 ` Thomas Gleixner @ 2024-08-01 16:14 ` Michael Kelley 2024-08-01 18:54 ` Thomas Gleixner 2024-08-01 17:49 ` David Woodhouse 1 sibling, 1 reply; 33+ messages in thread From: Michael Kelley @ 2024-08-01 16:14 UTC (permalink / raw) To: Thomas Gleixner, lirongqing@baidu.com, seanjc@google.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, August 1, 2024 7:21 AM > > On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote: > > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt) > > > > outb_p(0x30, PIT_MODE); > > > > - if (i8253_clear_counter_on_shutdown) { > > - outb_p(0, PIT_CH0); > > - outb_p(0, PIT_CH0); > > - } > > - > > The stop sequence is wrong: > > When there is a count in progress, writing a new LSB before the > counter has counted down to 0 and rolled over to FFFFh, WILL stop > the counter. However, if the LSB is loaded AFTER the counter has > rolled over to FFFFh, so that an MSB now exists in the counter, then > the counter WILL NOT stop. > > The original i8253 datasheet says: > > 1) Write 1st byte stops the current counting > 2) Write 2nd byte starts the new count > > The above does not make sure it actually has not rolled over and it > obviously is initiating the new count by writing the MSB too. So it does > not work on real hardware either. > > The proper sequence is: > > // Switch to mode 0 > outb_p(0x30, PIT_MODE); > // Load the maximum value to ensure there is no rollover > outb_p(0xff, PIT_CH0); > // Writing MSB starts the counter from 0xFFFF and clears rollover > outb_p(0xff, PIT_CH0); > // Stop the counter by writing only LSB > outb_p(0xff, PIT_CH0); > > That works on real hardware and fails on KVM... So much for the claim > that KVM follows the spec :) > > So yes, the code is buggy, but instead of deleting it, we rather fix it, > no? > FWIW, in Hyper-V guests with the Hyper-V quirk removed, tglx's new sequence does *not* stop the PIT. But this sequence does: outb_p(0x30, PIT_MODE); outb_p(0xff, PIT_CH0); outb_p(0xff, PIT_CH0); outb_p(0x30, PIT_MODE); outb_p(0xff, PIT_CH0); I don't have a convenient way to test my sequence on KVM. Michael ^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 16:14 ` Michael Kelley @ 2024-08-01 18:54 ` Thomas Gleixner 2024-08-02 8:21 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 18:54 UTC (permalink / raw) To: Michael Kelley, lirongqing@baidu.com, seanjc@google.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote: > From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, August 1, 2024 7:21 AM > FWIW, in Hyper-V guests with the Hyper-V quirk removed, tglx's new > sequence does *not* stop the PIT. But this sequence does: > > outb_p(0x30, PIT_MODE); > outb_p(0xff, PIT_CH0); > outb_p(0xff, PIT_CH0); > > outb_p(0x30, PIT_MODE); > outb_p(0xff, PIT_CH0); That works on bare metal too > I don't have a convenient way to test my sequence on KVM. But still fails in KVM Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 18:54 ` Thomas Gleixner @ 2024-08-02 8:21 ` David Woodhouse 2024-08-02 14:55 ` Sean Christopherson 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-02 8:21 UTC (permalink / raw) To: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com, seanjc@google.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1116 bytes --] On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote: > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote: > > From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, August > > 1, 2024 7:21 AM > > FWIW, in Hyper-V guests with the Hyper-V quirk removed, tglx's new > > sequence does *not* stop the PIT. But this sequence does: > > > > outb_p(0x30, PIT_MODE); > > outb_p(0xff, PIT_CH0); > > outb_p(0xff, PIT_CH0); > > > > outb_p(0x30, PIT_MODE); > > outb_p(0xff, PIT_CH0); > > That works on bare metal too What about writing *just* the MODE? The datasheet definitely seems to be saying that should work too. And I believe it works on Hyper-V; that's what the "workaround" has been doing, because Hyper-V actually got it right. > > I don't have a convenient way to test my sequence on KVM. > > But still fails in KVM By KVM you mean the in-kernel one that we want to kill because everyone should be using userspace IRQ chips these days? I don't know that we care; the failure mode is basically harmless so let's just let it get fixed in KVM rather than pandering to it? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 8:21 ` David Woodhouse @ 2024-08-02 14:55 ` Sean Christopherson 2024-08-02 15:04 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Sean Christopherson @ 2024-08-02 14:55 UTC (permalink / raw) To: David Woodhouse Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Fri, Aug 02, 2024, David Woodhouse wrote: > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote: > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote: > > > I don't have a convenient way to test my sequence on KVM. > > > > But still fails in KVM > > By KVM you mean the in-kernel one that we want to kill because everyone > should be using userspace IRQ chips these days? What exactly do you want to kill? In-kernel local APIC obviously needs to stay for APICv/AVIC. And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and the community as a whole, as the number of VMMs in use these days results in a decent amount of duplicated work in userspace VMMs, especially when accounting for hardware and software quirks. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 14:55 ` Sean Christopherson @ 2024-08-02 15:04 ` David Woodhouse 2024-08-12 23:59 ` Sean Christopherson 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-02 15:04 UTC (permalink / raw) To: Sean Christopherson Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1210 bytes --] On Fri, 2024-08-02 at 07:55 -0700, Sean Christopherson wrote: > On Fri, Aug 02, 2024, David Woodhouse wrote: > > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote: > > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote: > > > > I don't have a convenient way to test my sequence on KVM. > > > > > > But still fails in KVM > > > > By KVM you mean the in-kernel one that we want to kill because everyone > > should be using userspace IRQ chips these days? > > What exactly do you want to kill? In-kernel local APIC obviously needs to stay > for APICv/AVIC. The legacy PIT, PIC and I/O APIC. > And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and > the community as a whole, as the number of VMMs in use these days results in a > decent amount of duplicated work in userspace VMMs, especially when accounting > for hardware and software quirks. I don't particularly care, but I thought the general trend was towards split irqchip mode, with the local APIC in-kernel but i8259 PIC and I/O APIC (and the i8254 PIT, which was the topic of this discussion) being done in userspace. Especially if you want to support guests with APIC IDs > 255 :) [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 15:04 ` David Woodhouse @ 2024-08-12 23:59 ` Sean Christopherson 2024-08-13 6:39 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Sean Christopherson @ 2024-08-12 23:59 UTC (permalink / raw) To: David Woodhouse Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Fri, Aug 02, 2024, David Woodhouse wrote: > On Fri, 2024-08-02 at 07:55 -0700, Sean Christopherson wrote: > > On Fri, Aug 02, 2024, David Woodhouse wrote: > > > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote: > > > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote: > > > > > I don't have a convenient way to test my sequence on KVM. > > > > > > > > But still fails in KVM > > > > > > By KVM you mean the in-kernel one that we want to kill because everyone > > > should be using userspace IRQ chips these days? > > > > What exactly do you want to kill? In-kernel local APIC obviously needs to stay > > for APICv/AVIC. > > The legacy PIT, PIC and I/O APIC. > > > And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and > > the community as a whole, as the number of VMMs in use these days results in a > > decent amount of duplicated work in userspace VMMs, especially when accounting > > for hardware and software quirks. > > I don't particularly care, but I thought the general trend was towards > split irqchip mode, with the local APIC in-kernel but i8259 PIC and I/O > APIC (and the i8254 PIT, which was the topic of this discussion) being > done in userspace. Yeah, that's where most everyone is headed, if not already there. Letting the I/O APIC live in userspace is probably the right direction long term, I just don't love that every VMM seems to have it's own slightly different version. But I think the answer to that is to build a library for (legacy?) device emulation so that VMMs can link to an implementation instead of copy+pasting from somwhere else and inevitably ending up with code that's frozen in time. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-12 23:59 ` Sean Christopherson @ 2024-08-13 6:39 ` David Woodhouse 0 siblings, 0 replies; 33+ messages in thread From: David Woodhouse @ 2024-08-13 6:39 UTC (permalink / raw) To: Sean Christopherson Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On 13 August 2024 00:59:40 BST, Sean Christopherson <seanjc@google.com> wrote: >On Fri, Aug 02, 2024, David Woodhouse wrote: >> On Fri, 2024-08-02 at 07:55 -0700, Sean Christopherson wrote: >> > On Fri, Aug 02, 2024, David Woodhouse wrote: >> > > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote: >> > > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote: >> > > > > I don't have a convenient way to test my sequence on KVM. >> > > > >> > > > But still fails in KVM >> > > >> > > By KVM you mean the in-kernel one that we want to kill because everyone >> > > should be using userspace IRQ chips these days? >> > >> > What exactly do you want to kill? In-kernel local APIC obviously needs to stay >> > for APICv/AVIC. >> >> The legacy PIT, PIC and I/O APIC. >> >> > And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and >> > the community as a whole, as the number of VMMs in use these days results in a >> > decent amount of duplicated work in userspace VMMs, especially when accounting >> > for hardware and software quirks. >> >> I don't particularly care, but I thought the general trend was towards >> split irqchip mode, with the local APIC in-kernel but i8259 PIC and I/O >> APIC (and the i8254 PIT, which was the topic of this discussion) being >> done in userspace. > >Yeah, that's where most everyone is headed, if not already there. Letting the >I/O APIC live in userspace is probably the right direction long term, I just don't >love that every VMM seems to have it's own slightly different version. But I think >the answer to that is to build a library for (legacy?) device emulation so that >VMMs can link to an implementation instead of copy+pasting from somwhere else and >inevitably ending up with code that's frozen in time. Some would say the right answer is to present a micro-vm machine model that doesn't have any of that crap at all. Sadly we're going in the wrong direction. For >255 vCPUs on AMD machines it looks like we even have to emulate a full virtual IOMMU with DMA translation support. Well done, AMD! (Linux is OK with the 15-bit Extended Destination ID, but not Windows) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 14:21 ` Thomas Gleixner 2024-08-01 16:14 ` Michael Kelley @ 2024-08-01 17:49 ` David Woodhouse 2024-08-01 18:25 ` David Woodhouse 2024-08-01 19:06 ` Thomas Gleixner 1 sibling, 2 replies; 33+ messages in thread From: David Woodhouse @ 2024-08-01 17:49 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1910 bytes --] On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote: > On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote: > > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt) > > > > outb_p(0x30, PIT_MODE); > > > > - if (i8253_clear_counter_on_shutdown) { > > - outb_p(0, PIT_CH0); > > - outb_p(0, PIT_CH0); > > - } > > - > > The stop sequence is wrong: > > When there is a count in progress, writing a new LSB before the > counter has counted down to 0 and rolled over to FFFFh, WILL stop > the counter. However, if the LSB is loaded AFTER the counter has > rolled over to FFFFh, so that an MSB now exists in the counter, then > the counter WILL NOT stop. > > The original i8253 datasheet says: > > 1) Write 1st byte stops the current counting > 2) Write 2nd byte starts the new count It says that for mode zero ("Interrupt on Terminal Count"), yes. But in that mode, shouldn't the IRQ only fire *one* more time anyway, rather than repeatedly? That should be OK, shouldn't it? "When terminal count is reached, the output will go high and remain high until the selected count register is reloaded wityh the mode or a new count is loaded". It's OK for it to keep *counting* as long as it stops firing interrupts, isn't it? Either way, this is somewhat orthogonal to the patch I posted in https://lore.kernel.org/kvm/6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org/T/#u for the fact that we don't shut down the PIT at *all* if we aren't ever going to use it. I'm glad I decided to export a function from the clocksource driver and just *call* it from pit_timer_init() though. Means we can bikeshed the shutdown sequence in *one* place and it isn't duplicated. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 17:49 ` David Woodhouse @ 2024-08-01 18:25 ` David Woodhouse 2024-08-01 18:57 ` Thomas Gleixner 2024-08-01 19:06 ` Thomas Gleixner 1 sibling, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-01 18:25 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2080 bytes --] On Thu, 2024-08-01 at 18:49 +0100, David Woodhouse wrote: > On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote: > > On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote: > > > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt) > > > > > > outb_p(0x30, PIT_MODE); > > > > > > - if (i8253_clear_counter_on_shutdown) { > > > - outb_p(0, PIT_CH0); > > > - outb_p(0, PIT_CH0); > > > - } > > > - > > > > The stop sequence is wrong: > > > > When there is a count in progress, writing a new LSB before the > > counter has counted down to 0 and rolled over to FFFFh, WILL stop > > the counter. However, if the LSB is loaded AFTER the counter has > > rolled over to FFFFh, so that an MSB now exists in the counter, then > > the counter WILL NOT stop. > > > > The original i8253 datasheet says: > > > > 1) Write 1st byte stops the current counting > > 2) Write 2nd byte starts the new count > It also prefixes that with "Rewriting a counter register during counting results in the following:". But after you write the MODE register, is it actually supposed to be counting? Just a little further up, under 'Counter Loading', it says: "The count register is not loaded until the count value is written (one or two bytes, depending on the mode selected by the RL bits), followed by a rising edge and a falling edge of the clock. Any read of the counter prior to that falling clock edge may yield invalid data". OK, but what *triggers* that invalid state? Given that it explicitly says that a one-byte counter write ends that state, it isn't the first of two bytes. Surely that means that from the time the MODE register is written, any read of the counter may yield invalid data, until the counter is written? I suspect there are as many implementations (virt and hardware) as there are reasonable interpretations of the spec... and then some. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 18:25 ` David Woodhouse @ 2024-08-01 18:57 ` Thomas Gleixner 2024-08-02 8:07 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 18:57 UTC (permalink / raw) To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Thu, Aug 01 2024 at 19:25, David Woodhouse wrote: > On Thu, 2024-08-01 at 18:49 +0100, David Woodhouse wrote: >> > The stop sequence is wrong: >> > >> > When there is a count in progress, writing a new LSB before the >> > counter has counted down to 0 and rolled over to FFFFh, WILL stop >> > the counter. However, if the LSB is loaded AFTER the counter has >> > rolled over to FFFFh, so that an MSB now exists in the counter, then >> > the counter WILL NOT stop. >> > >> > The original i8253 datasheet says: >> > >> > 1) Write 1st byte stops the current counting >> > 2) Write 2nd byte starts the new count >> > > It also prefixes that with "Rewriting a counter register during > counting results in the following:". > > But after you write the MODE register, is it actually supposed to be > counting? Just a little further up, under 'Counter Loading', it says: It's not counting right out of reset. But once it started counting it's tedious to stop :) > "The count register is not loaded until the count value is written (one > or two bytes, depending on the mode selected by the RL bits), followed > by a rising edge and a falling edge of the clock. Any read of the > counter prior to that falling clock edge may yield invalid data". > > OK, but what *triggers* that invalid state? Given that it explicitly > says that a one-byte counter write ends that state, it isn't the first > of two bytes. Surely that means that from the time the MODE register is > written, any read of the counter may yield invalid data, until the > counter is written? It seems to keep ticking with the old value. > I suspect there are as many implementations (virt and hardware) as > there are reasonable interpretations of the spec... and then some. Indeed. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 18:57 ` Thomas Gleixner @ 2024-08-02 8:07 ` David Woodhouse 2024-08-02 10:49 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-02 8:07 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3062 bytes --] On Thu, 2024-08-01 at 20:57 +0200, Thomas Gleixner wrote: > On Thu, Aug 01 2024 at 19:25, David Woodhouse wrote: > > On Thu, 2024-08-01 at 18:49 +0100, David Woodhouse wrote: > > > > The stop sequence is wrong: > > > > > > > > When there is a count in progress, writing a new LSB before the > > > > counter has counted down to 0 and rolled over to FFFFh, WILL stop > > > > the counter. However, if the LSB is loaded AFTER the counter has > > > > rolled over to FFFFh, so that an MSB now exists in the counter, then > > > > the counter WILL NOT stop. > > > > > > > > The original i8253 datasheet says: > > > > > > > > 1) Write 1st byte stops the current counting > > > > 2) Write 2nd byte starts the new count > > > > > > > It also prefixes that with "Rewriting a counter register during > > counting results in the following:". > > > > But after you write the MODE register, is it actually supposed to be > > counting? Just a little further up, under 'Counter Loading', it says: > > It's not counting right out of reset. But once it started counting it's > tedious to stop :) My reading of the data sheet definitely suggests that it *shouldn't* be. Mode 0 says: "The output will be initially low after the mode set operation. After the count is loaded into the selected count register... the counter will count." Mode 2 says: "When this mode is set, the output will remain high until after the count register is loaded." Mode 4 says: "After the mode is set, the output will be high. When the count is loaded, the counter will begin counting". All of that strongly hints to me that a *compliant* implementation (haha) would stop the interrupts (and the count) when the MODE is set. So writing *only* the mode ought to work, in theory. If the failure mode is just that a bad implementation takes a little bit more power or steal time, I wonder if that's what we should do? It's what Hyper-V wants, it seems. And if other hypervisors/VMMs want to avoid taking pointless steal time, they can fix their bugs. (... gets more coffee and starts fixing bugs ...) > > "The count register is not loaded until the count value is written (one > > or two bytes, depending on the mode selected by the RL bits), followed > > by a rising edge and a falling edge of the clock. Any read of the > > counter prior to that falling clock edge may yield invalid data". > > > > OK, but what *triggers* that invalid state? Given that it explicitly > > says that a one-byte counter write ends that state, it isn't the first > > of two bytes. Surely that means that from the time the MODE register is > > written, any read of the counter may yield invalid data, until the > > counter is written? > > It seems to keep ticking with the old value. > > > I suspect there are as many implementations (virt and hardware) as > > there are reasonable interpretations of the spec... and then some. > > Indeed. > > Thanks, > > tglx > [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 8:07 ` David Woodhouse @ 2024-08-02 10:49 ` Thomas Gleixner 2024-08-02 11:04 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-02 10:49 UTC (permalink / raw) To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Fri, Aug 02 2024 at 09:07, David Woodhouse wrote: > On Thu, 2024-08-01 at 20:57 +0200, Thomas Gleixner wrote: >> It's not counting right out of reset. But once it started counting it's >> tedious to stop :) > > My reading of the data sheet definitely suggests that it *shouldn't* > be. > > Mode 0 says: "The output will be initially low after the mode set > operation. After the count is loaded into the selected count > register... the counter will count." Hmm. Indeed. That does not stop the counter, but it prevents the interrupt from firing over and over. So fine, we can go with the patch from Li, but the changelog needs a rewrite and the code want's a big fat comment. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 10:49 ` Thomas Gleixner @ 2024-08-02 11:04 ` David Woodhouse 2024-08-02 13:27 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-02 11:04 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1569 bytes --] On Fri, 2024-08-02 at 12:49 +0200, Thomas Gleixner wrote: > On Fri, Aug 02 2024 at 09:07, David Woodhouse wrote: > > On Thu, 2024-08-01 at 20:57 +0200, Thomas Gleixner wrote: > > > It's not counting right out of reset. But once it started counting it's > > > tedious to stop :) > > > > My reading of the data sheet definitely suggests that it *shouldn't* > > be. > > > > Mode 0 says: "The output will be initially low after the mode set > > operation. After the count is loaded into the selected count > > register... the counter will count." > > Hmm. Indeed. That does not stop the counter, but it prevents the > interrupt from firing over and over. > > So fine, we can go with the patch from Li, but the changelog needs a > rewrite and the code want's a big fat comment. Nah, it wants to be MODE, COUNT, COUNT, MODE to handle all known implementations. Already posted as [PATCH 2/1] (with big fat comments and a version of your test) at https://lore.kernel.org/kvm/3bc237678ade809cc685fedb8c1a3d435e590639.camel@infradead.org/ Although I just realised that I edited the first patch (to *remove* the now-bogus comments about the stop sequence) before posting that one, so they don't follow cleanly from one another; there's a trivial conflict. I also forgot to remove the pre-1999 typedefs from the test program, despite fixing it to use <stdint.h> like it's the 21st century :) Top two commits of https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks I'll repost properly if you're happy with them? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 11:04 ` David Woodhouse @ 2024-08-02 13:27 ` Thomas Gleixner 2024-08-02 13:46 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-02 13:27 UTC (permalink / raw) To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Fri, Aug 02 2024 at 12:04, David Woodhouse wrote: > On Fri, 2024-08-02 at 12:49 +0200, Thomas Gleixner wrote: >> So fine, we can go with the patch from Li, but the changelog needs a >> rewrite and the code want's a big fat comment. > > Nah, it wants to be MODE, COUNT, COUNT, MODE to handle all known > implementations. Yes. That works for whatever reason :) > Already posted as [PATCH 2/1] (with big fat comments and a version of > your test) at > > https://lore.kernel.org/kvm/3bc237678ade809cc685fedb8c1a3d435e590639.camel@infradead.org/ > > Although I just realised that I edited the first patch (to *remove* the > now-bogus comments about the stop sequence) before posting that one, so > they don't follow cleanly from one another; there's a trivial conflict. > I also forgot to remove the pre-1999 typedefs from the test program, > despite fixing it to use <stdint.h> like it's the 21st century :) Grandpas are allowed to use pre-1999 typedefs. :) > Top two commits of > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks > > I'll repost properly if you're happy with them? Just make the disable unconditional. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-02 13:27 ` Thomas Gleixner @ 2024-08-02 13:46 ` David Woodhouse 0 siblings, 0 replies; 33+ messages in thread From: David Woodhouse @ 2024-08-02 13:46 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 455 bytes --] On Fri, 2024-08-02 at 15:27 +0200, Thomas Gleixner wrote: > > Top two commits of > > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks > > > > I'll repost properly if you're happy with them? > > Just make the disable unconditional. Oops, thought I'd done that too. Turns out I have to press the buttons on that big black slab in front of me, not just think about it. New series coming up after a brief smoke test. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 17:49 ` David Woodhouse 2024-08-01 18:25 ` David Woodhouse @ 2024-08-01 19:06 ` Thomas Gleixner 2024-08-01 19:21 ` David Woodhouse 1 sibling, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 19:06 UTC (permalink / raw) To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Thu, Aug 01 2024 at 18:49, David Woodhouse wrote: > On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote: >> The stop sequence is wrong: >> >> When there is a count in progress, writing a new LSB before the >> counter has counted down to 0 and rolled over to FFFFh, WILL stop >> the counter. However, if the LSB is loaded AFTER the counter has >> rolled over to FFFFh, so that an MSB now exists in the counter, then >> the counter WILL NOT stop. >> >> The original i8253 datasheet says: >> >> 1) Write 1st byte stops the current counting >> 2) Write 2nd byte starts the new count > > It says that for mode zero ("Interrupt on Terminal Count"), yes. But in > that mode, shouldn't the IRQ only fire *one* more time anyway, rather > than repeatedly? That should be OK, shouldn't it? > > "When terminal count is reached, the output will go high and remain > high until the selected count register is reloaded wityh the mode or a > new count is loaded". I just confirmed that this is the case on KVM. > It's OK for it to keep *counting* as long as it stops firing > interrupts, isn't it? Yes. So the sequence should stop KVM from trying to inject interrupts. Maybe someone fixes it to actually stop fiddling with the counter too :) > Either way, this is somewhat orthogonal to the patch I posted in > https://lore.kernel.org/kvm/6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org/T/#u > for the fact that we don't shut down the PIT at *all* if we aren't ever > going to use it. > > I'm glad I decided to export a function from the clocksource driver and > just *call* it from pit_timer_init() though. Means we can bikeshed the > shutdown sequence in *one* place and it isn't duplicated. Right. Though we don't have to make this conditional on hypervisor I think. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 19:06 ` Thomas Gleixner @ 2024-08-01 19:21 ` David Woodhouse 2024-08-01 20:00 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-01 19:21 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2880 bytes --] On Thu, 2024-08-01 at 21:06 +0200, Thomas Gleixner wrote: > On Thu, Aug 01 2024 at 18:49, David Woodhouse wrote: > > On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote: > > > The stop sequence is wrong: > > > > > > When there is a count in progress, writing a new LSB before the > > > counter has counted down to 0 and rolled over to FFFFh, WILL stop > > > the counter. However, if the LSB is loaded AFTER the counter has > > > rolled over to FFFFh, so that an MSB now exists in the counter, then > > > the counter WILL NOT stop. > > > > > > The original i8253 datasheet says: > > > > > > 1) Write 1st byte stops the current counting > > > 2) Write 2nd byte starts the new count > > > > It says that for mode zero ("Interrupt on Terminal Count"), yes. But in > > that mode, shouldn't the IRQ only fire *one* more time anyway, rather > > than repeatedly? That should be OK, shouldn't it? > > > > "When terminal count is reached, the output will go high and remain > > high until the selected count register is reloaded wityh the mode or a > > new count is loaded". > > I just confirmed that this is the case on KVM. > > > It's OK for it to keep *counting* as long as it stops firing > > interrupts, isn't it? > > Yes. So the sequence should stop KVM from trying to inject > interrupts. Maybe someone fixes it to actually stop fiddling with the > counter too :) I don't think we care about the counter value, as that's *calculated* on demand when the guest tries to read from it. Or, more to the point, *if* the guest tries to read from it. As opposed to the interrupt, which is a timer in the VMM which takes a CPU out of guest mode and incurs steal time, just to waggle a pin on the emulated PICs for no good reason. > > Either way, this is somewhat orthogonal to the patch I posted in > > https://lore.kernel.org/kvm/6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org/T/#u > > for the fact that we don't shut down the PIT at *all* if we aren't ever > > going to use it. > > > > I'm glad I decided to export a function from the clocksource driver and > > just *call* it from pit_timer_init() though. Means we can bikeshed the > > shutdown sequence in *one* place and it isn't duplicated. > > Right. Though we don't have to make this conditional on hypervisor I > think. Right, we don't *have* to. I vacillated about that and almost ripped it out before sending the patch, but came down on the side of "hardware is a steaming pile of crap and if I don't *have* to change its behaviour, let's not touch it". I justify my cowardice on the basis that it doesn't *matter* if a hardware implementation is still toggling the IRQ pin; in that case it's only a few irrelevant transistors which are busy, and it doesn't translate to steal time. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 19:21 ` David Woodhouse @ 2024-08-01 20:00 ` Thomas Gleixner 2024-08-01 20:49 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 20:00 UTC (permalink / raw) To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Thu, Aug 01 2024 at 20:21, David Woodhouse wrote: > On Thu, 2024-08-01 at 21:06 +0200, Thomas Gleixner wrote: >> Yes. So the sequence should stop KVM from trying to inject >> interrupts. Maybe someone fixes it to actually stop fiddling with the >> counter too :) > > I don't think we care about the counter value, as that's *calculated* > on demand when the guest tries to read from it. Or, more to the point, > *if* the guest tries to read from it. > > As opposed to the interrupt, which is a timer in the VMM which takes a > CPU out of guest mode and incurs steal time, just to waggle a pin on > the emulated PICs for no good reason. Well, if the implementation still arms the timer in the background, then it matters because that has to be processed too. I haven't looked at that code at all, so what do I know. >> > I'm glad I decided to export a function from the clocksource driver and >> > just *call* it from pit_timer_init() though. Means we can bikeshed the >> > shutdown sequence in *one* place and it isn't duplicated. >> >> Right. Though we don't have to make this conditional on hypervisor I >> think. > > Right, we don't *have* to. I vacillated about that and almost ripped it > out before sending the patch, but came down on the side of "hardware is > a steaming pile of crap and if I don't *have* to change its behaviour, > let's not touch it". > > I justify my cowardice on the basis that it doesn't *matter* if a > hardware implementation is still toggling the IRQ pin; in that case > it's only a few irrelevant transistors which are busy, and it doesn't > translate to steal time. On real hardware it translates to power... Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 20:00 ` Thomas Gleixner @ 2024-08-01 20:49 ` David Woodhouse 2024-08-01 21:22 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-01 20:49 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2104 bytes --] On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote: > On Thu, Aug 01 2024 at 20:21, David Woodhouse wrote: > > On Thu, 2024-08-01 at 21:06 +0200, Thomas Gleixner wrote: > > > Yes. So the sequence should stop KVM from trying to inject > > > interrupts. Maybe someone fixes it to actually stop fiddling with the > > > counter too :) > > > > I don't think we care about the counter value, as that's *calculated* > > on demand when the guest tries to read from it. Or, more to the point, > > *if* the guest tries to read from it. > > > > As opposed to the interrupt, which is a timer in the VMM which takes a > > CPU out of guest mode and incurs steal time, just to waggle a pin on > > the emulated PICs for no good reason. > > Well, if the implementation still arms the timer in the background, then > it matters because that has to be processed too. I haven't looked at > that code at all, so what do I know. It only needs to arm the timer if it needs to send an interrupt. Otherwise, it's all event-driven from the guest interaction. > > > > I'm glad I decided to export a function from the clocksource driver and > > > > just *call* it from pit_timer_init() though. Means we can bikeshed the > > > > shutdown sequence in *one* place and it isn't duplicated. > > > > > > Right. Though we don't have to make this conditional on hypervisor I > > > think. > > > > Right, we don't *have* to. I vacillated about that and almost ripped it > > out before sending the patch, but came down on the side of "hardware is > > a steaming pile of crap and if I don't *have* to change its behaviour, > > let's not touch it". > > > > I justify my cowardice on the basis that it doesn't *matter* if a > > hardware implementation is still toggling the IRQ pin; in that case > > it's only a few irrelevant transistors which are busy, and it doesn't > > translate to steal time. > > On real hardware it translates to power... Perhaps, although I'd guess it's a negligible amount. Still, happy to be brave and make it unconditional. Want a new version of the patch? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 20:49 ` David Woodhouse @ 2024-08-01 21:22 ` Thomas Gleixner 2024-08-01 21:31 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-01 21:22 UTC (permalink / raw) To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On Thu, Aug 01 2024 at 21:49, David Woodhouse wrote: > On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote: >> > I justify my cowardice on the basis that it doesn't *matter* if a >> > hardware implementation is still toggling the IRQ pin; in that case >> > it's only a few irrelevant transistors which are busy, and it doesn't >> > translate to steal time. >> >> On real hardware it translates to power... > > Perhaps, although I'd guess it's a negligible amount. Still, happy to > be brave and make it unconditional. Want a new version of the patch? Let'ss fix the shutdown sequence first (See Michaels latest mail) and then do the clockevents_i8253_init() change on top. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 21:22 ` Thomas Gleixner @ 2024-08-01 21:31 ` David Woodhouse 2024-08-02 9:55 ` David Woodhouse 0 siblings, 1 reply; 33+ messages in thread From: David Woodhouse @ 2024-08-01 21:31 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel On 1 August 2024 22:22:56 BST, Thomas Gleixner <tglx@linutronix.de> wrote: >On Thu, Aug 01 2024 at 21:49, David Woodhouse wrote: >> On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote: >>> > I justify my cowardice on the basis that it doesn't *matter* if a >>> > hardware implementation is still toggling the IRQ pin; in that case >>> > it's only a few irrelevant transistors which are busy, and it doesn't >>> > translate to steal time. >>> >>> On real hardware it translates to power... >> >> Perhaps, although I'd guess it's a negligible amount. Still, happy to >> be brave and make it unconditional. Want a new version of the patch? > >Let'ss fix the shutdown sequence first (See Michaels latest mail) and >then do the clockevents_i8253_init() change on top. Makes sense. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown 2024-08-01 21:31 ` David Woodhouse @ 2024-08-02 9:55 ` David Woodhouse 0 siblings, 0 replies; 33+ messages in thread From: David Woodhouse @ 2024-08-02 9:55 UTC (permalink / raw) To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel [-- Attachment #1.1: Type: text/plain, Size: 2511 bytes --] On Thu, 2024-08-01 at 22:31 +0100, David Woodhouse wrote: > On 1 August 2024 22:22:56 BST, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Thu, Aug 01 2024 at 21:49, David Woodhouse wrote: > > > On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote: > > > > > I justify my cowardice on the basis that it doesn't *matter* if a > > > > > hardware implementation is still toggling the IRQ pin; in that case > > > > > it's only a few irrelevant transistors which are busy, and it doesn't > > > > > translate to steal time. > > > > > > > > On real hardware it translates to power... > > > > > > Perhaps, although I'd guess it's a negligible amount. Still, happy to > > > be brave and make it unconditional. Want a new version of the patch? > > > > Let'ss fix the shutdown sequence first (See Michaels latest mail) and > > then do the clockevents_i8253_init() change on top. > > Makes sense. On second thoughts, let's add clockevent_i8253_disable() first and call it when the PIT isn't being used, then we can bikeshed the precise behaviour of that function to our hearts' content. I think it should look like this. Revised version of your test program attached. void clockevent_i8253_disable(void) { raw_spin_lock(&i8253_lock); /* * Writing the MODE register should stop the counter, according to * the datasheet. This appears to work on real hardware (well, on * modern Intel and AMD boxes; I didn't dig the Pegasos out of the * shed). * * However, some virtual implementations differ, and the MODE change * doesn't have any effect until either the counter is written (KVM * in-kernel PIT) or the next interrupt (QEMU). And in those cases, * it may not stop the *count*, only the interrupts. Although in * the virt case, that probably doesn't matter, as the value of the * counter will only be calculated on demand if the guest reads it; * it's the interrupts which cause steal time. * * Hyper-V apparently has a bug where even in mode 0, the IRQ keeps * firing repeatedly if the counter is running. But it *does* do the * right thing when the MODE register is written. * * So: write the MODE and then load the counter, which ensures that * the IRQ is stopped on those buggy virt implementations. And then * write the MODE again, which is the right way to stop it. */ outb_p(0x30, PIT_MODE); outb_p(0, PIT_CH0); outb_p(0, PIT_CH0); outb_p(0x30, PIT_MODE); raw_spin_unlock(&i8253_lock); } [-- Attachment #1.2: i8254.c --] [-- Type: text/x-csrc, Size: 1914 bytes --] #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <stdint.h> #include <sys/io.h> typedef unsigned char uint8_t; typedef unsigned short uint16_t; #define BUILDIO(bwl, bw, type) \ static __always_inline void __out##bwl(type value, uint16_t port) \ { \ asm volatile("out" #bwl " %" #bw "0, %w1" \ : : "a"(value), "Nd"(port)); \ } \ \ static __always_inline type __in##bwl(uint16_t port) \ { \ type value; \ asm volatile("in" #bwl " %w1, %" #bw "0" \ : "=a"(value) : "Nd"(port)); \ return value; \ } BUILDIO(b, b, uint8_t) #define inb __inb #define outb __outb #define PIT_MODE 0x43 #define PIT_CH0 0x40 #define PIT_CH2 0x42 static int is8254; static void dump_pit(void) { if (is8254) { // Latch and output counter and status outb(0xC2, PIT_MODE); printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0)); } else { // Latch and output counter outb(0x0, PIT_MODE); printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0)); } } int main(int argc, char* argv[]) { int nr_counts = 2; if (argc > 1) nr_counts = atoi(argv[1]); if (argc > 2) is8254 = 1; if (ioperm(0x40, 4, 1) != 0) return 1; dump_pit(); printf("Set oneshot\n"); outb(0x38, PIT_MODE); outb(0x00, PIT_CH0); outb(0x0F, PIT_CH0); dump_pit(); usleep(1000); dump_pit(); printf("Set periodic\n"); outb(0x34, PIT_MODE); outb(0x00, PIT_CH0); outb(0x0F, PIT_CH0); dump_pit(); usleep(1000); dump_pit(); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); printf("Set stop (%d counter writes)\n", nr_counts); outb(0x30, PIT_MODE); while (nr_counts--) outb(0xFF, PIT_CH0); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); printf("Set MODE 0\n"); outb(0x30, PIT_MODE); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); return 0; } [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-08-13 6:40 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-07 1:14 [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown lirongqing
2023-02-08 1:04 ` Michael Kelley (LINUX)
2023-02-08 15:04 ` Sean Christopherson
2023-02-24 9:45 ` Li,Rongqing
[not found] ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com>
2023-04-13 1:28 ` Wei Liu
[not found] ` <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2>
2023-04-14 5:17 ` Li,Rongqing
2024-08-01 9:00 ` David Woodhouse
2024-08-01 15:22 ` David Woodhouse
2024-08-01 21:07 ` Thomas Gleixner
2024-08-01 21:10 ` David Woodhouse
2024-08-01 14:21 ` Thomas Gleixner
2024-08-01 16:14 ` Michael Kelley
2024-08-01 18:54 ` Thomas Gleixner
2024-08-02 8:21 ` David Woodhouse
2024-08-02 14:55 ` Sean Christopherson
2024-08-02 15:04 ` David Woodhouse
2024-08-12 23:59 ` Sean Christopherson
2024-08-13 6:39 ` David Woodhouse
2024-08-01 17:49 ` David Woodhouse
2024-08-01 18:25 ` David Woodhouse
2024-08-01 18:57 ` Thomas Gleixner
2024-08-02 8:07 ` David Woodhouse
2024-08-02 10:49 ` Thomas Gleixner
2024-08-02 11:04 ` David Woodhouse
2024-08-02 13:27 ` Thomas Gleixner
2024-08-02 13:46 ` David Woodhouse
2024-08-01 19:06 ` Thomas Gleixner
2024-08-01 19:21 ` David Woodhouse
2024-08-01 20:00 ` Thomas Gleixner
2024-08-01 20:49 ` David Woodhouse
2024-08-01 21:22 ` Thomas Gleixner
2024-08-01 21:31 ` David Woodhouse
2024-08-02 9:55 ` David Woodhouse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).