linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
@ 2023-02-07  1:14 lirongqing
  2023-02-08  1:04 ` Michael Kelley (LINUX)
  2024-08-01 14:21 ` Thomas Gleixner
  0 siblings, 2 replies; 33+ messages in thread
From: lirongqing @ 2023-02-07  1:14 UTC (permalink / raw)
  To: seanjc, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, x86, linux-hyperv, linux-kernel

From: Li RongQing <lirongqing@baidu.com>

Zeroing the counter register in pit_shutdown() isn't actually supposed to
stop it from counting,  will causes the PIT to start running again,
From the spec:

  The largest possible initial count is 0; this is equivalent to 216 for
  binary counting and 104 for BCD counting.

  The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
  Counter "wraps around" to the highest count, either FFFF hex for binary
  count- ing or 9999 for BCD counting, and continues counting.

  Mode 0 is typically used for event counting. After the Control Word is
  written, OUT is initially low, and will remain low until the Counter
  reaches zero. OUT then goes high and remains high until a new count or a
  new Mode 0 Control Word is written into the Counter.

Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
so delete the zero timer counter register in shutdown, and delete PIT shutdown
quirk for Hyper-v

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 11 -----------
 drivers/clocksource/i8253.c    | 12 ------------
 include/linux/i8253.h          |  1 -
 3 files changed, 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 46668e2..f788889 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -16,7 +16,6 @@
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/kexec.h>
-#include <linux/i8253.h>
 #include <linux/random.h>
 #include <linux/swiotlb.h>
 #include <asm/processor.h>
@@ -399,16 +398,6 @@ static void __init ms_hyperv_init_platform(void)
 	if (efi_enabled(EFI_BOOT))
 		x86_platform.get_nmi_reason = hv_get_nmi_reason;
 
-	/*
-	 * Hyper-V VMs have a PIT emulation quirk such that zeroing the
-	 * counter register during PIT shutdown restarts the PIT. So it
-	 * continues to interrupt @18.2 HZ. Setting i8253_clear_counter
-	 * to false tells pit_shutdown() not to zero the counter so that
-	 * the PIT really is shutdown. Generation 2 VMs don't have a PIT,
-	 * and setting this value has no effect.
-	 */
-	i8253_clear_counter_on_shutdown = false;
-
 #if IS_ENABLED(CONFIG_HYPERV)
 	/*
 	 * Setup the hook to get control post apic initialization.
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index d4350bb..169474d 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -20,13 +20,6 @@
 DEFINE_RAW_SPINLOCK(i8253_lock);
 EXPORT_SYMBOL(i8253_lock);
 
-/*
- * Handle PIT quirk in pit_shutdown() where zeroing the counter register
- * restarts the PIT, negating the shutdown. On platforms with the quirk,
- * platform specific code can set this to false.
- */
-bool i8253_clear_counter_on_shutdown __ro_after_init = true;
-
 #ifdef CONFIG_CLKSRC_I8253
 /*
  * Since the PIT overflows every tick, its not very useful
@@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt)
 
 	outb_p(0x30, PIT_MODE);
 
-	if (i8253_clear_counter_on_shutdown) {
-		outb_p(0, PIT_CH0);
-		outb_p(0, PIT_CH0);
-	}
-
 	raw_spin_unlock(&i8253_lock);
 	return 0;
 }
diff --git a/include/linux/i8253.h b/include/linux/i8253.h
index 8336b2f..e6bb36a 100644
--- a/include/linux/i8253.h
+++ b/include/linux/i8253.h
@@ -21,7 +21,6 @@
 #define PIT_LATCH	((PIT_TICK_RATE + HZ/2) / HZ)
 
 extern raw_spinlock_t i8253_lock;
-extern bool i8253_clear_counter_on_shutdown;
 extern struct clock_event_device i8253_clockevent;
 extern void clockevent_i8253_init(bool oneshot);
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2023-02-07  1:14 [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown lirongqing
@ 2023-02-08  1:04 ` Michael Kelley (LINUX)
  2023-02-08 15:04   ` Sean Christopherson
  2024-08-01  9:00   ` David Woodhouse
  2024-08-01 14:21 ` Thomas Gleixner
  1 sibling, 2 replies; 33+ messages in thread
From: Michael Kelley (LINUX) @ 2023-02-08  1:04 UTC (permalink / raw)
  To: lirongqing@baidu.com, seanjc@google.com, KY Srinivasan,
	Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM
> 
> Zeroing the counter register in pit_shutdown() isn't actually supposed to
> stop it from counting,  will causes the PIT to start running again,
> From the spec:
> 
>   The largest possible initial count is 0; this is equivalent to 216 for
>   binary counting and 104 for BCD counting.
> 
>   The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
>   Counter "wraps around" to the highest count, either FFFF hex for binary
>   count- ing or 9999 for BCD counting, and continues counting.
> 
>   Mode 0 is typically used for event counting. After the Control Word is
>   written, OUT is initially low, and will remain low until the Counter
>   reaches zero. OUT then goes high and remains high until a new count or a
>   new Mode 0 Control Word is written into the Counter.
> 
> Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
> i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
> so delete the zero timer counter register in shutdown, and delete PIT shutdown
> quirk for Hyper-v

From the standpoint of Hyper-V, I'm good with this change.  But there's a
risk that old hardware might not be compliant with the spec, and needs the
zero'ing for some reason. The experts in the x86 space will be in the best
position to assess the risk.  At the time, the quirk approach was taken so
the change applied only to Hyper-V, and any such risk was avoided.

For Hyper-V,
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 11 -----------
>  drivers/clocksource/i8253.c    | 12 ------------
>  include/linux/i8253.h          |  1 -
>  3 files changed, 24 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 46668e2..f788889 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -16,7 +16,6 @@
>  #include <linux/interrupt.h>
>  #include <linux/irq.h>
>  #include <linux/kexec.h>
> -#include <linux/i8253.h>
>  #include <linux/random.h>
>  #include <linux/swiotlb.h>
>  #include <asm/processor.h>
> @@ -399,16 +398,6 @@ static void __init ms_hyperv_init_platform(void)
>  	if (efi_enabled(EFI_BOOT))
>  		x86_platform.get_nmi_reason = hv_get_nmi_reason;
> 
> -	/*
> -	 * Hyper-V VMs have a PIT emulation quirk such that zeroing the
> -	 * counter register during PIT shutdown restarts the PIT. So it
> -	 * continues to interrupt @18.2 HZ. Setting i8253_clear_counter
> -	 * to false tells pit_shutdown() not to zero the counter so that
> -	 * the PIT really is shutdown. Generation 2 VMs don't have a PIT,
> -	 * and setting this value has no effect.
> -	 */
> -	i8253_clear_counter_on_shutdown = false;
> -
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	/*
>  	 * Setup the hook to get control post apic initialization.
> diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
> index d4350bb..169474d 100644
> --- a/drivers/clocksource/i8253.c
> +++ b/drivers/clocksource/i8253.c
> @@ -20,13 +20,6 @@
>  DEFINE_RAW_SPINLOCK(i8253_lock);
>  EXPORT_SYMBOL(i8253_lock);
> 
> -/*
> - * Handle PIT quirk in pit_shutdown() where zeroing the counter register
> - * restarts the PIT, negating the shutdown. On platforms with the quirk,
> - * platform specific code can set this to false.
> - */
> -bool i8253_clear_counter_on_shutdown __ro_after_init = true;
> -
>  #ifdef CONFIG_CLKSRC_I8253
>  /*
>   * Since the PIT overflows every tick, its not very useful
> @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt)
> 
>  	outb_p(0x30, PIT_MODE);
> 
> -	if (i8253_clear_counter_on_shutdown) {
> -		outb_p(0, PIT_CH0);
> -		outb_p(0, PIT_CH0);
> -	}
> -
>  	raw_spin_unlock(&i8253_lock);
>  	return 0;
>  }
> diff --git a/include/linux/i8253.h b/include/linux/i8253.h
> index 8336b2f..e6bb36a 100644
> --- a/include/linux/i8253.h
> +++ b/include/linux/i8253.h
> @@ -21,7 +21,6 @@
>  #define PIT_LATCH	((PIT_TICK_RATE + HZ/2) / HZ)
> 
>  extern raw_spinlock_t i8253_lock;
> -extern bool i8253_clear_counter_on_shutdown;
>  extern struct clock_event_device i8253_clockevent;
>  extern void clockevent_i8253_init(bool oneshot);
> 
> --
> 2.9.4


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2023-02-08  1:04 ` Michael Kelley (LINUX)
@ 2023-02-08 15:04   ` Sean Christopherson
  2023-02-24  9:45     ` Li,Rongqing
       [not found]     ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com>
  2024-08-01  9:00   ` David Woodhouse
  1 sibling, 2 replies; 33+ messages in thread
From: Sean Christopherson @ 2023-02-08 15:04 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: lirongqing@baidu.com, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, Feb 08, 2023, Michael Kelley (LINUX) wrote:
> From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM
> > 
> > Zeroing the counter register in pit_shutdown() isn't actually supposed to
> > stop it from counting,  will causes the PIT to start running again,
> > From the spec:
> > 
> >   The largest possible initial count is 0; this is equivalent to 216 for
> >   binary counting and 104 for BCD counting.
> > 
> >   The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
> >   Counter "wraps around" to the highest count, either FFFF hex for binary
> >   count- ing or 9999 for BCD counting, and continues counting.
> > 
> >   Mode 0 is typically used for event counting. After the Control Word is
> >   written, OUT is initially low, and will remain low until the Counter
> >   reaches zero. OUT then goes high and remains high until a new count or a
> >   new Mode 0 Control Word is written into the Counter.
> > 
> > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
> > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
> > so delete the zero timer counter register in shutdown, and delete PIT shutdown
> > quirk for Hyper-v
> 
> From the standpoint of Hyper-V, I'm good with this change.  But there's a
> risk that old hardware might not be compliant with the spec, and needs the
> zero'ing for some reason. The experts in the x86 space will be in the best
> position to assess the risk.

Yep, my feeling exactly.  My input is purely from reading those crusty old specs.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2023-02-08 15:04   ` Sean Christopherson
@ 2023-02-24  9:45     ` Li,Rongqing
       [not found]     ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com>
  1 sibling, 0 replies; 33+ messages in thread
From: Li,Rongqing @ 2023-02-24  9:45 UTC (permalink / raw)
  To: Sean Christopherson, Michael Kelley (LINUX), tglx@linutronix.de
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

> > > Hyper-V and KVM follow the spec, the issue that 35b69a42
> > > "(clockevents/drivers/
> > > i8253: Add support for PIT shutdown quirk") fixed is in i8253
> > > drivers, not Hyper-v, so delete the zero timer counter register in
> > > shutdown, and delete PIT shutdown quirk for Hyper-v
> >
> > From the standpoint of Hyper-V, I'm good with this change.  But
> > there's a risk that old hardware might not be compliant with the spec,
> > and needs the zero'ing for some reason. The experts in the x86 space
> > will be in the best position to assess the risk.
> 
> Yep, my feeling exactly.  My input is purely from reading those crusty old specs.

Hi Thomas:

Could you give some suggestion about this patch?

Thanks

-Li

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
       [not found]     ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com>
@ 2023-04-13  1:28       ` Wei Liu
       [not found]       ` <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2>
  1 sibling, 0 replies; 33+ messages in thread
From: Wei Liu @ 2023-04-13  1:28 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Sean Christopherson, Michael Kelley (LINUX), KY Srinivasan,
	Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, Apr 12, 2023 at 12:02:04PM +0000, Li,Rongqing wrote:
> 
> 
> > -----Original Message-----
> > From: Sean Christopherson <seanjc@google.com>
> > Sent: Wednesday, February 8, 2023 11:04 PM
> > To: Michael Kelley (LINUX) <mikelley@microsoft.com>
> > Cc: Li,Rongqing <lirongqing@baidu.com>; KY Srinivasan <kys@microsoft.com>;
> > Haiyang Zhang <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> > <decui@microsoft.com>; tglx@linutronix.de; mingo@redhat.com;
> > bp@alien8.de; dave.hansen@linux.intel.com; x86@kernel.org;
> > linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in
> > shutdown
> > 
> > On Wed, Feb 08, 2023, Michael Kelley (LINUX) wrote:
> > > From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday,
> > > February 6, 2023 5:15 PM
> > > >
> > > > Zeroing the counter register in pit_shutdown() isn't actually
> > > > supposed to stop it from counting,  will causes the PIT to start
> > > > running again, From the spec:
> > > >
> > > >   The largest possible initial count is 0; this is equivalent to 216 for
> > > >   binary counting and 104 for BCD counting.
> > > >
> > > >   The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5
> > the
> > > >   Counter "wraps around" to the highest count, either FFFF hex for binary
> > > >   count- ing or 9999 for BCD counting, and continues counting.
> > > >
> > > >   Mode 0 is typically used for event counting. After the Control Word is
> > > >   written, OUT is initially low, and will remain low until the Counter
> > > >   reaches zero. OUT then goes high and remains high until a new count or a
> > > >   new Mode 0 Control Word is written into the Counter.
> > > >
> > > > Hyper-V and KVM follow the spec, the issue that 35b69a42
> > > > "(clockevents/drivers/
> > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253
> > > > drivers, not Hyper-v, so delete the zero timer counter register in
> > > > shutdown, and delete PIT shutdown quirk for Hyper-v
> > >
> > > From the standpoint of Hyper-V, I'm good with this change.  But
> > > there's a risk that old hardware might not be compliant with the spec,
> > > and needs the zero'ing for some reason. The experts in the x86 space
> > > will be in the best position to assess the risk.
> > 
> > Yep, my feeling exactly.  My input is purely from reading those crusty old specs.
> 
> 
> 
> Ping

I guess you want Thomas Gleixner and Daniel Lezcano's attention.

> 
> Thanks
> 
> -Li

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
       [not found]       ` <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2>
@ 2023-04-14  5:17         ` Li,Rongqing
  0 siblings, 0 replies; 33+ messages in thread
From: Li,Rongqing @ 2023-04-14  5:17 UTC (permalink / raw)
  To: Wei Liu, tglx@linutronix.de, daniel.lezcano@linaro.org
  Cc: Sean Christopherson, Michael Kelley (LINUX), KY Srinivasan,
	Haiyang Zhang, Dexuan Cui, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

> >
> >
> > Ping
> 
> I guess you want Thomas Gleixner and Daniel Lezcano's attention.
> 

Yes, 

Thanks you

-Li


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2023-02-08  1:04 ` Michael Kelley (LINUX)
  2023-02-08 15:04   ` Sean Christopherson
@ 2024-08-01  9:00   ` David Woodhouse
  2024-08-01 15:22     ` David Woodhouse
  1 sibling, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-01  9:00 UTC (permalink / raw)
  To: mikelley, kvm
  Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel,
	lirongqing, mingo, seanjc, tglx, wei.liu, x86

[-- Attachment #1: Type: text/plain, Size: 2161 bytes --]




On 2023-02-08 at 01:04:19 +0000, "Michael Kelley (LINUX)" <mikelley@microsoft.com> said:
> From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM
> > 
> > Zeroing the counter register in pit_shutdown() isn't actually supposed to
> > stop it from counting,  will causes the PIT to start running again,
> > From the spec:
> > 
> >   The largest possible initial count is 0; this is equivalent to 216 for
> >   binary counting and 104 for BCD counting.
> > 
> >   The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
> >   Counter "wraps around" to the highest count, either FFFF hex for binary
> >   count- ing or 9999 for BCD counting, and continues counting.
> > 
> >   Mode 0 is typically used for event counting. After the Control Word is
> >   written, OUT is initially low, and will remain low until the Counter
> >   reaches zero. OUT then goes high and remains high until a new count or a
> >   new Mode 0 Control Word is written into the Counter.
> > 
> > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
> > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
> > so delete the zero timer counter register in shutdown, and delete PIT shutdown
> > quirk for Hyper-v
> 
> From the standpoint of Hyper-V, I'm good with this change.  But there's a
> risk that old hardware might not be compliant with the spec, and needs the
> zero'ing for some reason. The experts in the x86 space will be in the best
> position to assess the risk.  At the time, the quirk approach was taken so
> the change applied only to Hyper-V, and any such risk was avoided.
> 
> For Hyper-V,
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>

It's not entirely clear why we zero it at all. What was it supposed to
achieve?

I suppose if we want to be ultra-careful, perhaps we could do the
zeroing only on real hardware, if hypervisor_is_type(X86_HYPER_NATIVE)?

This does seem to be making VMMs spend a bunch of time stealing cycles
from the guest to emulate an unwanted timer, so it would be good to fix
it.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2023-02-07  1:14 [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown lirongqing
  2023-02-08  1:04 ` Michael Kelley (LINUX)
@ 2024-08-01 14:21 ` Thomas Gleixner
  2024-08-01 16:14   ` Michael Kelley
  2024-08-01 17:49   ` David Woodhouse
  1 sibling, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 14:21 UTC (permalink / raw)
  To: lirongqing, seanjc, kys, haiyangz, wei.liu, decui, mingo, bp,
	dave.hansen, x86, linux-hyperv, linux-kernel

On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote:
> @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt)
>  
>  	outb_p(0x30, PIT_MODE);
>  
> -	if (i8253_clear_counter_on_shutdown) {
> -		outb_p(0, PIT_CH0);
> -		outb_p(0, PIT_CH0);
> -	}
> -

The stop sequence is wrong:

    When there is a count in progress, writing a new LSB before the
    counter has counted down to 0 and rolled over to FFFFh, WILL stop
    the counter.  However, if the LSB is loaded AFTER the counter has
    rolled over to FFFFh, so that an MSB now exists in the counter, then
    the counter WILL NOT stop.

The original i8253 datasheet says:

    1) Write 1st byte stops the current counting
    2) Write 2nd byte starts the new count

The above does not make sure it actually has not rolled over and it
obviously is initiating the new count by writing the MSB too. So it does
not work on real hardware either.

The proper sequence is:

         // Switch to mode 0
         outb_p(0x30, PIT_MODE);
         // Load the maximum value to ensure there is no rollover
         outb_p(0xff, PIT_CH0);
         // Writing MSB starts the counter from 0xFFFF and clears rollover
         outb_p(0xff, PIT_CH0);
         // Stop the counter by writing only LSB
         outb_p(0xff, PIT_CH0);

That works on real hardware and fails on KVM... So much for the claim
that KVM follows the spec :)

So yes, the code is buggy, but instead of deleting it, we rather fix it,
no?

User space test program below.

Thanks,

        tglx
---
#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>

typedef unsigned char	u8;
typedef unsigned short	u16;

#define BUILDIO(bwl, bw, type)						\
static __always_inline void __out##bwl(type value, u16 port)		\
{									\
	asm volatile("out" #bwl " %" #bw "0, %w1"			\
		     : : "a"(value), "Nd"(port));			\
}									\
									\
static __always_inline type __in##bwl(u16 port)				\
{									\
	type value;							\
	asm volatile("in" #bwl " %w1, %" #bw "0"			\
		     : "=a"(value) : "Nd"(port));			\
	return value;							\
}

BUILDIO(b, b, u8)

#define inb __inb
#define outb __outb

#define PIT_MODE	0x43
#define PIT_CH0		0x40
#define PIT_CH2		0x42

static int is8254;

static void dump_pit(void)
{
	if (is8254) {
		// Latch and output counter and status
		outb(0xC2, PIT_MODE);
		printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0));
	} else {
		// Latch and output counter
		outb(0x0, PIT_MODE);
		printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0));
	}
}

int main(int argc, char* argv[])
{
	if (argc > 1)
		is8254 = 1;

	if (ioperm(0x40, 4, 1) != 0)
		return 1;

	dump_pit();

	printf("Set periodic\n");
	outb(0x34, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);

	dump_pit();
	usleep(1000);
	dump_pit();

	printf("Set oneshot\n");
	outb(0x38, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);

	dump_pit();
	usleep(1000);
	dump_pit();

	printf("Set stop\n");
	outb(0x30, PIT_MODE);
	outb(0xFF, PIT_CH0);
	outb(0xFF, PIT_CH0);
	outb(0xFF, PIT_CH0);

	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	return 0;
}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01  9:00   ` David Woodhouse
@ 2024-08-01 15:22     ` David Woodhouse
  2024-08-01 21:07       ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 15:22 UTC (permalink / raw)
  To: mikelley, kvm
  Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel,
	lirongqing, mingo, seanjc, tglx, wei.liu, x86

[-- Attachment #1: Type: text/plain, Size: 4465 bytes --]

On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote:
> > 
> > 
> > 
> > On 2023-02-08 at 01:04:19 +0000, "Michael Kelley (LINUX)" <mikelley@microsoft.com> said:
> > > > From: lirongqing@baidu.com <lirongqing@baidu.com> Sent: Monday, February 6, 2023 5:15 PM
> > > > > > 
> > > > > > Zeroing the counter register in pit_shutdown() isn't actually supposed to
> > > > > > stop it from counting,  will causes the PIT to start running again,
> > > > > > From the spec:
> > > > > > 
> > > > > >   The largest possible initial count is 0; this is equivalent to 216 for
> > > > > >   binary counting and 104 for BCD counting.
> > > > > > 
> > > > > >   The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
> > > > > >   Counter "wraps around" to the highest count, either FFFF hex for binary
> > > > > >   count- ing or 9999 for BCD counting, and continues counting.
> > > > > > 
> > > > > >   Mode 0 is typically used for event counting. After the Control Word is
> > > > > >   written, OUT is initially low, and will remain low until the Counter
> > > > > >   reaches zero. OUT then goes high and remains high until a new count or a
> > > > > >   new Mode 0 Control Word is written into the Counter.
> > > > > > 
> > > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
> > > > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
> > > > > > so delete the zero timer counter register in shutdown, and delete PIT shutdown
> > > > > > quirk for Hyper-v
> > > > 
> > > > From the standpoint of Hyper-V, I'm good with this change.  But there's a
> > > > risk that old hardware might not be compliant with the spec, and needs the
> > > > zero'ing for some reason. The experts in the x86 space will be in the best
> > > > position to assess the risk.  At the time, the quirk approach was taken so
> > > > the change applied only to Hyper-V, and any such risk was avoided.
> > > > 
> > > > For Hyper-V,
> > > > Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> > 
> > It's not entirely clear why we zero it at all. What was it supposed to
> > achieve?

Answering my own question here, it seems that *some* implementations
don't apply the mode change until the counter is subsequently written.

I think QEMU applies it immediately. But the KVM in-kernel one (and the
AWS Nitro Hypervisor) do not. So the interrupt doesn't shut up until
you write the counter.

I don't see any justification for writing the counter causing any more
than *one* further interrupt though, in single-shot mode. If that's
what Hyper-V does, that seems like a bug (which is worked around
already).

I do see the VMM wasting a bunch of time emulating a PIT IRQ that goes
nowhere, but that seems to be because the bootloader or BIOS turns it
on, and nothing in Linux ever turns it back *off* again unless it's
registered at boot and then pit_shutdown() is called when we switch to
something better.

I'll look at an optimisation in the VMM which can stop firing the timer
if the PIT IRQ is already masked or pending in the PIC and IOAPIC;
that's just a waste of steal time.

But the guest kernel should probably do something like this...

diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index 2b7999a1a50a..d13fd793254e 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -8,6 +8,7 @@
 #include <linux/timex.h>
 #include <linux/i8253.h>
 
+#include <asm/hypervisor.h>
 #include <asm/apic.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
@@ -39,8 +40,29 @@ static bool __init use_pit(void)
 
 bool __init pit_timer_init(void)
 {
-	if (!use_pit())
+	if (!use_pit()) {
+		if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
+			/*
+			 * Don't just ignore the PIT. Ensure it's stopped,
+			 * because VMMs otherwise steal CPU time just to
+			 * pointlessly waggle the (masked) IRQ.
+			 */
+			raw_spin_lock(&i8253_lock);
+			outb_p(0x30, PIT_MODE);
+
+			/*
+			 * It's not entirely clear from the datasheet, but some
+			 * virtual implementations don't stop until the counter
+			 * is actually written.
+			 */
+			if (i8253_clear_counter_on_shutdown) {
+				outb_p(0, PIT_CH0);
+				outb_p(0, PIT_CH0);
+			}
+			raw_spin_unlock(&i8253_lock);
+		}
 		return false;
+	}
 
 	clockevent_i8253_init(true);
 	global_clock_event = &i8253_clockevent;
> 



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 14:21 ` Thomas Gleixner
@ 2024-08-01 16:14   ` Michael Kelley
  2024-08-01 18:54     ` Thomas Gleixner
  2024-08-01 17:49   ` David Woodhouse
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Kelley @ 2024-08-01 16:14 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing@baidu.com, seanjc@google.com,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, August 1, 2024 7:21 AM
> 
> On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote:
> > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt)
> >
> >  	outb_p(0x30, PIT_MODE);
> >
> > -	if (i8253_clear_counter_on_shutdown) {
> > -		outb_p(0, PIT_CH0);
> > -		outb_p(0, PIT_CH0);
> > -	}
> > -
> 
> The stop sequence is wrong:
> 
>     When there is a count in progress, writing a new LSB before the
>     counter has counted down to 0 and rolled over to FFFFh, WILL stop
>     the counter.  However, if the LSB is loaded AFTER the counter has
>     rolled over to FFFFh, so that an MSB now exists in the counter, then
>     the counter WILL NOT stop.
> 
> The original i8253 datasheet says:
> 
>     1) Write 1st byte stops the current counting
>     2) Write 2nd byte starts the new count
> 
> The above does not make sure it actually has not rolled over and it
> obviously is initiating the new count by writing the MSB too. So it does
> not work on real hardware either.
> 
> The proper sequence is:
> 
>          // Switch to mode 0
>          outb_p(0x30, PIT_MODE);
>          // Load the maximum value to ensure there is no rollover
>          outb_p(0xff, PIT_CH0);
>          // Writing MSB starts the counter from 0xFFFF and clears rollover
>          outb_p(0xff, PIT_CH0);
>          // Stop the counter by writing only LSB
>          outb_p(0xff, PIT_CH0);
> 
> That works on real hardware and fails on KVM... So much for the claim
> that KVM follows the spec :)
> 
> So yes, the code is buggy, but instead of deleting it, we rather fix it,
> no?
> 

FWIW, in Hyper-V guests with the Hyper-V quirk removed, tglx's new
sequence does *not* stop the PIT. But this sequence does:

outb_p(0x30, PIT_MODE);
outb_p(0xff, PIT_CH0);
outb_p(0xff, PIT_CH0);

outb_p(0x30, PIT_MODE);
outb_p(0xff, PIT_CH0);

I don't have a convenient way to test my sequence on KVM.

Michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 14:21 ` Thomas Gleixner
  2024-08-01 16:14   ` Michael Kelley
@ 2024-08-01 17:49   ` David Woodhouse
  2024-08-01 18:25     ` David Woodhouse
  2024-08-01 19:06     ` Thomas Gleixner
  1 sibling, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 17:49 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1910 bytes --]

On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote:
> On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote:
> > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt)
> >  
> >         outb_p(0x30, PIT_MODE);
> >  
> > -       if (i8253_clear_counter_on_shutdown) {
> > -               outb_p(0, PIT_CH0);
> > -               outb_p(0, PIT_CH0);
> > -       }
> > -
> 
> The stop sequence is wrong:
> 
>     When there is a count in progress, writing a new LSB before the
>     counter has counted down to 0 and rolled over to FFFFh, WILL stop
>     the counter.  However, if the LSB is loaded AFTER the counter has
>     rolled over to FFFFh, so that an MSB now exists in the counter, then
>     the counter WILL NOT stop.
> 
> The original i8253 datasheet says:
> 
>     1) Write 1st byte stops the current counting
>     2) Write 2nd byte starts the new count

It says that for mode zero ("Interrupt on Terminal Count"), yes. But in
that mode, shouldn't the IRQ only fire *one* more time anyway, rather
than repeatedly? That should be OK, shouldn't it?

"When terminal count is reached, the output will go high and remain
high until the selected count register is reloaded wityh the mode or a
new count is loaded".

It's OK for it to keep *counting* as long as it stops firing
interrupts, isn't it? 

Either way, this is somewhat orthogonal to the patch I posted in 
https://lore.kernel.org/kvm/6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org/T/#u
for the fact that we don't shut down the PIT at *all* if we aren't ever
going to use it.

I'm glad I decided to export a function from the clocksource driver and
just *call* it from pit_timer_init() though. Means we can bikeshed the
shutdown sequence in *one* place and it isn't duplicated.


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 17:49   ` David Woodhouse
@ 2024-08-01 18:25     ` David Woodhouse
  2024-08-01 18:57       ` Thomas Gleixner
  2024-08-01 19:06     ` Thomas Gleixner
  1 sibling, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 18:25 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2080 bytes --]

On Thu, 2024-08-01 at 18:49 +0100, David Woodhouse wrote:
> On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote:
> > On Tue, Feb 07 2023 at 09:14, lirongqing@baidu.com wrote:
> > > @@ -117,11 +110,6 @@ static int pit_shutdown(struct clock_event_device *evt)
> > >  
> > >         outb_p(0x30, PIT_MODE);
> > >  
> > > -       if (i8253_clear_counter_on_shutdown) {
> > > -               outb_p(0, PIT_CH0);
> > > -               outb_p(0, PIT_CH0);
> > > -       }
> > > -
> > 
> > The stop sequence is wrong:
> > 
> >     When there is a count in progress, writing a new LSB before the
> >     counter has counted down to 0 and rolled over to FFFFh, WILL stop
> >     the counter.  However, if the LSB is loaded AFTER the counter has
> >     rolled over to FFFFh, so that an MSB now exists in the counter, then
> >     the counter WILL NOT stop.
> > 
> > The original i8253 datasheet says:
> > 
> >     1) Write 1st byte stops the current counting
> >     2) Write 2nd byte starts the new count
> 

It also prefixes that with "Rewriting a counter register during
counting results in the following:".

But after you write the MODE register, is it actually supposed to be
counting? Just a little further up, under 'Counter Loading', it says:

"The count register is not loaded until the count value is written (one
or two bytes, depending on the mode selected by the RL bits), followed
by a rising edge and a falling edge of the clock. Any read of the
counter prior to that falling clock edge may yield invalid data".

OK, but what *triggers* that invalid state? Given that it explicitly
says that a one-byte counter write ends that state, it isn't the first
of two bytes. Surely that means that from the time the MODE register is
written, any read of the counter may yield invalid data, until the
counter is written?

I suspect there are as many implementations (virt and hardware) as
there are reasonable interpretations of the spec... and then some.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 16:14   ` Michael Kelley
@ 2024-08-01 18:54     ` Thomas Gleixner
  2024-08-02  8:21       ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 18:54 UTC (permalink / raw)
  To: Michael Kelley, lirongqing@baidu.com, seanjc@google.com,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote:
> From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, August 1, 2024 7:21 AM
> FWIW, in Hyper-V guests with the Hyper-V quirk removed, tglx's new
> sequence does *not* stop the PIT. But this sequence does:
>
> outb_p(0x30, PIT_MODE);
> outb_p(0xff, PIT_CH0);
> outb_p(0xff, PIT_CH0);
>
> outb_p(0x30, PIT_MODE);
> outb_p(0xff, PIT_CH0);

That works on bare metal too

> I don't have a convenient way to test my sequence on KVM.

But still fails in KVM

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 18:25     ` David Woodhouse
@ 2024-08-01 18:57       ` Thomas Gleixner
  2024-08-02  8:07         ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 18:57 UTC (permalink / raw)
  To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On Thu, Aug 01 2024 at 19:25, David Woodhouse wrote:
> On Thu, 2024-08-01 at 18:49 +0100, David Woodhouse wrote:
>> > The stop sequence is wrong:
>> > 
>> >     When there is a count in progress, writing a new LSB before the
>> >     counter has counted down to 0 and rolled over to FFFFh, WILL stop
>> >     the counter.  However, if the LSB is loaded AFTER the counter has
>> >     rolled over to FFFFh, so that an MSB now exists in the counter, then
>> >     the counter WILL NOT stop.
>> > 
>> > The original i8253 datasheet says:
>> > 
>> >     1) Write 1st byte stops the current counting
>> >     2) Write 2nd byte starts the new count
>> 
>
> It also prefixes that with "Rewriting a counter register during
> counting results in the following:".
>
> But after you write the MODE register, is it actually supposed to be
> counting? Just a little further up, under 'Counter Loading', it says:

It's not counting right out of reset. But once it started counting it's
tedious to stop :)

> "The count register is not loaded until the count value is written (one
> or two bytes, depending on the mode selected by the RL bits), followed
> by a rising edge and a falling edge of the clock. Any read of the
> counter prior to that falling clock edge may yield invalid data".
>
> OK, but what *triggers* that invalid state? Given that it explicitly
> says that a one-byte counter write ends that state, it isn't the first
> of two bytes. Surely that means that from the time the MODE register is
> written, any read of the counter may yield invalid data, until the
> counter is written?

It seems to keep ticking with the old value.

> I suspect there are as many implementations (virt and hardware) as
> there are reasonable interpretations of the spec... and then some.

Indeed.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 17:49   ` David Woodhouse
  2024-08-01 18:25     ` David Woodhouse
@ 2024-08-01 19:06     ` Thomas Gleixner
  2024-08-01 19:21       ` David Woodhouse
  1 sibling, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 19:06 UTC (permalink / raw)
  To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On Thu, Aug 01 2024 at 18:49, David Woodhouse wrote:
> On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote:
>> The stop sequence is wrong:
>> 
>>     When there is a count in progress, writing a new LSB before the
>>     counter has counted down to 0 and rolled over to FFFFh, WILL stop
>>     the counter.  However, if the LSB is loaded AFTER the counter has
>>     rolled over to FFFFh, so that an MSB now exists in the counter, then
>>     the counter WILL NOT stop.
>> 
>> The original i8253 datasheet says:
>> 
>>     1) Write 1st byte stops the current counting
>>     2) Write 2nd byte starts the new count
>
> It says that for mode zero ("Interrupt on Terminal Count"), yes. But in
> that mode, shouldn't the IRQ only fire *one* more time anyway, rather
> than repeatedly? That should be OK, shouldn't it?
>
> "When terminal count is reached, the output will go high and remain
> high until the selected count register is reloaded wityh the mode or a
> new count is loaded".

I just confirmed that this is the case on KVM.

> It's OK for it to keep *counting* as long as it stops firing
> interrupts, isn't it?

Yes. So the sequence should stop KVM from trying to inject
interrupts. Maybe someone fixes it to actually stop fiddling with the
counter too :)

> Either way, this is somewhat orthogonal to the patch I posted in 
> https://lore.kernel.org/kvm/6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org/T/#u
> for the fact that we don't shut down the PIT at *all* if we aren't ever
> going to use it.
>
> I'm glad I decided to export a function from the clocksource driver and
> just *call* it from pit_timer_init() though. Means we can bikeshed the
> shutdown sequence in *one* place and it isn't duplicated.

Right. Though we don't have to make this conditional on hypervisor I
think.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 19:06     ` Thomas Gleixner
@ 2024-08-01 19:21       ` David Woodhouse
  2024-08-01 20:00         ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 19:21 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2880 bytes --]

On Thu, 2024-08-01 at 21:06 +0200, Thomas Gleixner wrote:
> On Thu, Aug 01 2024 at 18:49, David Woodhouse wrote:
> > On Thu, 2024-08-01 at 16:21 +0200, Thomas Gleixner wrote:
> > > The stop sequence is wrong:
> > > 
> > >     When there is a count in progress, writing a new LSB before the
> > >     counter has counted down to 0 and rolled over to FFFFh, WILL stop
> > >     the counter.  However, if the LSB is loaded AFTER the counter has
> > >     rolled over to FFFFh, so that an MSB now exists in the counter, then
> > >     the counter WILL NOT stop.
> > > 
> > > The original i8253 datasheet says:
> > > 
> > >     1) Write 1st byte stops the current counting
> > >     2) Write 2nd byte starts the new count
> > 
> > It says that for mode zero ("Interrupt on Terminal Count"), yes. But in
> > that mode, shouldn't the IRQ only fire *one* more time anyway, rather
> > than repeatedly? That should be OK, shouldn't it?
> > 
> > "When terminal count is reached, the output will go high and remain
> > high until the selected count register is reloaded wityh the mode or a
> > new count is loaded".
> 
> I just confirmed that this is the case on KVM.
> 
> > It's OK for it to keep *counting* as long as it stops firing
> > interrupts, isn't it?
> 
> Yes. So the sequence should stop KVM from trying to inject
> interrupts. Maybe someone fixes it to actually stop fiddling with the
> counter too :)

I don't think we care about the counter value, as that's *calculated*
on demand when the guest tries to read from it. Or, more to the point,
*if* the guest tries to read from it.

As opposed to the interrupt, which is a timer in the VMM which takes a
CPU out of guest mode and incurs steal time, just to waggle a pin on
the emulated PICs for no good reason.

> > Either way, this is somewhat orthogonal to the patch I posted in 
> > https://lore.kernel.org/kvm/6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org/T/#u
> > for the fact that we don't shut down the PIT at *all* if we aren't ever
> > going to use it.
> > 
> > I'm glad I decided to export a function from the clocksource driver and
> > just *call* it from pit_timer_init() though. Means we can bikeshed the
> > shutdown sequence in *one* place and it isn't duplicated.
> 
> Right. Though we don't have to make this conditional on hypervisor I
> think.

Right, we don't *have* to. I vacillated about that and almost ripped it
out before sending the patch, but came down on the side of "hardware is
a steaming pile of crap and if I don't *have* to change its behaviour,
let's not touch it".

I justify my cowardice on the basis that it doesn't *matter* if a
hardware implementation is still toggling the IRQ pin; in that case
it's only a few irrelevant transistors which are busy, and it doesn't
translate to steal time.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 19:21       ` David Woodhouse
@ 2024-08-01 20:00         ` Thomas Gleixner
  2024-08-01 20:49           ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 20:00 UTC (permalink / raw)
  To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On Thu, Aug 01 2024 at 20:21, David Woodhouse wrote:
> On Thu, 2024-08-01 at 21:06 +0200, Thomas Gleixner wrote:
>> Yes. So the sequence should stop KVM from trying to inject
>> interrupts. Maybe someone fixes it to actually stop fiddling with the
>> counter too :)
>
> I don't think we care about the counter value, as that's *calculated*
> on demand when the guest tries to read from it. Or, more to the point,
> *if* the guest tries to read from it.
>
> As opposed to the interrupt, which is a timer in the VMM which takes a
> CPU out of guest mode and incurs steal time, just to waggle a pin on
> the emulated PICs for no good reason.

Well, if the implementation still arms the timer in the background, then
it matters because that has to be processed too. I haven't looked at
that code at all, so what do I know.

>> > I'm glad I decided to export a function from the clocksource driver and
>> > just *call* it from pit_timer_init() though. Means we can bikeshed the
>> > shutdown sequence in *one* place and it isn't duplicated.
>> 
>> Right. Though we don't have to make this conditional on hypervisor I
>> think.
>
> Right, we don't *have* to. I vacillated about that and almost ripped it
> out before sending the patch, but came down on the side of "hardware is
> a steaming pile of crap and if I don't *have* to change its behaviour,
> let's not touch it".
>
> I justify my cowardice on the basis that it doesn't *matter* if a
> hardware implementation is still toggling the IRQ pin; in that case
> it's only a few irrelevant transistors which are busy, and it doesn't
> translate to steal time.

On real hardware it translates to power...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 20:00         ` Thomas Gleixner
@ 2024-08-01 20:49           ` David Woodhouse
  2024-08-01 21:22             ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2104 bytes --]

On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote:
> On Thu, Aug 01 2024 at 20:21, David Woodhouse wrote:
> > On Thu, 2024-08-01 at 21:06 +0200, Thomas Gleixner wrote:
> > > Yes. So the sequence should stop KVM from trying to inject
> > > interrupts. Maybe someone fixes it to actually stop fiddling with the
> > > counter too :)
> > 
> > I don't think we care about the counter value, as that's *calculated*
> > on demand when the guest tries to read from it. Or, more to the point,
> > *if* the guest tries to read from it.
> > 
> > As opposed to the interrupt, which is a timer in the VMM which takes a
> > CPU out of guest mode and incurs steal time, just to waggle a pin on
> > the emulated PICs for no good reason.
> 
> Well, if the implementation still arms the timer in the background, then
> it matters because that has to be processed too. I haven't looked at
> that code at all, so what do I know.

It only needs to arm the timer if it needs to send an interrupt.
Otherwise, it's all event-driven from the guest interaction.

> > > > I'm glad I decided to export a function from the clocksource driver and
> > > > just *call* it from pit_timer_init() though. Means we can bikeshed the
> > > > shutdown sequence in *one* place and it isn't duplicated.
> > > 
> > > Right. Though we don't have to make this conditional on hypervisor I
> > > think.
> > 
> > Right, we don't *have* to. I vacillated about that and almost ripped it
> > out before sending the patch, but came down on the side of "hardware is
> > a steaming pile of crap and if I don't *have* to change its behaviour,
> > let's not touch it".
> > 
> > I justify my cowardice on the basis that it doesn't *matter* if a
> > hardware implementation is still toggling the IRQ pin; in that case
> > it's only a few irrelevant transistors which are busy, and it doesn't
> > translate to steal time.
> 
> On real hardware it translates to power...

Perhaps, although I'd guess it's a negligible amount. Still, happy to
be brave and make it unconditional. Want a new version of the patch?


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 15:22     ` David Woodhouse
@ 2024-08-01 21:07       ` Thomas Gleixner
  2024-08-01 21:10         ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 21:07 UTC (permalink / raw)
  To: David Woodhouse, mikelley, kvm
  Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel,
	lirongqing, mingo, seanjc, wei.liu, x86

On Thu, Aug 01 2024 at 16:22, David Woodhouse wrote:
> On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote:
>  bool __init pit_timer_init(void)
>  {
> -	if (!use_pit())
> +	if (!use_pit()) {
> +		if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
> +			/*
> +			 * Don't just ignore the PIT. Ensure it's stopped,
> +			 * because VMMs otherwise steal CPU time just to
> +			 * pointlessly waggle the (masked) IRQ.
> +			 */
> +			raw_spin_lock(&i8253_lock);
> +			outb_p(0x30, PIT_MODE);
> +
> +			/*
> +			 * It's not entirely clear from the datasheet, but some
> +			 * virtual implementations don't stop until the counter
> +			 * is actually written.
> +			 */
> +			if (i8253_clear_counter_on_shutdown) {
> +				outb_p(0, PIT_CH0);
> +				outb_p(0, PIT_CH0);
> +			}
> +			raw_spin_unlock(&i8253_lock);
> +		}
>  		return false;
> +	}

That's just wrong. What we want is to have the underlying problem
fixed in the driver and then make:
  
>  	clockevent_i8253_init(true);

bool clockevent_i8253_init(bool enable, bool oneshot);

so it can invoke the shutdown sequence instead of registering the pile:

   if (!enable) {
      shutdown();
      return false;
   }
   ...
   return true;

and the call site becomes:

    if (!clockevent_i8253_init(use_pit(), true))
    	return false;

No?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 21:07       ` Thomas Gleixner
@ 2024-08-01 21:10         ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 21:10 UTC (permalink / raw)
  To: Thomas Gleixner, mikelley, kvm
  Cc: bp, dave.hansen, decui, haiyangz, kys, linux-hyperv, linux-kernel,
	lirongqing, mingo, seanjc, wei.liu, x86

On 1 August 2024 22:07:25 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Thu, Aug 01 2024 at 16:22, David Woodhouse wrote:
>> On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote:
>>  bool __init pit_timer_init(void)
>>  {
>> -	if (!use_pit())
>> +	if (!use_pit()) {
>> +		if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
>> +			/*
>> +			 * Don't just ignore the PIT. Ensure it's stopped,
>> +			 * because VMMs otherwise steal CPU time just to
>> +			 * pointlessly waggle the (masked) IRQ.
>> +			 */
>> +			raw_spin_lock(&i8253_lock);
>> +			outb_p(0x30, PIT_MODE);
>> +
>> +			/*
>> +			 * It's not entirely clear from the datasheet, but some
>> +			 * virtual implementations don't stop until the counter
>> +			 * is actually written.
>> +			 */
>> +			if (i8253_clear_counter_on_shutdown) {
>> +				outb_p(0, PIT_CH0);
>> +				outb_p(0, PIT_CH0);
>> +			}
>> +			raw_spin_unlock(&i8253_lock);
>> +		}
>>  		return false;
>> +	}
>
>That's just wrong. What we want is to have the underlying problem
>fixed in the driver and then make:
>  
>>  	clockevent_i8253_init(true);
>
>bool clockevent_i8253_init(bool enable, bool oneshot);
>
>so it can invoke the shutdown sequence instead of registering the pile:
>
>   if (!enable) {
>      shutdown();
>      return false;
>   }
>   ...
>   return true;
>
>and the call site becomes:
>
>    if (!clockevent_i8253_init(use_pit(), true))
>    	return false;
>
>No?

Yes. Well, kind of. The way I actually did it was by exposing the shutdown function instead of an "init" function which optionally did the opposite. But yes,  I left the hardware-bashing in precisely once place, in the driver.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 20:49           ` David Woodhouse
@ 2024-08-01 21:22             ` Thomas Gleixner
  2024-08-01 21:31               ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-01 21:22 UTC (permalink / raw)
  To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On Thu, Aug 01 2024 at 21:49, David Woodhouse wrote:
> On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote:
>> > I justify my cowardice on the basis that it doesn't *matter* if a
>> > hardware implementation is still toggling the IRQ pin; in that case
>> > it's only a few irrelevant transistors which are busy, and it doesn't
>> > translate to steal time.
>> 
>> On real hardware it translates to power...
>
> Perhaps, although I'd guess it's a negligible amount. Still, happy to
> be brave and make it unconditional. Want a new version of the patch?

Let'ss fix the shutdown sequence first (See Michaels latest mail) and
then do the clockevents_i8253_init() change on top.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 21:22             ` Thomas Gleixner
@ 2024-08-01 21:31               ` David Woodhouse
  2024-08-02  9:55                 ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-01 21:31 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On 1 August 2024 22:22:56 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Thu, Aug 01 2024 at 21:49, David Woodhouse wrote:
>> On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote:
>>> > I justify my cowardice on the basis that it doesn't *matter* if a
>>> > hardware implementation is still toggling the IRQ pin; in that case
>>> > it's only a few irrelevant transistors which are busy, and it doesn't
>>> > translate to steal time.
>>> 
>>> On real hardware it translates to power...
>>
>> Perhaps, although I'd guess it's a negligible amount. Still, happy to
>> be brave and make it unconditional. Want a new version of the patch?
>
>Let'ss fix the shutdown sequence first (See Michaels latest mail) and
>then do the clockevents_i8253_init() change on top.

Makes sense.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 18:57       ` Thomas Gleixner
@ 2024-08-02  8:07         ` David Woodhouse
  2024-08-02 10:49           ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-02  8:07 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]

On Thu, 2024-08-01 at 20:57 +0200, Thomas Gleixner wrote:
> On Thu, Aug 01 2024 at 19:25, David Woodhouse wrote:
> > On Thu, 2024-08-01 at 18:49 +0100, David Woodhouse wrote:
> > > > The stop sequence is wrong:
> > > > 
> > > >     When there is a count in progress, writing a new LSB before the
> > > >     counter has counted down to 0 and rolled over to FFFFh, WILL stop
> > > >     the counter.  However, if the LSB is loaded AFTER the counter has
> > > >     rolled over to FFFFh, so that an MSB now exists in the counter, then
> > > >     the counter WILL NOT stop.
> > > > 
> > > > The original i8253 datasheet says:
> > > > 
> > > >     1) Write 1st byte stops the current counting
> > > >     2) Write 2nd byte starts the new count
> > > 
> > 
> > It also prefixes that with "Rewriting a counter register during
> > counting results in the following:".
> > 
> > But after you write the MODE register, is it actually supposed to be
> > counting? Just a little further up, under 'Counter Loading', it says:
> 
> It's not counting right out of reset. But once it started counting it's
> tedious to stop :)

My reading of the data sheet definitely suggests that it *shouldn't*
be.


Mode 0 says: "The output will be initially low after the mode set
operation. After the count is loaded into the selected count
register... the counter will count."

Mode 2 says: "When this mode is set, the output will remain high until
after the count register is loaded."

Mode 4 says: "After the mode is set, the output will be high. When the
count is loaded, the counter will begin counting".

All of that strongly hints to me that a *compliant* implementation
(haha) would stop the interrupts (and the count) when the MODE is set.

So writing *only* the mode ought to work, in theory. If the failure
mode is just that a bad implementation takes a little bit more power or
steal time, I wonder if that's what we should do? It's what Hyper-V
wants, it seems. And if other hypervisors/VMMs want to avoid taking
pointless steal time, they can fix their bugs.

(... gets more coffee and starts fixing bugs ...)


> > "The count register is not loaded until the count value is written (one
> > or two bytes, depending on the mode selected by the RL bits), followed
> > by a rising edge and a falling edge of the clock. Any read of the
> > counter prior to that falling clock edge may yield invalid data".
> > 
> > OK, but what *triggers* that invalid state? Given that it explicitly
> > says that a one-byte counter write ends that state, it isn't the first
> > of two bytes. Surely that means that from the time the MODE register is
> > written, any read of the counter may yield invalid data, until the
> > counter is written?
> 
> It seems to keep ticking with the old value.
> 
> > I suspect there are as many implementations (virt and hardware) as
> > there are reasonable interpretations of the spec... and then some.
> 
> Indeed.
> 
> Thanks,
> 
>         tglx
> 


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 18:54     ` Thomas Gleixner
@ 2024-08-02  8:21       ` David Woodhouse
  2024-08-02 14:55         ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-02  8:21 UTC (permalink / raw)
  To: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com,
	seanjc@google.com, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]

On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote:
> On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote:
> > From: Thomas Gleixner <tglx@linutronix.de> Sent: Thursday, August
> > 1, 2024 7:21 AM
> > FWIW, in Hyper-V guests with the Hyper-V quirk removed, tglx's new
> > sequence does *not* stop the PIT. But this sequence does:
> > 
> > outb_p(0x30, PIT_MODE);
> > outb_p(0xff, PIT_CH0);
> > outb_p(0xff, PIT_CH0);
> > 
> > outb_p(0x30, PIT_MODE);
> > outb_p(0xff, PIT_CH0);
> 
> That works on bare metal too

What about writing *just* the MODE? The datasheet definitely seems to
be saying that should work too. And I believe it works on Hyper-V;
that's what the "workaround" has been doing, because Hyper-V actually
got it right.

> > I don't have a convenient way to test my sequence on KVM.
> 
> But still fails in KVM

By KVM you mean the in-kernel one that we want to kill because everyone
should be using userspace IRQ chips these days?

I don't know that we care; the failure mode is basically harmless so
let's just let it get fixed in KVM rather than pandering to it?


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-01 21:31               ` David Woodhouse
@ 2024-08-02  9:55                 ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2024-08-02  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2511 bytes --]

On Thu, 2024-08-01 at 22:31 +0100, David Woodhouse wrote:
> On 1 August 2024 22:22:56 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Thu, Aug 01 2024 at 21:49, David Woodhouse wrote:
> > > On Thu, 2024-08-01 at 22:00 +0200, Thomas Gleixner wrote:
> > > > > I justify my cowardice on the basis that it doesn't *matter* if a
> > > > > hardware implementation is still toggling the IRQ pin; in that case
> > > > > it's only a few irrelevant transistors which are busy, and it doesn't
> > > > > translate to steal time.
> > > > 
> > > > On real hardware it translates to power...
> > > 
> > > Perhaps, although I'd guess it's a negligible amount. Still, happy to
> > > be brave and make it unconditional. Want a new version of the patch?
> > 
> > Let'ss fix the shutdown sequence first (See Michaels latest mail) and
> > then do the clockevents_i8253_init() change on top.
> 
> Makes sense.

On second thoughts, let's add clockevent_i8253_disable() first and call
it when the PIT isn't being used, then we can bikeshed the precise
behaviour of that function to our hearts' content.

I think it should look like this. Revised version of your test program
attached.

void clockevent_i8253_disable(void)
{
	raw_spin_lock(&i8253_lock);

	/*
	 * Writing the MODE register should stop the counter, according to
	 * the datasheet. This appears to work on real hardware (well, on
	 * modern Intel and AMD boxes; I didn't dig the Pegasos out of the
	 * shed).
	 *
	 * However, some virtual implementations differ, and the MODE change
	 * doesn't have any effect until either the counter is written (KVM
	 * in-kernel PIT) or the next interrupt (QEMU). And in those cases,
	 * it may not stop the *count*, only the interrupts. Although in
	 * the virt case, that probably doesn't matter, as the value of the
	 * counter will only be calculated on demand if the guest reads it;
	 * it's the interrupts which cause steal time.
	 *
	 * Hyper-V apparently has a bug where even in mode 0, the IRQ keeps
	 * firing repeatedly if the counter is running. But it *does* do the
	 * right thing when the MODE register is written.
	 *
	 * So: write the MODE and then load the counter, which ensures that
	 * the IRQ is stopped on those buggy virt implementations. And then
	 * write the MODE again, which is the right way to stop it.
	 */
	outb_p(0x30, PIT_MODE);
	outb_p(0, PIT_CH0);
	outb_p(0, PIT_CH0);

	outb_p(0x30, PIT_MODE);

	raw_spin_unlock(&i8253_lock);
}


[-- Attachment #1.2: i8254.c --]
[-- Type: text/x-csrc, Size: 1914 bytes --]

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/io.h>

typedef unsigned char	uint8_t;
typedef unsigned short	uint16_t;

#define BUILDIO(bwl, bw, type)						\
static __always_inline void __out##bwl(type value, uint16_t port)	\
{									\
	asm volatile("out" #bwl " %" #bw "0, %w1"			\
		     : : "a"(value), "Nd"(port));			\
}									\
									\
static __always_inline type __in##bwl(uint16_t port)			\
{									\
	type value;							\
	asm volatile("in" #bwl " %w1, %" #bw "0"			\
		     : "=a"(value) : "Nd"(port));			\
	return value;							\
}

BUILDIO(b, b, uint8_t)

#define inb __inb
#define outb __outb

#define PIT_MODE	0x43
#define PIT_CH0		0x40
#define PIT_CH2		0x42

static int is8254;

static void dump_pit(void)
{
	if (is8254) {
		// Latch and output counter and status
		outb(0xC2, PIT_MODE);
		printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0));
	} else {
		// Latch and output counter
		outb(0x0, PIT_MODE);
		printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0));
	}
}

int main(int argc, char* argv[])
{
	int nr_counts = 2;

	if (argc > 1)
		nr_counts = atoi(argv[1]);

	if (argc > 2)
		is8254 = 1;

	if (ioperm(0x40, 4, 1) != 0)
		return 1;

	dump_pit();

	printf("Set oneshot\n");
	outb(0x38, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);

	dump_pit();
	usleep(1000);
	dump_pit();

	printf("Set periodic\n");
	outb(0x34, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);

	dump_pit();
	usleep(1000);
	dump_pit();
	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	printf("Set stop (%d counter writes)\n", nr_counts);
	outb(0x30, PIT_MODE);
	while (nr_counts--)
		outb(0xFF, PIT_CH0);

	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	printf("Set MODE 0\n");
	outb(0x30, PIT_MODE);

	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	return 0;
}

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02  8:07         ` David Woodhouse
@ 2024-08-02 10:49           ` Thomas Gleixner
  2024-08-02 11:04             ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-02 10:49 UTC (permalink / raw)
  To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On Fri, Aug 02 2024 at 09:07, David Woodhouse wrote:
> On Thu, 2024-08-01 at 20:57 +0200, Thomas Gleixner wrote:
>> It's not counting right out of reset. But once it started counting it's
>> tedious to stop :)
>
> My reading of the data sheet definitely suggests that it *shouldn't*
> be.
>
> Mode 0 says: "The output will be initially low after the mode set
> operation. After the count is loaded into the selected count
> register... the counter will count."

Hmm. Indeed. That does not stop the counter, but it prevents the
interrupt from firing over and over.

So fine, we can go with the patch from Li, but the changelog needs a
rewrite and the code want's a big fat comment.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02 10:49           ` Thomas Gleixner
@ 2024-08-02 11:04             ` David Woodhouse
  2024-08-02 13:27               ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-02 11:04 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1569 bytes --]

On Fri, 2024-08-02 at 12:49 +0200, Thomas Gleixner wrote:
> On Fri, Aug 02 2024 at 09:07, David Woodhouse wrote:
> > On Thu, 2024-08-01 at 20:57 +0200, Thomas Gleixner wrote:
> > > It's not counting right out of reset. But once it started counting it's
> > > tedious to stop :)
> > 
> > My reading of the data sheet definitely suggests that it *shouldn't*
> > be.
> > 
> > Mode 0 says: "The output will be initially low after the mode set
> > operation. After the count is loaded into the selected count
> > register... the counter will count."
> 
> Hmm. Indeed. That does not stop the counter, but it prevents the
> interrupt from firing over and over.
> 
> So fine, we can go with the patch from Li, but the changelog needs a
> rewrite and the code want's a big fat comment.

Nah, it wants to be MODE, COUNT, COUNT, MODE to handle all known
implementations.

Already posted as [PATCH 2/1] (with big fat comments and a version of
your test) at

https://lore.kernel.org/kvm/3bc237678ade809cc685fedb8c1a3d435e590639.camel@infradead.org/

Although I just realised that I edited the first patch (to *remove* the
now-bogus comments about the stop sequence) before posting that one, so
they don't follow cleanly from one another; there's a trivial conflict.
I also forgot to remove the pre-1999 typedefs from the test program,
despite fixing it to use <stdint.h> like it's the 21st century :)

Top two commits of
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks

I'll repost properly if you're happy with them?

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02 11:04             ` David Woodhouse
@ 2024-08-02 13:27               ` Thomas Gleixner
  2024-08-02 13:46                 ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-02 13:27 UTC (permalink / raw)
  To: David Woodhouse, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

On Fri, Aug 02 2024 at 12:04, David Woodhouse wrote:
> On Fri, 2024-08-02 at 12:49 +0200, Thomas Gleixner wrote:
>> So fine, we can go with the patch from Li, but the changelog needs a
>> rewrite and the code want's a big fat comment.
>
> Nah, it wants to be MODE, COUNT, COUNT, MODE to handle all known
> implementations.

Yes. That works for whatever reason :)

> Already posted as [PATCH 2/1] (with big fat comments and a version of
> your test) at
>
> https://lore.kernel.org/kvm/3bc237678ade809cc685fedb8c1a3d435e590639.camel@infradead.org/
>
> Although I just realised that I edited the first patch (to *remove* the
> now-bogus comments about the stop sequence) before posting that one, so
> they don't follow cleanly from one another; there's a trivial conflict.
> I also forgot to remove the pre-1999 typedefs from the test program,
> despite fixing it to use <stdint.h> like it's the 21st century :)

Grandpas are allowed to use pre-1999 typedefs. :)

> Top two commits of
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks
>
> I'll repost properly if you're happy with them?

Just make the disable unconditional.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02 13:27               ` Thomas Gleixner
@ 2024-08-02 13:46                 ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2024-08-02 13:46 UTC (permalink / raw)
  To: Thomas Gleixner, lirongqing, seanjc, kys, haiyangz, wei.liu,
	decui, mingo, bp, dave.hansen, x86, linux-hyperv, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 455 bytes --]

On Fri, 2024-08-02 at 15:27 +0200, Thomas Gleixner wrote:
> > Top two commits of
> > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks
> > 
> > I'll repost properly if you're happy with them?
> 
> Just make the disable unconditional.

Oops, thought I'd done that too. Turns out I have to press the buttons
on that big black slab in front of me, not just think about it.

New series coming up after a brief smoke test.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02  8:21       ` David Woodhouse
@ 2024-08-02 14:55         ` Sean Christopherson
  2024-08-02 15:04           ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2024-08-02 14:55 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Aug 02, 2024, David Woodhouse wrote:
> On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote:
> > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote:
> > > I don't have a convenient way to test my sequence on KVM.
> > 
> > But still fails in KVM
> 
> By KVM you mean the in-kernel one that we want to kill because everyone
> should be using userspace IRQ chips these days?

What exactly do you want to kill?  In-kernel local APIC obviously needs to stay
for APICv/AVIC.

And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and
the community as a whole, as the number of VMMs in use these days results in a
decent amount of duplicated work in userspace VMMs, especially when accounting
for hardware and software quirks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02 14:55         ` Sean Christopherson
@ 2024-08-02 15:04           ` David Woodhouse
  2024-08-12 23:59             ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2024-08-02 15:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

On Fri, 2024-08-02 at 07:55 -0700, Sean Christopherson wrote:
> On Fri, Aug 02, 2024, David Woodhouse wrote:
> > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote:
> > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote:
> > > > I don't have a convenient way to test my sequence on KVM.
> > > 
> > > But still fails in KVM
> > 
> > By KVM you mean the in-kernel one that we want to kill because everyone
> > should be using userspace IRQ chips these days?
> 
> What exactly do you want to kill?  In-kernel local APIC obviously needs to stay
> for APICv/AVIC.

The legacy PIT, PIC and I/O APIC.

> And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and
> the community as a whole, as the number of VMMs in use these days results in a
> decent amount of duplicated work in userspace VMMs, especially when accounting
> for hardware and software quirks.

I don't particularly care, but I thought the general trend was towards
split irqchip mode, with the local APIC in-kernel but i8259 PIC and I/O
APIC (and the i8254 PIT, which was the topic of this discussion) being
done in userspace.

Especially if you want to support guests with APIC IDs > 255 :)

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-02 15:04           ` David Woodhouse
@ 2024-08-12 23:59             ` Sean Christopherson
  2024-08-13  6:39               ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2024-08-12 23:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Aug 02, 2024, David Woodhouse wrote:
> On Fri, 2024-08-02 at 07:55 -0700, Sean Christopherson wrote:
> > On Fri, Aug 02, 2024, David Woodhouse wrote:
> > > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote:
> > > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote:
> > > > > I don't have a convenient way to test my sequence on KVM.
> > > > 
> > > > But still fails in KVM
> > > 
> > > By KVM you mean the in-kernel one that we want to kill because everyone
> > > should be using userspace IRQ chips these days?
> > 
> > What exactly do you want to kill?  In-kernel local APIC obviously needs to stay
> > for APICv/AVIC.
> 
> The legacy PIT, PIC and I/O APIC.
> 
> > And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and
> > the community as a whole, as the number of VMMs in use these days results in a
> > decent amount of duplicated work in userspace VMMs, especially when accounting
> > for hardware and software quirks.
> 
> I don't particularly care, but I thought the general trend was towards
> split irqchip mode, with the local APIC in-kernel but i8259 PIC and I/O
> APIC (and the i8254 PIT, which was the topic of this discussion) being
> done in userspace.

Yeah, that's where most everyone is headed, if not already there.  Letting the
I/O APIC live in userspace is probably the right direction long term, I just don't
love that every VMM seems to have it's own slightly different version.  But I think
the answer to that is to build a library for (legacy?) device emulation so that
VMMs can link to an implementation instead of copy+pasting from somwhere else and
inevitably ending up with code that's frozen in time.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown
  2024-08-12 23:59             ` Sean Christopherson
@ 2024-08-13  6:39               ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2024-08-13  6:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Michael Kelley, lirongqing@baidu.com,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On 13 August 2024 00:59:40 BST, Sean Christopherson <seanjc@google.com> wrote:
>On Fri, Aug 02, 2024, David Woodhouse wrote:
>> On Fri, 2024-08-02 at 07:55 -0700, Sean Christopherson wrote:
>> > On Fri, Aug 02, 2024, David Woodhouse wrote:
>> > > On Thu, 2024-08-01 at 20:54 +0200, Thomas Gleixner wrote:
>> > > > On Thu, Aug 01 2024 at 16:14, Michael Kelley wrote:
>> > > > > I don't have a convenient way to test my sequence on KVM.
>> > > > 
>> > > > But still fails in KVM
>> > > 
>> > > By KVM you mean the in-kernel one that we want to kill because everyone
>> > > should be using userspace IRQ chips these days?
>> > 
>> > What exactly do you want to kill?  In-kernel local APIC obviously needs to stay
>> > for APICv/AVIC.
>> 
>> The legacy PIT, PIC and I/O APIC.
>> 
>> > And IMO, encouraging userspace I/O APIC emulation is a net negative for KVM and
>> > the community as a whole, as the number of VMMs in use these days results in a
>> > decent amount of duplicated work in userspace VMMs, especially when accounting
>> > for hardware and software quirks.
>> 
>> I don't particularly care, but I thought the general trend was towards
>> split irqchip mode, with the local APIC in-kernel but i8259 PIC and I/O
>> APIC (and the i8254 PIT, which was the topic of this discussion) being
>> done in userspace.
>
>Yeah, that's where most everyone is headed, if not already there.  Letting the
>I/O APIC live in userspace is probably the right direction long term, I just don't
>love that every VMM seems to have it's own slightly different version.  But I think
>the answer to that is to build a library for (legacy?) device emulation so that
>VMMs can link to an implementation instead of copy+pasting from somwhere else and
>inevitably ending up with code that's frozen in time.

Some would say the right answer is to present a micro-vm machine model that doesn't have any of that crap at all.

Sadly we're going in the wrong direction. For >255 vCPUs on AMD machines it looks like we even have to emulate a full virtual IOMMU with DMA translation support. Well done, AMD!

(Linux is OK with the 15-bit Extended Destination ID, but not Windows)

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-08-13  6:40 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-07  1:14 [PATCH] clockevents/drivers/i8253: Do not zero timer counter in shutdown lirongqing
2023-02-08  1:04 ` Michael Kelley (LINUX)
2023-02-08 15:04   ` Sean Christopherson
2023-02-24  9:45     ` Li,Rongqing
     [not found]     ` <3b8496c071214bda9e5ecfa048f18ab9@baidu.com>
2023-04-13  1:28       ` Wei Liu
     [not found]       ` <1311175816673.202304.ZDdawTGHoa/UH20U@liuwe-devbox-debian-v2>
2023-04-14  5:17         ` Li,Rongqing
2024-08-01  9:00   ` David Woodhouse
2024-08-01 15:22     ` David Woodhouse
2024-08-01 21:07       ` Thomas Gleixner
2024-08-01 21:10         ` David Woodhouse
2024-08-01 14:21 ` Thomas Gleixner
2024-08-01 16:14   ` Michael Kelley
2024-08-01 18:54     ` Thomas Gleixner
2024-08-02  8:21       ` David Woodhouse
2024-08-02 14:55         ` Sean Christopherson
2024-08-02 15:04           ` David Woodhouse
2024-08-12 23:59             ` Sean Christopherson
2024-08-13  6:39               ` David Woodhouse
2024-08-01 17:49   ` David Woodhouse
2024-08-01 18:25     ` David Woodhouse
2024-08-01 18:57       ` Thomas Gleixner
2024-08-02  8:07         ` David Woodhouse
2024-08-02 10:49           ` Thomas Gleixner
2024-08-02 11:04             ` David Woodhouse
2024-08-02 13:27               ` Thomas Gleixner
2024-08-02 13:46                 ` David Woodhouse
2024-08-01 19:06     ` Thomas Gleixner
2024-08-01 19:21       ` David Woodhouse
2024-08-01 20:00         ` Thomas Gleixner
2024-08-01 20:49           ` David Woodhouse
2024-08-01 21:22             ` Thomas Gleixner
2024-08-01 21:31               ` David Woodhouse
2024-08-02  9:55                 ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).