public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] linux-2.5.59_lost-tick_A0
@ 2003-01-21 22:59 john stultz
  2003-01-21 23:32 ` Andrew Morton
  2003-01-24 23:13 ` Stephen Hemminger
  0 siblings, 2 replies; 10+ messages in thread
From: john stultz @ 2003-01-21 22:59 UTC (permalink / raw)
  To: lkml

All,
	This patch addresses the following problem: Linux cannot properly
handle the case where interrupts are disabled for longer then two ticks.

Quick background: 
	gettimeofday() calculates wall time using roughly the following
equation: xtime + (jiffies - wall_jiffies) + timer->get_offset()

When a tick is lost, the system is able to compensate short-term because
even though jiffies is not incremented, timer->get_offset() (which is
hardware based) continues to increase. However this falls apart, because
if after 3 or so lost-ticks an interrupt does occur, jiffies is
incremented only once and we reset timer->get_offset() effectively
loosing the time we had been compensating for. This causes brief
inconsistencies in time, in addition to causing system time to drift
behind that of other systems.

This patch solves the problem by checking when an interrupt occurs if
timer->get_offset() is a value greater then 2 ticks. If so, it
increments jiffies appropriately. 

Comments, flames and suggestions are requested and welcome.

thanks
-john

diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c	Tue Jan 21 14:14:18 2003
+++ b/arch/i386/kernel/time.c	Tue Jan 21 14:14:18 2003
@@ -265,6 +265,21 @@
 #endif
 }
 
+/*Lost tick detection and compensation*/
+void detect_lost_tick(void)
+{
+	/*read time since last interrupt*/
+	unsigned long delta = timer->get_offset();
+
+	/*check if delta is greater then two ticks*/
+	if(delta > 2*(1000000/HZ)){
+		/*calculate number of missed ticks*/
+		delta = delta/(1000000/HZ)-1;
+		jiffies += delta;
+	}
+		
+}
+
 /*
  * This is the same as the above, except we _also_ save the current
  * Time Stamp Counter value at the time of the timer interrupt, so that
@@ -281,6 +296,7 @@
 	 */
 	write_lock(&xtime_lock);
 
+	detect_lost_tick();
 	timer->mark_offset();
  
 	do_timer_interrupt(irq, NULL, regs);




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
       [not found] <1043189962.15683.82.camel@w-jstultz2.beaverton.ibm.com.suse.lists.linux.kernel>
@ 2003-01-21 23:27 ` Andi Kleen
  2003-01-21 23:31   ` john stultz
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2003-01-21 23:27 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel

john stultz <johnstul@us.ibm.com> writes:

> All,
> 	This patch addresses the following problem: Linux cannot properly
> handle the case where interrupts are disabled for longer then two ticks.

Comments:

Basic idea is good. The x86-64 2.4 tree has a similar solution for the
same problem. Especially with HZ=1000 this is really needed, because
now lost ticks are far more common than with the HZ=100 in 2.4.
I would consider some form of this patch as requirement for 2.6 release.

what happens when 1000000 does not evenly divide HZ? 
I think some ports use HZ=1024

Why is the condition > and not >= ? Eactly two ticks offset is already
one lost. In fact even >= 1.5*HZ would be dubious.

I would like to have some statistics counter somewhere in /proc for lost 
ticks, so that it can be checked for after bug reports. Perhaps even
printk for the first 5 or so.

Could you please add spaces after /* and before */

-Andi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 23:27 ` Andi Kleen
@ 2003-01-21 23:31   ` john stultz
  2003-01-21 23:48     ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: john stultz @ 2003-01-21 23:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml

On Tue, 2003-01-21 at 15:27, Andi Kleen wrote:
> Comments:
> 
> Basic idea is good. The x86-64 2.4 tree has a similar solution for the
> same problem. Especially with HZ=1000 this is really needed, because
> now lost ticks are far more common than with the HZ=100 in 2.4.
> I would consider some form of this patch as requirement for 2.6 release.
> 
> what happens when 1000000 does not evenly divide HZ? 
> I think some ports use HZ=1024

Then it comes out to close enough? I'm probably just not getting the
problem. Could you further explain?

> Why is the condition > and not >= ? Eactly two ticks offset is already
> one lost. In fact even >= 1.5*HZ would be dubious.

Exactly two, yes. However 1.5 wouldn't quite do it, as jiffies would be
incremented once and delay_at_last_interrupt should be set to .5*HZ,
thus loosing no time.

> I would like to have some statistics counter somewhere in /proc for lost 
> ticks, so that it can be checked for after bug reports. Perhaps even
> printk for the first 5 or so.

Yea, I had some printk code in there, but I have a card here that can
cause 30ms SMI stalls once per sec, so it was getting a bit verbose.
Although printing out for the first five, would be fine. I'll add that
right away. Thanks!

> Could you please add spaces after /* and before */

Doh, I read and read those style guidelines, but my fingers never seem
to take to em'. 

thanks again,
-john



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 22:59 [RFC][PATCH] linux-2.5.59_lost-tick_A0 john stultz
@ 2003-01-21 23:32 ` Andrew Morton
  2003-01-21 23:48   ` john stultz
  2003-01-22  5:00   ` Valdis.Kletnieks
  2003-01-24 23:13 ` Stephen Hemminger
  1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2003-01-21 23:32 UTC (permalink / raw)
  To: john stultz; +Cc: lkml

john stultz wrote:
> 
> All,
>         This patch addresses the following problem: Linux cannot properly
> handle the case where interrupts are disabled for longer then two ticks.
> 

Question is: who is holding interrupts off for so long?

It might be better to implement a detection scheme inside
the timer interrupt handler: if the time since last interrupt
exceeds two ticks, whine and drop a backtrace.

That backtrace will point up into the local_irq_enable() in
the offending code, and we can go fix it?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 23:32 ` Andrew Morton
@ 2003-01-21 23:48   ` john stultz
  2003-01-22  5:00   ` Valdis.Kletnieks
  1 sibling, 0 replies; 10+ messages in thread
From: john stultz @ 2003-01-21 23:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml

On Tue, 2003-01-21 at 15:32, Andrew Morton wrote:
> john stultz wrote:
> > 
> > All,
> >         This patch addresses the following problem: Linux cannot properly
> > handle the case where interrupts are disabled for longer then two ticks.
> > 
> 
> Question is: who is holding interrupts off for so long?

Unfortunately in my situation there is a card which can cause 30ms
stalls in an SMI handler. Yuck, I know. :P

> It might be better to implement a detection scheme inside
> the timer interrupt handler: if the time since last interrupt
> exceeds two ticks, whine and drop a backtrace.
> 
> That backtrace will point up into the local_irq_enable() in
> the offending code, and we can go fix it?

Hmm, clever! That would be a very good check for software caused stalls.
I'll try to drop that in. 

thanks for the feedback!
-john
 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 23:31   ` john stultz
@ 2003-01-21 23:48     ` Andi Kleen
  2003-01-21 23:49       ` john stultz
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2003-01-21 23:48 UTC (permalink / raw)
  To: john stultz; +Cc: Andi Kleen, lkml

On Tue, Jan 21, 2003 at 03:31:08PM -0800, john stultz wrote:
> > what happens when 1000000 does not evenly divide HZ? 
> > I think some ports use HZ=1024
> 
> Then it comes out to close enough? I'm probably just not getting the
> problem. Could you further explain?

Have you checked it that it yields an usable value? I'm just asking 
because I've been badly burned by such integer rounding issues in the 
past, so it is better to double check them...

-Andi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 23:48     ` Andi Kleen
@ 2003-01-21 23:49       ` john stultz
  0 siblings, 0 replies; 10+ messages in thread
From: john stultz @ 2003-01-21 23:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml

On Tue, 2003-01-21 at 15:48, Andi Kleen wrote:
> On Tue, Jan 21, 2003 at 03:31:08PM -0800, john stultz wrote:
> > > what happens when 1000000 does not evenly divide HZ? 
> > > I think some ports use HZ=1024
> > 
> > Then it comes out to close enough? I'm probably just not getting the
> > problem. Could you further explain?
> 
> Have you checked it that it yields an usable value? I'm just asking 
> because I've been badly burned by such integer rounding issues in the 
> past, so it is better to double check them...

Ok, fair enough. I'll double check that. 

thanks
-john



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 23:32 ` Andrew Morton
  2003-01-21 23:48   ` john stultz
@ 2003-01-22  5:00   ` Valdis.Kletnieks
  1 sibling, 0 replies; 10+ messages in thread
From: Valdis.Kletnieks @ 2003-01-22  5:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: john stultz, lkml

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On Tue, 21 Jan 2003 15:32:55 PST, Andrew Morton said:
> john stultz wrote:
> > 
> > All,
> >         This patch addresses the following problem: Linux cannot properly
> > handle the case where interrupts are disabled for longer then two ticks.
> > 
> 
> Question is: who is holding interrupts off for so long?

Dell Latitude C840 - if you use APM, you drop clock ticks.  And that's
even *with* CONFIG_APM_ALLOW_INTS.  I think I've caught the i8k driver
doing it as well - it seems to be a generic BIOS fuckage.  Make a BIOS
call, drop some ticks.  And unless somebody has a BIOS flash that works
better than Dell's C840A07, we're stuck with it....
-- 
				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-24 23:13 ` Stephen Hemminger
@ 2003-01-24 23:06   ` john stultz
  0 siblings, 0 replies; 10+ messages in thread
From: john stultz @ 2003-01-24 23:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: lkml

On Fri, 2003-01-24 at 15:13, Stephen Hemminger wrote:
> Minor nit: detect_lost_tick could be static and inline.

Good point, I'll fix that before my next release. 

Thanks for the feedback!
-john



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.5.59_lost-tick_A0
  2003-01-21 22:59 [RFC][PATCH] linux-2.5.59_lost-tick_A0 john stultz
  2003-01-21 23:32 ` Andrew Morton
@ 2003-01-24 23:13 ` Stephen Hemminger
  2003-01-24 23:06   ` john stultz
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2003-01-24 23:13 UTC (permalink / raw)
  To: john stultz; +Cc: lkml

Minor nit: detect_lost_tick could be static and inline.

On Tue, 2003-01-21 at 14:59, john stultz wrote:
> All,
> 	This patch addresses the following problem: Linux cannot properly
> handle the case where interrupts are disabled for longer then two ticks.
> 
> Quick background: 
> 	gettimeofday() calculates wall time using roughly the following
> equation: xtime + (jiffies - wall_jiffies) + timer->get_offset()
> 
> When a tick is lost, the system is able to compensate short-term because
> even though jiffies is not incremented, timer->get_offset() (which is
> hardware based) continues to increase. However this falls apart, because
> if after 3 or so lost-ticks an interrupt does occur, jiffies is
> incremented only once and we reset timer->get_offset() effectively
> loosing the time we had been compensating for. This causes brief
> inconsistencies in time, in addition to causing system time to drift
> behind that of other systems.
> 
> This patch solves the problem by checking when an interrupt occurs if
> timer->get_offset() is a value greater then 2 ticks. If so, it
> increments jiffies appropriately. 
> 
> Comments, flames and suggestions are requested and welcome.
> 
> thanks
> -john
> 
> diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
> --- a/arch/i386/kernel/time.c	Tue Jan 21 14:14:18 2003
> +++ b/arch/i386/kernel/time.c	Tue Jan 21 14:14:18 2003
> @@ -265,6 +265,21 @@
>  #endif
>  }
>  
> +/*Lost tick detection and compensation*/
> +void detect_lost_tick(void)
> +{
> +	/*read time since last interrupt*/
> +	unsigned long delta = timer->get_offset();
> +
> +	/*check if delta is greater then two ticks*/
> +	if(delta > 2*(1000000/HZ)){
> +		/*calculate number of missed ticks*/
> +		delta = delta/(1000000/HZ)-1;
> +		jiffies += delta;
> +	}
> +		
> +}
> +
>  /*
>   * This is the same as the above, except we _also_ save the current
>   * Time Stamp Counter value at the time of the timer interrupt, so that
> @@ -281,6 +296,7 @@
>  	 */
>  	write_lock(&xtime_lock);
>  
> +	detect_lost_tick();
>  	timer->mark_offset();
>   
>  	do_timer_interrupt(irq, NULL, regs);
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Stephen Hemminger <shemminger@osdl.org>
Open Source Devlopment Lab


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-01-24 23:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-01-21 22:59 [RFC][PATCH] linux-2.5.59_lost-tick_A0 john stultz
2003-01-21 23:32 ` Andrew Morton
2003-01-21 23:48   ` john stultz
2003-01-22  5:00   ` Valdis.Kletnieks
2003-01-24 23:13 ` Stephen Hemminger
2003-01-24 23:06   ` john stultz
     [not found] <1043189962.15683.82.camel@w-jstultz2.beaverton.ibm.com.suse.lists.linux.kernel>
2003-01-21 23:27 ` Andi Kleen
2003-01-21 23:31   ` john stultz
2003-01-21 23:48     ` Andi Kleen
2003-01-21 23:49       ` john stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox