public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] i386 Time: Avoid PIT SMP lockups
@ 2006-10-11 19:54 john stultz
  2006-10-11 21:26 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: john stultz @ 2006-10-11 19:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, lkml

Andrew: I think this is 2.6.19 material, but probably should go through an -mm or two.

thanks
-john


This patch avoids possible PIT livelock issues seen on SMP systems (and
reported by Andi), by not allowing it as a clocksource on SMP boxes.

However, since the PIT may no longer be present, we have to properly
handle the cases where SMP systems have TSC skew and fall back from the
TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
So this changes the jiffies rating to 1, and the TSC-bad rating value to
0.

Thus you will get the following behavior priority on i386 systems:

tsc		[if present & stable]
hpet		[if present]
cyclone		[if present]
acpi_pm		[if present]
pit		[if UP]
jiffies

Rather then the current more complicated:
tsc		[if present & stable]
hpet		[if present]
cyclone		[if present]
acpi_pm		[if present]
pit		[if cpus < 4]
tsc		[if present & unstable]
jiffies

Signed-off-by: John Stultz <johnstul@us.ibm.com>

diff --git a/arch/i386/kernel/i8253.c b/arch/i386/kernel/i8253.c
index 477b24d..9a0060b 100644
--- a/arch/i386/kernel/i8253.c
+++ b/arch/i386/kernel/i8253.c
@@ -109,7 +109,7 @@ static struct clocksource clocksource_pi
 
 static int __init init_pit_clocksource(void)
 {
-	if (num_possible_cpus() > 4) /* PIT does not scale! */
+	if (num_possible_cpus() > 1) /* PIT does not scale! */
 		return 0;
 
 	clocksource_pit.mult = clocksource_hz2mult(CLOCK_TICK_RATE, 20);
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index b8fa0a8..fbc9582 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -349,8 +349,8 @@ static int tsc_update_callback(void)
 	int change = 0;
 
 	/* check to see if we should switch to the safe clocksource: */
-	if (clocksource_tsc.rating != 50 && check_tsc_unstable()) {
-		clocksource_tsc.rating = 50;
+	if (clocksource_tsc.rating != 0 && check_tsc_unstable()) {
+		clocksource_tsc.rating = 0;
 		clocksource_reselect();
 		change = 1;
 	}
@@ -461,7 +461,7 @@ static int __init init_tsc_clocksource(v
 							clocksource_tsc.shift);
 		/* lower the rating if we already know its unstable: */
 		if (check_tsc_unstable())
-			clocksource_tsc.rating = 50;
+			clocksource_tsc.rating = 0;
 
 		init_timer(&verify_tsc_freq_timer);
 		verify_tsc_freq_timer.function = verify_tsc_freq;
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 126bb30..a99b2a6 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -57,7 +57,7 @@ static cycle_t jiffies_read(void)
 
 struct clocksource clocksource_jiffies = {
 	.name		= "jiffies",
-	.rating		= 0, /* lowest rating*/
+	.rating		= 1, /* lowest valid rating*/
 	.read		= jiffies_read,
 	.mask		= 0xffffffff, /*32bits*/
 	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-11 19:54 [PATCH] i386 Time: Avoid PIT SMP lockups john stultz
@ 2006-10-11 21:26 ` Andrew Morton
  2006-10-11 22:48   ` john stultz
  2006-10-18 16:16   ` Maciej W. Rozycki
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-10-11 21:26 UTC (permalink / raw)
  To: john stultz; +Cc: Andi Kleen, lkml

On Wed, 11 Oct 2006 12:54:21 -0700
john stultz <johnstul@us.ibm.com> wrote:

> Andrew: I think this is 2.6.19 material, but probably should go through an -mm or two.
> 
> thanks
> -john
> 
> 
> This patch avoids possible PIT livelock issues seen on SMP systems (and
> reported by Andi), by not allowing it as a clocksource on SMP boxes.
> 
> However, since the PIT may no longer be present, we have to properly
> handle the cases where SMP systems have TSC skew and fall back from the
> TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
> So this changes the jiffies rating to 1, and the TSC-bad rating value to
> 0.
> 
> Thus you will get the following behavior priority on i386 systems:
> 
> tsc		[if present & stable]
> hpet		[if present]
> cyclone		[if present]
> acpi_pm		[if present]
> pit		[if UP]
> jiffies
> 
> Rather then the current more complicated:
> tsc		[if present & stable]
> hpet		[if present]
> cyclone		[if present]
> acpi_pm		[if present]
> pit		[if cpus < 4]

Actually <=4, and that matters: there are a lot of 4-ways.

> tsc		[if present & unstable]
> jiffies
> 

So this patch has the potential to screw up people who have 2-way or 4-way,
no hpet/pm-timer and dodgy TSCs.

Wouldn't it be better to fix the livelock?  What's causing it?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-11 21:26 ` Andrew Morton
@ 2006-10-11 22:48   ` john stultz
  2006-10-11 23:03     ` Andrew Morton
  2006-10-18 16:16   ` Maciej W. Rozycki
  1 sibling, 1 reply; 11+ messages in thread
From: john stultz @ 2006-10-11 22:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, lkml

On Wed, 2006-10-11 at 14:26 -0700, Andrew Morton wrote:
> On Wed, 11 Oct 2006 12:54:21 -0700
> john stultz <johnstul@us.ibm.com> wrote:
> > Andrew: I think this is 2.6.19 material, but probably should go through an -mm or two.
> > 
> > 
> > This patch avoids possible PIT livelock issues seen on SMP systems (and
> > reported by Andi), by not allowing it as a clocksource on SMP boxes.
> > 
> > However, since the PIT may no longer be present, we have to properly
> > handle the cases where SMP systems have TSC skew and fall back from the
> > TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
> > So this changes the jiffies rating to 1, and the TSC-bad rating value to
> > 0.
> > 
> > Thus you will get the following behavior priority on i386 systems:
> > 
> > tsc		[if present & stable]
> > hpet		[if present]
> > cyclone		[if present]
> > acpi_pm		[if present]
> > pit		[if UP]
> > jiffies
> > 
> > Rather then the current more complicated:
> > tsc		[if present & stable]
> > hpet		[if present]
> > cyclone		[if present]
> > acpi_pm		[if present]
> > pit		[if cpus < 4]
> 
> Actually <=4, and that matters: there are a lot of 4-ways.

Yes, you're right.

> 
> > tsc		[if present & unstable]
> > jiffies
> > 
> 
> So this patch has the potential to screw up people who have 2-way or 4-way,
> no hpet/pm-timer and dodgy TSCs.

Indeed. Although I believe the number of TSC screwy SMP boxes w/o an
ACPI PM or HPET is quite small (NUMAQ being the one I'm familiar with,
but it already was using jiffies in multi-node configs).

And by "screw up", it would mean pretty course(tick) granular time.
Otherwise the systems should still function correctly, but I recognize
the loss in granularity could be seen as a regression. The trade off
being that a hang (which would be seen otherwise) isn't acceptable
either.

> Wouldn't it be better to fix the livelock?  What's causing it?

I spent a few days trying to narrow this down, and I haven't been able
to do so to my satisfaction.

At this point, my suspicion is that because the PIT io-read is very slow
(~18us), and done while holding a lock. It would be possible that one
cpu calling gettimeofday would do the following:

grab xtime sequence read lock
grab i8253 spin lock
do port io (very slow)
release i8253 spin lock
realize xtime has been grabed and repeat

While another cpu does the following after in a timer interrupt:
Grabs xtime sequence write lock
spins trying to grab i8253 spin lock

Assuming the first thread can reacquire the i8253 lock before the
second, you could have both threads potentially spinning forever.

I've not yet been able to prove this. alt-sysrq-t usually doesn't have
*any* threads in gettimeofday/timer_interrupt, but removing the
expensive port io in pit_read() causes the hang to go away.

I'm open to digging on this more, but I wanted to get the fix out so
others didn't hit the hang in the meantime.

thanks
-john


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-11 22:48   ` john stultz
@ 2006-10-11 23:03     ` Andrew Morton
  2006-10-16 13:48       ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-10-11 23:03 UTC (permalink / raw)
  To: john stultz; +Cc: Andi Kleen, lkml

On Wed, 11 Oct 2006 15:48:31 -0700
john stultz <johnstul@us.ibm.com> wrote:

> > Wouldn't it be better to fix the livelock?  What's causing it?
> 
> I spent a few days trying to narrow this down, and I haven't been able
> to do so to my satisfaction.
> 
> At this point, my suspicion is that because the PIT io-read is very slow
> (~18us), and done while holding a lock. It would be possible that one
> cpu calling gettimeofday would do the following:
> 
> grab xtime sequence read lock
> grab i8253 spin lock
> do port io (very slow)
> release i8253 spin lock
> realize xtime has been grabed and repeat
> 
> While another cpu does the following after in a timer interrupt:
> Grabs xtime sequence write lock
> spins trying to grab i8253 spin lock
> 
> Assuming the first thread can reacquire the i8253 lock before the
> second, you could have both threads potentially spinning forever.

Is there any actual need to hold xtime_lock while doing the port IO?  I'd
have thought it would suffice to do

	temp = port_io
	write_seqlock(xtime_lock);
	xtime = muck_with(temp);
	write_sequnlock(xtime_lock);

?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-11 23:03     ` Andrew Morton
@ 2006-10-16 13:48       ` Andi Kleen
  2006-10-16 18:39         ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2006-10-16 13:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, johnstul

Andrew Morton <akpm@osdl.org> writes:
> 
> Is there any actual need to hold xtime_lock while doing the port IO?  I'd
> have thought it would suffice to do
> 
> 	temp = port_io
> 	write_seqlock(xtime_lock);
> 	xtime = muck_with(temp);
> 	write_sequnlock(xtime_lock);
> 
> ?

That would be a good idea in general. The trouble is just that whatever race
is there will be still there then, just harder to trigger (so instead of 
every third boot it will muck up every 6 weeks). Not sure that is
a real improvement.

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-16 13:48       ` Andi Kleen
@ 2006-10-16 18:39         ` Andrew Morton
  2006-10-16 18:42           ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-10-16 18:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, johnstul

On 16 Oct 2006 15:48:02 +0200
Andi Kleen <ak@suse.de> wrote:

> Andrew Morton <akpm@osdl.org> writes:
> > 
> > Is there any actual need to hold xtime_lock while doing the port IO?  I'd
> > have thought it would suffice to do
> > 
> > 	temp = port_io
> > 	write_seqlock(xtime_lock);
> > 	xtime = muck_with(temp);
> > 	write_sequnlock(xtime_lock);
> > 
> > ?
> 
> That would be a good idea in general. The trouble is just that whatever race
> is there will be still there then, just harder to trigger (so instead of 
> every third boot it will muck up every 6 weeks). Not sure that is
> a real improvement.
> 

Confused.  What race are you referring to?

This is addressing a starvation problem which is due to the slowness of the
port-io (iirc).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-16 18:39         ` Andrew Morton
@ 2006-10-16 18:42           ` Andi Kleen
  2006-10-16 19:09             ` john stultz
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2006-10-16 18:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, johnstul

On Monday 16 October 2006 20:39, Andrew Morton wrote:
> On 16 Oct 2006 15:48:02 +0200
> Andi Kleen <ak@suse.de> wrote:
> 
> > Andrew Morton <akpm@osdl.org> writes:
> > > 
> > > Is there any actual need to hold xtime_lock while doing the port IO?  I'd
> > > have thought it would suffice to do
> > > 
> > > 	temp = port_io
> > > 	write_seqlock(xtime_lock);
> > > 	xtime = muck_with(temp);
> > > 	write_sequnlock(xtime_lock);
> > > 
> > > ?
> > 
> > That would be a good idea in general. The trouble is just that whatever race
> > is there will be still there then, just harder to trigger (so instead of 
> > every third boot it will muck up every 6 weeks). Not sure that is
> > a real improvement.
> > 
> 
> Confused.  What race are you referring to?

Sorry s/race/starvation/

> 
> This is addressing a starvation problem which is due to the slowness of the
> port-io (iirc).

Is it just sure to go away when the critical section is shorter?

-Andi


 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-16 18:42           ` Andi Kleen
@ 2006-10-16 19:09             ` john stultz
  0 siblings, 0 replies; 11+ messages in thread
From: john stultz @ 2006-10-16 19:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, lkml

On Mon, 2006-10-16 at 20:42 +0200, Andi Kleen wrote:
> On Monday 16 October 2006 20:39, Andrew Morton wrote:
> > On 16 Oct 2006 15:48:02 +0200
> > Andi Kleen <ak@suse.de> wrote:
> > 
> > > Andrew Morton <akpm@osdl.org> writes:
> > > > 
> > > > Is there any actual need to hold xtime_lock while doing the port IO?  I'd
> > > > have thought it would suffice to do
> > > > 
> > > > 	temp = port_io
> > > > 	write_seqlock(xtime_lock);
> > > > 	xtime = muck_with(temp);
> > > > 	write_sequnlock(xtime_lock);
> > > > 
> > > > ?
> > > 
> > > That would be a good idea in general. The trouble is just that whatever race
> > > is there will be still there then, just harder to trigger (so instead of 
> > > every third boot it will muck up every 6 weeks). Not sure that is
> > > a real improvement.

It is an interesting idea that could be applied generally. There might
be some small races possible (where the cycle_last grabed within
xtime_lock might be ahead of the value pulled from the port_io), but
I'll put some time into seeing if it will work.


> > Confused.  What race are you referring to?
> 
> Sorry s/race/starvation/
> 
> > 
> > This is addressing a starvation problem which is due to the slowness of the
> > port-io (iirc).
> 
> Is it just sure to go away when the critical section is shorter?

Yea. I don't see the box hangs when the portio is removed. Although I
don't really feel I've narrowed it down completely to the starvation
theory. Its just the best theory I've got so far.

Currently my test box is busy with other things, but as soon as its free
I'll put some more time into this one.

thanks
-john


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-11 21:26 ` Andrew Morton
  2006-10-11 22:48   ` john stultz
@ 2006-10-18 16:16   ` Maciej W. Rozycki
  2006-10-18 16:27     ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: Maciej W. Rozycki @ 2006-10-18 16:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: john stultz, Andi Kleen, lkml

On Wed, 11 Oct 2006, Andrew Morton wrote:

> So this patch has the potential to screw up people who have 2-way or 4-way,
> no hpet/pm-timer and dodgy TSCs.

 Note that all APIC-based SMP systems (even these rare i486 beasts) by 
definition do have local APIC timers, one per CPU, with a reasonable 
resolution which could likely be used instead.  There is an erratum in 
some early Pentium integrated APICs, where a tick is lost when 0 is 
reached (the timer counts downwards), so it may be needed to be taken into 
account for good timekeeping, but otherwise the timers look like a 
reasonable choice.  The timers are local to their respective CPUs and are 
never written by Linux (see references to APIC_TMCCT), so there is no need 
for locking to access them.

  Maciej

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-18 16:16   ` Maciej W. Rozycki
@ 2006-10-18 16:27     ` Andi Kleen
  2006-10-18 16:45       ` Maciej W. Rozycki
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2006-10-18 16:27 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: john stultz, lkml

"Maciej W. Rozycki" <macro@linux-mips.org> writes:

> On Wed, 11 Oct 2006, Andrew Morton wrote:
> 
> > So this patch has the potential to screw up people who have 2-way or 4-way,
> > no hpet/pm-timer and dodgy TSCs.
> 
>  Note that all APIC-based SMP systems (even these rare i486 beasts) by 
> definition do have local APIC timers, one per CPU, with a reasonable 
> resolution which could likely be used instead. 

It wouldn't work on dual core laptop chipsets which support C3 and where
the APIC timer stops during C3.

I had a apicrunsmaintimer option for some time on x86-64 but it broke
on a few other non laptop machines for unknown reasons too, so I cannot 
really recommend it.

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] i386 Time: Avoid PIT SMP lockups
  2006-10-18 16:27     ` Andi Kleen
@ 2006-10-18 16:45       ` Maciej W. Rozycki
  0 siblings, 0 replies; 11+ messages in thread
From: Maciej W. Rozycki @ 2006-10-18 16:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: john stultz, lkml

On Wed, 18 Oct 2006, Andi Kleen wrote:

> It wouldn't work on dual core laptop chipsets which support C3 and where
> the APIC timer stops during C3.
> 
> I had a apicrunsmaintimer option for some time on x86-64 but it broke
> on a few other non laptop machines for unknown reasons too, so I cannot 
> really recommend it.

 So they managed to break it too?  Oh well...

 If you look at early APIC documentation, then you'll see how the APIC 
timers were recommended to be the only source of the clock tick for SMP 
systems in favour to the old-fashioned PIT.  Now, how can it be used if it 
does not run all the time?...

  Maciej

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-10-18 16:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-11 19:54 [PATCH] i386 Time: Avoid PIT SMP lockups john stultz
2006-10-11 21:26 ` Andrew Morton
2006-10-11 22:48   ` john stultz
2006-10-11 23:03     ` Andrew Morton
2006-10-16 13:48       ` Andi Kleen
2006-10-16 18:39         ` Andrew Morton
2006-10-16 18:42           ` Andi Kleen
2006-10-16 19:09             ` john stultz
2006-10-18 16:16   ` Maciej W. Rozycki
2006-10-18 16:27     ` Andi Kleen
2006-10-18 16:45       ` Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox