public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC] Avoid PIT SMP lockups
@ 2006-10-06 21:38 john stultz
  2006-10-07 15:50 ` S.Çağlar Onur
  2006-10-10  9:11 ` S.Çağlar Onur
  0 siblings, 2 replies; 27+ messages in thread
From: john stultz @ 2006-10-06 21:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: S.Çağlar Onur, lkml

Hey Andi,
	Mind testing this patch on the AMD SMP box you were using earlier w/
acpi=off? I have spent a bit of time trying to hunt down the cause of
the reported SMP boxes hanging when they use the PIT for a clocksource,
and have not been able to root cause it. Removing the first three PIT io
instructions from pit_read() seemed to avoid the issue, but I can't see
why.

My current theory is that we're livelocking somehow:

timer_interrupt:
	seq_write_lock_irqsave(xtime_lock)
	spin_lock_irqsave(i8253_lock)
	portio()
	spin_unlock_irqrestore(i8253_lock)
	seq_write_unlock_irqrestore(xtime_lock)

gettime:
	do {
		seq = read_seqbegin(xtime_lock)
		spin_lock_irqsave(i8253_lock)
		portio()
		spin_unlock_irqrestore(i8253_lock)
	} while (read_seqretry(&xtime_lock, seq))


Where maybe one cpu is running gettime, spinning like mad grabbing and
releasing the i8253_lock, while another cpu is in the timer_interrupt
thread already holding the xtime lock, trying to grab the i8253_lock.

Yea.. its a weak theory (and sysrq-t output doesn't support it)... Don't
have a clue otherwise though. Your thoughts? 

Anyway, since I can't figure it out, this patch should avoid the issue,
by disabling the PIT on SMP boxes (and makes a minor change so we
properly fall back to jiffies if the TSC is bad and there's nothing
else).

S.Çağlar: Could you give it a whirl to see if it changes your vmware
issue?

thanks
-john




This patch avoids possible PIT livelock issues seen on SMP systems, by
not allowing it as a clocksource on SMP boxes.

However, since the PIT may no longer be present, we have to properly
handle the cases where SMP systems have TSC skew and fall back from the
TSC. Since the PIT isn't there, it would "fall back" to the TSC again.
So this changes the jiffies rating to 1, and the TSC-bad rating value to
0.

Thus you will get the following behavior priority on i386 systems:

tsc		[if present & stable]
hpet		[if present]
cyclone		[if present]
acpi_pm		[if present]
pit		[if UP]
jiffies

Rather then the current more complicated:
tsc		[if present & stable]
hpet		[if present]
cyclone		[if present]
acpi_pm		[if present]
pit		[if cpus < 4]
tsc		[if present & unstable]
jiffies

Signed-off-by: John Stultz <johnstul@us.ibm.com>

diff --git a/arch/i386/kernel/i8253.c b/arch/i386/kernel/i8253.c
index 477b24d..9a0060b 100644
--- a/arch/i386/kernel/i8253.c
+++ b/arch/i386/kernel/i8253.c
@@ -109,7 +109,7 @@ static struct clocksource clocksource_pi
 
 static int __init init_pit_clocksource(void)
 {
-	if (num_possible_cpus() > 4) /* PIT does not scale! */
+	if (num_possible_cpus() > 1) /* PIT does not scale! */
 		return 0;
 
 	clocksource_pit.mult = clocksource_hz2mult(CLOCK_TICK_RATE, 20);
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index b8fa0a8..fbc9582 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -349,8 +349,8 @@ static int tsc_update_callback(void)
 	int change = 0;
 
 	/* check to see if we should switch to the safe clocksource: */
-	if (clocksource_tsc.rating != 50 && check_tsc_unstable()) {
-		clocksource_tsc.rating = 50;
+	if (clocksource_tsc.rating != 0 && check_tsc_unstable()) {
+		clocksource_tsc.rating = 0;
 		clocksource_reselect();
 		change = 1;
 	}
@@ -461,7 +461,7 @@ static int __init init_tsc_clocksource(v
 							clocksource_tsc.shift);
 		/* lower the rating if we already know its unstable: */
 		if (check_tsc_unstable())
-			clocksource_tsc.rating = 50;
+			clocksource_tsc.rating = 0;
 
 		init_timer(&verify_tsc_freq_timer);
 		verify_tsc_freq_timer.function = verify_tsc_freq;
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 126bb30..a99b2a6 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -57,7 +57,7 @@ static cycle_t jiffies_read(void)
 
 struct clocksource clocksource_jiffies = {
 	.name		= "jiffies",
-	.rating		= 0, /* lowest rating*/
+	.rating		= 1, /* lowest valid rating*/
 	.read		= jiffies_read,
 	.mask		= 0xffffffff, /*32bits*/
 	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */



^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2006-10-20 10:36 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-06 21:38 [RFC] Avoid PIT SMP lockups john stultz
2006-10-07 15:50 ` S.Çağlar Onur
2006-10-10  9:11 ` S.Çağlar Onur
2006-10-10 18:27   ` john stultz
2006-10-11 10:49     ` S.Çağlar Onur
2006-10-11 17:59       ` john stultz
2006-10-11 18:37         ` S.Çağlar Onur
2006-10-11 18:43           ` john stultz
2006-10-11 19:09             ` S.Çağlar Onur
2006-10-11 19:26               ` john stultz
2006-10-11 19:31                 ` S.Çağlar Onur
2006-10-12  7:28             ` Gerd Hoffmann
2006-10-12  7:45               ` S.Çağlar Onur
2006-10-16 22:17                 ` Zachary Amsden
2006-10-16 22:21                   ` S.Çağlar Onur
2006-10-17 12:05                     ` S.Çağlar Onur
2006-10-17 12:16                       ` Andi Kleen
2006-10-19  8:00                       ` [PATCH] Fix potential interrupts during alternative patching [was Re: [RFC] Avoid PIT SMP lockups] Zachary Amsden
2006-10-19  8:49                         ` Jeremy Fitzhardinge
2006-10-19  9:00                           ` Zachary Amsden
2006-10-20 10:36                             ` S.Çağlar Onur
2006-10-20  5:25                         ` Greg KH
2006-10-16 22:40                   ` [RFC] Avoid PIT SMP lockups Andi Kleen
2006-10-16 23:25                     ` Zachary Amsden
2006-10-16 16:08             ` Gerd Hoffmann
2006-10-16 16:22               ` Vmware problems was " Andi Kleen
2006-10-16 22:16                 ` Petr Vandrovec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox