* regression from softlockup fix
@ 2007-11-19 9:21 David Miller
2007-11-19 9:43 ` Ingo Molnar
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2007-11-19 9:21 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, jeremy, gregkh
This changeset:
commit 436e61d93605a3a36902c9ee510b0ecba0d7d361
Author: Ingo Molnar <mingo@elte.hu>
Date: Tue Oct 16 23:18:38 2007 -0700
fix the softlockup watchdog to actually work
...
Causes my SMP niagara systems to trigger the softlockup message
frequently when the nohz timer fires.
The backtrace is always in the timer handler, the cpu is not
wedged at all, which makes me think it's likely triggering
erroneously.
I suspect that what is happening is that the NOHZ period is
longer than the softlockup timeout (10 seconds) and we get
an interrupt before the watchdog thread gets onto the cpu.
I'll happily test any suggested fix for this bug, thanks!
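To make the suspected failure mode concrete, here is a minimal userspace sketch of the threshold comparison, assuming timestamps in whole seconds. The function name `tick_reports_lockup` and the macro `SOFTLOCKUP_THRESH_SECS` are hypothetical stand-ins, not kernel symbols; only the 10 second threshold comes from the report above.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the report: warn after more than 10 seconds without a touch. */
#define SOFTLOCKUP_THRESH_SECS 10UL

/*
 * Hypothetical stand-in for the check done from the timer tick: a lockup
 * is reported when `now` is past the last touch plus the threshold.
 */
static bool tick_reports_lockup(unsigned long now, unsigned long touch_timestamp)
{
	assert(now >= touch_timestamp); /* time does not run backwards in this model */
	return now > touch_timestamp + SOFTLOCKUP_THRESH_SECS;
}
```

If the CPU idles with the tick stopped for 30 seconds and the first interrupt runs this check before the watchdog thread has been scheduled, `tick_reports_lockup(130, 100)` is true even though nothing was stuck.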
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: regression from softlockup fix
2007-11-19 9:21 regression from softlockup fix David Miller
@ 2007-11-19 9:43 ` Ingo Molnar
2007-11-19 11:10 ` David Miller
2007-11-19 17:15 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 5+ messages in thread
From: Ingo Molnar @ 2007-11-19 9:43 UTC (permalink / raw)
To: David Miller; +Cc: linux-kernel, jeremy, gregkh, Andrew Morton
* David Miller <davem@davemloft.net> wrote:
> I suspect that what is happening is that the NOHZ period is longer
> than the softlockup timeout (10 seconds) and we get an interrupt
> before the watchdog thread gets onto the cpu.
indeed! Does the patch below do the trick?
Ingo
--------------->
Subject: softlockup: do the wakeup from a hrtimer
From: Ingo Molnar <mingo@elte.hu>
David Miller reported soft lockup false-positives that trigger
on NOHZ due to CPUs idling for more than 10 seconds.
The solution is to drive the wakeup of the watchdog threads
not from the timer tick (which has no guaranteed frequency),
but from the watchdog tasks themselves.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/softlockup.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -100,10 +100,6 @@ void softlockup_tick(void)

 	now = get_timestamp(this_cpu);

-	/* Wake up the high-prio watchdog task every second: */
-	if (now > (touch_timestamp + 1))
-		wake_up_process(per_cpu(watchdog_task, this_cpu));
-
 	/* Warn about unreasonable 10+ seconds delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -141,7 +137,7 @@ static int watchdog(void *__bind_cpu)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		schedule();
+		msleep(1000);
 	}

 	return 0;
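
The effect of moving the wakeup source can be sketched in userspace as a one-second-step timeline. This is a simplified model under stated assumptions, not kernel code: the enum `wakeup_source`, the function `lockup_reported_after_idle`, and the macro are hypothetical names, and the model assumes the tick is fully stopped for the whole idle period.

```c
#include <assert.h>
#include <stdbool.h>

#define SOFTLOCKUP_THRESH_SECS 10UL /* threshold mentioned in the thread */

enum wakeup_source {
	WAKE_FROM_TICK,      /* before the patch: the tick wakes the watchdog thread */
	WAKE_FROM_OWN_SLEEP, /* after the patch: the thread sleeps 1s and reruns itself */
};

/* Returns true if the first tick after idle_secs of idle reports a lockup. */
static bool lockup_reported_after_idle(enum wakeup_source src,
				       unsigned long idle_secs)
{
	unsigned long touch_timestamp = 0; /* thread last ran at t = 0 */

	assert(src == WAKE_FROM_TICK || src == WAKE_FROM_OWN_SLEEP);

	for (unsigned long t = 1; t <= idle_secs; t++) {
		/* The tick is stopped while idle, so WAKE_FROM_TICK never runs the thread. */
		if (src == WAKE_FROM_OWN_SLEEP)
			touch_timestamp = t; /* sleep expires, thread touches the stamp */
	}

	/* The first tick after the idle period performs the threshold check. */
	return idle_secs > touch_timestamp + SOFTLOCKUP_THRESH_SECS;
}
```

In this model a 30 second idle period trips the check only in the tick-driven case, because the self-driven thread keeps refreshing the timestamp once per second.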
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: regression from softlockup fix
2007-11-19 9:43 ` Ingo Molnar
@ 2007-11-19 11:10 ` David Miller
2007-11-19 17:15 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2007-11-19 11:10 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, jeremy, gregkh, akpm
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 19 Nov 2007 10:43:38 +0100
> * David Miller <davem@davemloft.net> wrote:
>
> > I suspect that what is happening is that the NOHZ period is longer
> > than the softlockup timeout (10 seconds) and we get an interrupt
> > before the watchdog thread gets onto the cpu.
>
> indeed! Does the patch below do the trick?
I'm sure it works but it partly defeats the purpose of NOHZ.
I really like it that my cpus sleep completely for hours at a time
when not in use. :)
Anyways, I'll give your patch a test.
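
The cost being objected to can be put in numbers with a small sketch. The helper `watchdog_wakeups` is a hypothetical name, and the CPU count passed to it is purely an illustrative parameter, not a figure from this thread; the 1000 ms period is the `msleep(1000)` from the patch.

```c
#include <assert.h>

/*
 * Timer wakeups caused by per-CPU watchdog threads that each sleep
 * period_ms at a time, summed over ncpus CPUs and interval_secs seconds.
 */
static unsigned long watchdog_wakeups(unsigned long period_ms,
				      unsigned long interval_secs,
				      unsigned long ncpus)
{
	assert(period_ms > 0); /* a zero period would divide by zero */
	return (interval_secs * 1000UL / period_ms) * ncpus;
}
```

With a one-second period, `watchdog_wakeups(1000, 3600, 1)` evaluates to 3600, so a CPU that could otherwise sleep for the whole hour is woken 3600 times by the watchdog alone.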
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: regression from softlockup fix
2007-11-19 9:43 ` Ingo Molnar
2007-11-19 11:10 ` David Miller
@ 2007-11-19 17:15 ` Jeremy Fitzhardinge
2007-11-19 19:03 ` Ingo Molnar
1 sibling, 1 reply; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-19 17:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Miller, linux-kernel, gregkh, Andrew Morton,
Thomas Gleixner
Ingo Molnar wrote:
> * David Miller <davem@davemloft.net> wrote:
>
>
>> I suspect that what is happening is that the NOHZ period is longer
>> than the softlockup timeout (10 seconds) and we get an interrupt
>> before the watchdog thread gets onto the cpu.
>>
>
> indeed! Does the patch below do the trick?
>
> Ingo
>
> --------------->
> Subject: softlockup: do the wakeup from a hrtimer
> From: Ingo Molnar <mingo@elte.hu>
>
> David Miller reported soft lockup false-positives that trigger
> on NOHZ due to CPUs idling for more than 10 seconds.
>
> The solution is to drive the wakeup of the watchdog threads
> not from the timer tick (which has no guaranteed frequency),
> but from the watchdog tasks themselves.
>
I thought the timer code kicked the watchdog after waking up after a
long sleep anyway? At one point I was looking into a mechanism to
temporarily disable the watchdog during a wait for a timer event, but it
got complex - and I thought - unnecessary.
Specifically this in kernel/time/tick-sched.c:
	/*
	 * When we are idle and the tick is stopped, we have to touch
	 * the watchdog as we might not schedule for a really long
	 * time. This happens on complete idle SMP systems while
	 * waiting on the login prompt. We also increment the "start of
	 * idle" jiffy stamp so the idle accounting adjustment we do
	 * when we go busy again does not account too much ticks.
	 */
	if (ts->tick_stopped) {
		touch_softlockup_watchdog();
		ts->idle_jiffies++;
	}
Or does this happen on the sleep path? If so, wouldn't the right fix be
to do this on the wakeup path?
J
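
A minimal userspace sketch of the idea of touching the watchdog on the wakeup path, under stated assumptions: `struct cpu_model`, `idle_enter`, `idle_exit`, and `lockup_reported_on_wakeup` are hypothetical names, and the assignment to `touch_timestamp` merely stands in for `touch_softlockup_watchdog()`. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define SOFTLOCKUP_THRESH_SECS 10UL /* threshold mentioned in the thread */

struct cpu_model {
	bool tick_stopped;             /* mirrors ts->tick_stopped in the snippet */
	unsigned long touch_timestamp; /* seconds; last time the watchdog was touched */
};

static void idle_enter(struct cpu_model *cpu)
{
	cpu->tick_stopped = true;
}

/* Leave idle at time `now`; optionally touch the watchdog first. */
static void idle_exit(struct cpu_model *cpu, unsigned long now,
		      bool touch_on_wakeup)
{
	if (cpu->tick_stopped && touch_on_wakeup)
		cpu->touch_timestamp = now; /* stands in for touch_softlockup_watchdog() */
	cpu->tick_stopped = false;
}

/* Returns true if the check right after waking from idle reports a lockup. */
static bool lockup_reported_on_wakeup(unsigned long idle_secs,
				      bool touch_on_wakeup)
{
	struct cpu_model cpu = { .tick_stopped = false, .touch_timestamp = 0 };

	idle_enter(&cpu);
	idle_exit(&cpu, idle_secs, touch_on_wakeup);
	assert(!cpu.tick_stopped); /* the model always leaves idle before checking */

	return idle_secs > cpu.touch_timestamp + SOFTLOCKUP_THRESH_SECS;
}
```

In this model an hour of idle produces no report when the touch happens on wakeup, and no periodic timer is needed while the CPU sleeps.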
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: regression from softlockup fix
2007-11-19 17:15 ` Jeremy Fitzhardinge
@ 2007-11-19 19:03 ` Ingo Molnar
0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2007-11-19 19:03 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: David Miller, linux-kernel, gregkh, Andrew Morton,
Thomas Gleixner
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> I thought the timer code kicked the watchdog after waking up after a
> long sleep anyway? At one point I was looking into a mechanism to
> temporarily disable the watchdog during a wait for a timer event, but
> it got complex - and I thought - unnecessary.
>
> Specifically this in kernel/time/tick-sched.c:
>
> 	/*
> 	 * When we are idle and the tick is stopped, we have to touch
> 	 * the watchdog as we might not schedule for a really long
> 	 * time. This happens on complete idle SMP systems while
> 	 * waiting on the login prompt. We also increment the "start of
> 	 * idle" jiffy stamp so the idle accounting adjustment we do
> 	 * when we go busy again does not account too much ticks.
> 	 */
> 	if (ts->tick_stopped) {
> 		touch_softlockup_watchdog();
> 		ts->idle_jiffies++;
> 	}
>
> Or does this happen on the sleep path? If so, wouldn't the right fix
> be to do this on the wakeup path?
yep, i guess this would do the trick. David, could you try it perhaps
(let me know if i should make a patch for you).
Ingo
^ permalink raw reply [flat|nested] 5+ messages in thread