All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: David Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Soft lockup regression from today's sched.git merge.
Date: Tue, 22 Apr 2008 11:14:56 +0200	[thread overview]
Message-ID: <20080422091456.GC9939@elte.hu> (raw)
In-Reply-To: <20080422.015931.70144614.davem@davemloft.net>


* David Miller <davem@davemloft.net> wrote:

> The following commit:
> 
> commit 27ec4407790d075c325e1f4da0a19c56953cce23
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Thu Feb 28 21:00:21 2008 +0100
> 
>     sched: make cpu_clock() globally synchronous
>     
>     Alexey Zaytsev reported (and bisected) that the introduction of
>     cpu_clock() in printk made the timestamps jump back and forth.
>     
>     Make cpu_clock() more reliable while still keeping it fast when it's
>     called frequently.
>     
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> causes watchdog triggers when a cpu exits NOHZ state when it has been 
> there for >= the soft lockup threshold, for example here are some 
> messages from a 128 cpu Niagara2 box:
> 
> [  168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
> [  168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
> [  168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
> [  168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
> [  169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
> [  169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
> [  169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
> [  169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
> [  169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
> [  169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
> [  169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
> [  169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
> [  169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
> [  169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
> [  169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
> [  169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
> [  169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
> [  169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
> [  169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
> [  169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
> [  169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]
> 
> It's simple enough to trigger this by doing a 10 minute sleep after a 
> fresh bootup then starting a parallel kernel build.
> 
> I suspect this might be reintroducing a problem we've had and fixed 
> before, see the thread:
> 
> http://marc.info/?l=linux-kernel&m=119546414004065&w=2

yeah, it looks very similar.

> Please have a look, thank you.

thanks for reporting it. I havent seen this false positive happen in a 
long time - but then again, PC CPUs are a lot less idle than a 128-CPU 
Niagara2 :-/ I'm wondering what the best method would be to provoke a 
CPU to stay idle that long - to make sure this bug is fixed.

so i only have the untested patch below for now - does it fix the bug 
for you?

	Ingo

----------------------------------->
Subject: softlockup: fix NOHZ wakeup
From: Ingo Molnar <mingo@elte.hu>

David Miller reported:

|--------------->
the following commit:

| commit 27ec4407790d075c325e1f4da0a19c56953cce23
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Thu Feb 28 21:00:21 2008 +0100
|
|     sched: make cpu_clock() globally synchronous
|
|     Alexey Zaytsev reported (and bisected) that the introduction of
|     cpu_clock() in printk made the timestamps jump back and forth.
|
|     Make cpu_clock() more reliable while still keeping it fast when it's
|     called frequently.
|
|     Signed-off-by: Ingo Molnar <mingo@elte.hu>

causes watchdog triggers when a cpu exits NOHZ state when it has been
there for >= the soft lockup threshold, for example here are some
messages from a 128 cpu Niagara2 box:

[  168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
[  168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
[  168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
[  168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
[  169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
[  169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
[  169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
[  169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
[  169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
[  169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
[  169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
[  169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
[  169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
[  169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
[  169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
[  169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
[  169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
[  169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
[  169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
[  169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
[  169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]

It's simple enough to trigger this by doing a 10 minute sleep after a
fresh bootup then starting a parallel kernel build.

I suspect this might be reintroducing a problem we've had and fixed
before, see the thread:

http://marc.info/?l=linux-kernel&m=119546414004065&w=2
<---------------|

touch the softlockup watchdog when exiting NOHZ state - we are
obviously not locked up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/time/tick-sched.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -393,6 +393,7 @@ void tick_nohz_restart_sched_tick(void)
 		sub_preempt_count(HARDIRQ_OFFSET);
 	}
 
+	touch_softlockup_watchdog();
 	/*
 	 * Cancel the scheduled timer and restore the tick
 	 */

  reply	other threads:[~2008-04-22  9:15 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-22  8:59 Soft lockup regression from today's sched.git merge David Miller
2008-04-22  9:14 ` Ingo Molnar [this message]
2008-04-22 10:05   ` David Miller
2008-04-22 12:45     ` Peter Zijlstra
2008-05-06 22:41       ` Rafael J. Wysocki
2008-05-06 23:05         ` David Miller
2008-05-07  6:43           ` Ingo Molnar
2008-05-07 18:56             ` Rafael J. Wysocki
2008-04-23  8:50     ` [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds Ingo Molnar
2008-04-23 10:55       ` David Miller
2008-04-23 12:29         ` David Miller
2008-04-23 13:36           ` Ingo Molnar
2008-04-23 23:23             ` David Miller
2008-04-23  5:42   ` Soft lockup regression from today's sched.git merge David Miller
2008-04-23  7:32     ` Dhaval Giani
2008-04-23  7:51     ` Ingo Molnar
2008-04-23  9:40     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080422091456.GC9939@elte.hu \
    --to=mingo@elte.hu \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.