From: Andrew Morton <akpm@linux-foundation.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: a.p.zijlstra@chello.nl, linux-kernel@vger.kernel.org, ego@in.ibm.com
Subject: Re: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Wed, 6 Feb 2008 17:12:30 -0800
Message-ID: <20080206171230.72a058ae.akpm@linux-foundation.org>
In-Reply-To: <20080207005110.GA1457@elte.hu>

On Thu, 7 Feb 2008 01:51:10 +0100
Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > Nope.
> >
> > But I tested it on mainline, and mainline exhibits the
> > never-powers-off symptom, whereas
> > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 demonstrates the
> > powers-off-after-20-seconds symptom.
> >
> > So we _may_ be dealing with two bugs here, and your patch might have
> > fixed the first, but that success is obscured by the second. I guess
> > I need to prepare a tree which has
> > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 at its tip. (Wonders how to
> > do that).
>
> the way i do it in bisection is to do:
>
>   mkdir patches
>   git-log -1 -p ed50d6cbc394cd0966469d3 > patches/fix.patch
>   echo fix.patch > patches/series
>
> and then before testing a bisection point, i do a 'quilt push'. Before
> telling git-bisect about the quality of that bisection point (good/bad)
> i pop it off via 'quilt pop'.
>
> this way the 'required fix' can be kept during the bisection, to find
> the secondary bug.
>
> > btw, mainline (plus this patch, not that it changed anything) prints
> >
> > <stopping disk stuff>
> > Disabling non-boot CPUs
> > CPU 1 is now offline
> >
> > and that's it. This machine has eight cpus. Might be a hint?
>
> what should be the proper message?

Seems that it should be a stream of eight

    CPU n is now offline
    CPU n down

> my suspects, besides there being something wrong in the hung-tasks code
> of the softlockup watchdog, would be the cpu-hotplug commits, or some
> arch/x86 commit. (although we didnt really have anything specifically
> touching the reboot path)
>
> does a stupid patch like the one below tell you more about what the
> other CPUs are doing during this hang? [32-bit only patch]
>
> Ingo
>
> ---
> arch/i386/kernel/nmi.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/nmi_64.c
> +++ linux/arch/x86/kernel/nmi_64.c
> @@ -331,6 +331,14 @@ __kprobes int nmi_watchdog_tick(struct p
>  	int touched = 0;
>  	int cpu = smp_processor_id();
>  	int rc=0;
> +	static int count[NR_CPUS];
> +
> +	if (!count[cpu]) {
> +		count[cpu] = nmi_hz;
> +		printk("CPU#%d, tick\n", cpu);
> +		show_regs(regs);
> +	}
> +	count[cpu]--;
>
>  	/* check for other users first */
>  	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)

I reworked that on top of ed50d6cbc394cd0966469d3e249353c9dd1d38b9: no
change.

However, I watched the VGA console this time (nothing is coming over
netconsole at this stage) and saw this:

    CPU 1 is now offline
    <10 second pause>
    CPU 1 is down
    CPU 2 is now offline
    CPU 2 is down
    CPU 3 is now offline
    CPU 3 is down
    CPU 4 is now offline
    <10 second pause>

followed by a quick spew of the remaining CPUs going down and offline, then
poweroff.
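
For anyone reading the quoted debug patch: its throttling can be sketched in
user space roughly as below. This is an illustration only, not kernel code.
NR_CPUS is set to 8 to match the machine in this report, nmi_hz is given an
arbitrary value of 4 (in the kernel it is a variable, and its real value is
not stated in this thread), and should_report() is a hypothetical helper that
stands in for the printk()/show_regs() pair.

```c
#define NR_CPUS 8	/* assumption: the eight-CPU box from this report */

/* Assumed value for illustration; stands in for the kernel's nmi_hz. */
static const int nmi_hz = 4;

/* Per-CPU countdown, zero-initialised like the static array in the patch. */
static int count[NR_CPUS];

/*
 * Hypothetical helper: returns 1 when this tick on @cpu would print,
 * 0 otherwise.  Same logic as the patch: print when the counter is
 * zero, reload it with nmi_hz, and decrement on every tick.
 */
static int should_report(int cpu)
{
	int report = 0;

	if (!count[cpu]) {
		count[cpu] = nmi_hz;
		report = 1;
	}
	count[cpu]--;

	return report;
}
```

With these assumptions each CPU reports on its first tick and then once every
nmi_hz ticks, independently of the other CPUs, so a CPU that stops taking
watchdog NMIs simply stops appearing in the output.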