From: Andrew Morton <akpm@linux-foundation.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: a.p.zijlstra@chello.nl, linux-kernel@vger.kernel.org, ego@in.ibm.com
Subject: Re: softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Wed, 6 Feb 2008 17:12:30 -0800
Message-ID: <20080206171230.72a058ae.akpm@linux-foundation.org>
In-Reply-To: <20080207005110.GA1457@elte.hu>

On Thu, 7 Feb 2008 01:51:10 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Nope.
> > 
> > But I tested it on mainline, and mainline exhibits the 
> > never-powers-off symptom, whereas 
> > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 demonstrates the 
> > powers-off-after-20-seconds symptom.
> > 
> > So we _may_ be dealing with two bugs here, and your patch might have 
> > fixed the first, but that success is obscured by the second.  I guess 
> > I need to prepare a tree which has 
> > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 at its tip.  (Wonders how to 
> > do that).
> 
> the way i do it in bisection is to do:
> 
>   mkdir patches
>   git-log -1 -p ed50d6cbc394cd0966469d3 > patches/fix.patch
>   echo fix.patch > patches/series
> 
> and then before testing a bisection point, i do a 'quilt push'. Before 
> telling git-bisect about the quality of that bisection point (good/bad) 
> i pop it off via 'quilt pop'.
> 
> this way the 'required fix' can be kept during the bisection, to find 
> the secondary bug.
> 
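
A rough sketch of how that push/test/pop cycle could be wrapped for
'git bisect run', assuming the patches/ directory has been set up as
above.  The function name bisect_step and the QUILT variable are made up
for illustration.  It relies on the exit-code convention of 'git bisect
run': 0 means good, 125 means skip, other codes up to 127 mean bad, and
anything above 127 aborts the bisection.

```shell
# Sketch only: bisect_step and QUILT are hypothetical names, not part of
# quilt or git.
QUILT="${QUILT:-quilt}"

# Usage: bisect_step <test command> [args...]
bisect_step() {
    # Apply the required fix; if it does not apply at this revision,
    # ask git bisect to skip it (exit code 125).
    "$QUILT" push || return 125

    # Run the real test with the fix in place and remember the verdict.
    if "$@"; then
        status=0
    else
        status=1
    fi

    # Remove the fix again so git can check out the next revision.
    # A value above 127 makes 'git bisect run' abort.
    "$QUILT" pop || return 255

    return "$status"
}
```

A wrapper script that sources this and ends with 'bisect_step ./my-test.sh'
could then be handed to 'git bisect run', where my-test.sh is a placeholder
for whatever builds and tests the kernel.
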
> > btw, mainline (plus this patch, not that it changed anything) prints
> > 
> > <stopping disk stuff>
> > Disabling non-boot CPUs
> > CPU 1 is now offline
> > 
> > and that's it.   This machine has eight cpus.  Might be a hint?
> 
> what should be the proper message?

Seems that it should be a stream of eight

CPU n is now offline
CPU n down

> my suspects, besides there being something wrong in the hung-tasks code 
> of the softlockup watchdog, would be the cpu-hotplug commits, or some 
> arch/x86 commit. (although we didn't really have anything specifically 
> touching the reboot path)
> 
> does a stupid patch like the one below tell you more about what the 
> other CPUs are doing during this hang? [32-bit only patch]
> 
> 	Ingo
> 
> ---
>  arch/i386/kernel/nmi.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> Index: linux/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/nmi_64.c
> +++ linux/arch/x86/kernel/nmi_64.c
> @@ -331,6 +331,14 @@ __kprobes int nmi_watchdog_tick(struct p
>  	int touched = 0;
>  	int cpu = smp_processor_id();
>  	int rc=0;
> +	static int count[NR_CPUS];
> +
> +	if (!count[cpu]) {
> +		count[cpu] = nmi_hz;
> +		printk("CPU#%d, tick\n", cpu);
> +		show_regs(regs);
> +	}
> +	count[cpu]--;
>  
>  	/* check for other users first */
>  	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)

I reworked that on top of ed50d6cbc394cd0966469d3e249353c9dd1d38b9: no
change.

However, when I watched the VGA console this time (nothing is coming over
netconsole at this stage), I saw this:


CPU 1 is now offline
<10 second pause>
CPU 1 is down
CPU 2 is now offline
CPU 2 is down
CPU 3 is now offline
CPU 3 is down
CPU 4 is now offline
<10 second pause>

followed by a quick spew of the remaining CPUs going offline and down, then
poweroff.

