public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <andi@firstfloor.org>
Cc: Arjan van de Ven <arjan@infradead.org>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Sun, 2 Dec 2007 22:10:27 +0100
Message-ID: <20071202211027.GA32282@elte.hu>
In-Reply-To: <20071202204725.GA25891@one.firstfloor.org>


* Andi Kleen <andi@firstfloor.org> wrote:

> > Out of direct experience, 95% of the "too long delay" cases are plain 
> > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 
> 
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It 
> would be pretty bad to merge this patch without converting them to 
> TASK_KILLABLE first.

which we want to do in 2.6.25 anyway, so I don't see any big problems 
here. Also, it costs nothing to just stick it in and see the results; 
worst case, we'd have to flip the default. I think this is much ado 
about nothing - so far I don't really see any objective basis for your 
negative attitude.

> There's also the additional issue that even block devices are often 
> network or SAN backed these days. Having 120 second delays in there is 
> quite possible.
>
> So most likely adding this patch and still keeping a robust kernel 
> would require converting most of these delays to TASK_KILLABLE first. 
> That would not be a bad thing -- I would often like to kill a process 
> stuck on a bad block device -- but it is likely a lot of work.

what if you considered - just for a minute - the possibility that this 
debug tool is the thing that actually motivates developers to fix the 
long-delay bugs that have bothered users for almost a decade now?

Until now users had little direct recourse to get such problems fixed. 
(We had sysrq-t, but it gave no real metric of how long a task had been 
blocked, so in the typical case nothing tied a backtrace to the delay, 
and users had no reliable tool to express their frustration about 
unreasonable delays.)

Now this changes: they get a "smoking gun" backtrace reported by the 
kernel, and blamed on exactly the place that caused that unreasonable 
delay. And it's not like the kernel breaks - at most 10 such messages 
are reported per bootup.

We could increase the timeout to, say, 300 seconds. And if the system 
is under extremely high IO load, a delay of 120+ seconds might be 
reasonable, which is why it's all tunable and can be disabled at runtime 
anyway. So if you _know_ that you will see and tolerate such long 
delays, you can tweak it - but I can tell you with 100% certainty that 
99.9% of typical Linux users do not consider such long delays "correct 
behavior".

> > There are no softlockup false positive bugs open at the moment. If 
> > you know about any, then please do not hesitate and report them, 
> > i'll be eager to fix them. The softlockup detector is turned on by 
> > default in Fedora (alongside lockdep in rawhide), and it helped us 
> > find countless
> 
> That just means nobody runs stress tests on those. [...]

that is an all-encompassing blanket assertion that sadly drips with ill 
will (which permeates your mails lately). I, for example, run tons of 
stress tests on "those", and of course many others do too. So I don't 
really know what to think of your statement :-(

> [...] e.g. lockdep tends to explode even on simple stress tests on 
> larger systems because it tracks all locks in all dynamic objects in 
> memory and towards 6k-10k entries the graph walks tend to take 
> multiple seconds on some NUMA systems.

a bug was fixed in this area - can you still see this with 2.6.24-rc3?

[ But I'd be the first one to point out that lockdep is certainly not
  from the cheap tools department; that's why I said above that lockdep
  is enabled in Fedora rawhide (i.e. development) kernels. The
  softlockup detector is much cheaper and is enabled by default all the
  time. ]

	Ingo
