From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <andi@firstfloor.org>
Cc: Arjan van de Ven <arjan@infradead.org>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Sun, 2 Dec 2007 22:10:27 +0100
Message-ID: <20071202211027.GA32282@elte.hu>
In-Reply-To: <20071202204725.GA25891@one.firstfloor.org>


* Andi Kleen <andi@firstfloor.org> wrote:

> > Out of direct experience, 95% of the "too long delay" cases are plain 
> > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 
> 
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).  It 
> would be pretty bad to merge this patch without converting them to 
> TASK_KILLABLE first

which we want to do in 2.6.25 anyway, so i don't see any big problems
here. Also, it costs nothing to just stick it in and see the results;
worst case we'd have to flip around the default. I think this is much
ado about nothing - so far i don't really see any objective basis for
your negative attitude.

> There's also the additional issue that even block devices are often 
> network or SAN backed these days. Having 120 second delays in there is 
> quite possible.
>
> So most likely adding this patch and still keeping a robust kernel 
> would require converting most of these delays to TASK_KILLABLE first. 
> That would not be a bad thing -- i would often like to kill a process 
> stuck on a bad block device -- but is likely a lot of work.

what if you considered - just for a minute - the possibility of this
debug tool being the thing that actually motivates developers to fix
such long-delay bugs, which have bothered users for almost a decade now?

Until now users had little direct recourse to get such problems fixed. 
(we had sysrq-t, but that included no real metric of how long a task was 
blocked, so there was no direct link in the typical case and users had 
no real reliable tool to express their frustration about unreasonable 
delays.)

Now this changes: they get a "smoking gun" backtrace reported by the 
kernel, and blamed on exactly the place that caused that unreasonable 
delay. And it's not like the kernel breaks - at most 10 such messages 
are reported per bootup.

We could increase the delay timeout to, say, 300 seconds, since under
extremely high IO load 120+ seconds might be a reasonable delay - it's
all tunable and runtime disable-able anyway. So if you _know_ that
you will see and tolerate such long delays, you can tweak it - but i can
tell you with 100% certainty that 99.9% of the typical Linux users do
not characterize such long delays as "correct behavior".
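
The policy described above (report a task that has sat blocked past a
tunable timeout, cap the number of reports per boot, allow disabling at
runtime) can be sketched as a small userspace model. This is a hedged
illustration in Python, not the kernel patch: the names Task,
HungTaskDetector, switch_count and check(), the use of a context-switch
counter to detect progress, and the "timeout of 0 means disabled"
convention are all assumptions made for the example.

```python
from dataclasses import dataclass

TIMEOUT_SECS = 120   # example default; the thread treats this as tunable
MAX_WARNINGS = 10    # "at most 10 such messages are reported per bootup"


@dataclass
class Task:
    name: str
    uninterruptible: bool        # models TASK_UNINTERRUPTIBLE (assumption)
    switch_count: int            # context switches observed so far
    last_switch_count: int = -1  # value seen on the previous check
    last_change: float = 0.0     # time the task last showed progress


class HungTaskDetector:
    """Hypothetical model of the reporting policy, not kernel code."""

    def __init__(self, timeout=TIMEOUT_SECS, max_warnings=MAX_WARNINGS):
        self.timeout = timeout
        self.warnings_left = max_warnings

    def check(self, tasks, now):
        """Return the names of tasks reported as hung at time `now`."""
        reported = []
        if self.timeout == 0:    # assumed convention: 0 disables the check
            return reported
        for t in tasks:
            made_progress = t.switch_count != t.last_switch_count
            if not t.uninterruptible or made_progress:
                # Not blocked, or it was scheduled since the last check:
                # restart the clock for this task.
                t.last_switch_count = t.switch_count
                t.last_change = now
                continue
            blocked_for = now - t.last_change
            if blocked_for >= self.timeout and self.warnings_left > 0:
                self.warnings_left -= 1
                reported.append(t.name)
                t.last_change = now  # do not re-report on every pass
        return reported
```

A task that never gets scheduled again is reported once the timeout
elapses; raising `timeout` to 300 models the "tweak it" case, and the
warning budget keeps the log bounded however many tasks are stuck.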

> > There are no softlockup false positive bugs open at the moment. If 
> > you know about any, then please do not hesitate and report them, 
> > i'll be eager to fix them. The softlockup detector is turned on by 
> > default in Fedora (alongside lockdep in rawhide), and it helped us 
> > find countless
> 
> That just means nobody runs stress tests on those. [...]

that is an all-encompassing blanket assertion that sadly drips with ill
will (which permeates your mails lately). I for example run tons of
stress tests on "those" and of course many others do too. So i don't
really know what to think of your statement :-(

> [...] e.g. lockdep tends to explode even on simple stress tests on 
> larger systems because it tracks all locks in all dynamic objects in 
> memory and towards 6k-10k entries the graph walks tend to take 
> multiple seconds on some NUMA systems.

a bug was fixed in this area - can you still see this with 2.6.24-rc3?

[ But i'd be the first one to point out that lockdep is certainly not
  from the cheap tools department, that's why i said above that lockdep
  is enabled in Fedora rawhide (i.e. development) kernels. The softlockup
  detector is much cheaper and is enabled by default all the time. ]

	Ingo
