From: Ingo Molnar <mingo@elte.hu>
To: Andi Kleen <andi@firstfloor.org>
Cc: Arjan van de Ven <arjan@infradead.org>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Sun, 2 Dec 2007 22:10:27 +0100
Message-ID: <20071202211027.GA32282@elte.hu>
In-Reply-To: <20071202204725.GA25891@one.firstfloor.org>
* Andi Kleen <andi@firstfloor.org> wrote:
> > Out of direct experience, 95% of the "too long delay" cases are plain
> > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
>
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It
> would be pretty bad to merge this patch without converting them to
> TASK_KILLABLE first
which we want to do in 2.6.25 anyway, so i don't see any big problems
here. Also, it costs nothing to just stick it in and see the results;
worst case we'd have to flip around the default. I think this is much
ado about nothing - so far i don't really see any objective basis for
your negative attitude.
> There's also the additional issue that even block devices are often
> network or SAN backed these days. Having 120 second delays in there is
> quite possible.
>
> So most likely adding this patch and still keeping a robust kernel
> would require converting most of these delays to TASK_KILLABLE first.
> That would not be a bad thing -- i would often like to kill a process
> stuck on a bad block device -- but is likely a lot of work.
what if you considered - just for a minute - the possibility of this
debug tool being the thing that actually motivates developers to fix such
long delay bugs that have bothered users for almost a decade?
Until now users had little direct recourse to get such problems fixed.
(we had sysrq-t, but that included no real metric of how long a task was
blocked, so there was no direct link in the typical case and users had
no real reliable tool to express their frustration about unreasonable
delays.)
Now this changes: they get a "smoking gun" backtrace reported by the
kernel, and blamed on exactly the place that caused that unreasonable
delay. And it's not like the kernel breaks - at most 10 such messages
are reported per bootup.
We could increase the delay timeout to, say, 300 seconds, and if the
system is under extremely high IO load then 120+ seconds might be a
reasonable delay, but it's all tunable and runtime disable-able anyway.
So if you _know_ that you will see and tolerate such long delays, you
can tweak it - but i can tell you with 100% certainty that 99.9% of the
typical Linux users do not characterize such long delays as "correct
behavior".
> > There are no softlockup false positive bugs open at the moment. If
> > you know about any, then please do not hesitate and report them,
> > i'll be eager to fix them. The softlockup detector is turned on by
> > default in Fedora (alongside lockdep in rawhide), and it helped us
> > find countless
>
> That just means nobody runs stress tests on those. [...]
that is an all-encompassing blanket assertion that sadly drips with ill
will (which permeates your mails lately). I for example run tons of
stress tests on "those" and of course many others do too. So i don't
really know what to think of your statement :-(
> [...] e.g. lockdep tends to explode even on simple stress tests on
> larger systems because it tracks all locks in all dynamic objects in
> memory and towards 6k-10k entries the graph walks tend to take
> multiple seconds on some NUMA systems.
a bug was fixed in this area - can you still see this with 2.6.24-rc3?
[ But i'd be the first one to point out that lockdep is certainly not
from the cheap tools department, that's why i said above that lockdep
is enabled in Fedora rawhide (i.e. development) kernels. The softlockup
detector is much cheaper and it's enabled by default all the time. ]
Ingo