From: Andi Kleen <andi@firstfloor.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>,
Radoslaw Szkodzinski <lkml@astralstorm.puszkin.org>,
Arjan van de Ven <arjan@infradead.org>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks
Date: Mon, 3 Dec 2007 13:41:44 +0100
Message-ID: <20071203124144.GC2986@one.firstfloor.org>
In-Reply-To: <20071203122833.GA20232@elte.hu>
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote:
>
> > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > no. (that's why i added the '(or a kill -9)' qualification above - if
> > > NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> > > should not have an interrupting effect.)
> >
> > NFS is already interruptible with umount -f (I use that all the
> > time...), but softlockup won't know that and throw the warning
> > anyways.
>
> umount -f is a spectacularly unintelligent solution (it requires the
> user to know precisely which path to umount, etc.),
lsof | grep programname
> TASK_KILLABLE is a lot more useful.
Not sure it is better on all measures.

One problem is how to distinguish between program abort
(which only affects the program) and IO abort (which leaves
EIO marked pages in the page cache, affecting other processes too).
umount -f does this at least.

I didn't think TASK_KILLABLE has solved that cleanly (although
I admit I haven't read the latest patchkit; perhaps that has changed
since the first iteration).

But it also probably doesn't make things much worse than they were before.
>
> > > your syslet snide comment aside (which is quite incomprehensible - a
> >
> > For the record I have no principle problem with syslets, just I do
> > consider them roughly equivalent in end result to a explicit retry
> > based AIO implementation.
>
> which suggests you have not really understood syslets. Syslets have no
That's possible.
> "retry" component, they just process straight through the workflow.
> Retry based AIO has a retry component, which - as its name suggests
> already - retries operations instead of processing through the workload
> intelligently. Depending on how "deep" the context of an operation the
> retries might or might not make a noticeable difference in performance,
> but it sure is an inferior approach.
Not sure what is so much less intelligent about retry (are you
referring to the extra CPU cycles needed?), but I admit I haven't
thought very deeply about that.
>
> > > retry based asynchonous IO model is clearly inferior even if it were
> > > implemented everywhere), i do think that most if not all of these
> > > supposedly "difficult to fix" codepaths are just on the backburner
> > > out of lack of a clear blame vector.
> >
> > Hmm. -ENOPARSE. Can you please clarify?
>
> which bit was unclear to you? The retry bit i've explained above, lemme
> know if there's any other unclarity.
The clear blame vector bit was unclear.
> > > nice euphemism for hiding from the blame forever. We had 10 years
> > > for it
> >
> > Ok your approach is then to "let's warn about it and hope it will go
> > away"
>
> s/hope//, but yes. Surprisingly, this works quite well :-) [as long as
> the warnings are not excessively bogus, of course]
Well, I consider a backtrace excessively bogus.
> > Anyways I think I could live with it a one liner warning (if it's
> > seriously rate limited etc.) and a sysctl to enable the backtraces;
> > off by default. Or if you prefer that record the backtrace always in a
> > buffer and make it available somewhere in /proc or /sys or /debug.
> > Would that work for you?
>
> you are over-designing it way too much - a backtrace is obviously very
> helpful and it must be printed by default. There's enough
> configurability in it already so that you can turn it off if you want.
So it will hit everybody first before they can figure out how
to get rid of it? That was the part I was objecting to.

If it is decided to warn about something which is not 100% clearly a bug
(and I think I have established this for now -- at least you didn't
object to many of my examples...) then the likely
false positives shouldn't be too obnoxious. Backtraces are unfortunately
obnoxious and always come at a high cost (worried users, Linux's reputation
as a buggy OS, mailing list bandwidth, support load etc.), and having that
for too many false positives is a bad thing.
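
To make the alternative concrete, here is a minimal userspace sketch of
the policy proposed earlier in the thread: a one-line, rate-limited
warning by default, with the backtrace printed only when a switch is
turned on. This is not kernel code and not the softlockup patch. The
class name, the 60 second interval and the backtrace_enabled flag
(standing in for a sysctl) are all assumptions made for illustration.

```python
import time


class HungTaskReporter:
    """Hypothetical policy sketch: short warning by default, backtrace opt-in."""

    def __init__(self, min_interval=60.0, backtrace_enabled=False,
                 clock=time.monotonic):
        self.min_interval = min_interval            # assumed rate limit, in seconds
        self.backtrace_enabled = backtrace_enabled  # stands in for a sysctl, off by default
        self.clock = clock                          # injectable so the policy can be tested
        self.last_warned = None
        self.suppressed = 0

    def report(self, task_name, blocked_secs, backtrace):
        """Return the lines that would be logged for one detection."""
        now = self.clock()
        # Rate limit: drop the report if the previous warning was too recent.
        if self.last_warned is not None and now - self.last_warned < self.min_interval:
            self.suppressed += 1
            return []
        lines = [f"task {task_name} blocked for more than {blocked_secs} seconds"]
        if self.suppressed:
            lines.append(f"({self.suppressed} similar warnings suppressed)")
            self.suppressed = 0
        # The noisy part is only emitted when explicitly enabled.
        if self.backtrace_enabled:
            lines.extend(backtrace)
        self.last_warned = now
        return lines
```

With the defaults a user sees at most one short line per interval, and
whoever is actually debugging flips the flag to get the full trace.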
> (And you said SLES has softlockup turned off already so it shouldnt
> affect you anyway.)
My objection was not really for SLES, but for general Linux kernel
quality.
-Andi