Re: 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.

From: Darren Hart <dvhltc@us.ibm.com>
To: Valdis.Kletnieks@vt.edu
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.
Date: Thu, 05 Nov 2009 11:22:06 -0800
Message-ID: <4AF325DE.4000901@us.ibm.com>
In-Reply-To: <5906.1257443268@turing-police.cc.vt.edu>

Valdis.Kletnieks@vt.edu wrote:

Hi Valdis,

Thanks for reporting. There are a couple of things of interest below, but
first: which kernel version exactly?

Specifically, do you have the following patches applied?

43746940a0067656b612490e921ee8e782f12e30 futex: Fix spurious wakeup for requeue_pi r
e814515d47b9e15ebaa08bab0559d189e8ec90eb futex: Detect mismatched requeue targets
41890f2456998c170f416fc29715fadfd57e6626 futex: Handle spurious wake up
370eaf38450c77ec9b5ce6bc74bc575b2e2ce448 futex: Revert "futex: Wake up waiter outsid
a03d103555aa7b3e0c39a9bc9608502d3354392f futex: Fix wakeup race by setting TASK_INTE

> (Hmm.. I seem to be on a roll on this -mmotm, breaking all sorts of stuff.. :)
> 
> Am cc'ing Thomas and Darren because their names were attached to commits in
> the origin.patch that touched futex.c
> 
> It looks like pulseaudio clients with multiple threads manage to hose up
> the futex code to the point they're not kill -9'able.  Semi-replicatable,
> as I've hit it twice by accident. No recipe for triggering it yet.
> 
> Did it once to gyachi (a Yahoo Messenger client) and  twice to pidgin (an
> everything-else IM client). 'top' would report 100%CPU usage, all of it kernel
> mode, and it was confirmed by the CPU going to top Ghz and warming up some 6-7
> degrees (so we were spinning on something rather than a wait/deadlock). In both
> cases, I tried to kill -9 the process, the process didn't go away.
> 
> Here's the 'alt-sysrq-t' for both cases.  I started a second pidgin the second
> time around, that one wedged real fast (on the first thread it created) and
> didn't get kill -9'ed (if that makes a diff in the stack trace...)
> 
> gyachi wedged up - main thread kept going, subthread hung.

> 
> [44347.339018] gyachi        ? ffff88000260e010  3856  3183   2393 0x00000080
> [44347.339018]  ffff88006c3cfeb8 0000000000000046 ffff88006c3cfe80 ffff88006c3cfe7c
> [44347.339018]  ffff88006c3cfe28 0000000000000000 0000000000000155 ffff88006c0dabc0
> [44347.339018]  ffff88006c3ce000 000000000000e010 ffff88006c0dabc0 00000001029f3766
> [44347.339018] Call Trace:
> [44347.339018]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
> [44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
> [44347.339018]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
> [44347.339018]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
> [44347.339018]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
> [44347.339018] gyachi        R  running task     5344  3187   2393 0x00000084
> [44347.339018]  ffff88006c2c6b40 0000000000000002 ffff88007967f988 ffffffff81066193
> [44347.339018]  ffff88007967f998 ffffffff81066193 ffffffff823ceab0 0000000000000000
> [44347.339018]  000000007967fab8 ffffffff814bd184 0000000000000000 ffff88007f8b0000
> [44347.339018] Call Trace:
> [44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
> [44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
> [44347.339018]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [44347.339018]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
> [44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
> [44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
> [44347.339018]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
> [44347.339018]  [<ffffffff81030429>] ? get_parent_ip+0x11/0x41
> [44347.339018]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
> [44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
> [44347.339018]  [<ffffffff810692d2>] ? queue_unlock+0x1d/0x21
> [44347.339018]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
> [44347.339018]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4

I see futex_wait_requeue_pi a couple of times in this trace. It indicates
use of the requeue_pi feature, which you shouldn't be able to reach without
a not-yet-released version of glibc and applications that use
PTHREAD_PRIO_INHERIT pthread mutexes. Neither of the apps you mentioned
seems like a good candidate for that. Do you have some other RT workload
running?
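
For reference, this is roughly what an application has to do to end up
with a PTHREAD_PRIO_INHERIT mutex in the first place. It is only a minimal
sketch: the helper name init_pi_mutex is made up for illustration, and it
shows just the standard POSIX attribute calls, not what either of the apps
in question actually does.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose PTHREAD_PRIO_INHERIT under strict -std= modes */
#endif
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical helper (not from any real application): initialise *m as a
 * priority-inheritance mutex.  Returns 0 on success or an errno-style
 * error code from the pthread calls.
 */
static int init_pi_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int err;

	assert(m != NULL);

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;

	/* The default protocol is PTHREAD_PRIO_NONE; PI has to be requested. */
	err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (!err)
		err = pthread_mutex_init(m, &attr);

	/* The attribute object is no longer needed once the mutex exists. */
	pthread_mutexattr_destroy(&attr);
	return err;
}
```

An application that never calls pthread_mutexattr_setprotocol() this way
gets plain (non-PI) mutexes, which is why seeing the requeue_pi path here
is surprising.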

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

