public inbox for linux-kernel@vger.kernel.org
* 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.
@ 2009-11-05 17:47 Valdis.Kletnieks
  2009-11-05 19:20 ` Thomas Gleixner
  2009-11-05 19:22 ` Darren Hart
  0 siblings, 2 replies; 3+ messages in thread
From: Valdis.Kletnieks @ 2009-11-05 17:47 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Darren Hart; +Cc: linux-kernel


(Hmm.. I seem to be on a roll on this -mmotm, breaking all sorts of stuff.. :)

Am cc'ing Thomas and Darren because their names were attached to commits in
the origin.patch that touched futex.c

It looks like pulseaudio clients with multiple threads manage to hose up
the futex code to the point they're not kill -9'able.  Semi-reproducible,
as I've hit it twice by accident. No recipe for triggering it yet.

Did it once to gyachi (a Yahoo Messenger client) and twice to pidgin (an
everything-else IM client). 'top' would report 100% CPU usage, all of it kernel
mode, confirmed by the CPU going to top GHz and warming up some 6-7 degrees
(so we were spinning on something rather than sitting in a wait/deadlock). In
both cases I tried to kill -9 the process, but the process didn't go away.

Here's the 'alt-sysrq-t' output for both cases.  I started a second pidgin the
second time around; that one wedged really fast (on the first thread it
created) and didn't get kill -9'ed (if that makes a difference in the stack
trace...)

gyachi wedged up - main thread kept going, subthread hung.

[44347.339018] gyachi        ? ffff88000260e010  3856  3183   2393 0x00000080
[44347.339018]  ffff88006c3cfeb8 0000000000000046 ffff88006c3cfe80 ffff88006c3cfe7c
[44347.339018]  ffff88006c3cfe28 0000000000000000 0000000000000155 ffff88006c0dabc0
[44347.339018]  ffff88006c3ce000 000000000000e010 ffff88006c0dabc0 00000001029f3766
[44347.339018] Call Trace:
[44347.339018]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[44347.339018]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[44347.339018]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[44347.339018]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[44347.339018] gyachi        R  running task     5344  3187   2393 0x00000084
[44347.339018]  ffff88006c2c6b40 0000000000000002 ffff88007967f988 ffffffff81066193
[44347.339018]  ffff88007967f998 ffffffff81066193 ffffffff823ceab0 0000000000000000
[44347.339018]  000000007967fab8 ffffffff814bd184 0000000000000000 ffff88007f8b0000
[44347.339018] Call Trace:
[44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[44347.339018]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[44347.339018]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[44347.339018]  [<ffffffff81030429>] ? get_parent_ip+0x11/0x41
[44347.339018]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff810692d2>] ? queue_unlock+0x1d/0x21
[44347.339018]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
[44347.339018]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4
[44347.339018]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[44347.339018]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[44347.339018]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[44347.339018]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[44347.339018]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[44347.339018]  [<ffffffff8106c11f>] ? do_futex+0x95d/0x9cb
[44347.339018]  [<ffffffff8106c2d9>] ? sys_futex+0x14c/0x164
[44347.339018]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[44347.339018]  [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b

After the reboot, it bit again, pidgin this time.  Since the main thread
is the one that wedged, it locked up hard.

[ 1730.490005] pidgin        R  running task     4112  4195   2312 0x00000084
[ 1730.490005]  ffff880068889a08 ffffffff81066193 ffff880068889b54 0000000000000000
[ 1730.490005]  ffff880068889ae8 ffff880068aa8c80 0000000000000002 0000000000000000
[ 1730.490005]  ffffffff81069189 0000000000000000 ffff880068889ab8 0000000000000246
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
[ 1730.490005]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[ 1730.490005]  [<ffffffff814bd506>] ? _spin_lock+0x36/0x45
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814bd9bb>] ? _spin_unlock+0x26/0x6a
[ 1730.490005]  [<ffffffff810691bf>] ? get_futex_value_locked+0x2b/0x49
[ 1730.490005]  [<ffffffff810692c9>] ? queue_unlock+0x14/0x21
[ 1730.490005]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
[ 1730.490005]  [<ffffffff81097e34>] ? ftrace_likely_update+0xc/0x14
[ 1730.490005]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4
[ 1730.490005]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[ 1730.490005]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[ 1730.490005]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[ 1730.490005]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[ 1730.490005]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[ 1730.490005]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[ 1730.490005]  [<ffffffff8106c11f>] ? do_futex+0x95d/0x9cb
[ 1730.490005]  [<ffffffff8106c2d9>] ? sys_futex+0x14c/0x164
[ 1730.490005]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[ 1730.490005]  [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b

(This is me starting another one because the first one wedged. It wedged too, but
I don't remember kill -9'ing this one...)

[ 1730.490005] pidgin        R  running task     5672  4220   2312 0x00000084
[ 1730.490005]  ffff880057ce7a18 0000000000000046 ffff8800026133c0 ffff88005410a380
[ 1730.490005]  ffff880057ce7978 ffff8800026133c0 ffff880057ce7998 ffff880057f3c4c0
[ 1730.490005]  ffff880057ce6000 000000000000e010 ffff880057f3c4c8 ffffffff81030d7e
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff81030d7e>] ? finish_task_switch+0x95/0xb8
[ 1730.490005]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1730.490005]  [<ffffffff814bb7bd>] preempt_schedule_irq+0x56/0x73
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814be3d6>] retint_kernel+0x26/0x30
[ 1730.490005]  [<ffffffff81069128>] ? get_futex_key+0x24e/0x25f
[ 1730.490005]  [<ffffffff81068fa7>] ? get_futex_key+0xcd/0x25f
[ 1730.490005]  [<ffffffff814bd9f1>] ? _spin_unlock+0x5c/0x6a
[ 1730.490005]  [<ffffffff81069319>] futex_wait_setup+0x43/0xeb
[ 1730.490005]  [<ffffffff81068fa7>] ? get_futex_key+0xcd/0x25f
[ 1730.490005]  [<ffffffff8106ae9d>] futex_wait_requeue_pi+0x190/0x3d4
[ 1730.490005]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[ 1730.490005]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[ 1730.490005]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[ 1730.490005]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[ 1730.490005]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[ 1730.490005]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[ 1730.490005]  [<ffffffff8106c11f>] do_futex+0x95d/0x9cb
[ 1730.490005]  [<ffffffff811c2bd6>] ? __up_read+0x1a/0x7f
[ 1730.490005]  [<ffffffff811c2bd6>] ? __up_read+0x1a/0x7f
[ 1730.490005]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff814bda71>] ? _spin_unlock_irqrestore+0x72/0x80
[ 1730.490005]  [<ffffffff811c2c32>] ? __up_read+0x76/0x7f
[ 1730.490005]  [<ffffffff8106c2d9>] sys_futex+0x14c/0x164
[ 1730.490005]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b

And the rest of the thread from the first one I started. They're all
packed up and ready to leave Dodge on the first stagecoach, but the one
thread is still stuck in the saloon and unable to find its way out...

[ 1730.490005] pidgin        ? ffff88007f872040  5568  4214   4195 0x00000080
[ 1730.490005]  ffff880054033eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001076 ffff880057c4b280
[ 1730.490005]  ffff880054032000 000000000000e010 ffff880057c4b280 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5568  4215   4195 0x00000080
[ 1730.490005]  ffff880054059eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001077 ffff8800689d0f00
[ 1730.490005]  ffff880054058000 000000000000e010 ffff8800689d0f00 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5568  4216   4195 0x00000080
[ 1730.490005]  ffff8800542e7eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001078 ffff88006887ad40
[ 1730.490005]  ffff8800542e6000 000000000000e010 ffff88006887ad40 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5440  4217   4195 0x00000080
[ 1730.490005]  ffff880053c7deb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001079 ffff880053c7a440
[ 1730.490005]  ffff880053c7c000 000000000000e010 ffff880053c7a440 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b






* Re: 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.
  2009-11-05 17:47 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex Valdis.Kletnieks
@ 2009-11-05 19:20 ` Thomas Gleixner
  2009-11-05 19:22 ` Darren Hart
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Gleixner @ 2009-11-05 19:20 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Andrew Morton, Darren Hart, linux-kernel

On Thu, 5 Nov 2009, Valdis.Kletnieks@vt.edu wrote:

> (Hmm.. I seem to be on a roll on this -mmotm, breaking all sorts of stuff.. :)
> 
> Am cc'ing Thomas and Darren because their names were attached to commits in
> the origin.patch that touched futex.c

Looks like you are hitting the bug we fixed last week. 

Thanks,

	tglx
---
commit 11df6dddcbc38affb7473aad3d962baf8414a947
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Oct 28 20:26:48 2009 +0100

    futex: Fix spurious wakeup for requeue_pi really
    
    The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
    NULL test) nor does it use the wake_list of futex_wake(), which were
    the reasons for commit 41890f2 (futex: Handle spurious wake up).
    
    See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
    
    The changes in this fix to the wait_requeue_pi path were considered a
    likely unnecessary, but harmless, safety net. But it turns out that,
    because for unknown $@#!*( reasons EWOULDBLOCK is defined as EAGAIN, we
    built an endless loop in the code path which correctly returns
    EWOULDBLOCK.
    
    Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
    take the easy solution: return EWOULDBLOCK^WEAGAIN to user space and
    let it deal with the spurious wakeup.
    
    Cc: Darren Hart <dvhltc@us.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: John Stultz <johnstul@linux.vnet.ibm.com>
    Cc: Dinakar Guniguntala <dino@in.ibm.com>
    LKML-Reference: <4AE23C74.1090502@us.ibm.com>
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/futex.c b/kernel/futex.c
index 642f3bb..fb65e82 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2127,7 +2127,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 		plist_del(&q->list, &q->list.plist);
 
 		/* Handle spurious wakeups gracefully */
-		ret = -EAGAIN;
+		ret = -EWOULDBLOCK;
 		if (timeout && !timeout->task)
 			ret = -ETIMEDOUT;
 		else if (signal_pending(current))
@@ -2208,7 +2208,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	debug_rt_mutex_init_waiter(&rt_waiter);
 	rt_waiter.task = NULL;
 
-retry:
 	key2 = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE);
 	if (unlikely(ret != 0))
@@ -2303,9 +2302,6 @@ out_put_keys:
 out_key2:
 	put_futex_key(fshared, &key2);
 
-	/* Spurious wakeup ? */
-	if (ret == -EAGAIN)
-		goto retry;
 out:
 	if (to) {
 		hrtimer_cancel(&to->timer);


* Re: 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.
  2009-11-05 17:47 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex Valdis.Kletnieks
  2009-11-05 19:20 ` Thomas Gleixner
@ 2009-11-05 19:22 ` Darren Hart
  1 sibling, 0 replies; 3+ messages in thread
From: Darren Hart @ 2009-11-05 19:22 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Andrew Morton, Thomas Gleixner, linux-kernel

Valdis.Kletnieks@vt.edu wrote:

Hi Valdis,

Thanks for reporting. There are a couple of things of interest below, but
first: which kernel version, exactly?

Specifically, do you have the following patches applied:

43746940a0067656b612490e921ee8e782f12e30 futex: Fix spurious wakeup for requeue_pi r
e814515d47b9e15ebaa08bab0559d189e8ec90eb futex: Detect mismatched requeue targets
41890f2456998c170f416fc29715fadfd57e6626 futex: Handle spurious wake up
370eaf38450c77ec9b5ce6bc74bc575b2e2ce448 futex: Revert "futex: Wake up waiter outsid
a03d103555aa7b3e0c39a9bc9608502d3354392f futex: Fix wakeup race by setting TASK_INTE

> (Hmm.. I seem to be on a roll on this -mmotm, breaking all sorts of stuff.. :)
> 
> Am cc'ing Thomas and Darren because their names were attached to commits in
> the origin.patch that touched futex.c
> 
> It looks like pulseaudio clients with multiple threads manage to hose up
> the futex code to the point they're not kill -9'able.  Semi-reproducible,
> as I've hit it twice by accident. No recipe for triggering it yet.
> 
> Did it once to gyachi (a Yahoo Messenger client) and twice to pidgin (an
> everything-else IM client). 'top' would report 100% CPU usage, all of it kernel
> mode, confirmed by the CPU going to top GHz and warming up some 6-7 degrees
> (so we were spinning on something rather than sitting in a wait/deadlock). In
> both cases I tried to kill -9 the process, but the process didn't go away.
> 
> Here's the 'alt-sysrq-t' output for both cases.  I started a second pidgin the
> second time around; that one wedged really fast (on the first thread it
> created) and didn't get kill -9'ed (if that makes a difference in the stack
> trace...)
> 
> gyachi wedged up - main thread kept going, subthread hung.

> 
> [44347.339018] gyachi        ? ffff88000260e010  3856  3183   2393 0x00000080
> [44347.339018]  ffff88006c3cfeb8 0000000000000046 ffff88006c3cfe80 ffff88006c3cfe7c
> [44347.339018]  ffff88006c3cfe28 0000000000000000 0000000000000155 ffff88006c0dabc0
> [44347.339018]  ffff88006c3ce000 000000000000e010 ffff88006c0dabc0 00000001029f3766
> [44347.339018] Call Trace:
> [44347.339018]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
> [44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
> [44347.339018]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
> [44347.339018]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
> [44347.339018]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
> [44347.339018] gyachi        R  running task     5344  3187   2393 0x00000084
> [44347.339018]  ffff88006c2c6b40 0000000000000002 ffff88007967f988 ffffffff81066193
> [44347.339018]  ffff88007967f998 ffffffff81066193 ffffffff823ceab0 0000000000000000
> [44347.339018]  000000007967fab8 ffffffff814bd184 0000000000000000 ffff88007f8b0000
> [44347.339018] Call Trace:
> [44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
> [44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
> [44347.339018]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [44347.339018]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
> [44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
> [44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
> [44347.339018]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
> [44347.339018]  [<ffffffff81030429>] ? get_parent_ip+0x11/0x41
> [44347.339018]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
> [44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
> [44347.339018]  [<ffffffff810692d2>] ? queue_unlock+0x1d/0x21
> [44347.339018]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
> [44347.339018]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4

I see this a couple of times in this trace. It indicates use of the
requeue_pi feature. You shouldn't be able to use this without a
not-yet-released version of glibc and applications that use
PTHREAD_PRIO_INHERIT pthread mutexes. Neither of the apps you mentioned
seems like a good candidate for that. Do you have some other RT workload
running?

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

