public inbox for linux-kernel@vger.kernel.org
* 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.
@ 2009-11-05 17:47 Valdis.Kletnieks
  2009-11-05 19:20 ` Thomas Gleixner
  2009-11-05 19:22 ` Darren Hart
  0 siblings, 2 replies; 3+ messages in thread
From: Valdis.Kletnieks @ 2009-11-05 17:47 UTC
  To: Andrew Morton, Thomas Gleixner, Darren Hart; +Cc: linux-kernel


(Hmm.. I seem to be on a roll on this -mmotm, breaking all sorts of stuff.. :)

I'm cc'ing Thomas and Darren because their names were attached to commits in
the origin.patch that touched futex.c.

It looks like pulseaudio clients with multiple threads manage to hose up
the futex code to the point that they can't be killed with kill -9.  It is
semi-reproducible: I've hit it twice by accident, but I have no recipe for
triggering it yet.

It hit gyachi (a Yahoo Messenger client) once and pidgin (an everything-else
IM client) twice. 'top' reported 100% CPU usage, all of it in kernel mode,
which was confirmed by the CPU going to its top GHz and warming up some 6-7
degrees (so we were spinning on something rather than sitting in a
wait/deadlock). In both cases I tried to kill -9 the process, but it didn't
go away.
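
For anyone trying to confirm the "all of it kernel mode" observation per
thread rather than per process, here is a minimal sketch, assuming a Linux
procfs mounted at /proc. It reads the state, utime and stime fields from
/proc/<pid>/task/<tid>/stat. The function names parse_stat and thread_times
are illustrative and not part of the original report.

```python
import os

def parse_stat(line):
    """Parse one /proc/<pid>/task/<tid>/stat line into (tid, comm, state, utime, stime)."""
    # comm sits in parentheses and may contain spaces or ')',
    # so cut at the first '(' and the last ')'.
    head, _, rest = line.rpartition(")")
    tid_str, _, comm = head.partition("(")
    fields = rest.split()
    # fields[0] is stat field 3 (state); fields[11] and fields[12] are
    # stat fields 14 and 15 (utime and stime, in clock ticks).
    return int(tid_str), comm, fields[0], int(fields[11]), int(fields[12])

def thread_times(pid):
    """Yield parse_stat() results for every thread of the given pid."""
    task_dir = "/proc/%d/task" % pid
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                yield parse_stat(f.read())
        except OSError:
            continue  # thread exited between listdir() and open()
```

Calling thread_times() twice, a second or so apart, and comparing the
results should show the wedged thread in state R with stime climbing while
utime stays flat, if it really is spinning in the kernel.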

Here's the 'alt-sysrq-t' output for both cases.  The second time around I
started a second pidgin; that one wedged real fast (on the first thread it
created) and didn't get kill -9'ed (in case that makes a difference in the
stack trace...)

gyachi wedged up - main thread kept going, subthread hung.

[44347.339018] gyachi        ? ffff88000260e010  3856  3183   2393 0x00000080
[44347.339018]  ffff88006c3cfeb8 0000000000000046 ffff88006c3cfe80 ffff88006c3cfe7c
[44347.339018]  ffff88006c3cfe28 0000000000000000 0000000000000155 ffff88006c0dabc0
[44347.339018]  ffff88006c3ce000 000000000000e010 ffff88006c0dabc0 00000001029f3766
[44347.339018] Call Trace:
[44347.339018]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[44347.339018]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[44347.339018]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[44347.339018]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[44347.339018] gyachi        R  running task     5344  3187   2393 0x00000084
[44347.339018]  ffff88006c2c6b40 0000000000000002 ffff88007967f988 ffffffff81066193
[44347.339018]  ffff88007967f998 ffffffff81066193 ffffffff823ceab0 0000000000000000
[44347.339018]  000000007967fab8 ffffffff814bd184 0000000000000000 ffff88007f8b0000
[44347.339018] Call Trace:
[44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[44347.339018]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[44347.339018]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[44347.339018]  [<ffffffff81030429>] ? get_parent_ip+0x11/0x41
[44347.339018]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff810692d2>] ? queue_unlock+0x1d/0x21
[44347.339018]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
[44347.339018]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4
[44347.339018]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[44347.339018]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[44347.339018]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[44347.339018]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[44347.339018]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[44347.339018]  [<ffffffff8106c11f>] ? do_futex+0x95d/0x9cb
[44347.339018]  [<ffffffff8106c2d9>] ? sys_futex+0x14c/0x164
[44347.339018]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[44347.339018]  [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b

After the reboot, it bit again, pidgin this time.  Since the main thread
is the one that wedged, it locked up hard.

[ 1730.490005] pidgin        R  running task     4112  4195   2312 0x00000084
[ 1730.490005]  ffff880068889a08 ffffffff81066193 ffff880068889b54 0000000000000000
[ 1730.490005]  ffff880068889ae8 ffff880068aa8c80 0000000000000002 0000000000000000
[ 1730.490005]  ffffffff81069189 0000000000000000 ffff880068889ab8 0000000000000246
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
[ 1730.490005]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[ 1730.490005]  [<ffffffff814bd506>] ? _spin_lock+0x36/0x45
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814bd9bb>] ? _spin_unlock+0x26/0x6a
[ 1730.490005]  [<ffffffff810691bf>] ? get_futex_value_locked+0x2b/0x49
[ 1730.490005]  [<ffffffff810692c9>] ? queue_unlock+0x14/0x21
[ 1730.490005]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
[ 1730.490005]  [<ffffffff81097e34>] ? ftrace_likely_update+0xc/0x14
[ 1730.490005]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4
[ 1730.490005]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[ 1730.490005]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[ 1730.490005]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[ 1730.490005]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[ 1730.490005]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[ 1730.490005]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[ 1730.490005]  [<ffffffff8106c11f>] ? do_futex+0x95d/0x9cb
[ 1730.490005]  [<ffffffff8106c2d9>] ? sys_futex+0x14c/0x164
[ 1730.490005]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[ 1730.490005]  [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b

(This is me starting another one because the first one wedged. It wedged too, but
I don't remember kill -9'ing this one...)

[ 1730.490005] pidgin        R  running task     5672  4220   2312 0x00000084
[ 1730.490005]  ffff880057ce7a18 0000000000000046 ffff8800026133c0 ffff88005410a380
[ 1730.490005]  ffff880057ce7978 ffff8800026133c0 ffff880057ce7998 ffff880057f3c4c0
[ 1730.490005]  ffff880057ce6000 000000000000e010 ffff880057f3c4c8 ffffffff81030d7e
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff81030d7e>] ? finish_task_switch+0x95/0xb8
[ 1730.490005]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1730.490005]  [<ffffffff814bb7bd>] preempt_schedule_irq+0x56/0x73
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814be3d6>] retint_kernel+0x26/0x30
[ 1730.490005]  [<ffffffff81069128>] ? get_futex_key+0x24e/0x25f
[ 1730.490005]  [<ffffffff81068fa7>] ? get_futex_key+0xcd/0x25f
[ 1730.490005]  [<ffffffff814bd9f1>] ? _spin_unlock+0x5c/0x6a
[ 1730.490005]  [<ffffffff81069319>] futex_wait_setup+0x43/0xeb
[ 1730.490005]  [<ffffffff81068fa7>] ? get_futex_key+0xcd/0x25f
[ 1730.490005]  [<ffffffff8106ae9d>] futex_wait_requeue_pi+0x190/0x3d4
[ 1730.490005]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[ 1730.490005]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[ 1730.490005]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[ 1730.490005]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[ 1730.490005]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[ 1730.490005]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[ 1730.490005]  [<ffffffff8106c11f>] do_futex+0x95d/0x9cb
[ 1730.490005]  [<ffffffff811c2bd6>] ? __up_read+0x1a/0x7f
[ 1730.490005]  [<ffffffff811c2bd6>] ? __up_read+0x1a/0x7f
[ 1730.490005]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff814bda71>] ? _spin_unlock_irqrestore+0x72/0x80
[ 1730.490005]  [<ffffffff811c2c32>] ? __up_read+0x76/0x7f
[ 1730.490005]  [<ffffffff8106c2d9>] sys_futex+0x14c/0x164
[ 1730.490005]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b

And the rest of the threads from the first one I started. They're all
packed up and ready to leave Dodge on the first stagecoach, but the one
thread is still stuck in the saloon and unable to find its way out...

[ 1730.490005] pidgin        ? ffff88007f872040  5568  4214   4195 0x00000080
[ 1730.490005]  ffff880054033eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001076 ffff880057c4b280
[ 1730.490005]  ffff880054032000 000000000000e010 ffff880057c4b280 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5568  4215   4195 0x00000080
[ 1730.490005]  ffff880054059eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001077 ffff8800689d0f00
[ 1730.490005]  ffff880054058000 000000000000e010 ffff8800689d0f00 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5568  4216   4195 0x00000080
[ 1730.490005]  ffff8800542e7eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001078 ffff88006887ad40
[ 1730.490005]  ffff8800542e6000 000000000000e010 ffff88006887ad40 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5440  4217   4195 0x00000080
[ 1730.490005]  ffff880053c7deb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001079 ffff880053c7a440
[ 1730.490005]  ffff880053c7c000 000000000000e010 ffff880053c7a440 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
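
The traces above can be sifted mechanically when a full sysrq-t dump runs to
thousands of lines. The following is a rough sketch, assuming the task-header
and frame layout shown in this report (timestamp, comm, state, free stack,
pid, ppid, flags) and command names without spaces. The function name
running_tasks_in is made up for illustration. It lists tasks in state R whose
trace mentions a given substring, "futex" by default.

```python
import re

# Task header, e.g.
# "[44347.339018] gyachi        R  running task     5344  3187   2393 0x00000084"
HEADER = re.compile(
    r"^\[\s*\d+\.\d+\] (\S+)\s+(\S)\s+.*?\s(\d+)\s+(\d+)\s+(\d+)\s+0x[0-9a-f]+$"
)
# Stack frame, e.g. " [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb"
FRAME = re.compile(r"\[<[0-9a-f]+>\] (?:\? )?([\w.]+)\+0x")

def running_tasks_in(dump_text, needle="futex"):
    """Return (comm, pid, matching_frames) for state-R tasks whose trace contains needle."""
    tasks = []
    current = None
    for line in dump_text.splitlines():
        line = line.rstrip()
        header = HEADER.match(line)
        if header:
            # Groups: comm, state, free stack, pid, ppid.
            current = {
                "comm": header.group(1),
                "state": header.group(2),
                "pid": int(header.group(4)),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group(1))
    result = []
    for task in tasks:
        hits = [fn for fn in task["frames"] if needle in fn]
        if task["state"] == "R" and hits:
            result.append((task["comm"], task["pid"], hits))
    return result
```

Fed the saved dmesg text, this would single out the spinning threads and skip
the ones already parked in do_exit. Frames marked with "?" are included, so
the hits are hints rather than a reliable backtrace.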




