linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Deadlock related to file permissions and/or cgroup, 4.4.6+
@ 2016-04-07 21:59 Ben Greear
  2016-04-13 19:00 ` Tejun Heo
  0 siblings, 1 reply; 2+ messages in thread
From: Ben Greear @ 2016-04-07 21:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List

This is from a modified 4.4.6+ kernel, with local patches.  Git tree found
below, but I don't think this lockup is related to any local changes we have made.

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=summary

The test case involves using a libcurl based application that
is making an ftp request to a second port on the same machine.
vsftp is serving up the ftp file.
The ports are looped together with an ethernet cable, and routing rules
are set up so that traffic flows over the external interface.

The key change from a working solution and kernel deadlock, is that
with the file-to-be-read has permissions 700, it fails, but 600
does not.  (As I was writing this up, our system-test guy managed to
lock it up with 600 permissions as well, so it is not *just* related to
permission 700).

This is very repeatable permissions 700.

The tainting probably comes from a warning in another (GPL, but out-of-tree module that we write),
but very unlikely that has anything to do with this issue.


INFO: task systemd:1 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd         D ffff88006b0ffa60     0     1      0 0x00000000
  ffff88006b0ffa60 00ffffff811875d3 ffff88006bc9ff98 ffff88006bc9ff80
  ffff88006b0f0008 ffff880069f24c00 ffff88006b0f0000 ffff88006b0f8000
  ffff88006b0ffac0 0000000000000040 ffffffff8375c6a0 ffffffff8375c698
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff8118e64a>] percpu_down_write+0x196/0x1d9
  [<ffffffff8118e4b4>] ? percpu_up_read+0x4f/0x4f
  [<ffffffff811877da>] ? wake_up_atomic_t+0x27/0x27
  [<ffffffff811f7246>] __cgroup_procs_write+0xd3/0x36f
  [<ffffffff811f7173>] ? cgroup_attach_task_all+0xba/0xba
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff8118fb86>] ? __lock_is_held+0x29/0x64
  [<ffffffff811f74ff>] cgroup_procs_write+0xf/0x11
  [<ffffffff811f3c2f>] cgroup_file_write+0xb8/0x1c7
  [<ffffffff811f3b77>] ? rcu_read_unlock+0x5d/0x5d
  [<ffffffff8118fba7>] ? __lock_is_held+0x4a/0x64
  [<ffffffff8118fc34>] ? lock_is_held+0x73/0x7f
  [<ffffffff811f3b77>] ? rcu_read_unlock+0x5d/0x5d
  [<ffffffff8139a6fd>] kernfs_fop_write+0x16d/0x1ce
  [<ffffffff812f1763>] __vfs_write+0xca/0x1e2
  [<ffffffff812f1699>] ? __vfs_read+0x1da/0x1da
  [<ffffffff8118fb86>] ? __lock_is_held+0x29/0x64
  [<ffffffff8118fc34>] ? lock_is_held+0x73/0x7f
  [<ffffffff811b5587>] ? rcu_read_lock_sched_held+0x60/0x68
  [<ffffffff811b58cf>] ? rcu_sync_lockdep_assert+0x3c/0x62
  [<ffffffff8118e3b5>] ? update_fast_ctr+0x36/0x58
  [<ffffffff8118e43c>] ? percpu_down_read+0x49/0x72
  [<ffffffff812f55c9>] ? __sb_start_write+0x5a/0xab
  [<ffffffff812f55c9>] ? __sb_start_write+0x5a/0xab
  [<ffffffff812f2555>] vfs_write+0xb5/0x116
  [<ffffffff812f3285>] SyS_write+0xb5/0x10f
  [<ffffffff812f31d0>] ? SyS_read+0x10f/0x10f
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff81004017>] ? trace_hardirqs_on_thunk+0x17/0x19
  [<ffffffff81b36d76>] entry_SYSCALL_64_fastpath+0x16/0x7a
4 locks held by systemd/1:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff812f55c9>] __sb_start_write+0x5a/0xab
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8139a65c>] kernfs_fop_write+0xcc/0x1ce
  #2:  (cgroup_mutex){+.+.+.}, at: [<ffffffff811f300a>] cgroup_kn_lock_live+0x8c/0xec
  #3:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff8118e538>] percpu_down_write+0x84/0x1d9
INFO: task kworker/u8:0:6 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:0    D ffff88006b16f890     0     6      2 0x00000000
Workqueue: netns cleanup_net
  ffff88006b16f890 000000000000001d ffff88006bd9ff98 ffff88006bd9ff80
  ffff88006b142608 ffff880051eaa600 ffff88006b142600 ffff88006b168000
  1ffff1000d62df18 ffff88006b142600 ffff88006b16fa58 ffff88006b142600
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b356ab>] schedule_timeout+0x86/0x181
  [<ffffffff81b35625>] ? usleep_range+0xc2/0xc2
  [<ffffffff81192a73>] ? mark_held_locks+0x2d/0x90
  [<ffffffff81b365b4>] ? _raw_spin_unlock_irq+0x27/0x3f
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff81b313aa>] __wait_for_common+0x1ee/0x2a3
  [<ffffffff81b313aa>] ? __wait_for_common+0x1ee/0x2a3
  [<ffffffff81192a73>] ? mark_held_locks+0x2d/0x90
  [<ffffffff81b35625>] ? usleep_range+0xc2/0xc2
  [<ffffffff81b311bc>] ? out_of_line_wait_on_bit_lock+0xdd/0xdd
  [<ffffffff811688fd>] ? wake_up_q+0x5e/0x5e
  [<ffffffff8116887b>] ? try_to_wake_up+0x2c3/0x2d5
  [<ffffffff81b3147e>] wait_for_completion+0x1f/0x21
  [<ffffffff811582f8>] kthread_stop+0x132/0x330
  [<ffffffffa0cd30d5>] pg_net_exit+0x185/0x210 [pktgen]
  [<ffffffffa0cd2f50>] ? pktgen_device_event+0x181/0x181 [pktgen]
  [<ffffffffa0ddc9cc>] ? rcu_read_unlock+0x5b/0x5d [8021q]
  [<ffffffff819e1621>] ops_exit_list+0x6b/0x88
  [<ffffffff819e3332>] cleanup_net+0x2b7/0x3cd
  [<ffffffff819e307b>] ? net_drop_ns+0x3d/0x3d
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff8118fc34>] ? lock_is_held+0x73/0x7f
  [<ffffffff8114ff57>] process_one_work+0x3ed/0x77c
  [<ffffffff8114fb6a>] ? pool_mayday_timeout+0x1d9/0x1d9
  [<ffffffff81194aad>] ? lock_acquire+0x193/0x233
  [<ffffffff8114b5ce>] ? worker_clr_flags+0x71/0xa9
  [<ffffffff81150b06>] worker_thread+0x2ba/0x3c2
  [<ffffffff8115084c>] ? rescuer_thread+0x534/0x534
  [<ffffffff81158739>] kthread+0x162/0x171
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff81b3712f>] ret_from_fork+0x3f/0x70
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
3 locks held by kworker/u8:0/6:
  #0:  ("%s""netns"){.+.+.+}, at: [<ffffffff8114fdf5>] process_one_work+0x28b/0x77c
  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff8114fdf5>] process_one_work+0x28b/0x77c
  #2:  (net_mutex){+.+.+.}, at: [<ffffffff819e318a>] cleanup_net+0x10f/0x3cd
INFO: task kworker/dying:20 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/dying   D ffff88006b2efaa0     0    20      2 0x00000000
  ffff88006b2efaa0 00ff88006b2d53c8 ffff88006bd1ff98 ffff88006bd1ff80
  ffff88006b2d4c08 ffff8800684f8000 ffff88006b2d4c00 ffff88006b2e8000
  ffff88006b2d4c00 ffff880068ecfc20 ffff88006b2d6d50 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff811e6762>] ? is_module_text_address+0x10/0x28
  [<ffffffff81155c2a>] ? __kernel_text_address+0x24/0x44
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81142345>] ? exit_signals+0x80/0x1ef
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81142345>] exit_signals+0x80/0x1ef
  [<ffffffff811422c5>] ? get_signal+0x8a2/0x8a2
  [<ffffffff81159dae>] ? __blocking_notifier_call_chain+0x2e/0x73
  [<ffffffff8112e64a>] do_exit+0x224/0x1157
  [<ffffffff812d88cb>] ? __slab_free+0x1ab/0x2a8
  [<ffffffff8112e426>] ? release_task+0x738/0x738
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff812da600>] ? kasan_poison_shadow+0x2f/0x31
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff812d8f31>] ? kfree+0x17e/0x203
  [<ffffffff812d8f31>] ? kfree+0x17e/0x203
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff8115084c>] ? rescuer_thread+0x534/0x534
  [<ffffffff81158748>] kthread+0x171/0x171
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff81b3712f>] ret_from_fork+0x3f/0x70
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
1 lock held by kworker/dying/20:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81142345>] exit_signals+0x80/0x1ef
INFO: task kworker/dying:25 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/dying   D ffff88006b33faa0     0    25      2 0x00000000
  ffff88006b33faa0 00ff88006b3307c8 ffff88006bd9ff98 ffff88006bd9ff80
  ffff88006b330008 ffff88006b1d4c00 ffff88006b330000 ffff88006b338000
  ffff88006b330000 ffff88004b79fb70 ffff88006b332150 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff811e6762>] ? is_module_text_address+0x10/0x28
  [<ffffffff81155c2a>] ? __kernel_text_address+0x24/0x44
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81142345>] ? exit_signals+0x80/0x1ef
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81142345>] exit_signals+0x80/0x1ef
  [<ffffffff811422c5>] ? get_signal+0x8a2/0x8a2
  [<ffffffff81159dae>] ? __blocking_notifier_call_chain+0x2e/0x73
  [<ffffffff8112e64a>] do_exit+0x224/0x1157
  [<ffffffff812d88cb>] ? __slab_free+0x1ab/0x2a8
  [<ffffffff8112e426>] ? release_task+0x738/0x738
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff812da600>] ? kasan_poison_shadow+0x2f/0x31
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff812d8f31>] ? kfree+0x17e/0x203
  [<ffffffff812d8f31>] ? kfree+0x17e/0x203
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff8115084c>] ? rescuer_thread+0x534/0x534
  [<ffffffff81158748>] kthread+0x171/0x171
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff81b3712f>] ret_from_fork+0x3f/0x70
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
1 lock held by kworker/dying/25:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81142345>] exit_signals+0x80/0x1ef
INFO: task systemd-journal:644 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-journal D ffff8800656179f0     0   644      1 0x00000080
  ffff8800656179f0 00ff88006abe8000 ffff88006bd1ff98 ffff88006bd1ff80
  ffff88006abe8008 ffff880068364c00 ffff88006abe8000 ffff880065610000
  ffff88006abe8000 0000000000000246 ffffffff82500648 ffff880065617a80
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b309e0>] schedule_preempt_disabled+0x10/0x19
  [<ffffffff81b31b32>] mutex_lock_nested+0x2ef/0x5a8
  [<ffffffff811fa246>] ? proc_cgroup_show+0x59/0x2c4
  [<ffffffff81b31843>] ? mutex_trylock+0x1ed/0x1ed
  [<ffffffff812dac8d>] ? kasan_unpoison_shadow+0x14/0x35
  [<ffffffff812d7c10>] ? kmem_cache_alloc_trace+0xf5/0x105
  [<ffffffff811fa246>] proc_cgroup_show+0x59/0x2c4
  [<ffffffff811fa246>] ? proc_cgroup_show+0x59/0x2c4
  [<ffffffff8138337d>] proc_single_show+0x8a/0xa2
  [<ffffffff81321148>] seq_read+0x37c/0x791
  [<ffffffff81194360>] ? __lock_acquire+0x11ef/0x143b
  [<ffffffff81320dcc>] ? seq_hlist_start_percpu+0xcb/0xcb
  [<ffffffff8118fa72>] ? match_held_lock+0x18/0x103
  [<ffffffff81343eea>] ? fsnotify+0x133/0x831
  [<ffffffff81194d96>] ? lock_release+0x249/0x4ec
  [<ffffffff812f1589>] __vfs_read+0xca/0x1da
  [<ffffffff812f14bf>] ? fdget_pos+0x19/0x19
  [<ffffffff81194d96>] ? lock_release+0x249/0x4ec
  [<ffffffff81194a81>] ? lock_acquire+0x167/0x233
  [<ffffffff81192a73>] ? mark_held_locks+0x2d/0x90
  [<ffffffff811cc983>] ? read_seqcount_begin.constprop.23+0x6b/0x87
  [<ffffffff813447bd>] ? __fsnotify_parent+0x31/0x100
  [<ffffffff814659e9>] ? fsnotify_perm+0x88/0x95
  [<ffffffff81466fcf>] ? security_file_permission+0x4d/0x54
  [<ffffffff812f2349>] ? rw_verify_area+0xd7/0x12e
  [<ffffffff812f2443>] vfs_read+0xa3/0x100
  [<ffffffff812f3176>] SyS_read+0xb5/0x10f
  [<ffffffff812f30c1>] ? do_sendfile+0x3d5/0x3d5
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff81004017>] ? trace_hardirqs_on_thunk+0x17/0x19
  [<ffffffff81b36d76>] entry_SYSCALL_64_fastpath+0x16/0x7a
2 locks held by systemd-journal/644:
  #0:  (&p->lock){+.+.+.}, at: [<ffffffff81320e9c>] seq_read+0xd0/0x791
  #1:  (cgroup_mutex){+.+.+.}, at: [<ffffffff811fa246>] proc_cgroup_show+0x59/0x2c4
INFO: task systemd-udevd:695 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd   D ffff88003699fad0     0   695      1 0x00000080
  ffff88003699fad0 00ff88003696d3c8 ffff88006bd9ff98 ffff88006bd9ff80
  ffff88003696cc08 ffff88006b142600 ffff88003696cc00 ffff880036998000
  ffff88003696cc00 ffff88004fa57c20 ffff88003696ed50 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81128dd8>] ? copy_process+0x9fa/0x2c70
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81128dd8>] copy_process+0x9fa/0x2c70
  [<ffffffff81194aad>] ? lock_acquire+0x193/0x233
  [<ffffffff811283de>] ? __cleanup_sighand+0x25/0x25
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff811cc98b>] ? read_seqcount_begin.constprop.23+0x73/0x87
  [<ffffffff811ccdfb>] ? current_kernel_time64+0xb/0x31
  [<ffffffff8120dcf5>] ? __audit_syscall_entry+0x1de/0x1f9
  [<ffffffff8112b2b6>] _do_fork+0x113/0x498
  [<ffffffff8112b1a3>] ? fork_idle+0x11c/0x11c
  [<ffffffff81004dd5>] ? syscall_trace_enter_phase1+0x1fd/0x22f
  [<ffffffff81004bd8>] ? trace_raw_output_sys_exit+0xb5/0xb5
  [<ffffffff81b36f0f>] ? int_ret_from_sys_call+0x52/0x9f
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff81004017>] ? trace_hardirqs_on_thunk+0x17/0x19
  [<ffffffff8112b6ad>] SyS_clone+0x14/0x16
  [<ffffffff81b36d76>] entry_SYSCALL_64_fastpath+0x16/0x7a
1 lock held by systemd-udevd/695:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81128dd8>] copy_process+0x9fa/0x2c70
INFO: task kworker/dying:736 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/dying   D ffff880065ccfaa0     0   736      2 0x00000080
  ffff880065ccfaa0 00ff880066b9adc8 ffff88006bd9ff98 ffff88006bd9ff80
  ffff880066b9a608 ffff880068d54c00 ffff880066b9a600 ffff880065cc8000
  ffff880066b9a600 ffff88006b2efb40 ffff880066b9c750 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81142345>] ? exit_signals+0x80/0x1ef
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81142345>] exit_signals+0x80/0x1ef
  [<ffffffff811422c5>] ? get_signal+0x8a2/0x8a2
  [<ffffffff81159dae>] ? __blocking_notifier_call_chain+0x2e/0x73
  [<ffffffff8112e64a>] do_exit+0x224/0x1157
  [<ffffffff812d8976>] ? __slab_free+0x256/0x2a8
  [<ffffffff8112e426>] ? release_task+0x738/0x738
  [<ffffffff81192cdf>] ? trace_hardirqs_on_caller+0x209/0x250
  [<ffffffff812da600>] ? kasan_poison_shadow+0x2f/0x31
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff812d8f31>] ? kfree+0x17e/0x203
  [<ffffffff812d8f31>] ? kfree+0x17e/0x203
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff81150bfd>] ? worker_thread+0x3b1/0x3c2
  [<ffffffff8115084c>] ? rescuer_thread+0x534/0x534
  [<ffffffff81158748>] kthread+0x171/0x171
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
  [<ffffffff81b3712f>] ret_from_fork+0x3f/0x70
  [<ffffffff811585d7>] ? kthread_parkme+0x2d/0x2d
1 lock held by kworker/dying/736:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81142345>] exit_signals+0x80/0x1ef
INFO: task llvmpipe-0:1354 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
llvmpipe-0      D ffff880036a8fbc0     0  1354      1 0x00000000
  ffff880036a8fbc0 00ff880036bd53c8 ffff88006bd9ff98 ffff88006bd9ff80
  ffff880036bd4c08 ffff880051eaa600 ffff880036bd4c00 ffff880036a88000
  ffff880036bd4c00 ffff88006a11fc20 ffff880036bd6d50 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff812a0c39>] ? tlb_flush_mmu_free+0x24/0x6a
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81142345>] ? exit_signals+0x80/0x1ef
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81142345>] exit_signals+0x80/0x1ef
  [<ffffffff811422c5>] ? get_signal+0x8a2/0x8a2
  [<ffffffff81159dae>] ? __blocking_notifier_call_chain+0x2e/0x73
  [<ffffffff8112e64a>] do_exit+0x224/0x1157
  [<ffffffff812b9d31>] ? swapin_walk_pmd_entry+0x1c3/0x1c3
  [<ffffffff8112e426>] ? release_task+0x738/0x738
  [<ffffffff811625b7>] ? ___might_sleep+0xfa/0x223
  [<ffffffff81192a73>] ? mark_held_locks+0x2d/0x90
  [<ffffffff811957b5>] ? lockdep_sys_exit+0x1a/0x91
  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
  [<ffffffff81130fc7>] SyS_exit+0x1d/0x1d
  [<ffffffff81b36d76>] entry_SYSCALL_64_fastpath+0x16/0x7a
1 lock held by llvmpipe-0/1354:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81142345>] exit_signals+0x80/0x1ef
INFO: task llvmpipe-1:1355 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
llvmpipe-1      D ffff8800682e7bc0     0  1355      1 0x00000000
  ffff8800682e7bc0 00ff880065af07c8 ffff88006bc9ff98 ffff88006bc9ff80
  ffff880065af0008 ffff880066bfcc00 ffff880065af0000 ffff8800682e0000
  ffff880065af0000 ffff880036a8fc60 ffff880065af2150 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff812a0c39>] ? tlb_flush_mmu_free+0x24/0x6a
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81142345>] ? exit_signals+0x80/0x1ef
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81142345>] exit_signals+0x80/0x1ef
  [<ffffffff811422c5>] ? get_signal+0x8a2/0x8a2
  [<ffffffff81159dae>] ? __blocking_notifier_call_chain+0x2e/0x73
  [<ffffffff8112e64a>] do_exit+0x224/0x1157
  [<ffffffff812b9d31>] ? swapin_walk_pmd_entry+0x1c3/0x1c3
  [<ffffffff8112e426>] ? release_task+0x738/0x738
  [<ffffffff811dc74d>] ? SyS_futex+0x152/0x1ee
  [<ffffffff811dc5fb>] ? do_futex+0xb4d/0xb4d
  [<ffffffff81192a73>] ? mark_held_locks+0x2d/0x90
  [<ffffffff811957b5>] ? lockdep_sys_exit+0x1a/0x91
  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
  [<ffffffff81130fc7>] SyS_exit+0x1d/0x1d
  [<ffffffff81b36d76>] entry_SYSCALL_64_fastpath+0x16/0x7a
1 lock held by llvmpipe-1/1355:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81142345>] exit_signals+0x80/0x1ef
INFO: task llvmpipe-2:1356 blocked for more than 180 seconds.
       Tainted: G        W  O    4.4.6+ #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
llvmpipe-2      D ffff880068d2fbc0     0  1356      1 0x00000000
  ffff880068d2fbc0 00ff880069b5d3c8 ffff88006bc9ff98 ffff88006bc9ff80
  ffff880069b5cc08 ffff880066bfcc00 ffff880069b5cc00 ffff880068d28000
  ffff880069b5cc00 ffff8800682e7c60 ffff880069b5ed50 ffffffff8375c620
Call Trace:
  [<ffffffff81b3066d>] schedule+0xc3/0xe0
  [<ffffffff81b35003>] rwsem_down_read_failed+0x190/0x1f9
  [<ffffffff81b34e73>] ? rt_mutex_futex_unlock+0x28/0x28
  [<ffffffff812a0c39>] ? tlb_flush_mmu_free+0x24/0x6a
  [<ffffffff8118fa8a>] ? match_held_lock+0x30/0x103
  [<ffffffff81560b84>] call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81560b84>] ? call_rwsem_down_read_failed+0x14/0x30
  [<ffffffff81142345>] ? exit_signals+0x80/0x1ef
  [<ffffffff8118e453>] ? percpu_down_read+0x60/0x72
  [<ffffffff81142345>] exit_signals+0x80/0x1ef
  [<ffffffff811422c5>] ? get_signal+0x8a2/0x8a2
  [<ffffffff81159dae>] ? __blocking_notifier_call_chain+0x2e/0x73
  [<ffffffff8112e64a>] do_exit+0x224/0x1157
  [<ffffffff812b9d31>] ? swapin_walk_pmd_entry+0x1c3/0x1c3
  [<ffffffff8112e426>] ? release_task+0x738/0x738
  [<ffffffff811dc74d>] ? SyS_futex+0x152/0x1ee
  [<ffffffff811dc5fb>] ? do_futex+0xb4d/0xb4d
  [<ffffffff81192a73>] ? mark_held_locks+0x2d/0x90
  [<ffffffff811957b5>] ? lockdep_sys_exit+0x1a/0x91
  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
  [<ffffffff81130fc7>] SyS_exit+0x1d/0x1d
  [<ffffffff81b36d76>] entry_SYSCALL_64_fastpath+0x16/0x7a
1 lock held by llvmpipe-2/1356:
  #0:  (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81142345>] exit_signals+0x80/0x1ef


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Deadlock related to file permissions and/or cgroup, 4.4.6+
  2016-04-07 21:59 Deadlock related to file permissions and/or cgroup, 4.4.6+ Ben Greear
@ 2016-04-13 19:00 ` Tejun Heo
  0 siblings, 0 replies; 2+ messages in thread
From: Tejun Heo @ 2016-04-13 19:00 UTC (permalink / raw)
  To: Ben Greear; +Cc: Linux Kernel Mailing List

Hello,

On Thu, Apr 07, 2016 at 02:59:41PM -0700, Ben Greear wrote:
> This is from a modified 4.4.6+ kernel, with local patches.  Git tree found
> below, but I don't think this lockup is related to any local changes we have made.
> 
> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=summary
> 
> The test case involves using a libcurl based application that
> is making an ftp request to a second port on the same machine.
> vsftp is serving up the ftp file.
> The ports are looped together with an ethernet cable, and routing rules
> are set up so that traffic flows over the external interface.
> 
> The key change from a working solution and kernel deadlock, is that
> with the file-to-be-read has permissions 700, it fails, but 600
> does not.  (As I was writing this up, our system-test guy managed to
> lock it up with 600 permissions as well, so it is not *just* related to
> permission 700).
> 
> This is very repeatable permissions 700.
> 
> The tainting probably comes from a warning in another (GPL, but out-of-tree module that we write),
> but very unlikely that has anything to do with this issue.

Can you please try to reproduce the problem with lockdep turned on and
without the out-of-tree module?  Lockdep should be able to point to
where the actual deadlock is happening.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2016-04-13 19:00 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-07 21:59 Deadlock related to file permissions and/or cgroup, 4.4.6+ Ben Greear
2016-04-13 19:00 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).