* Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-07-14 14:40 UTC
To: Linux RT Users

Hi there!

I just tried out lxc (Linux Containers) with 3.18.17-rt14. After some
time (~20 min) the lxc commands stop working. I got the following trace
from dmesg. Any ideas what's causing it and how to fix it, besides a
reboot? I used the same version of lxc with 3.12-rt with no (I think)
rt-specific problems.

[ 1200.764167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.764168] lxc-destroy D ffff88040eae2680 0 15783 15775 0x00000004
[ 1200.764169] ffff880202697c88 0000000000000086 ffff880202697fd8 ffff880202697fd8
[ 1200.764170] ffff880202697fd8 0000000000062680 ffff8804078c0000 ffff88028db18000
[ 1200.764171] ffff880202697dd8 ffff88028db18000 ffff88028db18000 0000000000000000
[ 1200.764172] Call Trace:
[ 1200.764173] [<ffffffff816face4>] schedule+0x34/0xa0
[ 1200.764174] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
[ 1200.764175] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
[ 1200.764176] [<ffffffff8115f1b4>] ? vma_merge+0xf4/0x330
[ 1200.764177] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
[ 1200.764178] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
[ 1200.764179] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
[ 1200.764180] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
[ 1200.764180] [<ffffffff81200941>] proc_single_show+0x51/0xa0
[ 1200.764182] [<ffffffff811bcdfa>] seq_read+0xea/0x370
[ 1200.764182] [<ffffffff811980dc>] vfs_read+0x9c/0x180
[ 1200.764183] [<ffffffff81198c59>] SyS_read+0x49/0xb0
[ 1200.764184] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
[ 1200.764185] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b

Thanks.
Christoph

* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-07-14 14:45 UTC
To: Linux RT Users

Sorry, I've just seen that I might have left out an important bit, so
here is another, more complete message:

[ 1080.708465] INFO: task kswork:69 blocked for more than 120 seconds.
[ 1080.708471] Tainted: G E 3.18.17-realtime-1-rt14 #2
[ 1080.708473] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.708475] kswork D ffff88040e862680 0 69 2 0x00000000
[ 1080.708482] ffff880407543cb8 0000000000000046 ffff880407543fd8 ffff880407543fd8
[ 1080.708484] ffff880407543fd8 0000000000062680 ffffffff81c1a480 ffff88040749cd40
[ 1080.708486] ffff880407543ca8 ffff88040749cd40 ffff88040749cd40 0000000000000000
[ 1080.708489] Call Trace:
[ 1080.708497] [<ffffffff816face4>] schedule+0x34/0xa0
[ 1080.708500] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
[ 1080.708504] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
[ 1080.708507] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
[ 1080.708510] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
[ 1080.708514] [<ffffffff810d8d3f>] css_release_work_fn+0x2f/0xd0
[ 1080.708518] [<ffffffff8108e59a>] swork_kthread+0xfa/0x150
[ 1080.708520] [<ffffffff8108e4a0>] ? swork_readable+0x40/0x40
[ 1080.708524] [<ffffffff8106c8fb>] kthread+0xbb/0xe0
[ 1080.708527] [<ffffffff8106c840>] ? kthread_worker_fn+0x190/0x190
[ 1080.708530] [<ffffffff816fe5c8>] ret_from_fork+0x58/0x90
[ 1080.708533] [<ffffffff8106c840>] ? kthread_worker_fn+0x190/0x190
[ 1080.708591] INFO: task lxc-ls:15615 blocked for more than 120 seconds.
[ 1080.708595] Tainted: G E 3.18.17-realtime-1-rt14 #2
[ 1080.708595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.708596] lxc-ls D ffff88040ebe2680 0 15615 15614 0x00000004
[ 1080.708598] ffff88009219bc88 0000000000000086 ffff88009219bfd8 ffff88009219bfd8
[ 1080.708599] ffff88009219bfd8 0000000000062680 ffff8804078c3380 ffff880035f499c0
[ 1080.708600] ffff88009219bdd8 ffff880035f499c0 ffff880035f499c0 0000000000000000
[ 1080.708601] Call Trace:
[ 1080.708602] [<ffffffff816face4>] schedule+0x34/0xa0
[ 1080.708603] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
[ 1080.708604] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
[ 1080.708607] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
[ 1080.708608] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
[ 1080.708609] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
[ 1080.708610] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
[ 1080.708612] [<ffffffff81200941>] proc_single_show+0x51/0xa0
[ 1080.708615] [<ffffffff811bcdfa>] seq_read+0xea/0x370
[ 1080.708616] [<ffffffff811980dc>] vfs_read+0x9c/0x180
[ 1080.708617] [<ffffffff81198c59>] SyS_read+0x49/0xb0
[ 1080.708617] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
[ 1080.708619] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b

On Tue, Jul 14, 2015 at 4:40 PM, Christoph Mathys <eraserix@gmail.com> wrote:
> Hi there!
>
> I just tried out lxc (Linux Containers) with 3.18.17-rt14. After some
> time (~20 min) the lxc commands stop working. I got the following trace
> from dmesg. Any ideas what's causing it and how to fix it, besides a
> reboot? I used the same version of lxc with 3.12-rt with no (I think)
> rt-specific problems.
>
> [ 1200.764167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1200.764168] lxc-destroy D ffff88040eae2680 0 15783 15775 0x00000004
> [ 1200.764169] ffff880202697c88 0000000000000086 ffff880202697fd8 ffff880202697fd8
> [ 1200.764170] ffff880202697fd8 0000000000062680 ffff8804078c0000 ffff88028db18000
> [ 1200.764171] ffff880202697dd8 ffff88028db18000 ffff88028db18000 0000000000000000
> [ 1200.764172] Call Trace:
> [ 1200.764173] [<ffffffff816face4>] schedule+0x34/0xa0
> [ 1200.764174] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
> [ 1200.764175] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
> [ 1200.764176] [<ffffffff8115f1b4>] ? vma_merge+0xf4/0x330
> [ 1200.764177] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
> [ 1200.764178] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
> [ 1200.764179] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
> [ 1200.764180] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
> [ 1200.764180] [<ffffffff81200941>] proc_single_show+0x51/0xa0
> [ 1200.764182] [<ffffffff811bcdfa>] seq_read+0xea/0x370
> [ 1200.764182] [<ffffffff811980dc>] vfs_read+0x9c/0x180
> [ 1200.764183] [<ffffffff81198c59>] SyS_read+0x49/0xb0
> [ 1200.764184] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
> [ 1200.764185] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b
>
> Thanks.
> Christoph

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-07-14 15:00 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-07-14 16:40:16 [+0200]:

>Hi there!
Hi,

>I just tried out lxc (Linux Containers) with 3.18.17-rt14. After some
>time (~20 min) the lxc commands stop working. I got the following trace
>from dmesg. Any ideas what's causing it and how to fix it, besides a
>reboot? I used the same version of lxc with 3.12-rt with no (I think)
>rt-specific problems.
>
>[ 1200.764167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
…
>[ 1200.764172] Call Trace:
>[ 1200.764173] [<ffffffff816face4>] schedule+0x34/0xa0
>[ 1200.764174] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
>[ 1200.764175] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
>[ 1200.764176] [<ffffffff8115f1b4>] ? vma_merge+0xf4/0x330
>[ 1200.764177] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
>[ 1200.764178] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
>[ 1200.764179] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
>[ 1200.764180] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
>[ 1200.764180] [<ffffffff81200941>] proc_single_show+0x51/0xa0
>[ 1200.764182] [<ffffffff811bcdfa>] seq_read+0xea/0x370
>[ 1200.764182] [<ffffffff811980dc>] vfs_read+0x9c/0x180
>[ 1200.764183] [<ffffffff81198c59>] SyS_read+0x49/0xb0
>[ 1200.764184] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
>[ 1200.764185] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b

It seems to always be cgroup_mutex that the task blocks on. SysRq d (show
all locks) should be able to show you who holds the lock.

>Thanks.
>Christoph

Sebastian

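For reference, both of the dumps discussed in this thread can normally be
requested from a root shell through the magic SysRq interface. The snippet
below is a generic sketch and assumes CONFIG_MAGIC_SYSRQ is enabled (and,
for the lock dump, that lockdep / CONFIG_DEBUG_LOCK_ALLOC is built in):

    echo 1 > /proc/sys/kernel/sysrq    # allow all SysRq functions
    echo d > /proc/sysrq-trigger       # SysRq-d: "Show Locks Held"
    echo t > /proc/sysrq-trigger       # SysRq-t: "Show State", backtrace of every task
    dmesg                              # the dumps end up in the kernel log
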
* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-08-03 6:19 UTC
To: Linux RT Users

> It seems to always be cgroup_mutex that the task blocks on. SysRq d (show
> all locks) should be able to show you who holds the lock.

I could reproduce the lockup with cgroup stuff again, this time with
information about locks.

[14406.842236] INFO: task kswork:69 blocked for more than 120 seconds.
[14406.842240] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842243] kswork D ffff8803fbf1c500 0 69 2 0x00000000
[14406.842247] ffff8804015e3c18 0000000000000092 00000000015e3c18 ffff8804015e3fd8
[14406.842248] ffff8804015e3fd8 ffff8804015e3fd8 ffff8804015e3fd8 ffff8804015e3ce0
[14406.842249] ffff880401928000 ffff8804015d8000 ffff8804015e3c38 ffff8804015d8000
[14406.842250] Call Trace:
[14406.842256] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842257] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842258] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842259] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842260] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842263] [<ffffffff810e3d4f>] ? css_release_work_fn+0x2f/0xd0
[14406.842265] [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
[14406.842268] [<ffffffff81092a7a>] swork_kthread+0xfa/0x150
[14406.842269] [<ffffffff81092980>] ? swork_readable+0x40/0x40
[14406.842272] [<ffffffff81070036>] kthread+0xd6/0xf0
[14406.842273] [<ffffffff8107581f>] ? finish_task_switch+0x3f/0x140
[14406.842275] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842276] [<ffffffff81736888>] ret_from_fork+0x58/0x90
[14406.842278] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842279] 1 lock held by kswork/69:
[14406.842283] #0: (cgroup_mutex){......}, at: [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
[14406.842324] INFO: task kworker/3:0:10502 blocked for more than 120 seconds.
[14406.842326] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842327] kworker/3:0 D ffff8803a9426d40 0 10502 2 0x00000000
[14406.842331] Workqueue: cgroup_destroy css_killed_work_fn
[14406.842333] ffff8803a947fb98 0000000000000096 00000000a947fb98 ffff8803a947ffd8
[14406.842334] ffff8803a947ffd8 ffff8803a947ffd8 ffff8803a947ffd8 ffffffff81d25470
[14406.842334] ffff8803dace0000 ffff8800995f2290 ffff8803a947fbb8 ffff8800995f2290
[14406.842336] Call Trace:
[14406.842338] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842338] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842339] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842340] [<ffffffff8173443e>] ? rt_spin_lock_slowlock+0x5e/0x2c0
[14406.842341] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842342] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842343] [<ffffffff810e549f>] ? css_killed_work_fn+0x1f/0x170
[14406.842344] [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[14406.842346] [<ffffffff8106a92a>] process_one_work+0x1fa/0x5b0
[14406.842347] [<ffffffff8106a88d>] ? process_one_work+0x15d/0x5b0
[14406.842348] [<ffffffff8106ae4b>] worker_thread+0x16b/0x4c0
[14406.842349] [<ffffffff8106ace0>] ? process_one_work+0x5b0/0x5b0
[14406.842350] [<ffffffff81070036>] kthread+0xd6/0xf0
[14406.842351] [<ffffffff8107581f>] ? finish_task_switch+0x3f/0x140
[14406.842353] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842354] [<ffffffff81736888>] ret_from_fork+0x58/0x90
[14406.842355] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842356] 3 locks held by kworker/3:0/10502:
[14406.842361] #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[14406.842364] #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[14406.842367] #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[14406.842375] INFO: task lxc-start:21854 blocked for more than 120 seconds.
[14406.842376] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842377] lxc-start D ffff8803a9422840 0 21854 21848 0x00000000
[14406.842380] ffff880401b33c48 0000000000000096 0000000001b33c48 ffff880401b33fd8
[14406.842380] ffff880401b33fd8 ffff880401b33fd8 ffff880401b33fd8 ffff880401b33d10
[14406.842381] ffff880401940000 ffff8803a7f38000 ffff880401b33c68 ffff8803a7f38000
[14406.842382] Call Trace:
[14406.842384] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842385] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842386] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842387] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842387] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842389] [<ffffffff810ea122>] ? proc_cgroup_show+0x52/0x200
[14406.842390] [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[14406.842393] [<ffffffff81215b64>] proc_single_show+0x54/0xa0
[14406.842395] [<ffffffff811d082d>] seq_read+0xed/0x380
[14406.842397] [<ffffffff811aa32f>] vfs_read+0x9f/0x180
[14406.842398] [<ffffffff811aaea9>] SyS_read+0x49/0xb0
[14406.842399] [<ffffffff81736936>] system_call_fastpath+0x16/0x1b
[14406.842399] 2 locks held by lxc-start/21854:
[14406.842403] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[14406.842406] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[14406.842409] INFO: task lxc-ls:21856 blocked for more than 120 seconds.
[14406.842410] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842411] lxc-ls D ffff8803a9565640 0 21856 21855 0x00000000
[14406.842414] ffff8803b1103c48 0000000000000096 00000000b1103c48 ffff8803b1103fd8
[14406.842415] ffff8803b1103fd8 ffff8803b1103fd8 ffff8803b1103fd8 ffff8803b1103d10
[14406.842415] ffff880401942290 ffff8803ad9e4520 ffff8803b1103c68 ffff8803ad9e4520
[14406.842417] Call Trace:
[14406.842418] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842419] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842420] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842421] [<ffffffff810ea103>] ? proc_cgroup_show+0x33/0x200
[14406.842422] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842423] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842424] [<ffffffff810ea122>] ? proc_cgroup_show+0x52/0x200
[14406.842425] [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[14406.842426] [<ffffffff81215b64>] proc_single_show+0x54/0xa0
[14406.842427] [<ffffffff811d082d>] seq_read+0xed/0x380
[14406.842428] [<ffffffff811aa32f>] vfs_read+0x9f/0x180
[14406.842429] [<ffffffff811aaea9>] SyS_read+0x49/0xb0
[14406.842430] [<ffffffff81736936>] system_call_fastpath+0x16/0x1b
[14406.842430] 2 locks held by lxc-ls/21856:
[14406.842435] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[14406.842437] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200

[... the same blocked tasks were reported again at 14526.897563 and 14646.952905 with identical backtraces ...]

[16929.175280] SysRq : Show Locks Held
[16929.175286]
[16929.175286] Showing all locks held in the system:
[16929.175301] 1 lock held by kswork/69:
[16929.175302] #0: (cgroup_mutex){......}, at: [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
[16929.175316] 2 locks held by systemd-logind/583:
[16929.175317] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175321] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175327] 1 lock held by in:imklog/620:
[16929.175327] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175334] 3 locks held by polkitd/913:
[16929.175334] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175337] #1: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175340] #2: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175346] 2 locks held by getty/1095:
[16929.175347] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175351] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175355] 2 locks held by getty/1098:
[16929.175356] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175358] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175362] 2 locks held by getty/1106:
[16929.175363] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175365] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175368] 2 locks held by getty/1107:
[16929.175369] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175371] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175375] 2 locks held by getty/1110:
[16929.175375] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175378] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175384] 2 locks held by getty/1749:
[16929.175385] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175387] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175415] 1 lock held by rsyslogd/3299:
[16929.175415] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175420] 2 locks held by getty/3445:
[16929.175421] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175423] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175427] 2 locks held by getty/3452:
[16929.175427] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175429] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175433] 2 locks held by getty/3455:
[16929.175433] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175436] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175440] 2 locks held by getty/3513:
[16929.175441] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175443] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175446] 2 locks held by getty/3515:
[16929.175446] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175449] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175457] 2 locks held by zsh/14000:
[16929.175457] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175460] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175464] 2 locks held by zsh/7684:
[16929.175464] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175466] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175470] 3 locks held by kworker/3:0/10502:
[16929.175470] #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175476] #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175478] #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[16929.175484] 1 lock held by rsyslogd/16108:
[16929.175485] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175489] 2 locks held by getty/16596:
[16929.175490] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175492] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175496] 2 locks held by getty/16601:
[16929.175497] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175499] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175502] 2 locks held by getty/16603:
[16929.175503] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175505] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175509] 2 locks held by getty/16621:
[16929.175509] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175511] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175514] 2 locks held by getty/16627:
[16929.175515] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175517] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175521] 3 locks held by kworker/2:3/19520:
[16929.175522] #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175524] #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175527] #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[16929.175531] 2 locks held by lxc-start/21854:
[16929.175531] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175534] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175538] 2 locks held by lxc-ls/21856:
[16929.175539] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175545] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175554] 3 locks held by bash/10002:
[16929.175555] #0: (sb_writers#6){......}, at: [<ffffffff811aa5c3>] vfs_write+0x1b3/0x1f0
[16929.175559] #1: (rcu_read_lock){......}, at: [<ffffffff81455f45>] __handle_sysrq+0x5/0x1b0
[16929.175563] #2: (tasklist_lock){......}, at: [<ffffffff8109a923>] debug_show_all_locks+0x43/0x1e0

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-11 16:14 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-08-03 08:19:59 [+0200]:

>I could reproduce the lockup with cgroup stuff again, this time with
>information about locks.

According to the info you sent, we have:

|Showing all locks held in the system:
|1 lock held by kswork/69:
| #0: (cgroup_mutex){......}, at: [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
|2 locks held by systemd-logind/583:
| #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
|3 locks held by polkitd/913:
| #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
| #1: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #2: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
|3 locks held by kworker/3:0/10502:
| #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
|3 locks held by kworker/2:3/19520:
| #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
|2 locks held by lxc-start/21854:
| #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
|2 locks held by lxc-ls/21856:
| #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200

One of them owns the lock, the others are blocked on it. We see
cgroup_mutex multiple times because we first tell lockdep that we own it
and then we block on it. I think none of the proc_cgroup_show() callers
really owns the mutex, because I would expect to see css_set_rwsem in
the list of held locks as well, which I don't. This means
css_release_work_fn() or css_killed_work_fn(). Can you issue a task dump
which says what the tasks are doing?

What is it that you are doing exactly?

Sebastian

* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-21 8:58 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

On Fri, Dec 11, 2015 at 5:14 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> This means css_release_work_fn() or css_killed_work_fn(). Can you issue
> a task dump which says what the tasks are doing?

Here is another of those lockups, this time with 3.18.24-rt22. I took
the liberty of removing everything not related to cgroups from the logs.

Seeing the memory control group inside one of the call stacks, I tried
disabling this cgroup. I did not see the lockup again, but two system
freezes instead.

[ 1951.780283] SysRq : Show Locks Held
[ 1951.780288]
[ 1951.780288] Showing all locks held in the system:
[ 1951.780294] 1 lock held by kswork/69:
[ 1951.780295] #0: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ea7bf>] css_release_work_fn+0x2f/0xe0
[ 1951.780307] 4 locks held by cgmanager/349:
[ 1951.780308] #0: (sb_writers#9){.+.+.+}, at: [<ffffffff811da1c4>] mnt_want_write+0x24/0x50
[ 1951.780313] #1: (&type->i_mutex_dir_key#3/1){+.+.+.}, at: [<ffffffff811c7e7c>] do_rmdir+0x15c/0x220
[ 1951.780317] #2: (&type->i_mutex_dir_key#3){+.+.+.}, at: [<ffffffff811c2017>] vfs_rmdir+0x57/0x150
[ 1951.780322] #3: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebd40>] cgroup_kn_lock_live+0x50/0x180
[ 1951.780399] 3 locks held by kworker/2:2/3642:
[ 1951.780399] #0: ("cgroup_destroy"){.+.+.+}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
[ 1951.780403] #1: ((&css->destroy_work)){+.+.+.}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
[ 1951.780406] #2: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebf2f>] css_killed_work_fn+0x1f/0x170

[ 2769.928061] SysRq : Show State
[ 2769.928065]   task                        PC stack   pid father
[ 2769.944889] kworker/2:2 D ffff8803dccba580 0 3642 2 0x00000000
[ 2769.944900] Workqueue: cgroup_destroy css_killed_work_fn
[ 2769.944901] ffff8803cae47a38 0000000000000096 ffff8803cae47a88 ffff8803cae47fd8
[ 2769.944902] ffff8803cae47fd8 ffff8803cae47fd8 ffff8803cae47fd8 000000000000002b
[ 2769.944903] ffff8803fe2ecc60 ffff8803dcdacc60 0000000000000005 ffff8803dcdacc60
[ 2769.944904] Call Trace:
[ 2769.944906] [<ffffffff8174e044>] schedule+0x34/0xa0
[ 2769.944907] [<ffffffff81750494>] schedule_timeout+0x1f4/0x260
[ 2769.944908] [<ffffffff8109e325>] ? mark_held_locks+0x75/0xa0
[ 2769.944909] [<ffffffff81751720>] ? _raw_spin_unlock_irq+0x30/0x60
[ 2769.944910] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2769.944911] [<ffffffff8174f394>] wait_for_completion+0xb4/0xf0
[ 2769.944912] [<ffffffff810bdde0>] ? __call_rcu.constprop.69+0x240/0x240
[ 2769.944913] [<ffffffff810bad06>] wait_rcu_gp+0x46/0x50
[ 2769.944914] [<ffffffff810baca0>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
[ 2769.944916] [<ffffffff8174f316>] ? wait_for_completion+0x36/0xf0
[ 2769.944917] [<ffffffff810be33f>] synchronize_rcu.part.67+0x1f/0x40
[ 2769.944918] [<ffffffff810be378>] synchronize_rcu+0x18/0x20
[ 2769.944919] [<ffffffff811a4c65>] mem_cgroup_start_move+0x15/0x20
[ 2769.944920] [<ffffffff811aa603>] mem_cgroup_reparent_charges+0x73/0x3f0
[ 2769.944921] [<ffffffff811aaafb>] mem_cgroup_css_offline+0x14b/0x3e0
[ 2769.944922] [<ffffffff810ebf5e>] css_killed_work_fn+0x4e/0x170
[ 2769.944924] [<ffffffff8106b89c>] process_one_work+0x20c/0x5b0
[ 2769.944925] [<ffffffff8106b801>] ? process_one_work+0x171/0x5b0
[ 2769.944926] [<ffffffff8106bdab>] worker_thread+0x16b/0x4c0
[ 2769.944927] [<ffffffff8106bc40>] ? process_one_work+0x5b0/0x5b0
[ 2769.944928] [<ffffffff81070fc9>] kthread+0xd9/0xf0
[ 2769.944929] [<ffffffff8109e51d>] ? trace_hardirqs_on+0xd/0x10
[ 2769.944931] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 2769.944932] [<ffffffff817528c8>] ret_from_fork+0x58/0x90
[ 2769.944933] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 2769.930980] cgmanager D ffff8803f7ce7080 0 349 1 0x00000000
[ 2769.930988] ffff8803f998bc18 0000000000000092 ffff8803f998bbe8 ffff8803f998bfd8
[ 2769.930990] ffff8803f998bfd8 ffff8803f998bfd8 ffff8803f998bfd8 ffff8803f998bce0
[ 2769.930992] ffffffff81c224c0 ffff8803f7c4cc60 ffff8803f998bc38 ffff8803f7c4cc60
[ 2769.930995] Call Trace:
[ 2769.930998] [<ffffffff8174e044>] schedule+0x34/0xa0
[ 2769.930999] [<ffffffff8174f865>] __rt_mutex_slowlock+0x55/0x1c0
[ 2769.931001] [<ffffffff8174fad5>] rt_mutex_slowlock+0x105/0x320
[ 2769.931003] [<ffffffff8174fd28>] rt_mutex_lock+0x38/0x40
[ 2769.931004] [<ffffffff81751d5c>] _mutex_lock+0x3c/0x50
[ 2769.931006] [<ffffffff810ebd40>] ? cgroup_kn_lock_live+0x50/0x180
[ 2769.931008] [<ffffffff810ebd40>] cgroup_kn_lock_live+0x50/0x180
[ 2769.931010] [<ffffffff810ebd60>] ? cgroup_kn_lock_live+0x70/0x180
[ 2769.931011] [<ffffffff810efe84>] cgroup_rmdir+0x14/0x40
[ 2769.931013] [<ffffffff8123308e>] kernfs_iop_rmdir+0x4e/0x80
[ 2769.931016] [<ffffffff811c208e>] vfs_rmdir+0xce/0x150
[ 2769.931017] [<ffffffff811c7f1b>] do_rmdir+0x1fb/0x220
[ 2769.931018] [<ffffffff817536d1>] ? retint_swapgs+0xe/0x13
[ 2769.931020] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2769.931021] [<ffffffff8139edac>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2769.931023] [<ffffffff811c9256>] SyS_rmdir+0x16/0x20
[ 2769.931024] [<ffffffff81752976>] system_call_fastpath+0x16/0x1b
[ 2769.929671] kswork D ffff8803fa6df080 0 69 2 0x00000000
[ 2769.929676] ffff8803fda23be8 0000000000000092 ffff8803fda23bb8 ffff8803fda23fd8
[ 2769.929677] ffff8803fda23fd8 ffff8803fda23fd8 ffff8803fda23fd8 ffff8803fda23cb0
[ 2769.929678] ffff8803fe2ea630 ffff8803fda02630 ffff8803fda23c08 ffff8803fda02630
[ 2769.929679] Call Trace:
[ 2769.929681] [<ffffffff8174e044>] schedule+0x34/0xa0
[ 2769.929683] [<ffffffff8174f865>] __rt_mutex_slowlock+0x55/0x1c0
[ 2769.929684] [<ffffffff8174fad5>] rt_mutex_slowlock+0x105/0x320
[ 2769.929686] [<ffffffff8174fd28>] rt_mutex_lock+0x38/0x40
[ 2769.929687] [<ffffffff81751d5c>] _mutex_lock+0x3c/0x50
[ 2769.929689] [<ffffffff810ea7bf>] ? css_release_work_fn+0x2f/0xe0
[ 2769.929689] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2769.929691] [<ffffffff810ea7bf>] css_release_work_fn+0x2f/0xe0
[ 2769.929692] [<ffffffff81093eba>] swork_kthread+0xfa/0x150
[ 2769.929693] [<ffffffff81093dc0>] ? swork_readable+0x40/0x40
[ 2769.929694] [<ffffffff81070fc9>] kthread+0xd9/0xf0
[ 2769.929695] [<ffffffff8109e51d>] ? trace_hardirqs_on+0xd/0x10
[ 2769.929697] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 2769.929698] [<ffffffff817528c8>] ret_from_fork+0x58/0x90
[ 2769.929700] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240

> What is it that you are doing exactly?

My test script (see below) has two processes that each start and stop a
Linux Container (linuxcontainers.org). There are two other processes
that try to get information about the state of the containers (running
or not, IP).

#!/bin/bash
set -e
set -u

cont1=test1
cont2=test2

create_container() {
    local container=$1
    if [ -z "$(sudo lxc-ls | grep $container)" ] ; then
        sudo lxc-create -n $container -t ubuntu -- --release=precise
    fi
}

start_stop_container_forever() {
    local container=$1
    while true ; do
        sudo lxc-start -n $container -d
        sleep 1
        sudo lxc-stop -n $container
    done
}

list_container_forever() {
    set +e
    while true ; do
        sudo lxc-ls -f
        sleep 1
    done
}

create_container $cont1
create_container $cont2

(start_stop_container_forever $cont1) &
(start_stop_container_forever $cont2) &
(list_container_forever) &
(list_container_forever) &
wait

The following is just my current idea of how lxc works: when the
container is started (lxc-start), cgroups for the different resource
kinds are created and the init of the containerized Linux is started in
new namespaces and attached to the cgroups created before. Init will
start more processes that inherit its namespaces and cgroups. When the
container is stopped (lxc-stop), the init process inside the container
is sent a signal to shut down. After the shutdown, all cgroups are
removed. The processes gathering information about the containers
(lxc-ls) attach themselves to the namespace of the container's init.

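A rough sketch of what one start/stop cycle boils down to at the cgroupfs
level is shown below; the paths and the choice of the memory controller are
illustrative assumptions about a cgroup-v1 hierarchy, not something taken
from the lxc sources:

    # lxc-start: create a per-container group and move the container's init into it
    mkdir -p /sys/fs/cgroup/memory/lxc/test1
    echo "$INIT_PID" > /sys/fs/cgroup/memory/lxc/test1/tasks

    # lxc-ls: read a task's cgroup membership; this read ends up in proc_cgroup_show()
    cat /proc/"$INIT_PID"/cgroup

    # lxc-stop: remove the group again; the rmdir path is what queues the
    # css_killed_work_fn()/css_release_work_fn() work seen in the traces above
    rmdir /sys/fs/cgroup/memory/lxc/test1
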
* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-21 9:49 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

When testing on another computer I got a BUG from the same general area:

[ 57.430698] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
[ 57.430699] in_atomic(): 1, irqs_disabled(): 0, pid: 3329, name: init
[ 57.430700] 2 locks held by init/3329:
[ 57.430709] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8103f17e>] __do_page_fault+0x12e/0x5f0
[ 57.430715] #1: (ptlock_ptr(page)#2){+.+...}, at: [<ffffffff8117235f>] do_read_fault.isra.66+0xaf/0x2d0
[ 57.430718] CPU: 1 PID: 3329 Comm: init Tainted: G E 3.18.24-realtime-1-rt22 #1
[ 57.430719] Hardware name: Komax AG, Dierikon Komax-PC/DH61DL, BIOS BEH6110H.86A.0042.2012.0327.2202 03/27/2012
[ 57.430721] 0000000000000003 ffff8800d03c3888 ffffffff8174ad8a 0000000000000005
[ 57.430723] ffff8800d010a630 ffff8800d03c38a8 ffffffff810771a7 ffff8800d010a630
[ 57.430724] ffffffff81c5f220 ffff8800d03c38d8 ffffffff81751b44 ffff8800d03c38d8
[ 57.430725] Call Trace:
[ 57.430729] [<ffffffff8174ad8a>] dump_stack+0x4e/0x84
[ 57.430733] [<ffffffff810771a7>] __might_sleep+0xe7/0x120
[ 57.430735] [<ffffffff81751b44>] rt_spin_lock+0x24/0x60
[ 57.430737] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 57.430740] [<ffffffff81093349>] __wake_up+0x29/0x60
[ 57.430742] [<ffffffff810bcd04>] rcu_report_exp_rnp.isra.64+0xb4/0xc0
[ 57.430744] [<ffffffff810c0a9b>] rcu_read_unlock_special+0x1fb/0x250
[ 57.430747] [<ffffffff8174e4ba>] ? preempt_schedule_irq+0x4a/0x80
[ 57.430748] [<ffffffff810c0cd4>] rcu_note_context_switch+0x1e4/0x420
[ 57.430750] [<ffffffff8174d7c6>] __schedule+0x56/0x8a0
[ 57.430752] [<ffffffff8174e4ae>] ? preempt_schedule_irq+0x3e/0x80
[ 57.430754] [<ffffffff8174e4b4>] preempt_schedule_irq+0x44/0x80
[ 57.430757] [<ffffffff811ac815>] ? mem_cgroup_begin_page_stat+0x5/0x110
[ 57.430758] [<ffffffff81753883>] check_int_off+0x11/0x1e
[ 57.430760] [<ffffffff810c08a5>] ? rcu_read_unlock_special+0x5/0x250
[ 57.430762] [<ffffffff810bada5>] ? __rcu_read_unlock+0x95/0xa0
[ 57.430764] [<ffffffff81143848>] filemap_map_pages+0x188/0x270
[ 57.430765] [<ffffffff811436c5>] ? filemap_map_pages+0x5/0x270
[ 57.430768] [<ffffffff8117246a>] do_read_fault.isra.66+0x1ba/0x2d0
[ 57.430770] [<ffffffff81173abf>] handle_mm_fault+0x5ef/0xec0
[ 57.430772] [<ffffffff8103f17e>] ? __do_page_fault+0x12e/0x5f0
[ 57.430774] [<ffffffff8103f1e2>] __do_page_fault+0x192/0x5f0
[ 57.430777] [<ffffffff8139edeb>] ? trace_hardirqs_off_thunk+0x3a/0x3f
[ 57.430779] [<ffffffff8103f64c>] do_page_fault+0xc/0x10
[ 57.430781] [<ffffffff81754752>] page_fault+0x22/0x30
[ 57.430783] [<ffffffff813bd9dc>] ? strncpy_from_user+0x6c/0x120
[ 57.430786] [<ffffffff811c7a75>] getname_flags+0xb5/0x1a0
[ 57.430788] [<ffffffff811c7b72>] getname+0x12/0x20
[ 57.430790] [<ffffffff811b594d>] do_sys_open+0xfd/0x220
[ 57.430792] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 57.430793] [<ffffffff811b5a8e>] SyS_open+0x1e/0x20
[ 57.430795] [<ffffffff81752976>] system_call_fastpath+0x16/0x1b

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-21 17:59 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-12-21 10:49:41 [+0100]:

>When testing on another computer I got a BUG from the same general area:

If you can reproduce this then the following patch should fix it:

---
Subject: [PATCH] kernel/rcu: use swait for sync_rcu_preempt_exp_wq

__schedule() invokes rcu_note_context_switch() with preemption disabled:

|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
|in_atomic(): 1, irqs_disabled(): 0, pid: 3329, name: init
|2 locks held by init/3329:
| #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8103f17e>] __do_page_fault+0x12e/0x5f0
| #1: (ptlock_ptr(page)#2){+.+...}, at: [<ffffffff8117235f>] do_read_fault.isra.66+0xaf/0x2d0
|CPU: 1 PID: 3329 Comm: init Tainted: G E 3.18.24-realtime-1-rt22 #1
|Call Trace:
| [<ffffffff81751b44>] rt_spin_lock+0x24/0x60
| [<ffffffff81093349>] __wake_up+0x29/0x60
| [<ffffffff810bcd04>] rcu_report_exp_rnp.isra.64+0xb4/0xc0
| [<ffffffff810c0a9b>] rcu_read_unlock_special+0x1fb/0x250
| [<ffffffff810c0cd4>] rcu_note_context_switch+0x1e4/0x420
| [<ffffffff8174d7c6>] __schedule+0x56/0x8a0
| [<ffffffff8174e4b4>] preempt_schedule_irq+0x44/0x80
| [<ffffffff81753883>] check_int_off+0x11/0x1e
| [<ffffffff81143848>] filemap_map_pages+0x188/0x270
| [<ffffffff8117246a>] do_read_fault.isra.66+0x1ba/0x2d0
| [<ffffffff81173abf>] handle_mm_fault+0x5ef/0xec0
| [<ffffffff8103f1e2>] __do_page_fault+0x192/0x5f0
| [<ffffffff8103f64c>] do_page_fault+0xc/0x10
| [<ffffffff81754752>] page_fault+0x22/0x30
| [<ffffffff811c7a75>] getname_flags+0xb5/0x1a0
| [<ffffffff811c7b72>] getname+0x12/0x20
| [<ffffffff811b594d>] do_sys_open+0xfd/0x220
| [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
| [<ffffffff811b5a8e>] SyS_open+0x1e/0x20
| [<ffffffff81752976>] system_call_fastpath+0x16/0x1b

Reported-by: Christoph Mathys <eraserix@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/rcu/tree_plugin.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 54da8f44d586..1cbeb0aa79e2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -527,7 +527,7 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
+static DEFINE_SWAIT_HEAD(sync_rcu_preempt_exp_wq);
 static unsigned long sync_rcu_preempt_exp_count;
 static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
 
@@ -584,7 +584,7 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		if (wake) {
 			smp_mb(); /* EGP done before wake_up(). */
-			wake_up(&sync_rcu_preempt_exp_wq);
+			swait_wake(&sync_rcu_preempt_exp_wq);
 		}
 		break;
 	}
@@ -759,7 +759,7 @@ void synchronize_rcu_expedited(void)
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-	wait_event(sync_rcu_preempt_exp_wq,
+	swait_event(sync_rcu_preempt_exp_wq,
 		   sync_rcu_preempt_exp_done(rnp));
 
 	/* Clean up and exit. */
--
2.6.4

Sebastian

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-21 18:11 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-12-21 09:58:11 [+0100]:

>On Fri, Dec 11, 2015 at 5:14 PM, Sebastian Andrzej Siewior
><bigeasy@linutronix.de> wrote:
>> This means css_release_work_fn() or css_killed_work_fn(). Can you issue
>> a task dump which says what the tasks are doing?
>
>Here is another of those lockups, this time with 3.18.24-rt22. I took
>the liberty of removing everything not related to cgroups from the logs.
>
>Seeing the memory control group inside one of the call stacks, I tried
>disabling this cgroup. I did not see the lockup again, but two system
>freezes instead.

So based on this output:

SysRq : Show Locks Held

Showing all locks held in the system:
1 lock held by kswork/69:
 #0: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ea7bf>] css_release_work_fn+0x2f/0xe0
4 locks held by cgmanager/349:
 #0: (sb_writers#9){.+.+.+}, at: [<ffffffff811da1c4>] mnt_want_write+0x24/0x50
 #1: (&type->i_mutex_dir_key#3/1){+.+.+.}, at: [<ffffffff811c7e7c>] do_rmdir+0x15c/0x220
 #2: (&type->i_mutex_dir_key#3){+.+.+.}, at: [<ffffffff811c2017>] vfs_rmdir+0x57/0x150
 #3: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebd40>] cgroup_kn_lock_live+0x50/0x180
3 locks held by kworker/2:2/3642:
 #0: ("cgroup_destroy"){.+.+.+}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
 #1: ((&css->destroy_work)){+.+.+.}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
 #2: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebf2f>] css_killed_work_fn+0x1f/0x170

and the stack traces below it, it looks like each of those three tasks
blocks on the mutex but none of them owns it. So the question is whether
any of those three waiters is marked as such and, if so, why there is no
wake-up in progress.

Thank you for the steps to reproduce it. I will try it next year.

Sebastian

* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-22 12:36 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

I tried to reproduce this bug in kvm, hoping to get copy/pastable
dumps. The test didn't run at all; instead I got a stunning number of
the following slowpath warning (it seems it was repeated over 100 times
within 9 ms). I see them occasionally (less than once every few
minutes) on real hardware as well. I don't know what to make of them or
whether they can help in any way. Here is the repeated message:

[ 63.141432] WARNING: CPU: 0 PID: 54 at kernel/res_counter.c:28 res_counter_uncharge_until+0x98/0xa0()
[ 63.141434] Modules linked in: veth(E) xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) iptable_filter(E) ip_tables(E) ppdev(E) x_tables(E) kvm(E) microcode(E) cirrus(E) ttm(E) serio_raw(E) drm_kms_helper(E) virtio_balloon(E) drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_piix4(E) parport_pc(E) mac_hid(E) lp(E) parport(E) psmouse(E) pata_acpi(E) floppy(E)
[ 63.141456] CPU: 0 PID: 54 Comm: kworker/0:1 Tainted: G W E 3.18.24-realtime-1-with-patch-rt22+ #2
[ 63.141457] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 63.141459] Workqueue: cgroup_destroy css_killed_work_fn
[ 63.141462] 0000000000000009 ffff88013639bb98 ffffffff8174ad9a 0000000000000002
[ 63.141465] 0000000000000000 ffff88013639bbd8 ffffffff8104f371 ffff8800ba093100
[ 63.141467] ffff8800ba093100 0000000000000000 0000000000001000 ffff8800ba093100
[ 63.141468] Call Trace:
[ 63.141470] [<ffffffff8174ad9a>] dump_stack+0x4e/0x84
[ 63.141471] [<ffffffff8104f371>] warn_slowpath_common+0x81/0xc0
[ 63.141473] [<ffffffff8104f46a>] warn_slowpath_null+0x1a/0x20
[ 63.141475] [<ffffffff810f76a8>] res_counter_uncharge_until+0x98/0xa0
[ 63.141476] [<ffffffff811aa88e>] mem_cgroup_reparent_charges+0x2ee/0x3f0
[ 63.141478] [<ffffffff811aab0b>] mem_cgroup_css_offline+0x14b/0x3e0
[ 63.141480] [<ffffffff810ebf6e>] css_killed_work_fn+0x4e/0x170
[ 63.141481] [<ffffffff8106b89c>] process_one_work+0x20c/0x5b0
[ 63.141483] [<ffffffff8106b801>] ? process_one_work+0x171/0x5b0
[ 63.141485] [<ffffffff8106bdab>] worker_thread+0x16b/0x4c0
[ 63.141486] [<ffffffff8106bc40>] ? process_one_work+0x5b0/0x5b0
[ 63.141488] [<ffffffff81070fc9>] kthread+0xd9/0xf0
[ 63.141490] [<ffffffff8109e51d>] ? trace_hardirqs_on+0xd/0x10
[ 63.141492] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 63.141493] [<ffffffff817528c8>] ret_from_fork+0x58/0x90
[ 63.141495] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 63.141496] ---[ end trace 0000000000000075 ]---

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-22 13:51 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-12-22 13:36:19 [+0100]:

>I tried to reproduce this bug in kvm, hoping to get copy/pastable
>dumps. The test didn't run at all; instead I got a stunning number of
>the following slowpath warning (it seems it was repeated over 100 times
>within 9 ms). I see them occasionally (less than once every few
>minutes) on real hardware as well. I don't know what to make of them or
>whether they can help in any way. Here is the repeated message:
>
>[ 63.141432] WARNING: CPU: 0 PID: 54 at kernel/res_counter.c:28
>res_counter_uncharge_until+0x98/0xa0()

It seems the resource counter code is somehow unbalanced. It has its own
lock, but you see what you see.
You could try without CONFIG_RESOURCE_COUNTERS or switch to v4.1 instead
since it has the file in question removed as per 5b1efc027c0b ("kernel:
res_counter: remove the unused API").

Sebastian

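To check whether the running kernel was built with that option at all,
something like the following works (the /boot/config-* path is the usual
distro location and is an assumption about the setup):

    grep RESOURCE_COUNTERS "/boot/config-$(uname -r)"
    # CONFIG_RESOURCE_COUNTERS=y means the res_counter code is compiled in;
    # rebuild with it disabled, or move to >= v4.1 where kernel/res_counter.c is gone.
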
* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-23 6:53 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

On Tue, Dec 22, 2015 at 2:51 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> You could try without CONFIG_RESOURCE_COUNTERS or switch to v4.1 instead
> since it has the file in question removed as per 5b1efc027c0b ("kernel:
> res_counter: remove the unused API").

Yup, 4.1 does not show this behaviour. Thanks.

Christoph