* Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-07-14 14:40 UTC
To: Linux RT Users

Hi there!

I just tried out lxc (Linux Containers) with 3.18.17-rt14. After some
time (~20 min) the lxc commands stop working. I got the following trace
from dmesg. Any ideas what's causing it and how to fix it, besides a
reboot? I used the same version of lxc with 3.12-rt with no (I think)
rt-specific problems.

[ 1200.764167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.764168] lxc-destroy D ffff88040eae2680 0 15783 15775 0x00000004
[ 1200.764169] ffff880202697c88 0000000000000086 ffff880202697fd8 ffff880202697fd8
[ 1200.764170] ffff880202697fd8 0000000000062680 ffff8804078c0000 ffff88028db18000
[ 1200.764171] ffff880202697dd8 ffff88028db18000 ffff88028db18000 0000000000000000
[ 1200.764172] Call Trace:
[ 1200.764173] [<ffffffff816face4>] schedule+0x34/0xa0
[ 1200.764174] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
[ 1200.764175] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
[ 1200.764176] [<ffffffff8115f1b4>] ? vma_merge+0xf4/0x330
[ 1200.764177] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
[ 1200.764178] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
[ 1200.764179] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
[ 1200.764180] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
[ 1200.764180] [<ffffffff81200941>] proc_single_show+0x51/0xa0
[ 1200.764182] [<ffffffff811bcdfa>] seq_read+0xea/0x370
[ 1200.764182] [<ffffffff811980dc>] vfs_read+0x9c/0x180
[ 1200.764183] [<ffffffff81198c59>] SyS_read+0x49/0xb0
[ 1200.764184] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
[ 1200.764185] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b

Thanks.
Christoph

* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-07-14 14:45 UTC
To: Linux RT Users

Sorry, I've just seen that I might have left out an important bit, so
here is another, more complete message:

[ 1080.708465] INFO: task kswork:69 blocked for more than 120 seconds.
[ 1080.708471] Tainted: G E 3.18.17-realtime-1-rt14 #2
[ 1080.708473] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.708475] kswork D ffff88040e862680 0 69 2 0x00000000
[ 1080.708482] ffff880407543cb8 0000000000000046 ffff880407543fd8 ffff880407543fd8
[ 1080.708484] ffff880407543fd8 0000000000062680 ffffffff81c1a480 ffff88040749cd40
[ 1080.708486] ffff880407543ca8 ffff88040749cd40 ffff88040749cd40 0000000000000000
[ 1080.708489] Call Trace:
[ 1080.708497] [<ffffffff816face4>] schedule+0x34/0xa0
[ 1080.708500] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
[ 1080.708504] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
[ 1080.708507] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
[ 1080.708510] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
[ 1080.708514] [<ffffffff810d8d3f>] css_release_work_fn+0x2f/0xd0
[ 1080.708518] [<ffffffff8108e59a>] swork_kthread+0xfa/0x150
[ 1080.708520] [<ffffffff8108e4a0>] ? swork_readable+0x40/0x40
[ 1080.708524] [<ffffffff8106c8fb>] kthread+0xbb/0xe0
[ 1080.708527] [<ffffffff8106c840>] ? kthread_worker_fn+0x190/0x190
[ 1080.708530] [<ffffffff816fe5c8>] ret_from_fork+0x58/0x90
[ 1080.708533] [<ffffffff8106c840>] ? kthread_worker_fn+0x190/0x190
[ 1080.708591] INFO: task lxc-ls:15615 blocked for more than 120 seconds.
[ 1080.708595] Tainted: G E 3.18.17-realtime-1-rt14 #2
[ 1080.708595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.708596] lxc-ls D ffff88040ebe2680 0 15615 15614 0x00000004
[ 1080.708598] ffff88009219bc88 0000000000000086 ffff88009219bfd8 ffff88009219bfd8
[ 1080.708599] ffff88009219bfd8 0000000000062680 ffff8804078c3380 ffff880035f499c0
[ 1080.708600] ffff88009219bdd8 ffff880035f499c0 ffff880035f499c0 0000000000000000
[ 1080.708601] Call Trace:
[ 1080.708602] [<ffffffff816face4>] schedule+0x34/0xa0
[ 1080.708603] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
[ 1080.708604] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
[ 1080.708607] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
[ 1080.708608] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
[ 1080.708609] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
[ 1080.708610] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
[ 1080.708612] [<ffffffff81200941>] proc_single_show+0x51/0xa0
[ 1080.708615] [<ffffffff811bcdfa>] seq_read+0xea/0x370
[ 1080.708616] [<ffffffff811980dc>] vfs_read+0x9c/0x180
[ 1080.708617] [<ffffffff81198c59>] SyS_read+0x49/0xb0
[ 1080.708617] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
[ 1080.708619] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b

On Tue, Jul 14, 2015 at 4:40 PM, Christoph Mathys <eraserix@gmail.com> wrote:
> Hi there!
>
> I just tried out lxc (Linux Containers) with 3.18.17-rt14. After some
> time (~20 min) the lxc commands stop working. I got the following trace
> from dmesg. Any ideas what's causing it and how to fix it, besides a
> reboot? I used the same version of lxc with 3.12-rt with no (I think)
> rt-specific problems.
>
> [ 1200.764167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1200.764168] lxc-destroy D ffff88040eae2680 0 15783 15775 0x00000004
> [ 1200.764169] ffff880202697c88 0000000000000086 ffff880202697fd8 ffff880202697fd8
> [ 1200.764170] ffff880202697fd8 0000000000062680 ffff8804078c0000 ffff88028db18000
> [ 1200.764171] ffff880202697dd8 ffff88028db18000 ffff88028db18000 0000000000000000
> [ 1200.764172] Call Trace:
> [ 1200.764173] [<ffffffff816face4>] schedule+0x34/0xa0
> [ 1200.764174] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
> [ 1200.764175] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
> [ 1200.764176] [<ffffffff8115f1b4>] ? vma_merge+0xf4/0x330
> [ 1200.764177] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
> [ 1200.764178] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
> [ 1200.764179] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
> [ 1200.764180] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
> [ 1200.764180] [<ffffffff81200941>] proc_single_show+0x51/0xa0
> [ 1200.764182] [<ffffffff811bcdfa>] seq_read+0xea/0x370
> [ 1200.764182] [<ffffffff811980dc>] vfs_read+0x9c/0x180
> [ 1200.764183] [<ffffffff81198c59>] SyS_read+0x49/0xb0
> [ 1200.764184] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
> [ 1200.764185] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b
>
> Thanks.
> Christoph

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-07-14 15:00 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-07-14 16:40:16 [+0200]:

>Hi there!
Hi,

>I just tried out lxc (Linux Containers) with 3.18.17-rt14. After some
>time (~20 min) the lxc commands stop working. I got the following trace
>from dmesg. Any ideas what's causing it and how to fix it, besides a
>reboot? I used the same version of lxc with 3.12-rt with no (I think)
>rt-specific problems.
>
>[ 1200.764167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
…
>[ 1200.764172] Call Trace:
>[ 1200.764173] [<ffffffff816face4>] schedule+0x34/0xa0
>[ 1200.764174] [<ffffffff816fc2d6>] __rt_mutex_slowlock+0xe6/0x180
>[ 1200.764175] [<ffffffff816fc99a>] rt_mutex_slowlock+0x12a/0x310
>[ 1200.764176] [<ffffffff8115f1b4>] ? vma_merge+0xf4/0x330
>[ 1200.764177] [<ffffffff8115f7ef>] ? vma_set_page_prot+0x3f/0x60
>[ 1200.764178] [<ffffffff816fcc31>] rt_mutex_lock+0x31/0x40
>[ 1200.764179] [<ffffffff816fdfce>] _mutex_lock+0xe/0x10
>[ 1200.764180] [<ffffffff810de2e2>] proc_cgroup_show+0x52/0x200
>[ 1200.764180] [<ffffffff81200941>] proc_single_show+0x51/0xa0
>[ 1200.764182] [<ffffffff811bcdfa>] seq_read+0xea/0x370
>[ 1200.764182] [<ffffffff811980dc>] vfs_read+0x9c/0x180
>[ 1200.764183] [<ffffffff81198c59>] SyS_read+0x49/0xb0
>[ 1200.764184] [<ffffffff81197d91>] ? SyS_lseek+0x91/0xb0
>[ 1200.764185] [<ffffffff816fe676>] system_call_fastpath+0x16/0x1b

It seems to always be cgroup_mutex that the task blocks on. SysRq d (show
all locks) should be able to show you who holds the lock.

>Thanks.
>Christoph

Sebastian

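For reference, both of the dumps discussed in this thread can normally be
requested from a root shell through the magic SysRq interface. The snippet
below is a generic sketch and assumes CONFIG_MAGIC_SYSRQ is enabled (and,
for the lock dump, that lockdep / CONFIG_DEBUG_LOCK_ALLOC is built in):

    echo 1 > /proc/sys/kernel/sysrq    # allow all SysRq functions
    echo d > /proc/sysrq-trigger       # SysRq-d: "Show Locks Held"
    echo t > /proc/sysrq-trigger       # SysRq-t: "Show State", backtrace of every task
    dmesg                              # the dumps end up in the kernel log
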
* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-08-03 6:19 UTC
To: Linux RT Users

> It seems to always be cgroup_mutex that the task blocks on. SysRq d (show
> all locks) should be able to show you who holds the lock.

I could reproduce the lockup with cgroup stuff again, this time with
information about locks.

[14406.842236] INFO: task kswork:69 blocked for more than 120 seconds.
[14406.842240] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842243] kswork D ffff8803fbf1c500 0 69 2 0x00000000
[14406.842247] ffff8804015e3c18 0000000000000092 00000000015e3c18 ffff8804015e3fd8
[14406.842248] ffff8804015e3fd8 ffff8804015e3fd8 ffff8804015e3fd8 ffff8804015e3ce0
[14406.842249] ffff880401928000 ffff8804015d8000 ffff8804015e3c38 ffff8804015d8000
[14406.842250] Call Trace:
[14406.842256] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842257] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842258] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842259] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842260] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842263] [<ffffffff810e3d4f>] ? css_release_work_fn+0x2f/0xd0
[14406.842265] [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
[14406.842268] [<ffffffff81092a7a>] swork_kthread+0xfa/0x150
[14406.842269] [<ffffffff81092980>] ? swork_readable+0x40/0x40
[14406.842272] [<ffffffff81070036>] kthread+0xd6/0xf0
[14406.842273] [<ffffffff8107581f>] ? finish_task_switch+0x3f/0x140
[14406.842275] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842276] [<ffffffff81736888>] ret_from_fork+0x58/0x90
[14406.842278] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842279] 1 lock held by kswork/69:
[14406.842283] #0: (cgroup_mutex){......}, at: [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
[14406.842324] INFO: task kworker/3:0:10502 blocked for more than 120 seconds.
[14406.842326] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842327] kworker/3:0 D ffff8803a9426d40 0 10502 2 0x00000000
[14406.842331] Workqueue: cgroup_destroy css_killed_work_fn
[14406.842333] ffff8803a947fb98 0000000000000096 00000000a947fb98 ffff8803a947ffd8
[14406.842334] ffff8803a947ffd8 ffff8803a947ffd8 ffff8803a947ffd8 ffffffff81d25470
[14406.842334] ffff8803dace0000 ffff8800995f2290 ffff8803a947fbb8 ffff8800995f2290
[14406.842336] Call Trace:
[14406.842338] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842338] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842339] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842340] [<ffffffff8173443e>] ? rt_spin_lock_slowlock+0x5e/0x2c0
[14406.842341] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842342] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842343] [<ffffffff810e549f>] ? css_killed_work_fn+0x1f/0x170
[14406.842344] [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[14406.842346] [<ffffffff8106a92a>] process_one_work+0x1fa/0x5b0
[14406.842347] [<ffffffff8106a88d>] ? process_one_work+0x15d/0x5b0
[14406.842348] [<ffffffff8106ae4b>] worker_thread+0x16b/0x4c0
[14406.842349] [<ffffffff8106ace0>] ? process_one_work+0x5b0/0x5b0
[14406.842350] [<ffffffff81070036>] kthread+0xd6/0xf0
[14406.842351] [<ffffffff8107581f>] ? finish_task_switch+0x3f/0x140
[14406.842353] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842354] [<ffffffff81736888>] ret_from_fork+0x58/0x90
[14406.842355] [<ffffffff8106ff60>] ? kthread_create_on_node+0x220/0x220
[14406.842356] 3 locks held by kworker/3:0/10502:
[14406.842361] #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[14406.842364] #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[14406.842367] #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[14406.842375] INFO: task lxc-start:21854 blocked for more than 120 seconds.
[14406.842376] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842377] lxc-start D ffff8803a9422840 0 21854 21848 0x00000000
[14406.842380] ffff880401b33c48 0000000000000096 0000000001b33c48 ffff880401b33fd8
[14406.842380] ffff880401b33fd8 ffff880401b33fd8 ffff880401b33fd8 ffff880401b33d10
[14406.842381] ffff880401940000 ffff8803a7f38000 ffff880401b33c68 ffff8803a7f38000
[14406.842382] Call Trace:
[14406.842384] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842385] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842386] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842387] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842387] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842389] [<ffffffff810ea122>] ? proc_cgroup_show+0x52/0x200
[14406.842390] [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[14406.842393] [<ffffffff81215b64>] proc_single_show+0x54/0xa0
[14406.842395] [<ffffffff811d082d>] seq_read+0xed/0x380
[14406.842397] [<ffffffff811aa32f>] vfs_read+0x9f/0x180
[14406.842398] [<ffffffff811aaea9>] SyS_read+0x49/0xb0
[14406.842399] [<ffffffff81736936>] system_call_fastpath+0x16/0x1b
[14406.842399] 2 locks held by lxc-start/21854:
[14406.842403] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[14406.842406] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[14406.842409] INFO: task lxc-ls:21856 blocked for more than 120 seconds.
[14406.842410] Tainted: G E 3.18.17-realtime-2-rt14 #3
[14406.842411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14406.842411] lxc-ls D ffff8803a9565640 0 21856 21855 0x00000000
[14406.842414] ffff8803b1103c48 0000000000000096 00000000b1103c48 ffff8803b1103fd8
[14406.842415] ffff8803b1103fd8 ffff8803b1103fd8 ffff8803b1103fd8 ffff8803b1103d10
[14406.842415] ffff880401942290 ffff8803ad9e4520 ffff8803b1103c68 ffff8803ad9e4520
[14406.842417] Call Trace:
[14406.842418] [<ffffffff81732694>] schedule+0x34/0xa0
[14406.842419] [<ffffffff81733dd5>] __rt_mutex_slowlock+0x55/0x1c0
[14406.842420] [<ffffffff81734045>] rt_mutex_slowlock+0x105/0x320
[14406.842421] [<ffffffff810ea103>] ? proc_cgroup_show+0x33/0x200
[14406.842422] [<ffffffff8173427a>] rt_mutex_lock+0x1a/0x20
[14406.842423] [<ffffffff81735f19>] _mutex_lock+0x39/0x40
[14406.842424] [<ffffffff810ea122>] ? proc_cgroup_show+0x52/0x200
[14406.842425] [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[14406.842426] [<ffffffff81215b64>] proc_single_show+0x54/0xa0
[14406.842427] [<ffffffff811d082d>] seq_read+0xed/0x380
[14406.842428] [<ffffffff811aa32f>] vfs_read+0x9f/0x180
[14406.842429] [<ffffffff811aaea9>] SyS_read+0x49/0xb0
[14406.842430] [<ffffffff81736936>] system_call_fastpath+0x16/0x1b
[14406.842430] 2 locks held by lxc-ls/21856:
[14406.842435] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[14406.842437] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200

[... the same blocked tasks were reported again at 14526.897563 and 14646.952905 with identical backtraces ...]

[16929.175280] SysRq : Show Locks Held
[16929.175286]
[16929.175286] Showing all locks held in the system:
[16929.175301] 1 lock held by kswork/69:
[16929.175302] #0: (cgroup_mutex){......}, at: [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
[16929.175316] 2 locks held by systemd-logind/583:
[16929.175317] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175321] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175327] 1 lock held by in:imklog/620:
[16929.175327] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175334] 3 locks held by polkitd/913:
[16929.175334] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175337] #1: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175340] #2: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175346] 2 locks held by getty/1095:
[16929.175347] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175351] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175355] 2 locks held by getty/1098:
[16929.175356] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175358] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175362] 2 locks held by getty/1106:
[16929.175363] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175365] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175368] 2 locks held by getty/1107:
[16929.175369] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175371] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175375] 2 locks held by getty/1110:
[16929.175375] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175378] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175384] 2 locks held by getty/1749:
[16929.175385] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175387] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175415] 1 lock held by rsyslogd/3299:
[16929.175415] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175420] 2 locks held by getty/3445:
[16929.175421] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175423] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175427] 2 locks held by getty/3452:
[16929.175427] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175429] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175433] 2 locks held by getty/3455:
[16929.175433] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175436] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175440] 2 locks held by getty/3513:
[16929.175441] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175443] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175446] 2 locks held by getty/3515:
[16929.175446] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175449] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175457] 2 locks held by zsh/14000:
[16929.175457] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175460] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175464] 2 locks held by zsh/7684:
[16929.175464] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175466] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175470] 3 locks held by kworker/3:0/10502:
[16929.175470] #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175476] #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175478] #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[16929.175484] 1 lock held by rsyslogd/16108:
[16929.175485] #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
[16929.175489] 2 locks held by getty/16596:
[16929.175490] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175492] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175496] 2 locks held by getty/16601:
[16929.175497] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175499] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175502] 2 locks held by getty/16603:
[16929.175503] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175505] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175509] 2 locks held by getty/16621:
[16929.175509] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175511] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175514] 2 locks held by getty/16627:
[16929.175515] #0: (&tty->ldisc_sem){......}, at: [<ffffffff81451334>] tty_ldisc_ref_wait+0x24/0x60
[16929.175517] #1: (&ldata->atomic_read_lock){......}, at: [<ffffffff8144cbce>] n_tty_read+0xae/0xb90
[16929.175521] 3 locks held by kworker/2:3/19520:
[16929.175522] #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175524] #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
[16929.175527] #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
[16929.175531] 2 locks held by lxc-start/21854:
[16929.175531] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175534] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175538] 2 locks held by lxc-ls/21856:
[16929.175539] #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
[16929.175545] #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
[16929.175554] 3 locks held by bash/10002:
[16929.175555] #0: (sb_writers#6){......}, at: [<ffffffff811aa5c3>] vfs_write+0x1b3/0x1f0
[16929.175559] #1: (rcu_read_lock){......}, at: [<ffffffff81455f45>] __handle_sysrq+0x5/0x1b0
[16929.175563] #2: (tasklist_lock){......}, at: [<ffffffff8109a923>] debug_show_all_locks+0x43/0x1e0

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-11 16:14 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-08-03 08:19:59 [+0200]:

>I could reproduce the lockup with cgroup stuff again, this time with
>information about locks.

According to the info you sent, we have:

|Showing all locks held in the system:
|1 lock held by kswork/69:
| #0: (cgroup_mutex){......}, at: [<ffffffff810e3d4f>] css_release_work_fn+0x2f/0xd0
|2 locks held by systemd-logind/583:
| #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
|3 locks held by polkitd/913:
| #0: (&f->f_pos_lock){......}, at: [<ffffffff811ca69a>] __fdget_pos+0x4a/0x50
| #1: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #2: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
|3 locks held by kworker/3:0/10502:
| #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
|3 locks held by kworker/2:3/19520:
| #0: ("cgroup_destroy"){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #1: ((&css->destroy_work)){......}, at: [<ffffffff8106a88d>] process_one_work+0x15d/0x5b0
| #2: (cgroup_mutex){......}, at: [<ffffffff810e549f>] css_killed_work_fn+0x1f/0x170
|2 locks held by lxc-start/21854:
| #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200
|2 locks held by lxc-ls/21856:
| #0: (&p->lock){......}, at: [<ffffffff811d077b>] seq_read+0x3b/0x380
| #1: (cgroup_mutex){......}, at: [<ffffffff810ea122>] proc_cgroup_show+0x52/0x200

One of them owns the lock, the others are blocked on it. We see
cgroup_mutex multiple times because we first tell lockdep that we own it
and then we block on it. I think none of the proc_cgroup_show() callers
really owns the mutex, because I would expect to see css_set_rwsem in
the list of held locks as well, which I don't. This means
css_release_work_fn() or css_killed_work_fn(). Can you issue a task dump
which says what the tasks are doing?

What is it that you are doing exactly?

Sebastian

* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-21 8:58 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

On Fri, Dec 11, 2015 at 5:14 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> This means css_release_work_fn() or css_killed_work_fn(). Can you issue
> a task dump which says what the tasks are doing?

Here is another of those lockups, this time with 3.18.24-rt22. I took
the liberty of removing everything not related to cgroups from the logs.

Seeing the memory control group inside one of the call stacks, I tried
disabling this cgroup. I did not see the lockup again, but two system
freezes instead.

[ 1951.780283] SysRq : Show Locks Held
[ 1951.780288]
[ 1951.780288] Showing all locks held in the system:
[ 1951.780294] 1 lock held by kswork/69:
[ 1951.780295] #0: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ea7bf>] css_release_work_fn+0x2f/0xe0
[ 1951.780307] 4 locks held by cgmanager/349:
[ 1951.780308] #0: (sb_writers#9){.+.+.+}, at: [<ffffffff811da1c4>] mnt_want_write+0x24/0x50
[ 1951.780313] #1: (&type->i_mutex_dir_key#3/1){+.+.+.}, at: [<ffffffff811c7e7c>] do_rmdir+0x15c/0x220
[ 1951.780317] #2: (&type->i_mutex_dir_key#3){+.+.+.}, at: [<ffffffff811c2017>] vfs_rmdir+0x57/0x150
[ 1951.780322] #3: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebd40>] cgroup_kn_lock_live+0x50/0x180
[ 1951.780399] 3 locks held by kworker/2:2/3642:
[ 1951.780399] #0: ("cgroup_destroy"){.+.+.+}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
[ 1951.780403] #1: ((&css->destroy_work)){+.+.+.}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
[ 1951.780406] #2: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebf2f>] css_killed_work_fn+0x1f/0x170

[ 2769.928061] SysRq : Show State
[ 2769.928065]   task                        PC stack   pid father
[ 2769.944889] kworker/2:2 D ffff8803dccba580 0 3642 2 0x00000000
[ 2769.944900] Workqueue: cgroup_destroy css_killed_work_fn
[ 2769.944901] ffff8803cae47a38 0000000000000096 ffff8803cae47a88 ffff8803cae47fd8
[ 2769.944902] ffff8803cae47fd8 ffff8803cae47fd8 ffff8803cae47fd8 000000000000002b
[ 2769.944903] ffff8803fe2ecc60 ffff8803dcdacc60 0000000000000005 ffff8803dcdacc60
[ 2769.944904] Call Trace:
[ 2769.944906] [<ffffffff8174e044>] schedule+0x34/0xa0
[ 2769.944907] [<ffffffff81750494>] schedule_timeout+0x1f4/0x260
[ 2769.944908] [<ffffffff8109e325>] ? mark_held_locks+0x75/0xa0
[ 2769.944909] [<ffffffff81751720>] ? _raw_spin_unlock_irq+0x30/0x60
[ 2769.944910] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2769.944911] [<ffffffff8174f394>] wait_for_completion+0xb4/0xf0
[ 2769.944912] [<ffffffff810bdde0>] ? __call_rcu.constprop.69+0x240/0x240
[ 2769.944913] [<ffffffff810bad06>] wait_rcu_gp+0x46/0x50
[ 2769.944914] [<ffffffff810baca0>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
[ 2769.944916] [<ffffffff8174f316>] ? wait_for_completion+0x36/0xf0
[ 2769.944917] [<ffffffff810be33f>] synchronize_rcu.part.67+0x1f/0x40
[ 2769.944918] [<ffffffff810be378>] synchronize_rcu+0x18/0x20
[ 2769.944919] [<ffffffff811a4c65>] mem_cgroup_start_move+0x15/0x20
[ 2769.944920] [<ffffffff811aa603>] mem_cgroup_reparent_charges+0x73/0x3f0
[ 2769.944921] [<ffffffff811aaafb>] mem_cgroup_css_offline+0x14b/0x3e0
[ 2769.944922] [<ffffffff810ebf5e>] css_killed_work_fn+0x4e/0x170
[ 2769.944924] [<ffffffff8106b89c>] process_one_work+0x20c/0x5b0
[ 2769.944925] [<ffffffff8106b801>] ? process_one_work+0x171/0x5b0
[ 2769.944926] [<ffffffff8106bdab>] worker_thread+0x16b/0x4c0
[ 2769.944927] [<ffffffff8106bc40>] ? process_one_work+0x5b0/0x5b0
[ 2769.944928] [<ffffffff81070fc9>] kthread+0xd9/0xf0
[ 2769.944929] [<ffffffff8109e51d>] ? trace_hardirqs_on+0xd/0x10
[ 2769.944931] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 2769.944932] [<ffffffff817528c8>] ret_from_fork+0x58/0x90
[ 2769.944933] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 2769.930980] cgmanager D ffff8803f7ce7080 0 349 1 0x00000000
[ 2769.930988] ffff8803f998bc18 0000000000000092 ffff8803f998bbe8 ffff8803f998bfd8
[ 2769.930990] ffff8803f998bfd8 ffff8803f998bfd8 ffff8803f998bfd8 ffff8803f998bce0
[ 2769.930992] ffffffff81c224c0 ffff8803f7c4cc60 ffff8803f998bc38 ffff8803f7c4cc60
[ 2769.930995] Call Trace:
[ 2769.930998] [<ffffffff8174e044>] schedule+0x34/0xa0
[ 2769.930999] [<ffffffff8174f865>] __rt_mutex_slowlock+0x55/0x1c0
[ 2769.931001] [<ffffffff8174fad5>] rt_mutex_slowlock+0x105/0x320
[ 2769.931003] [<ffffffff8174fd28>] rt_mutex_lock+0x38/0x40
[ 2769.931004] [<ffffffff81751d5c>] _mutex_lock+0x3c/0x50
[ 2769.931006] [<ffffffff810ebd40>] ? cgroup_kn_lock_live+0x50/0x180
[ 2769.931008] [<ffffffff810ebd40>] cgroup_kn_lock_live+0x50/0x180
[ 2769.931010] [<ffffffff810ebd60>] ? cgroup_kn_lock_live+0x70/0x180
[ 2769.931011] [<ffffffff810efe84>] cgroup_rmdir+0x14/0x40
[ 2769.931013] [<ffffffff8123308e>] kernfs_iop_rmdir+0x4e/0x80
[ 2769.931016] [<ffffffff811c208e>] vfs_rmdir+0xce/0x150
[ 2769.931017] [<ffffffff811c7f1b>] do_rmdir+0x1fb/0x220
[ 2769.931018] [<ffffffff817536d1>] ? retint_swapgs+0xe/0x13
[ 2769.931020] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2769.931021] [<ffffffff8139edac>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2769.931023] [<ffffffff811c9256>] SyS_rmdir+0x16/0x20
[ 2769.931024] [<ffffffff81752976>] system_call_fastpath+0x16/0x1b
[ 2769.929671] kswork D ffff8803fa6df080 0 69 2 0x00000000
[ 2769.929676] ffff8803fda23be8 0000000000000092 ffff8803fda23bb8 ffff8803fda23fd8
[ 2769.929677] ffff8803fda23fd8 ffff8803fda23fd8 ffff8803fda23fd8 ffff8803fda23cb0
[ 2769.929678] ffff8803fe2ea630 ffff8803fda02630 ffff8803fda23c08 ffff8803fda02630
[ 2769.929679] Call Trace:
[ 2769.929681] [<ffffffff8174e044>] schedule+0x34/0xa0
[ 2769.929683] [<ffffffff8174f865>] __rt_mutex_slowlock+0x55/0x1c0
[ 2769.929684] [<ffffffff8174fad5>] rt_mutex_slowlock+0x105/0x320
[ 2769.929686] [<ffffffff8174fd28>] rt_mutex_lock+0x38/0x40
[ 2769.929687] [<ffffffff81751d5c>] _mutex_lock+0x3c/0x50
[ 2769.929689] [<ffffffff810ea7bf>] ? css_release_work_fn+0x2f/0xe0
[ 2769.929689] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2769.929691] [<ffffffff810ea7bf>] css_release_work_fn+0x2f/0xe0
[ 2769.929692] [<ffffffff81093eba>] swork_kthread+0xfa/0x150
[ 2769.929693] [<ffffffff81093dc0>] ? swork_readable+0x40/0x40
[ 2769.929694] [<ffffffff81070fc9>] kthread+0xd9/0xf0
[ 2769.929695] [<ffffffff8109e51d>] ? trace_hardirqs_on+0xd/0x10
[ 2769.929697] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 2769.929698] [<ffffffff817528c8>] ret_from_fork+0x58/0x90
[ 2769.929700] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240

> What is it that you are doing exactly?

My test script (see below) has two processes that each start and stop a
Linux Container (linuxcontainers.org). There are two other processes
that try to get information about the state of the containers (running
or not, IP).

#!/bin/bash
set -e
set -u

cont1=test1
cont2=test2

create_container() {
    local container=$1
    if [ -z "$(sudo lxc-ls | grep $container)" ] ; then
        sudo lxc-create -n $container -t ubuntu -- --release=precise
    fi
}

start_stop_container_forever() {
    local container=$1
    while true ; do
        sudo lxc-start -n $container -d
        sleep 1
        sudo lxc-stop -n $container
    done
}

list_container_forever() {
    set +e
    while true ; do
        sudo lxc-ls -f
        sleep 1
    done
}

create_container $cont1
create_container $cont2

(start_stop_container_forever $cont1) &
(start_stop_container_forever $cont2) &
(list_container_forever) &
(list_container_forever) &
wait

The following is just my current idea of how lxc works: when the
container is started (lxc-start), cgroups for the different resource
kinds are created and the init of the containerized Linux is started in
new namespaces and attached to the cgroups created before. Init will
start more processes that inherit its namespaces and cgroups. When the
container is stopped (lxc-stop), the init process inside the container
is sent a signal to shut down. After the shutdown, all cgroups are
removed. The processes gathering information about the containers
(lxc-ls) attach themselves to the namespace of the container's init.

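A rough sketch of what one start/stop cycle boils down to at the cgroupfs
level is shown below; the paths and the choice of the memory controller are
illustrative assumptions about a cgroup-v1 hierarchy, not something taken
from the lxc sources:

    # lxc-start: create a per-container group and move the container's init into it
    mkdir -p /sys/fs/cgroup/memory/lxc/test1
    echo "$INIT_PID" > /sys/fs/cgroup/memory/lxc/test1/tasks

    # lxc-ls: read a task's cgroup membership; this read ends up in proc_cgroup_show()
    cat /proc/"$INIT_PID"/cgroup

    # lxc-stop: remove the group again; the rmdir path is what queues the
    # css_killed_work_fn()/css_release_work_fn() work seen in the traces above
    rmdir /sys/fs/cgroup/memory/lxc/test1
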
* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-21 9:49 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

When testing on another computer I got a BUG from the same general area:

[ 57.430698] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
[ 57.430699] in_atomic(): 1, irqs_disabled(): 0, pid: 3329, name: init
[ 57.430700] 2 locks held by init/3329:
[ 57.430709] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8103f17e>] __do_page_fault+0x12e/0x5f0
[ 57.430715] #1: (ptlock_ptr(page)#2){+.+...}, at: [<ffffffff8117235f>] do_read_fault.isra.66+0xaf/0x2d0
[ 57.430718] CPU: 1 PID: 3329 Comm: init Tainted: G E 3.18.24-realtime-1-rt22 #1
[ 57.430719] Hardware name: Komax AG, Dierikon Komax-PC/DH61DL, BIOS BEH6110H.86A.0042.2012.0327.2202 03/27/2012
[ 57.430721] 0000000000000003 ffff8800d03c3888 ffffffff8174ad8a 0000000000000005
[ 57.430723] ffff8800d010a630 ffff8800d03c38a8 ffffffff810771a7 ffff8800d010a630
[ 57.430724] ffffffff81c5f220 ffff8800d03c38d8 ffffffff81751b44 ffff8800d03c38d8
[ 57.430725] Call Trace:
[ 57.430729] [<ffffffff8174ad8a>] dump_stack+0x4e/0x84
[ 57.430733] [<ffffffff810771a7>] __might_sleep+0xe7/0x120
[ 57.430735] [<ffffffff81751b44>] rt_spin_lock+0x24/0x60
[ 57.430737] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 57.430740] [<ffffffff81093349>] __wake_up+0x29/0x60
[ 57.430742] [<ffffffff810bcd04>] rcu_report_exp_rnp.isra.64+0xb4/0xc0
[ 57.430744] [<ffffffff810c0a9b>] rcu_read_unlock_special+0x1fb/0x250
[ 57.430747] [<ffffffff8174e4ba>] ? preempt_schedule_irq+0x4a/0x80
[ 57.430748] [<ffffffff810c0cd4>] rcu_note_context_switch+0x1e4/0x420
[ 57.430750] [<ffffffff8174d7c6>] __schedule+0x56/0x8a0
[ 57.430752] [<ffffffff8174e4ae>] ? preempt_schedule_irq+0x3e/0x80
[ 57.430754] [<ffffffff8174e4b4>] preempt_schedule_irq+0x44/0x80
[ 57.430757] [<ffffffff811ac815>] ? mem_cgroup_begin_page_stat+0x5/0x110
[ 57.430758] [<ffffffff81753883>] check_int_off+0x11/0x1e
[ 57.430760] [<ffffffff810c08a5>] ? rcu_read_unlock_special+0x5/0x250
[ 57.430762] [<ffffffff810bada5>] ? __rcu_read_unlock+0x95/0xa0
[ 57.430764] [<ffffffff81143848>] filemap_map_pages+0x188/0x270
[ 57.430765] [<ffffffff811436c5>] ? filemap_map_pages+0x5/0x270
[ 57.430768] [<ffffffff8117246a>] do_read_fault.isra.66+0x1ba/0x2d0
[ 57.430770] [<ffffffff81173abf>] handle_mm_fault+0x5ef/0xec0
[ 57.430772] [<ffffffff8103f17e>] ? __do_page_fault+0x12e/0x5f0
[ 57.430774] [<ffffffff8103f1e2>] __do_page_fault+0x192/0x5f0
[ 57.430777] [<ffffffff8139edeb>] ? trace_hardirqs_off_thunk+0x3a/0x3f
[ 57.430779] [<ffffffff8103f64c>] do_page_fault+0xc/0x10
[ 57.430781] [<ffffffff81754752>] page_fault+0x22/0x30
[ 57.430783] [<ffffffff813bd9dc>] ? strncpy_from_user+0x6c/0x120
[ 57.430786] [<ffffffff811c7a75>] getname_flags+0xb5/0x1a0
[ 57.430788] [<ffffffff811c7b72>] getname+0x12/0x20
[ 57.430790] [<ffffffff811b594d>] do_sys_open+0xfd/0x220
[ 57.430792] [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 57.430793] [<ffffffff811b5a8e>] SyS_open+0x1e/0x20
[ 57.430795] [<ffffffff81752976>] system_call_fastpath+0x16/0x1b

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-21 17:59 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-12-21 10:49:41 [+0100]:

>When testing on another computer I got a BUG from the same general area:

If you can reproduce this then the following patch should fix it:

---
Subject: [PATCH] kernel/rcu: use swait for sync_rcu_preempt_exp_wq

__schedule() invokes rcu_note_context_switch() with preemption disabled:

|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
|in_atomic(): 1, irqs_disabled(): 0, pid: 3329, name: init
|2 locks held by init/3329:
| #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8103f17e>] __do_page_fault+0x12e/0x5f0
| #1: (ptlock_ptr(page)#2){+.+...}, at: [<ffffffff8117235f>] do_read_fault.isra.66+0xaf/0x2d0
|CPU: 1 PID: 3329 Comm: init Tainted: G E 3.18.24-realtime-1-rt22 #1
|Call Trace:
| [<ffffffff81751b44>] rt_spin_lock+0x24/0x60
| [<ffffffff81093349>] __wake_up+0x29/0x60
| [<ffffffff810bcd04>] rcu_report_exp_rnp.isra.64+0xb4/0xc0
| [<ffffffff810c0a9b>] rcu_read_unlock_special+0x1fb/0x250
| [<ffffffff810c0cd4>] rcu_note_context_switch+0x1e4/0x420
| [<ffffffff8174d7c6>] __schedule+0x56/0x8a0
| [<ffffffff8174e4b4>] preempt_schedule_irq+0x44/0x80
| [<ffffffff81753883>] check_int_off+0x11/0x1e
| [<ffffffff81143848>] filemap_map_pages+0x188/0x270
| [<ffffffff8117246a>] do_read_fault.isra.66+0x1ba/0x2d0
| [<ffffffff81173abf>] handle_mm_fault+0x5ef/0xec0
| [<ffffffff8103f1e2>] __do_page_fault+0x192/0x5f0
| [<ffffffff8103f64c>] do_page_fault+0xc/0x10
| [<ffffffff81754752>] page_fault+0x22/0x30
| [<ffffffff811c7a75>] getname_flags+0xb5/0x1a0
| [<ffffffff811c7b72>] getname+0x12/0x20
| [<ffffffff811b594d>] do_sys_open+0xfd/0x220
| [<ffffffff8109e44d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
| [<ffffffff811b5a8e>] SyS_open+0x1e/0x20
| [<ffffffff81752976>] system_call_fastpath+0x16/0x1b

Reported-by: Christoph Mathys <eraserix@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/rcu/tree_plugin.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 54da8f44d586..1cbeb0aa79e2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -527,7 +527,7 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
+static DEFINE_SWAIT_HEAD(sync_rcu_preempt_exp_wq);
 static unsigned long sync_rcu_preempt_exp_count;
 static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
 
@@ -584,7 +584,7 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		if (wake) {
 			smp_mb(); /* EGP done before wake_up(). */
-			wake_up(&sync_rcu_preempt_exp_wq);
+			swait_wake(&sync_rcu_preempt_exp_wq);
 		}
 		break;
 	}
@@ -759,7 +759,7 @@ void synchronize_rcu_expedited(void)
 
 	/* Wait for snapshotted ->blkd_tasks lists to drain. */
 	rnp = rcu_get_root(rsp);
-	wait_event(sync_rcu_preempt_exp_wq,
+	swait_event(sync_rcu_preempt_exp_wq,
 		   sync_rcu_preempt_exp_done(rnp));
 
 	/* Clean up and exit. */
--
2.6.4

Sebastian

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-21 18:11 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-12-21 09:58:11 [+0100]:

>On Fri, Dec 11, 2015 at 5:14 PM, Sebastian Andrzej Siewior
><bigeasy@linutronix.de> wrote:
>> This means css_release_work_fn() or css_killed_work_fn(). Can you issue
>> a task dump which says what the tasks are doing?
>
>Here is another of those lockups, this time with 3.18.24-rt22. I took
>the liberty of removing everything not related to cgroups from the logs.
>
>Seeing the memory control group inside one of the call stacks, I tried
>disabling this cgroup. I did not see the lockup again, but two system
>freezes instead.

So based on this output:

SysRq : Show Locks Held

Showing all locks held in the system:
1 lock held by kswork/69:
 #0: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ea7bf>] css_release_work_fn+0x2f/0xe0
4 locks held by cgmanager/349:
 #0: (sb_writers#9){.+.+.+}, at: [<ffffffff811da1c4>] mnt_want_write+0x24/0x50
 #1: (&type->i_mutex_dir_key#3/1){+.+.+.}, at: [<ffffffff811c7e7c>] do_rmdir+0x15c/0x220
 #2: (&type->i_mutex_dir_key#3){+.+.+.}, at: [<ffffffff811c2017>] vfs_rmdir+0x57/0x150
 #3: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebd40>] cgroup_kn_lock_live+0x50/0x180
3 locks held by kworker/2:2/3642:
 #0: ("cgroup_destroy"){.+.+.+}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
 #1: ((&css->destroy_work)){+.+.+.}, at: [<ffffffff8106b801>] process_one_work+0x171/0x5b0
 #2: (cgroup_mutex){+.+.+.}, at: [<ffffffff810ebf2f>] css_killed_work_fn+0x1f/0x170

and the stack traces below it, it looks like each of those three tasks
blocks on the mutex but none of them owns it. So the question is whether
any of those three waiters is marked as such and, if so, why there is no
wake-up in progress.

Thank you for the steps to reproduce it. I will try it next year.

Sebastian

* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-22 12:36 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

I tried to reproduce this bug in kvm, hoping to get copy/pastable
dumps. The test didn't run at all; instead I got a stunning number of
the following slowpath warning (it seems it was repeated over 100 times
within 9 ms). I see them occasionally (less than once every few
minutes) on real hardware as well. I don't know what to make of them or
whether they can help in any way. Here is the repeated message:

[ 63.141432] WARNING: CPU: 0 PID: 54 at kernel/res_counter.c:28 res_counter_uncharge_until+0x98/0xa0()
[ 63.141434] Modules linked in: veth(E) xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) iptable_filter(E) ip_tables(E) ppdev(E) x_tables(E) kvm(E) microcode(E) cirrus(E) ttm(E) serio_raw(E) drm_kms_helper(E) virtio_balloon(E) drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_piix4(E) parport_pc(E) mac_hid(E) lp(E) parport(E) psmouse(E) pata_acpi(E) floppy(E)
[ 63.141456] CPU: 0 PID: 54 Comm: kworker/0:1 Tainted: G W E 3.18.24-realtime-1-with-patch-rt22+ #2
[ 63.141457] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 63.141459] Workqueue: cgroup_destroy css_killed_work_fn
[ 63.141462] 0000000000000009 ffff88013639bb98 ffffffff8174ad9a 0000000000000002
[ 63.141465] 0000000000000000 ffff88013639bbd8 ffffffff8104f371 ffff8800ba093100
[ 63.141467] ffff8800ba093100 0000000000000000 0000000000001000 ffff8800ba093100
[ 63.141468] Call Trace:
[ 63.141470] [<ffffffff8174ad9a>] dump_stack+0x4e/0x84
[ 63.141471] [<ffffffff8104f371>] warn_slowpath_common+0x81/0xc0
[ 63.141473] [<ffffffff8104f46a>] warn_slowpath_null+0x1a/0x20
[ 63.141475] [<ffffffff810f76a8>] res_counter_uncharge_until+0x98/0xa0
[ 63.141476] [<ffffffff811aa88e>] mem_cgroup_reparent_charges+0x2ee/0x3f0
[ 63.141478] [<ffffffff811aab0b>] mem_cgroup_css_offline+0x14b/0x3e0
[ 63.141480] [<ffffffff810ebf6e>] css_killed_work_fn+0x4e/0x170
[ 63.141481] [<ffffffff8106b89c>] process_one_work+0x20c/0x5b0
[ 63.141483] [<ffffffff8106b801>] ? process_one_work+0x171/0x5b0
[ 63.141485] [<ffffffff8106bdab>] worker_thread+0x16b/0x4c0
[ 63.141486] [<ffffffff8106bc40>] ? process_one_work+0x5b0/0x5b0
[ 63.141488] [<ffffffff81070fc9>] kthread+0xd9/0xf0
[ 63.141490] [<ffffffff8109e51d>] ? trace_hardirqs_on+0xd/0x10
[ 63.141492] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 63.141493] [<ffffffff817528c8>] ret_from_fork+0x58/0x90
[ 63.141495] [<ffffffff81070ef0>] ? kthread_create_on_node+0x240/0x240
[ 63.141496] ---[ end trace 0000000000000075 ]---

* Re: Hung task for proc_cgroup_show
From: Sebastian Andrzej Siewior @ 2015-12-22 13:51 UTC
To: Christoph Mathys
Cc: Linux RT Users

* Christoph Mathys | 2015-12-22 13:36:19 [+0100]:

>I tried to reproduce this bug in kvm, hoping to get copy/pastable
>dumps. The test didn't run at all; instead I got a stunning number of
>the following slowpath warning (it seems it was repeated over 100 times
>within 9 ms). I see them occasionally (less than once every few
>minutes) on real hardware as well. I don't know what to make of them or
>whether they can help in any way. Here is the repeated message:
>
>[ 63.141432] WARNING: CPU: 0 PID: 54 at kernel/res_counter.c:28
>res_counter_uncharge_until+0x98/0xa0()

It seems the resource counter code is somehow unbalanced. It has its own
lock, but you see what you see.
You could try without CONFIG_RESOURCE_COUNTERS or switch to v4.1 instead
since it has the file in question removed as per 5b1efc027c0b ("kernel:
res_counter: remove the unused API").

Sebastian

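To check whether the running kernel was built with that option at all,
something like the following works (the /boot/config-* path is the usual
distro location and is an assumption about the setup):

    grep RESOURCE_COUNTERS "/boot/config-$(uname -r)"
    # CONFIG_RESOURCE_COUNTERS=y means the res_counter code is compiled in;
    # rebuild with it disabled, or move to >= v4.1 where kernel/res_counter.c is gone.
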
* Re: Hung task for proc_cgroup_show
From: Christoph Mathys @ 2015-12-23 6:53 UTC
To: Sebastian Andrzej Siewior
Cc: Linux RT Users

On Tue, Dec 22, 2015 at 2:51 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> You could try without CONFIG_RESOURCE_COUNTERS or switch to v4.1 instead
> since it has the file in question removed as per 5b1efc027c0b ("kernel:
> res_counter: remove the unused API").

Yup, 4.1 does not show this behaviour. Thanks.

Christoph