* Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2015-12-04 5:52 UTC
To: linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

Sometimes udevadm trigger --action=add hangs the system, and the splat
below happens. This seems to be timing dependent, and I haven't been
able to trigger it yet with lockdep enabled, sadly.

Any ideas? If not, I'll try to instrument it better tomorrow.

This is 4.4-rc3 plus some patches that I don't think are related. The
rootfs is 9p.

--Andy

[   60.048359] INFO: task kworker/u2:1:980 blocked for more than 30 seconds.
[   60.051673]       Not tainted 4.3.0-rc4+ #3
[   60.052149] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[   60.053013] kworker/u2:1    D ffff880006823bc8 14152   980     14 0x00000000
[   60.053786]  ffff880006823bc8 ffff8800072e4740 ffff8800072e63c0 ffff880006824000
[   60.054637]  ffff88000613cfd0 ffff8800072e63c0 00000000ffffffff 0000000000000246
[   60.055447]  ffff880006823be0 ffffffff818d2bfe ffff88000613cfc8 ffff880006823bf0
[   60.056250] Call Trace:
[   60.056513]  [<ffffffff818d2bfe>] schedule+0x2e/0x70
[   60.057017]  [<ffffffff818d2f29>] schedule_preempt_disabled+0x9/0x10
[   60.057632]  [<ffffffff818d5457>] __mutex_lock_slowpath+0x107/0x2f0
[   60.058242]  [<ffffffff811c9123>] ? lookup_fast+0x133/0x2e0
[   60.058790]  [<ffffffff818d5655>] mutex_lock+0x15/0x30
[   60.059272]  [<ffffffff811cc76f>] path_openat+0x63f/0x1220
[   60.059805]  [<ffffffff811ce339>] do_filp_open+0x79/0xd0
[   60.060352]  [<ffffffff8130a136>] ? selinux_cred_prepare+0x16/0x30
[   60.060966]  [<ffffffff8130449e>] ? security_prepare_creds+0x3e/0x60
[   60.061805]  [<ffffffff811c403f>] do_open_execat+0x5f/0x150
[   60.062316]  [<ffffffff811c5d53>] do_execveat_common.isra.33+0x1a3/0x6c0
[   60.062916]  [<ffffffff811c6297>] do_execve+0x27/0x30
[   60.063355]  [<ffffffff810b8d2b>] call_usermodehelper_exec_async+0xeb/0x140
[   60.063951]  [<ffffffff818d7712>] ret_from_fork+0x22/0x40
[   60.064416]  [<ffffffff810b8c40>] ? umh_complete+0x40/0x40

[The same hung-task report repeats, with an identical backtrace, for
kworker/u2:1 tasks 981 through 989 (timestamps 60.064889 through
60.195781). The only variation is in task 988's trace, which shows one
extra frame, "[<ffffffff811afa1c>] ? alloc_pages_current+0x8c/0x100",
between the path_openat and do_filp_open lines.]
* Re: Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2015-12-07 19:18 UTC
To: linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Thu, Dec 3, 2015 at 9:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> Sometimes udevadm trigger --action=add hangs the system, and the splat
> below happens. This seems to be timing dependent, and I haven't been
> able to trigger it yet with lockdep enabled, sadly.
>
> Any ideas? If not, I'll try to instrument it better tomorrow.

More details: this is caused by a storm of /sbin/hotplug UMH calls
(yes, misconfigured kernel, but still). /sbin is a symlink to
/usr/sbin, /usr/sbin/hotplug doesn't exist, and all of the above is on
rootfs, which is 9p over virtio.

Pointing uevent_helper at /usr/sbin/hotplug (which still doesn't
exist) seems to work around it.

> This is 4.4-rc3 plus some patches that I don't think are related. The
> rootfs is 9p.
>
> --Andy
>
> [full splat from the original message snipped]

-- 
Andy Lutomirski
AMA Capital Management, LLC
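For reference, the knob being repointed above is uevent_helper, which
the kernel also exposes as /proc/sys/kernel/hotplug; a minimal sketch of
the workaround (assuming CONFIG_UEVENT_HELPER and the usual proc/sysfs
mounts):

    cat /proc/sys/kernel/hotplug                        # /sbin/hotplug in this setup
    echo /usr/sbin/hotplug > /proc/sys/kernel/hotplug   # the workaround above
    echo "" > /proc/sys/kernel/hotplug                  # or disable the helper outright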
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Dominique Martinet @ 2015-12-07 22:46 UTC
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

Andy Lutomirski wrote on Mon, Dec 07, 2015:
> On Thu, Dec 3, 2015 at 9:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > Sometimes udevadm trigger --action=add hangs the system, and the splat
> > below happens. This seems to be timing dependent, and I haven't been
> > able to trigger it yet with lockdep enabled, sadly.
> >
> > Any ideas? If not, I'll try to instrument it better tomorrow.
>
> More details: this is caused by a storm of /sbin/hotplug UMH calls
> (yes, misconfigured kernel, but still). /sbin is a symlink to
> /usr/sbin, /usr/sbin/hotplug doesn't exist, and all of the above is on
> rootfs, which is 9p over virtio.
>
> Pointing uevent_helper at /usr/sbin/hotplug (which still doesn't
> exist) seems to work around it.

Can you reproduce it on a booted system with something like
`seq 1 1000000 | xargs -P 1024 -I{} cat /sbin/foo >&/dev/null` ?

(Trying execs might be closer to your workload; not sure how much this,
or using UMH, might change.)

Also, what QEMU version, please, just to try to match your environment?

-- 
Dominique Martinet
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2015-12-08 1:59 UTC
To: Dominique Martinet
Cc: linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

[-- Attachment #1: Type: text/plain, Size: 2093 bytes --]

On Mon, Dec 7, 2015 at 2:46 PM, Dominique Martinet
<dominique.martinet@cea.fr> wrote:
> Andy Lutomirski wrote on Mon, Dec 07, 2015:
>> On Thu, Dec 3, 2015 at 9:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > Sometimes udevadm trigger --action=add hangs the system, and the splat
>> > below happens. This seems to be timing dependent, and I haven't been
>> > able to trigger it yet with lockdep enabled, sadly.
>> >
>> > Any ideas? If not, I'll try to instrument it better tomorrow.
>>
>> More details: this is caused by a storm of /sbin/hotplug UMH calls
>> (yes, misconfigured kernel, but still). /sbin is a symlink to
>> /usr/sbin, /usr/sbin/hotplug doesn't exist, and all of the above is on
>> rootfs, which is 9p over virtio.
>>
>> Pointing uevent_helper at /usr/sbin/hotplug (which still doesn't
>> exist) seems to work around it.
>
> Can you reproduce it on a booted system with something like
> `seq 1 1000000 | xargs -P 1024 -I{} cat /sbin/foo >&/dev/null` ?

This doesn't reproduce it. This doesn't either:

seq 1 1000000 | xargs -P 1024 -I{} bash -c 'exec /sbin/foo' &>/dev/null

> (Trying execs might be closer to your workload; not sure how much this,
> or using UMH, might change.)
>
> Also, what QEMU version, please, just to try to match your environment?

$ qemu-system-x86_64 --version
QEMU emulator version 2.4.1 (qemu-2.4.1-1.fc23), Copyright (c)
2003-2008 Fabrice Bellard

My reproducer is:

$ virtme-run --kdir . --pwd

using this virtme version:
https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git/commit/?id=17363c2900e8b796c80c920c6fcdcc6747784ef7

Bad kernel config attached. This config with v4.4-rc3 (and no
additional patches) reproduces it reliably for me. With the latest
virtme, I don't reproduce it -- the latest virtme turns off
uevent_helper early in boot, which suppresses the bug for me, at least
most of the time.

If I dump all task states (see attached typescript), I see a bunch of
things blocked in 9p rpc. This makes me think it could be a QEMU bug,
not a kernel bug.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

[-- Attachment #2: bad-config.xz --]
[-- Type: application/x-xz, Size: 21352 bytes --]

[-- Attachment #3: typescript.xz --]
[-- Type: application/x-xz, Size: 18584 bytes --]
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Al Viro @ 2015-12-08 2:33 UTC
To: Andy Lutomirski
Cc: Dominique Martinet, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Mon, Dec 07, 2015 at 05:59:41PM -0800, Andy Lutomirski wrote:
> If I dump all task states (see attached typescript), I see a bunch of
> things blocked in 9p rpc. This makes me think it could be a QEMU bug,
> not a kernel bug.

Maybe, maybe not - I'd suggest dumping the 9p traffic and checking the
request and response IDs; at least one bug in that area got fixed
(see commit a84b69 for gory details).

Really, check if something gets screwed with IDs. They should be unique
among outstanding requests and never reused until either a response
with the matching ID arrives, *or* a TFLUSH on that ID is sent and its
reply has arrived. If those rules get buggered, it's very easy to get
hangs like that.
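One way to capture that traffic is QEMU's trace infrastructure; a rough
sketch, assuming a QEMU built with the simpletrace backend and using the
helper scripts shipped in the QEMU source tree (file and script names as
of the 2.4/2.5 era; adjust for your build):

    # enable the 9p tracepoints for the guest's run
    echo 'v9fs_*' > /tmp/9p-events
    qemu-system-x86_64 ... -trace events=/tmp/9p-events

    # pretty-print the resulting binary trace file, then fold it into
    # T/R messages with tags, as in the trace posted later in this thread
    scripts/simpletrace.py trace-events trace-<qemu-pid>
    scripts/analyse-9p-simpletrace.py trace-events trace-<qemu-pid>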
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2015-12-08 22:49 UTC
To: Al Viro
Cc: Dominique Martinet, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

[-- Attachment #1: Type: text/plain, Size: 947 bytes --]

On Mon, Dec 7, 2015 at 6:33 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Dec 07, 2015 at 05:59:41PM -0800, Andy Lutomirski wrote:
>> If I dump all task states (see attached typescript), I see a bunch of
>> things blocked in 9p rpc. This makes me think it could be a QEMU bug,
>> not a kernel bug.
>
> Maybe, maybe not - I'd suggest dumping the 9p traffic and checking the
> request and response IDs; at least one bug in that area got fixed
> (see commit a84b69 for gory details).
>
> Really, check if something gets screwed with IDs. They should be unique
> among outstanding requests and never reused until either a response
> with the matching ID arrives, *or* a TFLUSH on that ID is sent and its
> reply has arrived. If those rules get buggered, it's very easy to get
> hangs like that.

Trace attached. I don't see anything wrong, but I also don't know
what I'm looking for.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

[-- Attachment #2: 9ptrace.txt.xz --]
[-- Type: application/x-xz, Size: 19916 bytes --]
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Dominique Martinet @ 2015-12-09 6:23 UTC
To: Andy Lutomirski
Cc: Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

Andy Lutomirski wrote on Tue, Dec 08, 2015:
> Trace attached. I don't see anything wrong, but I also don't know
> what I'm looking for.

Actually, it doesn't look good; I'm not sure whether the trace could be
missing messages, but it looks like tags get reused...

Quick and dirty parse script (output attached; it doesn't handle TFLUSH,
but there are none in this trace):

awk '{print NR " " $0}' 9ptrace.txt | \
awk '
  # T-message: its tag ($5, once the line number is prepended)
  # must not already be outstanding
  /^[0-9]* T/ {
    if ($5 in tag) { print "DOUBLETAG " $0 " || had " tag[$5] }
    else           { tag[$5] = $0 }
  }
  # R-message: its tag must match an outstanding T-message
  /^[0-9]* R/ {
    if (!($5 in tag)) { print "MISSTAG " $0 }
    else              { delete tag[$5] }
  }'

-- 
Dominique Martinet

[-- Attachment #2: 9pdoubles.xz --]
[-- Type: application/octet-stream, Size: 5524 bytes --]
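Per Al's rules earlier in the thread, a flushed tag is only retired once
the RFLUSH reply arrives. If a trace did contain TFLUSH, the script
would need roughly the following extra clauses (a sketch only: the field
holding the flushed oldtag is a guess and depends on the trace format,
and the generic T/R clauses above already retire the TFLUSH's own tag):

    /^[0-9]* TFLUSH/ { flush[$5] = $6 }   # $6 = oldtag being flushed (assumed)
    /^[0-9]* RFLUSH/ { delete tag[flush[$5]]; delete flush[$5] }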
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Dominique Martinet @ 2015-12-09 6:41 UTC
To: Andy Lutomirski
Cc: Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

Dominique Martinet wrote on Wed, Dec 09, 2015:
> Andy Lutomirski wrote on Tue, Dec 08, 2015:
> > Trace attached. I don't see anything wrong, but I also don't know
> > what I'm looking for.
>
> Actually, it doesn't look good; I'm not sure whether the trace could be
> missing messages, but it looks like tags get reused...

Talked too fast, as usual: it looks like the traces are just missing
RCLUNKs, so *from what I can see* there's no overlap.

-- 
Dominique
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Al Viro @ 2015-12-09 6:45 UTC
To: Dominique Martinet
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Wed, Dec 09, 2015 at 07:23:16AM +0100, Dominique Martinet wrote:
> Andy Lutomirski wrote on Tue, Dec 08, 2015:
> > Trace attached. I don't see anything wrong, but I also don't know
> > what I'm looking for.
>
> Actually, it doesn't look good; I'm not sure whether the trace could be
> missing messages, but it looks like tags get reused...
>
> Quick and dirty parse script (output attached; it doesn't handle TFLUSH,
> but there are none in this trace):

There's also not a single RCLUNK in there, which looks like missing
packets. AFAICS, all (apparent) tag reuses caught by your awk script are
of that nature. And all the RERRORs are plain "no such file" in response
to a TWALK; all the TCLUNKs in there get no reply whatsoever - neither
success nor failure...
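A quick way to see the imbalance Al describes, assuming the message type
is the first field of each trace line (as the parse script above
assumes):

    awk '{print $1}' 9ptrace.txt | sort | uniq -c | sort -rn
    # a healthy trace should show matching TCLUNK and RCLUNK counts;
    # here the RCLUNKs are missing entirely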
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2015-12-17 21:06 UTC
To: Al Viro
Cc: Dominique Martinet, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Tue, Dec 8, 2015 at 10:45 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Dec 09, 2015 at 07:23:16AM +0100, Dominique Martinet wrote:
>> Andy Lutomirski wrote on Tue, Dec 08, 2015:
>> > Trace attached. I don't see anything wrong, but I also don't know
>> > what I'm looking for.
>>
>> Actually, it doesn't look good; I'm not sure whether the trace could be
>> missing messages, but it looks like tags get reused...
>>
>> Quick and dirty parse script (output attached; it doesn't handle TFLUSH,
>> but there are none in this trace):
>
> There's also not a single RCLUNK in there, which looks like missing
> packets. AFAICS, all (apparent) tag reuses caught by your awk script are
> of that nature. And all the RERRORs are plain "no such file" in response
> to a TWALK; all the TCLUNKs in there get no reply whatsoever - neither
> success nor failure...

This could be QEMU's analysis script screwing up. Is there a good way
for me to gather more info?

--Andy
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Dominique Martinet @ 2015-12-24 10:51 UTC
To: Andy Lutomirski
Cc: Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

Andy Lutomirski wrote on Thu, Dec 17, 2015:
> This could be QEMU's analysis script screwing up. Is there a good way
> for me to gather more info?

I finally took some time to reproduce it (sorry for the delay).

Using your config, virtme commit (17363c2) and kernel tag v4.4-rc3, I
was able to reproduce it just fine with my QEMU (2.4.90).

Now for the fun bit... I ran it with a gdb server, and attaching gdb
then running 'cont' always unblocks it. Using the kernel gdb scripts
(lx-ps) I see about 250 kworker threads running, and the backtraces all
look the same:

[   20.273945]  [<ffffffff818c3910>] schedule+0x30/0x80
[   20.274644]  [<ffffffff818c3c39>] schedule_preempt_disabled+0x9/0x10
[   20.275539]  [<ffffffff818c6147>] __mutex_lock_slowpath+0x107/0x2f0
[   20.276421]  [<ffffffff811cf02e>] ? lookup_fast+0xbe/0x320
[   20.277195]  [<ffffffff818c6345>] mutex_lock+0x15/0x30
[   20.277916]  [<ffffffff811d0df7>] walk_component+0x1a7/0x270

So, given that it unblocks after hooking gdb + cont, I'm actually
thinking this might be a pure scheduling issue (e.g. a thread is never
rescheduled, or something like that). I can't see any task that isn't
in schedule() in your sysrq task-dump transcript either.

Not sure how to go about debugging that, to be honest. I've tried both
the default single virtual CPU and -smp 3 or 4, and both reproduce it;
CPU usage on the host is always low, so it doesn't look like there's
any busy-polling involved. This is a pretty subtle bug we have here.

-- 
Dominique Martinet
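For anyone wanting to repeat this, the debugging session Dominique
describes is roughly the following (a sketch; lx-ps requires
CONFIG_GDB_SCRIPTS so that gdb can auto-load the kernel's
vmlinux-gdb.py helpers from the build directory):

    # boot the guest with QEMU's gdb stub (-s is shorthand for -gdb tcp::1234)
    qemu-system-x86_64 -s ...

    # in another terminal, from the kernel build directory
    gdb vmlinux
    (gdb) target remote :1234
    (gdb) lx-ps          # list all tasks via scripts/gdb/
    (gdb) continue       # resuming from here is what un-wedges the guest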
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2015-12-30 6:43 UTC
To: Dominique Martinet, Peter Zijlstra, Thomas Gleixner, Ingo Molnar
Cc: Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

[add cc's]

Hi scheduler people:

This is relatively easy for me to reproduce. Any hints for debugging
it? Could we really have a bug in which processes that are schedulable
as a result of a mutex unlock aren't always reliably scheduled?

--Andy

On Thu, Dec 24, 2015 at 2:51 AM, Dominique Martinet
<dominique.martinet@cea.fr> wrote:
> Andy Lutomirski wrote on Thu, Dec 17, 2015:
>> This could be QEMU's analysis script screwing up. Is there a good way
>> for me to gather more info?
>
> I finally took some time to reproduce it (sorry for the delay).
>
> Using your config, virtme commit (17363c2) and kernel tag v4.4-rc3, I
> was able to reproduce it just fine with my QEMU (2.4.90).
>
> Now for the fun bit... I ran it with a gdb server, and attaching gdb
> then running 'cont' always unblocks it. Using the kernel gdb scripts
> (lx-ps) I see about 250 kworker threads running, and the backtraces all
> look the same:
>
> [   20.273945]  [<ffffffff818c3910>] schedule+0x30/0x80
> [   20.274644]  [<ffffffff818c3c39>] schedule_preempt_disabled+0x9/0x10
> [   20.275539]  [<ffffffff818c6147>] __mutex_lock_slowpath+0x107/0x2f0
> [   20.276421]  [<ffffffff811cf02e>] ? lookup_fast+0xbe/0x320
> [   20.277195]  [<ffffffff818c6345>] mutex_lock+0x15/0x30
> [   20.277916]  [<ffffffff811d0df7>] walk_component+0x1a7/0x270
>
> So, given that it unblocks after hooking gdb + cont, I'm actually
> thinking this might be a pure scheduling issue (e.g. a thread is never
> rescheduled, or something like that). I can't see any task that isn't
> in schedule() in your sysrq task-dump transcript either.
>
> Not sure how to go about debugging that, to be honest. I've tried both
> the default single virtual CPU and -smp 3 or 4, and both reproduce it;
> CPU usage on the host is always low, so it doesn't look like there's
> any busy-polling involved. This is a pretty subtle bug we have here.
>
> --
> Dominique Martinet

-- 
Andy Lutomirski
AMA Capital Management, LLC
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Peter Zijlstra @ 2016-01-04 15:59 UTC
To: Andy Lutomirski
Cc: Dominique Martinet, Thomas Gleixner, Ingo Molnar, Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Tue, Dec 29, 2015 at 10:43:26PM -0800, Andy Lutomirski wrote:
> [add cc's]
>
> Hi scheduler people:
>
> This is relatively easy for me to reproduce. Any hints for debugging
> it? Could we really have a bug in which processes that are schedulable
> as a result of a mutex unlock aren't always reliably scheduled?

I would expect that to cause wide-spread fail; then again, virt is
known to tickle timing issues that are improbable on actual hardware,
so anything is possible.

Does it reproduce with DEBUG_MUTEXES set? (I'm not seeing a .config
here.)

If it's really easy, you could start by tracing
events/sched/sched_switch and events/sched/sched_wakeup; those would
be the actual scheduling events.

Without DEBUG_MUTEXES there's the MUTEX_SPIN_ON_OWNER code that could
still confuse things, but that's mutex-internal and not scheduler
related.

If it ends up being the SPIN_ON_OWNER bits, we'll have to cook up some
extra debug patches.
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Dominique Martinet @ 2016-01-04 16:09 UTC
To: Peter Zijlstra
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

[-- Attachment #1: Type: text/plain, Size: 1325 bytes --]

Peter Zijlstra wrote on Mon, Jan 04, 2016 at 04:59:15PM +0100:
> On Tue, Dec 29, 2015 at 10:43:26PM -0800, Andy Lutomirski wrote:
> > Hi scheduler people:
> >
> > This is relatively easy for me to reproduce. Any hints for debugging
> > it? Could we really have a bug in which processes that are schedulable
> > as a result of a mutex unlock aren't always reliably scheduled?
>
> I would expect that to cause wide-spread fail; then again, virt is
> known to tickle timing issues that are improbable on actual hardware,
> so anything is possible.
>
> Does it reproduce with DEBUG_MUTEXES set? (I'm not seeing a .config
> here.)

The config has CONFIG_DEBUG_MUTEXES=y.

It got attached a while ago; reposting it here.

> If it's really easy, you could start by tracing
> events/sched/sched_switch and events/sched/sched_wakeup; those would
> be the actual scheduling events.

I'm sure I've missed something in Documentation/, but I'm not aware of
how to trace these. (I'm happy to save Andy some precious time, as
I've got a reproducer all set up now.)

> Without DEBUG_MUTEXES there's the MUTEX_SPIN_ON_OWNER code that could
> still confuse things, but that's mutex-internal and not scheduler
> related.
>
> If it ends up being the SPIN_ON_OWNER bits, we'll have to cook up some
> extra debug patches.

-- 
Dominique

[-- Attachment #2: bad-config.xz --]
[-- Type: application/octet-stream, Size: 21352 bytes --]
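For the record, the event toggles Peter mentions live under tracefs; a
minimal sketch, assuming debugfs/tracefs is mounted in the usual place:

    cd /sys/kernel/debug/tracing
    echo 1 > events/sched/sched_switch/enable
    echo 1 > events/sched/sched_wakeup/enable
    echo 1 > tracing_on
    cat trace_pipe > /tmp/sched.log &    # or read "trace" after the fact

    # equivalently, with the trace-cmd front end:
    trace-cmd record -e sched:sched_switch -e sched:sched_wakeup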
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Andy Lutomirski @ 2016-01-04 19:07 UTC
To: Dominique Martinet
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Mon, Jan 4, 2016 at 8:09 AM, Dominique Martinet
<dominique.martinet@cea.fr> wrote:
> Peter Zijlstra wrote on Mon, Jan 04, 2016 at 04:59:15PM +0100:
>> Does it reproduce with DEBUG_MUTEXES set? (I'm not seeing a .config
>> here.)
>
> The config has CONFIG_DEBUG_MUTEXES=y.
>
> It got attached a while ago; reposting it here.
>
>> If it's really easy, you could start by tracing
>> events/sched/sched_switch and events/sched/sched_wakeup; those would
>> be the actual scheduling events.
>
> I'm sure I've missed something in Documentation/, but I'm not aware of
> how to trace these. (I'm happy to save Andy some precious time, as
> I've got a reproducer all set up now.)

My reproducer, at least, would make this tricky -- the system ends up
mostly hung, so I don't know how I'd read out the result. Maybe I'd
try to get something to dump the ftrace buffer to the serial console
after a delay and stick all that in initramfs, where it wouldn't get
stuck behind the same mutex as everything else.

Or is there a way to tell the kernel to do that for us?

--Andy
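The kernel does have a couple of knobs that come close to this (a
sketch; both assume the respective facilities are configured in):

    # dump the ftrace buffer to the console on demand, via magic sysrq
    echo z > /proc/sysrq-trigger

    # or have the kernel dump it automatically on an oops or panic
    echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
    # (also available as the ftrace_dump_on_oops boot parameter)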
* Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
From: Peter Zijlstra @ 2016-01-04 19:27 UTC
To: Andy Lutomirski
Cc: Dominique Martinet, Thomas Gleixner, Ingo Molnar, Al Viro, linux-kernel@vger.kernel.org, V9FS Developers, Linux FS Devel

On Mon, Jan 04, 2016 at 11:07:43AM -0800, Andy Lutomirski wrote:
> My reproducer, at least, would make this tricky -- the system ends up
> mostly hung, so I don't know how I'd read out the result. Maybe I'd
> try to get something to dump the ftrace buffer to the serial console
> after a delay and stick all that in initramfs, where it wouldn't get
> stuck behind the same mutex as everything else.
>
> Or is there a way to tell the kernel to do that for us?

If you can generate a core, I think crash knows how to read the ftrace
buffers from it.

http://people.redhat.com/anderson/extensions/trace_help_trace.html

But yes, you can use one of the watchdog thingies to dump the buffers
over 'serial' too; I suspect that will take a little longer, though,
even with virtual serial ports.
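Roughly, per the page linked above (a sketch only: the command names
come from that crash extension and from the QEMU monitor's
dump-guest-memory command, and are untested here):

    # grab a core of the wedged guest from the QEMU monitor
    (qemu) dump-guest-memory /tmp/vmcore

    # then read the ftrace ring buffers out of it with crash + trace.so
    crash vmlinux /tmp/vmcore
    crash> extend trace.so
    crash> trace show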