* [syzbot] INFO: task hung in freeze_super (3)
@ 2022-12-09 9:08 syzbot
2023-01-04 13:47 ` Tetsuo Handa
0 siblings, 1 reply; 3+ messages in thread
From: syzbot @ 2022-12-09 9:08 UTC (permalink / raw)
To: agruenba, cluster-devel, linux-kernel, rpeterso, syzkaller-bugs
Hello,
syzbot found the following issue on:
HEAD commit: f3e8416619ce Merge tag 'soc-fixes-6.1-5' of git://git.kern..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=12fb534d880000
kernel config: https://syzkaller.appspot.com/x/.config?x=d58e7fe7f9cf5e24
dashboard link: https://syzkaller.appspot.com/bug?extid=f51cb4b9afbd87ec06f2
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=123d216b880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16e46f5b880000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/628abc27cbe7/disk-f3e84166.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2f19ea836174/vmlinux-f3e84166.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f2e1347e85a5/bzImage-f3e84166.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/c239a19c8749/mount_0.gz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com
INFO: task kworker/0:1H:120 blocked for more than 143 seconds.
Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:1H state:D stack:23480 pid:120 ppid:2 flags:0x00004000
Workqueue: glock_workqueue glock_work_func
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5209 [inline]
__schedule+0x8c9/0xd70 kernel/sched/core.c:6521
schedule+0xcb/0x190 kernel/sched/core.c:6597
rwsem_down_write_slowpath+0xfc1/0x1480 kernel/locking/rwsem.c:1190
__down_write_common kernel/locking/rwsem.c:1305 [inline]
__down_write kernel/locking/rwsem.c:1314 [inline]
down_write+0x231/0x270 kernel/locking/rwsem.c:1563
freeze_super+0x45/0x420 fs/super.c:1697
freeze_go_sync+0x178/0x340 fs/gfs2/glops.c:573
do_xmote+0x34d/0x13d0 fs/gfs2/glock.c:776
glock_work_func+0x2c2/0x450 fs/gfs2/glock.c:1082
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
INFO: task syz-executor330:3631 blocked for more than 143 seconds.
Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor330 state:D stack:20176 pid:3631 ppid:3630 flags:0x00004004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5209 [inline]
__schedule+0x8c9/0xd70 kernel/sched/core.c:6521
schedule+0xcb/0x190 kernel/sched/core.c:6597
schedule_timeout+0x98/0x2f0 kernel/time/timer.c:1911
do_wait_for_common+0x3ea/0x560 kernel/sched/completion.c:85
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x46/0x60 kernel/sched/completion.c:138
__flush_workqueue+0x74a/0x1680 kernel/workqueue.c:2861
gfs2_gl_hash_clear+0xbe/0x300 fs/gfs2/glock.c:2262
gfs2_fill_super+0x2202/0x2700 fs/gfs2/ops_fstype.c:1307
get_tree_bdev+0x400/0x620 fs/super.c:1324
gfs2_get_tree+0x50/0x210 fs/gfs2/ops_fstype.c:1330
vfs_get_tree+0x88/0x270 fs/super.c:1531
do_new_mount+0x289/0xad0 fs/namespace.c:3040
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f19da3f458a
RSP: 002b:00007ffd6b2d69e8 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f19da3f458a
RDX: 0000000020000040 RSI: 000000002003b300 RDI: 00007ffd6b2d6a00
RBP: 00007ffd6b2d6a00 R08: 00007ffd6b2d6a40 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000004
R13: 0000555556b0d2c0 R14: 0000000000000000 R15: 00007ffd6b2d6a40
</TASK>
Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/12:
#0: ffffffff8d127330 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
1 lock held by rcu_tasks_trace/13:
#0: ffffffff8d127b30 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
1 lock held by khungtaskd/28:
#0: ffffffff8d127160 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
3 locks held by kworker/0:1H/120:
#0: ffff8880194f5d38 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x7f2/0xdb0
#1: ffffc90002587d00 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x831/0xdb0 kernel/workqueue.c:2264
#2: ffff888026c2c0e0 (&type->s_umount_key#42){+.+.}-{3:3}, at: freeze_super+0x45/0x420 fs/super.c:1697
2 locks held by getty/3308:
#0: ffff888028211098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70 drivers/tty/tty_ldisc.c:244
#1: ffffc900031262f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x53b/0x1650 drivers/tty/n_tty.c:2177
1 lock held by syz-executor330/3631:
#0: ffff888026c2c0e0 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super+0x212/0x920 fs/super.c:228
=============================================
NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: khungtaskd Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
nmi_cpu_backtrace+0x46f/0x4f0 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1ba/0x420 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:148 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:220 [inline]
watchdog+0xcf5/0xd40 kernel/hung_task.c:377
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:debug_smp_processor_id+0x7/0x10 lib/smp_processor_id.c:60
Code: e8 0e 97 d7 f9 e9 50 ff ff ff 48 c7 c7 00 44 1c 8d 48 89 de e8 fa 96 d7 f9 e9 64 ff ff ff cc cc cc cc cc 48 c7 c7 40 3f 4b 8b <48> c7 c6 80 3f 4b 8b eb 00 41 57 41 56 41 54 53 48 83 ec 10 65 48
RSP: 0018:ffffc90000177a88 EFLAGS: 00000093
RAX: ffffffff8179e533 RBX: ffff8880b992839d RCX: ffff888012a29d40
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8b4b3f40
RBP: ffffffffffffff01 R08: dffffc0000000000 R09: ffffed1017325069
R10: ffffed1017325069 R11: 1ffff11017325068 R12: ffff8880b9928340
R13: 1ffff11017325071 R14: 00000042d0b03c33 R15: 00000042d6a61d33
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b144b2f000 CR3: 000000000ce8e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hrtimer_get_next_event+0x18/0x810 kernel/time/hrtimer.c:1481
cmp_next_hrtimer_event kernel/time/timer.c:1654 [inline]
get_next_timer_interrupt+0x311/0x430 kernel/time/timer.c:1737
tick_nohz_next_event+0x383/0x5f0 kernel/time/tick-sched.c:821
tick_nohz_get_sleep_length+0x132/0x1f0 kernel/time/tick-sched.c:1238
menu_select+0x5d1/0x1d30 drivers/cpuidle/governors/menu.c:283
cpuidle_idle_call kernel/sched/idle.c:229 [inline]
do_idle+0x339/0x640 kernel/sched/idle.c:303
cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:400
start_secondary+0xe4/0xf0 arch/x86/kernel/smpboot.c:262
secondary_startup_64_no_verify+0xcf/0xdb
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [syzbot] INFO: task hung in freeze_super (3)
2022-12-09 9:08 [syzbot] INFO: task hung in freeze_super (3) syzbot
@ 2023-01-04 13:47 ` Tetsuo Handa
0 siblings, 0 replies; 3+ messages in thread
From: Tetsuo Handa @ 2023-01-04 13:47 UTC (permalink / raw)
To: agruenba, cluster-devel, rpeterso
Cc: Hillf Danton, Waiman Long, syzbot, syzkaller-bugs
syzbot is reporting hung task at freeze_super() after emitting
"gfs2: fsid=loop0.0: can't make FS RW: -5" message due to gfs2_find_jhead()
from gfs2_make_fs_rw() from gfs2_fill_super() failing with -EIO.
When hung task message is printed, I can observe that glock_workqueue is
trying to hold type->s_umount_key from glock_workqueue work function
whereas the reproducer is trying to flush glock_workqueue workqueue with
type->s_umount_key held; leading to silent deadlock.
[ 259.867348] task:kworker/3:1H state:D stack:13736 pid:2497 ppid:2 flags:0x00004000
[ 259.870666] Workqueue: glock_workqueue glock_work_func
[ 259.872886] Call Trace:
[ 259.874279] <TASK>
[ 259.875452] __schedule+0x498/0x590
[ 259.877255] schedule+0x55/0x90
[ 259.878932] rwsem_down_write_slowpath+0x385/0x760
[ 259.881206] freeze_super+0x29/0x1f0
[ 259.883022] freeze_go_sync+0xa6/0x1f0
[ 259.884905] do_xmote+0x1ae/0xa60
[ 259.886625] glock_work_func+0x19a/0x220
[ 259.888559] process_one_work+0x306/0x530
[ 259.890548] worker_thread+0x357/0x630
[ 259.892443] kthread+0x133/0x170
[ 259.894430] ? rcu_lock_release+0x30/0x30
[ 259.896408] ? kthread_blkcg+0x60/0x60
[ 259.898289] ret_from_fork+0x1f/0x30
[ 259.900114] </TASK>
[ 259.901407] 3 locks held by kworker/3:1H/2497:
[ 259.903534] #0: ffff88810653c338 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530
[ 259.907874] #1: ffffc900013dbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[ 259.912529] #2: ffff88812be0d0e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0
[ 259.975640] task:a.out state:D stack:12664 pid:4579 ppid:3638 flags:0x00004006
[ 259.979256] Call Trace:
[ 259.980634] <TASK>
[ 259.981886] __schedule+0x498/0x590
[ 259.983653] schedule+0x55/0x90
[ 259.985291] schedule_timeout+0x9e/0x1d0
[ 259.987219] do_wait_for_common+0xff/0x180
[ 259.989212] ? console_conditional_schedule+0x40/0x40
[ 259.991585] wait_for_completion+0x4a/0x60
[ 259.993573] __flush_workqueue+0x360/0x820
[ 259.995584] gfs2_gl_hash_clear+0x58/0x1b0
[ 259.997579] ? _raw_spin_unlock_irqrestore+0x43/0xb0
[ 259.999883] ? lockdep_hardirqs_on+0x99/0x140
[ 260.001970] gfs2_fill_super+0xe24/0x1110
[ 260.003934] ? gfs2_reconfigure+0x4d0/0x4d0
[ 260.005955] get_tree_bdev+0x228/0x2f0
[ 260.007821] gfs2_get_tree+0x2b/0xe0
[ 260.009620] vfs_get_tree+0x30/0xe0
[ 260.011570] do_new_mount+0x1d7/0x540
[ 260.013408] path_mount+0x3c5/0xb50
[ 260.015176] __se_sys_mount+0x298/0x2f0
[ 260.017073] do_syscall_64+0x41/0x90
[ 260.018870] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 260.021262] RIP: 0033:0x7fc273f26eae
[ 260.023077] RSP: 002b:00007ffffebc6f38 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
[ 260.026419] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc273f26eae
[ 260.029573] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffffebc6fb0
[ 260.032728] RBP: 00007ffffebc7100 R08: 00007ffffebc6ff0 R09: 0000000000000000
[ 260.035903] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffffebc7278
[ 260.039063] R13: 0000559922083246 R14: 0000559922097cb8 R15: 00007fc27413b040
[ 260.042241] </TASK>
[ 260.043522] 1 lock held by a.out/4579:
[ 260.045379] #0: ffff88812be0d0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0x102/0x450
It is difficult to reproduce lockdep warning because this is timing dependent.
So far I was able to reproduce lockdep warning only once.
[ 111.928183] [ T4537] gfs2: fsid=loop0.0: can't make FS RW: -5
[ 111.931578] [ T88]
[ 111.937591] [ T88] ======================================================
[ 111.940620] [ T88] WARNING: possible circular locking dependency detected
[ 111.943553] [ T88] 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 Not tainted
[ 111.946879] [ T88] ------------------------------------------------------
[ 111.949832] [ T88] kworker/2:1H/88 is trying to acquire lock:
[ 111.952383] [ T88] ffff888128ff50e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0
[ 111.956324] [ T88]
[ 111.956324] [ T88] but task is already holding lock:
[ 111.959406] [ T88] ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[ 111.964081] [ T88]
[ 111.964081] [ T88] which lock already depends on the new lock.
[ 111.964081] [ T88]
[ 111.968219] [ T88]
[ 111.968219] [ T88] the existing dependency chain (in reverse order) is:
[ 111.971959] [ T88]
[ 111.971959] [ T88] -> #2 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}:
[ 111.976006] [ T88] process_one_work+0x2f3/0x530
[ 111.978307] [ T88] worker_thread+0x357/0x630
[ 111.980510] [ T88] kthread+0x133/0x170
[ 111.982518] [ T88] ret_from_fork+0x1f/0x30
[ 111.984621] [ T88]
[ 111.984621] [ T88] -> #1 ((wq_completion)glock_workqueue){+.+.}-{0:0}:
[ 111.987992] [ T88] __flush_workqueue+0x107/0x820
[ 111.990137] [ T88] gfs2_gl_hash_clear+0x3d/0x1b0
[ 111.992431] [ T88] gfs2_put_super+0x3a2/0x3f0
[ 111.994399] [ T88] generic_shutdown_super+0x9a/0x1c0
[ 111.996554] [ T88] kill_block_super+0x2d/0x70
[ 111.999015] [ T88] deactivate_locked_super+0x5e/0xb0
[ 112.001467] [ T88] cleanup_mnt+0x1c9/0x220
[ 112.003591] [ T88] task_work_run+0xe4/0x130
[ 112.005749] [ T88] exit_to_user_mode_loop+0x123/0x150
[ 112.008224] [ T88] exit_to_user_mode_prepare+0xaa/0x130
[ 112.010765] [ T88] syscall_exit_to_user_mode+0x32/0x70
[ 112.013457] [ T88] do_syscall_64+0x4d/0x90
[ 112.015811] [ T88] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 112.018497] [ T88]
[ 112.018497] [ T88] -> #0 (&type->s_umount_key#51){+.+.}-{3:3}:
[ 112.021921] [ T88] __lock_acquire+0x1712/0x33b0
[ 112.024223] [ T88] lock_acquire+0xd3/0x200
[ 112.026342] [ T88] down_write+0x3b/0xd0
[ 112.028384] [ T88] freeze_super+0x29/0x1f0
[ 112.030503] [ T88] freeze_go_sync+0xa6/0x1f0
[ 112.032651] [ T88] do_xmote+0x1ae/0xa60
[ 112.034420] [ T88] glock_work_func+0x19a/0x220
[ 112.036411] [ T88] process_one_work+0x306/0x530
[ 112.038635] [ T88] worker_thread+0x357/0x630
[ 112.040702] [ T88] kthread+0x133/0x170
[ 112.042509] [ T88] ret_from_fork+0x1f/0x30
[ 112.044388] [ T88]
[ 112.044388] [ T88] other info that might help us debug this:
[ 112.044388] [ T88]
[ 112.048102] [ T88] Chain exists of:
[ 112.048102] [ T88] &type->s_umount_key#51 --> (wq_completion)glock_workqueue --> (work_completion)(&(&gl->gl_work)->work)
[ 112.048102] [ T88]
[ 112.054316] [ T88] Possible unsafe locking scenario:
[ 112.054316] [ T88]
[ 112.057425] [ T88] CPU0 CPU1
[ 112.059722] [ T88] ---- ----
[ 112.062002] [ T88] lock((work_completion)(&(&gl->gl_work)->work));
[ 112.064732] [ T88] lock((wq_completion)glock_workqueue);
[ 112.068131] [ T88] lock((work_completion)(&(&gl->gl_work)->work));
[ 112.071852] [ T88] lock(&type->s_umount_key#51);
[ 112.074014] [ T88]
[ 112.074014] [ T88] *** DEADLOCK ***
[ 112.074014] [ T88]
[ 112.077358] [ T88] 2 locks held by kworker/2:1H/88:
[ 112.079541] [ T88] #0: ffff88812530f538 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530
[ 112.083975] [ T88] #1: ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[ 112.088673] [ T88]
[ 112.088673] [ T88] stack backtrace:
[ 112.091159] [ T88] CPU: 2 PID: 88 Comm: kworker/2:1H Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[ 112.095547] [ T88] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 112.099240] [ T88] Workqueue: glock_workqueue glock_work_func
[ 112.101745] [ T88] Call Trace:
[ 112.103257] [ T88] <TASK>
[ 112.104640] [ T88] dump_stack_lvl+0xfe/0x190
[ 112.106676] [ T88] check_noncircular+0x12e/0x140
[ 112.112220] [ T88] __lock_acquire+0x1712/0x33b0
[ 112.114390] [ T88] ? __lock_acquire+0x65f/0x33b0
[ 112.116857] [ T88] lock_acquire+0xd3/0x200
[ 112.118861] [ T88] ? freeze_super+0x29/0x1f0
[ 112.124130] [ T88] down_write+0x3b/0xd0
[ 112.127778] [ T88] ? freeze_super+0x29/0x1f0
[ 112.129819] [ T88] freeze_super+0x29/0x1f0
[ 112.134526] [ T88] freeze_go_sync+0xa6/0x1f0
[ 112.136585] [ T88] do_xmote+0x1ae/0xa60
[ 112.138651] [ T88] glock_work_func+0x19a/0x220
[ 112.140970] [ T88] process_one_work+0x306/0x530
[ 112.143112] [ T88] worker_thread+0x357/0x630
[ 112.145129] [ T88] kthread+0x133/0x170
[ 112.146954] [ T88] ? rcu_lock_release+0x30/0x30
[ 112.149534] [ T88] ? kthread_blkcg+0x60/0x60
[ 112.151495] [ T88] ret_from_fork+0x1f/0x30
[ 112.153850] [ T88] </TASK>
[ 256.043767] [ T59] INFO: task kworker/2:1H:88 blocked for more than 141 seconds.
[ 256.049591] [ T59] Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[ 256.053325] [ T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 256.056783] [ T59] task:kworker/2:1H state:D stack:12792 pid:88 ppid:2 flags:0x00004000
[ 256.060412] [ T59] Workqueue: glock_workqueue glock_work_func
[ 256.062783] [ T59] Call Trace:
[ 256.064008] [ T59] <TASK>
[ 256.065150] [ T59] __schedule+0x498/0x590
[ 256.066916] [ T59] schedule+0x55/0x90
[ 256.068647] [ T59] rwsem_down_write_slowpath+0x385/0x760
[ 256.072030] [ T59] freeze_super+0x29/0x1f0
[ 256.073874] [ T59] freeze_go_sync+0xa6/0x1f0
[ 256.075760] [ T59] do_xmote+0x1ae/0xa60
[ 256.077500] [ T59] glock_work_func+0x19a/0x220
[ 256.079462] [ T59] process_one_work+0x306/0x530
[ 256.081727] [ T59] worker_thread+0x357/0x630
[ 256.083619] [ T59] kthread+0x133/0x170
[ 256.085371] [ T59] ? rcu_lock_release+0x30/0x30
[ 256.087391] [ T59] ? kthread_blkcg+0x60/0x60
[ 256.089277] [ T59] ret_from_fork+0x1f/0x30
[ 256.093784] [ T59] </TASK>
[ 256.095131] [ T59] INFO: task a.out:4537 blocked for more than 141 seconds.
[ 256.098069] [ T59] Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[ 256.101464] [ T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 256.104916] [ T59] task:a.out state:D stack:12664 pid:4537 ppid:4531 flags:0x00004006
[ 256.108532] [ T59] Call Trace:
[ 256.109916] [ T59] <TASK>
[ 256.111161] [ T59] __schedule+0x498/0x590
[ 256.112959] [ T59] schedule+0x55/0x90
[ 256.114589] [ T59] schedule_timeout+0x9e/0x1d0
[ 256.116519] [ T59] do_wait_for_common+0xff/0x180
[ 256.118515] [ T59] ? console_conditional_schedule+0x40/0x40
[ 256.120860] [ T59] wait_for_completion+0x4a/0x60
[ 256.122860] [ T59] __flush_workqueue+0x360/0x820
[ 256.124855] [ T59] gfs2_gl_hash_clear+0x58/0x1b0
[ 256.126868] [ T59] ? _raw_spin_unlock_irqrestore+0x43/0xb0
[ 256.129180] [ T59] ? lockdep_hardirqs_on+0x99/0x140
[ 256.131264] [ T59] gfs2_fill_super+0xe24/0x1110
[ 256.133222] [ T59] ? gfs2_reconfigure+0x4d0/0x4d0
[ 256.135239] [ T59] get_tree_bdev+0x228/0x2f0
[ 256.137296] [ T59] gfs2_get_tree+0x2b/0xe0
[ 256.139122] [ T59] vfs_get_tree+0x30/0xe0
[ 256.140879] [ T59] do_new_mount+0x1d7/0x540
[ 256.142707] [ T59] path_mount+0x3c5/0xb50
[ 256.144461] [ T59] __se_sys_mount+0x298/0x2f0
[ 256.146345] [ T59] do_syscall_64+0x41/0x90
[ 256.148148] [ T59] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 256.150484] [ T59] RIP: 0033:0x7faf87b26eae
[ 256.152285] [ T59] RSP: 002b:00007ffdb0d40f68 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
[ 256.155594] [ T59] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faf87b26eae
[ 256.160847] [ T59] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffdb0d40fe0
[ 256.164184] [ T59] RBP: 00007ffdb0d41130 R08: 00007ffdb0d41020 R09: 0000000000000000
[ 256.167404] [ T59] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffdb0d412a8
[ 256.170913] [ T59] R13: 00005574b9254246 R14: 00005574b9268cb8 R15: 00007faf87da3040
[ 256.174771] [ T59] </TASK>
[ 256.176029] [ T59] INFO: lockdep is turned off.
[ 256.178522] [ T59] NMI backtrace for cpu 7
[ 256.180317] [ T59] CPU: 7 PID: 59 Comm: khungtaskd Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[ 256.184614] [ T59] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 256.188220] [ T59] Call Trace:
[ 256.189626] [ T59] <TASK>
[ 256.190893] [ T59] dump_stack_lvl+0xfe/0x190
[ 256.194767] [ T59] nmi_cpu_backtrace+0x25d/0x2c0
[ 256.196804] [ T59] ? arch_trigger_cpumask_backtrace+0x10/0x10
[ 256.199337] [ T59] nmi_trigger_cpumask_backtrace+0x11f/0x250
[ 256.202150] [ T59] watchdog+0x8a5/0x8e0
[ 256.203918] [ T59] kthread+0x133/0x170
[ 256.205681] [ T59] ? hungtask_pm_notify+0x90/0x90
[ 256.207783] [ T59] ? kthread_blkcg+0x60/0x60
[ 256.209689] [ T59] ret_from_fork+0x1f/0x30
[ 256.211690] [ T59] </TASK>
[ 256.213052] [ T59] Sending NMI from CPU 7 to CPUs 0-6:
[ 256.215232] [ C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x13/0x20
[ 256.218391] [ C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x13/0x20
[ 256.218378] [ C4] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 3.145 msecs
[ 256.218378] [ C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x13/0x20
[ 256.218766] [ C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x13/0x20
[ 256.218889] [ C1] NMI backtrace for cpu 1 skipped: idling at default_idle+0x13/0x20
[ 256.219085] [ C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x13/0x20
[ 256.219230] [ C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x13/0x20
I suspect that cleanup is not done appropriately when gfs2_find_jhead() failed.
Looking into gfs2_make_fs_rw(), I see there are two worrisome things.
One is that gfs2_make_fs_rw() returns an error without calling gfs2_consist(sdp) when
gfs2_find_jhead() returned an error whereas gfs2_make_fs_rw() returns -EIO after calling
gfs2_consist(sdp) when head.lh_flags does not contain GFS2_LOG_HEAD_UNMOUNT flag.
Since head.lh_flags is assigned by gfs2_find_jhead(), we might want to call gfs2_consist(sdp)
when gfs2_find_jhead() returned an error. And actually
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -138,7 +138,11 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
return -EIO;
error = gfs2_find_jhead(sdp->sd_jdesc, &head, false);
- if (error || gfs2_withdrawn(sdp))
+ if (error) {
+ gfs2_consist(sdp);
+ return error;
+ }
+ if (gfs2_withdrawn(sdp))
return error;
if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
avoids this deadlock. But I don't know when/how to use gfs2_withdraw().
Please check if this change is appropriate.
The other worrisome thing is that gfs2_make_fs_rw() is returning 0 (and
mount operation will continue) when gfs2_withdrawn() == true. Can the caller
of gfs2_make_fs_rw() survive when callgfs2_make_fs_rw() returned 0 without
processing
/* Initialize some head of the log stuff */
sdp->sd_log_sequence = head.lh_sequence + 1;
gfs2_log_pointers_init(sdp, head.lh_blkno);
lines? Shouldn't the caller of gfs2_make_fs_rw() observe an error when
gfs2_make_fs_rw() did not execute the
set_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
line due to gfs2_withdrawn() == true?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [syzbot] INFO: task hung in freeze_super (3)
[not found] <20230104040132.4103-1-hdanton@sina.com>
@ 2023-01-08 11:11 ` Tetsuo Handa
0 siblings, 0 replies; 3+ messages in thread
From: Tetsuo Handa @ 2023-01-08 11:11 UTC (permalink / raw)
To: Hillf Danton, Tejun Heo
Cc: agruenba, linux-mm, Waiman Long, linux-kernel, rpeterso,
syzkaller-bugs, syzbot
On 2023/01/04 13:01, Hillf Danton wrote:
> On 09 Dec 2022 01:08:38 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit: f3e8416619ce Merge tag 'soc-fixes-6.1-5' of git://git.kern..
>> git tree: upstream
>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=12fb534d880000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=d58e7fe7f9cf5e24
>> dashboard link: https://syzkaller.appspot.com/bug?extid=f51cb4b9afbd87ec06f2
>> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=123d216b880000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16e46f5b880000
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/628abc27cbe7/disk-f3e84166.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/2f19ea836174/vmlinux-f3e84166.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/f2e1347e85a5/bzImage-f3e84166.xz
>> mounted in repro: https://storage.googleapis.com/syzbot-assets/c239a19c8749/mount_0.gz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com
>>
>> INFO: task kworker/0:1H:120 blocked for more than 143 seconds.
>> Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/0:1H state:D stack:23480 pid:120 ppid:2 flags:0x00004000
>> Workqueue: glock_workqueue glock_work_func
>> Call Trace:
>> <TASK>
>> context_switch kernel/sched/core.c:5209 [inline]
>> __schedule+0x8c9/0xd70 kernel/sched/core.c:6521
>> schedule+0xcb/0x190 kernel/sched/core.c:6597
>> rwsem_down_write_slowpath+0xfc1/0x1480 kernel/locking/rwsem.c:1190
>> __down_write_common kernel/locking/rwsem.c:1305 [inline]
>> __down_write kernel/locking/rwsem.c:1314 [inline]
>> down_write+0x231/0x270 kernel/locking/rwsem.c:1563
>> freeze_super+0x45/0x420 fs/super.c:1697
>> freeze_go_sync+0x178/0x340 fs/gfs2/glops.c:573
>> do_xmote+0x34d/0x13d0 fs/gfs2/glock.c:776
>> glock_work_func+0x2c2/0x450 fs/gfs2/glock.c:1082
>> process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
>> worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
>> kthread+0x266/0x300 kernel/kthread.c:376
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
>> </TASK>
>> INFO: task syz-executor330:3631 blocked for more than 143 seconds.
>> Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:syz-executor330 state:D stack:20176 pid:3631 ppid:3630 flags:0x00004004
>> Call Trace:
>> <TASK>
>> context_switch kernel/sched/core.c:5209 [inline]
>> __schedule+0x8c9/0xd70 kernel/sched/core.c:6521
>> schedule+0xcb/0x190 kernel/sched/core.c:6597
>> schedule_timeout+0x98/0x2f0 kernel/time/timer.c:1911
>> do_wait_for_common+0x3ea/0x560 kernel/sched/completion.c:85
>> __wait_for_common kernel/sched/completion.c:106 [inline]
>> wait_for_common kernel/sched/completion.c:117 [inline]
>> wait_for_completion+0x46/0x60 kernel/sched/completion.c:138
>> __flush_workqueue+0x74a/0x1680 kernel/workqueue.c:2861
>> gfs2_gl_hash_clear+0xbe/0x300 fs/gfs2/glock.c:2262
>> gfs2_fill_super+0x2202/0x2700 fs/gfs2/ops_fstype.c:1307
>> get_tree_bdev+0x400/0x620 fs/super.c:1324
>> gfs2_get_tree+0x50/0x210 fs/gfs2/ops_fstype.c:1330
>> vfs_get_tree+0x88/0x270 fs/super.c:1531
>> do_new_mount+0x289/0xad0 fs/namespace.c:3040
>> do_mount fs/namespace.c:3383 [inline]
>> __do_sys_mount fs/namespace.c:3591 [inline]
>> __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3568
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> RIP: 0033:0x7f19da3f458a
>> RSP: 002b:00007ffd6b2d69e8 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f19da3f458a
>> RDX: 0000000020000040 RSI: 000000002003b300 RDI: 00007ffd6b2d6a00
>> RBP: 00007ffd6b2d6a00 R08: 00007ffd6b2d6a40 R09: 0000000000000002
>> R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000004
>> R13: 0000555556b0d2c0 R14: 0000000000000000 R15: 00007ffd6b2d6a40
>> </TASK>
>>
>> Showing all locks held in the system:
>> 1 lock held by rcu_tasks_kthre/12:
>> #0: ffffffff8d127330 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
>> 1 lock held by rcu_tasks_trace/13:
>> #0: ffffffff8d127b30 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
>> 1 lock held by khungtaskd/28:
>> #0: ffffffff8d127160 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
>> 3 locks held by kworker/0:1H/120:
>> #0: ffff8880194f5d38 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x7f2/0xdb0
>> #1: ffffc90002587d00 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x831/0xdb0 kernel/workqueue.c:2264
>> #2: ffff888026c2c0e0 (&type->s_umount_key#42){+.+.}-{3:3}, at: freeze_super+0x45/0x420 fs/super.c:1697
>> 2 locks held by getty/3308:
>> #0: ffff888028211098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70 drivers/tty/tty_ldisc.c:244
>> #1: ffffc900031262f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x53b/0x1650 drivers/tty/n_tty.c:2177
>> 1 lock held by syz-executor330/3631:
>> #0: ffff888026c2c0e0 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super+0x212/0x920 fs/super.c:228
>>
>
> syz-executor330/3631 kworker/0:1H/120
> --- ---
> down_write_nested(&s->s_umount);
> glock_work_func()
> down_write(&sb->s_umount);
> flush_workqueue(glock_workqueue);
>
>
> No deadlock is reported yet. Whyyy?
> See if extending the aperture of wq->lockdep_map can catch it.
As far as I tested, this patch did not change likeliness of hitting lockdep warning.
I feel that it was just by chance that this test request was able to hit lockdep warning.
>
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> --- x/kernel/workqueue.c
> +++ y/kernel/workqueue.c
> @@ -2807,7 +2807,6 @@ void __flush_workqueue(struct workqueue_
> return;
>
> lock_map_acquire(&wq->lockdep_map);
> - lock_map_release(&wq->lockdep_map);
>
> mutex_lock(&wq->mutex);
>
> @@ -2867,7 +2866,7 @@ void __flush_workqueue(struct workqueue_
> * handling overflow. Non-first flushers can simply return.
> */
> if (READ_ONCE(wq->first_flusher) != &this_flusher)
> - return;
> + goto out;
>
> mutex_lock(&wq->mutex);
>
> @@ -2942,6 +2941,8 @@ void __flush_workqueue(struct workqueue_
>
> out_unlock:
> mutex_unlock(&wq->mutex);
> +out:
> + lock_map_release(&wq->lockdep_map);
> }
> EXPORT_SYMBOL(__flush_workqueue);
>
> --
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-01-08 11:12 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-09 9:08 [syzbot] INFO: task hung in freeze_super (3) syzbot
2023-01-04 13:47 ` Tetsuo Handa
[not found] <20230104040132.4103-1-hdanton@sina.com>
2023-01-08 11:11 ` Tetsuo Handa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox