public inbox for linux-kernel@vger.kernel.org
* [syzbot] INFO: task hung in freeze_super (3)
@ 2022-12-09  9:08 syzbot
  2023-01-04 13:47 ` Tetsuo Handa
From: syzbot @ 2022-12-09  9:08 UTC (permalink / raw)
  To: agruenba, cluster-devel, linux-kernel, rpeterso, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    f3e8416619ce Merge tag 'soc-fixes-6.1-5' of git://git.kern..
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=12fb534d880000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d58e7fe7f9cf5e24
dashboard link: https://syzkaller.appspot.com/bug?extid=f51cb4b9afbd87ec06f2
compiler:       Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=123d216b880000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16e46f5b880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/628abc27cbe7/disk-f3e84166.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2f19ea836174/vmlinux-f3e84166.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f2e1347e85a5/bzImage-f3e84166.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/c239a19c8749/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com

INFO: task kworker/0:1H:120 blocked for more than 143 seconds.
      Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:1H    state:D stack:23480 pid:120   ppid:2      flags:0x00004000
Workqueue: glock_workqueue glock_work_func
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5209 [inline]
 __schedule+0x8c9/0xd70 kernel/sched/core.c:6521
 schedule+0xcb/0x190 kernel/sched/core.c:6597
 rwsem_down_write_slowpath+0xfc1/0x1480 kernel/locking/rwsem.c:1190
 __down_write_common kernel/locking/rwsem.c:1305 [inline]
 __down_write kernel/locking/rwsem.c:1314 [inline]
 down_write+0x231/0x270 kernel/locking/rwsem.c:1563
 freeze_super+0x45/0x420 fs/super.c:1697
 freeze_go_sync+0x178/0x340 fs/gfs2/glops.c:573
 do_xmote+0x34d/0x13d0 fs/gfs2/glock.c:776
 glock_work_func+0x2c2/0x450 fs/gfs2/glock.c:1082
 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
 kthread+0x266/0x300 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>
INFO: task syz-executor330:3631 blocked for more than 143 seconds.
      Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor330 state:D stack:20176 pid:3631  ppid:3630   flags:0x00004004
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5209 [inline]
 __schedule+0x8c9/0xd70 kernel/sched/core.c:6521
 schedule+0xcb/0x190 kernel/sched/core.c:6597
 schedule_timeout+0x98/0x2f0 kernel/time/timer.c:1911
 do_wait_for_common+0x3ea/0x560 kernel/sched/completion.c:85
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x46/0x60 kernel/sched/completion.c:138
 __flush_workqueue+0x74a/0x1680 kernel/workqueue.c:2861
 gfs2_gl_hash_clear+0xbe/0x300 fs/gfs2/glock.c:2262
 gfs2_fill_super+0x2202/0x2700 fs/gfs2/ops_fstype.c:1307
 get_tree_bdev+0x400/0x620 fs/super.c:1324
 gfs2_get_tree+0x50/0x210 fs/gfs2/ops_fstype.c:1330
 vfs_get_tree+0x88/0x270 fs/super.c:1531
 do_new_mount+0x289/0xad0 fs/namespace.c:3040
 do_mount fs/namespace.c:3383 [inline]
 __do_sys_mount fs/namespace.c:3591 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3568
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f19da3f458a
RSP: 002b:00007ffd6b2d69e8 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f19da3f458a
RDX: 0000000020000040 RSI: 000000002003b300 RDI: 00007ffd6b2d6a00
RBP: 00007ffd6b2d6a00 R08: 00007ffd6b2d6a40 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000004
R13: 0000555556b0d2c0 R14: 0000000000000000 R15: 00007ffd6b2d6a40
 </TASK>

Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/12:
 #0: ffffffff8d127330 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
1 lock held by rcu_tasks_trace/13:
 #0: ffffffff8d127b30 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
1 lock held by khungtaskd/28:
 #0: ffffffff8d127160 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
3 locks held by kworker/0:1H/120:
 #0: ffff8880194f5d38 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x7f2/0xdb0
 #1: ffffc90002587d00 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x831/0xdb0 kernel/workqueue.c:2264
 #2: ffff888026c2c0e0 (&type->s_umount_key#42){+.+.}-{3:3}, at: freeze_super+0x45/0x420 fs/super.c:1697
2 locks held by getty/3308:
 #0: ffff888028211098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70 drivers/tty/tty_ldisc.c:244
 #1: ffffc900031262f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x53b/0x1650 drivers/tty/n_tty.c:2177
1 lock held by syz-executor330/3631:
 #0: ffff888026c2c0e0 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super+0x212/0x920 fs/super.c:228

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: khungtaskd Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
 nmi_cpu_backtrace+0x46f/0x4f0 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1ba/0x420 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:148 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:220 [inline]
 watchdog+0xcf5/0xd40 kernel/hung_task.c:377
 kthread+0x266/0x300 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:debug_smp_processor_id+0x7/0x10 lib/smp_processor_id.c:60
Code: e8 0e 97 d7 f9 e9 50 ff ff ff 48 c7 c7 00 44 1c 8d 48 89 de e8 fa 96 d7 f9 e9 64 ff ff ff cc cc cc cc cc 48 c7 c7 40 3f 4b 8b <48> c7 c6 80 3f 4b 8b eb 00 41 57 41 56 41 54 53 48 83 ec 10 65 48
RSP: 0018:ffffc90000177a88 EFLAGS: 00000093
RAX: ffffffff8179e533 RBX: ffff8880b992839d RCX: ffff888012a29d40
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8b4b3f40
RBP: ffffffffffffff01 R08: dffffc0000000000 R09: ffffed1017325069
R10: ffffed1017325069 R11: 1ffff11017325068 R12: ffff8880b9928340
R13: 1ffff11017325071 R14: 00000042d0b03c33 R15: 00000042d6a61d33
FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b144b2f000 CR3: 000000000ce8e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 hrtimer_get_next_event+0x18/0x810 kernel/time/hrtimer.c:1481
 cmp_next_hrtimer_event kernel/time/timer.c:1654 [inline]
 get_next_timer_interrupt+0x311/0x430 kernel/time/timer.c:1737
 tick_nohz_next_event+0x383/0x5f0 kernel/time/tick-sched.c:821
 tick_nohz_get_sleep_length+0x132/0x1f0 kernel/time/tick-sched.c:1238
 menu_select+0x5d1/0x1d30 drivers/cpuidle/governors/menu.c:283
 cpuidle_idle_call kernel/sched/idle.c:229 [inline]
 do_idle+0x339/0x640 kernel/sched/idle.c:303
 cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:400
 start_secondary+0xe4/0xf0 arch/x86/kernel/smpboot.c:262
 secondary_startup_64_no_verify+0xcf/0xdb
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: [syzbot] INFO: task hung in freeze_super (3)
  2022-12-09  9:08 [syzbot] INFO: task hung in freeze_super (3) syzbot
@ 2023-01-04 13:47 ` Tetsuo Handa
From: Tetsuo Handa @ 2023-01-04 13:47 UTC (permalink / raw)
  To: agruenba, cluster-devel, rpeterso
  Cc: Hillf Danton, Waiman Long, syzbot, syzkaller-bugs

syzbot is reporting a hung task in freeze_super() after emitting the
"gfs2: fsid=loop0.0: can't make FS RW: -5" message, due to gfs2_find_jhead()
(called from gfs2_make_fs_rw() from gfs2_fill_super()) failing with -EIO.

When the hung task message is printed, I can observe that the glock_workqueue
work function is trying to acquire type->s_umount_key, whereas the reproducer
is trying to flush the glock_workqueue workqueue with type->s_umount_key held,
leading to a silent deadlock.

[  259.867348] task:kworker/3:1H    state:D stack:13736 pid:2497  ppid:2      flags:0x00004000
[  259.870666] Workqueue: glock_workqueue glock_work_func
[  259.872886] Call Trace:
[  259.874279]  <TASK>
[  259.875452]  __schedule+0x498/0x590
[  259.877255]  schedule+0x55/0x90
[  259.878932]  rwsem_down_write_slowpath+0x385/0x760
[  259.881206]  freeze_super+0x29/0x1f0
[  259.883022]  freeze_go_sync+0xa6/0x1f0
[  259.884905]  do_xmote+0x1ae/0xa60
[  259.886625]  glock_work_func+0x19a/0x220
[  259.888559]  process_one_work+0x306/0x530
[  259.890548]  worker_thread+0x357/0x630
[  259.892443]  kthread+0x133/0x170
[  259.894430]  ? rcu_lock_release+0x30/0x30
[  259.896408]  ? kthread_blkcg+0x60/0x60
[  259.898289]  ret_from_fork+0x1f/0x30
[  259.900114]  </TASK>
[  259.901407] 3 locks held by kworker/3:1H/2497:
[  259.903534]  #0: ffff88810653c338 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530
[  259.907874]  #1: ffffc900013dbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[  259.912529]  #2: ffff88812be0d0e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0

[  259.975640] task:a.out           state:D stack:12664 pid:4579  ppid:3638   flags:0x00004006
[  259.979256] Call Trace:
[  259.980634]  <TASK>
[  259.981886]  __schedule+0x498/0x590
[  259.983653]  schedule+0x55/0x90
[  259.985291]  schedule_timeout+0x9e/0x1d0
[  259.987219]  do_wait_for_common+0xff/0x180
[  259.989212]  ? console_conditional_schedule+0x40/0x40
[  259.991585]  wait_for_completion+0x4a/0x60
[  259.993573]  __flush_workqueue+0x360/0x820
[  259.995584]  gfs2_gl_hash_clear+0x58/0x1b0
[  259.997579]  ? _raw_spin_unlock_irqrestore+0x43/0xb0
[  259.999883]  ? lockdep_hardirqs_on+0x99/0x140
[  260.001970]  gfs2_fill_super+0xe24/0x1110
[  260.003934]  ? gfs2_reconfigure+0x4d0/0x4d0
[  260.005955]  get_tree_bdev+0x228/0x2f0
[  260.007821]  gfs2_get_tree+0x2b/0xe0
[  260.009620]  vfs_get_tree+0x30/0xe0
[  260.011570]  do_new_mount+0x1d7/0x540
[  260.013408]  path_mount+0x3c5/0xb50
[  260.015176]  __se_sys_mount+0x298/0x2f0
[  260.017073]  do_syscall_64+0x41/0x90
[  260.018870]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  260.021262] RIP: 0033:0x7fc273f26eae
[  260.023077] RSP: 002b:00007ffffebc6f38 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
[  260.026419] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc273f26eae
[  260.029573] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffffebc6fb0
[  260.032728] RBP: 00007ffffebc7100 R08: 00007ffffebc6ff0 R09: 0000000000000000
[  260.035903] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffffebc7278
[  260.039063] R13: 0000559922083246 R14: 0000559922097cb8 R15: 00007fc27413b040
[  260.042241]  </TASK>
[  260.043522] 1 lock held by a.out/4579:
[  260.045379]  #0: ffff88812be0d0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0x102/0x450

It is difficult to reproduce the lockdep warning because it is timing
dependent. So far I have been able to reproduce it only once.

[  111.928183] [ T4537] gfs2: fsid=loop0.0: can't make FS RW: -5
[  111.931578] [   T88]
[  111.937591] [   T88] ======================================================
[  111.940620] [   T88] WARNING: possible circular locking dependency detected
[  111.943553] [   T88] 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 Not tainted
[  111.946879] [   T88] ------------------------------------------------------
[  111.949832] [   T88] kworker/2:1H/88 is trying to acquire lock:
[  111.952383] [   T88] ffff888128ff50e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0
[  111.956324] [   T88]
[  111.956324] [   T88] but task is already holding lock:
[  111.959406] [   T88] ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[  111.964081] [   T88]
[  111.964081] [   T88] which lock already depends on the new lock.
[  111.964081] [   T88]
[  111.968219] [   T88]
[  111.968219] [   T88] the existing dependency chain (in reverse order) is:
[  111.971959] [   T88]
[  111.971959] [   T88] -> #2 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}:
[  111.976006] [   T88]        process_one_work+0x2f3/0x530
[  111.978307] [   T88]        worker_thread+0x357/0x630
[  111.980510] [   T88]        kthread+0x133/0x170
[  111.982518] [   T88]        ret_from_fork+0x1f/0x30
[  111.984621] [   T88]
[  111.984621] [   T88] -> #1 ((wq_completion)glock_workqueue){+.+.}-{0:0}:
[  111.987992] [   T88]        __flush_workqueue+0x107/0x820
[  111.990137] [   T88]        gfs2_gl_hash_clear+0x3d/0x1b0
[  111.992431] [   T88]        gfs2_put_super+0x3a2/0x3f0
[  111.994399] [   T88]        generic_shutdown_super+0x9a/0x1c0
[  111.996554] [   T88]        kill_block_super+0x2d/0x70
[  111.999015] [   T88]        deactivate_locked_super+0x5e/0xb0
[  112.001467] [   T88]        cleanup_mnt+0x1c9/0x220
[  112.003591] [   T88]        task_work_run+0xe4/0x130
[  112.005749] [   T88]        exit_to_user_mode_loop+0x123/0x150
[  112.008224] [   T88]        exit_to_user_mode_prepare+0xaa/0x130
[  112.010765] [   T88]        syscall_exit_to_user_mode+0x32/0x70
[  112.013457] [   T88]        do_syscall_64+0x4d/0x90
[  112.015811] [   T88]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  112.018497] [   T88]
[  112.018497] [   T88] -> #0 (&type->s_umount_key#51){+.+.}-{3:3}:
[  112.021921] [   T88]        __lock_acquire+0x1712/0x33b0
[  112.024223] [   T88]        lock_acquire+0xd3/0x200
[  112.026342] [   T88]        down_write+0x3b/0xd0
[  112.028384] [   T88]        freeze_super+0x29/0x1f0
[  112.030503] [   T88]        freeze_go_sync+0xa6/0x1f0
[  112.032651] [   T88]        do_xmote+0x1ae/0xa60
[  112.034420] [   T88]        glock_work_func+0x19a/0x220
[  112.036411] [   T88]        process_one_work+0x306/0x530
[  112.038635] [   T88]        worker_thread+0x357/0x630
[  112.040702] [   T88]        kthread+0x133/0x170
[  112.042509] [   T88]        ret_from_fork+0x1f/0x30
[  112.044388] [   T88]
[  112.044388] [   T88] other info that might help us debug this:
[  112.044388] [   T88]
[  112.048102] [   T88] Chain exists of:
[  112.048102] [   T88]   &type->s_umount_key#51 --> (wq_completion)glock_workqueue --> (work_completion)(&(&gl->gl_work)->work)
[  112.048102] [   T88]
[  112.054316] [   T88]  Possible unsafe locking scenario:
[  112.054316] [   T88]
[  112.057425] [   T88]        CPU0                    CPU1
[  112.059722] [   T88]        ----                    ----
[  112.062002] [   T88]   lock((work_completion)(&(&gl->gl_work)->work));
[  112.064732] [   T88]                                lock((wq_completion)glock_workqueue);
[  112.068131] [   T88]                                lock((work_completion)(&(&gl->gl_work)->work));
[  112.071852] [   T88]   lock(&type->s_umount_key#51);
[  112.074014] [   T88]
[  112.074014] [   T88]  *** DEADLOCK ***
[  112.074014] [   T88]
[  112.077358] [   T88] 2 locks held by kworker/2:1H/88:
[  112.079541] [   T88]  #0: ffff88812530f538 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530
[  112.083975] [   T88]  #1: ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[  112.088673] [   T88]
[  112.088673] [   T88] stack backtrace:
[  112.091159] [   T88] CPU: 2 PID: 88 Comm: kworker/2:1H Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  112.095547] [   T88] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  112.099240] [   T88] Workqueue: glock_workqueue glock_work_func
[  112.101745] [   T88] Call Trace:
[  112.103257] [   T88]  <TASK>
[  112.104640] [   T88]  dump_stack_lvl+0xfe/0x190
[  112.106676] [   T88]  check_noncircular+0x12e/0x140
[  112.112220] [   T88]  __lock_acquire+0x1712/0x33b0
[  112.114390] [   T88]  ? __lock_acquire+0x65f/0x33b0
[  112.116857] [   T88]  lock_acquire+0xd3/0x200
[  112.118861] [   T88]  ? freeze_super+0x29/0x1f0
[  112.124130] [   T88]  down_write+0x3b/0xd0
[  112.127778] [   T88]  ? freeze_super+0x29/0x1f0
[  112.129819] [   T88]  freeze_super+0x29/0x1f0
[  112.134526] [   T88]  freeze_go_sync+0xa6/0x1f0
[  112.136585] [   T88]  do_xmote+0x1ae/0xa60
[  112.138651] [   T88]  glock_work_func+0x19a/0x220
[  112.140970] [   T88]  process_one_work+0x306/0x530
[  112.143112] [   T88]  worker_thread+0x357/0x630
[  112.145129] [   T88]  kthread+0x133/0x170
[  112.146954] [   T88]  ? rcu_lock_release+0x30/0x30
[  112.149534] [   T88]  ? kthread_blkcg+0x60/0x60
[  112.151495] [   T88]  ret_from_fork+0x1f/0x30
[  112.153850] [   T88]  </TASK>
[  256.043767] [   T59] INFO: task kworker/2:1H:88 blocked for more than 141 seconds.
[  256.049591] [   T59]       Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  256.053325] [   T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  256.056783] [   T59] task:kworker/2:1H    state:D stack:12792 pid:88    ppid:2      flags:0x00004000
[  256.060412] [   T59] Workqueue: glock_workqueue glock_work_func
[  256.062783] [   T59] Call Trace:
[  256.064008] [   T59]  <TASK>
[  256.065150] [   T59]  __schedule+0x498/0x590
[  256.066916] [   T59]  schedule+0x55/0x90
[  256.068647] [   T59]  rwsem_down_write_slowpath+0x385/0x760
[  256.072030] [   T59]  freeze_super+0x29/0x1f0
[  256.073874] [   T59]  freeze_go_sync+0xa6/0x1f0
[  256.075760] [   T59]  do_xmote+0x1ae/0xa60
[  256.077500] [   T59]  glock_work_func+0x19a/0x220
[  256.079462] [   T59]  process_one_work+0x306/0x530
[  256.081727] [   T59]  worker_thread+0x357/0x630
[  256.083619] [   T59]  kthread+0x133/0x170
[  256.085371] [   T59]  ? rcu_lock_release+0x30/0x30
[  256.087391] [   T59]  ? kthread_blkcg+0x60/0x60
[  256.089277] [   T59]  ret_from_fork+0x1f/0x30
[  256.093784] [   T59]  </TASK>
[  256.095131] [   T59] INFO: task a.out:4537 blocked for more than 141 seconds.
[  256.098069] [   T59]       Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  256.101464] [   T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  256.104916] [   T59] task:a.out           state:D stack:12664 pid:4537  ppid:4531   flags:0x00004006
[  256.108532] [   T59] Call Trace:
[  256.109916] [   T59]  <TASK>
[  256.111161] [   T59]  __schedule+0x498/0x590
[  256.112959] [   T59]  schedule+0x55/0x90
[  256.114589] [   T59]  schedule_timeout+0x9e/0x1d0
[  256.116519] [   T59]  do_wait_for_common+0xff/0x180
[  256.118515] [   T59]  ? console_conditional_schedule+0x40/0x40
[  256.120860] [   T59]  wait_for_completion+0x4a/0x60
[  256.122860] [   T59]  __flush_workqueue+0x360/0x820
[  256.124855] [   T59]  gfs2_gl_hash_clear+0x58/0x1b0
[  256.126868] [   T59]  ? _raw_spin_unlock_irqrestore+0x43/0xb0
[  256.129180] [   T59]  ? lockdep_hardirqs_on+0x99/0x140
[  256.131264] [   T59]  gfs2_fill_super+0xe24/0x1110
[  256.133222] [   T59]  ? gfs2_reconfigure+0x4d0/0x4d0
[  256.135239] [   T59]  get_tree_bdev+0x228/0x2f0
[  256.137296] [   T59]  gfs2_get_tree+0x2b/0xe0
[  256.139122] [   T59]  vfs_get_tree+0x30/0xe0
[  256.140879] [   T59]  do_new_mount+0x1d7/0x540
[  256.142707] [   T59]  path_mount+0x3c5/0xb50
[  256.144461] [   T59]  __se_sys_mount+0x298/0x2f0
[  256.146345] [   T59]  do_syscall_64+0x41/0x90
[  256.148148] [   T59]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  256.150484] [   T59] RIP: 0033:0x7faf87b26eae
[  256.152285] [   T59] RSP: 002b:00007ffdb0d40f68 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
[  256.155594] [   T59] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faf87b26eae
[  256.160847] [   T59] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffdb0d40fe0
[  256.164184] [   T59] RBP: 00007ffdb0d41130 R08: 00007ffdb0d41020 R09: 0000000000000000
[  256.167404] [   T59] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffdb0d412a8
[  256.170913] [   T59] R13: 00005574b9254246 R14: 00005574b9268cb8 R15: 00007faf87da3040
[  256.174771] [   T59]  </TASK>
[  256.176029] [   T59] INFO: lockdep is turned off.
[  256.178522] [   T59] NMI backtrace for cpu 7
[  256.180317] [   T59] CPU: 7 PID: 59 Comm: khungtaskd Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  256.184614] [   T59] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  256.188220] [   T59] Call Trace:
[  256.189626] [   T59]  <TASK>
[  256.190893] [   T59]  dump_stack_lvl+0xfe/0x190
[  256.194767] [   T59]  nmi_cpu_backtrace+0x25d/0x2c0
[  256.196804] [   T59]  ? arch_trigger_cpumask_backtrace+0x10/0x10
[  256.199337] [   T59]  nmi_trigger_cpumask_backtrace+0x11f/0x250
[  256.202150] [   T59]  watchdog+0x8a5/0x8e0
[  256.203918] [   T59]  kthread+0x133/0x170
[  256.205681] [   T59]  ? hungtask_pm_notify+0x90/0x90
[  256.207783] [   T59]  ? kthread_blkcg+0x60/0x60
[  256.209689] [   T59]  ret_from_fork+0x1f/0x30
[  256.211690] [   T59]  </TASK>
[  256.213052] [   T59] Sending NMI from CPU 7 to CPUs 0-6:
[  256.215232] [    C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x13/0x20
[  256.218391] [    C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x13/0x20
[  256.218378] [    C4] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 3.145 msecs
[  256.218378] [    C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x13/0x20
[  256.218766] [    C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x13/0x20
[  256.218889] [    C1] NMI backtrace for cpu 1 skipped: idling at default_idle+0x13/0x20
[  256.219085] [    C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x13/0x20
[  256.219230] [    C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x13/0x20

I suspect that cleanup is not done appropriately when gfs2_find_jhead() fails.
Looking into gfs2_make_fs_rw(), I see two worrisome things.

One is that gfs2_make_fs_rw() returns an error without calling gfs2_consist(sdp)
when gfs2_find_jhead() returns an error, whereas it returns -EIO after calling
gfs2_consist(sdp) when head.lh_flags does not contain the GFS2_LOG_HEAD_UNMOUNT flag.

Since head.lh_flags is assigned by gfs2_find_jhead(), we might want to call
gfs2_consist(sdp) when gfs2_find_jhead() returns an error. And in fact

--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -138,7 +138,11 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
                return -EIO;

        error = gfs2_find_jhead(sdp->sd_jdesc, &head, false);
-       if (error || gfs2_withdrawn(sdp))
+       if (error) {
+               gfs2_consist(sdp);
+               return error;
+       }
+       if (gfs2_withdrawn(sdp))
                return error;

        if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {

avoids this deadlock. But I don't know when/how to use gfs2_withdraw().
Please check if this change is appropriate.

The other worrisome thing is that gfs2_make_fs_rw() returns 0 (and the
mount operation continues) when gfs2_withdrawn() == true. Can the caller
of gfs2_make_fs_rw() survive when gfs2_make_fs_rw() returned 0 without
processing the

        /*  Initialize some head of the log stuff  */
        sdp->sd_log_sequence = head.lh_sequence + 1;
        gfs2_log_pointers_init(sdp, head.lh_blkno);

lines? Shouldn't the caller of gfs2_make_fs_rw() observe an error when
gfs2_make_fs_rw() did not execute the

	set_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);

line due to gfs2_withdrawn() == true?



* Re: [syzbot] INFO: task hung in freeze_super (3)
       [not found] <20230104040132.4103-1-hdanton@sina.com>
@ 2023-01-08 11:11 ` Tetsuo Handa
From: Tetsuo Handa @ 2023-01-08 11:11 UTC (permalink / raw)
  To: Hillf Danton, Tejun Heo
  Cc: agruenba, linux-mm, Waiman Long, linux-kernel, rpeterso,
	syzkaller-bugs, syzbot

On 2023/01/04 13:01, Hillf Danton wrote:
> On 09 Dec 2022 01:08:38 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit:    f3e8416619ce Merge tag 'soc-fixes-6.1-5' of git://git.kern..
>> git tree:       upstream
>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=12fb534d880000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=d58e7fe7f9cf5e24
>> dashboard link: https://syzkaller.appspot.com/bug?extid=f51cb4b9afbd87ec06f2
>> compiler:       Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=123d216b880000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16e46f5b880000
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/628abc27cbe7/disk-f3e84166.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/2f19ea836174/vmlinux-f3e84166.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/f2e1347e85a5/bzImage-f3e84166.xz
>> mounted in repro: https://storage.googleapis.com/syzbot-assets/c239a19c8749/mount_0.gz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com
>>
>> INFO: task kworker/0:1H:120 blocked for more than 143 seconds.
>>       Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/0:1H    state:D stack:23480 pid:120   ppid:2      flags:0x00004000
>> Workqueue: glock_workqueue glock_work_func
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5209 [inline]
>>  __schedule+0x8c9/0xd70 kernel/sched/core.c:6521
>>  schedule+0xcb/0x190 kernel/sched/core.c:6597
>>  rwsem_down_write_slowpath+0xfc1/0x1480 kernel/locking/rwsem.c:1190
>>  __down_write_common kernel/locking/rwsem.c:1305 [inline]
>>  __down_write kernel/locking/rwsem.c:1314 [inline]
>>  down_write+0x231/0x270 kernel/locking/rwsem.c:1563
>>  freeze_super+0x45/0x420 fs/super.c:1697
>>  freeze_go_sync+0x178/0x340 fs/gfs2/glops.c:573
>>  do_xmote+0x34d/0x13d0 fs/gfs2/glock.c:776
>>  glock_work_func+0x2c2/0x450 fs/gfs2/glock.c:1082
>>  process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
>>  worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
>>  kthread+0x266/0x300 kernel/kthread.c:376
>>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
>>  </TASK>
>> INFO: task syz-executor330:3631 blocked for more than 143 seconds.
>>       Not tainted 6.1.0-rc8-syzkaller-00035-gf3e8416619ce #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:syz-executor330 state:D stack:20176 pid:3631  ppid:3630   flags:0x00004004
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5209 [inline]
>>  __schedule+0x8c9/0xd70 kernel/sched/core.c:6521
>>  schedule+0xcb/0x190 kernel/sched/core.c:6597
>>  schedule_timeout+0x98/0x2f0 kernel/time/timer.c:1911
>>  do_wait_for_common+0x3ea/0x560 kernel/sched/completion.c:85
>>  __wait_for_common kernel/sched/completion.c:106 [inline]
>>  wait_for_common kernel/sched/completion.c:117 [inline]
>>  wait_for_completion+0x46/0x60 kernel/sched/completion.c:138
>>  __flush_workqueue+0x74a/0x1680 kernel/workqueue.c:2861
>>  gfs2_gl_hash_clear+0xbe/0x300 fs/gfs2/glock.c:2262
>>  gfs2_fill_super+0x2202/0x2700 fs/gfs2/ops_fstype.c:1307
>>  get_tree_bdev+0x400/0x620 fs/super.c:1324
>>  gfs2_get_tree+0x50/0x210 fs/gfs2/ops_fstype.c:1330
>>  vfs_get_tree+0x88/0x270 fs/super.c:1531
>>  do_new_mount+0x289/0xad0 fs/namespace.c:3040
>>  do_mount fs/namespace.c:3383 [inline]
>>  __do_sys_mount fs/namespace.c:3591 [inline]
>>  __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3568
>>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> RIP: 0033:0x7f19da3f458a
>> RSP: 002b:00007ffd6b2d69e8 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
>> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f19da3f458a
>> RDX: 0000000020000040 RSI: 000000002003b300 RDI: 00007ffd6b2d6a00
>> RBP: 00007ffd6b2d6a00 R08: 00007ffd6b2d6a40 R09: 0000000000000002
>> R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000004
>> R13: 0000555556b0d2c0 R14: 0000000000000000 R15: 00007ffd6b2d6a40
>>  </TASK>
>>
>> Showing all locks held in the system:
>> 1 lock held by rcu_tasks_kthre/12:
>>  #0: ffffffff8d127330 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
>> 1 lock held by rcu_tasks_trace/13:
>>  #0: ffffffff8d127b30 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
>> 1 lock held by khungtaskd/28:
>>  #0: ffffffff8d127160 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
>> 3 locks held by kworker/0:1H/120:
>>  #0: ffff8880194f5d38 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x7f2/0xdb0
>>  #1: ffffc90002587d00 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x831/0xdb0 kernel/workqueue.c:2264
>>  #2: ffff888026c2c0e0 (&type->s_umount_key#42){+.+.}-{3:3}, at: freeze_super+0x45/0x420 fs/super.c:1697
>> 2 locks held by getty/3308:
>>  #0: ffff888028211098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70 drivers/tty/tty_ldisc.c:244
>>  #1: ffffc900031262f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x53b/0x1650 drivers/tty/n_tty.c:2177
>> 1 lock held by syz-executor330/3631:
>>  #0: ffff888026c2c0e0 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super+0x212/0x920 fs/super.c:228
>>
> 
> 	syz-executor330/3631			kworker/0:1H/120
> 	---					---
> 	down_write_nested(&s->s_umount);
> 						glock_work_func()
> 						down_write(&sb->s_umount);
> 	flush_workqueue(glock_workqueue);
> 
> 
> No deadlock is reported yet. Whyyy?
> See if extending the aperture of wq->lockdep_map can catch it.

As far as I tested, this patch did not change the likelihood of hitting the
lockdep warning. I feel that it was only by chance that this test request was
able to hit the lockdep warning.

> 
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  master
> 
> --- x/kernel/workqueue.c
> +++ y/kernel/workqueue.c
> @@ -2807,7 +2807,6 @@ void __flush_workqueue(struct workqueue_
>  		return;
>  
>  	lock_map_acquire(&wq->lockdep_map);
> -	lock_map_release(&wq->lockdep_map);
>  
>  	mutex_lock(&wq->mutex);
>  
> @@ -2867,7 +2866,7 @@ void __flush_workqueue(struct workqueue_
>  	 * handling overflow.  Non-first flushers can simply return.
>  	 */
>  	if (READ_ONCE(wq->first_flusher) != &this_flusher)
> -		return;
> +		goto out;
>  
>  	mutex_lock(&wq->mutex);
>  
> @@ -2942,6 +2941,8 @@ void __flush_workqueue(struct workqueue_
>  
>  out_unlock:
>  	mutex_unlock(&wq->mutex);
> +out:
> +	lock_map_release(&wq->lockdep_map);
>  }
>  EXPORT_SYMBOL(__flush_workqueue);
>  
> --


