* [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
From: syzbot @ 2024-05-28 17:54 UTC
To: Dai.Ngo, chuck.lever, jlayton, kolga, linux-kernel, linux-nfs,
neilb, syzkaller-bugs, tom
Hello,
syzbot found the following issue on:
HEAD commit: 72ece20127a3 Merge tag 'f2fs-for-6.10.rc1' of git://git.ke..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=125f397c980000
kernel config: https://syzkaller.appspot.com/x/.config?x=6be91306a8917025
dashboard link: https://syzkaller.appspot.com/bug?extid=d1e76d963f757db40f91
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14be4d58980000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e0adb12828a5/disk-72ece201.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/08a675560ce7/vmlinux-72ece201.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b7124361f0f7/bzImage-72ece201.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1e76d963f757db40f91@syzkaller.appspotmail.com
INFO: task syz-executor:6423 blocked for more than 143 seconds.
Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor state:D stack:26544 pid:6423 tgid:6423 ppid:6410 flags:0x00000006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5408 [inline]
__schedule+0x1796/0x4a00 kernel/sched/core.c:6745
__schedule_loop kernel/sched/core.c:6822 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6837
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3b90c7cee9
RSP: 002b:00007ffc41a63068 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f3b90dabf80 RCX: 00007f3b90c7cee9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007f3b90cc949e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3b90dabf8c R14: 00007f3b90dabf80 R15: 0000000000000000
</TASK>
INFO: task syz-executor.1:6455 blocked for more than 143 seconds.
Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.1 state:D stack:26544 pid:6455 tgid:6455 ppid:6425 flags:0x00000006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5408 [inline]
__schedule+0x1796/0x4a00 kernel/sched/core.c:6745
__schedule_loop kernel/sched/core.c:6822 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6837
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f01a5c7cee9
RSP: 002b:00007fff7f6d4b28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f01a5dabf80 RCX: 00007f01a5c7cee9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007f01a5cc949e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f01a5dabf8c R14: 00007f01a5dabf80 R15: 0000000000000000
</TASK>
Showing all locks held in the system:
3 locks held by kworker/1:0/25:
#0: ffff8880b953e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
#1: ffff8880b9528948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:988
#2: ffff8880b9529430 (krc.lock){..-.}-{2:2}, at: kvfree_rcu_bulk+0x26b/0x4e0 kernel/rcu/tree.c:3383
1 lock held by khungtaskd/30:
#0: ffffffff8e333d20 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
#0: ffffffff8e333d20 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
#0: ffffffff8e333d20 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
2 locks held by getty/4842:
#0: ffff88802f00a0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc900031332f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b5/0x1e10 drivers/tty/n_tty.c:2201
4 locks held by kworker/0:4/5197:
#0: ffff888015080948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3206 [inline]
#0: ffff888015080948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x90a/0x1830 kernel/workqueue.c:3312
#1: ffffc90003b77d00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3207 [inline]
#1: ffffc90003b77d00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_scheduled_works+0x945/0x1830 kernel/workqueue.c:3312
#2: ffffffff8f5d3fc8 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xe/0x60 net/core/link_watch.c:276
#3: ffffffff8e3390f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:291 [inline]
#3: ffffffff8e3390f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x381/0x830 kernel/rcu/tree_exp.h:939
2 locks held by kworker/u8:3/5328:
2 locks held by syz-executor.2/6408:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor/6423:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.1/6455:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.4/6482:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.3/6486:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.2/6501:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor/6506:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.1/6523:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.4/6545:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.3/6560:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.2/6584:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor/6589:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.1/6611:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.4/6632:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.3/6647:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.2/6671:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor/6676:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.1/6691:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.4/6713:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.3/6727:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.2/6751:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor/6756:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.1/6772:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
2 locks held by syz-executor.4/6793:
#0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
1 lock held by syz-executor.3/6795:
#0: ffffffff8f5d3fc8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
#0: ffffffff8f5d3fc8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
=============================================
NMI backtrace for cpu 0
CPU: 0 PID: 30 Comm: khungtaskd Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:223 [inline]
watchdog+0xfde/0x1020 kernel/hung_task.c:379
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 6795 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:__slab_alloc_node mm/slub.c:3804 [inline]
RIP: 0010:slab_alloc_node mm/slub.c:3988 [inline]
RIP: 0010:__do_kmalloc_node mm/slub.c:4120 [inline]
RIP: 0010:kmalloc_node_track_caller_noprof+0x10d/0x450 mm/slub.c:4141
Code: e4 0f 84 2b 01 00 00 85 c0 0f 85 23 01 00 00 4c 89 7c 24 10 0f 1f 44 00 00 65 48 8b 05 3c e0 14 7e 49 8b 0c 24 48 8b 54 08 08 <48> 8b 2c 08 48 85 ed 0f 84 57 01 00 00 48 8b 7c 08 10 48 85 ff 0f
RSP: 0018:ffffc9000c0068b8 EFLAGS: 00000246
RAX: ffff8880b9500000 RBX: 0000000000000000 RCX: 00000000000426d0
RDX: 000000000005bc31 RSI: 0000000000000008 RDI: ffffffff8e443c60
RBP: 0000000000000000 R08: ffffffff8fac0c6f R09: 1ffffffff1f5818d
R10: dffffc0000000000 R11: fffffbfff1f5818e R12: ffff888015041500
R13: 0000000000000cc0 R14: 00000000ffffffff R15: 0000000000000005
FS: 000055556a2ec480(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5c15af311c CR3: 000000005ebda000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
kvasprintf+0xdf/0x190 lib/kasprintf.c:25
kobject_set_name_vargs+0x61/0x120 lib/kobject.c:274
kobject_add_varg lib/kobject.c:368 [inline]
kobject_init_and_add+0xde/0x190 lib/kobject.c:457
rx_queue_add_kobject net/core/net-sysfs.c:1105 [inline]
net_rx_queue_update_kobjects+0x1cf/0x5b0 net/core/net-sysfs.c:1160
register_queue_kobjects net/core/net-sysfs.c:1895 [inline]
netdev_register_kobject+0x224/0x320 net/core/net-sysfs.c:2140
register_netdevice+0x11ff/0x1a10 net/core/dev.c:10374
veth_newlink+0x628/0xcd0 drivers/net/veth.c:1829
rtnl_newlink_create net/core/rtnetlink.c:3510 [inline]
__rtnl_newlink net/core/rtnetlink.c:3730 [inline]
rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3743
rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6595
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
__sys_sendto+0x3a4/0x4f0 net/socket.c:2192
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2200
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5c15a7ebdc
Code: 1a 51 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 60 51 02 00 48 8b
RSP: 002b:00007ffd345dd670 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f5c166d4620 RCX: 00007f5c15a7ebdc
RDX: 0000000000000068 RSI: 00007f5c166d4670 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007ffd345dd6c4 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f5c166d4670 R15: 0000000000000000
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
From: Jeff Layton @ 2024-05-28 18:10 UTC
To: syzbot, Dai.Ngo, chuck.lever, kolga, linux-kernel, linux-nfs,
neilb, syzkaller-bugs, tom
On Tue, 2024-05-28 at 10:54 -0700, syzbot wrote:
> [...]
> INFO: task syz-executor:6423 blocked for more than 143 seconds.
> Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> task:syz-executor state:D stack:26544 pid:6423 tgid:6423
> ppid:6410 flags:0x00000006
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5408 [inline]
> __schedule+0x1796/0x4a00 kernel/sched/core.c:6745
> __schedule_loop kernel/sched/core.c:6822 [inline]
> schedule+0x14b/0x320 kernel/sched/core.c:6837
> schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894
> __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> __mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
Looks like this is the line at the beginning of nfsd_nl_listener_set_doit():

    mutex_lock(&nfsd_mutex);

which probably means that some other task is squatting on the mutex.
> [...]
> Showing all locks held in the system:
> 3 locks held by kworker/1:0/25:
> #0: ffff8880b953e798 (&rq->__lock){-.-.}-{2:2}, at:
> raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
> #1: ffff8880b9528948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-
> {0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:988
> #2: ffff8880b9529430 (krc.lock){..-.}-{2:2}, at:
> kvfree_rcu_bulk+0x26b/0x4e0 kernel/rcu/tree.c:3383
> 1 lock held by khungtaskd/30:
> #0: ffffffff8e333d20 (rcu_read_lock){....}-{1:2}, at:
> rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
> #0: ffffffff8e333d20 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock
> include/linux/rcupdate.h:781 [inline]
> #0: ffffffff8e333d20 (rcu_read_lock){....}-{1:2}, at:
> debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
> 2 locks held by getty/4842:
> #0: ffff88802f00a0a0 (&tty->ldisc_sem){++++}-{0:0}, at:
> tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
> #1: ffffc900031332f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at:
> n_tty_read+0x6b5/0x1e10 drivers/tty/n_tty.c:2201
> 4 locks held by kworker/0:4/5197:
> #0: ffff888015080948 ((wq_completion)events){+.+.}-{0:0}, at:
> process_one_work kernel/workqueue.c:3206 [inline]
> #0: ffff888015080948 ((wq_completion)events){+.+.}-{0:0}, at:
> process_scheduled_works+0x90a/0x1830 kernel/workqueue.c:3312
> #1: ffffc90003b77d00 ((linkwatch_work).work){+.+.}-{0:0}, at:
> process_one_work kernel/workqueue.c:3207 [inline]
> #1: ffffc90003b77d00 ((linkwatch_work).work){+.+.}-{0:0}, at:
> process_scheduled_works+0x945/0x1830 kernel/workqueue.c:3312
> #2: ffffffff8f5d3fc8 (rtnl_mutex){+.+.}-{3:3}, at:
> linkwatch_event+0xe/0x60 net/core/link_watch.c:276
> #3: ffffffff8e3390f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at:
> exp_funnel_lock kernel/rcu/tree_exp.h:291 [inline]
> #3: ffffffff8e3390f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at:
> synchronize_rcu_expedited+0x381/0x830 kernel/rcu/tree_exp.h:939
> 2 locks held by kworker/u8:3/5328:
> 2 locks held by syz-executor.2/6408:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
There are a bunch of entries here claiming that this lock is held. I
presume only one task actually holds it, and the rest are blocked on it?
Presumably whichever task is holding the nfsd_mutex is stuck in some
fashion (or failed to release the mutex properly). It would be helpful
to know which task actually holds it, and to see what its stack looks
like. Is there some way to tell from this output?
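If I'm reading lockdep right, a lock shows up in the "held" list as soon as acquisition starts, so all of these waiters report nfsd_mutex as held and this output alone can't single out the real owner. On a live machine (not recoverable from this report), one way to find it is to dump every task's kernel stack; the actual holder is the task that is *not* parked in __mutex_lock():

```shell
# Dump all task stacks to the console/dmesg (needs CONFIG_MAGIC_SYSRQ):
echo t > /proc/sysrq-trigger

# Or inspect one suspect PID directly (needs CONFIG_STACKTRACE):
cat /proc/<pid>/stack
```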
> 2 locks held by syz-executor/6423:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.1/6455:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.4/6482:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.3/6486:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.2/6501:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor/6506:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.1/6523:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.4/6545:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.3/6560:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.2/6584:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor/6589:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.1/6611:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at:
> nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.4/6632:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
> net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.3/6647:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.2/6671:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor/6676:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.1/6691:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.4/6713:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.3/6727:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.2/6751:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor/6756:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.1/6772:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 2 locks held by syz-executor.4/6793:
> #0: ffffffff8f63a5d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
> #1: ffffffff8e600528 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1966
> 1 lock held by syz-executor.3/6795:
> #0: ffffffff8f5d3fc8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
> #0: ffffffff8f5d3fc8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
>
> =============================================
>
> NMI backtrace for cpu 0
> CPU: 0 PID: 30 Comm: khungtaskd Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
> nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
> nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
> trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
> check_hung_uninterruptible_tasks kernel/hung_task.c:223 [inline]
> watchdog+0xfde/0x1020 kernel/hung_task.c:379
> kthread+0x2f0/0x390 kernel/kthread.c:389
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> </TASK>
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 6795 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-10215-g72ece20127a3 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
> RIP: 0010:__slab_alloc_node mm/slub.c:3804 [inline]
> RIP: 0010:slab_alloc_node mm/slub.c:3988 [inline]
> RIP: 0010:__do_kmalloc_node mm/slub.c:4120 [inline]
> RIP: 0010:kmalloc_node_track_caller_noprof+0x10d/0x450 mm/slub.c:4141
> Code: e4 0f 84 2b 01 00 00 85 c0 0f 85 23 01 00 00 4c 89 7c 24 10 0f 1f 44 00 00 65 48 8b 05 3c e0 14 7e 49 8b 0c 24 48 8b 54 08 08 <48> 8b 2c 08 48 85 ed 0f 84 57 01 00 00 48 8b 7c 08 10 48 85 ff 0f
> RSP: 0018:ffffc9000c0068b8 EFLAGS: 00000246
> RAX: ffff8880b9500000 RBX: 0000000000000000 RCX: 00000000000426d0
> RDX: 000000000005bc31 RSI: 0000000000000008 RDI: ffffffff8e443c60
> RBP: 0000000000000000 R08: ffffffff8fac0c6f R09: 1ffffffff1f5818d
> R10: dffffc0000000000 R11: fffffbfff1f5818e R12: ffff888015041500
> R13: 0000000000000cc0 R14: 00000000ffffffff R15: 0000000000000005
> FS: 000055556a2ec480(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f5c15af311c CR3: 000000005ebda000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <NMI>
> </NMI>
> <TASK>
> kvasprintf+0xdf/0x190 lib/kasprintf.c:25
> kobject_set_name_vargs+0x61/0x120 lib/kobject.c:274
> kobject_add_varg lib/kobject.c:368 [inline]
> kobject_init_and_add+0xde/0x190 lib/kobject.c:457
> rx_queue_add_kobject net/core/net-sysfs.c:1105 [inline]
> net_rx_queue_update_kobjects+0x1cf/0x5b0 net/core/net-sysfs.c:1160
> register_queue_kobjects net/core/net-sysfs.c:1895 [inline]
> netdev_register_kobject+0x224/0x320 net/core/net-sysfs.c:2140
> register_netdevice+0x11ff/0x1a10 net/core/dev.c:10374
> veth_newlink+0x628/0xcd0 drivers/net/veth.c:1829
> rtnl_newlink_create net/core/rtnetlink.c:3510 [inline]
> __rtnl_newlink net/core/rtnetlink.c:3730 [inline]
> rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3743
> rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6595
> netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
> netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
> netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
> netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
> sock_sendmsg_nosec net/socket.c:730 [inline]
> __sock_sendmsg+0x221/0x270 net/socket.c:745
> __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
> __do_sys_sendto net/socket.c:2204 [inline]
> __se_sys_sendto net/socket.c:2200 [inline]
> __x64_sys_sendto+0xde/0x100 net/socket.c:2200
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f5c15a7ebdc
> Code: 1a 51 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 60 51 02 00 48 8b
> RSP: 002b:00007ffd345dd670 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 00007f5c166d4620 RCX: 00007f5c15a7ebdc
> RDX: 0000000000000068 RSI: 00007f5c166d4670 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 00007ffd345dd6c4 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
> R13: 0000000000000000 R14: 00007f5c166d4670 R15: 0000000000000000
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-05-28 17:54 [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit syzbot
2024-05-28 18:10 ` Jeff Layton
@ 2024-08-04 23:16 ` syzbot
2024-08-31 18:22 ` syzbot
2 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2024-08-04 23:16 UTC (permalink / raw)
To: Dai.Ngo, chuck.lever, dai.ngo, jlayton, kolga, linux-kernel,
linux-nfs, lorenzo, neilb, syzkaller-bugs, tom
syzbot has bisected this issue to:
commit 16a471177496c8e04a9793812c187a2c1a2192fa
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Apr 23 13:25:44 2024 +0000
NFSD: add listener-{set,get} netlink command
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=16af38d3980000
start commit: ee78a17615ad Add linux-next specific files for 20240606
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=15af38d3980000
console output: https://syzkaller.appspot.com/x/log.txt?x=11af38d3980000
kernel config: https://syzkaller.appspot.com/x/.config?x=a8c69ee180de0793
dashboard link: https://syzkaller.appspot.com/bug?extid=d1e76d963f757db40f91
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=173f2eac980000
Reported-by: syzbot+d1e76d963f757db40f91@syzkaller.appspotmail.com
Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-05-28 17:54 [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit syzbot
2024-05-28 18:10 ` Jeff Layton
2024-08-04 23:16 ` syzbot
@ 2024-08-31 18:22 ` syzbot
2024-09-01 4:35 ` Edward Adam Davis
` (2 more replies)
2 siblings, 3 replies; 16+ messages in thread
From: syzbot @ 2024-08-31 18:22 UTC (permalink / raw)
To: Dai.Ngo, chuck.lever, dai.ngo, jlayton, kolga, linux-kernel,
linux-nfs, lorenzo, neilb, netdev, okorniev, syzkaller-bugs, tom
syzbot has found a reproducer for the following issue on:
HEAD commit: fe1910f9337b tcp_bpf: fix return value of tcp_bpf_sendmsg()
git tree: net
console+strace: https://syzkaller.appspot.com/x/log.txt?x=100e272b980000
kernel config: https://syzkaller.appspot.com/x/.config?x=996585887acdadb3
dashboard link: https://syzkaller.appspot.com/bug?extid=d1e76d963f757db40f91
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=113234fb980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11140943980000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/cb37d16e2860/disk-fe1910f9.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/315198aa296e/vmlinux-fe1910f9.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b3e6fb9fa8a4/bzImage-fe1910f9.xz
The issue was bisected to:
commit 16a471177496c8e04a9793812c187a2c1a2192fa
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Apr 23 13:25:44 2024 +0000
NFSD: add listener-{set,get} netlink command
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=16af38d3980000
final oops: https://syzkaller.appspot.com/x/report.txt?x=15af38d3980000
console output: https://syzkaller.appspot.com/x/log.txt?x=11af38d3980000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1e76d963f757db40f91@syzkaller.appspotmail.com
Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command")
INFO: task syz-executor388:5254 blocked for more than 143 seconds.
Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor388 state:D stack:23736 pid:5254 tgid:5254 ppid:5252 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5188 [inline]
__schedule+0x1800/0x4a60 kernel/sched/core.c:6529
__schedule_loop kernel/sched/core.c:6606 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6621
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6678
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4204435ce9
RSP: 002b:00007ffea2feb278 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4204435ce9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000000000a0 R09: 00000000000000a0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffea2feb2a0 R15: 00007ffea2feb290
</TASK>
INFO: task syz-executor388:5258 blocked for more than 143 seconds.
Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor388 state:D stack:26576 pid:5258 tgid:5258 ppid:5255 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5188 [inline]
__schedule+0x1800/0x4a60 kernel/sched/core.c:6529
__schedule_loop kernel/sched/core.c:6606 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6621
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6678
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4204435ce9
RSP: 002b:00007ffea2feb278 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4204435ce9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000000000a0 R09: 00000000000000a0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffea2feb2a0 R15: 00007ffea2feb290
</TASK>
INFO: task syz-executor388:5259 blocked for more than 144 seconds.
Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor388 state:D stack:26576 pid:5259 tgid:5259 ppid:5257 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5188 [inline]
__schedule+0x1800/0x4a60 kernel/sched/core.c:6529
__schedule_loop kernel/sched/core.c:6606 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6621
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6678
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4204435ce9
RSP: 002b:00007ffea2feb278 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4204435ce9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000000000a0 R09: 00000000000000a0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffea2feb2a0 R15: 00007ffea2feb290
</TASK>
INFO: task syz-executor388:5270 blocked for more than 144 seconds.
Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor388 state:D stack:26576 pid:5270 tgid:5270 ppid:5253 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5188 [inline]
__schedule+0x1800/0x4a60 kernel/sched/core.c:6529
__schedule_loop kernel/sched/core.c:6606 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6621
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6678
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4204435ce9
RSP: 002b:00007ffea2feb278 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4204435ce9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000000000a0 R09: 00000000000000a0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffea2feb2a0 R15: 00007ffea2feb290
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/30:
#0: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#0: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#0: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6626
2 locks held by kworker/u8:5/1073:
4 locks held by klogd/4679:
#0: ffff8880b893e9d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:560
#1: ffff8880b8928948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:989
#2: ffff88807cdc6418 (&p->pi_lock){-.-.}-{2:2}, at: class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
#2: ffff88807cdc6418 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0xb0/0x1470 kernel/sched/core.c:4051
#3: ffff8880b893e9d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:560
2 locks held by getty/4991:
#0: ffff8880309080a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc900031332f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6ac/0x1e00 drivers/tty/n_tty.c:2211
2 locks held by syz-executor388/5254:
#0: ffffffff8fcf11b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8ec0b628 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
2 locks held by syz-executor388/5258:
#0: ffffffff8fcf11b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8ec0b628 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
2 locks held by syz-executor388/5259:
#0: ffffffff8fcf11b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8ec0b628 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
2 locks held by syz-executor388/5261:
#0: ffffffff8fcf11b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8ec0b628 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
2 locks held by syz-executor388/5270:
#0: ffffffff8fcf11b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1218
#1: ffffffff8ec0b628 (nfsd_mutex){+.+.}-{3:3}, at: nfsd_nl_listener_set_doit+0x12d/0x1a90 fs/nfsd/nfsctl.c:1956
=============================================
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 30 Comm: khungtaskd Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:223 [inline]
watchdog+0xff4/0x1040 kernel/hung_task.c:379
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 4679 Comm: klogd Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:unwind_next_frame+0x690/0x2a00 arch/x86/kernel/unwind_orc.c:505
Code: 3c 5b 4d 01 ff 49 01 cf 0f 84 82 00 00 00 49 89 ee e8 64 5e 52 00 49 8d 6f 04 49 8d 5f 05 48 89 e8 48 c1 e8 03 42 0f b6 04 28 <84> c0 0f 85 88 1b 00 00 48 89 d8 48 c1 e8 03 42 0f b6 04 28 84 c0
RSP: 0018:ffffc90004187268 EFLAGS: 00000a02
RAX: 0000000000000000 RBX: ffffffff90a9f4f1 RCX: ffff88807c3c8000
RDX: 0000000000000000 RSI: ffffffff81faa2e5 RDI: ffffffff81faa2ea
RBP: ffffffff90a9f4f0 R08: ffffffff81412c60 R09: ffffc90004187430
R10: 0000000000000003 R11: ffffffff817f2f00 R12: ffffffff90310524
R13: dffffc0000000000 R14: 1ffff92000830e68 R15: ffffffff90a9f4ec
FS: 00007fb93ee6f380(0000) GS:ffff8880b8900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555557e676f8 CR3: 000000007c060000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
arch_stack_walk+0x151/0x1b0 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x118/0x1d0 kernel/stacktrace.c:122
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:312 [inline]
__kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3988 [inline]
slab_alloc_node mm/slub.c:4037 [inline]
kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4080
__alloc_skb+0x1c3/0x440 net/core/skbuff.c:664
alloc_skb include/linux/skbuff.h:1320 [inline]
alloc_skb_with_frags+0xc3/0x770 net/core/skbuff.c:6526
sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2815
unix_dgram_sendmsg+0x6d3/0x1f80 net/unix/af_unix.c:2030
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
__sys_sendto+0x3a4/0x4f0 net/socket.c:2204
__do_sys_sendto net/socket.c:2216 [inline]
__se_sys_sendto net/socket.c:2212 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2212
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb93efd19b5
Code: 8b 44 24 08 48 83 c4 28 48 98 c3 48 98 c3 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 26 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 76 7a 48 8b 15 44 c4 0c 00 f7 d8 64 89 02 48 83
RSP: 002b:00007ffecbd3b868 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fb93efd19b5
RDX: 000000000000008b RSI: 000056134a140d80 RDI: 0000000000000003
RBP: 000056134a13c910 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000013
R13: 00007fb93f15f212 R14: 00007ffecbd3b968 R15: 0000000000000000
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.486 msecs
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-08-31 18:22 ` syzbot
@ 2024-09-01 4:35 ` Edward Adam Davis
2024-09-01 4:51 ` syzbot
2024-09-01 6:25 ` Edward Adam Davis
2024-09-02 1:57 ` NeilBrown
2 siblings, 1 reply; 16+ messages in thread
From: Edward Adam Davis @ 2024-09-01 4:35 UTC (permalink / raw)
To: syzbot+d1e76d963f757db40f91; +Cc: linux-kernel, syzkaller-bugs
#syz test
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 9ab49bfeae78..51c4e811c01c 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -137,6 +137,7 @@ struct genl_info {
possible_net_t _net;
void * user_ptr[2];
struct netlink_ext_ack *extack;
+ struct mutex fslock;
};
static inline struct net *genl_info_net(const struct genl_info *info)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index feb54c63a116..abf52285e26a 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1105,6 +1105,7 @@ static int genl_family_rcv_msg_doit(const struct genl_family *family,
info.extack = extack;
genl_info_net_set(&info, net);
memset(&info.user_ptr, 0, sizeof(info.user_ptr));
+ mutex_init(&info.fslock);
if (ops->pre_doit) {
err = ops->pre_doit(ops, skb, &info);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 34eb2c2cbcde..8d0ac4f699bd 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1953,11 +1953,11 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
struct nfsd_net *nn;
int err, rem;
- mutex_lock(&nfsd_mutex);
+ mutex_lock(&info->fslock);
err = nfsd_create_serv(net);
if (err) {
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
return err;
}
@@ -2080,7 +2080,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
nfsd_destroy_serv(net);
out_unlock_mtx:
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
return err;
}
@@ -2110,7 +2110,7 @@ int nfsd_nl_listener_get_doit(struct sk_buff *skb, struct genl_info *info)
goto err_free_msg;
}
- mutex_lock(&nfsd_mutex);
+ mutex_lock(&info->fslock);
nn = net_generic(genl_info_net(info), nfsd_net_id);
/* no nfs server? Just send empty socket list */
@@ -2141,14 +2141,14 @@ int nfsd_nl_listener_get_doit(struct sk_buff *skb, struct genl_info *info)
}
spin_unlock_bh(&serv->sv_lock);
out_unlock_mtx:
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
genlmsg_end(skb, hdr);
return genlmsg_reply(skb, info);
err_serv_unlock:
spin_unlock_bh(&serv->sv_lock);
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
err_free_msg:
nlmsg_free(skb);
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-09-01 4:35 ` Edward Adam Davis
@ 2024-09-01 4:51 ` syzbot
0 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2024-09-01 4:51 UTC (permalink / raw)
To: eadavis, linux-kernel, syzkaller-bugs
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in nfsd_create_serv
------------[ cut here ]------------
WARNING: CPU: 0 PID: 6124 at fs/nfsd/nfssvc.c:656 nfsd_create_serv+0x631/0x760 fs/nfsd/nfssvc.c:656
Modules linked in:
CPU: 0 UID: 0 PID: 6124 Comm: syz.3.18 Not tainted 6.11.0-rc5-syzkaller-00151-gfe1910f9337b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:nfsd_create_serv+0x631/0x760 fs/nfsd/nfssvc.c:656
Code: 43 0f b6 44 25 00 84 c0 0f 85 f5 00 00 00 48 8b 44 24 18 ff 00 48 8b 7c 24 30 e8 ba 60 15 09 e9 cd fa ff ff e8 c0 53 e5 fe 90 <0f> 0b 90 e9 91 fa ff ff 48 c7 c1 00 88 47 9a 80 e1 07 80 c1 03 38
RSP: 0018:ffffc90003aaf000 EFLAGS: 00010293
RAX: ffffffff82ae3870 RBX: ffff88801276bb80 RCX: ffff888055cf9e00
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff8ec0b5c0
RBP: ffffc90003aaf190 R08: ffffffff8ec0b5c7 R09: 1ffffffff1d816b8
R10: dffffc0000000000 R11: fffffbfff1d816b9 R12: dffffc0000000000
R13: dffffc0000000000 R14: ffff888032c74000 R15: 1ffff92000755e08
FS: 00007fb26fd8a6c0(0000) GS:ffff8880b8800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 00000000318c8000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
nfsd_nl_listener_set_doit+0x13f/0x19f0 fs/nfsd/nfsctl.c:1958
genl_family_rcv_msg_doit net/netlink/genetlink.c:1116 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1196 [inline]
genl_rcv_msg+0xb4e/0xef0 net/netlink/genetlink.c:1211
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1220
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb26ef79eb9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb26fd8a038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fb26f115f80 RCX: 00007fb26ef79eb9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007fb26efe793e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fb26f115f80 R15: 00007fffd4289438
</TASK>
Tested on:
commit: fe1910f9 tcp_bpf: fix return value of tcp_bpf_sendmsg()
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=1548dc67980000
kernel config: https://syzkaller.appspot.com/x/.config?x=996585887acdadb3
dashboard link: https://syzkaller.appspot.com/bug?extid=d1e76d963f757db40f91
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1517ab39980000
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-08-31 18:22 ` syzbot
2024-09-01 4:35 ` Edward Adam Davis
@ 2024-09-01 6:25 ` Edward Adam Davis
2024-09-01 6:50 ` syzbot
2024-09-02 1:57 ` NeilBrown
2 siblings, 1 reply; 16+ messages in thread
From: Edward Adam Davis @ 2024-09-01 6:25 UTC (permalink / raw)
To: syzbot+d1e76d963f757db40f91; +Cc: linux-kernel, syzkaller-bugs
#syz test
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 9ab49bfeae78..51c4e811c01c 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -137,6 +137,7 @@ struct genl_info {
possible_net_t _net;
void * user_ptr[2];
struct netlink_ext_ack *extack;
+ struct mutex fslock;
};
static inline struct net *genl_info_net(const struct genl_info *info)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index feb54c63a116..abf52285e26a 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1105,6 +1105,7 @@ static int genl_family_rcv_msg_doit(const struct genl_family *family,
info.extack = extack;
genl_info_net_set(&info, net);
memset(&info.user_ptr, 0, sizeof(info.user_ptr));
+ mutex_init(&info.fslock);
if (ops->pre_doit) {
err = ops->pre_doit(ops, skb, &info);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 34eb2c2cbcde..8d0ac4f699bd 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1953,11 +1953,11 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
struct nfsd_net *nn;
int err, rem;
- mutex_lock(&nfsd_mutex);
+ mutex_lock(&info->fslock);
err = nfsd_create_serv(net);
if (err) {
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
return err;
}
@@ -2080,7 +2080,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
nfsd_destroy_serv(net);
out_unlock_mtx:
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
return err;
}
@@ -2110,7 +2110,7 @@ int nfsd_nl_listener_get_doit(struct sk_buff *skb, struct genl_info *info)
goto err_free_msg;
}
- mutex_lock(&nfsd_mutex);
+ mutex_lock(&info->fslock);
nn = net_generic(genl_info_net(info), nfsd_net_id);
/* no nfs server? Just send empty socket list */
@@ -2141,14 +2141,14 @@ int nfsd_nl_listener_get_doit(struct sk_buff *skb, struct genl_info *info)
}
spin_unlock_bh(&serv->sv_lock);
out_unlock_mtx:
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
genlmsg_end(skb, hdr);
return genlmsg_reply(skb, info);
err_serv_unlock:
spin_unlock_bh(&serv->sv_lock);
- mutex_unlock(&nfsd_mutex);
+ mutex_unlock(&info->fslock);
err_free_msg:
nlmsg_free(skb);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 0bc8eaa5e009..3f2fe684f0e0 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -653,7 +653,6 @@ int nfsd_create_serv(struct net *net)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_serv *serv;
- WARN_ON(!mutex_is_locked(&nfsd_mutex));
if (nn->nfsd_serv)
return 0;
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-09-01 6:25 ` Edward Adam Davis
@ 2024-09-01 6:50 ` syzbot
0 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2024-09-01 6:50 UTC (permalink / raw)
To: eadavis, linux-kernel, syzkaller-bugs
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-by: syzbot+d1e76d963f757db40f91@syzkaller.appspotmail.com
Tested-by: syzbot+d1e76d963f757db40f91@syzkaller.appspotmail.com
Tested on:
commit: fe1910f9 tcp_bpf: fix return value of tcp_bpf_sendmsg()
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=17d6a943980000
kernel config: https://syzkaller.appspot.com/x/.config?x=996585887acdadb3
dashboard link: https://syzkaller.appspot.com/bug?extid=d1e76d963f757db40f91
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=16a8590b980000
Note: testing is done by a robot and is best-effort only.
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-08-31 18:22 ` syzbot
2024-09-01 4:35 ` Edward Adam Davis
2024-09-01 6:25 ` Edward Adam Davis
@ 2024-09-02 1:57 ` NeilBrown
2024-09-04 14:23 ` Chuck Lever
2 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2024-09-02 1:57 UTC (permalink / raw)
To: syzbot
Cc: Dai.Ngo, chuck.lever, dai.ngo, jlayton, kolga, linux-kernel,
linux-nfs, lorenzo, netdev, okorniev, syzkaller-bugs, tom
On Sun, 01 Sep 2024, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
I had a poke around using the provided disk image and kernel for
exploring.
I think the problem is demonstrated by this stack :
[<0>] rpc_wait_bit_killable+0x1b/0x160
[<0>] __rpc_execute+0x723/0x1460
[<0>] rpc_execute+0x1ec/0x3f0
[<0>] rpc_run_task+0x562/0x6c0
[<0>] rpc_call_sync+0x197/0x2e0
[<0>] rpcb_register+0x36b/0x670
[<0>] svc_unregister+0x208/0x730
[<0>] svc_bind+0x1bb/0x1e0
[<0>] nfsd_create_serv+0x3f0/0x760
[<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
[<0>] genl_rcv_msg+0xb16/0xec0
[<0>] netlink_rcv_skb+0x1e5/0x430
No rpcbind is running on this host so that "svc_unregister" takes a
long time. Maybe not forever but if a few of these get queued up all
blocking some other thread, then maybe that pushed it over the limit.
The fact that rpcbind is not running might not be relevant as the test
messes up the network. "ping 127.0.0.1" stops working.
So this bug comes down to "we try to contact rpcbind while holding a
mutex and if that gets no response and no error, then we can hold the
mutex for a long time".
Are we surprised? Do we want to fix this? Any suggestions how?
NeilBrown
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-09-02 1:57 ` NeilBrown
@ 2024-09-04 14:23 ` Chuck Lever
2024-09-04 14:36 ` Jeff Layton
2024-10-09 20:26 ` Jeff Layton
0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2024-09-04 14:23 UTC (permalink / raw)
To: NeilBrown
Cc: syzbot, Dai.Ngo, jlayton, kolga, linux-kernel, linux-nfs, lorenzo,
netdev, okorniev, syzkaller-bugs, tom
On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> On Sun, 01 Sep 2024, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
>
> I had a poke around using the provided disk image and kernel for
> exploring.
>
> I think the problem is demonstrated by this stack :
>
> [<0>] rpc_wait_bit_killable+0x1b/0x160
> [<0>] __rpc_execute+0x723/0x1460
> [<0>] rpc_execute+0x1ec/0x3f0
> [<0>] rpc_run_task+0x562/0x6c0
> [<0>] rpc_call_sync+0x197/0x2e0
> [<0>] rpcb_register+0x36b/0x670
> [<0>] svc_unregister+0x208/0x730
> [<0>] svc_bind+0x1bb/0x1e0
> [<0>] nfsd_create_serv+0x3f0/0x760
> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> [<0>] genl_rcv_msg+0xb16/0xec0
> [<0>] netlink_rcv_skb+0x1e5/0x430
>
> No rpcbind is running on this host so that "svc_unregister" takes a
> long time. Maybe not forever but if a few of these get queued up all
> blocking some other thread, then maybe that pushed it over the limit.
>
> The fact that rpcbind is not running might not be relevant as the test
> messes up the network. "ping 127.0.0.1" stops working.
>
> So this bug comes down to "we try to contact rpcbind while holding a
> mutex and if that gets no response and no error, then we can hold the
> mutex for a long time".
>
> Are we surprised? Do we want to fix this? Any suggestions how?
In the past, we've tried to address "hanging upcall" issues where
the kernel part of an administrative command needs a user space
service that isn't working or present. (eg mount needing a running
gssd)
If NFSD is using the kernel RPC client for the upcall, then maybe
adding the RPC_TASK_SOFTCONN flag might turn the hang into an
immediate failure.
IMO this should be addressed.
--
Chuck Lever
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-09-04 14:23 ` Chuck Lever
@ 2024-09-04 14:36 ` Jeff Layton
2024-10-09 20:26 ` Jeff Layton
1 sibling, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2024-09-04 14:36 UTC (permalink / raw)
To: Chuck Lever, NeilBrown
Cc: syzbot, Dai.Ngo, kolga, linux-kernel, linux-nfs, lorenzo, netdev,
okorniev, syzkaller-bugs, tom
On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> > On Sun, 01 Sep 2024, syzbot wrote:
> > > syzbot has found a reproducer for the following issue on:
> >
> > I had a poke around using the provided disk image and kernel for
> > exploring.
> >
> > I think the problem is demonstrated by this stack :
> >
> > [<0>] rpc_wait_bit_killable+0x1b/0x160
> > [<0>] __rpc_execute+0x723/0x1460
> > [<0>] rpc_execute+0x1ec/0x3f0
> > [<0>] rpc_run_task+0x562/0x6c0
> > [<0>] rpc_call_sync+0x197/0x2e0
> > [<0>] rpcb_register+0x36b/0x670
> > [<0>] svc_unregister+0x208/0x730
> > [<0>] svc_bind+0x1bb/0x1e0
> > [<0>] nfsd_create_serv+0x3f0/0x760
> > [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> > [<0>] genl_rcv_msg+0xb16/0xec0
> > [<0>] netlink_rcv_skb+0x1e5/0x430
> >
> > No rpcbind is running on this host so that "svc_unregister" takes a
> > long time. Maybe not forever but if a few of these get queued up all
> > blocking some other thread, then maybe that pushed it over the limit.
> >
> > The fact that rpcbind is not running might not be relevant as the test
> > messes up the network. "ping 127.0.0.1" stops working.
> >
> > So this bug comes down to "we try to contact rpcbind while holding a
> > mutex and if that gets no response and no error, then we can hold the
> > mutex for a long time".
> >
> > Are we surprised? Do we want to fix this? Any suggestions how?
>
> In the past, we've tried to address "hanging upcall" issues where
> the kernel part of an administrative command needs a user space
> service that isn't working or present. (eg mount needing a running
> gssd)
>
> If NFSD is using the kernel RPC client for the upcall, then maybe
> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> immediate failure.
>
> IMO this should be addressed.
>
Looking at rpcb_register_call, it looks like we already set SOFTCONN if
is_set is true. We probably did that assuming that we only call
svc_unregister on shutdown. svc_rpcb_setup does this though:
/* Remove any stale portmap registrations */
svc_unregister(serv, net);
return 0;
What would be the risk in just setting SOFTCONN unconditionally?
--
Jeff Layton <jlayton@kernel.org>
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-09-04 14:23 ` Chuck Lever
2024-09-04 14:36 ` Jeff Layton
@ 2024-10-09 20:26 ` Jeff Layton
2024-10-11 18:18 ` Chuck Lever III
1 sibling, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2024-10-09 20:26 UTC (permalink / raw)
To: Chuck Lever, NeilBrown
Cc: syzbot, Dai.Ngo, kolga, linux-kernel, linux-nfs, lorenzo, netdev,
okorniev, syzkaller-bugs, tom
On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> > On Sun, 01 Sep 2024, syzbot wrote:
> > > syzbot has found a reproducer for the following issue on:
> >
> > I had a poke around using the provided disk image and kernel for
> > exploring.
> >
> > I think the problem is demonstrated by this stack :
> >
> > [<0>] rpc_wait_bit_killable+0x1b/0x160
> > [<0>] __rpc_execute+0x723/0x1460
> > [<0>] rpc_execute+0x1ec/0x3f0
> > [<0>] rpc_run_task+0x562/0x6c0
> > [<0>] rpc_call_sync+0x197/0x2e0
> > [<0>] rpcb_register+0x36b/0x670
> > [<0>] svc_unregister+0x208/0x730
> > [<0>] svc_bind+0x1bb/0x1e0
> > [<0>] nfsd_create_serv+0x3f0/0x760
> > [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> > [<0>] genl_rcv_msg+0xb16/0xec0
> > [<0>] netlink_rcv_skb+0x1e5/0x430
> >
> > No rpcbind is running on this host so that "svc_unregister" takes a
> > long time. Maybe not forever but if a few of these get queued up all
> > blocking some other thread, then maybe that pushed it over the limit.
> >
> > The fact that rpcbind is not running might not be relevant as the test
> > messes up the network. "ping 127.0.0.1" stops working.
> >
> > So this bug comes down to "we try to contact rpcbind while holding a
> > mutex and if that gets no response and no error, then we can hold the
> > mutex for a long time".
> >
> > Are we surprised? Do we want to fix this? Any suggestions how?
>
> In the past, we've tried to address "hanging upcall" issues where
> the kernel part of an administrative command needs a user space
> service that isn't working or present. (eg mount needing a running
> gssd)
>
> If NFSD is using the kernel RPC client for the upcall, then maybe
> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> immediate failure.
>
> IMO this should be addressed.
>
>
I sent a patch that does the above, but now I'm wondering if we ought
to take another approach. The listener array can be pretty long. What
if we instead were to just drop and reacquire the mutex in the loop at
strategic points? Then we wouldn't squat on the mutex for so long.
Something like this maybe? It's ugly but it might prevent hung task
warnings, and listener setup isn't a fastpath anyway.
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3adbc05ebaac..5de01fb4c557 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
set_bit(XPT_CLOSE, &xprt->xpt_flags);
spin_unlock_bh(&serv->sv_lock);
svc_xprt_close(xprt);
+
+ /* ensure we don't squat on the mutex for too long */
+ mutex_unlock(&nfsd_mutex);
+ mutex_lock(&nfsd_mutex);
spin_lock_bh(&serv->sv_lock);
}
@@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
/* always save the latest error */
if (ret < 0)
err = ret;
+
+ /* ensure we don't squat on the mutex for too long */
+ mutex_unlock(&nfsd_mutex);
+ mutex_lock(&nfsd_mutex);
}
if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-10-09 20:26 ` Jeff Layton
@ 2024-10-11 18:18 ` Chuck Lever III
2024-10-11 19:15 ` Jeff Layton
2024-10-11 21:08 ` NeilBrown
0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever III @ 2024-10-11 18:18 UTC (permalink / raw)
To: Jeff Layton
Cc: Neil Brown, syzbot, Dai Ngo, Olga Kornievskaia,
Linux Kernel Mailing List, Linux NFS Mailing List,
Lorenzo Bianconi, netdev, Olga Kornievskaia,
syzkaller-bugs@googlegroups.com, Tom Talpey
> On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
>> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
>>> On Sun, 01 Sep 2024, syzbot wrote:
>>>> syzbot has found a reproducer for the following issue on:
>>>
>>> I had a poke around using the provided disk image and kernel for
>>> exploring.
>>>
>>> I think the problem is demonstrated by this stack :
>>>
>>> [<0>] rpc_wait_bit_killable+0x1b/0x160
>>> [<0>] __rpc_execute+0x723/0x1460
>>> [<0>] rpc_execute+0x1ec/0x3f0
>>> [<0>] rpc_run_task+0x562/0x6c0
>>> [<0>] rpc_call_sync+0x197/0x2e0
>>> [<0>] rpcb_register+0x36b/0x670
>>> [<0>] svc_unregister+0x208/0x730
>>> [<0>] svc_bind+0x1bb/0x1e0
>>> [<0>] nfsd_create_serv+0x3f0/0x760
>>> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
>>> [<0>] genl_rcv_msg+0xb16/0xec0
>>> [<0>] netlink_rcv_skb+0x1e5/0x430
>>>
>>> No rpcbind is running on this host so that "svc_unregister" takes a
>>> long time. Maybe not forever but if a few of these get queued up all
>>> blocking some other thread, then maybe that pushed it over the limit.
>>>
>>> The fact that rpcbind is not running might not be relevant as the test
>>> messes up the network. "ping 127.0.0.1" stops working.
>>>
>>> So this bug comes down to "we try to contact rpcbind while holding a
>>> mutex and if that gets no response and no error, then we can hold the
>>> mutex for a long time".
>>>
>>> Are we surprised? Do we want to fix this? Any suggestions how?
>>
>> In the past, we've tried to address "hanging upcall" issues where
>> the kernel part of an administrative command needs a user space
>> service that isn't working or present. (eg mount needing a running
>> gssd)
>>
>> If NFSD is using the kernel RPC client for the upcall, then maybe
>> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
>> immediate failure.
>>
>> IMO this should be addressed.
>>
>>
>
> I sent a patch that does the above, but now I'm wondering if we ought
> to take another approach. The listener array can be pretty long. What
> if we instead were to just drop and reacquire the mutex in the loop at
> strategic points? Then we wouldn't squat on the mutex for so long.
>
> Something like this maybe? It's ugly but it might prevent hung task
> warnings, and listener setup isn't a fastpath anyway.
>
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 3adbc05ebaac..5de01fb4c557 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>
> set_bit(XPT_CLOSE, &xprt->xpt_flags);
> spin_unlock_bh(&serv->sv_lock);
>
> svc_xprt_close(xprt);
> +
> + /* ensure we don't squat on the mutex for too long */
> + mutex_unlock(&nfsd_mutex);
> + mutex_lock(&nfsd_mutex);
> spin_lock_bh(&serv->sv_lock);
> }
>
> @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> /* always save the latest error */
> if (ret < 0)
> err = ret;
> +
> + /* ensure we don't squat on the mutex for too long */
> + mutex_unlock(&nfsd_mutex);
> + mutex_lock(&nfsd_mutex);
> }
>
> if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
I had a look at the rpcb upcall code a couple of weeks ago.
I'm not convinced that setting SOFTCONN in all cases will
help but unfortunately the reasons for my skepticism have
all but leaked out of my head.
Releasing and re-acquiring the mutex is often a sign of
a deeper problem. I think you're in the right vicinity
but I'd like to better understand the actual cause of
the delay. The listener list shouldn't be all that long,
but maybe it has an unintentional loop in it?
I wish we had a reproducer for these issues.
--
Chuck Lever
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-10-11 18:18 ` Chuck Lever III
@ 2024-10-11 19:15 ` Jeff Layton
2024-10-11 21:08 ` NeilBrown
1 sibling, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2024-10-11 19:15 UTC (permalink / raw)
To: Chuck Lever III
Cc: Neil Brown, syzbot, Dai Ngo, Olga Kornievskaia,
Linux Kernel Mailing List, Linux NFS Mailing List,
Lorenzo Bianconi, netdev, Olga Kornievskaia,
syzkaller-bugs@googlegroups.com, Tom Talpey
On Fri, 2024-10-11 at 18:18 +0000, Chuck Lever III wrote:
>
> > On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> > > On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> > > > On Sun, 01 Sep 2024, syzbot wrote:
> > > > > syzbot has found a reproducer for the following issue on:
> > > >
> > > > I had a poke around using the provided disk image and kernel for
> > > > exploring.
> > > >
> > > > I think the problem is demonstrated by this stack :
> > > >
> > > > [<0>] rpc_wait_bit_killable+0x1b/0x160
> > > > [<0>] __rpc_execute+0x723/0x1460
> > > > [<0>] rpc_execute+0x1ec/0x3f0
> > > > [<0>] rpc_run_task+0x562/0x6c0
> > > > [<0>] rpc_call_sync+0x197/0x2e0
> > > > [<0>] rpcb_register+0x36b/0x670
> > > > [<0>] svc_unregister+0x208/0x730
> > > > [<0>] svc_bind+0x1bb/0x1e0
> > > > [<0>] nfsd_create_serv+0x3f0/0x760
> > > > [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> > > > [<0>] genl_rcv_msg+0xb16/0xec0
> > > > [<0>] netlink_rcv_skb+0x1e5/0x430
> > > >
> > > > No rpcbind is running on this host so that "svc_unregister" takes a
> > > > long time. Maybe not forever but if a few of these get queued up all
> > > > blocking some other thread, then maybe that pushed it over the limit.
> > > >
> > > > The fact that rpcbind is not running might not be relevant as the test
> > > > messes up the network. "ping 127.0.0.1" stops working.
> > > >
> > > > So this bug comes down to "we try to contact rpcbind while holding a
> > > > mutex and if that gets no response and no error, then we can hold the
> > > > mutex for a long time".
> > > >
> > > > Are we surprised? Do we want to fix this? Any suggestions how?
> > >
> > > In the past, we've tried to address "hanging upcall" issues where
> > > the kernel part of an administrative command needs a user space
> > > service that isn't working or present. (eg mount needing a running
> > > gssd)
> > >
> > > If NFSD is using the kernel RPC client for the upcall, then maybe
> > > adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> > > immediate failure.
> > >
> > > IMO this should be addressed.
> > >
> > >
> >
> > I sent a patch that does the above, but now I'm wondering if we ought
> > to take another approach. The listener array can be pretty long. What
> > if we instead were to just drop and reacquire the mutex in the loop at
> > strategic points? Then we wouldn't squat on the mutex for so long.
> >
> > Something like this maybe? It's ugly but it might prevent hung task
> > warnings, and listener setup isn't a fastpath anyway.
> >
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 3adbc05ebaac..5de01fb4c557 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> >
> > set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > spin_unlock_bh(&serv->sv_lock);
> >
> > svc_xprt_close(xprt);
> > +
> > + /* ensure we don't squat on the mutex for too long */
> > + mutex_unlock(&nfsd_mutex);
> > + mutex_lock(&nfsd_mutex);
> > spin_lock_bh(&serv->sv_lock);
> > }
> >
> > @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > /* always save the latest error */
> > if (ret < 0)
> > err = ret;
> > +
> > + /* ensure we don't squat on the mutex for too long */
> > + mutex_unlock(&nfsd_mutex);
> > + mutex_lock(&nfsd_mutex);
> > }
> >
> > if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
>
> I had a look at the rpcb upcall code a couple of weeks ago.
> I'm not convinced that setting SOFTCONN in all cases will
> help but unfortunately the reasons for my skepticism have
> all but leaked out of my head.
>
> Releasing and re-acquiring the mutex is often a sign of
> a deeper problem.
It might look that way, but in this case we're iterating over the list
in the netlink call. There's no reason we need to hold the nfsd_mutex
over that entire process. Probably we can reduce the scope a bit
further and make this look a little less sketchy. I'll send a revised
patch.
>
> I think you're in the right vicinity
> but I'd like to better understand the actual cause of
> the delay. The listener list shouldn't be all that long,
> but maybe it has an unintentional loop in it?
>
I don't think that's possible given that this is in a netlink request,
but I'm no expert there.
> I wish we had a reproducer for these issues.
The old interface only allowed one listener to be set at a time. The
new one allows a list, and I don't think it's bounded in any way. You
could send down hundreds of listeners at once, and if they all end up
hanging for a little while trying to talk to rpcbind, then that could
easily cause the hung task warning.
--
Jeff Layton <jlayton@kernel.org>
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-10-11 18:18 ` Chuck Lever III
2024-10-11 19:15 ` Jeff Layton
@ 2024-10-11 21:08 ` NeilBrown
2024-10-11 21:13 ` Chuck Lever III
1 sibling, 1 reply; 16+ messages in thread
From: NeilBrown @ 2024-10-11 21:08 UTC (permalink / raw)
To: Chuck Lever III
Cc: Jeff Layton, syzbot, Dai Ngo, Olga Kornievskaia,
Linux Kernel Mailing List, Linux NFS Mailing List,
Lorenzo Bianconi, netdev, Olga Kornievskaia,
syzkaller-bugs@googlegroups.com, Tom Talpey
On Sat, 12 Oct 2024, Chuck Lever III wrote:
>
>
> > On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> >> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> >>> On Sun, 01 Sep 2024, syzbot wrote:
> >>>> syzbot has found a reproducer for the following issue on:
> >>>
> >>> I had a poke around using the provided disk image and kernel for
> >>> exploring.
> >>>
> >>> I think the problem is demonstrated by this stack :
> >>>
> >>> [<0>] rpc_wait_bit_killable+0x1b/0x160
> >>> [<0>] __rpc_execute+0x723/0x1460
> >>> [<0>] rpc_execute+0x1ec/0x3f0
> >>> [<0>] rpc_run_task+0x562/0x6c0
> >>> [<0>] rpc_call_sync+0x197/0x2e0
> >>> [<0>] rpcb_register+0x36b/0x670
> >>> [<0>] svc_unregister+0x208/0x730
> >>> [<0>] svc_bind+0x1bb/0x1e0
> >>> [<0>] nfsd_create_serv+0x3f0/0x760
> >>> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> >>> [<0>] genl_rcv_msg+0xb16/0xec0
> >>> [<0>] netlink_rcv_skb+0x1e5/0x430
> >>>
> >>> No rpcbind is running on this host so that "svc_unregister" takes a
> >>> long time. Maybe not forever but if a few of these get queued up all
> >>> blocking some other thread, then maybe that pushed it over the limit.
> >>>
> >>> The fact that rpcbind is not running might not be relevant as the test
> >>> messes up the network. "ping 127.0.0.1" stops working.
> >>>
> >>> So this bug comes down to "we try to contact rpcbind while holding a
> >>> mutex and if that gets no response and no error, then we can hold the
> >>> mutex for a long time".
> >>>
> >>> Are we surprised? Do we want to fix this? Any suggestions how?
> >>
> >> In the past, we've tried to address "hanging upcall" issues where
> >> the kernel part of an administrative command needs a user space
> >> service that isn't working or present. (eg mount needing a running
> >> gssd)
> >>
> >> If NFSD is using the kernel RPC client for the upcall, then maybe
> >> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> >> immediate failure.
> >>
> >> IMO this should be addressed.
> >>
> >>
> >
> > I sent a patch that does the above, but now I'm wondering if we ought
> > to take another approach. The listener array can be pretty long. What
> > if we instead were to just drop and reacquire the mutex in the loop at
> > strategic points? Then we wouldn't squat on the mutex for so long.
> >
> > Something like this maybe? It's ugly but it might prevent hung task
> > warnings, and listener setup isn't a fastpath anyway.
> >
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 3adbc05ebaac..5de01fb4c557 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> >
> > set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > spin_unlock_bh(&serv->sv_lock);
> >
> > svc_xprt_close(xprt);
> > +
> > + /* ensure we don't squat on the mutex for too long */
> > + mutex_unlock(&nfsd_mutex);
> > + mutex_lock(&nfsd_mutex);
> > spin_lock_bh(&serv->sv_lock);
> > }
> >
> > @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > /* always save the latest error */
> > if (ret < 0)
> > err = ret;
> > +
> > + /* ensure we don't squat on the mutex for too long */
> > + mutex_unlock(&nfsd_mutex);
> > + mutex_lock(&nfsd_mutex);
> > }
> >
> > if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
>
> I had a look at the rpcb upcall code a couple of weeks ago.
> I'm not convinced that setting SOFTCONN in all cases will
> help but unfortunately the reasons for my skepticism have
> all but leaked out of my head.
>
> Releasing and re-acquiring the mutex is often a sign of
> a deeper problem. I think you're in the right vicinity
> but I'd like to better understand the actual cause of
> the delay. The listener list shouldn't be all that long,
> but maybe it has an unintentional loop in it?
I think it is wrong to register with rpcbind while holding a mutex.
Registering with rpcbind doesn't need to be synchronous, does it? Could
we punt that to a workqueue?
Do we need to get a failure status back somehow??
wait_for_completion_killable() somewhere??
>
> I wish we had a reproducer for these issues.
We do, I think. I downloaded the provided kernel and root image, ran
the reproducer supplied (or maybe an earlier version), and it triggered
quite easily.
NeilBrown
>
>
> --
> Chuck Lever
>
>
>
* Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
2024-10-11 21:08 ` NeilBrown
@ 2024-10-11 21:13 ` Chuck Lever III
0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever III @ 2024-10-11 21:13 UTC (permalink / raw)
To: Neil Brown
Cc: Jeff Layton, syzbot, Dai Ngo, Olga Kornievskaia,
Linux Kernel Mailing List, Linux NFS Mailing List,
Lorenzo Bianconi, netdev, Olga Kornievskaia,
syzkaller-bugs@googlegroups.com, Tom Talpey
> On Oct 11, 2024, at 5:08 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Sat, 12 Oct 2024, Chuck Lever III wrote:
>>
>>
>>> On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>
>>> On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
>>>> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
>>>>> On Sun, 01 Sep 2024, syzbot wrote:
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>
>>>>> I had a poke around using the provided disk image and kernel for
>>>>> exploring.
>>>>>
>>>>> I think the problem is demonstrated by this stack :
>>>>>
>>>>> [<0>] rpc_wait_bit_killable+0x1b/0x160
>>>>> [<0>] __rpc_execute+0x723/0x1460
>>>>> [<0>] rpc_execute+0x1ec/0x3f0
>>>>> [<0>] rpc_run_task+0x562/0x6c0
>>>>> [<0>] rpc_call_sync+0x197/0x2e0
>>>>> [<0>] rpcb_register+0x36b/0x670
>>>>> [<0>] svc_unregister+0x208/0x730
>>>>> [<0>] svc_bind+0x1bb/0x1e0
>>>>> [<0>] nfsd_create_serv+0x3f0/0x760
>>>>> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
>>>>> [<0>] genl_rcv_msg+0xb16/0xec0
>>>>> [<0>] netlink_rcv_skb+0x1e5/0x430
>>>>>
>>>>> No rpcbind is running on this host so that "svc_unregister" takes a
>>>>> long time. Maybe not forever but if a few of these get queued up all
>>>>> blocking some other thread, then maybe that pushed it over the limit.
>>>>>
>>>>> The fact that rpcbind is not running might not be relevant as the test
>>>>> messes up the network. "ping 127.0.0.1" stops working.
>>>>>
>>>>> So this bug comes down to "we try to contact rpcbind while holding a
>>>>> mutex and if that gets no response and no error, then we can hold the
>>>>> mutex for a long time".
>>>>>
>>>>> Are we surprised? Do we want to fix this? Any suggestions how?
>>>>
>>>> In the past, we've tried to address "hanging upcall" issues where
>>>> the kernel part of an administrative command needs a user space
>>>> service that isn't working or present. (eg mount needing a running
>>>> gssd)
>>>>
>>>> If NFSD is using the kernel RPC client for the upcall, then maybe
>>>> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
>>>> immediate failure.
>>>>
>>>> IMO this should be addressed.
>>>>
>>>>
>>>
>>> I sent a patch that does the above, but now I'm wondering if we ought
>>> to take another approach. The listener array can be pretty long. What
>>> if we instead were to just drop and reacquire the mutex in the loop at
>>> strategic points? Then we wouldn't squat on the mutex for so long.
>>>
>>> Something like this maybe? It's ugly but it might prevent hung task
>>> warnings, and listener setup isn't a fastpath anyway.
>>>
>>>
>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>> index 3adbc05ebaac..5de01fb4c557 100644
>>> --- a/fs/nfsd/nfsctl.c
>>> +++ b/fs/nfsd/nfsctl.c
>>> @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>
>>> set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>> spin_unlock_bh(&serv->sv_lock);
>>>
>>> svc_xprt_close(xprt);
>>> +
>>> + /* ensure we don't squat on the mutex for too long */
>>> + mutex_unlock(&nfsd_mutex);
>>> + mutex_lock(&nfsd_mutex);
>>> spin_lock_bh(&serv->sv_lock);
>>> }
>>>
>>> @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>> /* always save the latest error */
>>> if (ret < 0)
>>> err = ret;
>>> +
>>> + /* ensure we don't squat on the mutex for too long */
>>> + mutex_unlock(&nfsd_mutex);
>>> + mutex_lock(&nfsd_mutex);
>>> }
>>>
>>> if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
>>
>> I had a look at the rpcb upcall code a couple of weeks ago.
>> I'm not convinced that setting SOFTCONN in all cases will
>> help but unfortunately the reasons for my skepticism have
>> all but leaked out of my head.
>>
>> Releasing and re-acquiring the mutex is often a sign of
>> a deeper problem. I think you're in the right vicinity
>> but I'd like to better understand the actual cause of
>> the delay. The listener list shouldn't be all that long,
>> but maybe it has an unintentional loop in it?
>
> I think it is wrong to register with rpcbind while holding a mutex.
> Registering with rpcbind doesn't need to be synchronous, does it? Could
> we punt that to a workqueue?
> Do we need to get a failure status back somehow??
> wait_for_completion_killable() somewhere??
I think kernel RPC service start-up needs to fail immediately
if rpcbind registration doesn't work.
--
Chuck Lever
end of thread, other threads:[~2024-10-11 21:13 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-28 17:54 [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit syzbot
2024-05-28 18:10 ` Jeff Layton
2024-08-04 23:16 ` syzbot
2024-08-31 18:22 ` syzbot
2024-09-01 4:35 ` Edward Adam Davis
2024-09-01 4:51 ` syzbot
2024-09-01 6:25 ` Edward Adam Davis
2024-09-01 6:50 ` syzbot
2024-09-02 1:57 ` NeilBrown
2024-09-04 14:23 ` Chuck Lever
2024-09-04 14:36 ` Jeff Layton
2024-10-09 20:26 ` Jeff Layton
2024-10-11 18:18 ` Chuck Lever III
2024-10-11 19:15 ` Jeff Layton
2024-10-11 21:08 ` NeilBrown
2024-10-11 21:13 ` Chuck Lever III