linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_read_iter
@ 2024-11-27 21:22 syzbot
  2024-11-29 16:19 ` David Howells
  0 siblings, 1 reply; 3+ messages in thread
From: syzbot @ 2024-11-27 21:22 UTC (permalink / raw)
  To: asmadeus, bharathsm, brauner, dhowells, ericvh, jlayton,
	linux-afs, linux-cifs, linux-fsdevel, linux-kernel,
	linux-trace-kernel, linux_oss, lucho, marc.dionne,
	mathieu.desnoyers, mhiramat, netfs, pc, ronniesahlberg, rostedt,
	samba-technical, sfrench, sprasad, syzkaller-bugs, tom, v9fs

Hello,

syzbot found the following issue on:

HEAD commit:    ed9a4ad6e5bd Add linux-next specific files for 20241126
git tree:       linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=17615530580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e348a4873516af92
dashboard link: https://syzkaller.appspot.com/bug?extid=8965fea6a159ab9aa32d
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=175c8dc0580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1103a1e8580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/3b9ba5fbd895/disk-ed9a4ad6.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/e30aed8d0305/vmlinux-ed9a4ad6.xz
kernel image: https://storage.googleapis.com/syzbot-assets/1092546e50cf/bzImage-ed9a4ad6.xz

The issue was bisected to:

commit 1bd9011ee163e11f186b72705978fd6b21bdc07b
Author: David Howells <dhowells@redhat.com>
Date:   Fri Nov 8 17:32:29 2024 +0000

    netfs: Change the read result collector to only use one work item

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=174cc3c0580000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=14ccc3c0580000
console output: https://syzkaller.appspot.com/x/log.txt?x=10ccc3c0580000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8965fea6a159ab9aa32d@syzkaller.appspotmail.com
Fixes: 1bd9011ee163 ("netfs: Change the read result collector to only use one work item")

INFO: task syz-executor246:5857 blocked for more than 143 seconds.
      Not tainted 6.12.0-next-20241126-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor246 state:D stack:22584 pid:5857  tgid:5857  ppid:5854   flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5369 [inline]
 __schedule+0x1850/0x4c30 kernel/sched/core.c:6756
 __schedule_loop kernel/sched/core.c:6833 [inline]
 schedule+0x14b/0x320 kernel/sched/core.c:6848
 bit_wait+0x12/0xd0 kernel/sched/wait_bit.c:237
 __wait_on_bit+0xb0/0x2f0 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0x1d5/0x260 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:77 [inline]
 netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:107 [inline]
 netfs_unbuffered_read fs/netfs/direct_read.c:146 [inline]
 netfs_unbuffered_read_iter_locked+0xd7e/0x1560 fs/netfs/direct_read.c:231
 netfs_unbuffered_read_iter+0xbf/0xe0 fs/netfs/direct_read.c:266
 new_sync_read fs/read_write.c:484 [inline]
 vfs_read+0x991/0xb70 fs/read_write.c:565
 ksys_read+0x18f/0x2b0 fs/read_write.c:708
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1999cb4f79
RSP: 002b:00007ffef9ec4f98 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0030656c69662f2e RCX: 00007f1999cb4f79
RDX: 0000000000002020 RSI: 000000002001b640 RDI: 0000000000000006
RBP: 00007f1999cfe04e R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 00007ffef9ec4fac
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/30:
 #0: ffffffff8e937ae0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
 #0: ffffffff8e937ae0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
 #0: ffffffff8e937ae0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6744
5 locks held by kworker/u8:2/35:
2 locks held by getty/5592:
 #0: ffff888034e320a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
 #1: ffffc90002fde2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x6a6/0x1e00 drivers/tty/n_tty.c:2211
1 lock held by syz-executor246/5857:
 #0: ffff888072050148 (&sb->s_type->i_mutex_key#15){++++}-{4:4}, at: netfs_start_io_direct+0x1d4/0x210 fs/netfs/locking.c:188

=============================================

NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 30 Comm: khungtaskd Not tainted 6.12.0-next-20241126-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
 nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:234 [inline]
 watchdog+0xffb/0x1040 kernel/hung_task.c:397
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 35 Comm: kworker/u8:2 Not tainted 6.12.0-next-20241126-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:insn_get_sib arch/x86/lib/insn.c:447 [inline]
RIP: 0010:insn_get_displacement+0x2ca/0x9a0 arch/x86/lib/insn.c:484
Code: cb f5 49 bc 00 00 00 00 00 fc ff df e9 12 04 00 00 49 8d 5d 25 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df 0f b6 04 08 <84> c0 0f 85 95 05 00 00 0f b6 1b 31 ff 89 de e8 72 c9 cb f5 85 db
RSP: 0018:ffffc90000ab7790 EFLAGS: 00000a07
RAX: 0000000000000000 RBX: ffffc90000ab7945 RCX: dffffc0000000000
RDX: ffff888020a81e00 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90000ab794c R08: ffffffff8bd3c21d R09: ffffffff8bd395b6
R10: 0000000000000002 R11: ffff888020a81e00 R12: 1ffff92000156f29
R13: ffffc90000ab7920 R14: 1ffff92000156f2a R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbd0e4b4580 CR3: 000000000e736000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
 insn_get_immediate+0x62/0x11f0 arch/x86/lib/insn.c:650
 insn_get_length arch/x86/lib/insn.c:723 [inline]
 insn_decode+0x2d6/0x4c0 arch/x86/lib/insn.c:762
 text_poke_loc_init+0xed/0x870 arch/x86/kernel/alternative.c:2432
 arch_jump_label_transform_queue+0x8f/0x100 arch/x86/kernel/jump_label.c:138
 __jump_label_update+0x177/0x3a0 kernel/jump_label.c:513
 static_key_disable_cpuslocked+0xd2/0x1c0 kernel/jump_label.c:240
 static_key_disable+0x1a/0x20 kernel/jump_label.c:248
 toggle_allocation_gate+0x1bf/0x260 mm/kfence/core.c:854
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.344 msecs


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_read_iter
  2024-11-27 21:22 [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_read_iter syzbot
@ 2024-11-29 16:19 ` David Howells
  2024-11-29 17:15   ` syzbot
  0 siblings, 1 reply; 3+ messages in thread
From: David Howells @ 2024-11-29 16:19 UTC (permalink / raw)
  To: syzbot
  Cc: dhowells, asmadeus, bharathsm, brauner, ericvh, jlayton,
	linux-afs, linux-cifs, linux-fsdevel, linux-kernel,
	linux-trace-kernel, linux_oss, lucho, marc.dionne,
	mathieu.desnoyers, mhiramat, netfs, pc, ronniesahlberg, rostedt,
	samba-technical, sfrench, sprasad, syzkaller-bugs, tom, v9fs

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git netfs-writeback

commit 1daca71a815b0d8cfe3db81a31b6dd3fc0da4b50
Author: David Howells <dhowells@redhat.com>
Date:   Fri Nov 29 16:19:03 2024 +0000

    netfs: Fix hang in synchronous read due to failed subreq
    
    When netfs is performing a synchronous read, it doesn't offload the
    collection of results off to a workqueue, but rather does the collection in
    the app thread - thereby avoiding the overhead of using a work queue.
    
    However, if a failure occurs and we might want to retry or if it wants to
    throttle the production of new subreqs, netfs can throw a pause on the
    producer by setting NETFS_RREQ_PAUSE.  This is fine if collection is done
    by workqueue, but in synchronous mode, the collection and retry is done
    after the producer loop completes - thereby deadlocking the two parts.
    
    Fix this by making the synchronous read producer loop, when it sees the
    PAUSE flag, go and wait for the flag to be cleared or the op to complete
    whilst running the collector to process results.
    
    This fixes "netfs: Change the read result collector to only use one work
    item" - but there's no stable commit ID yet.
    
    Reported-by: syzbot+8965fea6a159ab9aa32d@syzkaller.appspotmail.com
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org

diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index dedcfc2bab2d..0bf3c2f5a710 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -102,10 +102,8 @@ static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)
 
 		rreq->netfs_ops->issue_read(subreq);
 
-		if (test_bit(NETFS_RREQ_PAUSE, &rreq->flags)) {
-			trace_netfs_rreq(rreq, netfs_rreq_trace_wait_pause);
-			wait_on_bit(&rreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);
-		}
+		if (test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
+			netfs_wait_for_pause(rreq);
 		if (test_bit(NETFS_RREQ_FAILED, &rreq->flags))
 			break;
 		if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) &&
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 334bf9f6e6f2..db59ed8880e3 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -96,6 +96,7 @@ void netfs_read_collection_worker(struct work_struct *work);
 void netfs_wake_read_collector(struct netfs_io_request *rreq);
 void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, bool was_async);
 ssize_t netfs_wait_for_read(struct netfs_io_request *rreq);
+void netfs_wait_for_pause(struct netfs_io_request *rreq);
 
 /*
  * read_pgpriv2.c
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index f1b15c20b6f8..3803ef5894c8 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -312,7 +312,7 @@ static void netfs_collect_read_results(struct netfs_io_request *rreq)
 	if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &rreq->flags)) {
 		trace_netfs_rreq(rreq, netfs_rreq_trace_unpause);
 		clear_bit_unlock(NETFS_RREQ_PAUSE, &rreq->flags);
-		wake_up_bit(&rreq->flags, NETFS_RREQ_PAUSE);
+		wake_up(&rreq->waitq);
 	}
 
 	if (notes & MADE_PROGRESS) {
@@ -659,3 +659,39 @@ ssize_t netfs_wait_for_read(struct netfs_io_request *rreq)
 
 	return ret;
 }
+
+/*
+ * Wait for a paused read operation to unpause or complete in some manner.
+ */
+void netfs_wait_for_pause(struct netfs_io_request *rreq)
+{
+	struct netfs_io_subrequest *subreq;
+	struct netfs_io_stream *stream = &rreq->io_streams[0];
+	DEFINE_WAIT(myself);
+
+	trace_netfs_rreq(rreq, netfs_rreq_trace_wait_pause);
+
+	for (;;) {
+		trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
+		prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
+
+		subreq = list_first_entry_or_null(&stream->subrequests,
+						  struct netfs_io_subrequest, rreq_link);
+		if (subreq &&
+		    (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags) ||
+		     test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags))) {
+			__set_current_state(TASK_RUNNING);
+			netfs_read_collection(rreq);
+			continue;
+		}
+
+		if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags) ||
+		    !test_bit(NETFS_RREQ_PAUSE, &rreq->flags))
+			break;
+
+		schedule();
+		trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
+	}
+
+	finish_wait(&rreq->waitq, &myself);
+}
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 4a1499167770..bb74f30a4216 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -324,7 +324,7 @@ static void netfs_collect_write_results(struct netfs_io_request *wreq)
 	if ((notes & MADE_PROGRESS) && test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
 		trace_netfs_rreq(wreq, netfs_rreq_trace_unpause);
 		clear_bit_unlock(NETFS_RREQ_PAUSE, &wreq->flags);
-		wake_up_bit(&wreq->flags, NETFS_RREQ_PAUSE);
+		wake_up(&wreq->waitq);
 	}
 
 	if (notes & NEED_REASSESS) {
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index cd2b349243b3..6506bf1d970e 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -721,7 +721,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t
 		rolling_buffer_advance(&wreq->buffer, part);
 		if (test_bit(NETFS_RREQ_PAUSE, &wreq->flags)) {
 			trace_netfs_rreq(wreq, netfs_rreq_trace_wait_pause);
-			wait_on_bit(&wreq->flags, NETFS_RREQ_PAUSE, TASK_UNINTERRUPTIBLE);
+			wait_event(wreq->waitq, !test_bit(NETFS_RREQ_PAUSE, &wreq->flags));
 		}
 		if (test_bit(NETFS_RREQ_FAILED, &wreq->flags))
 			break;


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_read_iter
  2024-11-29 16:19 ` David Howells
@ 2024-11-29 17:15   ` syzbot
  0 siblings, 0 replies; 3+ messages in thread
From: syzbot @ 2024-11-29 17:15 UTC (permalink / raw)
  To: asmadeus, bharathsm, brauner, dhowells, ericvh, jlayton,
	linux-afs, linux-cifs, linux-fsdevel, linux-kernel,
	linux-trace-kernel, linux_oss, lucho, marc.dionne,
	mathieu.desnoyers, mhiramat, netfs, pc, ronniesahlberg, rostedt,
	samba-technical, sfrench, sprasad, syzkaller-bugs, tom, v9fs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+8965fea6a159ab9aa32d@syzkaller.appspotmail.com
Tested-by: syzbot+8965fea6a159ab9aa32d@syzkaller.appspotmail.com

Tested on:

commit:         2aece382 netfs: Report on NULL folioq in netfs_writeba..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git netfs-writeback
console output: https://syzkaller.appspot.com/x/log.txt?x=159cf3c0580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=225053d900535838
dashboard link: https://syzkaller.appspot.com/bug?extid=8965fea6a159ab9aa32d
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1328f3c0580000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-11-29 17:15 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-27 21:22 [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_read_iter syzbot
2024-11-29 16:19 ` David Howells
2024-11-29 17:15   ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).