* [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
@ 2025-08-18  8:04 syzbot
  2025-08-18 11:44 ` Oleg Nesterov
  2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
  0 siblings, 2 replies; 9+ messages in thread
From: syzbot @ 2025-08-18  8:04 UTC
  To: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, oleg, syzkaller-bugs, viro,
	willy

Hello,

syzbot found the following issue on:

HEAD commit:    038d61fd6422 Linux 6.16
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=15f5a234580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=515ec0b49771bcd1
dashboard link: https://syzkaller.appspot.com/bug?extid=d1b5dace43896bc386c3
compiler:       Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=158063a2580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1335d3a2580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/34e894532715/disk-038d61fd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/b6a27a46b9dc/vmlinux-038d61fd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f97a9c8d8216/bzImage-038d61fd.xz

The issue was bisected to:

commit aaec5a95d59615523db03dd53c2052f0a87beea7
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Thu Jan 2 14:07:15 2025 +0000

    pipe_read: don't wake up the writer if the pipe is still full

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1498e3a2580000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1698e3a2580000
console output: https://syzkaller.appspot.com/x/log.txt?x=1298e3a2580000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Fixes: aaec5a95d596 ("pipe_read: don't wake up the writer if the pipe is still full")

INFO: task syz-executor224:5849 blocked for more than 143 seconds.
      Not tainted 6.16.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor224 state:D stack:22952 pid:5849  tgid:5849  ppid:5848   task_flags:0x400140 flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5397 [inline]
 __schedule+0x16aa/0x4c90 kernel/sched/core.c:6786
 __schedule_loop kernel/sched/core.c:6864 [inline]
 schedule+0x165/0x360 kernel/sched/core.c:6879
 io_schedule+0x81/0xe0 kernel/sched/core.c:7724
 folio_wait_bit_common+0x6b0/0xb90 mm/filemap.c:1317
 folio_wait_writeback+0xb0/0x100 mm/page-writeback.c:3126
 __filemap_fdatawait_range+0x147/0x230 mm/filemap.c:539
 file_write_and_wait_range+0x275/0x330 mm/filemap.c:798
 v9fs_file_fsync+0xcf/0x1a0 fs/9p/vfs_file.c:418
 generic_write_sync include/linux/fs.h:3031 [inline]
 netfs_file_write_iter+0x3d8/0x4a0 fs/netfs/buffered_write.c:494
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x54b/0xa90 fs/read_write.c:686
 ksys_write+0x145/0x250 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb29049bef9
RSP: 002b:00007ffeb3361588 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000200000000140 RCX: 00007fb29049bef9
RDX: 0000000000007fec RSI: 0000200000000300 RDI: 0000000000000007
RBP: 0030656c69662f2e R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 0000200000000180
R13: 00007fb2904e504e R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Showing all locks held in the system:
2 locks held by kworker/u8:0/12:
1 lock held by khungtaskd/31:
 #0: ffffffff8e13f0e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #0: ffffffff8e13f0e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #0: ffffffff8e13f0e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6770
2 locks held by kworker/u8:6/1337:
 #0: ffff88801a489148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3213 [inline]
 #0: ffff88801a489148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x9b4/0x17b0 kernel/workqueue.c:3321
 #1: ffffc9000451fbc0 ((work_completion)(&rreq->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3214 [inline]
 #1: ffffc9000451fbc0 ((work_completion)(&rreq->work)){+.+.}-{0:0}, at: process_scheduled_works+0x9ef/0x17b0 kernel/workqueue.c:3321
2 locks held by getty/5596:
 #0: ffff88803095f0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
 #1: ffffc900036cb2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x43e/0x1400 drivers/tty/n_tty.c:2222
1 lock held by syz-executor224/5849:
 #0: ffff88807f8cc428 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:3096 [inline]
 #0: ffff88807f8cc428 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x211/0xa90 fs/read_write.c:682

=============================================

NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 31 Comm: khungtaskd Not tainted 6.16.0-syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 nmi_cpu_backtrace+0x39e/0x3d0 lib/nmi_backtrace.c:113
 nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:158 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:307 [inline]
 watchdog+0xfee/0x1030 kernel/hung_task.c:470
 kthread+0x70e/0x8a0 kernel/kthread.c:464
 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:pv_native_safe_halt+0x13/0x20 arch/x86/kernel/paravirt.c:82
Code: 53 de 02 00 cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d d3 ad 21 00 f3 0f 1e fa fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffffff8de07d80 EFLAGS: 000002c2
RAX: eefad1cde067ed00 RBX: ffffffff81976918 RCX: eefad1cde067ed00
RDX: 0000000000000001 RSI: ffffffff8d982fba RDI: ffffffff8be1ba40
RBP: ffffffff8de07ea8 R08: ffff8880b8632f5b R09: 1ffff110170c65eb
R10: dffffc0000000000 R11: ffffed10170c65ec R12: ffffffff8fa0b3f0
R13: 0000000000000000 R14: 0000000000000000 R15: 1ffffffff1bd2a50
FS:  0000000000000000(0000) GS:ffff888125c57000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055943a295660 CR3: 000000000df38000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
 default_idle+0x13/0x20 arch/x86/kernel/process.c:749
 default_idle_call+0x74/0xb0 kernel/sched/idle.c:117
 cpuidle_idle_call kernel/sched/idle.c:185 [inline]
 do_idle+0x1e8/0x510 kernel/sched/idle.c:325
 cpu_startup_entry+0x44/0x60 kernel/sched/idle.c:423
 rest_init+0x2de/0x300 init/main.c:745
 start_kernel+0x47d/0x500 init/main.c:1102
 x86_64_start_reservations+0x24/0x30 arch/x86/kernel/head64.c:307
 x86_64_start_kernel+0x143/0x1c0 arch/x86/kernel/head64.c:288
 common_startup_64+0x13e/0x147
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup



* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
  2025-08-18  8:04 [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync syzbot
@ 2025-08-18 11:44 ` Oleg Nesterov
  2025-08-18 12:36   ` syzbot
  2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
  1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-18 11:44 UTC
  To: syzbot
  Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
	willy

On 08/18, syzbot wrote:
>
> HEAD commit:    038d61fd6422 Linux 6.16

#syz test: upstream 038d61fd6422

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	int err;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }




* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
  2025-08-18 11:44 ` Oleg Nesterov
@ 2025-08-18 12:36   ` syzbot
  2025-08-18 12:56     ` Oleg Nesterov
  2025-08-18 13:04     ` Oleg Nesterov
  0 siblings, 2 replies; 9+ messages in thread
From: syzbot @ 2025-08-18 12:36 UTC
  To: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, oleg, syzkaller-bugs, viro,
	willy

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com

Tested on:

commit:         038d61fd Linux 6.16
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1317eba2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=515ec0b49771bcd1
dashboard link: https://syzkaller.appspot.com/bug?extid=d1b5dace43896bc386c3
compiler:       Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
patch:          https://syzkaller.appspot.com/x/patch.diff?x=15806442580000

Note: testing is done by a robot and is best-effort only.



* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
  2025-08-18 12:36   ` syzbot
@ 2025-08-18 12:56     ` Oleg Nesterov
  2025-08-18 21:55       ` Dominique Martinet
  2025-08-18 13:04     ` Oleg Nesterov
  1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-18 12:56 UTC
  To: syzbot, David Howells, Dominique Martinet, K Prateek Nayak
  Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
	willy

On 08/18, syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
>
> Tested on:
>
> commit:         038d61fd Linux 6.16

And trans_fd.c hasn't changed since 038d61fd...

Dominique, David,

Perhaps you can reconsider the fix that Prateek and I tried to propose
in this thread

	[syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
	https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/

Oleg.
---

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	int err;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }




* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
  2025-08-18 12:36   ` syzbot
  2025-08-18 12:56     ` Oleg Nesterov
@ 2025-08-18 13:04     ` Oleg Nesterov
  1 sibling, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-18 13:04 UTC
  To: syzbot, David Howells, Dominique Martinet, K Prateek Nayak
  Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
	willy, Eric Van Hensbergen, Latchesar Ionkov, v9fs

On 08/18, syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
>
> Tested on:
>
> commit:         038d61fd Linux 6.16

And trans_fd.c hasn't changed since 038d61fd...

Dominique, David,

Perhaps you can reconsider the fix that Prateek and I tried to propose
in this thread

	[syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
	https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/

Oleg.
---

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	int err;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }




* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
  2025-08-18 12:56     ` Oleg Nesterov
@ 2025-08-18 21:55       ` Dominique Martinet
  0 siblings, 0 replies; 9+ messages in thread
From: Dominique Martinet @ 2025-08-18 21:55 UTC
  To: Oleg Nesterov
  Cc: syzbot, David Howells, K Prateek Nayak, akpm, brauner, dvyukov,
	elver, glider, jack, kasan-dev, linux-fsdevel, linux-kernel,
	linux-mm, syzkaller-bugs, viro, willy

Hi Oleg,

Oleg Nesterov wrote on Mon, Aug 18, 2025 at 02:56:26PM +0200:
> On 08/18, syzbot wrote:
> > syzbot has tested the proposed patch and the reproducer did not trigger any issue:

(I hate that syzbot identified "hung in v9fs_file_fsync" but doesn't
bother to Cc 9p folks... all the time..)

> Dominique, David,
> 
> Perhaps you can reconsider the fix that Prateek and I tried to propose
> in this thread
> 
> 	[syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
> 	https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/

I've re-read that thread, and I still think this must be a problem
specific to syzbot doing obviously bogus things (e.g. replying before
the request, or whatever it is this particular repro is doing), but I
guess your patch is also sane enough and the 9p optimization is probably
not really needed here.

Please resend as a proper patch, and I'll just run some quick checks (and
a trivial benchmark) and pick it up.

Thanks,
-- 
Dominique Martinet | Asmadeus



* [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
  2025-08-18  8:04 [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync syzbot
  2025-08-18 11:44 ` Oleg Nesterov
@ 2025-08-19 16:10 ` Oleg Nesterov
  2025-08-19 16:13   ` Oleg Nesterov
  2025-08-20  6:29   ` K Prateek Nayak
  1 sibling, 2 replies; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-19 16:10 UTC
  To: Dominique Martinet, K Prateek Nayak, syzbot
  Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
	willy, v9fs, David Howells

p9_read_work() doesn't set Rworksched and doesn't do schedule_work(&m->rq)
if list_empty(&m->req_list).

However, if the pipe is full, we need to read more data and this used to
work prior to commit aaec5a95d59615 ("pipe_read: don't wake up the writer
if the pipe is still full").

p9_read_work() does p9_fd_read() -> ... -> anon_pipe_read() which (before
the commit above) triggered the unnecessary wakeup. This wakeup calls
p9_pollwake() which kicks p9_poll_workfn() -> p9_poll_mux(), and
p9_poll_mux() notices EPOLLIN and does schedule_work(&m->rq).

This no longer happens after the optimization above. Change p9_fd_request()
to use p9_poll_mux() instead of only checking for EPOLLOUT.

Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
Co-developed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 net/9p/trans_fd.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	int err;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }
-- 
2.25.1.362.g51ebf55
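
[Illustration, not part of the patch] The behavioural difference described
in the changelog can be sketched as a small userspace model. Everything
below is an assumption made for illustration: struct conn_model, the
MODEL_IN/MODEL_OUT bits and the request_kick_*() helpers are hypothetical
names, not kernel code, and the model leaves out error handling and the
Rpending/Wpending bits. It only shows that a check limited to EPOLLOUT
never kicks the rx work when the transport is readable but not writable,
while a check that also looks at EPOLLIN does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the EPOLLIN/EPOLLOUT bits; the values are arbitrary. */
enum { MODEL_IN = 1u << 0, MODEL_OUT = 1u << 1 };

/* Hypothetical, heavily reduced stand-in for struct p9_conn. */
struct conn_model {
	unsigned int ready;	/* what polling the transport would report */
	bool unsent;		/* a request sits on the unsent list */
	bool rworksched;	/* read work already scheduled */
	bool wworksched;	/* write work already scheduled */
	int rx_kicks;		/* how often the read work was scheduled */
	int tx_kicks;		/* how often the write work was scheduled */
};

/* Schedule the write work if the transport is writable and work is pending. */
static void kick_tx(struct conn_model *m)
{
	if ((m->ready & MODEL_OUT) && m->unsent && !m->wworksched) {
		m->wworksched = true;
		m->tx_kicks++;
	}
}

/* Models the old tail of p9_fd_request(): only writability is considered. */
static void request_kick_old(struct conn_model *m)
{
	assert(m != NULL);
	kick_tx(m);
}

/* Models the new tail: like p9_poll_mux(), readability is checked as well. */
static void request_kick_new(struct conn_model *m)
{
	assert(m != NULL);
	if ((m->ready & MODEL_IN) && !m->rworksched) {
		m->rworksched = true;
		m->rx_kicks++;
	}
	kick_tx(m);
}
```

With ready == MODEL_IN (data to read, no room to write) the old variant
schedules nothing, which in this model corresponds to the state the hung
writer is stuck in; the new variant schedules the read work once.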





* Re: [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
  2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
@ 2025-08-19 16:13   ` Oleg Nesterov
  2025-08-20  6:29   ` K Prateek Nayak
  1 sibling, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-19 16:13 UTC
  To: Dominique Martinet, K Prateek Nayak, syzbot
  Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
	willy, v9fs, David Howells

On 08/19, Oleg Nesterov wrote:
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
> Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
> Co-developed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Prateek, I turned your "Reviewed-by" from the previous discussion
https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
into Co-developed-by + Signed-off-by, I hope you won't object?

Oleg.




* Re: [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
  2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
  2025-08-19 16:13   ` Oleg Nesterov
@ 2025-08-20  6:29   ` K Prateek Nayak
  1 sibling, 0 replies; 9+ messages in thread
From: K Prateek Nayak @ 2025-08-20  6:29 UTC
  To: Oleg Nesterov, Dominique Martinet, syzbot
  Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
	linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
	willy, v9fs, David Howells

Hello Oleg,

On 8/19/2025 9:40 PM, Oleg Nesterov wrote:
> p9_read_work() doesn't set Rworksched and doesn't do schedule_work(&m->rq)
> if list_empty(&m->req_list).
> 
> However, if the pipe is full, we need to read more data and this used to
> work prior to commit aaec5a95d59615 ("pipe_read: don't wake up the writer
> if the pipe is still full").
> 
> p9_read_work() does p9_fd_read() -> ... -> anon_pipe_read() which (before
> the commit above) triggered the unnecessary wakeup. This wakeup calls
> p9_pollwake() which kicks p9_poll_workfn() -> p9_poll_mux(), and
> p9_poll_mux() notices EPOLLIN and does schedule_work(&m->rq).
> 
> This no longer happens after the optimization above. Change p9_fd_request()
> to use p9_poll_mux() instead of only checking for EPOLLOUT.
> 
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
> Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
> Co-developed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>

A "Debugged-by:" or equivalent would have been fine too since you did
most of the heavy lifting by finding p9_poll_mux() but I don't mind
standing behind this since it is doing the right thing :)

I tested this on top of v6.17-rc2. Unpatched upstream runs into a hang
instantly with syzbot's reproducer. The dmesg log:

    INFO: task repro:4150 blocked for more than 120 seconds.
          Not tainted 6.17.0-rc2-upstream #34
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:repro           state:D stack:0     pid:4150  tgid:4150  ppid:1      task_flags:0x400140 flags:0x00004006
    Call Trace:
     <TASK>
     __schedule+0x474/0x1620
     ? __wb_update_bandwidth+0x37/0x1d0
     schedule+0x27/0xd0
     io_schedule+0x46/0x70
     folio_wait_bit_common+0x112/0x300
     ? filemap_get_folios_tag+0x232/0x2a0
     ? __pfx_wake_page_function+0x10/0x10
     folio_wait_writeback+0x2b/0x80
     __filemap_fdatawait_range+0x7c/0xe0
     file_write_and_wait_range+0x89/0xb0
     v9fs_file_fsync+0x2d/0x90 [9p]
     netfs_file_write_iter+0xec/0x120 [netfs]
     vfs_write+0x305/0x420
     ksys_write+0x65/0xe0
     do_syscall_64+0x85/0xb30
     ? do_syscall_64+0x223/0xb30
     ? count_memcg_events+0xd9/0x1c0
     ? handle_mm_fault+0x1af/0x290
     ? do_user_addr_fault+0x2d0/0x8c0
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7f3b26d1e88d
    RSP: 002b:00007ffe581fa348 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b26d1e88d
    RDX: 0000000000007fec RSI: 0000200000000300 RDI: 0000000000000007
    RBP: 00007ffe581fa360 R08: 00007ffe581fa360 R09: 00007ffe581fa360
    R10: 00007ffe581fa360 R11: 0000000000000213 R12: 00007ffe581fa4b8
    R13: 0000558168a6de12 R14: 0000558168a6fd10 R15: 00007f3b26f03040
     </TASK>

With this patch applied on top, I haven't seen a hang yet, and I've been
running it for 30 minutes now, so feel free to also include:

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  net/9p/trans_fd.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 339ec4e54778..474fe67f72ac 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
>  
>  static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	__poll_t n;
>  	int err;
>  	struct p9_trans_fd *ts = client->trans;
>  	struct p9_conn *m = &ts->conn;
> @@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	list_add_tail(&req->req_list, &m->unsent_req_list);
>  	spin_unlock(&m->req_lock);
>  
> -	if (test_and_clear_bit(Wpending, &m->wsched))
> -		n = EPOLLOUT;
> -	else
> -		n = p9_fd_poll(m->client, NULL, NULL);
> -
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> -		schedule_work(&m->wq);
> +	p9_poll_mux(m);
>  
>  	return 0;
>  }

-- 
Thanks and Regards,
Prateek
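
[Illustration, not part of the thread] As background on the "pipe is full"
condition in the quoted changelog, here is a minimal userspace sketch that
fills a non-blocking pipe and polls both ends. poll_full_pipe() is a
hypothetical helper written for this note, and the 4096-byte chunk size is
an assumption chosen for convenience. It only shows that a full pipe
reports POLLIN on the read end while the write end does not report POLLOUT,
which is the state in which a check limited to EPOLLOUT finds nothing to do.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Fill a non-blocking pipe until write() would block, then report the
 * poll state of both ends. Returns 0 on success, -1 on failure.
 */
static int poll_full_pipe(short *read_end, short *write_end)
{
	int fds[2];
	char buf[4096];
	struct pollfd pfd[2];
	ssize_t n;
	int ret = -1;

	assert(read_end != NULL && write_end != NULL);

	if (pipe(fds) < 0)
		return -1;

	/* Make the write end non-blocking so that a full pipe yields EAGAIN. */
	if (fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0)
		goto out;

	memset(buf, 'x', sizeof(buf));
	do {
		n = write(fds[1], buf, sizeof(buf));
	} while (n > 0);
	if (n < 0 && errno != EAGAIN)
		goto out;

	pfd[0].fd = fds[0];
	pfd[0].events = POLLIN;
	pfd[0].revents = 0;
	pfd[1].fd = fds[1];
	pfd[1].events = POLLOUT;
	pfd[1].revents = 0;

	/* Zero timeout: just sample the current readiness of both ends. */
	if (poll(pfd, 2, 0) < 0)
		goto out;

	*read_end = pfd[0].revents;
	*write_end = pfd[1].revents;
	ret = 0;
out:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

A caller would check (read_end & POLLIN) and (write_end & POLLOUT) after a
successful return; on a full pipe the first is set and the second is not.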



