* [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
@ 2025-08-18 8:04 syzbot
2025-08-18 11:44 ` Oleg Nesterov
2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
0 siblings, 2 replies; 9+ messages in thread
From: syzbot @ 2025-08-18 8:04 UTC (permalink / raw)
To: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, oleg, syzkaller-bugs, viro,
willy
Hello,
syzbot found the following issue on:
HEAD commit: 038d61fd6422 Linux 6.16
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=15f5a234580000
kernel config: https://syzkaller.appspot.com/x/.config?x=515ec0b49771bcd1
dashboard link: https://syzkaller.appspot.com/bug?extid=d1b5dace43896bc386c3
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=158063a2580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1335d3a2580000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/34e894532715/disk-038d61fd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/b6a27a46b9dc/vmlinux-038d61fd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f97a9c8d8216/bzImage-038d61fd.xz
The issue was bisected to:
commit aaec5a95d59615523db03dd53c2052f0a87beea7
Author: Oleg Nesterov <oleg@redhat.com>
Date: Thu Jan 2 14:07:15 2025 +0000
pipe_read: don't wake up the writer if the pipe is still full
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1498e3a2580000
final oops: https://syzkaller.appspot.com/x/report.txt?x=1698e3a2580000
console output: https://syzkaller.appspot.com/x/log.txt?x=1298e3a2580000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Fixes: aaec5a95d596 ("pipe_read: don't wake up the writer if the pipe is still full")
INFO: task syz-executor224:5849 blocked for more than 143 seconds.
Not tainted 6.16.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor224 state:D stack:22952 pid:5849 tgid:5849 ppid:5848 task_flags:0x400140 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5397 [inline]
__schedule+0x16aa/0x4c90 kernel/sched/core.c:6786
__schedule_loop kernel/sched/core.c:6864 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6879
io_schedule+0x81/0xe0 kernel/sched/core.c:7724
folio_wait_bit_common+0x6b0/0xb90 mm/filemap.c:1317
folio_wait_writeback+0xb0/0x100 mm/page-writeback.c:3126
__filemap_fdatawait_range+0x147/0x230 mm/filemap.c:539
file_write_and_wait_range+0x275/0x330 mm/filemap.c:798
v9fs_file_fsync+0xcf/0x1a0 fs/9p/vfs_file.c:418
generic_write_sync include/linux/fs.h:3031 [inline]
netfs_file_write_iter+0x3d8/0x4a0 fs/netfs/buffered_write.c:494
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x54b/0xa90 fs/read_write.c:686
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb29049bef9
RSP: 002b:00007ffeb3361588 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000200000000140 RCX: 00007fb29049bef9
RDX: 0000000000007fec RSI: 0000200000000300 RDI: 0000000000000007
RBP: 0030656c69662f2e R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 0000200000000180
R13: 00007fb2904e504e R14: 0000000000000001 R15: 0000000000000001
</TASK>
Showing all locks held in the system:
2 locks held by kworker/u8:0/12:
1 lock held by khungtaskd/31:
#0: ffffffff8e13f0e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#0: ffffffff8e13f0e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
#0: ffffffff8e13f0e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6770
2 locks held by kworker/u8:6/1337:
#0: ffff88801a489148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3213 [inline]
#0: ffff88801a489148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x9b4/0x17b0 kernel/workqueue.c:3321
#1: ffffc9000451fbc0 ((work_completion)(&rreq->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3214 [inline]
#1: ffffc9000451fbc0 ((work_completion)(&rreq->work)){+.+.}-{0:0}, at: process_scheduled_works+0x9ef/0x17b0 kernel/workqueue.c:3321
2 locks held by getty/5596:
#0: ffff88803095f0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc900036cb2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x43e/0x1400 drivers/tty/n_tty.c:2222
1 lock held by syz-executor224/5849:
#0: ffff88807f8cc428 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:3096 [inline]
#0: ffff88807f8cc428 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x211/0xa90 fs/read_write.c:682
=============================================
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 31 Comm: khungtaskd Not tainted 6.16.0-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
nmi_cpu_backtrace+0x39e/0x3d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:158 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:307 [inline]
watchdog+0xfee/0x1030 kernel/hung_task.c:470
kthread+0x70e/0x8a0 kernel/kthread.c:464
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:pv_native_safe_halt+0x13/0x20 arch/x86/kernel/paravirt.c:82
Code: 53 de 02 00 cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d d3 ad 21 00 f3 0f 1e fa fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffffff8de07d80 EFLAGS: 000002c2
RAX: eefad1cde067ed00 RBX: ffffffff81976918 RCX: eefad1cde067ed00
RDX: 0000000000000001 RSI: ffffffff8d982fba RDI: ffffffff8be1ba40
RBP: ffffffff8de07ea8 R08: ffff8880b8632f5b R09: 1ffff110170c65eb
R10: dffffc0000000000 R11: ffffed10170c65ec R12: ffffffff8fa0b3f0
R13: 0000000000000000 R14: 0000000000000000 R15: 1ffffffff1bd2a50
FS: 0000000000000000(0000) GS:ffff888125c57000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055943a295660 CR3: 000000000df38000 CR4: 00000000003526f0
Call Trace:
<TASK>
arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
default_idle+0x13/0x20 arch/x86/kernel/process.c:749
default_idle_call+0x74/0xb0 kernel/sched/idle.c:117
cpuidle_idle_call kernel/sched/idle.c:185 [inline]
do_idle+0x1e8/0x510 kernel/sched/idle.c:325
cpu_startup_entry+0x44/0x60 kernel/sched/idle.c:423
rest_init+0x2de/0x300 init/main.c:745
start_kernel+0x47d/0x500 init/main.c:1102
x86_64_start_reservations+0x24/0x30 arch/x86/kernel/head64.c:307
x86_64_start_kernel+0x143/0x1c0 arch/x86/kernel/head64.c:288
common_startup_64+0x13e/0x147
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
2025-08-18 8:04 [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync syzbot
@ 2025-08-18 11:44 ` Oleg Nesterov
2025-08-18 12:36 ` syzbot
2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-18 11:44 UTC (permalink / raw)
To: syzbot
Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
willy
On 08/18, syzbot wrote:
>
> HEAD commit: 038d61fd6422 Linux 6.16
#syz test: upstream 038d61fd6422
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
{
- __poll_t n;
int err;
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
list_add_tail(&req->req_list, &m->unsent_req_list);
spin_unlock(&m->req_lock);
- if (test_and_clear_bit(Wpending, &m->wsched))
- n = EPOLLOUT;
- else
- n = p9_fd_poll(m->client, NULL, NULL);
-
- if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
- schedule_work(&m->wq);
+ p9_poll_mux(m);
return 0;
}
* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
2025-08-18 11:44 ` Oleg Nesterov
@ 2025-08-18 12:36 ` syzbot
2025-08-18 12:56 ` Oleg Nesterov
2025-08-18 13:04 ` Oleg Nesterov
0 siblings, 2 replies; 9+ messages in thread
From: syzbot @ 2025-08-18 12:36 UTC (permalink / raw)
To: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, oleg, syzkaller-bugs, viro,
willy
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Tested on:
commit: 038d61fd Linux 6.16
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1317eba2580000
kernel config: https://syzkaller.appspot.com/x/.config?x=515ec0b49771bcd1
dashboard link: https://syzkaller.appspot.com/bug?extid=d1b5dace43896bc386c3
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
patch: https://syzkaller.appspot.com/x/patch.diff?x=15806442580000
Note: testing is done by a robot and is best-effort only.
* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
2025-08-18 12:36 ` syzbot
@ 2025-08-18 12:56 ` Oleg Nesterov
2025-08-18 21:55 ` Dominique Martinet
2025-08-18 13:04 ` Oleg Nesterov
1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-18 12:56 UTC (permalink / raw)
To: syzbot, David Howells, Dominique Martinet, K Prateek Nayak
Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
willy
On 08/18, syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
>
> Tested on:
>
> commit: 038d61fd Linux 6.16
And trans_fd.c wasn't changed since 038d61fd...
Dominique, David,
Perhaps you can reconsider the fix that Prateek and I tried to propose
in this thread
[syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
Oleg.
---
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
{
- __poll_t n;
int err;
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
list_add_tail(&req->req_list, &m->unsent_req_list);
spin_unlock(&m->req_lock);
- if (test_and_clear_bit(Wpending, &m->wsched))
- n = EPOLLOUT;
- else
- n = p9_fd_poll(m->client, NULL, NULL);
-
- if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
- schedule_work(&m->wq);
+ p9_poll_mux(m);
return 0;
}
* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
2025-08-18 12:36 ` syzbot
2025-08-18 12:56 ` Oleg Nesterov
@ 2025-08-18 13:04 ` Oleg Nesterov
1 sibling, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-18 13:04 UTC (permalink / raw)
To: syzbot, David Howells, Dominique Martinet, K Prateek Nayak
Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
willy, Eric Van Hensbergen, Latchesar Ionkov, v9fs
On 08/18, syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
>
> Tested on:
>
> commit: 038d61fd Linux 6.16
And trans_fd.c wasn't changed since 038d61fd...
Dominique, David,
Perhaps you can reconsider the fix that Prateek and I tried to propose
in this thread
[syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
Oleg.
---
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
{
- __poll_t n;
int err;
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
list_add_tail(&req->req_list, &m->unsent_req_list);
spin_unlock(&m->req_lock);
- if (test_and_clear_bit(Wpending, &m->wsched))
- n = EPOLLOUT;
- else
- n = p9_fd_poll(m->client, NULL, NULL);
-
- if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
- schedule_work(&m->wq);
+ p9_poll_mux(m);
return 0;
}
* Re: [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync
2025-08-18 12:56 ` Oleg Nesterov
@ 2025-08-18 21:55 ` Dominique Martinet
0 siblings, 0 replies; 9+ messages in thread
From: Dominique Martinet @ 2025-08-18 21:55 UTC (permalink / raw)
To: Oleg Nesterov
Cc: syzbot, David Howells, K Prateek Nayak, akpm, brauner, dvyukov,
elver, glider, jack, kasan-dev, linux-fsdevel, linux-kernel,
linux-mm, syzkaller-bugs, viro, willy
Hi Oleg,
Oleg Nesterov wrote on Mon, Aug 18, 2025 at 02:56:26PM +0200:
> On 08/18, syzbot wrote:
> > syzbot has tested the proposed patch and the reproducer did not trigger any issue:
(I hate that syzbot identified "hung in v9fs_file_fsync" but doesn't
bother to Cc 9p folks... all the time..)
> Dominique, David,
>
> Perhaps you can reconsider the fix that Prateek and I tried to propose
> in this thread
>
> [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
> https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
I've re-read that thread, and I still think this must be a problem
specific to syzbot doing obviously bogus things (e.g. replying before
the request, or whatever it is this particular repro is doing), but I
guess your patch is also sane enough and the 9p optimization is probably
not really needed here.
Please resend as a proper patch, and I'll just run a quick check (and
a trivial benchmark) and pick it up.
Thanks,
--
Dominique Martinet | Asmadeus
* [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
2025-08-18 8:04 [syzbot] [fs?] [mm?] INFO: task hung in v9fs_file_fsync syzbot
2025-08-18 11:44 ` Oleg Nesterov
@ 2025-08-19 16:10 ` Oleg Nesterov
2025-08-19 16:13 ` Oleg Nesterov
2025-08-20 6:29 ` K Prateek Nayak
1 sibling, 2 replies; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-19 16:10 UTC (permalink / raw)
To: Dominique Martinet, K Prateek Nayak, syzbot
Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
willy, v9fs, David Howells
p9_read_work() doesn't set Rworksched and doesn't do schedule_work(&m->rq)
if list_empty(&m->req_list).
However, if the pipe is full, we need to read more data and this used to
work prior to commit aaec5a95d59615 ("pipe_read: don't wake up the writer
if the pipe is still full").
p9_read_work() does p9_fd_read() -> ... -> anon_pipe_read() which (before
the commit above) triggered the unnecessary wakeup. This wakeup calls
p9_pollwake() which kicks p9_poll_workfn() -> p9_poll_mux(), p9_poll_mux()
will notice EPOLLIN and schedule_work(&m->rq).
This no longer happens after the optimization above, so change
p9_fd_request() to use p9_poll_mux() instead of only checking for EPOLLOUT.
Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
Co-developed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
net/9p/trans_fd.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..474fe67f72ac 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
{
- __poll_t n;
int err;
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
@@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
list_add_tail(&req->req_list, &m->unsent_req_list);
spin_unlock(&m->req_lock);
- if (test_and_clear_bit(Wpending, &m->wsched))
- n = EPOLLOUT;
- else
- n = p9_fd_poll(m->client, NULL, NULL);
-
- if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
- schedule_work(&m->wq);
+ p9_poll_mux(m);
return 0;
}
--
2.25.1.362.g51ebf55
* Re: [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
@ 2025-08-19 16:13 ` Oleg Nesterov
2025-08-20 6:29 ` K Prateek Nayak
1 sibling, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2025-08-19 16:13 UTC (permalink / raw)
To: Dominique Martinet, K Prateek Nayak, syzbot
Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
willy, v9fs, David Howells
On 08/19, Oleg Nesterov wrote:
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
> Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
> Co-developed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Prateek, I turned your "Reviewed-by" from the previous discussion
https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
into Co-developed-by + Signed-off-by, I hope you won't object?
Oleg.
* Re: [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
2025-08-19 16:10 ` [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN Oleg Nesterov
2025-08-19 16:13 ` Oleg Nesterov
@ 2025-08-20 6:29 ` K Prateek Nayak
1 sibling, 0 replies; 9+ messages in thread
From: K Prateek Nayak @ 2025-08-20 6:29 UTC (permalink / raw)
To: Oleg Nesterov, Dominique Martinet, syzbot
Cc: akpm, brauner, dvyukov, elver, glider, jack, kasan-dev,
linux-fsdevel, linux-kernel, linux-mm, syzkaller-bugs, viro,
willy, v9fs, David Howells
Hello Oleg,
On 8/19/2025 9:40 PM, Oleg Nesterov wrote:
> p9_read_work() doesn't set Rworksched and doesn't do schedule_work(&m->rq)
> if list_empty(&m->req_list).
>
> However, if the pipe is full, we need to read more data and this used to
> work prior to commit aaec5a95d59615 ("pipe_read: don't wake up the writer
> if the pipe is still full").
>
> p9_read_work() does p9_fd_read() -> ... -> anon_pipe_read() which (before
> the commit above) triggered the unnecessary wakeup. This wakeup calls
> p9_pollwake() which kicks p9_poll_workfn() -> p9_poll_mux(), p9_poll_mux()
> will notice EPOLLIN and schedule_work(&m->rq).
>
> This no longer happens after the optimization above, so change
> p9_fd_request() to use p9_poll_mux() instead of only checking for EPOLLOUT.
>
> Reported-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
> Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
> Co-developed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
A "Debugged-by:" or equivalent would have been fine too since you did
most of the heavy lifting by finding p9_poll_mux() but I don't mind
standing behind this since it is doing the right thing :)
I tested this on top of v6.17-rc2; without the patch, upstream runs into
a hang instantly with syzbot's reproducer. The dmesg logs:
INFO: task repro:4150 blocked for more than 120 seconds.
Not tainted 6.17.0-rc2-upstream #34
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro state:D stack:0 pid:4150 tgid:4150 ppid:1 task_flags:0x400140 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x474/0x1620
? __wb_update_bandwidth+0x37/0x1d0
schedule+0x27/0xd0
io_schedule+0x46/0x70
folio_wait_bit_common+0x112/0x300
? filemap_get_folios_tag+0x232/0x2a0
? __pfx_wake_page_function+0x10/0x10
folio_wait_writeback+0x2b/0x80
__filemap_fdatawait_range+0x7c/0xe0
file_write_and_wait_range+0x89/0xb0
v9fs_file_fsync+0x2d/0x90 [9p]
netfs_file_write_iter+0xec/0x120 [netfs]
vfs_write+0x305/0x420
ksys_write+0x65/0xe0
do_syscall_64+0x85/0xb30
? do_syscall_64+0x223/0xb30
? count_memcg_events+0xd9/0x1c0
? handle_mm_fault+0x1af/0x290
? do_user_addr_fault+0x2d0/0x8c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f3b26d1e88d
RSP: 002b:00007ffe581fa348 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b26d1e88d
RDX: 0000000000007fec RSI: 0000200000000300 RDI: 0000000000000007
RBP: 00007ffe581fa360 R08: 00007ffe581fa360 R09: 00007ffe581fa360
R10: 00007ffe581fa360 R11: 0000000000000213 R12: 00007ffe581fa4b8
R13: 0000558168a6de12 R14: 0000558168a6fd10 R15: 00007f3b26f03040
</TASK>
With this patch applied on top, I haven't seen a hang yet and I've been
running it for 30 minutes now, so feel free to also include:
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> net/9p/trans_fd.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 339ec4e54778..474fe67f72ac 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
>
> static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> {
> - __poll_t n;
> int err;
> struct p9_trans_fd *ts = client->trans;
> struct p9_conn *m = &ts->conn;
> @@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> list_add_tail(&req->req_list, &m->unsent_req_list);
> spin_unlock(&m->req_lock);
>
> - if (test_and_clear_bit(Wpending, &m->wsched))
> - n = EPOLLOUT;
> - else
> - n = p9_fd_poll(m->client, NULL, NULL);
> -
> - if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> - schedule_work(&m->wq);
> + p9_poll_mux(m);
>
> return 0;
> }
--
Thanks and Regards,
Prateek