* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  [not found] ` <ff294b3c-cd24-4aa6-9d03-718ff7087158@amd.com>
@ 2025-03-25 12:15 ` Oleg Nesterov
  2025-03-25 12:36 ` Dominique Martinet
  2025-03-25 13:04 ` Oleg Nesterov
  0 siblings, 2 replies; 27+ messages in thread

From: Oleg Nesterov @ 2025-03-25 12:15 UTC (permalink / raw)
To: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, Christian Schoenebeck
Cc: Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton,
	linux-fsdevel, linux-kernel, netfs, swapnil.sapkal,
	syzkaller-bugs, viro, v9fs

Prateek, thanks again for your hard work!

Yes, I think we need help from the 9p folks. I know nothing about it,
but so far this still looks like a 9p problem to me...

All I can say right now is that the "sigpending" logic in p9_client_rpc()
looks wrong. If nothing else:

- clear_thread_flag(TIF_SIGPENDING) is not enough, it won't make
  signal_pending() false if TIF_NOTIFY_SIGNAL is set.

- otoh, if signal_pending() was true because of pending SIGKILL,
  then after clear_thread_flag(TIF_SIGPENDING) wait_event_killable()
  will act as uninterruptible wait_event().

Oleg.

On 03/25, K Prateek Nayak wrote:
>
> On 3/24/2025 8:22 PM, K Prateek Nayak wrote:
> >Hello folks,
> >
> >Some updates.
> >
> >On 3/24/2025 6:49 PM, K Prateek Nayak wrote:
> >>>
> >>>Per syzbot this attempt did not work out either.
> >>>
> >>>I think the blind stabs taken by everyone here are enough.
> >>>
> >>>The report does not provide the crucial bit: what are the other
> >>>threads doing. Presumably someone else is stuck somewhere, possibly
> >>>not even in pipe code, and that stuck thread was supposed to wake up
> >>>the one which trips over the hung task detector. Figuring out what
> >>>that thread is is imo the next step.
> >>>
> >>>I failed to find a relevant command in
> >>>https://github.com/google/syzkaller/blob/master/docs/syzbot.md
> >>>
> >>>So if you guys know someone on the syzkaller side, maybe you can ask
> >>>them to tweak the report *or* maybe syzbot can test a "fix" which
> >>>makes the hung task detector also report all backtraces? I don't know
> >>>if that can work, the output may be long enough that it will get
> >>>trimmed by something.
> >>>
> >>>I don't have time to work on this for now, just throwing ideas.
> >>
> >>I got the reproducer running locally. Tracing stuff currently to see
> >>what is tripping. Will report back once I find something interesting.
> >>Might take a while since the 9p bits are so far spread out.
> >>
> >
> >So far, with tracing, this is where I am:
> >
> >o Mainline + Oleg's optimization reverted:
> >
> >  ...
> >  kworker/43:1-1723    [043] ..... 115.309065: p9_read_work: Data read wait 55
> >  kworker/43:1-1723    [043] ..... 115.309066: p9_read_work: Data read 55
> >  kworker/43:1-1723    [043] ..... 115.309067: p9_read_work: Data read wait 7
> >  kworker/43:1-1723    [043] ..... 115.309068: p9_read_work: Data read 7
> >  repro-4138           [043] ..... 115.309084: netfs_wake_write_collector: Wake collector
> >  repro-4138           [043] ..... 115.309085: netfs_wake_write_collector: Queuing collector work
> >  repro-4138           [043] ..... 115.309088: netfs_unbuffered_write: netfs_unbuffered_write
> >  repro-4138           [043] ..... 115.309088: netfs_end_issue_write: netfs_end_issue_write
> >  repro-4138           [043] ..... 115.309089: netfs_end_issue_write: Write collector need poke 0
> >  repro-4138           [043] ..... 115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
> >  kworker/u1030:1-1951 [168] ..... 115.309096: netfs_wake_write_collector: Wake collector
> >  kworker/u1030:1-1951 [168] ..... 115.309097: netfs_wake_write_collector: Queuing collector work
> >  kworker/u1030:1-1951 [168] ..... 115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
> >  ...
(syzbot reproducer continues)
> >
> >o Mainline:
> >
> >  kworker/185:1-1767   [185] ..... 109.485961: p9_read_work: Data read wait 7
> >  kworker/185:1-1767   [185] ..... 109.485962: p9_read_work: Data read 7
> >  kworker/185:1-1767   [185] ..... 109.485962: p9_read_work: Data read wait 55
> >  kworker/185:1-1767   [185] ..... 109.485963: p9_read_work: Data read 55
> >  repro-4038           [185] ..... 114.225717: netfs_wake_write_collector: Wake collector
> >  repro-4038           [185] ..... 114.225723: netfs_wake_write_collector: Queuing collector work
> >  repro-4038           [185] ..... 114.225727: netfs_unbuffered_write: netfs_unbuffered_write
> >  repro-4038           [185] ..... 114.225727: netfs_end_issue_write: netfs_end_issue_write
> >  repro-4038           [185] ..... 114.225728: netfs_end_issue_write: Write collector need poke 0
> >  repro-4038           [185] ..... 114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
> >  ... (syzbot reproducer hangs)
> >
> >There is a third "kworker/u1030" component that never gets woken up for
> >reasons currently unknown to me with Oleg's optimization. I'll keep
> >digging.
>
> More data ...
>
> I chased this down to p9_client_rpc() in net/9p/client.c, specifically:
>
> 	err = c->trans_mod->request(c, req);
> 	if (err < 0) {
> 		/* write won't happen */
> 		p9_req_put(c, req);
> 		if (err != -ERESTARTSYS && err != -EFAULT)
> 			c->status = Disconnected;
> 		goto recalc_sigpending;
> 	}
>
> c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> which basically does a p9_fd_poll().
>
> Previously, the above would fail with err as -EIO, which would
> cause the client to "Disconnect", and the retry logic would make
> progress. Now, however, the err returned is -ERESTARTSYS, which
> will not cause a disconnect, and the retry logic will hang
> somewhere in p9_client_rpc() later.
>
> I'll chase it a little more, but if the 9p folks can chime in it
> would be great since I'm out of my depth here.
>
> P.S.
There are more interactions at play and I'm trying to still > piece them together. > > Relevant traces: > > o Mainline + Oleg's optimization reverted: > > repro-4161 [239] ..... 107.785644: p9_client_write: p9_client_rpc done > repro-4161 [239] ..... 107.785644: p9_client_write: p9_pdup > repro-4161 [239] ..... 107.785644: p9_client_write: iter revert > repro-4161 [239] ..... 107.785644: p9_client_write: p9_client_rpc > repro-4161 [239] ..... 107.785653: p9_fd_request: p9_fd_request > repro-4161 [239] ...1. 107.785653: p9_fd_request: p9_fd_request error > repro-4161 [239] ..... 107.785653: p9_client_rpc: Client disconnected (no write) <------------- "write won't happen" case > repro-4161 [239] ..... 107.785655: p9_client_write: p9_client_rpc done > repro-4161 [239] ..... 107.785655: p9_client_write: p9_client_rpc error (-5) <------------- -EIO > repro-4161 [239] ..... 107.785656: v9fs_issue_write: Issue write done 2 err(-5) > repro-4161 [239] ..... 107.785657: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated > repro-4161 [239] ..... 107.785657: netfs_wake_write_collector: Wake collector > repro-4161 [239] ..... 107.785658: netfs_wake_write_collector: Queuing collector work > repro-4161 [239] ..... 107.785660: v9fs_issue_write: Issue write subrequest terminated 2 > repro-4161 [239] ..... 107.785661: netfs_unbuffered_write: netfs_unbuffered_write > repro-4161 [239] ..... 107.785661: netfs_end_issue_write: netfs_end_issue_write > repro-4161 [239] ..... 107.785662: netfs_end_issue_write: Write collector need poke 0 > repro-4161 [239] ..... 107.785662: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! > kworker/u1038:0-1583 [105] ..... 107.785667: netfs_retry_writes: netfs_reissue_write 1 > kworker/u1038:0-1583 [105] ..... 107.785668: v9fs_issue_write: v9fs_issue_write > kworker/u1038:0-1583 [105] ..... 107.785669: p9_client_write: p9_client_rpc > kworker/u1038:0-1583 [105] ..... 
107.785669: p9_client_prepare_req: p9_client_prepare_req eio 1 > kworker/u1038:0-1583 [105] ..... 107.785670: p9_client_rpc: p9_client_rpc early err return > kworker/u1038:0-1583 [105] ..... 107.785670: p9_client_write: p9_client_rpc done > kworker/u1038:0-1583 [105] ..... 107.785671: p9_client_write: p9_client_rpc error (-5) > kworker/u1038:0-1583 [105] ..... 107.785672: v9fs_issue_write: Issue write done 0 err(-5) > kworker/u1038:0-1583 [105] ..... 107.785672: netfs_write_subrequest_terminated: Collector woken up > kworker/u1038:0-1583 [105] ..... 107.785672: netfs_wake_write_collector: Wake collector > kworker/u1038:0-1583 [105] ..... 107.785672: netfs_wake_write_collector: Queuing collector work > kworker/u1038:0-1583 [105] ..... 107.785677: v9fs_issue_write: Issue write subrequest terminated 0 > kworker/u1038:0-1583 [105] ..... 107.785684: netfs_write_collection_worker: Write collect clearing and waking up! > repro-4161 [239] ..... 107.785883: p9_client_prepare_req: p9_client_prepare_req eio 1 > ... (continues) > > o Mainline: > > repro-4161 [087] ..... 123.474660: p9_client_write: p9_client_rpc done > repro-4161 [087] ..... 123.474661: p9_client_write: p9_pdup > repro-4161 [087] ..... 123.474661: p9_client_write: iter revert > repro-4161 [087] ..... 123.474661: p9_client_write: p9_client_rpc > repro-4161 [087] ..... 123.474672: p9_fd_request: p9_fd_request > repro-4161 [087] ..... 123.474673: p9_fd_request: p9_fd_request EPOLL > repro-4161 [087] ..... 123.474673: p9_fd_poll: p9_fd_poll rd poll > repro-4161 [087] ..... 123.474674: p9_fd_poll: p9_fd_request wr poll > repro-4161 [087] ..... 128.233025: p9_client_write: p9_client_rpc done > repro-4161 [087] ..... 128.233033: p9_client_write: p9_client_rpc error (-512) <------------- -ERESTARTSYS > repro-4161 [087] ..... 128.233034: v9fs_issue_write: Issue write done 2 err(-512) > repro-4161 [087] ..... 128.233035: netfs_write_subrequest_terminated: Collector woken > repro-4161 [087] ..... 
128.233036: netfs_wake_write_collector: Wake collector > repro-4161 [087] ..... 128.233036: netfs_wake_write_collector: Queuing collector work > repro-4161 [087] ..... 128.233040: v9fs_issue_write: Issue write subrequest terminated 2 > repro-4161 [087] ..... 128.233040: netfs_unbuffered_write: netfs_unbuffered_write > repro-4161 [087] ..... 128.233040: netfs_end_issue_write: netfs_end_issue_write > repro-4161 [087] ..... 128.233041: netfs_end_issue_write: Write collector need poke 0 > repro-4161 [087] ..... 128.233041: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! > kworker/u1035:0-1580 [080] ..... 128.233077: netfs_retry_writes: netfs_reissue_write 1 > kworker/u1035:0-1580 [080] ..... 128.233078: v9fs_issue_write: v9fs_issue_write > kworker/u1035:0-1580 [080] ..... 128.233079: p9_client_write: p9_client_rpc > kworker/u1035:0-1580 [080] ..... 128.233099: p9_fd_request: p9_fd_request > kworker/u1035:0-1580 [080] ..... 128.233101: p9_fd_request: p9_fd_request EPOLL > kworker/u1035:0-1580 [080] ..... 128.233101: p9_fd_poll: p9_fd_poll rd poll > kworker/u1035:0-1580 [080] ..... 128.233101: p9_fd_poll: p9_fd_request wr poll > (Hangs) > > -- > Thanks and Regards, > Prateek > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-25 12:15 ` [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter Oleg Nesterov
@ 2025-03-25 12:36 ` Dominique Martinet
  2025-03-25 13:04 ` Oleg Nesterov
  1 sibling, 0 replies; 27+ messages in thread

From: Dominique Martinet @ 2025-03-25 12:36 UTC (permalink / raw)
To: Oleg Nesterov
Cc: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells,
	jack, jlayton, linux-fsdevel, linux-kernel, netfs,
	swapnil.sapkal, syzkaller-bugs, viro, v9fs

Thanks for the Cc.
Just replying quickly without looking at anything.

Oleg Nesterov wrote on Tue, Mar 25, 2025 at 01:15:26PM +0100:
> All I can say right now is that the "sigpending" logic in p9_client_rpc()
> looks wrong. If nothing else:
>
> - clear_thread_flag(TIF_SIGPENDING) is not enough, it won't make
>   signal_pending() false if TIF_NOTIFY_SIGNAL is set.
>
> - otoh, if signal_pending() was true because of pending SIGKILL,
>   then after clear_thread_flag(TIF_SIGPENDING) wait_event_killable()
>   will act as uninterruptible wait_event().

Yeah, this is effectively an unkillable event loop once a flush has
been sent; this is a known issue.

I've tried to address this with async rpc (so we could send the flush
and forget about it), but that caused other regressions and I never had
time to dig into these... The patches date back to 2018 and probably
won't even apply cleanly anymore, but if anyone cares they are here:
https://lore.kernel.org/all/1544532108-21689-3-git-send-email-asmadeus@codewreck.org/T/#u
(the hard work of refcounting was done just before that in order to
kill this pattern; I just pretty much ran out of free time at that
point, hobbies are hard...)

So: sorry, it's probably possible to improve this, but it won't be easy
or immediate.

> > c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> >
> > Previously, the above would fail with err as -EIO, which would
> > cause the client to "Disconnect", and the retry logic would make
> > progress. Now, however, the err returned is -ERESTARTSYS, which
> > will not cause a disconnect, and the retry logic will hang
> > somewhere in p9_client_rpc() later.

Now, if you got this far, I think it'll be easier to make whatever
changed error out with -EIO again instead; I'll try to check the rest
of the thread later this week, as I didn't follow this thread at all.

Thanks,
--
Dominique
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-25 12:15 ` [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter Oleg Nesterov
  2025-03-25 12:36 ` Dominique Martinet
@ 2025-03-25 13:04 ` Oleg Nesterov
  2025-03-25 14:49 ` K Prateek Nayak
  1 sibling, 1 reply; 27+ messages in thread

From: Oleg Nesterov @ 2025-03-25 13:04 UTC (permalink / raw)
To: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, Christian Schoenebeck
Cc: Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton,
	linux-fsdevel, linux-kernel, netfs, swapnil.sapkal,
	syzkaller-bugs, viro, v9fs

On 03/25, K Prateek Nayak wrote:
>
> I chased this down to p9_client_rpc() in net/9p/client.c, specifically:
>
> 	err = c->trans_mod->request(c, req);
> 	if (err < 0) {
> 		/* write won't happen */
> 		p9_req_put(c, req);
> 		if (err != -ERESTARTSYS && err != -EFAULT)
> 			c->status = Disconnected;
> 		goto recalc_sigpending;
> 	}
>
> c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> which basically does a p9_fd_poll().

Again, I know nothing about 9p... but if p9_fd_request() returns
an err < 0, then it comes from p9_conn->err, and p9_fd_request()
does nothing else.

> Previously, the above would fail with err as -EIO, which would
> cause the client to "Disconnect", and the retry logic would make
> progress. Now, however, the err returned is -ERESTARTSYS

OK... So p9_conn->err = -ERESTARTSYS was set by p9_conn_cancel(),
called by p9_write_work(), because pipe_write() returned -ERESTARTSYS?

But I don't understand the -EIO seen with commit aaec5a95d59615
reverted.

Oleg.
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 13:04 ` Oleg Nesterov @ 2025-03-25 14:49 ` K Prateek Nayak 2025-03-25 14:58 ` Dominique Martinet 0 siblings, 1 reply; 27+ messages in thread From: K Prateek Nayak @ 2025-03-25 14:49 UTC (permalink / raw) To: Oleg Nesterov, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck Cc: Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs [-- Attachment #1: Type: text/plain, Size: 12265 bytes --] Hello Oleg, Dominique, On 3/25/2025 6:34 PM, Oleg Nesterov wrote: > On 03/25, K Prateek Nayak wrote: >> >> I chased this down to p9_client_rpc() net/9p/client.c specifically: >> >> err = c->trans_mod->request(c, req); >> if (err < 0) { >> /* write won't happen */ >> p9_req_put(c, req); >> if (err != -ERESTARTSYS && err != -EFAULT) >> c->status = Disconnected; >> goto recalc_sigpending; >> } >> >> c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c >> which basically does a p9_fd_poll(). > > Again, I know nothing about 9p... but if p9_fd_request() returns > an err < 0, then it comes from p9_conn->err and p9_fd_request() > does nothing else. > >> Previously, the above would fail with err as -EIO which would >> cause the client to "Disconnect" and the retry logic would make >> progress. Now however, the err returned is -ERESTARTSYS > > OK... So p9_conn->err = -ERESTARTSYS was set by p9_conn_cancel() > called by p9_write_work() because pipe_write() returns ERESTARTSYS? > > But I don't understand -EIO with the reverted commit aaec5a95d59615 Okay it is a long explanation. These are the relevant bits of traces with aaec5a95d59615 reverted on mainline and I'll break it down to the best of my abilities: kworker/100:1-1803 [100] ..... 286.618784: p9_write_work: Data write wait 32770 kworker/100:1-1803 [100] ..... 
286.618792: p9_write_work: Data written 28672 kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618795: p9_read_work: Data read wait 7 kworker/100:1-1803 [100] ..... 286.618796: p9_read_work: Data read 7 kworker/100:1-1803 [100] ..... 286.618796: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618796: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618797: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618797: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618797: p9_read_work: Data read wait 55 kworker/100:1-1803 [100] ..... 286.618798: p9_read_work: Data read 55 repro-4102 [100] ..... 286.618802: p9_client_write: p9_client_rpc done repro-4102 [100] ..... 286.618804: p9_client_write: p9_pdup repro-4102 [100] ..... 286.618804: p9_client_write: iter revert ==== the above bits are common between the two === repro-4102 [100] ..... 286.618805: p9_client_write: p9_client_rpc repro-4102 [100] ..... 286.618817: p9_fd_request: p9_fd_request repro-4102 [100] ..... 286.618818: p9_fd_request: p9_fd_request EPOLL repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_poll rd poll repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Data read 7 kworker/100:1-1803 [100] ..... 
286.618825: p9_read_work: p9_tag_lookup error ^^^^^^^^^^^^^^^^^^^ ==== In this case the read succeeds but p9_tag_lookup() later will error out ==== ==== Connection is cancelled with m->err as -EIO ==== kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Connection cancel kworker/100:1-1803 [100] ...1. 286.618826: p9_conn_cancel: Connection cancelled (-5) repro-4102 [100] ..... 286.618831: p9_client_write: p9_client_rpc done repro-4102 [100] ..... 286.618832: p9_client_write: p9_client_rpc error (-5) ^^^ ==== This error from kworker/100 is received by the reproducer ==== repro-4102 [100] ..... 286.618833: v9fs_issue_write: Issue write done 2 err(-5) repro-4102 [100] ..... 286.618833: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated repro-4102 [100] ..... 286.618834: netfs_wake_write_collector: Wake collector repro-4102 [100] ..... 286.618834: netfs_wake_write_collector: Queuing collector work repro-4102 [100] ..... 286.618837: v9fs_issue_write: Issue write subrequest terminated 2 repro-4102 [100] ..... 286.618838: netfs_unbuffered_write: netfs_unbuffered_write repro-4102 [100] ..... 286.618838: netfs_end_issue_write: netfs_end_issue_write repro-4102 [100] ..... 286.618838: netfs_end_issue_write: Write collector need poke 0 repro-4102 [100] ..... 286.618839: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! ==== A reissue mechanism triggers below ==== kworker/u1037:2-3232 [096] ..... 286.618845: netfs_retry_writes: netfs_reissue_write 1 kworker/u1037:2-3232 [096] ..... 286.618846: v9fs_issue_write: v9fs_issue_write kworker/u1037:2-3232 [096] ..... 286.618847: p9_client_write: p9_client_rpc kworker/u1037:2-3232 [096] ..... 286.618873: p9_fd_request: p9_fd_request kworker/u1037:2-3232 [096] ...1. 286.618874: p9_fd_request: p9_fd_request error kworker/u1037:2-3232 [096] ..... 
286.618874: p9_client_rpc: Client disconnected (no write) ==== Connection with client is disconnected ==== kworker/u1037:2-3232 [096] ..... 286.618877: p9_client_write: p9_client_rpc done kworker/u1037:2-3232 [096] ..... 286.618880: p9_client_write: p9_client_rpc error (-5) kworker/u1037:2-3232 [096] ..... 286.618881: v9fs_issue_write: Issue write done 0 err(-5) kworker/u1037:2-3232 [096] ..... 286.618882: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated kworker/u1037:2-3232 [096] ..... 286.618882: netfs_wake_write_collector: Wake collector kworker/u1037:2-3232 [096] ..... 286.618882: netfs_wake_write_collector: Queuing collector work kworker/u1037:2-3232 [096] ..... 286.618887: v9fs_issue_write: Issue write subrequest terminated 0 kworker/u1037:2-3232 [096] ..... 286.618890: netfs_write_collection_worker: Write collect clearing and waking up! ==== The retry worker will clear the NETFS_RREQ_IN_PROGRESS bit and issue a wakeup ==== repro-4102 [100] ..... 286.619051: p9_client_prepare_req: p9_client_prepare_req eio 1 repro-4102 [100] ..... 286.619052: p9_client_rpc: p9_client_rpc early err return repro-4108 [240] ..... 286.621325: p9_fd_create: Client connected ==== The reproducer will then re-establish the connection and the pattern repeats ==== --- So the pattern is with the optimization reverted is: o The RPC thread reads "7" bytes. o p9_tag_lookup() fails giving -EIO. o connection is cancelled from rpc thread. o The reproducer thread receives the packet and finds -EIO. o The reproducer starts a Disconnect. o The reproducer wakes up collector and sleeps on bit NETFS_RREQ_IN_PROGRESS o A retry worker wakes up. o Finds connection has been disconnected o Wakes up the reproducer thread. o Reproducer wakes up and establishes connection. I don't know 9p bits enough to understand the intermediates steps but this is the rough flow. --- Now for the traces on vanilla mainline: kworker/31:1-1723 [031] ..... 
95.011362: p9_write_work: Data write wait 32770 kworker/31:1-1723 [031] ..... 95.011372: p9_write_work: Data written 28672 kworker/31:1-1723 [031] ..... 95.011373: p9_fd_poll: p9_fd_poll rd poll kworker/31:1-1723 [031] ..... 95.011373: p9_fd_poll: p9_fd_request wr poll kworker/31:1-1723 [031] ..... 95.011374: p9_fd_poll: p9_fd_poll rd poll kworker/31:1-1723 [031] ..... 95.011374: p9_fd_poll: p9_fd_request wr poll kworker/31:1-1723 [031] ..... 95.011374: p9_read_work: Data read wait 7 kworker/31:1-1723 [031] ..... 95.011375: p9_read_work: Data read 7 kworker/31:1-1723 [031] ..... 95.011375: p9_fd_poll: p9_fd_poll rd poll kworker/31:1-1723 [031] ..... 95.011375: p9_fd_poll: p9_fd_request wr poll kworker/31:1-1723 [031] ..... 95.011376: p9_read_work: Data read wait 55 kworker/31:1-1723 [031] ..... 95.011376: p9_read_work: Data read 55 repro-4076 [031] ..... 95.011381: p9_client_rpc: Wait event killable (0) repro-4076 [031] ..... 95.011382: p9_client_rpc: Check error (0) repro-4076 [031] ..... 95.011382: p9_client_write: p9_client_rpc done repro-4076 [031] ..... 95.011382: p9_client_write: p9_pdup repro-4076 [031] ..... 95.011382: p9_client_write: iter revert ==== Above bits are same as the previous traces ==== repro-4076 [031] ..... 95.011383: p9_client_write: p9_client_rpc repro-4076 [031] ..... 95.011393: p9_fd_request: p9_fd_request repro-4076 [031] ..... 95.011394: p9_fd_request: p9_fd_request EPOLL repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) ==== The wait event here fails with -ERESTARTSYS ==== repro-4076 [031] ..... 99.731979: p9_client_rpc: Flushed (-512) repro-4076 [031] ..... 99.731983: p9_client_rpc: Final error (-512) repro-4076 [031] ..... 99.731983: p9_client_write: p9_client_rpc done repro-4076 [031] ..... 99.731984: p9_client_write: p9_client_rpc error (-512) repro-4076 [031] ..... 
99.731985: v9fs_issue_write: Issue write done 2 err(-512) repro-4076 [031] ..... 99.731987: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated repro-4076 [031] ..... 99.731987: netfs_wake_write_collector: Wake collector repro-4076 [031] ..... 99.731988: netfs_wake_write_collector: Queuing collector work repro-4076 [031] ..... 99.731991: v9fs_issue_write: Issue write subrequest terminated 2 repro-4076 [031] ..... 99.731991: netfs_unbuffered_write: netfs_unbuffered_write repro-4076 [031] ..... 99.731991: netfs_end_issue_write: netfs_end_issue_write repro-4076 [031] ..... 99.731992: netfs_end_issue_write: Write collector need poke 0 repro-4076 [031] ..... 99.731993: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! ==== The reproducer waits on bit NETFS_RREQ_IN_PROGRESS ==== kworker/u1028:3-3860 [030] ..... 99.732028: netfs_retry_writes: netfs_reissue_write 1 kworker/u1028:3-3860 [030] ..... 99.732030: v9fs_issue_write: v9fs_issue_write kworker/u1028:3-3860 [030] ..... 99.732031: p9_client_write: p9_client_rpc kworker/u1028:3-3860 [030] ..... 99.732051: p9_fd_request: p9_fd_request kworker/u1028:3-3860 [030] ..... 99.732052: p9_fd_request: p9_fd_request EPOLL kworker/u1028:3-3860 [030] ..... 99.732052: p9_fd_poll: p9_fd_poll rd poll kworker/u1028:3-3860 [030] ..... 99.732053: p9_fd_poll: p9_fd_request wr poll ==== The retry worker seemingly gets stuck post p9_fd_poll() waiting for wakeup ==== --- That is my analysis so far. > > Oleg. 
> -- Thanks and Regards, Prateek [-- Attachment #2: debug.diff --] [-- Type: text/plain, Size: 19614 bytes --] diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 32619d146cbc..3fbfc6882c35 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -56,10 +56,13 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq) struct p9_fid *fid = subreq->rreq->netfs_priv; int err, len; + trace_printk("v9fs_issue_write\n"); len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err); + trace_printk("Issue write done %d err(%d)\n", len, err); if (len > 0) __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags); netfs_write_subrequest_terminated(subreq, len ?: err, false); + trace_printk("Issue write subrequest terminated %d\n", len); } /** @@ -74,6 +77,7 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) int total, err; total = p9_client_read(fid, pos, &subreq->io_iter, &err); + trace_printk("v9fs_issue_read %d err(%d)\n", total, err); /* if we just extended the file size, any portion not in * cache won't be on server and is zeroes */ diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c index 42ce53cc216e..6333b8bb4008 100644 --- a/fs/netfs/direct_write.c +++ b/fs/netfs/direct_write.c @@ -105,6 +105,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter * if (!async) { trace_netfs_rreq(wreq, netfs_rreq_trace_wait_ip); + trace_printk("Waiting on NETFS_RREQ_IN_PROGRESS!\n"); wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS, TASK_UNINTERRUPTIBLE); ret = wreq->error; diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index 23c75755ad4e..a609063db8a7 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -446,6 +446,7 @@ static void netfs_read_collection(struct netfs_io_request *rreq) task_io_account_read(rreq->transferred); trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip); + trace_printk("Read collect clearing and waking up!\n"); clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, 
&rreq->flags); trace_netfs_rreq(rreq, netfs_rreq_trace_done); diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c index 3fca59e6475d..a54aa2bcc786 100644 --- a/fs/netfs/write_collect.c +++ b/fs/netfs/write_collect.c @@ -429,6 +429,7 @@ void netfs_write_collection_worker(struct work_struct *work) _debug("finished"); trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip); + trace_printk("Write collect clearing and waking up!\n"); clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags); if (wreq->iocb) { @@ -449,10 +450,14 @@ void netfs_write_collection_worker(struct work_struct *work) */ void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async) { + trace_printk("Wake collector\n"); if (!work_pending(&wreq->work)) { netfs_get_request(wreq, netfs_rreq_trace_get_work); - if (!queue_work(system_unbound_wq, &wreq->work)) + trace_printk("Queuing collector work\n"); + if (!queue_work(system_unbound_wq, &wreq->work)) { + trace_printk("Put request\n"); netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq); + } } } @@ -542,8 +547,10 @@ void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error, /* If we are at the head of the queue, wake up the collector, * transferring a ref to it if we were the ones to do so. 
*/ - if (list_is_first(&subreq->rreq_link, &stream->subrequests)) + if (list_is_first(&subreq->rreq_link, &stream->subrequests)) { + trace_printk("Collector woken up from netfs_write_subrequest_terminated\n"); netfs_wake_write_collector(wreq, was_async); + } netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); } diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c index 77279fc5b5a7..22a4b793e789 100644 --- a/fs/netfs/write_issue.c +++ b/fs/netfs/write_issue.c @@ -232,8 +232,10 @@ static void netfs_do_issue_write(struct netfs_io_stream *stream, _enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len); - if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) + if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) { + trace_printk("netfs_do_issue_write failed!\n"); return netfs_write_subrequest_terminated(subreq, subreq->error, false); + } trace_netfs_sreq(subreq, netfs_sreq_trace_submit); stream->issue_write(subreq); @@ -264,6 +266,7 @@ void netfs_issue_write(struct netfs_io_request *wreq, if (!subreq) return; + trace_printk("netfs_issue_write!\n"); stream->construct = NULL; subreq->io_iter.count = subreq->len; netfs_do_issue_write(stream, subreq); @@ -290,6 +293,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq, _enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0); if (subreq && start != subreq->start + subreq->len) { + trace_printk("netfs_issue_write from advance write!\n"); netfs_issue_write(wreq, stream); subreq = NULL; } @@ -307,6 +311,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq, if (subreq->len >= stream->sreq_max_len || subreq->nr_segs >= stream->sreq_max_segs || to_eof) { + trace_printk("netfs_issue_write from advance write! 2\n"); netfs_issue_write(wreq, stream); subreq = NULL; } @@ -397,6 +402,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq, * when they were read. Note that these appear as a special * write-back group. 
*/ + trace_printk("netfs_issue_write in folio copy\n"); if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) { netfs_issue_write(wreq, upload); } else if (fgroup != wreq->group) { @@ -526,6 +532,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq, */ static void netfs_end_issue_write(struct netfs_io_request *wreq) { + trace_printk("netfs_end_issue_write\n"); bool needs_poke = true; smp_wmb(); /* Write subreq lists before ALL_QUEUED. */ @@ -541,6 +548,7 @@ static void netfs_end_issue_write(struct netfs_io_request *wreq) netfs_issue_write(wreq, stream); } + trace_printk("Write collector need poke %d\n", needs_poke); if (needs_poke) netfs_wake_write_collector(wreq, false); } @@ -730,6 +738,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t break; } + trace_printk("netfs_unbuffered_write\n"); netfs_end_issue_write(wreq); _leave(" = %d", error); return error; @@ -908,6 +917,7 @@ int netfs_writeback_single(struct address_space *mapping, } stop: + trace_printk("netfs_issue_write writeback single\n"); for (int s = 0; s < NR_IO_STREAMS; s++) netfs_issue_write(wreq, &wreq->io_streams[s]); smp_wmb(); /* Write lists before ALL_QUEUED. 
*/ diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c index 545d33079a77..60a03079766b 100644 --- a/fs/netfs/write_retry.c +++ b/fs/netfs/write_retry.c @@ -43,6 +43,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, iov_iter_revert(&source, subreq->len - source.count); netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); + trace_printk("netfs_reissue_write 1\n"); netfs_reissue_write(stream, subreq, &source); } } @@ -116,6 +117,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, boundary = true; netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); + trace_printk("netfs_reissue_write 2\n"); netfs_reissue_write(stream, subreq, &source); if (subreq == to) break; @@ -183,6 +185,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, boundary = false; } + trace_printk("netfs_reissue_write 3\n"); netfs_reissue_write(stream, subreq, &source); if (!len) break; diff --git a/net/9p/client.c b/net/9p/client.c index 09f8ced9f8bb..909c83b0e015 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -633,12 +633,16 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c, p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type); /* we allow for any status other than disconnected */ - if (c->status == Disconnected) + if (c->status == Disconnected) { + trace_printk("p9_client_prepare_req eio 1\n"); return ERR_PTR(-EIO); + } /* if status is begin_disconnected we allow only clunk request */ - if (c->status == BeginDisconnect && type != P9_TCLUNK) + if (c->status == BeginDisconnect && type != P9_TCLUNK) { + trace_printk("p9_client_prepare_req eio 2\n"); return ERR_PTR(-EIO); + } va_copy(apc, ap); req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc); @@ -690,8 +694,10 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) 
va_start(ap, fmt); req = p9_client_prepare_req(c, type, tsize, rsize, fmt, ap); va_end(ap); - if (IS_ERR(req)) + if (IS_ERR(req)) { + trace_printk("p9_client_rpc early err return\n"); return req; + } req->tc.zc = false; req->rc.zc = false; @@ -706,9 +712,12 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) err = c->trans_mod->request(c, req); if (err < 0) { /* write won't happen */ + trace_printk("write won't happnen (%d)\n", err); p9_req_put(c, req); - if (err != -ERESTARTSYS && err != -EFAULT) + if (err != -ERESTARTSYS && err != -EFAULT) { + trace_printk("Client disconnected (no write)\n"); c->status = Disconnected; + } goto recalc_sigpending; } again: @@ -721,6 +730,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) */ smp_rmb(); + trace_printk("Wait event killable (%d) (%d) (%d) (%d)\n", err, READ_ONCE(req->status), c->status, type); if (err == -ERESTARTSYS && c->status == Connected && type == P9_TFLUSH) { sigpending = 1; @@ -731,6 +741,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (READ_ONCE(req->status) == REQ_STATUS_ERROR) { p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); err = req->t_err; + trace_printk("REQ_STATUS_ERROR (%d)\n", err); } if (err == -ERESTARTSYS && c->status == Connected) { p9_debug(P9_DEBUG_MUX, "flushing\n"); @@ -743,6 +754,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) /* if we received the response anyway, don't signal error */ if (READ_ONCE(req->status) == REQ_STATUS_RCVD) err = 0; + + trace_printk("Flushed (%d)\n", err); } recalc_sigpending: if (sigpending) { @@ -754,11 +767,13 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) 
goto reterr; err = p9_check_errors(c, req); + trace_printk("Check error (%d)\n", err); trace_9p_client_res(c, type, req->rc.tag, err); if (!err) return req; reterr: p9_req_put(c, req); + trace_printk("Final error (%d)\n", err); return ERR_PTR(safe_errno(err)); } @@ -808,8 +823,10 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, err = c->trans_mod->zc_request(c, req, uidata, uodata, inlen, olen, in_hdrlen); if (err < 0) { - if (err == -EIO) + if (err == -EIO) { + trace_printk("Client disconnected (zero_copy)\n"); c->status = Disconnected; + } if (err != -ERESTARTSYS) goto recalc_sigpending; } @@ -1093,6 +1110,7 @@ EXPORT_SYMBOL(p9_client_destroy); void p9_client_disconnect(struct p9_client *clnt) { p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt); + trace_printk("Client disconnected\n"); clnt->status = Disconnected; } EXPORT_SYMBOL(p9_client_disconnect); @@ -1100,6 +1118,7 @@ EXPORT_SYMBOL(p9_client_disconnect); void p9_client_begin_disconnect(struct p9_client *clnt) { p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt); + trace_printk("Client begin disconnect\n"); clnt->status = BeginDisconnect; } EXPORT_SYMBOL(p9_client_begin_disconnect); @@ -1572,11 +1591,13 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, offset, rsize); } else { non_zc = 1; + trace_printk("RPC issue\n"); req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize); } if (IS_ERR(req)) { *err = PTR_ERR(req); + trace_printk("read RPC issue error(%d)\n", *err); if (!non_zc) iov_iter_revert(to, count - iov_iter_count(to)); return 0; @@ -1587,12 +1608,14 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, if (*err) { if (!non_zc) iov_iter_revert(to, count - iov_iter_count(to)); + trace_printk("p9pdu_readf error(%d)\n", *err); trace_9p_protocol_dump(clnt, &req->rc); p9_req_put(clnt, req); return 0; } if (rsize < received) { pr_err("bogus RREAD count (%d > %d)\n", received, rsize); + trace_printk("bogus read\n"); received = rsize; } @@ 
-1602,6 +1625,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, int n = copy_to_iter(dataptr, received, to); if (n != received) { + trace_printk("copy_to_iter fault\n"); *err = -EFAULT; p9_req_put(clnt, req); return n; @@ -1642,23 +1666,29 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) rsize, P9_ZC_HDR_SZ, "dqd", fid->fid, offset, rsize); } else { + trace_printk("p9_client_rpc\n"); req = p9_client_rpc(clnt, P9_TWRITE, "dqV", fid->fid, offset, rsize, from); + trace_printk("p9_client_rpc done\n"); } if (IS_ERR(req)) { iov_iter_revert(from, count - iov_iter_count(from)); *err = PTR_ERR(req); + trace_printk("p9_client_rpc error (%d)\n", *err); break; } + trace_printk("p9_pdup\n"); *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &written); if (*err) { + trace_printk("p9_pdup err(%d)\n", *err); iov_iter_revert(from, count - iov_iter_count(from)); trace_9p_protocol_dump(clnt, &req->rc); p9_req_put(clnt, req); break; } if (rsize < written) { + trace_printk("p9_pdup bogus %d %d\n", written, rsize); pr_err("bogus RWRITE count (%d > %d)\n", written, rsize); written = rsize; } @@ -1666,6 +1696,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written); p9_req_put(clnt, req); + trace_printk("iter revert\n"); iov_iter_revert(from, count - written - iov_iter_count(from)); total += written; offset += written; diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 196060dc6138..3c83b3855e82 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -196,6 +196,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err) return; } + trace_printk("Connection cancelled (%d)\n", err); m->err = err; list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { @@ -233,9 +234,12 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err) return EPOLLERR; } + trace_printk("p9_fd_poll rd poll\n"); ret = 
vfs_poll(ts->rd, pt); - if (ts->rd != ts->wr) + if (ts->rd != ts->wr) { + trace_printk("p9_fd_request wr poll\n"); ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); + } return ret; } @@ -263,9 +267,13 @@ static int p9_fd_read(struct p9_client *client, void *v, int len) p9_debug(P9_DEBUG_ERROR, "blocking read ...\n"); pos = ts->rd->f_pos; + trace_printk("Data read wait %d\n", len); ret = kernel_read(ts->rd, v, len, &pos); - if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) + trace_printk("Data read %d\n", ret); + if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) { + trace_printk("Client disconnected (fd_read)\n"); client->status = Disconnected; + } return ret; } @@ -330,6 +338,7 @@ static void p9_read_work(struct work_struct *work) if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) { p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n", m->rc.tag); + trace_printk("p9_tag_lookup error\n"); err = -EIO; goto error; } @@ -338,6 +347,7 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_ERROR, "requested packet size too big: %d for tag %d with capacity %zd\n", m->rc.size, m->rc.tag, m->rreq->rc.capacity); + trace_printk("capacity error\n"); err = -EIO; goto error; } @@ -348,6 +358,7 @@ static void p9_read_work(struct work_struct *work) m->rc.tag, m->rreq); p9_req_put(m->client, m->rreq); m->rreq = NULL; + trace_printk("sdata error\n"); err = -EIO; goto error; } @@ -375,6 +386,7 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_ERROR, "Request tag %d errored out while we were reading the reply\n", m->rc.tag); + trace_printk("request tag error %d status(%d)\n", m->rc.tag, m->rreq->status); err = -EIO; goto error; } @@ -403,6 +415,7 @@ static void p9_read_work(struct work_struct *work) return; error: + trace_printk("Connection cancel\n"); p9_conn_cancel(m, err); clear_bit(Rworksched, &m->wsched); } @@ -429,9 +442,13 @@ static int p9_fd_write(struct p9_client *client, void *v, int len) if (!(ts->wr->f_flags & 
O_NONBLOCK)) p9_debug(P9_DEBUG_ERROR, "blocking write ...\n"); + trace_printk("Data write wait %d\n",len); ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos); - if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) + trace_printk("Data written %ld\n", ret); + if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) { + trace_printk("Client disconnected (fd_write)\n"); client->status = Disconnected; + } return ret; } @@ -519,6 +536,7 @@ static void p9_write_work(struct work_struct *work) return; error: + trace_printk("Connection cancel\n"); p9_conn_cancel(m, err); clear_bit(Wworksched, &m->wsched); } @@ -628,6 +646,7 @@ static void p9_poll_mux(struct p9_conn *m) n = p9_fd_poll(m->client, NULL, &err); if (n & (EPOLLERR | EPOLLHUP | EPOLLNVAL)) { p9_debug(P9_DEBUG_TRANS, "error mux %p err %d\n", m, n); + trace_printk("Connection cancel\n"); p9_conn_cancel(m, err); } @@ -668,12 +687,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) struct p9_trans_fd *ts = client->trans; struct p9_conn *m = &ts->conn; + trace_printk("p9_fd_request\n"); p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n", m, current, &req->tc, req->tc.id); spin_lock(&m->req_lock); if (m->err < 0) { + trace_printk("p9_fd_request error\n"); spin_unlock(&m->req_lock); return m->err; } @@ -682,13 +703,16 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) list_add_tail(&req->req_list, &m->unsent_req_list); spin_unlock(&m->req_lock); + trace_printk("p9_fd_request EPOLL\n"); if (test_and_clear_bit(Wpending, &m->wsched)) n = EPOLLOUT; else n = p9_fd_poll(m->client, NULL, NULL); - if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) + if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) { + trace_printk("p9_fd_request schedule work\n"); schedule_work(&m->wq); + } return 0; } @@ -852,6 +876,7 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd) data_race(ts->wr->f_flags |= O_NONBLOCK); client->trans = ts; + 
trace_printk("Client connected\n"); client->status = Connected; return 0; @@ -889,6 +914,7 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket) get_file(file); p->wr = p->rd = file; client->trans = p; + trace_printk("Client connected\n"); client->status = Connected; p->rd->f_flags |= O_NONBLOCK; @@ -920,6 +946,7 @@ static void p9_conn_destroy(struct p9_conn *m) m->wreq = NULL; } + trace_printk("Connection destroy\n"); p9_conn_cancel(m, -ECONNRESET); m->client = NULL; @@ -942,6 +969,7 @@ static void p9_fd_close(struct p9_client *client) if (!ts) return; + trace_printk("Client disconnected (fd_close)\n"); client->status = Disconnected; p9_conn_destroy(&ts->conn); ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 14:49 ` K Prateek Nayak @ 2025-03-25 14:58 ` Dominique Martinet 2025-03-26 12:19 ` Oleg Nesterov 2025-03-27 17:46 ` K Prateek Nayak 0 siblings, 2 replies; 27+ messages in thread From: Dominique Martinet @ 2025-03-25 14:58 UTC (permalink / raw) To: K Prateek Nayak Cc: Oleg Nesterov, Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs Thanks for the traces. w/ revert K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530: > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll > kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 new behavior > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll > repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) For me the problem isn't so much that this gets ERESTARTSYS but that it nevers gets to read the 7 bytes that are available? If the repro has already written the bytes in both cases then there's no reason to wait 5 seconds here... OTOH syzbot 9p code is silly and might have been depending on something that's not supposed to work e.g. they might be missing a flush or equivalent for all I know (I still haven't looked at the repro) -- Dominique ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 14:58 ` Dominique Martinet @ 2025-03-26 12:19 ` Oleg Nesterov 2025-03-26 12:44 ` Oleg Nesterov 2025-03-27 17:46 ` K Prateek Nayak 1 sibling, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2025-03-26 12:19 UTC (permalink / raw) To: Dominique Martinet Cc: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs On 03/25, Dominique Martinet wrote: > > Thanks for the traces. > > w/ revert > K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530: > > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll > > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll > > kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 > > new behavior > > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll > > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll > > repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) > > For me the problem isn't so much that this gets ERESTARTSYS but that it > nevers gets to read the 7 bytes that are available? Yes... OK, lets first recall what the commit aaec5a95d59615523 ("pipe_read: don't wake up the writer if the pipe is still full") does. It simply removes the unnecessary/spurious wakeups when the writer can't add more data to the pipe. See the "stupid test-cas" in https://lore.kernel.org/all/20250120144338.GC7432@redhat.com/ In particular this note: As you can see, without this patch pipe_read() wakes the writer up 4095 times for no reason, the writer burns a bit of CPU and blocks again after wakeup until the last read(fd[0], &c, 1). in this test-case the writer sleeps in pipe_write(), but the same is true for the task sleeping in poll( { .fd = pipe_fd, .events = POLLOUT}, ...). 
Now, after some grepping I have found static void p9_conn_create(struct p9_client *client) { ... init_poll_funcptr(&m->pt, p9_pollwait); n = p9_fd_poll(client, &m->pt, NULL); ... } So, iiuc, in this case p9_fd_poll(&m->pt /* != NULL */) -> p9_pollwait() paths will add the "dummy" pwait->wait entries with ->func = p9_pollwake to pipe_inode_info.rd_wait and pipe_inode_info.wr_wait. Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the ret from vfs_poll(ts->rd), but I assume this is correct. This means that every time pipe_read() does wake_up(&pipe->wr_wait) p9_pollwake() is called. This function kicks p9_poll_workfn() which calls p9_poll_mux() which calls p9_fd_poll() again with pt == NULL. In this case the conditional vfs_poll(ts->wr) looks more understandable... So. Without the commit above, p9_poll_mux()->p9_fd_poll() can be called much more often and, in particular, can report the "additional" EPOLLIN. Can this somehow explain the problem? Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-26 12:19 ` Oleg Nesterov @ 2025-03-26 12:44 ` Oleg Nesterov 2025-03-26 13:05 ` Oleg Nesterov 0 siblings, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2025-03-26 12:44 UTC (permalink / raw) To: Dominique Martinet, K Prateek Nayak Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs On 03/26, Oleg Nesterov wrote: > > Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the > ret from vfs_poll(ts->rd), but I assume this is correct. I meant, if pt != NULL and ts->rd != ts->wr we need both vfs_poll(ts->rd) and vfs_poll(ts->wr) ? and the reproducer writes to the pipe before it mounts 9p... Prateek, this is just a shot in the dark but since you can reproduce, can you check if the patch below makes any difference? Oleg. --- x/net/9p/trans_fd.c +++ x/net/9p/trans_fd.c @@ -234,8 +234,10 @@ p9_fd_poll(struct p9_client *client, str } ret = vfs_poll(ts->rd, pt); - if (ts->rd != ts->wr) + if (ts->rd != ts->wr) { + if (pt != NULL) vfs_poll(ts->wr, pt); ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); + } return ret; } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-26 12:44 ` Oleg Nesterov @ 2025-03-26 13:05 ` Oleg Nesterov 0 siblings, 0 replies; 27+ messages in thread From: Oleg Nesterov @ 2025-03-26 13:05 UTC (permalink / raw) To: Dominique Martinet, K Prateek Nayak Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs Damn, sorry for the noise please ignore ;) On 03/26, Oleg Nesterov wrote: > > On 03/26, Oleg Nesterov wrote: > > > > Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the > > ret from vfs_poll(ts->rd), but I assume this is correct. > > I meant, if pt != NULL and ts->rd != ts->wr we need both > vfs_poll(ts->rd) and vfs_poll(ts->wr) ? I am stupid. After the lot of reading I can't distinguish "|" and "||". Oleg. > and the reproducer writes to the pipe before it mounts 9p... > > Prateek, this is just a shot in the dark but since you can reproduce, > can you check if the patch below makes any difference? > > Oleg. > > --- x/net/9p/trans_fd.c > +++ x/net/9p/trans_fd.c > @@ -234,8 +234,10 @@ p9_fd_poll(struct p9_client *client, str > } > > ret = vfs_poll(ts->rd, pt); > - if (ts->rd != ts->wr) > + if (ts->rd != ts->wr) { > + if (pt != NULL) vfs_poll(ts->wr, pt); > ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); > + } > return ret; > } > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 14:58 ` Dominique Martinet 2025-03-26 12:19 ` Oleg Nesterov @ 2025-03-27 17:46 ` K Prateek Nayak 2025-03-27 21:19 ` syzbot 1 sibling, 1 reply; 27+ messages in thread From: K Prateek Nayak @ 2025-03-27 17:46 UTC (permalink / raw) To: Dominique Martinet, Oleg Nesterov Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs Hello all, On 3/25/2025 8:28 PM, Dominique Martinet wrote: > Thanks for the traces. > > w/ revert > K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530: >> kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll >> kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll >> kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 > > new behavior >> repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll >> repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll >> repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) > > For me the problem isn't so much that this gets ERESTARTSYS but that it > nevers gets to read the 7 bytes that are available? I'm back after getting distracted for a bit. So here it goes: On the mainline, the signal is actually SIGKILL from parent thread due to a timeout: repro-4084 [112] d..2. 233.654264: signal_generate: sig=9 errno=0 code=0 comm=repro pid=4085 grp=1 res=0 repro-4084 [112] d..1. 233.654272: signal_generate: sig=9 errno=0 code=0 comm=repro pid=4085 grp=1 res=1 repro-4085 [125] ..... 233.654304: p9_client_rpc: Wait event killable (-512) (1) (0) (118) repro-4085 [125] ..... 233.654308: p9_client_rpc: Flushed (-512) repro-4085 [125] ..... 233.654312: p9_client_rpc: Final error (-512) repro-4085 [125] ..... 233.654313: p9_client_write: p9_client_rpc done repro-4085 [125] ..... 
233.654313: p9_client_write: p9_client_rpc error (-512) repro-4085 [125] ..... 233.654315: v9fs_issue_write: Issue write done 2 err(-512) Specifically, this part: for (;;) { sleep_ms(10); if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid) break; if (current_time_ms() - start < 5000) continue; kill_and_wait(pid, &status); /* <------- here */ break; } Now for the 7 bytes that were not read - If you look at the traces, every time there is a valid trans_mod->request(), there is a kworker wakeup: repro-4102 [100] ..... 286.618772: p9_client_write: p9_client_rpc repro-4102 [100] ..... 286.618780: p9_fd_request: p9_fd_request repro-4102 [100] ..... 286.618781: p9_fd_request: p9_fd_request EPOLL repro-4102 [100] ..... 286.618781: p9_fd_request: p9_fd_request schedule work <----- this kworker/100:1-1803 [100] ..... 286.618784: p9_write_work: Data write wait 32770 However, for that last 7-byte read, there is in fact no wakeup: repro-4102 [100] ..... 286.618805: p9_client_write: p9_client_rpc repro-4102 [100] ..... 286.618817: p9_fd_request: p9_fd_request repro-4102 [100] ..... 286.618818: p9_fd_request: p9_fd_request EPOLL repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_poll rd poll repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_request wr poll <--- No wakeup after kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Data read 7 The kworker would spuriously wake up, do something, realize it wasn't supposed to read it, and then disconnect the connection, waking up all waiters. But if we haven't sent a request, why are we waiting at all? > > If the repro has already written the bytes in both cases then there's no
> > OTOH syzbot 9p code is silly and might have been depending on something > that's not supposed to work e.g. they might be missing a flush or > equivalent for all I know (I still haven't looked at the repro) > I believe if c->trans_mod->request(c, req) does not dispatch the request to the worker for RPC, the caller must not wait. So here goes something that got the benchmark going in my case which might be totally wrong but worth a look: #syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7 diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 196060dc6138..148533c993f1 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -687,7 +687,11 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) else n = p9_fd_poll(m->client, NULL, NULL); - if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) + /* Request was not sent */ + if (!(n & EPOLLOUT)) + return -EIO; + + if (!test_and_set_bit(Wworksched, &m->wsched)) schedule_work(&m->wq); return 0; ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-27 17:46 ` K Prateek Nayak @ 2025-03-27 21:19 ` syzbot 2025-03-27 22:18 ` asmadeus 0 siblings, 1 reply; 27+ messages in thread From: syzbot @ 2025-03-27 21:19 UTC (permalink / raw) To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak, linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg, swapnil.sapkal, syzkaller-bugs, v9fs, viro Hello, syzbot has tested the proposed patch but the reproducer is still triggering an issue: KASAN: slab-use-after-free Read in p9_conn_cancel ================================================================== BUG: KASAN: slab-use-after-free in p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205 Read of size 8 at addr ffff88807b19ea50 by task syz-executor/6595 CPU: 0 UID: 0 PID: 6595 Comm: syz-executor Not tainted 6.13.0-rc1-syzkaller-00017-gaaec5a95d596-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xc3/0x620 mm/kasan/report.c:489 kasan_report+0xd9/0x110 mm/kasan/report.c:602 p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205 p9_conn_destroy net/9p/trans_fd.c:927 [inline] p9_fd_close+0x3c5/0x590 net/9p/trans_fd.c:951 p9_client_destroy+0xce/0x480 net/9p/client.c:1077 v9fs_session_close+0x49/0x2d0 fs/9p/v9fs.c:506 v9fs_kill_super+0x4d/0xa0 fs/9p/vfs_super.c:196 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473 deactivate_super+0xde/0x100 fs/super.c:506 cleanup_mnt+0x222/0x450 fs/namespace.c:1373 task_work_run+0x14e/0x250 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xadd/0x2d70 kernel/exit.c:938 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 get_signal+0x24ed/0x26c0 kernel/signal.c:3017 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 
exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa1f978bb7c Code: Unable to access opcode bytes at 0x7fa1f978bb52. RSP: 002b:00007ffd5c5893b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: fffffffffffffe00 RBX: 0000000000000003 RCX: 00007fa1f978bb7c RDX: 0000000000000030 RSI: 00007ffd5c589470 RDI: 00000000000000f9 RBP: 00007ffd5c58941c R08: 0000000000000000 R09: 0079746972756365 R10: 00007ffd5c588d70 R11: 0000000000000246 R12: 0000000000000258 R13: 00000000000927c0 R14: 0000000000019c1d R15: 00007ffd5c589470 </TASK> Allocated by task 52: kasan_save_stack+0x33/0x60 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:319 [inline] __kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345 kasan_slab_alloc include/linux/kasan.h:250 [inline] slab_post_alloc_hook mm/slub.c:4104 [inline] slab_alloc_node mm/slub.c:4153 [inline] kmem_cache_alloc_noprof+0x226/0x3d0 mm/slub.c:4160 p9_tag_alloc+0x9c/0x870 net/9p/client.c:281 p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644 p9_client_rpc+0x1c3/0xc10 net/9p/client.c:691 p9_client_write+0x31f/0x680 net/9p/client.c:1645 v9fs_issue_write+0xe2/0x180 fs/9p/vfs_addr.c:59 netfs_do_issue_write+0x92/0x110 fs/netfs/write_issue.c:233 netfs_retry_write_stream fs/netfs/write_collect.c:184 [inline] netfs_retry_writes fs/netfs/write_collect.c:361 [inline] netfs_collect_write_results fs/netfs/write_collect.c:529 [inline] netfs_write_collection_worker+0x3e10/0x47d0 fs/netfs/write_collect.c:551 process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229 process_scheduled_works kernel/workqueue.c:3310 [inline] worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391 kthread+0x2c1/0x3a0 
 kernel/kthread.c:389
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Freed by task 5192:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:582
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2338 [inline]
 slab_free_after_rcu_debug+0x115/0x340 mm/slub.c:4648
 rcu_do_batch kernel/rcu/tree.c:2567 [inline]
 rcu_core+0x79d/0x14d0 kernel/rcu/tree.c:2823
 handle_softirqs+0x213/0x8f0 kernel/softirq.c:554
 __do_softirq kernel/softirq.c:588 [inline]
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu+0x109/0x170 kernel/softirq.c:655
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:671
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
 sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702

Last potentially related work creation:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 __kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:544
 slab_free_hook mm/slub.c:2299 [inline]
 slab_free mm/slub.c:4598 [inline]
 kmem_cache_free+0x305/0x4c0 mm/slub.c:4700
 p9_req_put net/9p/client.c:409 [inline]
 p9_req_put+0x1c6/0x250 net/9p/client.c:402
 p9_client_rpc+0x591/0xc10 net/9p/client.c:761
 p9_client_write+0x31f/0x680 net/9p/client.c:1645
 v9fs_issue_write+0xe2/0x180 fs/9p/vfs_addr.c:59
 netfs_do_issue_write+0x92/0x110 fs/netfs/write_issue.c:233
 netfs_retry_write_stream fs/netfs/write_collect.c:184 [inline]
 netfs_retry_writes fs/netfs/write_collect.c:361 [inline]
 netfs_collect_write_results fs/netfs/write_collect.c:529 [inline]
 netfs_write_collection_worker+0x3e10/0x47d0 fs/netfs/write_collect.c:551
 process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229
 process_scheduled_works kernel/workqueue.c:3310 [inline]
 worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
 kthread+0x2c1/0x3a0 kernel/kthread.c:389
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The buggy address belongs to the object at ffff88807b19e990
 which belongs to the cache p9_req_t of size 208
The buggy address is located 192 bytes inside of
 freed 208-byte region [ffff88807b19e990, ffff88807b19ea60)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7b19e
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88814c7a0140 ffffea0001dd70c0 0000000000000006
raw: 0000000000000000 00000000000f000f 00000001f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 6752, tgid 6751 (syz.0.17), ts 91802565462, free_ts 91788017536
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1556
 prep_new_page mm/page_alloc.c:1564 [inline]
 get_page_from_freelist+0xfce/0x2f80 mm/page_alloc.c:3474
 __alloc_pages_noprof+0x221/0x2470 mm/page_alloc.c:4751
 alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
 alloc_slab_page mm/slub.c:2408 [inline]
 allocate_slab mm/slub.c:2574 [inline]
 new_slab+0x2c9/0x410 mm/slub.c:2627
 ___slab_alloc+0xda4/0x1880 mm/slub.c:3815
 __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3905
 __slab_alloc_node mm/slub.c:3980 [inline]
 slab_alloc_node mm/slub.c:4141 [inline]
 kmem_cache_alloc_noprof+0xfa/0x3d0 mm/slub.c:4160
 p9_tag_alloc+0x9c/0x870 net/9p/client.c:281
 p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
 p9_client_rpc+0x1c3/0xc10 net/9p/client.c:691
 p9_client_version net/9p/client.c:930 [inline]
 p9_client_create+0xc65/0x1200 net/9p/client.c:1036
 v9fs_session_init+0x1f8/0x1a80 fs/9p/v9fs.c:410
 v9fs_mount+0xc6/0xa30 fs/9p/vfs_super.c:122
 legacy_get_tree+0x109/0x220 fs/fs_context.c:662
 vfs_get_tree+0x8f/0x380 fs/super.c:1814
page last free pid 6750 tgid 6748 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1127 [inline]
 free_unref_folios+0xa7b/0x14f0 mm/page_alloc.c:2704
 folios_put_refs+0x587/0x7b0 mm/swap.c:962
 free_pages_and_swap_cache+0x45f/0x510 mm/swap_state.c:335
 __tlb_batch_free_encoded_pages+0xf9/0x290 mm/mmu_gather.c:136
 tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
 tlb_flush_mmu mm/mmu_gather.c:373 [inline]
 tlb_finish_mmu+0x168/0x7b0 mm/mmu_gather.c:465
 exit_mmap+0x3df/0xb20 mm/mmap.c:1680
 __mmput+0x12a/0x4c0 kernel/fork.c:1353
 mmput+0x62/0x70 kernel/fork.c:1375
 exit_mm kernel/exit.c:570 [inline]
 do_exit+0x9bf/0x2d70 kernel/exit.c:925
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
 get_signal+0x24ed/0x26c0 kernel/signal.c:3017
 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff88807b19e900: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
 ffff88807b19e980: fc fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88807b19ea00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                 ^
 ffff88807b19ea80: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88807b19eb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
==================================================================


Tested on:

commit:         aaec5a95 pipe_read: don't wake up the writer if the pi..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15fafde4580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8d5a2956e94d7972
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=110a3804580000

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-27 21:19 ` syzbot
@ 2025-03-27 22:18 ` asmadeus
  2025-03-28  4:01 ` K Prateek Nayak
  0 siblings, 1 reply; 27+ messages in thread
From: asmadeus @ 2025-03-27 22:18 UTC (permalink / raw)
To: syzbot
Cc: brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

syzbot wrote on Thu, Mar 27, 2025 at 02:19:03PM -0700:
> BUG: KASAN: slab-use-after-free in p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205
> Read of size 8 at addr ffff88807b19ea50 by task syz-executor/6595

Ugh, why...
Ah, if ->request() fails p9_client_rpc assumes the request was not
written (e.g. write error), so you can't return an error after the
list_add_tail call in p9_fd_request.

I think you can call p9_conn_cancel with the error and return 0 anyway,
and this particular workaround will probably work, regardless of whether
it's the correct thing to do here (still haven't had time to look at the
patch here)

Sorry for this mess (even if most of it predates me...)
-- 
Dominique

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-27 22:18 ` asmadeus
@ 2025-03-28  4:01 ` K Prateek Nayak
  2025-03-28  4:43 ` syzbot
  2025-03-28 13:06 ` Oleg Nesterov
  0 siblings, 2 replies; 27+ messages in thread
From: K Prateek Nayak @ 2025-03-28  4:01 UTC (permalink / raw)
To: asmadeus, syzbot
Cc: brauner, dhowells, ericvh, jack, jlayton, linux-fsdevel,
    linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello Dominique,

On 3/28/2025 3:48 AM, asmadeus@codewreck.org wrote:
> syzbot wrote on Thu, Mar 27, 2025 at 02:19:03PM -0700:
>> BUG: KASAN: slab-use-after-free in p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205
>> Read of size 8 at addr ffff88807b19ea50 by task syz-executor/6595

Whoops! Should have tested with KASAN enabled. This time I did.

>
> Ugh, why...
> Ah, if ->request() fails p9_client_rpc assumes the request was not
> written (e.g. write error), so you can't return an error after the
> list_add_tail call in p9_fd_request.
>
> I think you can call p9_conn_cancel with the error and return 0 anyway,
> and this particular workaround will probably work, regardless of whether
> it's the correct thing to do here (still haven't had time to look at the
> patch here)

That works too! Thank you for the pointer.

>
> Sorry for this mess (even if most of it predates me...)

And I'm sorry for the noise from hacking on stuff that I don't fully
understand.
Fingers crossed (and thank you syzbot)

#syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138..dab7f02dc243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -687,7 +687,13 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	else
 		n = p9_fd_poll(m->client, NULL, NULL);
 
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
+	/* Failed to send request */
+	if (!(n & EPOLLOUT)) {
+		p9_conn_cancel(m, -EIO);
+		return 0;
+	}
+
+	if (!test_and_set_bit(Wworksched, &m->wsched))
 		schedule_work(&m->wq);
 
 	return 0;

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28  4:01 ` K Prateek Nayak
@ 2025-03-28  4:43 ` syzbot
  0 siblings, 0 replies; 27+ messages in thread
From: syzbot @ 2025-03-28  4:43 UTC (permalink / raw)
To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com

Tested on:

commit:         aaec5a95 pipe_read: don't wake up the writer if the pi..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1285fa4c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8d5a2956e94d7972
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=121eede4580000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28  4:01 ` K Prateek Nayak
  2025-03-28  4:43 ` syzbot
@ 2025-03-28 13:06 ` Oleg Nesterov
  2025-03-28 13:07 ` syzbot
  1 sibling, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 13:06 UTC (permalink / raw)
To: K Prateek Nayak, Dominique Martinet
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, linux-fsdevel,
    linux-kernel, linux_oss, lucho, mjguzik, netfs, swapnil.sapkal,
    syzkaller-bugs, v9fs, viro

On 03/28, K Prateek Nayak wrote:
>
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -687,7 +687,13 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	else
>  		n = p9_fd_poll(m->client, NULL, NULL);
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> +	/* Failed to send request */
> +	if (!(n & EPOLLOUT)) {
> +		p9_conn_cancel(m, -EIO);

Thanks a lot Prateek!

Can't really argue with the changes in net/9p, but I am not sure.
!(n & EPOLLOUT) means that the pipe is full, nothing wrong. We need to
kick m->rq in this case.

Dominique says:

	For me the problem isn't so much that this gets ERESTARTSYS but that it
	never gets to read the 7 bytes that are available?

Yes. Of course I do not pretend I fully understand the problem, but it
seems this is because p9_read_work() doesn't set Rworksched and doesn't
do schedule_work(&m->rq) if list_empty(&m->req_list).

However, if the pipe is full, before the commit aaec5a95d59615523db0
("pipe_read: don't wake up the writer if the pipe is still full"),
p9_read_work() -> p9_fd_read() -> kernel_read(ts->rd) triggered the
unnecessary wakeup. This wakeup calls p9_pollwake() which kicks
p9_poll_workfn() -> p9_poll_mux(), and p9_poll_mux() will notice
EPOLLIN and schedule_work(&m->rq).

Maybe the patch below makes more sense?

Oleg.
#syz test: upstream

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56e62978e502..aa9cd248a243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
@@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:06 ` Oleg Nesterov
@ 2025-03-28 13:07 ` syzbot
  2025-03-28 13:25 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-03-28 13:07 UTC (permalink / raw)
To: oleg
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

> On 03/28, K Prateek Nayak wrote:
>>
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -687,7 +687,13 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>>  	else
>>  		n = p9_fd_poll(m->client, NULL, NULL);
>> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
>> +	/* Failed to send request */
>> +	if (!(n & EPOLLOUT)) {
>> +		p9_conn_cancel(m, -EIO);
>
> Thanks a lot Prateek!
>
> Can't really argue with the changes in net/9p, but I am not sure.
> !(n & EPOLLOUT) means that the pipe is full, nothing wrong. We need to
> kick m->rq in this case.
>
> Dominique says:
>
> 	For me the problem isn't so much that this gets ERESTARTSYS but that it
> 	never gets to read the 7 bytes that are available?
>
> Yes. Of course I do not pretend I fully understand the problem, but it
> seems this is because p9_read_work() doesn't set Rworksched and doesn't
> do schedule_work(&m->rq) if list_empty(&m->req_list).
>
> However, if the pipe is full, before the commit aaec5a95d59615523db0
> ("pipe_read: don't wake up the writer if the pipe is still full"),
> p9_read_work() -> p9_fd_read() -> kernel_read(ts->rd) triggered the
> unnecessary wakeup. This wakeup calls p9_pollwake() which kicks
> p9_poll_workfn() -> p9_poll_mux(), and p9_poll_mux() will notice
> EPOLLIN and schedule_work(&m->rq).
>
> Maybe the patch below makes more sense?
>
> Oleg.
>
> #syz test: upstream

want either no args or 2 args (repo, branch), got 5

>
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 56e62978e502..aa9cd248a243 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
> 
>  static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	__poll_t n;
>  	struct p9_trans_fd *ts = client->trans;
>  	struct p9_conn *m = &ts->conn;
> 
> @@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	list_add_tail(&req->req_list, &m->unsent_req_list);
>  	spin_unlock(&m->req_lock);
> 
> -	if (test_and_clear_bit(Wpending, &m->wsched))
> -		n = EPOLLOUT;
> -	else
> -		n = p9_fd_poll(m->client, NULL, NULL);
> -
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> -		schedule_work(&m->wq);
> +	p9_poll_mux(m);
> 
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:07 ` syzbot
@ 2025-03-28 13:25 ` Oleg Nesterov
  2025-03-28 13:49 ` syzbot
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 13:25 UTC (permalink / raw)
To: syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

On 03/28, syzbot wrote:
>
> > #syz test: upstream
>
> want either no args or 2 args (repo, branch), got 5

#syz test

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56e62978e502..aa9cd248a243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
@@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:25 ` Oleg Nesterov
@ 2025-03-28 13:49 ` syzbot
  2025-03-28 14:49 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-03-28 13:49 UTC (permalink / raw)
To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
unregister_netdevice: waiting for DEV to become free

unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

Tested on:

commit:         acb4f337 Merge tag 'm68knommu-for-v6.15' of git://git...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16bab804580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=95c3bbe7ce8436a7
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=146b9bb0580000

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:49 ` syzbot
@ 2025-03-28 14:49 ` Oleg Nesterov
  2025-03-28 15:22 ` syzbot
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 14:49 UTC (permalink / raw)
To: syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

On 03/28, syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> unregister_netdevice: waiting for DEV to become free
>
> unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

Looks "obviously unrelated", but I'm not sure about anything anymore.
In any case I leave this to 9p maintainers.

Okay, just out of curiosity:

#syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56e62978e502..aa9cd248a243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
@@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 14:49 ` Oleg Nesterov
@ 2025-03-28 15:22 ` syzbot
  2025-03-28 17:00 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-03-28 15:22 UTC (permalink / raw)
To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com

Tested on:

commit:         aaec5a95 pipe_read: don't wake up the writer if the pi..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1598064c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8d5a2956e94d7972
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=16682a4c580000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 15:22 ` syzbot
@ 2025-03-28 17:00 ` Oleg Nesterov
  2025-03-28 17:56 ` K Prateek Nayak
  2025-03-29  0:00 ` asmadeus
  0 siblings, 2 replies; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 17:00 UTC (permalink / raw)
To: syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Dear syzbot,

On 03/28, syzbot wrote:
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
> Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com

Thanks. So the previous

	syzbot has tested the proposed patch but the reproducer is still triggering an issue:
	unregister_netdevice: waiting for DEV to become free

	unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

report suggests that upstream/master has another/unrelated issue(s).

As for the patches from me or Prateek (thanks again!), I think that
the maintainers should take a look.

But at this point I am mostly confident that the bisected commit aaec5a95d5961
("pipe_read: don't wake up the writer if the pipe is still full") is innocent,
it just reveals yet another problem.

I guess (I hope ;) Prateek agrees.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 17:00 ` Oleg Nesterov
@ 2025-03-28 17:56 ` K Prateek Nayak
  2025-03-28 18:20 ` Oleg Nesterov
  1 sibling, 1 reply; 27+ messages in thread
From: K Prateek Nayak @ 2025-03-28 17:56 UTC (permalink / raw)
To: Oleg Nesterov, syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, linux-fsdevel,
    linux-kernel, linux_oss, lucho, mjguzik, netfs, swapnil.sapkal,
    syzkaller-bugs, v9fs, viro

Hello Oleg,

On 3/28/2025 10:30 PM, Oleg Nesterov wrote:
> Dear syzbot,
>
> On 03/28, syzbot wrote:
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>>
>> Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
>> Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
>

Yours is the right approach. I was searching for p9_poll_mux() which
would wake the read collector to free pipe buffers but I didn't dig
enough and I was too lazy to open code it (guilty!)

If this gets picked, feel free to add:

Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

> Thanks. So the previous
>
> 	syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> 	unregister_netdevice: waiting for DEV to become free
>
> 	unregister_netdevice: waiting for batadv0 to become free. Usage count = 3
>
> report suggests that upstream/master has another/unrelated issue(s).

I could not reproduce this bit on mainline at commit acb4f33713b9
("Merge tag 'm68knommu-for-v6.15' of
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu")
so I too second that it is an unrelated issue.

>
> As for the patches from me or Prateek (thanks again!), I think that
> the maintainers should take a look.
>
> But at this point I am mostly confident that the bisected commit aaec5a95d5961
> ("pipe_read: don't wake up the writer if the pipe is still full") is innocent,
> it just reveals yet another problem.
>
> I guess (I hope ;) Prateek agrees.

I agree aaec5a95d5961 is not guilty! (unlike me) :)

>
> Oleg.
>

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 17:56 ` K Prateek Nayak
@ 2025-03-28 18:20 ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 18:20 UTC (permalink / raw)
To: K Prateek Nayak
Cc: syzbot, asmadeus, brauner, dhowells, ericvh, jack, jlayton,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hi Prateek,

On 03/28, K Prateek Nayak wrote:
>
> Yours is the right approach.

OK, thank you, but let's wait for the 9p maintainers. But...

> If this gets picked, feel free to add:
>
> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

No! My version is just a slightly updated version of your initial patch.
It was you who (unless we both are wrong) actually pinpointed the problem.

So. If this is acked by maintainers, please-please send the updated patch
with the changelog, and feel free to add my Reviewed-by or Acked-by.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 17:00 ` Oleg Nesterov
  2025-03-28 17:56 ` K Prateek Nayak
@ 2025-03-29  0:00 ` asmadeus
  2025-03-29 14:21 ` Oleg Nesterov
  1 sibling, 1 reply; 27+ messages in thread
From: asmadeus @ 2025-03-29 0:00 UTC (permalink / raw)
To: Oleg Nesterov
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Oleg Nesterov wrote on Fri, Mar 28, 2025 at 06:00:12PM +0100:
> As for the patches from me or Prateek (thanks again!), I think that
> the maintainers should take a look.
>
> But at this point I am mostly confident that the bisected commit aaec5a95d5961
> ("pipe_read: don't wake up the writer if the pipe is still full") is innocent,
> it just reveals yet another problem.
>
> I guess (I hope ;) Prateek agrees.

Right, so your patch sounds better than Prateek's initial blowing
up workaround, but it's a bit weird anyway so let me recap:

- that syz repro has this unnatural pattern where the replies are all
  written before the requests are sent
- p9_read_work() (read worker) has an optimization that if there is no
  in-flight request then there obviously must be nothing to read (9p is
  100% client initiated, there's no way the server should send something
  first), so at this point the reader task is idle
  (OTOH, we're checking for rx right at p9_conn_create() before anything
  is sent, so it's not like we're consistent on this...)
- p9_fd_request() (sending a new request) has another optimization that
  only checks for tx: at this point if another request was already in
  flight then the rx task should have a poll going on for rx, and if
  there were no in-flight requests yet then there should be no point in
  checking for rx, so p9_fd_request() only kicks the tx worker if there
  is room to send
- at this point I don't really get the logic that'll wake the rx thread
  up either... p9_pollwake() will trigger p9_poll_workfn() (through
  p9_poll_work) which will call p9_poll_mux() and get the reader kicking
  again, but I don't know how the waitqueue mechanism is supposed to
  work (see p9_pollwait()). I'd have expected the tx task to somehow
  nudge this on, but it doesn't?...
- due to the new optimization (aaec5a95d59615 "pipe_read: don't wake up
  the writer if the pipe is still full"), that 'if there is room to
  send' check started failing and the tx thread doesn't start? Because
  syzbot passed us a pipe that was already full, or they messed with it
  after mount? I'm not clear about this point, but I think it's the key
  here

-- the 9p "mount by fd" is great for local pseudo filesystems and things
like that but it's been abused by syzbot too much, and I don't want to
spend too much time making sure that any unholy things they do with
these fd works. If possible I'd like to mark that fd unusable by
userspace but I'm honestly doubtful it's possible (if e.g. it was dup'd
or something before mount for example...)

So, yeah, well, okay I don't mind the patch even if it doesn't make
sense with a regular server. We don't really care about trans fd
performance either so it's fine if it's a bit slower, and the error
Prateek added might happen in a real case if the tx queue is full of
real requests so I think your approach is good enough.

If we understand what's happening here I think it's as good as anything
else, but I'd just like it clear in the commit message what syzbot is
doing and why the regression happened.

Thank you both for the thorough analysis and follow ups!
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-29  0:00 ` asmadeus
@ 2025-03-29 14:21 ` Oleg Nesterov
  2025-03-29 23:27 ` asmadeus
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-29 14:21 UTC (permalink / raw)
To: asmadeus
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

First of all, let me remind that I know nothing about 9p or netfs ;)
And I am not sure that my patch is the right solution.

I am not even sure we need the fix, according to syzbot testing the
problem goes away with the fixes from David
https://web.git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
but I didn't even try to read them, this is not my area.

Now, I'll try to answer some of your questions, but I can be easily wrong.

On 03/29, asmadeus@codewreck.org wrote:
>
> Right, so your patch sounds better than Prateek's initial blowing
> up workaround, but it's a bit weird anyway so let me recap:
> - that syz repro has this unnatural pattern where the replies are all
>   written before the requests are sent

Yes,

> - p9_read_work() (read worker) has an optimization that if there is no
>   in-flight request then there obviously must be nothing to read (9p is
>   100% client initiated, there's no way the server should send something
>   first), so at this point the reader task is idle

Yes. But note that it does kernel_read() -> pipe_read() before it
becomes idle. See below.

> - p9_fd_request() (sending a new request) has another optimization that
>   only checks for tx: at this point if another request was already in
>   flight then the rx task should have a poll going on for rx, and if
>   there were no in-flight requests yet then there should be no point in
>   checking for rx, so p9_fd_request() only kicks the tx worker if there
>   is room to send

Can't comment, but

> - at this point I don't really get the logic that'll wake the rx thread
>   up either... p9_pollwake() will trigger p9_poll_workfn()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, but where can this p9_pollwake() come from? See below.

> - due to the new optimization (aaec5a95d59615 "pipe_read: don't wake up
>   the writer if the pipe is still full"), that 'if there is room to
>   send' check started failing and the tx thread doesn't start?

Again, I can be easily wrong, but no.

With or without the optimization above, it doesn't make sense to start
the tx thread when the pipe is full, p9_fd_poll() can't report EPOLLOUT.

Let's recall that the idle read worker did kernel_read() -> pipe_read().
Before this optimization, pipe_read() did the unnecessary

	wake_up_interruptible_sync_poll(&pipe->wr_wait);

when the pipe was full before the reading _and_ is still full after the
reading.

This wakeup calls p9_pollwake() which kicks p9_poll_workfn().
p9_poll_workfn() calls p9_poll_mux(). p9_poll_mux() does n = p9_fd_poll().

"n & EPOLLOUT" is false, exactly because this wakeup was unnecessary,
so p9_poll_mux() won't do schedule_work(&m->wq), this is fine.

But, "n & EPOLLIN" is true, so p9_poll_mux() does schedule_work(&m->rq)
and wakes the rx thread. p9_read_work() is called again. It reads more
data and (I guess) notices some problem and does p9_conn_cancel(EIO).

This no longer happens after the optimization. So in some sense the
p9_fd_request() -> p9_poll_mux() hack (which wakes the rx thread in this
case) restores the old behaviour.

But again, again, quite possibly I completely misread this (nontrivial)
code.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-29 14:21 ` Oleg Nesterov
@ 2025-03-29 23:27 ` asmadeus
  2025-03-30 10:21 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: asmadeus @ 2025-03-29 23:27 UTC (permalink / raw)
To: Oleg Nesterov
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Oleg Nesterov wrote on Sat, Mar 29, 2025 at 03:21:39PM +0100:
> First of all, let me remind that I know nothing about 9p or netfs ;)
> And I am not sure that my patch is the right solution.
>
> I am not even sure we need the fix, according to syzbot testing the
> problem goes away with the fixes from David
> https://web.git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
> but I didn't even try to read them, this is not my area.

(gah, I hate emails when one gets added to thread later.. I've just now
opened the thread on lore and seen David's test :/)

> > - due to the new optimization (aaec5a95d59615 "pipe_read: don't wake up
> >   the writer if the pipe is still full"), that 'if there is room to
> >   send' check started failing and the tx thread doesn't start?
>
> Again, I can be easily wrong, but no.
>
> With or without the optimization above, it doesn't make sense to start
> the tx thread when the pipe is full, p9_fd_poll() can't report EPOLLOUT.
>
> Let's recall that the idle read worker did kernel_read() -> pipe_read().
> Before this optimization, pipe_read() did the unnecessary
>
> 	wake_up_interruptible_sync_poll(&pipe->wr_wait);
>
> when the pipe was full before the reading _and_ is still full after the
> reading.
>
> This wakeup calls p9_pollwake() which kicks p9_poll_workfn().

Aah, that's the bit I didn't get, thank you!

> This no longer happens after the optimization. So in some sense the
> p9_fd_request() -> p9_poll_mux() hack (which wakes the rx thread in this
> case) restores the old behaviour.
>
> But again, again, quite possibly I completely misread this (nontrivial)
> code.

Yes, this totally makes sense; I agree with your analysis.

So basically 9p was optimizing for this impossible (on a normal server)
behaviour in the 9p side (it doesn't make any sense for the tx pipe to
be full with 0 in-flight requests, and the tx pipe never goes unfull,
and the reply comes (was there) before the actual write happened!!),
but this old behaviour made it work anyway...

So part of me wants to just leave it there and if anything try to make
this kind of usage impossible by adding more checks to mount -o
trans=fd, but I don't think it's possible to lock down all kind of
weird behaviour root users (=syzbot) can engage in...
OTOH syzbot does find some useful bugs so I guess it might be worth
fixing, I don't know.

If David's patch also happens to fix it I guess we can also just wait
for that?

Thanks,
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-29 23:27       ` asmadeus
@ 2025-03-30 10:21         ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-30 10:21 UTC (permalink / raw)
  To: asmadeus
  Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
	linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
	swapnil.sapkal, syzkaller-bugs, v9fs, viro

On 03/30, asmadeus@codewreck.org wrote:
>
> If David's patch also happens to fix it I guess we can also just wait
> for that?

Sure, whatever you prefer. Thanks!

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
[parent not found: <67dedd2f.050a0220.31a16b.003f.GAE@google.com>]
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  [not found] <67dedd2f.050a0220.31a16b.003f.GAE@google.com>
@ 2025-08-03 12:09 ` syzbot
  0 siblings, 0 replies; 27+ messages in thread
From: syzbot @ 2025-08-03 12:09 UTC (permalink / raw)
  To: asmadeus, brauner, dhowells, eadavis, ericvh, hdanton, jack,
	jlayton, kprateek.nayak, linux-fsdevel, linux-kernel, linux_oss,
	lucho, mjguzik, netfs, oleg, pc, sfrench, swapnil.sapkal,
	syzkaller-bugs, v9fs, viro

syzbot suspects this issue was fixed by commit:

commit da8cf4bd458722d090a788c6e581eeb72695c62f
Author: David Howells <dhowells@redhat.com>
Date:   Tue Jul 1 16:38:36 2025 +0000

    netfs: Fix hang due to missing case in final DIO read result collection

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14cfbcf0580000
start commit:   66701750d556 Merge tag 'io_uring-6.16-20250630' of git://g..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=e29b8115bf337f53
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1400348c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17b2048c580000

If the result looks correct, please mark the issue as fixed by replying
with:

#syz fix: netfs: Fix hang due to missing case in final DIO read result collection

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-08-03 12:09 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20250323184848.GB14883@redhat.com>
[not found] ` <67e05e30.050a0220.21942d.0003.GAE@google.com>
[not found] ` <20250323194701.GC14883@redhat.com>
[not found] ` <CAGudoHHmvU54MU8dsZy422A4+ZzWTVs7LFevP7NpKzwZ1YOqgg@mail.gmail.com>
[not found] ` <20250323210251.GD14883@redhat.com>
[not found] ` <af0134a7-6f2a-46e1-85aa-c97477bd6ed8@amd.com>
[not found] ` <CAGudoHH9w8VO8069iKf_TsAjnfuRSrgiJ2e2D9-NGEDgXW+Lcw@mail.gmail.com>
[not found] ` <7e377feb-a78b-4055-88cc-2c20f924bf82@amd.com>
[not found] ` <f7585a27-aaef-4334-a1de-5e081f10c901@amd.com>
[not found] ` <ff294b3c-cd24-4aa6-9d03-718ff7087158@amd.com>
2025-03-25 12:15 ` [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter Oleg Nesterov
2025-03-25 12:36 ` Dominique Martinet
2025-03-25 13:04 ` Oleg Nesterov
2025-03-25 14:49 ` K Prateek Nayak
2025-03-25 14:58 ` Dominique Martinet
2025-03-26 12:19 ` Oleg Nesterov
2025-03-26 12:44 ` Oleg Nesterov
2025-03-26 13:05 ` Oleg Nesterov
2025-03-27 17:46 ` K Prateek Nayak
2025-03-27 21:19 ` syzbot
2025-03-27 22:18 ` asmadeus
2025-03-28 4:01 ` K Prateek Nayak
2025-03-28 4:43 ` syzbot
2025-03-28 13:06 ` Oleg Nesterov
2025-03-28 13:07 ` syzbot
2025-03-28 13:25 ` Oleg Nesterov
2025-03-28 13:49 ` syzbot
2025-03-28 14:49 ` Oleg Nesterov
2025-03-28 15:22 ` syzbot
2025-03-28 17:00 ` Oleg Nesterov
2025-03-28 17:56 ` K Prateek Nayak
2025-03-28 18:20 ` Oleg Nesterov
2025-03-29 0:00 ` asmadeus
2025-03-29 14:21 ` Oleg Nesterov
2025-03-29 23:27 ` asmadeus
2025-03-30 10:21 ` Oleg Nesterov
[not found] <67dedd2f.050a0220.31a16b.003f.GAE@google.com>
2025-08-03 12:09 ` syzbot