* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  [not found] ` <ff294b3c-cd24-4aa6-9d03-718ff7087158@amd.com>
@ 2025-03-25 12:15 ` Oleg Nesterov
  2025-03-25 12:36 ` Dominique Martinet
  2025-03-25 13:04 ` Oleg Nesterov
  0 siblings, 2 replies; 27+ messages in thread

From: Oleg Nesterov @ 2025-03-25 12:15 UTC (permalink / raw)
To: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, Christian Schoenebeck
Cc: Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton,
	linux-fsdevel, linux-kernel, netfs, swapnil.sapkal,
	syzkaller-bugs, viro, v9fs

Prateek, thanks again for your hard work!

Yes, I think we need help from the 9p folks. I know nothing about it,
but so far this still looks like a 9p problem to me...

All I can say right now is that the "sigpending" logic in p9_client_rpc()
looks wrong. If nothing else:

- clear_thread_flag(TIF_SIGPENDING) is not enough, it won't make
  signal_pending() false if TIF_NOTIFY_SIGNAL is set.

- otoh, if signal_pending() was true because of pending SIGKILL,
  then after clear_thread_flag(TIF_SIGPENDING) wait_event_killable()
  will act as uninterruptible wait_event().

Oleg.

On 03/25, K Prateek Nayak wrote:
>
> On 3/24/2025 8:22 PM, K Prateek Nayak wrote:
> >Hello folks,
> >
> >Some updates.
> >
> >On 3/24/2025 6:49 PM, K Prateek Nayak wrote:
> >>>
> >>>Per syzbot this attempt did not work out either.
> >>>
> >>>I think the blind stabs taken by everyone here are enough.
> >>>
> >>>The report does not provide the crucial bit: what are the other
> >>>threads doing. Presumably someone else is stuck somewhere, possibly
> >>>not even in pipe code, and that stuck thread was supposed to wake up
> >>>the one which trips over the hung task detector. Figuring out what
> >>>that thread is is imo the next step.
> >>>
> >>>I failed to find a relevant command in
> >>>https://github.com/google/syzkaller/blob/master/docs/syzbot.md
> >>>
> >>>So if you guys know someone on the syzkaller side, maybe you can ask
> >>>them to tweak the report *or* maybe syzbot can test a "fix" which
> >>>makes the hung task detector also report all backtraces? I don't know
> >>>if that can work, the output may be long enough that it will get
> >>>trimmed by something.
> >>>
> >>>I don't have time to work on this for now, just throwing ideas.
> >>
> >>I got the reproducer running locally. Tracing stuff currently to see
> >>what is tripping. Will report back once I find something interesting.
> >>Might take a while since the 9p bits are so far spread out.
> >>
> >
> >So far, with tracing, this is where I am:
> >
> >o Mainline + Oleg's optimization reverted:
> >
> >  ...
> >  kworker/43:1-1723    [043] ..... 115.309065: p9_read_work: Data read wait 55
> >  kworker/43:1-1723    [043] ..... 115.309066: p9_read_work: Data read 55
> >  kworker/43:1-1723    [043] ..... 115.309067: p9_read_work: Data read wait 7
> >  kworker/43:1-1723    [043] ..... 115.309068: p9_read_work: Data read 7
> >  repro-4138           [043] ..... 115.309084: netfs_wake_write_collector: Wake collector
> >  repro-4138           [043] ..... 115.309085: netfs_wake_write_collector: Queuing collector work
> >  repro-4138           [043] ..... 115.309088: netfs_unbuffered_write: netfs_unbuffered_write
> >  repro-4138           [043] ..... 115.309088: netfs_end_issue_write: netfs_end_issue_write
> >  repro-4138           [043] ..... 115.309089: netfs_end_issue_write: Write collector need poke 0
> >  repro-4138           [043] ..... 115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
> >  kworker/u1030:1-1951 [168] ..... 115.309096: netfs_wake_write_collector: Wake collector
> >  kworker/u1030:1-1951 [168] ..... 115.309097: netfs_wake_write_collector: Queuing collector work
> >  kworker/u1030:1-1951 [168] ..... 115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
> >  ...
(syzbot reproducer continues)
> >
> >o Mainline:
> >
> >  kworker/185:1-1767   [185] ..... 109.485961: p9_read_work: Data read wait 7
> >  kworker/185:1-1767   [185] ..... 109.485962: p9_read_work: Data read 7
> >  kworker/185:1-1767   [185] ..... 109.485962: p9_read_work: Data read wait 55
> >  kworker/185:1-1767   [185] ..... 109.485963: p9_read_work: Data read 55
> >  repro-4038           [185] ..... 114.225717: netfs_wake_write_collector: Wake collector
> >  repro-4038           [185] ..... 114.225723: netfs_wake_write_collector: Queuing collector work
> >  repro-4038           [185] ..... 114.225727: netfs_unbuffered_write: netfs_unbuffered_write
> >  repro-4038           [185] ..... 114.225727: netfs_end_issue_write: netfs_end_issue_write
> >  repro-4038           [185] ..... 114.225728: netfs_end_issue_write: Write collector need poke 0
> >  repro-4038           [185] ..... 114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
> >  ... (syzbot reproducer hangs)
> >
> >There is a third "kworker/u1030" component that never gets woken up for
> >reasons currently unknown to me with Oleg's optimization. I'll keep
> >digging.
>
> More data ...
>
> I chased this down to p9_client_rpc() in net/9p/client.c, specifically:
>
> 	err = c->trans_mod->request(c, req);
> 	if (err < 0) {
> 		/* write won't happen */
> 		p9_req_put(c, req);
> 		if (err != -ERESTARTSYS && err != -EFAULT)
> 			c->status = Disconnected;
> 		goto recalc_sigpending;
> 	}
>
> c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> which basically does a p9_fd_poll().
>
> Previously, the above would fail with err as -EIO, which would
> cause the client to "Disconnect", and the retry logic would make
> progress. Now, however, the err returned is -ERESTARTSYS, which
> will not cause a disconnect, and the retry logic will hang
> somewhere in p9_client_rpc() later.
>
> I'll chase it a little more, but if the 9p folks can chime in it
> would be great since I'm out of my depth here.
>
> P.S.
There are more interactions at play and I'm trying to still > piece them together. > > Relevant traces: > > o Mainline + Oleg's optimization reverted: > > repro-4161 [239] ..... 107.785644: p9_client_write: p9_client_rpc done > repro-4161 [239] ..... 107.785644: p9_client_write: p9_pdup > repro-4161 [239] ..... 107.785644: p9_client_write: iter revert > repro-4161 [239] ..... 107.785644: p9_client_write: p9_client_rpc > repro-4161 [239] ..... 107.785653: p9_fd_request: p9_fd_request > repro-4161 [239] ...1. 107.785653: p9_fd_request: p9_fd_request error > repro-4161 [239] ..... 107.785653: p9_client_rpc: Client disconnected (no write) <------------- "write won't happen" case > repro-4161 [239] ..... 107.785655: p9_client_write: p9_client_rpc done > repro-4161 [239] ..... 107.785655: p9_client_write: p9_client_rpc error (-5) <------------- -EIO > repro-4161 [239] ..... 107.785656: v9fs_issue_write: Issue write done 2 err(-5) > repro-4161 [239] ..... 107.785657: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated > repro-4161 [239] ..... 107.785657: netfs_wake_write_collector: Wake collector > repro-4161 [239] ..... 107.785658: netfs_wake_write_collector: Queuing collector work > repro-4161 [239] ..... 107.785660: v9fs_issue_write: Issue write subrequest terminated 2 > repro-4161 [239] ..... 107.785661: netfs_unbuffered_write: netfs_unbuffered_write > repro-4161 [239] ..... 107.785661: netfs_end_issue_write: netfs_end_issue_write > repro-4161 [239] ..... 107.785662: netfs_end_issue_write: Write collector need poke 0 > repro-4161 [239] ..... 107.785662: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! > kworker/u1038:0-1583 [105] ..... 107.785667: netfs_retry_writes: netfs_reissue_write 1 > kworker/u1038:0-1583 [105] ..... 107.785668: v9fs_issue_write: v9fs_issue_write > kworker/u1038:0-1583 [105] ..... 107.785669: p9_client_write: p9_client_rpc > kworker/u1038:0-1583 [105] ..... 
107.785669: p9_client_prepare_req: p9_client_prepare_req eio 1 > kworker/u1038:0-1583 [105] ..... 107.785670: p9_client_rpc: p9_client_rpc early err return > kworker/u1038:0-1583 [105] ..... 107.785670: p9_client_write: p9_client_rpc done > kworker/u1038:0-1583 [105] ..... 107.785671: p9_client_write: p9_client_rpc error (-5) > kworker/u1038:0-1583 [105] ..... 107.785672: v9fs_issue_write: Issue write done 0 err(-5) > kworker/u1038:0-1583 [105] ..... 107.785672: netfs_write_subrequest_terminated: Collector woken up > kworker/u1038:0-1583 [105] ..... 107.785672: netfs_wake_write_collector: Wake collector > kworker/u1038:0-1583 [105] ..... 107.785672: netfs_wake_write_collector: Queuing collector work > kworker/u1038:0-1583 [105] ..... 107.785677: v9fs_issue_write: Issue write subrequest terminated 0 > kworker/u1038:0-1583 [105] ..... 107.785684: netfs_write_collection_worker: Write collect clearing and waking up! > repro-4161 [239] ..... 107.785883: p9_client_prepare_req: p9_client_prepare_req eio 1 > ... (continues) > > o Mainline: > > repro-4161 [087] ..... 123.474660: p9_client_write: p9_client_rpc done > repro-4161 [087] ..... 123.474661: p9_client_write: p9_pdup > repro-4161 [087] ..... 123.474661: p9_client_write: iter revert > repro-4161 [087] ..... 123.474661: p9_client_write: p9_client_rpc > repro-4161 [087] ..... 123.474672: p9_fd_request: p9_fd_request > repro-4161 [087] ..... 123.474673: p9_fd_request: p9_fd_request EPOLL > repro-4161 [087] ..... 123.474673: p9_fd_poll: p9_fd_poll rd poll > repro-4161 [087] ..... 123.474674: p9_fd_poll: p9_fd_request wr poll > repro-4161 [087] ..... 128.233025: p9_client_write: p9_client_rpc done > repro-4161 [087] ..... 128.233033: p9_client_write: p9_client_rpc error (-512) <------------- -ERESTARTSYS > repro-4161 [087] ..... 128.233034: v9fs_issue_write: Issue write done 2 err(-512) > repro-4161 [087] ..... 128.233035: netfs_write_subrequest_terminated: Collector woken > repro-4161 [087] ..... 
128.233036: netfs_wake_write_collector: Wake collector > repro-4161 [087] ..... 128.233036: netfs_wake_write_collector: Queuing collector work > repro-4161 [087] ..... 128.233040: v9fs_issue_write: Issue write subrequest terminated 2 > repro-4161 [087] ..... 128.233040: netfs_unbuffered_write: netfs_unbuffered_write > repro-4161 [087] ..... 128.233040: netfs_end_issue_write: netfs_end_issue_write > repro-4161 [087] ..... 128.233041: netfs_end_issue_write: Write collector need poke 0 > repro-4161 [087] ..... 128.233041: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! > kworker/u1035:0-1580 [080] ..... 128.233077: netfs_retry_writes: netfs_reissue_write 1 > kworker/u1035:0-1580 [080] ..... 128.233078: v9fs_issue_write: v9fs_issue_write > kworker/u1035:0-1580 [080] ..... 128.233079: p9_client_write: p9_client_rpc > kworker/u1035:0-1580 [080] ..... 128.233099: p9_fd_request: p9_fd_request > kworker/u1035:0-1580 [080] ..... 128.233101: p9_fd_request: p9_fd_request EPOLL > kworker/u1035:0-1580 [080] ..... 128.233101: p9_fd_poll: p9_fd_poll rd poll > kworker/u1035:0-1580 [080] ..... 128.233101: p9_fd_poll: p9_fd_request wr poll > (Hangs) > > -- > Thanks and Regards, > Prateek > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-25 12:15 ` [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter Oleg Nesterov
@ 2025-03-25 12:36 ` Dominique Martinet
  2025-03-25 13:04 ` Oleg Nesterov
  1 sibling, 0 replies; 27+ messages in thread

From: Dominique Martinet @ 2025-03-25 12:36 UTC (permalink / raw)
To: Oleg Nesterov
Cc: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells,
	jack, jlayton, linux-fsdevel, linux-kernel, netfs,
	swapnil.sapkal, syzkaller-bugs, viro, v9fs

Thanks for the Cc.
Just replying quickly without looking at anything.

Oleg Nesterov wrote on Tue, Mar 25, 2025 at 01:15:26PM +0100:
> All I can say right now is that the "sigpending" logic in p9_client_rpc()
> looks wrong. If nothing else:
>
> - clear_thread_flag(TIF_SIGPENDING) is not enough, it won't make
>   signal_pending() false if TIF_NOTIFY_SIGNAL is set.
>
> - otoh, if signal_pending() was true because of pending SIGKILL,
>   then after clear_thread_flag(TIF_SIGPENDING) wait_event_killable()
>   will act as uninterruptible wait_event().

Yeah, this is effectively an unkillable event loop once a flush has
been sent; this is a known issue.

I've tried to address this with async rpc (so we could send the flush
and forget about it), but that caused other regressions and I never had
time to dig into these... The patches date back to 2018 and probably
won't even apply cleanly anymore, but if anyone cares they are here:
https://lore.kernel.org/all/1544532108-21689-3-git-send-email-asmadeus@codewreck.org/T/#u
(the hard work of refcounting was done just before that in order to
kill this pattern; I just pretty much ran out of free time at that
point, hobbies are hard...)

So: sorry, it's probably possible to improve this, but it won't be easy
or immediate.

> > c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> >
> > Previously, the above would fail with err as -EIO, which would
> > cause the client to "Disconnect", and the retry logic would make
> > progress. Now, however, the err returned is -ERESTARTSYS, which
> > will not cause a disconnect, and the retry logic will hang
> > somewhere in p9_client_rpc() later.

Now, if you got this far, I think it'll be easier to make whatever
changed error out with -EIO again instead; I'll try to check the rest
of the thread later this week, as I didn't follow this thread at all.

Thanks,
--
Dominique
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-25 12:15 ` [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter Oleg Nesterov
  2025-03-25 12:36 ` Dominique Martinet
@ 2025-03-25 13:04 ` Oleg Nesterov
  2025-03-25 14:49 ` K Prateek Nayak
  1 sibling, 1 reply; 27+ messages in thread

From: Oleg Nesterov @ 2025-03-25 13:04 UTC (permalink / raw)
To: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, Christian Schoenebeck
Cc: Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton,
	linux-fsdevel, linux-kernel, netfs, swapnil.sapkal,
	syzkaller-bugs, viro, v9fs

On 03/25, K Prateek Nayak wrote:
>
> I chased this down to p9_client_rpc() in net/9p/client.c, specifically:
>
> 	err = c->trans_mod->request(c, req);
> 	if (err < 0) {
> 		/* write won't happen */
> 		p9_req_put(c, req);
> 		if (err != -ERESTARTSYS && err != -EFAULT)
> 			c->status = Disconnected;
> 		goto recalc_sigpending;
> 	}
>
> c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> which basically does a p9_fd_poll().

Again, I know nothing about 9p... but if p9_fd_request() returns
an err < 0, then it comes from p9_conn->err, and p9_fd_request()
does nothing else.

> Previously, the above would fail with err as -EIO, which would
> cause the client to "Disconnect", and the retry logic would make
> progress. Now, however, the err returned is -ERESTARTSYS

OK... So p9_conn->err = -ERESTARTSYS was set by p9_conn_cancel(),
called by p9_write_work(), because pipe_write() returned -ERESTARTSYS?

But I don't understand the -EIO seen with commit aaec5a95d59615
reverted.

Oleg.
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 13:04 ` Oleg Nesterov @ 2025-03-25 14:49 ` K Prateek Nayak 2025-03-25 14:58 ` Dominique Martinet 0 siblings, 1 reply; 27+ messages in thread From: K Prateek Nayak @ 2025-03-25 14:49 UTC (permalink / raw) To: Oleg Nesterov, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck Cc: Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs [-- Attachment #1: Type: text/plain, Size: 12265 bytes --] Hello Oleg, Dominique, On 3/25/2025 6:34 PM, Oleg Nesterov wrote: > On 03/25, K Prateek Nayak wrote: >> >> I chased this down to p9_client_rpc() net/9p/client.c specifically: >> >> err = c->trans_mod->request(c, req); >> if (err < 0) { >> /* write won't happen */ >> p9_req_put(c, req); >> if (err != -ERESTARTSYS && err != -EFAULT) >> c->status = Disconnected; >> goto recalc_sigpending; >> } >> >> c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c >> which basically does a p9_fd_poll(). > > Again, I know nothing about 9p... but if p9_fd_request() returns > an err < 0, then it comes from p9_conn->err and p9_fd_request() > does nothing else. > >> Previously, the above would fail with err as -EIO which would >> cause the client to "Disconnect" and the retry logic would make >> progress. Now however, the err returned is -ERESTARTSYS > > OK... So p9_conn->err = -ERESTARTSYS was set by p9_conn_cancel() > called by p9_write_work() because pipe_write() returns ERESTARTSYS? > > But I don't understand -EIO with the reverted commit aaec5a95d59615 Okay it is a long explanation. These are the relevant bits of traces with aaec5a95d59615 reverted on mainline and I'll break it down to the best of my abilities: kworker/100:1-1803 [100] ..... 286.618784: p9_write_work: Data write wait 32770 kworker/100:1-1803 [100] ..... 
286.618792: p9_write_work: Data written 28672 kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618795: p9_read_work: Data read wait 7 kworker/100:1-1803 [100] ..... 286.618796: p9_read_work: Data read 7 kworker/100:1-1803 [100] ..... 286.618796: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618796: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618797: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618797: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618797: p9_read_work: Data read wait 55 kworker/100:1-1803 [100] ..... 286.618798: p9_read_work: Data read 55 repro-4102 [100] ..... 286.618802: p9_client_write: p9_client_rpc done repro-4102 [100] ..... 286.618804: p9_client_write: p9_pdup repro-4102 [100] ..... 286.618804: p9_client_write: iter revert ==== the above bits are common between the two === repro-4102 [100] ..... 286.618805: p9_client_write: p9_client_rpc repro-4102 [100] ..... 286.618817: p9_fd_request: p9_fd_request repro-4102 [100] ..... 286.618818: p9_fd_request: p9_fd_request EPOLL repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_poll rd poll repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Data read 7 kworker/100:1-1803 [100] ..... 
286.618825: p9_read_work: p9_tag_lookup error ^^^^^^^^^^^^^^^^^^^ ==== In this case the read succeeds but p9_tag_lookup() later will error out ==== ==== Connection is cancelled with m->err as -EIO ==== kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Connection cancel kworker/100:1-1803 [100] ...1. 286.618826: p9_conn_cancel: Connection cancelled (-5) repro-4102 [100] ..... 286.618831: p9_client_write: p9_client_rpc done repro-4102 [100] ..... 286.618832: p9_client_write: p9_client_rpc error (-5) ^^^ ==== This error from kworker/100 is received by the reproducer ==== repro-4102 [100] ..... 286.618833: v9fs_issue_write: Issue write done 2 err(-5) repro-4102 [100] ..... 286.618833: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated repro-4102 [100] ..... 286.618834: netfs_wake_write_collector: Wake collector repro-4102 [100] ..... 286.618834: netfs_wake_write_collector: Queuing collector work repro-4102 [100] ..... 286.618837: v9fs_issue_write: Issue write subrequest terminated 2 repro-4102 [100] ..... 286.618838: netfs_unbuffered_write: netfs_unbuffered_write repro-4102 [100] ..... 286.618838: netfs_end_issue_write: netfs_end_issue_write repro-4102 [100] ..... 286.618838: netfs_end_issue_write: Write collector need poke 0 repro-4102 [100] ..... 286.618839: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! ==== A reissue mechanism triggers below ==== kworker/u1037:2-3232 [096] ..... 286.618845: netfs_retry_writes: netfs_reissue_write 1 kworker/u1037:2-3232 [096] ..... 286.618846: v9fs_issue_write: v9fs_issue_write kworker/u1037:2-3232 [096] ..... 286.618847: p9_client_write: p9_client_rpc kworker/u1037:2-3232 [096] ..... 286.618873: p9_fd_request: p9_fd_request kworker/u1037:2-3232 [096] ...1. 286.618874: p9_fd_request: p9_fd_request error kworker/u1037:2-3232 [096] ..... 
286.618874: p9_client_rpc: Client disconnected (no write) ==== Connection with client is disconnected ==== kworker/u1037:2-3232 [096] ..... 286.618877: p9_client_write: p9_client_rpc done kworker/u1037:2-3232 [096] ..... 286.618880: p9_client_write: p9_client_rpc error (-5) kworker/u1037:2-3232 [096] ..... 286.618881: v9fs_issue_write: Issue write done 0 err(-5) kworker/u1037:2-3232 [096] ..... 286.618882: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated kworker/u1037:2-3232 [096] ..... 286.618882: netfs_wake_write_collector: Wake collector kworker/u1037:2-3232 [096] ..... 286.618882: netfs_wake_write_collector: Queuing collector work kworker/u1037:2-3232 [096] ..... 286.618887: v9fs_issue_write: Issue write subrequest terminated 0 kworker/u1037:2-3232 [096] ..... 286.618890: netfs_write_collection_worker: Write collect clearing and waking up! ==== The retry worker will clear the NETFS_RREQ_IN_PROGRESS bit and issue a wakeup ==== repro-4102 [100] ..... 286.619051: p9_client_prepare_req: p9_client_prepare_req eio 1 repro-4102 [100] ..... 286.619052: p9_client_rpc: p9_client_rpc early err return repro-4108 [240] ..... 286.621325: p9_fd_create: Client connected ==== The reproducer will then re-establish the connection and the pattern repeats ==== --- So the pattern is with the optimization reverted is: o The RPC thread reads "7" bytes. o p9_tag_lookup() fails giving -EIO. o connection is cancelled from rpc thread. o The reproducer thread receives the packet and finds -EIO. o The reproducer starts a Disconnect. o The reproducer wakes up collector and sleeps on bit NETFS_RREQ_IN_PROGRESS o A retry worker wakes up. o Finds connection has been disconnected o Wakes up the reproducer thread. o Reproducer wakes up and establishes connection. I don't know 9p bits enough to understand the intermediates steps but this is the rough flow. --- Now for the traces on vanilla mainline: kworker/31:1-1723 [031] ..... 
95.011362: p9_write_work: Data write wait 32770 kworker/31:1-1723 [031] ..... 95.011372: p9_write_work: Data written 28672 kworker/31:1-1723 [031] ..... 95.011373: p9_fd_poll: p9_fd_poll rd poll kworker/31:1-1723 [031] ..... 95.011373: p9_fd_poll: p9_fd_request wr poll kworker/31:1-1723 [031] ..... 95.011374: p9_fd_poll: p9_fd_poll rd poll kworker/31:1-1723 [031] ..... 95.011374: p9_fd_poll: p9_fd_request wr poll kworker/31:1-1723 [031] ..... 95.011374: p9_read_work: Data read wait 7 kworker/31:1-1723 [031] ..... 95.011375: p9_read_work: Data read 7 kworker/31:1-1723 [031] ..... 95.011375: p9_fd_poll: p9_fd_poll rd poll kworker/31:1-1723 [031] ..... 95.011375: p9_fd_poll: p9_fd_request wr poll kworker/31:1-1723 [031] ..... 95.011376: p9_read_work: Data read wait 55 kworker/31:1-1723 [031] ..... 95.011376: p9_read_work: Data read 55 repro-4076 [031] ..... 95.011381: p9_client_rpc: Wait event killable (0) repro-4076 [031] ..... 95.011382: p9_client_rpc: Check error (0) repro-4076 [031] ..... 95.011382: p9_client_write: p9_client_rpc done repro-4076 [031] ..... 95.011382: p9_client_write: p9_pdup repro-4076 [031] ..... 95.011382: p9_client_write: iter revert ==== Above bits are same as the previous traces ==== repro-4076 [031] ..... 95.011383: p9_client_write: p9_client_rpc repro-4076 [031] ..... 95.011393: p9_fd_request: p9_fd_request repro-4076 [031] ..... 95.011394: p9_fd_request: p9_fd_request EPOLL repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) ==== The wait event here fails with -ERESTARTSYS ==== repro-4076 [031] ..... 99.731979: p9_client_rpc: Flushed (-512) repro-4076 [031] ..... 99.731983: p9_client_rpc: Final error (-512) repro-4076 [031] ..... 99.731983: p9_client_write: p9_client_rpc done repro-4076 [031] ..... 99.731984: p9_client_write: p9_client_rpc error (-512) repro-4076 [031] ..... 
99.731985: v9fs_issue_write: Issue write done 2 err(-512) repro-4076 [031] ..... 99.731987: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated repro-4076 [031] ..... 99.731987: netfs_wake_write_collector: Wake collector repro-4076 [031] ..... 99.731988: netfs_wake_write_collector: Queuing collector work repro-4076 [031] ..... 99.731991: v9fs_issue_write: Issue write subrequest terminated 2 repro-4076 [031] ..... 99.731991: netfs_unbuffered_write: netfs_unbuffered_write repro-4076 [031] ..... 99.731991: netfs_end_issue_write: netfs_end_issue_write repro-4076 [031] ..... 99.731992: netfs_end_issue_write: Write collector need poke 0 repro-4076 [031] ..... 99.731993: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS! ==== The reproducer waits on bit NETFS_RREQ_IN_PROGRESS ==== kworker/u1028:3-3860 [030] ..... 99.732028: netfs_retry_writes: netfs_reissue_write 1 kworker/u1028:3-3860 [030] ..... 99.732030: v9fs_issue_write: v9fs_issue_write kworker/u1028:3-3860 [030] ..... 99.732031: p9_client_write: p9_client_rpc kworker/u1028:3-3860 [030] ..... 99.732051: p9_fd_request: p9_fd_request kworker/u1028:3-3860 [030] ..... 99.732052: p9_fd_request: p9_fd_request EPOLL kworker/u1028:3-3860 [030] ..... 99.732052: p9_fd_poll: p9_fd_poll rd poll kworker/u1028:3-3860 [030] ..... 99.732053: p9_fd_poll: p9_fd_request wr poll ==== The retry worker seemingly gets stuck post p9_fd_poll() waiting for wakeup ==== --- That is my analysis so far. > > Oleg. 
> -- Thanks and Regards, Prateek [-- Attachment #2: debug.diff --] [-- Type: text/plain, Size: 19614 bytes --] diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 32619d146cbc..3fbfc6882c35 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -56,10 +56,13 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq) struct p9_fid *fid = subreq->rreq->netfs_priv; int err, len; + trace_printk("v9fs_issue_write\n"); len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err); + trace_printk("Issue write done %d err(%d)\n", len, err); if (len > 0) __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags); netfs_write_subrequest_terminated(subreq, len ?: err, false); + trace_printk("Issue write subrequest terminated %d\n", len); } /** @@ -74,6 +77,7 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) int total, err; total = p9_client_read(fid, pos, &subreq->io_iter, &err); + trace_printk("v9fs_issue_read %d err(%d)\n", total, err); /* if we just extended the file size, any portion not in * cache won't be on server and is zeroes */ diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c index 42ce53cc216e..6333b8bb4008 100644 --- a/fs/netfs/direct_write.c +++ b/fs/netfs/direct_write.c @@ -105,6 +105,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter * if (!async) { trace_netfs_rreq(wreq, netfs_rreq_trace_wait_ip); + trace_printk("Waiting on NETFS_RREQ_IN_PROGRESS!\n"); wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS, TASK_UNINTERRUPTIBLE); ret = wreq->error; diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index 23c75755ad4e..a609063db8a7 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -446,6 +446,7 @@ static void netfs_read_collection(struct netfs_io_request *rreq) task_io_account_read(rreq->transferred); trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip); + trace_printk("Read collect clearing and waking up!\n"); clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, 
&rreq->flags); trace_netfs_rreq(rreq, netfs_rreq_trace_done); diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c index 3fca59e6475d..a54aa2bcc786 100644 --- a/fs/netfs/write_collect.c +++ b/fs/netfs/write_collect.c @@ -429,6 +429,7 @@ void netfs_write_collection_worker(struct work_struct *work) _debug("finished"); trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip); + trace_printk("Write collect clearing and waking up!\n"); clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags); if (wreq->iocb) { @@ -449,10 +450,14 @@ void netfs_write_collection_worker(struct work_struct *work) */ void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async) { + trace_printk("Wake collector\n"); if (!work_pending(&wreq->work)) { netfs_get_request(wreq, netfs_rreq_trace_get_work); - if (!queue_work(system_unbound_wq, &wreq->work)) + trace_printk("Queuing collector work\n"); + if (!queue_work(system_unbound_wq, &wreq->work)) { + trace_printk("Put request\n"); netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq); + } } } @@ -542,8 +547,10 @@ void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error, /* If we are at the head of the queue, wake up the collector, * transferring a ref to it if we were the ones to do so. 
*/ - if (list_is_first(&subreq->rreq_link, &stream->subrequests)) + if (list_is_first(&subreq->rreq_link, &stream->subrequests)) { + trace_printk("Collector woken up from netfs_write_subrequest_terminated\n"); netfs_wake_write_collector(wreq, was_async); + } netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); } diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c index 77279fc5b5a7..22a4b793e789 100644 --- a/fs/netfs/write_issue.c +++ b/fs/netfs/write_issue.c @@ -232,8 +232,10 @@ static void netfs_do_issue_write(struct netfs_io_stream *stream, _enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len); - if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) + if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) { + trace_printk("netfs_do_issue_write failed!\n"); return netfs_write_subrequest_terminated(subreq, subreq->error, false); + } trace_netfs_sreq(subreq, netfs_sreq_trace_submit); stream->issue_write(subreq); @@ -264,6 +266,7 @@ void netfs_issue_write(struct netfs_io_request *wreq, if (!subreq) return; + trace_printk("netfs_issue_write!\n"); stream->construct = NULL; subreq->io_iter.count = subreq->len; netfs_do_issue_write(stream, subreq); @@ -290,6 +293,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq, _enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0); if (subreq && start != subreq->start + subreq->len) { + trace_printk("netfs_issue_write from advance write!\n"); netfs_issue_write(wreq, stream); subreq = NULL; } @@ -307,6 +311,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq, if (subreq->len >= stream->sreq_max_len || subreq->nr_segs >= stream->sreq_max_segs || to_eof) { + trace_printk("netfs_issue_write from advance write! 2\n"); netfs_issue_write(wreq, stream); subreq = NULL; } @@ -397,6 +402,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq, * when they were read. Note that these appear as a special * write-back group. 
*/ + trace_printk("netfs_issue_write in folio copy\n"); if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) { netfs_issue_write(wreq, upload); } else if (fgroup != wreq->group) { @@ -526,6 +532,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq, */ static void netfs_end_issue_write(struct netfs_io_request *wreq) { + trace_printk("netfs_end_issue_write\n"); bool needs_poke = true; smp_wmb(); /* Write subreq lists before ALL_QUEUED. */ @@ -541,6 +548,7 @@ static void netfs_end_issue_write(struct netfs_io_request *wreq) netfs_issue_write(wreq, stream); } + trace_printk("Write collector need poke %d\n", needs_poke); if (needs_poke) netfs_wake_write_collector(wreq, false); } @@ -730,6 +738,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t break; } + trace_printk("netfs_unbuffered_write\n"); netfs_end_issue_write(wreq); _leave(" = %d", error); return error; @@ -908,6 +917,7 @@ int netfs_writeback_single(struct address_space *mapping, } stop: + trace_printk("netfs_issue_write writeback single\n"); for (int s = 0; s < NR_IO_STREAMS; s++) netfs_issue_write(wreq, &wreq->io_streams[s]); smp_wmb(); /* Write lists before ALL_QUEUED. 
*/ diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c index 545d33079a77..60a03079766b 100644 --- a/fs/netfs/write_retry.c +++ b/fs/netfs/write_retry.c @@ -43,6 +43,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, iov_iter_revert(&source, subreq->len - source.count); netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); + trace_printk("netfs_reissue_write 1\n"); netfs_reissue_write(stream, subreq, &source); } } @@ -116,6 +117,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, boundary = true; netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); + trace_printk("netfs_reissue_write 2\n"); netfs_reissue_write(stream, subreq, &source); if (subreq == to) break; @@ -183,6 +185,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq, boundary = false; } + trace_printk("netfs_reissue_write 3\n"); netfs_reissue_write(stream, subreq, &source); if (!len) break; diff --git a/net/9p/client.c b/net/9p/client.c index 09f8ced9f8bb..909c83b0e015 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -633,12 +633,16 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c, p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type); /* we allow for any status other than disconnected */ - if (c->status == Disconnected) + if (c->status == Disconnected) { + trace_printk("p9_client_prepare_req eio 1\n"); return ERR_PTR(-EIO); + } /* if status is begin_disconnected we allow only clunk request */ - if (c->status == BeginDisconnect && type != P9_TCLUNK) + if (c->status == BeginDisconnect && type != P9_TCLUNK) { + trace_printk("p9_client_prepare_req eio 2\n"); return ERR_PTR(-EIO); + } va_copy(apc, ap); req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc); @@ -690,8 +694,10 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) 
va_start(ap, fmt); req = p9_client_prepare_req(c, type, tsize, rsize, fmt, ap); va_end(ap); - if (IS_ERR(req)) + if (IS_ERR(req)) { + trace_printk("p9_client_rpc early err return\n"); return req; + } req->tc.zc = false; req->rc.zc = false; @@ -706,9 +712,12 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) err = c->trans_mod->request(c, req); if (err < 0) { /* write won't happen */ + trace_printk("write won't happnen (%d)\n", err); p9_req_put(c, req); - if (err != -ERESTARTSYS && err != -EFAULT) + if (err != -ERESTARTSYS && err != -EFAULT) { + trace_printk("Client disconnected (no write)\n"); c->status = Disconnected; + } goto recalc_sigpending; } again: @@ -721,6 +730,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) */ smp_rmb(); + trace_printk("Wait event killable (%d) (%d) (%d) (%d)\n", err, READ_ONCE(req->status), c->status, type); if (err == -ERESTARTSYS && c->status == Connected && type == P9_TFLUSH) { sigpending = 1; @@ -731,6 +741,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (READ_ONCE(req->status) == REQ_STATUS_ERROR) { p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); err = req->t_err; + trace_printk("REQ_STATUS_ERROR (%d)\n", err); } if (err == -ERESTARTSYS && c->status == Connected) { p9_debug(P9_DEBUG_MUX, "flushing\n"); @@ -743,6 +754,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) /* if we received the response anyway, don't signal error */ if (READ_ONCE(req->status) == REQ_STATUS_RCVD) err = 0; + + trace_printk("Flushed (%d)\n", err); } recalc_sigpending: if (sigpending) { @@ -754,11 +767,13 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) 
goto reterr; err = p9_check_errors(c, req); + trace_printk("Check error (%d)\n", err); trace_9p_client_res(c, type, req->rc.tag, err); if (!err) return req; reterr: p9_req_put(c, req); + trace_printk("Final error (%d)\n", err); return ERR_PTR(safe_errno(err)); } @@ -808,8 +823,10 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, err = c->trans_mod->zc_request(c, req, uidata, uodata, inlen, olen, in_hdrlen); if (err < 0) { - if (err == -EIO) + if (err == -EIO) { + trace_printk("Client disconnected (zero_copy)\n"); c->status = Disconnected; + } if (err != -ERESTARTSYS) goto recalc_sigpending; } @@ -1093,6 +1110,7 @@ EXPORT_SYMBOL(p9_client_destroy); void p9_client_disconnect(struct p9_client *clnt) { p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt); + trace_printk("Client disconnected\n"); clnt->status = Disconnected; } EXPORT_SYMBOL(p9_client_disconnect); @@ -1100,6 +1118,7 @@ EXPORT_SYMBOL(p9_client_disconnect); void p9_client_begin_disconnect(struct p9_client *clnt) { p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt); + trace_printk("Client begin disconnect\n"); clnt->status = BeginDisconnect; } EXPORT_SYMBOL(p9_client_begin_disconnect); @@ -1572,11 +1591,13 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, offset, rsize); } else { non_zc = 1; + trace_printk("RPC issue\n"); req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize); } if (IS_ERR(req)) { *err = PTR_ERR(req); + trace_printk("read RPC issue error(%d)\n", *err); if (!non_zc) iov_iter_revert(to, count - iov_iter_count(to)); return 0; @@ -1587,12 +1608,14 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, if (*err) { if (!non_zc) iov_iter_revert(to, count - iov_iter_count(to)); + trace_printk("p9pdu_readf error(%d)\n", *err); trace_9p_protocol_dump(clnt, &req->rc); p9_req_put(clnt, req); return 0; } if (rsize < received) { pr_err("bogus RREAD count (%d > %d)\n", received, rsize); + trace_printk("bogus read\n"); received = rsize; } @@ 
-1602,6 +1625,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, int n = copy_to_iter(dataptr, received, to); if (n != received) { + trace_printk("copy_to_iter fault\n"); *err = -EFAULT; p9_req_put(clnt, req); return n; @@ -1642,23 +1666,29 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) rsize, P9_ZC_HDR_SZ, "dqd", fid->fid, offset, rsize); } else { + trace_printk("p9_client_rpc\n"); req = p9_client_rpc(clnt, P9_TWRITE, "dqV", fid->fid, offset, rsize, from); + trace_printk("p9_client_rpc done\n"); } if (IS_ERR(req)) { iov_iter_revert(from, count - iov_iter_count(from)); *err = PTR_ERR(req); + trace_printk("p9_client_rpc error (%d)\n", *err); break; } + trace_printk("p9_pdup\n"); *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &written); if (*err) { + trace_printk("p9_pdup err(%d)\n", *err); iov_iter_revert(from, count - iov_iter_count(from)); trace_9p_protocol_dump(clnt, &req->rc); p9_req_put(clnt, req); break; } if (rsize < written) { + trace_printk("p9_pdup bogus %d %d\n", written, rsize); pr_err("bogus RWRITE count (%d > %d)\n", written, rsize); written = rsize; } @@ -1666,6 +1696,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err) p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written); p9_req_put(clnt, req); + trace_printk("iter revert\n"); iov_iter_revert(from, count - written - iov_iter_count(from)); total += written; offset += written; diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 196060dc6138..3c83b3855e82 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -196,6 +196,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err) return; } + trace_printk("Connection cancelled (%d)\n", err); m->err = err; list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { @@ -233,9 +234,12 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err) return EPOLLERR; } + trace_printk("p9_fd_poll rd poll\n"); ret = 
vfs_poll(ts->rd, pt); - if (ts->rd != ts->wr) + if (ts->rd != ts->wr) { + trace_printk("p9_fd_request wr poll\n"); ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); + } return ret; } @@ -263,9 +267,13 @@ static int p9_fd_read(struct p9_client *client, void *v, int len) p9_debug(P9_DEBUG_ERROR, "blocking read ...\n"); pos = ts->rd->f_pos; + trace_printk("Data read wait %d\n", len); ret = kernel_read(ts->rd, v, len, &pos); - if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) + trace_printk("Data read %d\n", ret); + if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) { + trace_printk("Client disconnected (fd_read)\n"); client->status = Disconnected; + } return ret; } @@ -330,6 +338,7 @@ static void p9_read_work(struct work_struct *work) if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) { p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n", m->rc.tag); + trace_printk("p9_tag_lookup error\n"); err = -EIO; goto error; } @@ -338,6 +347,7 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_ERROR, "requested packet size too big: %d for tag %d with capacity %zd\n", m->rc.size, m->rc.tag, m->rreq->rc.capacity); + trace_printk("capacity error\n"); err = -EIO; goto error; } @@ -348,6 +358,7 @@ static void p9_read_work(struct work_struct *work) m->rc.tag, m->rreq); p9_req_put(m->client, m->rreq); m->rreq = NULL; + trace_printk("sdata error\n"); err = -EIO; goto error; } @@ -375,6 +386,7 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_ERROR, "Request tag %d errored out while we were reading the reply\n", m->rc.tag); + trace_printk("request tag error %d status(%d)\n", m->rc.tag, m->rreq->status); err = -EIO; goto error; } @@ -403,6 +415,7 @@ static void p9_read_work(struct work_struct *work) return; error: + trace_printk("Connection cancel\n"); p9_conn_cancel(m, err); clear_bit(Rworksched, &m->wsched); } @@ -429,9 +442,13 @@ static int p9_fd_write(struct p9_client *client, void *v, int len) if (!(ts->wr->f_flags & 
O_NONBLOCK)) p9_debug(P9_DEBUG_ERROR, "blocking write ...\n"); + trace_printk("Data write wait %d\n",len); ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos); - if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) + trace_printk("Data written %ld\n", ret); + if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) { + trace_printk("Client disconnected (fd_write)\n"); client->status = Disconnected; + } return ret; } @@ -519,6 +536,7 @@ static void p9_write_work(struct work_struct *work) return; error: + trace_printk("Connection cancel\n"); p9_conn_cancel(m, err); clear_bit(Wworksched, &m->wsched); } @@ -628,6 +646,7 @@ static void p9_poll_mux(struct p9_conn *m) n = p9_fd_poll(m->client, NULL, &err); if (n & (EPOLLERR | EPOLLHUP | EPOLLNVAL)) { p9_debug(P9_DEBUG_TRANS, "error mux %p err %d\n", m, n); + trace_printk("Connection cancel\n"); p9_conn_cancel(m, err); } @@ -668,12 +687,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) struct p9_trans_fd *ts = client->trans; struct p9_conn *m = &ts->conn; + trace_printk("p9_fd_request\n"); p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n", m, current, &req->tc, req->tc.id); spin_lock(&m->req_lock); if (m->err < 0) { + trace_printk("p9_fd_request error\n"); spin_unlock(&m->req_lock); return m->err; } @@ -682,13 +703,16 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) list_add_tail(&req->req_list, &m->unsent_req_list); spin_unlock(&m->req_lock); + trace_printk("p9_fd_request EPOLL\n"); if (test_and_clear_bit(Wpending, &m->wsched)) n = EPOLLOUT; else n = p9_fd_poll(m->client, NULL, NULL); - if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) + if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) { + trace_printk("p9_fd_request schedule work\n"); schedule_work(&m->wq); + } return 0; } @@ -852,6 +876,7 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd) data_race(ts->wr->f_flags |= O_NONBLOCK); client->trans = ts; + 
trace_printk("Client connected\n"); client->status = Connected; return 0; @@ -889,6 +914,7 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket) get_file(file); p->wr = p->rd = file; client->trans = p; + trace_printk("Client connected\n"); client->status = Connected; p->rd->f_flags |= O_NONBLOCK; @@ -920,6 +946,7 @@ static void p9_conn_destroy(struct p9_conn *m) m->wreq = NULL; } + trace_printk("Connection destroy\n"); p9_conn_cancel(m, -ECONNRESET); m->client = NULL; @@ -942,6 +969,7 @@ static void p9_fd_close(struct p9_client *client) if (!ts) return; + trace_printk("Client disconnected (fd_close)\n"); client->status = Disconnected; p9_conn_destroy(&ts->conn); ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 14:49 ` K Prateek Nayak @ 2025-03-25 14:58 ` Dominique Martinet 2025-03-26 12:19 ` Oleg Nesterov 2025-03-27 17:46 ` K Prateek Nayak 0 siblings, 2 replies; 27+ messages in thread From: Dominique Martinet @ 2025-03-25 14:58 UTC (permalink / raw) To: K Prateek Nayak Cc: Oleg Nesterov, Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs Thanks for the traces. w/ revert K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530: > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll > kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 new behavior > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll > repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) For me the problem isn't so much that this gets ERESTARTSYS but that it nevers gets to read the 7 bytes that are available? If the repro has already written the bytes in both cases then there's no reason to wait 5 seconds here... OTOH syzbot 9p code is silly and might have been depending on something that's not supposed to work e.g. they might be missing a flush or equivalent for all I know (I still haven't looked at the repro) -- Dominique ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 14:58 ` Dominique Martinet @ 2025-03-26 12:19 ` Oleg Nesterov 2025-03-26 12:44 ` Oleg Nesterov 2025-03-27 17:46 ` K Prateek Nayak 1 sibling, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2025-03-26 12:19 UTC (permalink / raw) To: Dominique Martinet Cc: K Prateek Nayak, Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs On 03/25, Dominique Martinet wrote: > > Thanks for the traces. > > w/ revert > K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530: > > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll > > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll > > kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 > > new behavior > > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll > > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll > > repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) > > For me the problem isn't so much that this gets ERESTARTSYS but that it > nevers gets to read the 7 bytes that are available? Yes... OK, lets first recall what the commit aaec5a95d59615523 ("pipe_read: don't wake up the writer if the pipe is still full") does. It simply removes the unnecessary/spurious wakeups when the writer can't add more data to the pipe. See the "stupid test-cas" in https://lore.kernel.org/all/20250120144338.GC7432@redhat.com/ In particular this note: As you can see, without this patch pipe_read() wakes the writer up 4095 times for no reason, the writer burns a bit of CPU and blocks again after wakeup until the last read(fd[0], &c, 1). in this test-case the writer sleeps in pipe_write(), but the same is true for the task sleeping in poll( { .fd = pipe_fd, .events = POLLOUT}, ...). 
Now, after some grepping I have found static void p9_conn_create(struct p9_client *client) { ... init_poll_funcptr(&m->pt, p9_pollwait); n = p9_fd_poll(client, &m->pt, NULL); ... } So, iiuc, in this case p9_fd_poll(&m->pt /* != NULL */) -> p9_pollwait() paths will add the "dummy" pwait->wait entries with ->func = p9_pollwake to pipe_inode_info.rd_wait and pipe_inode_info.wr_wait. Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the ret from vfs_poll(ts->rd), but I assume this is correct. This means that every time pipe_read() does wake_up(&pipe->wr_wait) p9_pollwake() is called. This function kicks p9_poll_workfn() which calls p9_poll_mux() which calls p9_fd_poll() again with pt == NULL. In this case the conditional vfs_poll(ts->wr) looks more understandable... So. Without the commit above, p9_poll_mux()->p9_fd_poll() can be called much more often and, in particular, can report the "additional" EPOLLIN. Can this somehow explain the problem? Oleg. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-26 12:19 ` Oleg Nesterov @ 2025-03-26 12:44 ` Oleg Nesterov 2025-03-26 13:05 ` Oleg Nesterov 0 siblings, 1 reply; 27+ messages in thread From: Oleg Nesterov @ 2025-03-26 12:44 UTC (permalink / raw) To: Dominique Martinet, K Prateek Nayak Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs On 03/26, Oleg Nesterov wrote: > > Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the > ret from vfs_poll(ts->rd), but I assume this is correct. I meant, if pt != NULL and ts->rd != ts->wr we need both vfs_poll(ts->rd) and vfs_poll(ts->wr) ? and the reproducer writes to the pipe before it mounts 9p... Prateek, this is just a shot in the dark but since you can reproduce, can you check if the patch below makes any difference? Oleg. --- x/net/9p/trans_fd.c +++ x/net/9p/trans_fd.c @@ -234,8 +234,10 @@ p9_fd_poll(struct p9_client *client, str } ret = vfs_poll(ts->rd, pt); - if (ts->rd != ts->wr) + if (ts->rd != ts->wr) { + if (pt != NULL) vfs_poll(ts->wr, pt); ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); + } return ret; } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-26 12:44 ` Oleg Nesterov @ 2025-03-26 13:05 ` Oleg Nesterov 0 siblings, 0 replies; 27+ messages in thread From: Oleg Nesterov @ 2025-03-26 13:05 UTC (permalink / raw) To: Dominique Martinet, K Prateek Nayak Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs Damn, sorry for the noise please ignore ;) On 03/26, Oleg Nesterov wrote: > > On 03/26, Oleg Nesterov wrote: > > > > Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the > > ret from vfs_poll(ts->rd), but I assume this is correct. > > I meant, if pt != NULL and ts->rd != ts->wr we need both > vfs_poll(ts->rd) and vfs_poll(ts->wr) ? I am stupid. After the lot of reading I can't distinguish "|" and "||". Oleg. > and the reproducer writes to the pipe before it mounts 9p... > > Prateek, this is just a shot in the dark but since you can reproduce, > can you check if the patch below makes any difference? > > Oleg. > > --- x/net/9p/trans_fd.c > +++ x/net/9p/trans_fd.c > @@ -234,8 +234,10 @@ p9_fd_poll(struct p9_client *client, str > } > > ret = vfs_poll(ts->rd, pt); > - if (ts->rd != ts->wr) > + if (ts->rd != ts->wr) { > + if (pt != NULL) vfs_poll(ts->wr, pt); > ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); > + } > return ret; > } > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-25 14:58 ` Dominique Martinet 2025-03-26 12:19 ` Oleg Nesterov @ 2025-03-27 17:46 ` K Prateek Nayak 2025-03-27 21:19 ` syzbot 1 sibling, 1 reply; 27+ messages in thread From: K Prateek Nayak @ 2025-03-27 17:46 UTC (permalink / raw) To: Dominique Martinet, Oleg Nesterov Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Mateusz Guzik, syzbot, brauner, dhowells, jack, jlayton, linux-fsdevel, linux-kernel, netfs, swapnil.sapkal, syzkaller-bugs, viro, v9fs Hello all, On 3/25/2025 8:28 PM, Dominique Martinet wrote: > Thanks for the traces. > > w/ revert > K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530: >> kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll >> kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll >> kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 > > new behavior >> repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll >> repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll >> repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512) > > For me the problem isn't so much that this gets ERESTARTSYS but that it > nevers gets to read the 7 bytes that are available? I'm back after getting distracted for a bit. So here it goes: On the mainline, the signal is actually SIGKILL from parent thread due to a timeout: repro-4084 [112] d..2. 233.654264: signal_generate: sig=9 errno=0 code=0 comm=repro pid=4085 grp=1 res=0 repro-4084 [112] d..1. 233.654272: signal_generate: sig=9 errno=0 code=0 comm=repro pid=4085 grp=1 res=1 repro-4085 [125] ..... 233.654304: p9_client_rpc: Wait event killable (-512) (1) (0) (118) repro-4085 [125] ..... 233.654308: p9_client_rpc: Flushed (-512) repro-4085 [125] ..... 233.654312: p9_client_rpc: Final error (-512) repro-4085 [125] ..... 233.654313: p9_client_write: p9_client_rpc done repro-4085 [125] ..... 
233.654313: p9_client_write: p9_client_rpc error (-512) repro-4085 [125] ..... 233.654315: v9fs_issue_write: Issue write done 2 err(-512) Specifically, this part: for (;;) { sleep_ms(10); if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid) break; if (current_time_ms() - start < 5000) continue; kill_and_wait(pid, &status); /* <------- here */ break; } Now for the 7 bytes that were not read - If you look at the traces, every time there is a valid trans_mod->request(), there is a kworker wakeup: repro-4102 [100] ..... 286.618772: p9_client_write: p9_client_rpc repro-4102 [100] ..... 286.618780: p9_fd_request: p9_fd_request repro-4102 [100] ..... 286.618781: p9_fd_request: p9_fd_request EPOLL repro-4102 [100] ..... 286.618781: p9_fd_request: p9_fd_request schedule work <----- this kworker/100:1-1803 [100] ..... 286.618784: p9_write_work: Data write wait 32770 However, for that last 7-byte read, there is in fact no wakeup: repro-4102 [100] ..... 286.618805: p9_client_write: p9_client_rpc repro-4102 [100] ..... 286.618817: p9_fd_request: p9_fd_request repro-4102 [100] ..... 286.618818: p9_fd_request: p9_fd_request EPOLL repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_poll rd poll repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_request wr poll <--- No wakeup after kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7 kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Data read 7 The kworker would spuriously wake up, do something, realize it wasn't supposed to read it, and then disconnect the connection, waking up all waiters. But if we haven't sent a request, why are we waiting at all? > > If the repro has already written the bytes in both cases then there's no
> > OTOH syzbot 9p code is silly and might have been depending on something > that's not supposed to work e.g. they might be missing a flush or > equivalent for all I know (I still haven't looked at the repro) > I believe if c->trans_mod->request(c, req) does not dispatch the request to the worker for RPC, the caller must not wait. So here goes something that got the benchmark going in my case which might be totally wrong but worth a look: #syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7 diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 196060dc6138..148533c993f1 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -687,7 +687,11 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) else n = p9_fd_poll(m->client, NULL, NULL); - if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) + /* Request was not sent */ + if (!(n & EPOLLOUT)) + return -EIO; + + if (!test_and_set_bit(Wworksched, &m->wsched)) schedule_work(&m->wq); return 0; ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter 2025-03-27 17:46 ` K Prateek Nayak @ 2025-03-27 21:19 ` syzbot 2025-03-27 22:18 ` asmadeus 0 siblings, 1 reply; 27+ messages in thread From: syzbot @ 2025-03-27 21:19 UTC (permalink / raw) To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak, linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg, swapnil.sapkal, syzkaller-bugs, v9fs, viro Hello, syzbot has tested the proposed patch but the reproducer is still triggering an issue: KASAN: slab-use-after-free Read in p9_conn_cancel ================================================================== BUG: KASAN: slab-use-after-free in p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205 Read of size 8 at addr ffff88807b19ea50 by task syz-executor/6595 CPU: 0 UID: 0 PID: 6595 Comm: syz-executor Not tainted 6.13.0-rc1-syzkaller-00017-gaaec5a95d596-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xc3/0x620 mm/kasan/report.c:489 kasan_report+0xd9/0x110 mm/kasan/report.c:602 p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205 p9_conn_destroy net/9p/trans_fd.c:927 [inline] p9_fd_close+0x3c5/0x590 net/9p/trans_fd.c:951 p9_client_destroy+0xce/0x480 net/9p/client.c:1077 v9fs_session_close+0x49/0x2d0 fs/9p/v9fs.c:506 v9fs_kill_super+0x4d/0xa0 fs/9p/vfs_super.c:196 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473 deactivate_super+0xde/0x100 fs/super.c:506 cleanup_mnt+0x222/0x450 fs/namespace.c:1373 task_work_run+0x14e/0x250 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xadd/0x2d70 kernel/exit.c:938 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 get_signal+0x24ed/0x26c0 kernel/signal.c:3017 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 
exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa1f978bb7c Code: Unable to access opcode bytes at 0x7fa1f978bb52. RSP: 002b:00007ffd5c5893b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: fffffffffffffe00 RBX: 0000000000000003 RCX: 00007fa1f978bb7c RDX: 0000000000000030 RSI: 00007ffd5c589470 RDI: 00000000000000f9 RBP: 00007ffd5c58941c R08: 0000000000000000 R09: 0079746972756365 R10: 00007ffd5c588d70 R11: 0000000000000246 R12: 0000000000000258 R13: 00000000000927c0 R14: 0000000000019c1d R15: 00007ffd5c589470 </TASK> Allocated by task 52: kasan_save_stack+0x33/0x60 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:319 [inline] __kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345 kasan_slab_alloc include/linux/kasan.h:250 [inline] slab_post_alloc_hook mm/slub.c:4104 [inline] slab_alloc_node mm/slub.c:4153 [inline] kmem_cache_alloc_noprof+0x226/0x3d0 mm/slub.c:4160 p9_tag_alloc+0x9c/0x870 net/9p/client.c:281 p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644 p9_client_rpc+0x1c3/0xc10 net/9p/client.c:691 p9_client_write+0x31f/0x680 net/9p/client.c:1645 v9fs_issue_write+0xe2/0x180 fs/9p/vfs_addr.c:59 netfs_do_issue_write+0x92/0x110 fs/netfs/write_issue.c:233 netfs_retry_write_stream fs/netfs/write_collect.c:184 [inline] netfs_retry_writes fs/netfs/write_collect.c:361 [inline] netfs_collect_write_results fs/netfs/write_collect.c:529 [inline] netfs_write_collection_worker+0x3e10/0x47d0 fs/netfs/write_collect.c:551 process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229 process_scheduled_works kernel/workqueue.c:3310 [inline] worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391 kthread+0x2c1/0x3a0 
 kernel/kthread.c:389
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Freed by task 5192:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:582
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2338 [inline]
 slab_free_after_rcu_debug+0x115/0x340 mm/slub.c:4648
 rcu_do_batch kernel/rcu/tree.c:2567 [inline]
 rcu_core+0x79d/0x14d0 kernel/rcu/tree.c:2823
 handle_softirqs+0x213/0x8f0 kernel/softirq.c:554
 __do_softirq kernel/softirq.c:588 [inline]
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu+0x109/0x170 kernel/softirq.c:655
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:671
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
 sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702

Last potentially related work creation:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 __kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:544
 slab_free_hook mm/slub.c:2299 [inline]
 slab_free mm/slub.c:4598 [inline]
 kmem_cache_free+0x305/0x4c0 mm/slub.c:4700
 p9_req_put net/9p/client.c:409 [inline]
 p9_req_put+0x1c6/0x250 net/9p/client.c:402
 p9_client_rpc+0x591/0xc10 net/9p/client.c:761
 p9_client_write+0x31f/0x680 net/9p/client.c:1645
 v9fs_issue_write+0xe2/0x180 fs/9p/vfs_addr.c:59
 netfs_do_issue_write+0x92/0x110 fs/netfs/write_issue.c:233
 netfs_retry_write_stream fs/netfs/write_collect.c:184 [inline]
 netfs_retry_writes fs/netfs/write_collect.c:361 [inline]
 netfs_collect_write_results fs/netfs/write_collect.c:529 [inline]
 netfs_write_collection_worker+0x3e10/0x47d0 fs/netfs/write_collect.c:551
 process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229
 process_scheduled_works kernel/workqueue.c:3310 [inline]
 worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
 kthread+0x2c1/0x3a0 kernel/kthread.c:389
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The buggy address belongs to the object at ffff88807b19e990
 which belongs to the cache p9_req_t of size 208
The buggy address is located 192 bytes inside of
 freed 208-byte region [ffff88807b19e990, ffff88807b19ea60)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7b19e
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88814c7a0140 ffffea0001dd70c0 0000000000000006
raw: 0000000000000000 00000000000f000f 00000001f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 6752, tgid 6751 (syz.0.17), ts 91802565462, free_ts 91788017536
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1556
 prep_new_page mm/page_alloc.c:1564 [inline]
 get_page_from_freelist+0xfce/0x2f80 mm/page_alloc.c:3474
 __alloc_pages_noprof+0x221/0x2470 mm/page_alloc.c:4751
 alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
 alloc_slab_page mm/slub.c:2408 [inline]
 allocate_slab mm/slub.c:2574 [inline]
 new_slab+0x2c9/0x410 mm/slub.c:2627
 ___slab_alloc+0xda4/0x1880 mm/slub.c:3815
 __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3905
 __slab_alloc_node mm/slub.c:3980 [inline]
 slab_alloc_node mm/slub.c:4141 [inline]
 kmem_cache_alloc_noprof+0xfa/0x3d0 mm/slub.c:4160
 p9_tag_alloc+0x9c/0x870 net/9p/client.c:281
 p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
 p9_client_rpc+0x1c3/0xc10 net/9p/client.c:691
 p9_client_version net/9p/client.c:930 [inline]
 p9_client_create+0xc65/0x1200 net/9p/client.c:1036
 v9fs_session_init+0x1f8/0x1a80 fs/9p/v9fs.c:410
 v9fs_mount+0xc6/0xa30 fs/9p/vfs_super.c:122
 legacy_get_tree+0x109/0x220 fs/fs_context.c:662
 vfs_get_tree+0x8f/0x380 fs/super.c:1814
page last free pid 6750 tgid 6748 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1127 [inline]
 free_unref_folios+0xa7b/0x14f0 mm/page_alloc.c:2704
 folios_put_refs+0x587/0x7b0 mm/swap.c:962
 free_pages_and_swap_cache+0x45f/0x510 mm/swap_state.c:335
 __tlb_batch_free_encoded_pages+0xf9/0x290 mm/mmu_gather.c:136
 tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
 tlb_flush_mmu mm/mmu_gather.c:373 [inline]
 tlb_finish_mmu+0x168/0x7b0 mm/mmu_gather.c:465
 exit_mmap+0x3df/0xb20 mm/mmap.c:1680
 __mmput+0x12a/0x4c0 kernel/fork.c:1353
 mmput+0x62/0x70 kernel/fork.c:1375
 exit_mm kernel/exit.c:570 [inline]
 do_exit+0x9bf/0x2d70 kernel/exit.c:925
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
 get_signal+0x24ed/0x26c0 kernel/signal.c:3017
 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff88807b19e900: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
 ffff88807b19e980: fc fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88807b19ea00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                 ^
 ffff88807b19ea80: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88807b19eb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
==================================================================


Tested on:

commit:         aaec5a95 pipe_read: don't wake up the writer if the pi..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15fafde4580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8d5a2956e94d7972
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=110a3804580000

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-27 21:19 ` syzbot
@ 2025-03-27 22:18 ` asmadeus
  2025-03-28  4:01 ` K Prateek Nayak
  0 siblings, 1 reply; 27+ messages in thread
From: asmadeus @ 2025-03-27 22:18 UTC (permalink / raw)
To: syzbot
Cc: brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

syzbot wrote on Thu, Mar 27, 2025 at 02:19:03PM -0700:
> BUG: KASAN: slab-use-after-free in p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205
> Read of size 8 at addr ffff88807b19ea50 by task syz-executor/6595

Ugh, why...
Ah, if ->request() fails p9_client_rpc assumes the request was not
written (e.g. write error), so you can't return an error after the
list_add_tail call in p9_fd_request.

I think you can call p9_conn_cancel with the error and return 0 anyway,
and this particular workaround will probably work, regardless of whether
it's the correct thing to do here (still haven't had time to look at the
patch here)

Sorry for this mess (even if most of it predates me...)
-- 
Dominique

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-27 22:18 ` asmadeus
@ 2025-03-28  4:01 ` K Prateek Nayak
  2025-03-28  4:43 ` syzbot
  2025-03-28 13:06 ` Oleg Nesterov
  0 siblings, 2 replies; 27+ messages in thread
From: K Prateek Nayak @ 2025-03-28  4:01 UTC (permalink / raw)
To: asmadeus, syzbot
Cc: brauner, dhowells, ericvh, jack, jlayton, linux-fsdevel,
    linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello Dominique,

On 3/28/2025 3:48 AM, asmadeus@codewreck.org wrote:
> syzbot wrote on Thu, Mar 27, 2025 at 02:19:03PM -0700:
>> BUG: KASAN: slab-use-after-free in p9_conn_cancel+0x900/0x910 net/9p/trans_fd.c:205
>> Read of size 8 at addr ffff88807b19ea50 by task syz-executor/6595

Whoops! Should have tested with KASAN enabled. This time I did.

>
> Ugh, why...
> Ah, if ->request() fails p9_client_rpc assumes the request was not
> written (e.g. write error), so you can't return an error after the
> list_add_tail call in p9_fd_request.
>
> I think you can call p9_conn_cancel with the error and return 0 anyway,
> and this particular workaround will probably work, regardless of whether
> it's the correct thing to do here (still haven't had time to look at the
> patch here)

That works too! Thank you for the pointer.

>
> Sorry for this mess (even if most of it predates me...)

And I'm sorry for the noise from hacking on stuff that I don't fully
understand.
Fingers crossed (and thank you syzbot)

#syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138..dab7f02dc243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -687,7 +687,13 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	else
 		n = p9_fd_poll(m->client, NULL, NULL);
 
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
+	/* Failed to send request */
+	if (!(n & EPOLLOUT)) {
+		p9_conn_cancel(m, -EIO);
+		return 0;
+	}
+
+	if (!test_and_set_bit(Wworksched, &m->wsched))
 		schedule_work(&m->wq);
 
 	return 0;

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28  4:01 ` K Prateek Nayak
@ 2025-03-28  4:43 ` syzbot
  0 siblings, 0 replies; 27+ messages in thread
From: syzbot @ 2025-03-28  4:43 UTC (permalink / raw)
To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com

Tested on:

commit:         aaec5a95 pipe_read: don't wake up the writer if the pi..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1285fa4c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8d5a2956e94d7972
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=121eede4580000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28  4:01 ` K Prateek Nayak
  2025-03-28  4:43 ` syzbot
@ 2025-03-28 13:06 ` Oleg Nesterov
  2025-03-28 13:07 ` syzbot
  1 sibling, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 13:06 UTC (permalink / raw)
To: K Prateek Nayak, Dominique Martinet
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, linux-fsdevel,
    linux-kernel, linux_oss, lucho, mjguzik, netfs, swapnil.sapkal,
    syzkaller-bugs, v9fs, viro

On 03/28, K Prateek Nayak wrote:
>
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -687,7 +687,13 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	else
>  		n = p9_fd_poll(m->client, NULL, NULL);
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> +	/* Failed to send request */
> +	if (!(n & EPOLLOUT)) {
> +		p9_conn_cancel(m, -EIO);

Thanks a lot Prateek!

Can't really argue with the changes in net/9p, but I am not sure.
!(n & EPOLLOUT) means that the pipe is full, nothing wrong. We need to
kick m->rq in this case.

Dominique says:

	For me the problem isn't so much that this gets ERESTARTSYS but that it
	never gets to read the 7 bytes that are available?

Yes. Of course I do not pretend I fully understand the problem, but it
seems this is because p9_read_work() doesn't set Rworksched and doesn't
do schedule_work(&m->rq) if list_empty(&m->req_list).

However, if the pipe is full, before the commit aaec5a95d59615523db0
("pipe_read: don't wake up the writer if the pipe is still full"),
p9_read_work() -> p9_fd_read() -> kernel_read(ts->rd) triggered the
unnecessary wakeup. This wakeup calls p9_pollwake() which kicks
p9_poll_workfn() -> p9_poll_mux(), and p9_poll_mux() will notice
EPOLLIN and schedule_work(&m->rq).

Maybe the patch below makes more sense?

Oleg.
#syz test: upstream

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56e62978e502..aa9cd248a243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
@@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:06 ` Oleg Nesterov
@ 2025-03-28 13:07 ` syzbot
  2025-03-28 13:25 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-03-28 13:07 UTC (permalink / raw)
To: oleg
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

> On 03/28, K Prateek Nayak wrote:
>>
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -687,7 +687,13 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>>  	else
>>  		n = p9_fd_poll(m->client, NULL, NULL);
>> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
>> +	/* Failed to send request */
>> +	if (!(n & EPOLLOUT)) {
>> +		p9_conn_cancel(m, -EIO);
>
> Thanks a lot Prateek!
>
> Can't really argue with the changes in net/9p, but I am not sure.
> !(n & EPOLLOUT) means that the pipe is full, nothing wrong. We need to
> kick m->rq in this case.
>
> Dominique says:
>
> 	For me the problem isn't so much that this gets ERESTARTSYS but that it
> 	never gets to read the 7 bytes that are available?
>
> Yes. Of course I do not pretend I fully understand the problem, but it
> seems this is because p9_read_work() doesn't set Rworksched and doesn't
> do schedule_work(&m->rq) if list_empty(&m->req_list).
>
> However, if the pipe is full, before the commit aaec5a95d59615523db0
> ("pipe_read: don't wake up the writer if the pipe is still full"),
> p9_read_work() -> p9_fd_read() -> kernel_read(ts->rd) triggered the
> unnecessary wakeup. This wakeup calls p9_pollwake() which kicks
> p9_poll_workfn() -> p9_poll_mux(), and p9_poll_mux() will notice
> EPOLLIN and schedule_work(&m->rq).
>
> Maybe the patch below makes more sense?
>
> Oleg.
>
> #syz test: upstream

want either no args or 2 args (repo, branch), got 5

>
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 56e62978e502..aa9cd248a243 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
> 
>  static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	__poll_t n;
>  	struct p9_trans_fd *ts = client->trans;
>  	struct p9_conn *m = &ts->conn;
> 
> @@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	list_add_tail(&req->req_list, &m->unsent_req_list);
>  	spin_unlock(&m->req_lock);
> 
> -	if (test_and_clear_bit(Wpending, &m->wsched))
> -		n = EPOLLOUT;
> -	else
> -		n = p9_fd_poll(m->client, NULL, NULL);
> -
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> -		schedule_work(&m->wq);
> +	p9_poll_mux(m);
> 
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:07 ` syzbot
@ 2025-03-28 13:25 ` Oleg Nesterov
  2025-03-28 13:49 ` syzbot
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 13:25 UTC (permalink / raw)
To: syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

On 03/28, syzbot wrote:
>
> > #syz test: upstream
>
> want either no args or 2 args (repo, branch), got 5

#syz test

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56e62978e502..aa9cd248a243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
@@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:25 ` Oleg Nesterov
@ 2025-03-28 13:49 ` syzbot
  2025-03-28 14:49 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-03-28 13:49 UTC (permalink / raw)
To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
unregister_netdevice: waiting for DEV to become free

unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

Tested on:

commit:         acb4f337 Merge tag 'm68knommu-for-v6.15' of git://git...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16bab804580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=95c3bbe7ce8436a7
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=146b9bb0580000

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 13:49 ` syzbot
@ 2025-03-28 14:49 ` Oleg Nesterov
  2025-03-28 15:22 ` syzbot
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 14:49 UTC (permalink / raw)
To: syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

On 03/28, syzbot wrote:
>
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> unregister_netdevice: waiting for DEV to become free
>
> unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

Looks "obviously unrelated", but I'm not sure about anything anymore.
In any case I leave this to 9p maintainers.

Okay, just out of curiosity:

#syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56e62978e502..aa9cd248a243 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -669,7 +669,6 @@ static void p9_poll_mux(struct p9_conn *m)
 
 static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 {
-	__poll_t n;
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
@@ -687,13 +686,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
-	if (test_and_clear_bit(Wpending, &m->wsched))
-		n = EPOLLOUT;
-	else
-		n = p9_fd_poll(m->client, NULL, NULL);
-
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
-		schedule_work(&m->wq);
+	p9_poll_mux(m);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 14:49 ` Oleg Nesterov
@ 2025-03-28 15:22 ` syzbot
  2025-03-28 17:00 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: syzbot @ 2025-03-28 15:22 UTC (permalink / raw)
To: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs, oleg,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com

Tested on:

commit:         aaec5a95 pipe_read: don't wake up the writer if the pi..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1598064c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8d5a2956e94d7972
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=16682a4c580000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 15:22 ` syzbot
@ 2025-03-28 17:00 ` Oleg Nesterov
  2025-03-28 17:56 ` K Prateek Nayak
  2025-03-29  0:00 ` asmadeus
  0 siblings, 2 replies; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 17:00 UTC (permalink / raw)
To: syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Dear syzbot,

On 03/28, syzbot wrote:
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
> Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com

Thanks. So the previous

	syzbot has tested the proposed patch but the reproducer is still triggering an issue:
	unregister_netdevice: waiting for DEV to become free

	unregister_netdevice: waiting for batadv0 to become free. Usage count = 3

report suggests that upstream/master has another/unrelated issue(s).

As for the patches from me or Prateek (thanks again!), I think that
the maintainers should take a look.

But at this point I am mostly confident that the bisected commit aaec5a95d5961
("pipe_read: don't wake up the writer if the pipe is still full") is innocent,
it just reveals yet another problem.

I guess (I hope ;) Prateek agrees.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 17:00 ` Oleg Nesterov
@ 2025-03-28 17:56 ` K Prateek Nayak
  2025-03-28 18:20 ` Oleg Nesterov
  1 sibling, 1 reply; 27+ messages in thread
From: K Prateek Nayak @ 2025-03-28 17:56 UTC (permalink / raw)
To: Oleg Nesterov, syzbot
Cc: asmadeus, brauner, dhowells, ericvh, jack, jlayton, linux-fsdevel,
    linux-kernel, linux_oss, lucho, mjguzik, netfs, swapnil.sapkal,
    syzkaller-bugs, v9fs, viro

Hello Oleg,

On 3/28/2025 10:30 PM, Oleg Nesterov wrote:
> Dear syzbot,
>
> On 03/28, syzbot wrote:
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>>
>> Reported-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
>> Tested-by: syzbot+62262fdc0e01d99573fc@syzkaller.appspotmail.com
>

Yours is the right approach. I was searching for p9_poll_mux() which
would wake the read collector to free pipe buffers but I didn't dig
enough and I was too lazy to open code it (guilty!)

If this gets picked, feel free to add:

Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

> Thanks. So the previous
>
> 	syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> 	unregister_netdevice: waiting for DEV to become free
>
> 	unregister_netdevice: waiting for batadv0 to become free. Usage count = 3
>
> report suggests that upstream/master has another/unrelated issue(s).

I could not reproduce this bit on mainline at commit acb4f33713b9
("Merge tag 'm68knommu-for-v6.15' of
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu")
so I too second that it is an unrelated issue.

>
> As for the patches from me or Prateek (thanks again!), I think that
> the maintainers should take a look.
>
> But at this point I am mostly confident that the bisected commit aaec5a95d5961
> ("pipe_read: don't wake up the writer if the pipe is still full") is innocent,
> it just reveals yet another problem.
>
> I guess (I hope ;) Prateek agrees.

I agree aaec5a95d5961 is not guilty! (unlike me) :)

>
> Oleg.
>

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 17:56 ` K Prateek Nayak
@ 2025-03-28 18:20 ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-28 18:20 UTC (permalink / raw)
To: K Prateek Nayak
Cc: syzbot, asmadeus, brauner, dhowells, ericvh, jack, jlayton,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Hi Prateek,

On 03/28, K Prateek Nayak wrote:
>
> Yours is the right approach.

OK, thank you, but let's wait for the 9p maintainers. But...

> If this gets picked, feel free to add:
>
> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

No! My version is just a slightly updated version of your initial patch.
It was you who (unless we both are wrong) actually pinpointed the problem.

So. If this is acked by maintainers, please-please send the updated patch
with the changelog, and feel free to add my Reviewed-by or Acked-by.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-28 17:00 ` Oleg Nesterov
  2025-03-28 17:56 ` K Prateek Nayak
@ 2025-03-29  0:00 ` asmadeus
  2025-03-29 14:21 ` Oleg Nesterov
  1 sibling, 1 reply; 27+ messages in thread
From: asmadeus @ 2025-03-29 0:00 UTC (permalink / raw)
To: Oleg Nesterov
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Oleg Nesterov wrote on Fri, Mar 28, 2025 at 06:00:12PM +0100:
> As for the patches from me or Prateek (thanks again!), I think that
> the maintainers should take a look.
>
> But at this point I am mostly confident that the bisected commit aaec5a95d5961
> ("pipe_read: don't wake up the writer if the pipe is still full") is innocent,
> it just reveals yet another problem.
>
> I guess (I hope ;) Prateek agrees.

Right, so your patch sounds better than Prateek's initial blowing
up workaround, but it's a bit weird anyway so let me recap:

- that syz repro has this unnatural pattern where the replies are all
  written before the requests are sent
- p9_read_work() (read worker) has an optimization that if there is no
  in-flight request then there obviously must be nothing to read (9p is
  100% client initiated, there's no way the server should send something
  first), so at this point the reader task is idle
  (OTOH, we're checking for rx right at p9_conn_create() before anything
  is sent, so it's not like we're consistent on this...)
- p9_fd_request() (sending a new request) has another optimization that
  only checks for tx: at this point if another request was already in
  flight then the rx task should have a poll going on for rx, and if
  there were no in-flight requests yet then there should be no point in
  checking for rx, so p9_fd_request() only kicks the tx worker if there
  is room to send
- at this point I don't really get the logic that'll wake the rx thread
  up either... p9_pollwake() will trigger p9_poll_workfn() (through
  p9_poll_work) which will call p9_poll_mux() and get the reader kicking
  again, but I don't know how the waitqueue mechanism is supposed to
  work (see p9_pollwait()). I'd have expected the tx task to somehow
  nudge this on, but it doesn't?...
- due to the new optimization (aaec5a95d59615 "pipe_read: don't wake up
  the writer if the pipe is still full"), that 'if there is room to
  send' check started failing and the tx thread doesn't start? Because
  syzbot passed us a pipe that was already full, or they messed with it
  after mount? I'm not clear about this point, but I think it's the key
  here

-- the 9p "mount by fd" is great for local pseudo filesystems and things
like that but it's been abused by syzbot too much, and I don't want to
spend too much time making sure that any unholy things they do with
these fd works. If possible I'd like to mark that fd unusable by
userspace but I'm honestly doubtful it's possible (if e.g. it was dup'd
or something before mount for example...)

So, yeah, well, okay I don't mind the patch even if it doesn't make
sense with a regular server. We don't really care about trans fd
performance either so it's fine if it's a bit slower, and the error
Prateek added might happen in a real case if the tx queue is full of
real requests so I think your approach is good enough.

If we understand what's happening here I think it's as good as anything
else, but I'd just like it clear in the commit message what syzbot is
doing and why the regression happened.

Thank you both for the thorough analysis and follow ups!
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-29  0:00 ` asmadeus
@ 2025-03-29 14:21 ` Oleg Nesterov
  2025-03-29 23:27 ` asmadeus
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-29 14:21 UTC (permalink / raw)
To: asmadeus
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

First of all, let me remind that I know nothing about 9p or netfs ;)
And I am not sure that my patch is the right solution.

I am not even sure we need the fix, according to syzbot testing the
problem goes away with the fixes from David
https://web.git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
but I didn't even try to read them, this is not my area.

Now, I'll try to answer some of your questions, but I can be easily wrong.

On 03/29, asmadeus@codewreck.org wrote:
>
> Right, so your patch sounds better than Prateek's initial blowing
> up workaround, but it's a bit weird anyway so let me recap:
> - that syz repro has this unnatural pattern where the replies are all
>   written before the requests are sent

Yes,

> - p9_read_work() (read worker) has an optimization that if there is no
>   in-flight request then there obviously must be nothing to read (9p is
>   100% client initiated, there's no way the server should send something
>   first), so at this point the reader task is idle

Yes. But note that it does kernel_read() -> pipe_read() before it
becomes idle. See below.

> - p9_fd_request() (sending a new request) has another optimization that
>   only checks for tx: at this point if another request was already in
>   flight then the rx task should have a poll going on for rx, and if
>   there were no in-flight requests yet then there should be no point in
>   checking for rx, so p9_fd_request() only kicks the tx worker if there
>   is room to send

Can't comment, but

> - at this point I don't really get the logic that'll wake the rx thread
>   up either... p9_pollwake() will trigger p9_poll_workfn()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, but where can this p9_pollwake() come from? See below.

> - due to the new optimization (aaec5a95d59615 "pipe_read: don't wake up
>   the writer if the pipe is still full"), that 'if there is room to
>   send' check started failing and the tx thread doesn't start?

Again, I can be easily wrong, but no.

With or without the optimization above, it doesn't make sense to start
the tx thread when the pipe is full, p9_fd_poll() can't report EPOLLOUT.

Let's recall that the idle read worker did kernel_read() -> pipe_read().
Before this optimization, pipe_read() did the unnecessary

	wake_up_interruptible_sync_poll(&pipe->wr_wait);

when the pipe was full before the reading _and_ is still full after the
reading.

This wakeup calls p9_pollwake() which kicks p9_poll_workfn().
p9_poll_workfn() calls p9_poll_mux(). p9_poll_mux() does n = p9_fd_poll().

"n & EPOLLOUT" is false, exactly because this wakeup was unnecessary,
so p9_poll_mux() won't do schedule_work(&m->wq), this is fine.

But, "n & EPOLLIN" is true, so p9_poll_mux() does schedule_work(&m->rq)
and wakes the rx thread. p9_read_work() is called again. It reads more
data and (I guess) notices some problem and does p9_conn_cancel(EIO).

This no longer happens after the optimization. So in some sense the
p9_fd_request() -> p9_poll_mux() hack (which wakes the rx thread in this
case) restores the old behaviour.

But again, again, quite possibly I completely misread this (nontrivial)
code.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-29 14:21 ` Oleg Nesterov
@ 2025-03-29 23:27 ` asmadeus
  2025-03-30 10:21 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: asmadeus @ 2025-03-29 23:27 UTC (permalink / raw)
To: Oleg Nesterov
Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
    linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
    swapnil.sapkal, syzkaller-bugs, v9fs, viro

Oleg Nesterov wrote on Sat, Mar 29, 2025 at 03:21:39PM +0100:
> First of all, let me remind that I know nothing about 9p or netfs ;)
> And I am not sure that my patch is the right solution.
>
> I am not even sure we need the fix, according to syzbot testing the
> problem goes away with the fixes from David
> https://web.git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
> but I didn't even try to read them, this is not my area.

(gah, I hate emails when one gets added to thread later.. I've just now
opened the thread on lore and seen David's test :/)

> > - due to the new optimization (aaec5a95d59615 "pipe_read: don't wake up
> >   the writer if the pipe is still full"), that 'if there is room to
> >   send' check started failing and the tx thread doesn't start?
>
> Again, I can be easily wrong, but no.
>
> With or without the optimization above, it doesn't make sense to start
> the tx thread when the pipe is full, p9_fd_poll() can't report EPOLLOUT.
>
> Let's recall that the idle read worker did kernel_read() -> pipe_read().
> Before this optimization, pipe_read() did the unnecessary
>
> 	wake_up_interruptible_sync_poll(&pipe->wr_wait);
>
> when the pipe was full before the reading _and_ is still full after the
> reading.
>
> This wakeup calls p9_pollwake() which kicks p9_poll_workfn().

Aah, that's the bit I didn't get, thank you!

> This no longer happens after the optimization. So in some sense the
> p9_fd_request() -> p9_poll_mux() hack (which wakes the rx thread in this
> case) restores the old behaviour.
>
> But again, again, quite possibly I completely misread this (nontrivial)
> code.

Yes, this totally makes sense; I agree with your analysis.

So basically 9p was optimizing for this impossible (on a normal server)
behaviour in the 9p side (it doesn't make any sense for the tx pipe to
be full with 0 in-flight requests, and the tx pipe never goes unfull,
and the reply comes (was there) before the actual write happened!!),
but this old behaviour made it work anyway...

So part of me wants to just leave it there and if anything try to make
this kind of usage impossible by adding more checks to mount -o
trans=fd, but I don't think it's possible to lock down all kind of
weird behaviour root users (=syzbot) can engage in...
OTOH syzbot does find some useful bugs so I guess it might be worth
fixing, I don't know.

If David's patch also happens to fix it I guess we can also just wait
for that?

Thanks,
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  2025-03-29 23:27       ` asmadeus
@ 2025-03-30 10:21         ` Oleg Nesterov
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2025-03-30 10:21 UTC (permalink / raw)
  To: asmadeus
  Cc: syzbot, brauner, dhowells, ericvh, jack, jlayton, kprateek.nayak,
	linux-fsdevel, linux-kernel, linux_oss, lucho, mjguzik, netfs,
	swapnil.sapkal, syzkaller-bugs, v9fs, viro

On 03/30, asmadeus@codewreck.org wrote:
>
> If David's patch also happens to fix it I guess we can also just wait
> for that?

Sure, whatever you prefer. Thanks!

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread
[parent not found: <67dedd2f.050a0220.31a16b.003f.GAE@google.com>]
* Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
  [not found] <67dedd2f.050a0220.31a16b.003f.GAE@google.com>
@ 2025-08-03 12:09 ` syzbot
  0 siblings, 0 replies; 27+ messages in thread
From: syzbot @ 2025-08-03 12:09 UTC (permalink / raw)
  To: asmadeus, brauner, dhowells, eadavis, ericvh, hdanton, jack,
	jlayton, kprateek.nayak, linux-fsdevel, linux-kernel, linux_oss,
	lucho, mjguzik, netfs, oleg, pc, sfrench, swapnil.sapkal,
	syzkaller-bugs, v9fs, viro

syzbot suspects this issue was fixed by commit:

commit da8cf4bd458722d090a788c6e581eeb72695c62f
Author: David Howells <dhowells@redhat.com>
Date:   Tue Jul 1 16:38:36 2025 +0000

    netfs: Fix hang due to missing case in final DIO read result collection

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14cfbcf0580000
start commit:   66701750d556 Merge tag 'io_uring-6.16-20250630' of git://g..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=e29b8115bf337f53
dashboard link: https://syzkaller.appspot.com/bug?extid=62262fdc0e01d99573fc
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1400348c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17b2048c580000

If the result looks correct, please mark the issue as fixed by replying
with:

#syz fix: netfs: Fix hang due to missing case in final DIO read result collection

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-08-03 12:09 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20250323184848.GB14883@redhat.com>
[not found] ` <67e05e30.050a0220.21942d.0003.GAE@google.com>
[not found] ` <20250323194701.GC14883@redhat.com>
[not found] ` <CAGudoHHmvU54MU8dsZy422A4+ZzWTVs7LFevP7NpKzwZ1YOqgg@mail.gmail.com>
[not found] ` <20250323210251.GD14883@redhat.com>
[not found] ` <af0134a7-6f2a-46e1-85aa-c97477bd6ed8@amd.com>
[not found] ` <CAGudoHH9w8VO8069iKf_TsAjnfuRSrgiJ2e2D9-NGEDgXW+Lcw@mail.gmail.com>
[not found] ` <7e377feb-a78b-4055-88cc-2c20f924bf82@amd.com>
[not found] ` <f7585a27-aaef-4334-a1de-5e081f10c901@amd.com>
[not found] ` <ff294b3c-cd24-4aa6-9d03-718ff7087158@amd.com>
2025-03-25 12:15 ` [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter Oleg Nesterov
2025-03-25 12:36 ` Dominique Martinet
2025-03-25 13:04 ` Oleg Nesterov
2025-03-25 14:49 ` K Prateek Nayak
2025-03-25 14:58 ` Dominique Martinet
2025-03-26 12:19 ` Oleg Nesterov
2025-03-26 12:44 ` Oleg Nesterov
2025-03-26 13:05 ` Oleg Nesterov
2025-03-27 17:46 ` K Prateek Nayak
2025-03-27 21:19 ` syzbot
2025-03-27 22:18 ` asmadeus
2025-03-28 4:01 ` K Prateek Nayak
2025-03-28 4:43 ` syzbot
2025-03-28 13:06 ` Oleg Nesterov
2025-03-28 13:07 ` syzbot
2025-03-28 13:25 ` Oleg Nesterov
2025-03-28 13:49 ` syzbot
2025-03-28 14:49 ` Oleg Nesterov
2025-03-28 15:22 ` syzbot
2025-03-28 17:00 ` Oleg Nesterov
2025-03-28 17:56 ` K Prateek Nayak
2025-03-28 18:20 ` Oleg Nesterov
2025-03-29 0:00 ` asmadeus
2025-03-29 14:21 ` Oleg Nesterov
2025-03-29 23:27 ` asmadeus
2025-03-30 10:21 ` Oleg Nesterov
[not found] <67dedd2f.050a0220.31a16b.003f.GAE@google.com>
2025-08-03 12:09 ` syzbot