* [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
@ 2025-10-21 21:33 Bernd Schubert
From: Bernd Schubert @ 2025-10-21 21:33 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Joanne Koong, linux-fsdevel, Jian Huang Li, Bernd Schubert,
stable
Do not merge yet, the current series has not been tested yet.
The race is only easily reproducible with additional patches that
pin pages during FUSE_IO_URING_CMD_REGISTER - slows it down and then
xfstest's generic/001 triggers it reliably. However, I need to update
these pin patches for linux master.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
Bernd Schubert (1):
      fuse: Move ring queues_refs decrement

Jian Huang Li (1):
      fs/fuse: fix potential memory leak from fuse_uring_cancel

 fs/fuse/dev_uring.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)
---
base-commit: 6548d364a3e850326831799d7e3ea2d7bb97ba08
change-id: 20251021-io-uring-fixes-cancel-mem-leak-820642677c37
Best regards,
--
Bernd Schubert <bschubert@ddn.com>

* [PATCH 2/2] fs/fuse: fix potential memory leak from fuse_uring_cancel
From: Bernd Schubert @ 2025-10-21 21:33 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Joanne Koong, linux-fsdevel, Jian Huang Li, Bernd Schubert,
stable

From: Jian Huang Li <ali@ddn.com>

This issue can sometimes be observed during libfuse xfstests; dmesg
prints something like
"kernel: WARNING: CPU: 4 PID: 0 at fs/fuse/dev_uring.c:204 fuse_uring_destruct+0x1f5/0x200 [fuse]".

The cause is that the fuse daemon has just submitted
FUSE_IO_URING_CMD_REGISTER SQEs when umount happens or the fuse daemon
quits at this very early stage. After all uring queues have stopped,
one or more unprocessed FUSE_IO_URING_CMD_REGISTER SQEs might still get
processed. New ring entries are then created and added to
ent_avail_queue, and fuse_uring_cancel() immediately moves them to
ent_in_userspace after the SQEs get canceled. These ring entries are
not moved to ent_released and stay in ent_in_userspace when
fuse_uring_destruct() is called.

One way to solve it would be to also free 'ent_in_userspace' in
fuse_uring_destruct(), but from the code it is hard to see why that is
needed. As suggested by Joanne, another solution is to avoid moving
entries in fuse_uring_cancel() to the 'ent_in_userspace' list and to
release them directly instead.

Fixes: b6236c8407cb ("fuse: {io-uring} Prevent mount point hang on fuse-server termination")
Cc: Joanne Koong <joannelkoong@gmail.com>
Cc: <stable@vger.kernel.org> # v6.14
Signed-off-by: Jian Huang Li <ali@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/dev_uring.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index e7c1095b83b11fe46080c24f539df17e70969e21..d88a0c05434a04668241f09f123d5e3a9cc1621d 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -324,7 +324,7 @@ static void fuse_uring_stop_fuse_req_end(struct fuse_req *req)
 /*
  * Release a request/entry on connection tear down
  */
-static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent, int issue_flags)
 {
 	struct fuse_req *req;
 	struct io_uring_cmd *cmd;
@@ -352,7 +352,7 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
 	spin_unlock(&queue->lock);
 
 	if (cmd)
-		io_uring_cmd_done(cmd, -ENOTCONN, IO_URING_F_UNLOCKED);
+		io_uring_cmd_done(cmd, -ENOTCONN, issue_flags);
 
 	if (req)
 		fuse_uring_stop_fuse_req_end(req);
@@ -383,7 +383,7 @@ static void fuse_uring_stop_list_entries(struct list_head *head,
 
 	/* no queue lock to avoid lock order issues */
 	list_for_each_entry_safe(ent, next, &to_teardown, list)
-		fuse_uring_entry_teardown(ent);
+		fuse_uring_entry_teardown(ent, IO_URING_F_UNLOCKED);
 }
 
 static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
@@ -499,7 +499,7 @@ static void fuse_uring_cancel(struct io_uring_cmd *cmd,
 {
 	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
 	struct fuse_ring_queue *queue;
-	bool need_cmd_done = false;
+	bool teardown = false;
 
 	/*
 	 * direct access on ent - it must not be destructed as long as
@@ -508,17 +508,14 @@ static void fuse_uring_cancel(struct io_uring_cmd *cmd,
 	queue = ent->queue;
 	spin_lock(&queue->lock);
 	if (ent->state == FRRS_AVAILABLE) {
-		ent->state = FRRS_USERSPACE;
-		list_move_tail(&ent->list, &queue->ent_in_userspace);
-		need_cmd_done = true;
-		ent->cmd = NULL;
+		ent->state = FRRS_TEARDOWN;
+		list_del_init(&ent->list);
+		teardown = true;
 	}
 	spin_unlock(&queue->lock);
 
-	if (need_cmd_done) {
-		/* no queue lock to avoid lock order issues */
-		io_uring_cmd_done(cmd, -ENOTCONN, issue_flags);
-	}
+	if (teardown)
+		fuse_uring_entry_teardown(ent, issue_flags);
 }
 
 static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
-- 
2.43.0

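The leak described in the commit message above can be modelled in user space. The following is a minimal sketch under stated assumptions, not kernel code: `struct queue`, `register_ent()`, `cancel_old()`, `cancel_new()`, `drain()` and `destruct()` are hypothetical stand-ins for the lists and functions named in the patch, locking and io_uring completion are left out, and `destruct()` is assumed to release only what sits on the released list, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; the names only mirror the patch. */
enum ent_state { ENT_AVAILABLE, ENT_USERSPACE, ENT_RELEASED };

struct ent {
	enum ent_state state;
	struct ent *next;
};

struct queue {
	struct ent *avail;        /* models ent_avail_queue */
	struct ent *in_userspace; /* models ent_in_userspace */
	struct ent *released;     /* models ent_released */
};

static void push(struct ent **head, struct ent *e)
{
	e->next = *head;
	*head = e;
}

/* Remove e from the list at *head; returns false if it was not on it. */
static bool unlink_ent(struct ent **head, struct ent *e)
{
	for (; *head; head = &(*head)->next) {
		if (*head == e) {
			*head = e->next;
			e->next = NULL;
			return true;
		}
	}
	return false;
}

/* Free every entry on a list and return how many were freed. */
static int drain(struct ent **head)
{
	int n = 0;

	while (*head) {
		struct ent *e = *head;

		*head = e->next;
		free(e);
		n++;
	}
	return n;
}

/* Models a late REGISTER command: a new entry lands on the avail list. */
static struct ent *register_ent(struct queue *q)
{
	struct ent *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->state = ENT_AVAILABLE;
	push(&q->avail, e);
	return e;
}

/* Pre-patch cancel: the entry is parked on the in_userspace list. */
static void cancel_old(struct queue *q, struct ent *e)
{
	if (e->state == ENT_AVAILABLE && unlink_ent(&q->avail, e)) {
		e->state = ENT_USERSPACE;
		push(&q->in_userspace, e);
	}
}

/* Patched cancel: the entry is torn down, ending up on the released list. */
static void cancel_new(struct queue *q, struct ent *e)
{
	if (e->state == ENT_AVAILABLE && unlink_ent(&q->avail, e)) {
		e->state = ENT_RELEASED;
		push(&q->released, e);
	}
}

/*
 * Models the destructor: only the released list is expected to hold
 * entries. The return value counts strays found on the other lists, which
 * is where the kernel would warn; the model frees them so it stays clean.
 */
static int destruct(struct queue *q)
{
	drain(&q->released);
	return drain(&q->avail) + drain(&q->in_userspace);
}
```

In this model an entry that is registered and then cancelled with `cancel_old()` is still parked on `in_userspace` when `destruct()` runs, so `destruct()` reports one stray entry. With `cancel_new()` it reports none.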
* Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
From: Bernd Schubert @ 2026-04-09 11:02 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Joanne Koong, linux-fsdevel, Jian Huang Li, stable, Horst Birthelmer

On 10/21/25 23:33, Bernd Schubert wrote:
> Do not merge yet, the current series has not been tested yet.

I'm glad that I was hesitating to apply it, the DDN branch had it
for ages and this patch actually introduced a possible fc->num_waiting
issue, because fc->uring->queue_refs might go down to 0 through
fuse_uring_cancel() and then fuse_uring_abort() would never stop and
flush the queues without another addition.

Thanks,
Bernd

> The race is only easily reproducible with additional patches that
> pin pages during FUSE_IO_URING_CMD_REGISTER - slows it down and then
> xfstest's generic/001 triggers it reliably. However, I need to update
> these pin patches for linux master.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
> Bernd Schubert (1):
>       fuse: Move ring queues_refs decrement
>
> Jian Huang Li (1):
>       fs/fuse: fix potential memory leak from fuse_uring_cancel
>
>  fs/fuse/dev_uring.c | 33 ++++++++++++++-------------------
>  1 file changed, 14 insertions(+), 19 deletions(-)
> ---
> base-commit: 6548d364a3e850326831799d7e3ea2d7bb97ba08
> change-id: 20251021-io-uring-fixes-cancel-mem-leak-820642677c37
>
> Best regards,

* Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
From: Joanne Koong @ 2026-04-09 23:09 UTC (permalink / raw)
To: Bernd Schubert
Cc: Bernd Schubert, Miklos Szeredi, linux-fsdevel, Jian Huang Li, stable, Horst Birthelmer

On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>
> On 10/21/25 23:33, Bernd Schubert wrote:
> > Do not merge yet, the current series has not been tested yet.
>
> I'm glad that I was hesitating to apply it, the DDN branch had it
> for ages and this patch actually introduced a possible fc->num_waiting
> issue, because fc->uring->queue_refs might go down to 0 through
> fuse_uring_cancel() and then fuse_uring_abort() would never stop and
> flush the queues without another addition.
>

Hi Bernd and Jian,

For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
from fuse_uring_cancel" email was never delivered to my inbox, so I am
just going to write my reply to that patch here instead, hope that's
ok.

Just to summarize, the race is that during unmount, fuse_abort() ->
fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
fuse_uring_entry_teardown() gets run but there may still be sqes that
are being registered, which results in new ents that are created (and
leaked) after the teardown logic has finished and the queues are
stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
never gets scheduled because at the time of teardown, queue->refs is 0
as those sqes have not fully created the ents and grabbed refs yet.
fuse_uring_destruct() runs during unmount, but this doesn't clean up
the created ents because those registered ents got put on the
ent_in_userspace list which fuse_uring_destruct() doesn't go through
to free, resulting in those ents being leaked.

The root cause of the race is that ents are being registered even when
the queue is already stopped/dead. I think if we at registration time
check the queue state before calling fuse_uring_prepare_cancel(), we
eliminate the race altogether. If we see that the abort path has
already triggered (eg queue->stopped == true), we manually free the
ent and return an error instead of adding it to a list, eg

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index d88a0c05434a..351c19150aae 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -969,7 +969,7 @@ static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
 /*
  * fuse_uring_req_fetch command handling
  */
-static void fuse_uring_do_register(struct fuse_ring_ent *ent,
+static int fuse_uring_do_register(struct fuse_ring_ent *ent,
 				   struct io_uring_cmd *cmd,
 				   unsigned int issue_flags)
 {
@@ -978,6 +978,16 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	spin_lock(&queue->lock);
+	/* abort teardown path is running or has run */
+	if (queue->stopped) {
+		spin_unlock(&queue->lock);
+		atomic_dec(&ring->queue_refs);
+		kfree(ent);
+		return -ECONNABORTED;
+	}
+	spin_unlock(&queue->lock);
+
 	fuse_uring_prepare_cancel(cmd, issue_flags, ent);
 
 	spin_lock(&queue->lock);
@@ -994,6 +1004,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
 			wake_up_all(&fc->blocked_waitq);
 		}
 	}
+	return 0;
 }
 
 /*
@@ -1109,9 +1120,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
 	if (IS_ERR(ent))
 		return PTR_ERR(ent);
 
-	fuse_uring_do_register(ent, cmd, issue_flags);
-
-	return 0;
+	return fuse_uring_do_register(ent, cmd, issue_flags);
 }

There's the scenario where the abort path's "queue->stopped = true"
gets set right between when we drop the queue lock and before we call
fuse_uring_prepare_cancel(), but the fuse_uring_create_ring_ent()
logic that was called before fuse_uring_do_register() has already
grabbed the ref on ring->queue_refs, which means in the abort path,
the async teardown (fuse_uring_async_stop_queues()) work is guaranteed
to run and clean up / free the entry.

Thanks,
Joanne

> Thanks,
> Bernd
>
> > The race is only easily reproducible with additional patches that
> > pin pages during FUSE_IO_URING_CMD_REGISTER - slows it down and then
> > xfstest's generic/001 triggers it reliably. However, I need to update
> > these pin patches for linux master.
> >
> > Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> > Bernd Schubert (1):
> >       fuse: Move ring queues_refs decrement
> >
> > Jian Huang Li (1):
> >       fs/fuse: fix potential memory leak from fuse_uring_cancel
> >
> >  fs/fuse/dev_uring.c | 33 ++++++++++++++-------------------
> >  1 file changed, 14 insertions(+), 19 deletions(-)
> > ---
> > base-commit: 6548d364a3e850326831799d7e3ea2d7bb97ba08
> > change-id: 20251021-io-uring-fixes-cancel-mem-leak-820642677c37
> >
> > Best regards,
>

* Re: Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
From: Horst Birthelmer @ 2026-04-10 7:21 UTC (permalink / raw)
To: Joanne Koong
Cc: Bernd Schubert, Bernd Schubert, Miklos Szeredi, linux-fsdevel, Jian Huang Li, stable, Horst Birthelmer

On Thu, Apr 09, 2026 at 04:09:53PM -0700, Joanne Koong wrote:
> On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> >
> > On 10/21/25 23:33, Bernd Schubert wrote:
> > > Do not merge yet, the current series has not been tested yet.
> >
> > I'm glad that I was hesitating to apply it, the DDN branch had it
> > for ages and this patch actually introduced a possible fc->num_waiting
> > issue, because fc->uring->queue_refs might go down to 0 through
> > fuse_uring_cancel() and then fuse_uring_abort() would never stop and
> > flush the queues without another addition.
> >
>
> Hi Bernd and Jian,
>
> For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
> from fuse_uring_cancel" email was never delivered to my inbox, so I am
> just going to write my reply to that patch here instead, hope that's
> ok.
>
> Just to summarize, the race is that during unmount, fuse_abort() ->
> fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
> fuse_uring_entry_teardown() gets run but there may still be sqes that
> are being registered, which results in new ents that are created (and
> leaked) after the teardown logic has finished and the queues are
> stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
> never gets scheduled because at the time of teardown, queue->refs is 0
> as those sqes have not fully created the ents and grabbed refs yet.
> fuse_uring_destruct() runs during unmount, but this doesn't clean up
> the created ents because those registered ents got put on the
> ent_in_userspace list which fuse_uring_destruct() doesn't go through
> to free, resulting in those ents being leaked.
>
> The root cause of the race is that ents are being registered even when
> the queue is already stopped/dead. I think if we at registration time
> check the queue state before calling fuse_uring_prepare_cancel(), we
> eliminate the race altogether. If we see that the abort path has
> already triggered (eg queue->stopped == true), we manually free the
> ent and return an error instead of adding it to a list, eg

In my case (Bernd mentioned that I was investigating a hang during
umount) there were a lot of requests created during teardown, so what
happened was very similar, but for exactly the opposite reason.
In fuse_uring_abort() queue_refs was already 0 due to an optimization
where the ring teardown ran before fuse_abort_conn(). Thus
queue->stopped was never set.

How do we make sure that fuse_uring_teardown_entries() has not been
called by fuse_uring_async_stop_queues()? Maybe I'm missing something?

My fix was to remove the check for queue_refs > 0 in fuse_uring_abort()
and to make sure that nothing bad happens in
fuse_uring_abort_end_requests() and fuse_uring_stop_queues() even if
the teardown was already complete.

Thanks,
Horst

* Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
From: Bernd Schubert @ 2026-04-10 11:26 UTC (permalink / raw)
To: Joanne Koong
Cc: Bernd Schubert, Miklos Szeredi, linux-fsdevel, Jian Huang Li, stable, Horst Birthelmer

Hi Joanne,

On 4/10/26 01:09, Joanne Koong wrote:
> On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>>
>> On 10/21/25 23:33, Bernd Schubert wrote:
>>> Do not merge yet, the current series has not been tested yet.
>>
>> I'm glad that I was hesitating to apply it, the DDN branch had it
>> for ages and this patch actually introduced a possible fc->num_waiting
>> issue, because fc->uring->queue_refs might go down to 0 through
>> fuse_uring_cancel() and then fuse_uring_abort() would never stop and
>> flush the queues without another addition.
>>
>
> Hi Bernd and Jian,
>
> For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
> from fuse_uring_cancel" email was never delivered to my inbox, so I am
> just going to write my reply to that patch here instead, hope that's
> ok.
>
> Just to summarize, the race is that during unmount, fuse_abort() ->
> fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
> fuse_uring_entry_teardown() gets run but there may still be sqes that
> are being registered, which results in new ents that are created (and
> leaked) after the teardown logic has finished and the queues are
> stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
> never gets scheduled because at the time of teardown, queue->refs is 0
> as those sqes have not fully created the ents and grabbed refs yet.
> fuse_uring_destruct() runs during unmount, but this doesn't clean up
> the created ents because those registered ents got put on the
> ent_in_userspace list which fuse_uring_destruct() doesn't go through
> to free, resulting in those ents being leaked.
>
> The root cause of the race is that ents are being registered even when
> the queue is already stopped/dead. I think if we at registration time
> check the queue state before calling fuse_uring_prepare_cancel(), we
> eliminate the race altogether. If we see that the abort path has
> already triggered (eg queue->stopped == true), we manually free the
> ent and return an error instead of adding it to a list, eg
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index d88a0c05434a..351c19150aae 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -969,7 +969,7 @@ static bool is_ring_ready(struct fuse_ring *ring,
> int current_qid)
>  /*
>   * fuse_uring_req_fetch command handling
>   */
> -static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> +static int fuse_uring_do_register(struct fuse_ring_ent *ent,
>  				   struct io_uring_cmd *cmd,
>  				   unsigned int issue_flags)
>  {
> @@ -978,6 +978,16 @@ static void fuse_uring_do_register(struct
> fuse_ring_ent *ent,
>  	struct fuse_conn *fc = ring->fc;
>  	struct fuse_iqueue *fiq = &fc->iq;
>
> +	spin_lock(&queue->lock);
> +	/* abort teardown path is running or has run */
> +	if (queue->stopped) {
> +		spin_unlock(&queue->lock);
> +		atomic_dec(&ring->queue_refs);
> +		kfree(ent);
> +		return -ECONNABORTED;
> +	}
> +	spin_unlock(&queue->lock);
> +
>  	fuse_uring_prepare_cancel(cmd, issue_flags, ent);
>
>  	spin_lock(&queue->lock);
> @@ -994,6 +1004,7 @@ static void fuse_uring_do_register(struct
> fuse_ring_ent *ent,
>  			wake_up_all(&fc->blocked_waitq);
>  		}
>  	}
> +	return 0;
>  }
>
>  /*
> @@ -1109,9 +1120,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
>  	if (IS_ERR(ent))
>  		return PTR_ERR(ent);
>
> -	fuse_uring_do_register(ent, cmd, issue_flags);
> -
> -	return 0;
> +	return fuse_uring_do_register(ent, cmd, issue_flags);
>  }
>
> There's the scenario where the abort path's "queue->stopped = true"
> gets set right between when we drop the queue lock and before we call
> fuse_uring_prepare_cancel(), but the fuse_uring_create_ring_ent()
> logic that was called before fuse_uring_do_register() has already
> grabbed the ref on ring->queue_refs, which means in the abort path,
> the async teardown (fuse_uring_async_stop_queues()) work is guaranteed
> to run and clean up / free the entry.

I don't think your changes are needed; this should be handled by
IO_URING_F_CANCEL -> fuse_uring_cancel(). That is exactly where the
initial leak was - these commands came after abort, and
fuse_uring_cancel() in linux upstream then puts the entries onto the
&queue->ent_in_userspace list. The issue in master is that
fuse_uring_stop_queues() might have run already - the entries then get
leaked and fuse_uring_destruct() might later give a warning.
That part can be reproduced with xfstests; before it starts any of the
tests it does some funny start/stop actions.

The initial *simple* patch was to either add a new list or just remove
the warning, and to also handle either that new list or the
queue->ent_in_userspace list in fuse_uring_destruct(). The comment
explaining why it is needed was much longer than the rest of the patch.
The hard part in the long term would be to transfer the knowledge about
that requirement.

You then asked to handle the release directly in fuse_uring_cancel()
without another list:
https://lore.kernel.org/r/CAJnrk1YaRRKHA-jVPAKZYpydaKcdswLG0XO7pUQZZ4-pTewkHQ@mail.gmail.com

Yes, that is possible, and this is what the next patch version does.
However, given that fuse_uring_cancel() runs outside of all the fuse
locks, it is racy, and I therefore asked in the introduction patch not
to merge it yet.
https://lore.kernel.org/all/20251021-io-uring-fixes-cancel-mem-leak-v1-0-26b78b2c973c@ddn.com/

Turns out my suspicion was right ;)
Queue references might go to 0 when nothing is in flight, and then
fuse_uring_abort(), which _might_ race and come a little later, might
not do anything.

	if (atomic_read(&ring->queue_refs) > 0) {
		fuse_uring_abort_end_requests(ring);
		fuse_uring_stop_queues(ring);
	}

As Horst figured out, removing this check for queue_refs avoids the
issue. I'm rather sure that the check was needed during development and
avoided some null pointer derefs, as that is what I remember. But I
don't think it is needed anymore.

Thanks,
Bernd

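The effect of the queue_refs check discussed above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: `struct ring`, `cancel_last_entry()` and `abort_ring()` are hypothetical names, `pending_reqs` stands in for requests that the abort path would normally end, and the `check_refs` parameter exists only to compare the gated and ungated behaviour side by side.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ring state relevant to the discussion. */
struct ring {
	int queue_refs;   /* models ring->queue_refs */
	int pending_reqs; /* requests that only the abort path would end */
	bool stopped;     /* models the queues having been stopped */
};

/* Models the cancel path dropping the reference of a torn-down entry. */
static void cancel_last_entry(struct ring *r)
{
	if (r->queue_refs > 0)
		r->queue_refs--;
}

/*
 * Models the abort path. With check_refs set it mirrors the gate quoted
 * in the message above and does nothing once queue_refs has reached 0;
 * without it the requests are always ended and the queues always stopped.
 */
static void abort_ring(struct ring *r, bool check_refs)
{
	if (check_refs && r->queue_refs <= 0)
		return;

	r->pending_reqs = 0; /* stands in for fuse_uring_abort_end_requests() */
	r->stopped = true;   /* stands in for fuse_uring_stop_queues() */
}
```

If the last entry is cancelled first and the abort arrives afterwards, the gated variant returns early: `stopped` stays false and `pending_reqs` is left untouched. Without the gate the same sequence ends the requests and stops the queues, which matches the fix described by Horst and Bernd.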