From: Horst Birthelmer <horst@birthelmer.de>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Bernd Schubert <bernd@bsbernd.com>,
Bernd Schubert <bschubert@ddn.com>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org, Jian Huang Li <ali@ddn.com>,
stable@vger.kernel.org, Horst Birthelmer <hbirthelmer@ddn.com>
Subject: Re: Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
Date: Sat, 11 Apr 2026 00:08:29 +0200
Message-ID: <adlyjDaxLZyHcSun@fedora>
In-Reply-To: <CAJnrk1aoxGMGNZi+OwdoET6ahhGHp_7dw__=dmOWW+PMxnsj2w@mail.gmail.com>
On Fri, Apr 10, 2026 at 02:24:08PM -0700, Joanne Koong wrote:
> On Fri, Apr 10, 2026 at 4:26 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> >
> Hi Bernd,
>
> > Hi Joanne,
> >
> > On 4/10/26 01:09, Joanne Koong wrote:
> > > On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> > >>
> > >>
> > >>
> > >> On 10/21/25 23:33, Bernd Schubert wrote:
> > >>> Do not merge yet, the current series has not been tested yet.
> > >>
> > >> I'm glad that I hesitated to apply it. The DDN branch had it
> > >> for ages, and this patch actually introduced a possible fc->num_waiting
> > >> issue: fc->uring->queue_refs might go down to 0 through
> > >> fuse_uring_cancel(), and then fuse_uring_abort() would never stop and
> > >> flush the queues without another addition.
> > >>
> > >
> > > Hi Bernd and Jian,
> > >
> > > For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
> > > from fuse_uring_cancel" email was never delivered to my inbox, so I am
> > > just going to write my reply to that patch here instead, hope that's
> > > ok.
> > >
> > > Just to summarize, the race is that during unmount, fuse_abort() ->
> > > fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
> > > fuse_uring_entry_teardown() gets run but there may still be sqes that
> > > are being registered, which results in new ents that are created (and
> > > leaked) after the teardown logic has finished and the queues are
> > > stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
> > > never gets scheduled because at the time of teardown, queue->refs is 0
> > > as those sqes have not fully created the ents and grabbed refs yet.
> > > fuse_uring_destruct() runs during unmount, but this doesn't clean up
> > > the created ents because those registered ents got put on the
> > > ent_in_userspace list which fuse_uring_destruct() doesn't go through
> > > to free, resulting in those ents being leaked.
> > >
> > > The root cause of the race is that ents are being registered even when
> > > the queue is already stopped/dead. I think if we check the queue state
> > > at registration time, before calling fuse_uring_prepare_cancel(), we
> > > eliminate the race altogether. If we see that the abort path has
> > > already triggered (e.g. queue->stopped == true), we manually free the
> > > ent and return an error instead of adding it to a list, e.g.:
> > >
> > > diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> > > index d88a0c05434a..351c19150aae 100644
> > > --- a/fs/fuse/dev_uring.c
> > > +++ b/fs/fuse/dev_uring.c
> > > @@ -969,7 +969,7 @@ static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
> > >  /*
> > >   * fuse_uring_req_fetch command handling
> > >   */
> > > -static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > +static int fuse_uring_do_register(struct fuse_ring_ent *ent,
> > >  				   struct io_uring_cmd *cmd,
> > >  				   unsigned int issue_flags)
> > >  {
> > > @@ -978,6 +978,16 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > >  	struct fuse_conn *fc = ring->fc;
> > >  	struct fuse_iqueue *fiq = &fc->iq;
> > >  
> > > +	spin_lock(&queue->lock);
> > > +	/* abort teardown path is running or has run */
> > > +	if (queue->stopped) {
> > > +		spin_unlock(&queue->lock);
> > > +		atomic_dec(&ring->queue_refs);
> > > +		kfree(ent);
> > > +		return -ECONNABORTED;
> > > +	}
> > > +	spin_unlock(&queue->lock);
> > > +
> > >  	fuse_uring_prepare_cancel(cmd, issue_flags, ent);
> > >  
> > >  	spin_lock(&queue->lock);
> > > @@ -994,6 +1004,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > >  			wake_up_all(&fc->blocked_waitq);
> > >  		}
> > >  	}
> > > +	return 0;
> > >  }
> > >  
> > >  /*
> > > @@ -1109,9 +1120,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
> > >  	if (IS_ERR(ent))
> > >  		return PTR_ERR(ent);
> > >  
> > > -	fuse_uring_do_register(ent, cmd, issue_flags);
> > > -
> > > -	return 0;
> > > +	return fuse_uring_do_register(ent, cmd, issue_flags);
> > >  }
> > >
> > > There is still the scenario where the abort path sets
> > > "queue->stopped = true" right after we drop the queue lock and before
> > > we call fuse_uring_prepare_cancel(). However, fuse_uring_create_ring_ent(),
> > > which runs before fuse_uring_do_register(), has already taken a ref on
> > > ring->queue_refs, so the abort path is guaranteed to run the async
> > > teardown work (fuse_uring_async_stop_queues()), which cleans up and
> > > frees the entry.
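A minimal single-threaded sketch of why the ordering in that proposal matters. It assumes the abort path always reaches the point where it sets queue->stopped, and that it schedules the async teardown only when queue_refs > 0. The struct and function names (reg_model, reg_*) are hypothetical stand-ins for queue->stopped, ring->queue_refs and the ent lists. Locking is not modeled because every step is replayed in a fixed order, and the async teardown is collapsed into one final pass.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Single-threaded model of the proposed registration-time check.
 * Hypothetical names: reg_model fields only mirror queue->stopped,
 * ring->queue_refs and the ent lists. Locking is not modeled because
 * every step is replayed in a fixed order.
 */
struct reg_model {
	bool stopped;			/* queue->stopped */
	int queue_refs;			/* ring->queue_refs */
	int ents_on_lists;		/* ents reachable by the teardown */
	bool teardown_scheduled;	/* async teardown work queued */
};

/* fuse_uring_create_ring_ent(): the ref is taken before registration. */
static void reg_create_ent(struct reg_model *m)
{
	m->queue_refs++;
}

/* The proposed check in fuse_uring_do_register(); false = ent freed. */
static bool reg_check_stopped(struct reg_model *m)
{
	if (m->stopped) {
		m->queue_refs--;	/* atomic_dec(&ring->queue_refs) */
		return false;		/* kfree(ent), -ECONNABORTED */
	}
	return true;
}

/* Rest of fuse_uring_do_register(): the ent is put on a queue list. */
static void reg_add_to_list(struct reg_model *m)
{
	m->ents_on_lists++;
}

/* Abort (assumed): mark stopped; async teardown only if queue_refs > 0. */
static void reg_abort(struct reg_model *m)
{
	m->stopped = true;
	m->teardown_scheduled = m->queue_refs > 0;
}

/* Async teardown, collapsed into one pass: drop every listed ent. */
static void reg_run_teardown(struct reg_model *m)
{
	if (!m->teardown_scheduled)
		return;
	m->queue_refs -= m->ents_on_lists;
	m->ents_on_lists = 0;
	assert(m->queue_refs >= 0);
}

/*
 * Replay one registration with the abort injected at step abort_at:
 * 0 = before the ref is taken, 1 = after the ref but before the check,
 * 2 = after the check but before the list add, 3 = after the list add.
 * Returns true if nothing is left behind (no listed ent, no ref).
 */
bool reg_replay_clean(int abort_at)
{
	struct reg_model m = { 0 };
	bool ok;

	if (abort_at == 0)
		reg_abort(&m);
	reg_create_ent(&m);
	if (abort_at == 1)
		reg_abort(&m);
	ok = reg_check_stopped(&m);
	if (abort_at == 2)
		reg_abort(&m);
	if (ok)
		reg_add_to_list(&m);
	if (abort_at == 3)
		reg_abort(&m);
	reg_run_teardown(&m);

	return m.ents_on_lists == 0 && m.queue_refs == 0;
}
```

In this model all four injection points end with no ent on a list and queue_refs back at 0. If the `m->stopped` branch is removed from reg_check_stopped(), abort_at == 0 leaves one ent listed with a ref that nothing drops, which is the leak being discussed.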
> >
> >
> > I don't think your changes are needed; it should be handled by
> > IO_URING_F_CANCEL -> fuse_uring_cancel(). That is exactly where the
> > initial leak was: these commands came after the abort, and
> > fuse_uring_cancel() in upstream Linux then puts the entries onto the
> > &queue->ent_in_userspace list.
>
> I think there are still races that leak the ent if we handle it in
> fuse_uring_cancel(), e.g. even with the fuse_uring_abort()
> queue_refs gating taken out as in the original (Jian's) patch:
> * thread A: fuse_uring_register() -> fuse_uring_create_ring_ent() ->
>   kzalloc, sets up the entry but hasn't called
>   atomic_inc(&ring->queue_refs) yet
> * concurrently, thread B: fuse_uring_cancel() ->
>   fuse_uring_entry_teardown() ->
>   atomic_dec_return(&queue->ring->queue_refs) brings queue_refs down
>   to 0
> * at this instant queue_refs == 0. fuse_uring_stop_queues() ->
>   teardown entries (nothing left) -> checks
>   "if (atomic_read(&ring->queue_refs) > 0)", sees this is false, and
>   skips scheduling any async teardown work
> * thread A calls atomic_inc(&ring->queue_refs) for the new ent;
>   queue_refs is now 1 and the ent is placed on the ent_avail_queue, but
>   it's never torn down
> * the ent is leaked, and there's also a hang when we hit
>   fuse_uring_wait_stopped_queues(), where it sleeps and is never woken
>   since it's waiting for queue_refs to drop to 0
>
> imo, the change proposed in my last message is more robust and handles
> this case since it guarantees the async teardown worker will be
> running (since it does the queue state check after the ent has grabbed
> the queue ref).
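Replaying that interleaving step by step as a single-threaded model, as a sketch under stated assumptions: the struct and function names (replay_*) are hypothetical, the fuse_uring_abort() gate is assumed removed as in the scenario, and thread B's cancel is assumed to release its ent and ref directly, as in Jian's patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the thread A / thread B interleaving. Fields
 * only mirror queue->stopped, ring->queue_refs and the ent lists.
 * Assumed: the queue_refs > 0 gate in fuse_uring_abort() is removed.
 */
struct replay_state {
	bool stopped;
	int queue_refs;
	int ents_on_lists;
	bool async_teardown;
};

struct replay_result {
	int leaked_ents;
	bool waiter_hangs;	/* queue_refs never returns to 0 */
};

struct replay_result replay_interleaving(bool with_stopped_check)
{
	/* One ent (E1) already registered and holding a ref. */
	struct replay_state s = { .queue_refs = 1, .ents_on_lists = 1 };
	struct replay_result res;

	/* Thread A: kzalloc + setup of E2, no atomic_inc yet. */

	/* Thread B: fuse_uring_cancel() -> fuse_uring_entry_teardown() on E1. */
	s.ents_on_lists--;
	s.queue_refs--;			/* queue_refs is now 0 */

	/* Abort: stopped is set; no async work since queue_refs == 0. */
	s.stopped = true;
	s.async_teardown = s.queue_refs > 0;

	/* Thread A: atomic_inc(&ring->queue_refs) for E2. */
	s.queue_refs++;

	if (with_stopped_check && s.stopped)
		s.queue_refs--;		/* proposed check: drop ref, free E2 */
	else
		s.ents_on_lists++;	/* E2 lands on ent_avail_queue */

	if (s.async_teardown) {
		s.queue_refs -= s.ents_on_lists;
		s.ents_on_lists = 0;
	}

	assert(s.queue_refs >= 0);
	res.leaked_ents = s.ents_on_lists;
	res.waiter_hangs = s.queue_refs != 0;
	return res;
}
```

Without the check, the new ent stays on a list and queue_refs stays at 1, so a waiter for queue_refs == 0 never wakes. With the check, the ent is freed and queue_refs returns to 0.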
Ok so you rely on the fact that fuse_abort_conn() will call
fuse_uring_abort() and that sets queue->stopped.
This could work, but I would still remove the check for
queue_refs > 0 in fuse_uring_abort(), since it just complicates things
for no real reason.
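To illustrate why the gate matters for that approach, here is a minimal model built on one assumption that should be double-checked against dev_uring.c: queue->stopped is only set on the path behind the queue_refs > 0 check (fuse_uring_abort_end_requests()). The names (gate_model, gate_*) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the queue_refs > 0 gate in fuse_uring_abort(), combined
 * with the proposed queue->stopped check at registration time.
 * Assumed: queue->stopped is only set on the path behind the gate.
 */
struct gate_model {
	bool stopped;		/* queue->stopped */
	int queue_refs;		/* ring->queue_refs */
	int ents_on_lists;	/* ents accepted onto a queue list */
};

static void gate_model_abort(struct gate_model *m, bool gated)
{
	if (gated && m->queue_refs <= 0)
		return;		/* early out: nothing gets stopped */
	m->stopped = true;
}

/* Late registration: take the ref, then apply the stopped check. */
static void gate_model_register(struct gate_model *m)
{
	m->queue_refs++;
	if (m->stopped) {
		m->queue_refs--;	/* rejected and freed */
		return;
	}
	m->ents_on_lists++;	/* accepted after the abort */
}

/* Returns true if a registration after the abort is left stranded. */
bool late_register_strands_ent(bool gated)
{
	/* Nothing in flight: queue_refs has already dropped to 0. */
	struct gate_model m = { 0 };

	gate_model_abort(&m, gated);
	gate_model_register(&m);
	assert(m.queue_refs >= 0);
	return m.ents_on_lists > 0 && m.queue_refs > 0;
}
```

With the gate in place, the abort returns early, stopped is never set, and the late ent is accepted with a ref that nothing drops; with the gate removed, the same registration is rejected.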
>
> btw, there's also another (separate) race, which neither of our
> approaches solve lol. This is the situation where fuse_uring_cancel()
> runs right after we call fuse_uring_prepare_cancel() in
> fuse_uring_do_register() but before we have set the ent state to
> FRRS_AVAILABLE. The ent gets leaked and continues to be used even
> though it's canceled, which may lead to use-after-frees. This probably
> requires a separate fix, I haven't had time to look much at it yet.
> Maybe Horst or Jian has looked at this?
>
Interesting scenario... I haven't seen that one so far.
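A small model of that window, to make it concrete. Assumptions, not taken from the kernel source: the cancel callback is delivered once and only acts on an ent that is already FRRS_AVAILABLE, matching the description above; WIN_INIT and WIN_CANCELED and all win_* names are hypothetical. It only reproduces the window and does not propose a fix.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the window between fuse_uring_prepare_cancel() and the ent
 * being set to FRRS_AVAILABLE in fuse_uring_do_register().
 * Assumed: the cancel callback is delivered once, and it only takes
 * an ent off the queue if that ent is already AVAILABLE.
 */
enum win_state { WIN_INIT, WIN_AVAILABLE, WIN_CANCELED };

struct win_ent {
	enum win_state state;
	bool cancelable;	/* prepare_cancel() has run */
};

static void win_prepare_cancel(struct win_ent *ent)
{
	ent->cancelable = true;
}

/* Cancel callback: a no-op unless the ent is already AVAILABLE. */
static void win_cancel(struct win_ent *ent)
{
	if (ent->cancelable && ent->state == WIN_AVAILABLE)
		ent->state = WIN_CANCELED;
}

static void win_make_available(struct win_ent *ent)
{
	ent->state = WIN_AVAILABLE;
}

/*
 * Returns true if the cancel is lost: the ent ends up AVAILABLE (and
 * would keep being used) although its command was already canceled.
 */
bool win_cancel_lost(bool cancel_in_window)
{
	struct win_ent ent = { .state = WIN_INIT, .cancelable = false };

	win_prepare_cancel(&ent);
	if (cancel_in_window)
		win_cancel(&ent);	/* fires before AVAILABLE: no-op */
	win_make_available(&ent);
	if (!cancel_in_window)
		win_cancel(&ent);	/* fires after AVAILABLE: handled */

	assert(ent.state != WIN_INIT);
	return ent.state == WIN_AVAILABLE;
}
```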
> > The issue in master is that fuse_uring_stop_queues() might already have
> > run; entries then get leaked, and fuse_uring_destruct() might later give a
> > warning. That part can be reproduced with xfstests: before it starts any
> > of the tests, it does some funny start/stop actions.
> >
> > The initial *simple* patch was to either add a new list or just remove
> > the warning, and to also handle either that new list or the
> > queue->ent_in_userspace list in fuse_uring_destruct(). The comment
> > explaining why it is needed was much longer than the rest of the patch.
> > The hard part in the long term would be transferring the knowledge of
> > that requirement.
>
> I think the initial simple patch doesn't address the hang. When the
> ent is canceled, it still has the ref on queue_refs, which means
> fuse_uring_wait_stopped_queues() will wait for queue_refs == 0
> forever. I don't think we ever even get to fuse_uring_destruct().
>
> Thanks,
> Joanne
>
> >
> > You then asked to handle the release directly in fuse_uring_cancel()
> > without another list
> > https://lore.kernel.org/r/CAJnrk1YaRRKHA-jVPAKZYpydaKcdswLG0XO7pUQZZ4-pTewkHQ@mail.gmail.com
> >
> > Yes, that is possible, and it is what the next patch version does.
> > However, since fuse_uring_cancel() runs outside of all the fuse locks, it
> > is racy, and I therefore asked in the introductory patch not to merge it yet.
> >
> > https://lore.kernel.org/all/20251021-io-uring-fixes-cancel-mem-leak-v1-0-26b78b2c973c@ddn.com/
> >
> >
> > Turns out my suspicion was right ;)
> >
> > Queue references might go down to 0 when nothing is in flight, and then
> > fuse_uring_abort(), which _might_ race and come a little later, might
> > not do anything.
> >
> > 	if (atomic_read(&ring->queue_refs) > 0) {
> > 		fuse_uring_abort_end_requests(ring);
> > 		fuse_uring_stop_queues(ring);
> > 	}
> >
> > As Horst figured out, removing this check on queue_refs avoids the
> > issue. I'm fairly sure the check was needed during development and
> > avoided some NULL pointer derefs; at least that is what I remember. But I
> > don't think it is needed anymore.
> >
> >
> > Thanks,
> > Bernd
>