From: Horst Birthelmer <horst@birthelmer.de>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Bernd Schubert <bernd@bsbernd.com>,
Bernd Schubert <bschubert@ddn.com>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org, Jian Huang Li <ali@ddn.com>,
stable@vger.kernel.org, Horst Birthelmer <hbirthelmer@ddn.com>
Subject: Re: Re: Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
Date: Sat, 11 Apr 2026 20:25:26 +0200 [thread overview]
Message-ID: <adqQ9rtoo16jF_cn@fedora.fritz.box> (raw)
In-Reply-To: <CAJnrk1Yb2ABBKFK=KMaU+W10FNazt+h93P445i1USXcN2W45Xw@mail.gmail.com>
On Sat, Apr 11, 2026 at 11:11:10AM -0700, Joanne Koong wrote:
> On Fri, Apr 10, 2026 at 3:08 PM Horst Birthelmer <horst@birthelmer.de> wrote:
> >
> > On Fri, Apr 10, 2026 at 02:24:08PM -0700, Joanne Koong wrote:
> > > On Fri, Apr 10, 2026 at 4:26 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> > > >
> > > Hi Bernd,
> > >
> > > > Hi Joanne,
> > > >
> > > > On 4/10/26 01:09, Joanne Koong wrote:
> > > > > On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 10/21/25 23:33, Bernd Schubert wrote:
> > > > >>> Do not merge yet, the current series has not been tested yet.
> > > > >>
> > > > >> I'm glad that I was hesitant to apply it; the DDN branch had it
> > > > >> for ages and this patch actually introduced a possible fc->num_waiting
> > > > >> issue, because fc->uring->queue_refs might go down to 0 through
> > > > >> fuse_uring_cancel(), and then fuse_uring_abort() would never stop and
> > > > >> flush the queues without another addition.
> > > > >>
> > > > >
> > > > > Hi Bernd and Jian,
> > > > >
> > > > > For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
> > > > > from fuse_uring_cancel" email was never delivered to my inbox, so I am
> > > > > just going to write my reply to that patch here instead, hope that's
> > > > > ok.
> > > > >
> > > > > Just to summarize, the race is that during unmount, fuse_abort() ->
> > > > > fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
> > > > > fuse_uring_entry_teardown() gets run but there may still be sqes that
> > > > > are being registered, which results in new ents that are created (and
> > > > > leaked) after the teardown logic has finished and the queues are
> > > > > stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
> > > > > never gets scheduled because at the time of teardown, queue->refs is 0
> > > > > as those sqes have not fully created the ents and grabbed refs yet.
> > > > > fuse_uring_destruct() runs during unmount, but this doesn't clean up
> > > > > the created ents because those registered ents got put on the
> > > > > ent_in_userspace list which fuse_uring_destruct() doesn't go through
> > > > > to free, resulting in those ents being leaked.
> > > > >
> > > > > The root cause of the race is that ents are being registered even when
> > > > > the queue is already stopped/dead. I think if we at registration time
> > > > > check the queue state before calling fuse_uring_prepare_cancel(), we
> > > > > eliminate the race altogether. If we see that the abort path has
> > > > > already triggered (eg queue->stopped == true), we manually free the
> > > > > ent and return an error instead of adding it to a list, eg
> > > > >
> > > > > diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> > > > > index d88a0c05434a..351c19150aae 100644
> > > > > --- a/fs/fuse/dev_uring.c
> > > > > +++ b/fs/fuse/dev_uring.c
> > > > > @@ -969,7 +969,7 @@ static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
> > > > > /*
> > > > > * fuse_uring_req_fetch command handling
> > > > > */
> > > > > -static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > > > +static int fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > > > struct io_uring_cmd *cmd,
> > > > > unsigned int issue_flags)
> > > > > {
> > > > > @@ -978,6 +978,16 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > > > struct fuse_conn *fc = ring->fc;
> > > > > struct fuse_iqueue *fiq = &fc->iq;
> > > > >
> > > > > + spin_lock(&queue->lock);
> > > > > + /* abort teardown path is running or has run */
> > > > > + if (queue->stopped) {
> > > > > + spin_unlock(&queue->lock);
> > > > > + atomic_dec(&ring->queue_refs);
> > > > > + kfree(ent);
> > > > > + return -ECONNABORTED;
> > > > > + }
> > > > > + spin_unlock(&queue->lock);
> > > > > +
> > > > > fuse_uring_prepare_cancel(cmd, issue_flags, ent);
> > > > >
> > > > > spin_lock(&queue->lock);
> > > > > @@ -994,6 +1004,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > > > wake_up_all(&fc->blocked_waitq);
> > > > > }
> > > > > }
> > > > > + return 0;
> > > > > }
> > > > >
> > > > > /*
> > > > > @@ -1109,9 +1120,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
> > > > > if (IS_ERR(ent))
> > > > > return PTR_ERR(ent);
> > > > >
> > > > > - fuse_uring_do_register(ent, cmd, issue_flags);
> > > > > -
> > > > > - return 0;
> > > > > + return fuse_uring_do_register(ent, cmd, issue_flags);
> > > > > }
> > > > >
> > > > > There's the scenario where the abort path's "queue->stopped = true"
> > > > > gets set right between when we drop the queue lock and before we call
> > > > > fuse_uring_prepare_cancel(), but the fuse_uring_create_ring_ent()
> > > > > logic that was called before fuse_uring_do_register() has already
> > > > > grabbed the ref on ring->queue_refs, which means in the abort path,
> > > > > the async teardown (fuse_uring_async_stop_queues()) work is guaranteed
> > > > > to run and clean up / free the entry.
> > > >
> > > >
> > > > I don't think your changes are needed, it should be handled by
> > > > IO_URING_F_CANCEL -> fuse_uring_cancel(). That is exactly where the
> > > > initial leak was - these commands came after abort and
> > > > fuse_uring_cancel() in linux upstream then puts the entries onto the
> > > > &queue->ent_in_userspace list.
> > >
> > > I think there are still races if we handle it in fuse_uring_cancel()
> > > that still leak the ent, eg even with the fuse_uring_abort()
> > > queue_refs gating taken out in the original (jian's) patch:
> > > * thread A: fuse_uring_register() -> fuse_uring_create_ring_ent() ->
> > > kzalloc, sets up the entry but hasn't called
> > > atomic_inc(&ring->queue_refs) yet
> > > Concurrently, on another thread, thread B: fuse_uring_cancel() ->
> > > fuse_uring_entry_teardown() ->
> > > atomic_dec_return(&queue->ring->queue_refs) brings queue_refs down
> > > to 0.
> > > At this instant, queue_refs == 0. fuse_uring_stop_queues() ->
> > > teardown entries (nothing left) -> checks "if
> > > atomic_read(&ring->queue_refs) > 0", sees this is false, and skips
> > > scheduling any async teardown work.
> > > Thread A then calls atomic_inc(&ring->queue_refs) for the new ent,
> > > queue_refs is now 1, the ent is placed on the ent_avail_queue, but
> > > it's never torn down.
> > > The ent is leaked, and there's also a hang now when we hit
> > > fuse_uring_wait_stopped_queues(), where it sleeps and is never woken
> > > since it's waiting for queue_refs to drop to 0.
> > >
> > > imo, the change proposed in my last message is more robust and handles
> > > this case since it guarantees the async teardown worker will be
> > > running (since it does the queue state check after the ent has grabbed
> > > the queue ref).
> >
> > Ok so you rely on the fact that fuse_abort_conn() will call
> > fuse_uring_abort() and that sets queue->stopped.
> > This could work, but I would still remove the check for
> > queue_refs > 0 in fuse_uring_abort(), since it just complicates things
> > for no real reason.
> >
> > >
> > > btw, there's also another (separate) race, which neither of our
> > > approaches solve lol. This is the situation where fuse_uring_cancel()
> > > runs right after we call fuse_uring_prepare_cancel() in
> > > fuse_uring_do_register() but before we have set the ent state to
> > > FRRS_AVAILABLE. The ent gets leaked and continues to be used even
> > > though it's canceled, which may lead to use-after-frees. This probably
> > > requires a separate fix, I haven't had time to look much at it yet.
> > > Maybe Horst or Jian has looked at this?
> > >
> > Interesting scenario ... haven't seen that one so far.
>
> Looking at the io-uring code for how cancels are handled
> (io_uring_try_cancel_uring_cmd()), I was wrong in my previous message
> about these two races. io-uring already serializes this for us: the
> io-uring code unconditionally grabs the uring lock before invoking
> file->f_op->uring_cmd() in the cancel path, which means there's no
> interleaving between the fuse registration logic and the cancel logic.
>
> But I still think the more robust/resilient fix for the memleak is to
> do the preemptive checking at registration time. I think this fixes
> races in the force unmount case between registration and abort that is
> unresolved with the original patch. With the original patch w/
> fuse_uring_abort()'s queue_refs check removed, I think we can still
> hit this:
>
> registration vs abort:
> - thread a: io_uring_enter -> register sqe ->
> fuse_uring_create_ring_ent -> allocate ent but doesn't grab queue_ref
> yet
> - thread b: fuse_conn_destroy() -> fuse_abort_conn() ->
> fuse_uring_abort() -> fuse_uring_stop_queues() ->
> fuse_uring_teardown_entries(), skips scheduling async teardown work
> since queue_refs == 0, returns
> - thread a: grabs the queue_ref, queue_ref is now 1, rest of
> fuse_uring_do_register() logic executes, ent is now marked cancelable,
> ent state is now available, ent is placed on available queue
> - thread b: fuse_abort_conn() returns, fuse_wait_aborted() now runs
> and does a "wait_event(ring->stop_waitq,
> atomic_read(&ring->queue_refs) == 0);" which hangs since the waiter
> never gets woken
>
> whereas if we check preemptively at registration time, we explicitly
> free the ent and release the queue_ref. I think the preemptive check
> needs to check ring->fc->connected though instead of queue->stopped,
> because there's the race where abort and stop_queues() may have been
> triggered before the register sqe path does queue creation. I'm hoping
> there's a better solution than having to grab the fc lock and checking
> fc->connected though, will try to look more at this next week.
>
> I think we can hit this hang on a ring creation vs abort race as well:
> * thread a: fuse_uring_cmd() gets called, passes fc->aborted check (not set yet)
> * thread b: abort is called, calls fuse_uring_abort(),
> fuse_uring_abort() is a no-op since ring == NULL right now
> * thread a: creates ring, creates queue, creates entry
> - if thread a takes the queue_ref count before the rest of the abort
> logic, we end up with the same hang as the situation above.
>
> I think for this we'll need to check fc->connected state under the fc
> lock before doing the "smp_store_release(&fc->ring, ring);" call, eg
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -243,6 +243,11 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
> max_payload_size = max(max_payload_size, fc->max_pages * PAGE_SIZE);
>
> spin_lock(&fc->lock);
> + if (!fc->connected) {
> + spin_unlock(&fc->lock);
> + goto out_err;
> + }
> if (fc->ring) {
> /* race, another thread created the ring in the meantime */
>
> but this is a separate race from the main one we're talking about on
> this thread.
>
> Does all of this align with your analysis and Bernd's analysis of the
> situation or am I misanalyzing something here? I'll try to spend more
> time next week looking at this.
It does align with what I have seen, even though my test case only covered
the teardown when the fuse server ran into a problem.
You expanded that quite a bit ;-)
>
> Thanks,
> Joanne
>
> >
> > > > Issue in master is, fuse_uring_stop_queues() might have been run already
> > > > - entries then get leaked and fuse_uring_destruct() later might give a
> > > > warning. That part can be reproduced with xfstests, before it starts any
> > > > of the tests it does some funny start stop actions.
> > > >
> > > > The initial *simple* patch was to either add a new list or to just
> > > > remove the warning, and to also handle either that new list or the
> > > > queue->ent_in_userspace list in fuse_uring_destruct(). The comment
> > > > explaining why it is needed was much longer than the rest of the patch.
> > > > The hard part in the long term would be transferring the knowledge of
> > > > that requirement.
> > >
> > > I think the initial simple patch doesn't address the hang. When the
> > > ent is canceled, it still has the ref on queue_refs, which means
> > > fuse_uring_wait_stopped_queues() will wait for queue_refs == 0
> > > forever. I don't think we ever even get to fuse_uring_destruct().
> > >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > You then asked to handle the release directly in fuse_uring_cancel()
> > > > without another list
> > > > https://lore.kernel.org/r/CAJnrk1YaRRKHA-jVPAKZYpydaKcdswLG0XO7pUQZZ4-pTewkHQ@mail.gmail.com
> > > >
> > > > Yes possible and this is what the next patch version does. However,
> > > > given fuse_uring_cancel() runs outside of all the fuse locks, it is racy
> > > > and I therefore asked in the introduction patch not to merge it yet.
> > > >
> > > > https://lore.kernel.org/all/20251021-io-uring-fixes-cancel-mem-leak-v1-0-26b78b2c973c@ddn.com/
> > > >
> > > >
> > > > Turns out my suspicion was right ;)
> > > >
> > > > Queue references might go to 0 when nothing is in flight and then
> > > > fuse_uring_abort(), which _might_ race and come a little later, then
> > > > might not do anything.
> > > >
> > > > if (atomic_read(&ring->queue_refs) > 0) {
> > > > fuse_uring_abort_end_requests(ring);
> > > > fuse_uring_stop_queues(ring);
> > > > }
> > > >
> > > > As Horst figured out, removing this check for queue_refs avoids the
> > > > issue. I'm rather sure that the check was needed during development and
> > > > avoided some null pointer derefs, as that is what I remember. But I
> > > > don't think it is needed anymore.
> > > >
> > > >
> > > > Thanks,
> > > > Bernd
> > >