public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Horst Birthelmer <horst@birthelmer.de>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Bernd Schubert <bernd@bsbernd.com>,
	Bernd Schubert <bschubert@ddn.com>,
	 Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel@vger.kernel.org, Jian Huang Li <ali@ddn.com>,
	 stable@vger.kernel.org, Horst Birthelmer <hbirthelmer@ddn.com>
Subject: Re: Re: Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
Date: Sat, 11 Apr 2026 20:25:26 +0200	[thread overview]
Message-ID: <adqQ9rtoo16jF_cn@fedora.fritz.box> (raw)
In-Reply-To: <CAJnrk1Yb2ABBKFK=KMaU+W10FNazt+h93P445i1USXcN2W45Xw@mail.gmail.com>

On Sat, Apr 11, 2026 at 11:11:10AM -0700, Joanne Koong wrote:
> On Fri, Apr 10, 2026 at 3:08 PM Horst Birthelmer <horst@birthelmer.de> wrote:
> >
> > On Fri, Apr 10, 2026 at 02:24:08PM -0700, Joanne Koong wrote:
> > > On Fri, Apr 10, 2026 at 4:26 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> > > >
> > > Hi Bernd,
> > >
> > > > Hi Joanne,
> > > >
> > > > On 4/10/26 01:09, Joanne Koong wrote:
> > > > > On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 10/21/25 23:33, Bernd Schubert wrote:
> > > > >>> Do not merge yet, the current series has not been tested yet.
> > > > >>
> > > > >> I'm glad that that I was hesitating to apply it, the DDN branch had it
> > > > >> for ages and this patch actually introduced a possible fc->num_waiting
> > > > >> issue, because fc->uring->queue_refs might go down to 0 though
> > > > >> fuse_uring_cancel() and then fuse_uring_abort() would never stop and
> > > > >> flush the queues without another addition.
> > > > >>
> > > > >
> > > > > Hi Bernd and Jian,
> > > > >
> > > > > For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
> > > > > from fuse_uring_cancel" email was never delivered to my inbox, so I am
> > > > > just going to write my reply to that patch here instead, hope that's
> > > > > ok.
> > > > >
> > > > > Just to summarize, the race is that during unmount, fuse_abort() ->
> > > > > fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
> > > > > fuse_uring_entry_teardown() gets run but there may still be sqes that
> > > > > are being registered, which results in new ents that are created (and
> > > > > leaked) after the teardown logic has finished and the queues are
> > > > > stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
> > > > > never gets scheduled because at the time of teardown, queue->refs is 0
> > > > > as those sqes have not fully created the ents and grabbed refs yet.
> > > > > fuse_uring_destruct() runs during unmount, but this doesn't clean up
> > > > > the created ents because those registered ents got put on the
> > > > > ent_in_userspace list which fuse_uring_destruct() doesn't go through
> > > > > to free, resulting in those ents being leaked.
> > > > >
> > > > > The root cause of the race is that ents are being registered even when
> > > > > the queue is already stopped/dead. I think if we at registration time
> > > > > check the queue state before calling fuse_uring_prepare_cancel(), we
> > > > > eliminate the race altogether. If we see that the abort path has
> > > > > already triggered (eg queue->stopped == true), we manually free the
> > > > > ent and return an error instead of adding it to a list, eg
> > > > >
> > > > > diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> > > > > index d88a0c05434a..351c19150aae 100644
> > > > > --- a/fs/fuse/dev_uring.c
> > > > > +++ b/fs/fuse/dev_uring.c
> > > > > @@ -969,7 +969,7 @@ static bool is_ring_ready(struct fuse_ring *ring,
> > > > > int current_qid)
> > > > >  /*
> > > > >   * fuse_uring_req_fetch command handling
> > > > >   */
> > > > > -static void fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > > > +static int fuse_uring_do_register(struct fuse_ring_ent *ent,
> > > > >                                    struct io_uring_cmd *cmd,
> > > > >                                    unsigned int issue_flags)
> > > > >  {
> > > > > @@ -978,6 +978,16 @@ static void fuse_uring_do_register(struct
> > > > > fuse_ring_ent *ent,
> > > > >         struct fuse_conn *fc = ring->fc;
> > > > >         struct fuse_iqueue *fiq = &fc->iq;
> > > > >
> > > > > +       spin_lock(&queue->lock);
> > > > > +       /* abort teardown path is running or has run */
> > > > > +       if (queue->stopped) {
> > > > > +               spin_unlock(&queue->lock);
> > > > > +               atomic_dec(&ring->queue_refs);
> > > > > +               kfree(ent);
> > > > > +               return -ECONNABORTED;
> > > > > +       }
> > > > > +       spin_unlock(&queue->lock);
> > > > > +
> > > > >         fuse_uring_prepare_cancel(cmd, issue_flags, ent);
> > > > >
> > > > >         spin_lock(&queue->lock);
> > > > > @@ -994,6 +1004,7 @@ static void fuse_uring_do_register(struct
> > > > > fuse_ring_ent *ent,
> > > > >                         wake_up_all(&fc->blocked_waitq);
> > > > >                 }
> > > > >         }
> > > > > +       return 0;
> > > > >  }
> > > > >
> > > > >  /*
> > > > > @@ -1109,9 +1120,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
> > > > >         if (IS_ERR(ent))
> > > > >                 return PTR_ERR(ent);
> > > > >
> > > > > -       fuse_uring_do_register(ent, cmd, issue_flags);
> > > > > -
> > > > > -       return 0;
> > > > > +       return fuse_uring_do_register(ent, cmd, issue_flags);
> > > > >  }
> > > > >
> > > > > There's the scenario where the abort path's "queue->stopped = true"
> > > > > gets set right between when we drop the queue lock and before we call
> > > > > fuse_uring_prepare_cancel(), but the fuse_uring_create_ring_ent()
> > > > > logic that was called before fuse_uring_do_register() has already
> > > > > grabbed the ref on ring->queue_refs, which means in the abort path,
> > > > > the async teardown (fuse_uring_async_stop_queues()) work is guaranteed
> > > > > to run and clean up / free the entry.
> > > >
> > > >
> > > > I don't think your changes are needed, it should be handled by
> > > > IO_URING_F_CANCEL -> fuse_uring_cancel(). That is exactly where the
> > > > initial leak was - these commands came after abort and
> > > > fuse_uring_cancel() in linux upstream then puts the entries onto the
> > > > &queue->ent_in_userspace list.
> > >
> > > I think there are still races if we handle it in fuse_uring_cancel()
> > > that still leak the ent, eg even with the fuse_uring_abort()
> > > queue_refs gating taken out in the original (jian's) patch:
> > > * thread A: fuse_uring_register() ->fuse_uring_create_ring_ent() ->
> > > kzalloc, sets up the entry but hasn't called
> > > atomic_inc(&ring->queue_refs) yet
> > >   concurrently on another thread, thread B: fuse_uring_cancel()
> > > ->fuse_uring_entry_teardown() ->
> > > atomic_dec_return(&queue->ring->queue_refs) -> brings queue_refs down
> > > to 0
> > >   At this instant, queue_Refs == 0. fuse_uring_stop_queues() ->
> > > teardown entries (nothing left) -> checks "if
> > > atomic_read(&ring->queue_refs) > 0", sees this is false, and skips
> > > scheduling any async teardown work
> > >   thread A calls atomic_inc(&ring->queue_refs) for the new ent,
> > > queue_refs is now 1, the ent is now placed on the ent_avail_queue, but
> > > it's never torn down.
> > >   the ent is leaked and there's also a hang now when we hit
> > > fuse_uring_wait_stopped_queues() -> fuse_uring_wait_stopped_queues()
> > > where it sleeps and is never woken since it's waiting for queue refs
> > > to drop to 0
> > >
> > > imo, the change proposed in my last message is more robust and handles
> > > this case since it guarantees the async teardown worker will be
> > > running (since it does the queue state check after the ent has grabbed
> > > the queue ref).
> >
> > Ok so you rely on the fact that fuse_abort_conn() will call
> > fuse_uring_abort() and that sets queue->stopped.
> > This could work, but I would still remove the check for
> > queue_refs > 0 in fuse_uring_abort(), since it just complicates things
> > for no real reason.
> >
> > >
> > > btw, there's also another (separate) race, which neither of our
> > > approaches solve lol. This is the situation where fuse_uring_cancel()
> > > runs right after we call fuse_uring_prepare_cancel() in
> > > fuse_uring_do_register() but before we have set the ent state to
> > > FRRS_AVAILABLE. The ent gets leaked and continues to be used even
> > > though it's canceled, which may lead to use-after-frees. This probably
> > > requires a separate fix, I haven't had time to look much at it yet.
> > > Maybe Horst or Jian has looked at this?
> > >
> > Interesting scenario ... haven't seen that one so far.
> 
> Looking at the io-uring code for how cancels are handled
> (io_uring_try_cancel_uring_cmd()), I was wrong in my prevoius message
> about these two races. io-uring already serializes this for us, the
> io-uring code unconditionally grabs the uring lock before invoking
> file->f_op->uring_cmd() in the cancel path, which means there's no
> interweaving between the fuse registration logic and the cancel logic.
> 
> But I still think the more robust/resilient fix for the memleak is to
> do the preemptive checking at registration time. I think this fixes
> races in the force unmount case between registration and abort that is
> unresolved with the original patch. With the original patch w/
> fuse_uring_abort()'s queue_refs check removed, I think we can still
> hit this:
> 
> registration vs abort:
>   - thread a: io_uring_enter -> register sqe ->
> fuse_uring_create_ring_ent -> allocate ent but doesn't grab queue_ref
> yet
>   - thread b: fuse_conn_destroy() -> fuse_abort_conn() ->
> fuse_uring_abort() -> fuse_uring_stop_queues() ->
> fuse_uring_teardown_entries(), skips scheduling async teardown work
> since queue_refs == 0, returns
>   - thread a: grabs the queue_ref, queue_ref is now 1, rest of
> fuse_uring_do_register() logic executes, ent is now marked cancelable,
> ent state is now available, ent is placed on available queue
>   - thread b: fuse_abort_conn() returns, fuse_wait_aborted() now runs
> and does a "wait_event(ring->stop_waitq,
> atomic_read(&ring->queue_refs) == 0);" which hangs since the waiter
> never gets woken
> 
> whereas if we check preemptively at registration time, we explicjtly
> free the ent and release the queue_ref. I think the preemptive check
> needs to check ring->fc->connected though instead of queue->stopped,
> because there's the race where abort and stop_queues() may have been
> triggered before the register sqe path does queue creation. I'm hoping
> there's a better solution than having to grab the fc lock and checking
> fc->connected though, will try to look more at this next week.
> 
> I think we can hit this hang on a ring creation vs abort race as well:
> * thread a: fuse_uring_cmd() gets called, passes fc->aborted check (not set yet)
> * thread b: abort is called, calls fuse_uring_abort(),
> fuse_uring_abort() is a no-op since ring == NULL right now
> * thread a: creates ring, creates queue, creates entry
> - if thread a takes the queue_ref count before the rest of the abort
> logic, we end up with the same hang as the situation above.
> 
> I think for this we'll need to check fc->connected state under the fc
> lock before doing the "smp_store_release(&fc->ring, ring);" call, eg
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -243,6 +243,11 @@ static struct fuse_ring *fuse_uring_create(struct
> fuse_conn *fc)
>         max_payload_size = max(max_payload_size, fc->max_pages * PAGE_SIZE);
> 
>         spin_lock(&fc->lock);
> +       if (!fc->connected) {
> +               spin_unlock(&fc->lock);
> +               goto out_err;
> +       }
>         if (fc->ring) {
>                 /* race, another thread created the ring in the meantime */
> 
> but this is a separate race from the main one we're talking about on
> this thread.
> 
> Does all of this align with your analysis and Bernd's analysis of the
> situation or am I misanalyzing something here? I'll try to spend more
> time next week looking at this.

It does align with what I have seen, even though my testcase only tested
the teardown when the fuse server ran into a problem.
You expanded that quite a bit ;-)

> 
> Thanks,
> Joanne
> 
> >
> > > > Issue in master is, fuse_uring_stop_queues() might have been run already
> > > > - entries then get leaked and fuse_uring_destruct() later might give a
> > > > warning. That part can be reproduced with xfstests, before it starts any
> > > > of the tests it does some funny start stop actions.
> > > >
> > > > Initial *simple* patch was to either add a new list or to just remove
> > > > the warning and to also handle either that new list or
> > > > queue->ent_in_userspace list  in fuse_uring_destruct(). The comment
> > > > explaining why it is needed was much longer than the rest of the patch.
> > > > The hard part in the long term would be tranfer the knowledge for that
> > > > requirement.
> > >
> > > I think the initial simple patch doesn't address the hang. When the
> > > ent is canceled, it still has the ref on queue_refs, which means
> > > fuse_uring_wait_stopped_queues() will wait for queue_refs == 0
> > > forever. I don't think we ever even get to fuse_uring_destruct().
> > >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > You then asked to handle the release directly in fuse_uring_cancel()
> > > > without another list
> > > > https://lore.kernel.org/r/CAJnrk1YaRRKHA-jVPAKZYpydaKcdswLG0XO7pUQZZ4-pTewkHQ@mail.gmail.com
> > > >
> > > > Yes possible and this is what the next patch version does. However,
> > > > given fuse_uring_cancel() runs outside of all the fuse locks, it is racy
> > > > and I therefore asked in the introduction patch not to merge it yet.
> > > >
> > > > https://lore.kernel.org/all/20251021-io-uring-fixes-cancel-mem-leak-v1-0-26b78b2c973c@ddn.com/
> > > >
> > > >
> > > > Turns out my suspicion was right ;)
> > > >
> > > > Queue references might go to 0 when nothing is in flight and then
> > > > fuse_uring_abort(), which _might_ race and come a little later, then
> > > > might not doing anything.
> > > >
> > > >         if (atomic_read(&ring->queue_refs) > 0) {
> > > >                 fuse_uring_abort_end_requests(ring);
> > > >                 fuse_uring_stop_queues(ring);
> > > >         }
> > > >
> > > > As Horst figure out, removing this check for queue_refs avoids the
> > > > issue. I'm rather sure that the check was needed during development and
> > > > avoided some null pointer derefs, as that is what I remember. But I
> > > > don't think it is needed anymore.
> > > >
> > > >
> > > > Thanks,
> > > > Bernd
> > >

  reply	other threads:[~2026-04-11 18:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-21 21:33 [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown Bernd Schubert
2025-10-21 21:33 ` [PATCH 2/2] fs/fuse: fix potential memory leak from fuse_uring_cancel Bernd Schubert
2026-04-09 11:02 ` [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown Bernd Schubert
2026-04-09 23:09   ` Joanne Koong
2026-04-10  7:21     ` Horst Birthelmer
2026-04-10 17:09       ` Joanne Koong
2026-04-10 17:18         ` Bernd Schubert
2026-04-10 17:28           ` Joanne Koong
2026-04-10 17:32             ` Bernd Schubert
2026-04-10 19:53               ` Joanne Koong
2026-04-10 18:55         ` Re: " Horst Birthelmer
2026-04-10 20:09           ` Joanne Koong
2026-04-10 21:49             ` Horst Birthelmer
2026-04-10 11:26     ` Bernd Schubert
2026-04-10 21:24       ` Joanne Koong
2026-04-10 22:08         ` Horst Birthelmer
2026-04-11 18:11           ` Joanne Koong
2026-04-11 18:25             ` Horst Birthelmer [this message]
2026-04-11 19:22             ` Bernd Schubert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adqQ9rtoo16jF_cn@fedora.fritz.box \
    --to=horst@birthelmer.de \
    --cc=ali@ddn.com \
    --cc=bernd@bsbernd.com \
    --cc=bschubert@ddn.com \
    --cc=hbirthelmer@ddn.com \
    --cc=joannelkoong@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox