From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>,
Horst Birthelmer <horst@birthelmer.de>
Cc: Bernd Schubert <bschubert@ddn.com>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org, Jian Huang Li <ali@ddn.com>,
stable@vger.kernel.org, Horst Birthelmer <hbirthelmer@ddn.com>
Subject: Re: [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown
Date: Sat, 11 Apr 2026 21:22:19 +0200 [thread overview]
Message-ID: <f27651af-e5c0-4c3e-8baa-fa2d7232cb4d@bsbernd.com> (raw)
In-Reply-To: <CAJnrk1Yb2ABBKFK=KMaU+W10FNazt+h93P445i1USXcN2W45Xw@mail.gmail.com>
On 4/11/26 20:11, Joanne Koong wrote:
> On Fri, Apr 10, 2026 at 3:08 PM Horst Birthelmer <horst@birthelmer.de> wrote:
>>
>> On Fri, Apr 10, 2026 at 02:24:08PM -0700, Joanne Koong wrote:
>>> On Fri, Apr 10, 2026 at 4:26 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>>>>
>>> Hi Bernd,
>>>
>>>> Hi Joanne,
>>>>
>>>> On 4/10/26 01:09, Joanne Koong wrote:
>>>>> On Thu, Apr 9, 2026 at 4:02 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/21/25 23:33, Bernd Schubert wrote:
>>>>>>> Do not merge yet, the current series has not been tested yet.
>>>>>>
>>>>>> I'm glad that I was hesitating to apply it; the DDN branch had it
>>>>>> for ages, and this patch actually introduced a possible fc->num_waiting
>>>>>> issue, because fc->uring->queue_refs might go down to 0 through
>>>>>> fuse_uring_cancel(), and then fuse_uring_abort() would never stop and
>>>>>> flush the queues without another addition.
>>>>>>
>>>>>
>>>>> Hi Bernd and Jian,
>>>>>
>>>>> For some reason the "[PATCH 2/2] fs/fuse: fix potential memory leak
>>>>> from fuse_uring_cancel" email was never delivered to my inbox, so I am
>>>>> just going to write my reply to that patch here instead, hope that's
>>>>> ok.
>>>>>
>>>>> Just to summarize, the race is that during unmount, fuse_abort() ->
>>>>> fuse_uring_abort() -> ... -> fuse_uring_teardown_entries() -> ... ->
>>>>> fuse_uring_entry_teardown() gets run but there may still be sqes that
>>>>> are being registered, which results in new ents that are created (and
>>>>> leaked) after the teardown logic has finished and the queues are
>>>>> stopped/dead. The async teardown work (fuse_uring_async_stop_queues())
>>>>> never gets scheduled because at the time of teardown, queue->refs is 0
>>>>> as those sqes have not fully created the ents and grabbed refs yet.
>>>>> fuse_uring_destruct() runs during unmount, but this doesn't clean up
>>>>> the created ents because those registered ents got put on the
>>>>> ent_in_userspace list which fuse_uring_destruct() doesn't go through
>>>>> to free, resulting in those ents being leaked.
>>>>>
>>>>> The root cause of the race is that ents are being registered even when
>>>>> the queue is already stopped/dead. I think if we at registration time
>>>>> check the queue state before calling fuse_uring_prepare_cancel(), we
>>>>> eliminate the race altogether. If we see that the abort path has
>>>>> already triggered (eg queue->stopped == true), we manually free the
>>>>> ent and return an error instead of adding it to a list, eg
>>>>>
>>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>>> index d88a0c05434a..351c19150aae 100644
>>>>> --- a/fs/fuse/dev_uring.c
>>>>> +++ b/fs/fuse/dev_uring.c
>>>>> @@ -969,7 +969,7 @@ static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
>>>>> /*
>>>>> * fuse_uring_req_fetch command handling
>>>>> */
>>>>> -static void fuse_uring_do_register(struct fuse_ring_ent *ent,
>>>>> +static int fuse_uring_do_register(struct fuse_ring_ent *ent,
>>>>> struct io_uring_cmd *cmd,
>>>>> unsigned int issue_flags)
>>>>> {
>>>>> @@ -978,6 +978,16 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
>>>>> struct fuse_conn *fc = ring->fc;
>>>>> struct fuse_iqueue *fiq = &fc->iq;
>>>>>
>>>>> + spin_lock(&queue->lock);
>>>>> + /* abort teardown path is running or has run */
>>>>> + if (queue->stopped) {
>>>>> + spin_unlock(&queue->lock);
>>>>> + atomic_dec(&ring->queue_refs);
>>>>> + kfree(ent);
>>>>> + return -ECONNABORTED;
>>>>> + }
>>>>> + spin_unlock(&queue->lock);
>>>>> +
>>>>> fuse_uring_prepare_cancel(cmd, issue_flags, ent);
>>>>>
>>>>> spin_lock(&queue->lock);
>>>>> @@ -994,6 +1004,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ent,
>>>>> wake_up_all(&fc->blocked_waitq);
>>>>> }
>>>>> }
>>>>> + return 0;
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -1109,9 +1120,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
>>>>> if (IS_ERR(ent))
>>>>> return PTR_ERR(ent);
>>>>>
>>>>> - fuse_uring_do_register(ent, cmd, issue_flags);
>>>>> -
>>>>> - return 0;
>>>>> + return fuse_uring_do_register(ent, cmd, issue_flags);
>>>>> }
>>>>>
>>>>> There's the scenario where the abort path's "queue->stopped = true"
>>>>> gets set right between when we drop the queue lock and before we call
>>>>> fuse_uring_prepare_cancel(), but the fuse_uring_create_ring_ent()
>>>>> logic that was called before fuse_uring_do_register() has already
>>>>> grabbed the ref on ring->queue_refs, which means in the abort path,
>>>>> the async teardown (fuse_uring_async_stop_queues()) work is guaranteed
>>>>> to run and clean up / free the entry.
>>>>
>>>>
>>>> I don't think your changes are needed, it should be handled by
>>>> IO_URING_F_CANCEL -> fuse_uring_cancel(). That is exactly where the
>>>> initial leak was - these commands came after abort and
>>>> fuse_uring_cancel() in linux upstream then puts the entries onto the
>>>> &queue->ent_in_userspace list.
>>>
>>> I think there are still races if we handle it in fuse_uring_cancel()
>>> that still leak the ent, eg even with the fuse_uring_abort()
>>> queue_refs gating taken out in the original (jian's) patch:
>>> * thread A: fuse_uring_register() ->fuse_uring_create_ring_ent() ->
>>> kzalloc, sets up the entry but hasn't called
>>> atomic_inc(&ring->queue_refs) yet
>>> concurrently on another thread, thread B: fuse_uring_cancel()
>>> ->fuse_uring_entry_teardown() ->
>>> atomic_dec_return(&queue->ring->queue_refs) -> brings queue_refs down
>>> to 0
>>> At this instant, queue_refs == 0. fuse_uring_stop_queues() ->
>>> teardown entries (nothing left) -> checks "if
>>> atomic_read(&ring->queue_refs) > 0", sees this is false, and skips
>>> scheduling any async teardown work
>>> thread A calls atomic_inc(&ring->queue_refs) for the new ent,
>>> queue_refs is now 1, the ent is now placed on the ent_avail_queue, but
>>> it's never torn down.
>>> the ent is leaked and there's also a hang now when we hit
>>> fuse_uring_wait_stopped_queues(), where it sleeps and is never woken
>>> since it's waiting for queue_refs to drop to 0
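The schedule above can be replayed with a small deterministic model (a Python sketch with made-up stand-ins for ring->queue_refs, the available queue, and the async teardown flag; this is an illustration of the interleaving, not the kernel code):

```python
# Hypothetical model of the race described above; all names are
# stand-ins, not the real fuse_uring fields.
refs = 1                  # one already-registered ent holds a queue ref
avail = ["old_ent"]       # stand-in for the ent_avail_queue
async_teardown = False    # whether fuse_uring_async_stop_queues() runs

# thread B: fuse_uring_cancel() -> fuse_uring_entry_teardown()
# tears down the existing ent and drops its ref.
avail.pop()
refs -= 1                 # queue_refs drops to 0 here

# thread A: fuse_uring_create_ring_ent() has kzalloc'd the new ent
# but has NOT called atomic_inc(&ring->queue_refs) yet.
new_ent = "new_ent"

# thread B: fuse_uring_stop_queues() finds nothing left to tear down,
# reads queue_refs == 0, and skips scheduling any async teardown work.
if refs > 0:
    async_teardown = True

# thread A: now grabs the ref and publishes the ent.
refs += 1
avail.append(new_ent)

# End state: the ent sits on the available queue with a ref held,
# nothing will ever tear it down, and a waiter on refs == 0 hangs.
assert refs == 1 and avail == ["new_ent"] and not async_teardown
```

The final state is exactly the failure described: a leaked ent, plus a queue_refs count that never returns to 0, which is what fuse_uring_wait_stopped_queues() then sleeps on forever.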
>>>
>>> imo, the change proposed in my last message is more robust and handles
>>> this case since it guarantees the async teardown worker will be
>>> running (since it does the queue state check after the ent has grabbed
>>> the queue ref).
>>
>> Ok so you rely on the fact that fuse_abort_conn() will call
>> fuse_uring_abort() and that sets queue->stopped.
>> This could work, but I would still remove the check for
>> queue_refs > 0 in fuse_uring_abort(), since it just complicates things
>> for no real reason.
>>
>>>
>>> btw, there's also another (separate) race, which neither of our
>>> approaches solve lol. This is the situation where fuse_uring_cancel()
>>> runs right after we call fuse_uring_prepare_cancel() in
>>> fuse_uring_do_register() but before we have set the ent state to
>>> FRRS_AVAILABLE. The ent gets leaked and continues to be used even
>>> though it's canceled, which may lead to use-after-frees. This probably
>>> requires a separate fix, I haven't had time to look much at it yet.
>>> Maybe Horst or Jian has looked at this?
>>>
>> Interesting scenario ... haven't seen that one so far.
>
> Looking at the io-uring code for how cancels are handled
> (io_uring_try_cancel_uring_cmd()), I was wrong in my previous message
> about these two races. io-uring already serializes this for us, the
> io-uring code unconditionally grabs the uring lock before invoking
> file->f_op->uring_cmd() in the cancel path, which means there's no
> interweaving between the fuse registration logic and the cancel logic.
>
> But I still think the more robust/resilient fix for the memleak is to
> do the preemptive checking at registration time. I think this fixes
> races in the force unmount case between registration and abort that is
> unresolved with the original patch. With the original patch w/
> fuse_uring_abort()'s queue_refs check removed, I think we can still
> hit this:
I need to go through the other messages, but I still do not see any
registration-time leak. At least not with the additional patch we have
here to tear down entries through IO_URING_F_CANCEL.
Sorry, besides looking into ublk now (for my main work), I'm also in the
middle of fixing an issue with reduced queues and still working on the
libfuse part of sync-init...
>
> registration vs abort:
> - thread a: io_uring_enter -> register sqe ->
> fuse_uring_create_ring_ent -> allocate ent but doesn't grab queue_ref
> yet
> - thread b: fuse_conn_destroy() -> fuse_abort_conn() ->
> fuse_uring_abort() -> fuse_uring_stop_queues() ->
> fuse_uring_teardown_entries(), skips scheduling async teardown work
> since queue_refs == 0, returns
> - thread a: grabs the queue_ref, queue_ref is now 1, rest of
> fuse_uring_do_register() logic executes, ent is now marked cancelable,
> ent state is now available, ent is placed on available queue
> - thread b: fuse_abort_conn() returns, fuse_wait_aborted() now runs
> and does a "wait_event(ring->stop_waitq,
> atomic_read(&ring->queue_refs) == 0);" which hangs since the waiter
> never gets woken
>
> whereas if we check preemptively at registration time, we explicitly
> free the ent and release the queue_ref. I think the preemptive check
> needs to check ring->fc->connected though instead of queue->stopped,
> because there's the race where abort and stop_queues() may have been
> triggered before the register sqe path does queue creation. I'm hoping
> there's a better solution than having to grab the fc lock and checking
> fc->connected though, will try to look more at this next week.
>
> I think we can hit this hang on a ring creation vs abort race as well:
> * thread a: fuse_uring_cmd() gets called, passes fc->aborted check (not set yet)
> * thread b: abort is called, calls fuse_uring_abort(),
> fuse_uring_abort() is a no-op since ring == NULL right now
> * thread a: creates ring, creates queue, creates entry
> - if thread a takes the queue_ref count before the rest of the abort
> logic, we end up with the same hang as the situation above.
IO-uring sends IO_URING_F_CANCEL for every registered command. We never
had the leak you describe. Upstream has a leak because it does not free
'queue->ent_in_userspace' in fuse_uring_destruct. I'm fine with the
addition in fuse_uring_cancel() (although just freeing the entries
in the list is much simpler and race-free).
Please let's not make it any more complex.
Thanks,
Bernd
Thread overview: 19+ messages
2025-10-21 21:33 [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown Bernd Schubert
2025-10-21 21:33 ` [PATCH 2/2] fs/fuse: fix potential memory leak from fuse_uring_cancel Bernd Schubert
2026-04-09 11:02 ` [PATCH 0/2] fuse: Fix possible memleak at startup with immediate teardown Bernd Schubert
2026-04-09 23:09 ` Joanne Koong
2026-04-10 7:21 ` Horst Birthelmer
2026-04-10 17:09 ` Joanne Koong
2026-04-10 17:18 ` Bernd Schubert
2026-04-10 17:28 ` Joanne Koong
2026-04-10 17:32 ` Bernd Schubert
2026-04-10 19:53 ` Joanne Koong
2026-04-10 18:55 ` Re: " Horst Birthelmer
2026-04-10 20:09 ` Joanne Koong
2026-04-10 21:49 ` Horst Birthelmer
2026-04-10 11:26 ` Bernd Schubert
2026-04-10 21:24 ` Joanne Koong
2026-04-10 22:08 ` Horst Birthelmer
2026-04-11 18:11 ` Joanne Koong
2026-04-11 18:25 ` Horst Birthelmer
2026-04-11 19:22 ` Bernd Schubert [this message]