From: Bernd Schubert <bschubert@ddn.com>
To: Brian Song <hibriansong@gmail.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
Date: Mon, 4 Aug 2025 11:33:13 +0000
Message-ID: <577bf373-92cb-4160-a49e-e29d3615a308@ddn.com>
In-Reply-To: <dc326a4b-f6fa-435a-b614-208e03f61556@gmail.com>

Hi Brian,

Sorry for my late reply; I'm just back from vacation and working
through my mail.

On 8/4/25 01:33, Brian Song wrote:
> 
> 
> On 2025-08-01 12:09 p.m., Brian Song wrote:
>> Hi Bernd,
>>
>> We are currently working on implementing termination support for
>> fuse-over-io_uring in QEMU, and right now we are focusing on how to
>> clean up in-flight SQEs properly. Our main question is about how well
>> the kernel supports robust cancellation for these fuse-over-io_uring
>> SQEs. Does it actually implement cancellation beyond destroying the
>> io_uring queue?
>>
>> In QEMU FUSE export, we need a way to quickly and cleanly detach from 
>> the event loop and cancel any pending SQEs when an export is no longer 
>> in use. Ideally, we want to avoid the more drastic measure of having to 
>> close the entire /dev/fuse fd just to gracefully terminate outstanding 
>> operations.
>>
>> We are not sure if there's an existing code path that supports async 
>> cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if 
>> additional callbacks might be needed to fully integrate with the 
>> kernel's async cancel mechanism. We also realized libfuse manages 
>> shutdowns differently, typically by signaling a thread via eventfd 
>> rather than relying on async cancel.
>>
>> Would love to hear your thoughts or suggestions on this!
>>
>> Thanks,
>> Brian
> 
> I looked into the kernel codebase and came up with some initial ideas, 
> which might not be entirely accurate:
> 
> The IORING_OP_ASYNC_CANCEL operation can only cancel io_uring ring 
> resources and a limited set of request types. It does not clean up 
> resources related to fuse-over-io_uring, such as in-use entries.
> IORING_OP_ASYNC_CANCEL
> -> submit/enter
> -> io_uring/opdef.c:: .issue = io_async_cancel,
> 	-> __io_async_cancel
> 		-> io_try_cancel ==> Can only cancel a few types of requests
> 
> 
> Currently, full cleanup of both io_uring and FUSE data structures for 
> fuse-over-io_uring only happens in two cases (because we mark these 
> SQEs cancelable on every commit_and_fetch, as described below):
> 1. When the FUSE daemon exits (exit syscall)
> 2. During execve, which triggers the kernel path:
> 
> io_uring_files_cancel =>
> io_uring_try_cancel_uring_cmd =>
> file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER)
> 
> 
> 
> Below is a state diagram (mermaid graph) of a fuse_uring entry inside 
> the kernel:
> 
> graph TD
>      A["Userspace daemon"] --> B["FUSE_IO_URING_CMD_REGISTER<br/>Register buffer"]
>      B --> C["Create fuse_ring_ent"]
>      C --> D["State: FRRS_AVAILABLE<br/>Added to ent_avail_queue"]
> 
>      E["FUSE filesystem operation"] --> F["Generate FUSE request"]
>      F --> G["fuse_uring_queue_fuse_req()"]
>      G --> H{"Check ent_avail_queue"}
> 
>      H -->|Entry available| I["Take entry from queue<br/>Assign to FUSE request"]
>      H -->|No entry available| J["Request goes to fuse_req_queue and waits"]
> 
>      I --> K["fuse_uring_dispatch_ent()"]
>      K --> L["State: FRRS_USERSPACE<br/>Move to ent_in_userspace"]
>      L --> M["Notify userspace to process"]
> 
>      N["Process exit / daemon termination"] --> O["io_uring_try_cancel_uring_cmd() <br/> >> NOTE Since we marked the entry IORING_URING_CMD_CANCELABLE <br/> in the previous fuse_uring_cmd, try_cancel_uring_cmd will call <br/> fuse_uring_cmd to 'delete' it <<"]
>      O --> P["fuse_uring_cancel()"]
>      P --> Q{"Is entry state AVAILABLE?"}
> 
>      Q -->|Yes| R[">> equivalent to 'delete' << Directly change to USERSPACE<br/>Move to ent_in_userspace"]
>      Q -->|No| S["Do nothing"]
> 
>      R --> T["io_uring_cmd_done(-ENOTCONN)"]
>      T --> U["Entry is 'disguised' as completed<br/>Will no longer handle new FUSE requests"]
> 
>      V["Practical effects of cancellation:"] --> W["1. Prevent new FUSE requests from using this entry<br/>2. Release io_uring command resources<br/>3. Does not affect already assigned FUSE requests"]
> 
> 
> 
> When the kernel is waiting for VFS requests and the corresponding entry 
> is idle, its state is FRRS_AVAILABLE. Once a request is handed off to 
> the userspace daemon, the entry's state transitions to FRRS_USERSPACE.
> 
> The fuse_uring_cmd function handles the COMMIT_AND_FETCH operation. If a 
> cmd call carries the IO_URING_F_CANCEL flag, fuse_uring_cancel is 
> invoked; for an entry that is still FRRS_AVAILABLE it sets the state to 
> FRRS_USERSPACE, making it unavailable for future requests from the VFS.
> 
> If the IORING_URING_CMD_CANCELABLE flag is not yet set, we first call 
> fuse_uring_prepare_cancel, before committing and fetching, to mark the 
> entry as IORING_URING_CMD_CANCELABLE. This indicates that if the daemon 
> exits or an execve happens during fetch, the kernel can call 
> io_uring_try_cancel_uring_cmd to safely clean up these SQEs/CQEs and 
> the related FUSE resources.
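
To make the state handling described in the quoted text easier to follow,
here is a minimal userspace model in C. It is only a sketch of the
behaviour as described in this thread, not kernel code: the names
ent_state, ring_ent, ent_fetch, ent_dispatch and ent_cancel are
hypothetical, and only the two states mentioned above are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for FRRS_AVAILABLE / FRRS_USERSPACE. */
enum ent_state { ENT_AVAILABLE, ENT_USERSPACE };

struct ring_ent {
    enum ent_state state;
    bool cancelable; /* models IORING_URING_CMD_CANCELABLE */
    int cqe_res;     /* result posted by cancel, 0 if none */
};

/* Models register / commit-and-fetch: mark the command cancelable
 * if it is not already, then wait for a request. */
static void ent_fetch(struct ring_ent *ent)
{
    assert(ent != NULL);
    if (!ent->cancelable)
        ent->cancelable = true;
    ent->state = ENT_AVAILABLE;
    ent->cqe_res = 0;
}

/* Models handing a FUSE request to the daemon: only an available
 * entry can be taken. */
static bool ent_dispatch(struct ring_ent *ent)
{
    assert(ent != NULL);
    if (ent->state != ENT_AVAILABLE)
        return false;
    ent->state = ENT_USERSPACE;
    return true;
}

/* Models the cancel path: an available entry is moved to USERSPACE
 * and completed with -ENOTCONN; anything else is left untouched. */
static bool ent_cancel(struct ring_ent *ent)
{
    assert(ent != NULL);
    if (!ent->cancelable || ent->state != ENT_AVAILABLE)
        return false;
    ent->state = ENT_USERSPACE;
    ent->cqe_res = -ENOTCONN;
    return true;
}
```

In this model a cancelled entry can no longer be dispatched, while an
entry that was already dispatched is not affected by a later cancel,
which matches the "practical effects" listed in the diagram.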
> 
> Back to our previous issue, when deleting a FUSE export in QEMU, we hit 
> a crash due to an invalid CQE handler. This happened because the SQEs we 
> previously submitted hadn't returned yet by the time we shut down and 
> deleted the export.
> 
> To avoid this, we need to ensure that no further CQEs are returned and 
> no CQE handler is triggered. We need to either:
> 
> * Prevent any further user operations before calling blk_exp_close_all
> 
> or
> 
> * Require userspace to trigger a few specific operations that cause 
> the kernel to return all outstanding CQEs, after which the daemon can 
> send an io_uring_cmd with the IO_URING_F_CANCEL flag to mark all entries 
> as unavailable (FRRS_USERSPACE), i.e. a "delete" operation, ensuring the 
> kernel won't assign them to future VFS requests.
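
Whichever of the two options is chosen, the daemon side needs some
bookkeeping so that the export is not freed while CQEs can still arrive.
A minimal sketch of that bookkeeping in C follows. It is an assumption
about how such tracking could look, not QEMU's actual export code: the
struct export_model and the export_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-export bookkeeping. */
struct export_model {
    unsigned in_flight; /* SQEs submitted whose CQE has not arrived */
    bool closing;       /* shutdown has been requested */
    unsigned handled;   /* CQEs passed on to the request handler */
};

/* Refuse new submissions once shutdown has started. */
static bool export_submit(struct export_model *e)
{
    if (e->closing)
        return false;
    e->in_flight++;
    return true;
}

/* Called for every CQE: always account for it, but only run the
 * request handler while the export is still live. */
static void export_on_cqe(struct export_model *e)
{
    assert(e->in_flight > 0);
    e->in_flight--;
    if (!e->closing)
        e->handled++;
}

/* Start shutdown; returns true if nothing is outstanding. */
static bool export_begin_close(struct export_model *e)
{
    e->closing = true;
    return e->in_flight == 0;
}

/* The export may be freed only after every outstanding CQE is seen. */
static bool export_can_free(const struct export_model *e)
{
    return e->closing && e->in_flight == 0;
}
```

The point of the sketch is only the ordering: stop submitting, let the
outstanding CQEs drain without dispatching them, and free the export
afterwards. How the kernel is made to return those CQEs is the open
question discussed in this thread.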
> 
> 
> 

I have to admit that I'm confused about why you can't use umount. Isn't
that the most graceful way to shut down a connection?

If you need another custom way for some reason, we probably need
to add it.


Thanks,
Bernd
