qemu-devel.nongnu.org archive mirror
* [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
@ 2025-08-01 16:09 Brian Song
  2025-08-03 23:33 ` Brian Song
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Song @ 2025-08-01 16:09 UTC (permalink / raw)
  To: bschubert, qemu-block; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel

Hi Bernd,

We are currently working on implementing termination support for 
fuse-over-io_uring in QEMU, and right now we are focusing on how to 
clean up in-flight SQEs properly. Our main question is about how well 
the kernel supports robust cancellation for these fuse-over-io_uring 
SQEs. Does it actually implement cancellation beyond destroying the 
io_uring queue?

In QEMU FUSE export, we need a way to quickly and cleanly detach from 
the event loop and cancel any pending SQEs when an export is no longer 
in use. Ideally, we want to avoid the more drastic measure of having to 
close the entire /dev/fuse fd just to gracefully terminate outstanding 
operations.

We are not sure if there's an existing code path that supports async 
cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if 
additional callbacks might be needed to fully integrate with the 
kernel's async cancel mechanism. We also realized libfuse manages 
shutdowns differently, typically by signaling a thread via eventfd 
rather than relying on async cancel.

Would love to hear your thoughts or suggestions on this!

Thanks,
Brian


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
  2025-08-01 16:09 [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring Brian Song
@ 2025-08-03 23:33 ` Brian Song
  2025-08-04 11:33   ` Bernd Schubert
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Song @ 2025-08-03 23:33 UTC (permalink / raw)
  To: bschubert, qemu-block; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel



On 2025-08-01 12:09 p.m., Brian Song wrote:
> Hi Bernd,
> 
> We are currently working on implementing termination support for fuse- 
> over-io_uring in QEMU, and right now we are focusing on how to clean up 
> in-flight SQEs properly. Our main question is about how well the kernel 
> supports robust cancellation for these fuse-over-io_uring SQEs. Does it 
> actually implement cancellation beyond destroying the io_uring queue?
> 
> In QEMU FUSE export, we need a way to quickly and cleanly detach from 
> the event loop and cancel any pending SQEs when an export is no longer 
> in use. Ideally, we want to avoid the more drastic measure of having to 
> close the entire /dev/fuse fd just to gracefully terminate outstanding 
> operations.
> 
> We are not sure if there's an existing code path that supports async 
> cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if 
> additional callbacks might be needed to fully integrate with the 
> kernel's async cancel mechanism. We also realized libfuse manages 
> shutdowns differently, typically by signaling a thread via eventfd 
> rather than relying on async cancel.
> 
> Would love to hear your thoughts or suggestions on this!
> 
> Thanks,
> Brian

I looked into the kernel codebase and came up with some initial ideas, 
which might not be entirely accurate:

The IORING_OP_ASYNC_CANCEL operation can only cancel io_uring ring 
resources and a limited set of request types. It does not clean up 
resources related to fuse-over-io_uring, such as in-use entries.
IORING_OP_ASYNC_CANCEL
-> submit/enter
-> io_uring/opdef.c:: .issue = io_async_cancel,
	-> __io_async_cancel
		-> io_try_cancel ==> can only cancel a few request types


Currently, full cleanup of both io_uring and FUSE data structures for 
fuse-over-io_uring only happens in two cases [since we mark these SQEs 
cancelable on every COMMIT_AND_FETCH (mentioned below)]:
1. When the FUSE daemon exits (exit syscall)
2. During execve, which triggers the kernel path:

io_uring_files_cancel =>
io_uring_try_cancel_uring_cmd =>
file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER)
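As a toy illustration of why this path only reaches certain cmds, here is a
minimal Python model (purely illustrative: the names mirror the kernel
functions, but the flag values are made up and not the kernel's actual bit
positions):

```python
# Toy model of the exit/execve cleanup path: only uring_cmds previously
# marked cancelable get their ->uring_cmd() invoked with IO_URING_F_CANCEL.
IO_URING_F_CANCEL = 1 << 0          # illustrative values, not the
IO_URING_F_COMPLETE_DEFER = 1 << 1  # kernel's real flag bits

def io_uring_try_cancel_uring_cmd(cmds):
    cancelled = []
    for cmd in cmds:
        if cmd.get("cancelable"):   # IORING_URING_CMD_CANCELABLE was set
            cmd["flags"] = IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER
            cancelled.append(cmd["name"])
    return cancelled

cmds = [{"name": "fetch-0", "cancelable": True},
        {"name": "other-op", "cancelable": False}]
print(io_uring_try_cancel_uring_cmd(cmds))  # ['fetch-0']
```

So a fetch SQE that was never marked cancelable would simply be skipped by
this cleanup pass.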



Below is a state diagram (mermaid graph) of a fuse_uring entry inside 
the kernel:

graph TD
     A["Userspace daemon"] --> B["FUSE_IO_URING_CMD_REGISTER<br/>Register buffer"]
     B --> C["Create fuse_ring_ent"]
     C --> D["State: FRRS_AVAILABLE<br/>Added to ent_avail_queue"]

     E["FUSE filesystem operation"] --> F["Generate FUSE request"]
     F --> G["fuse_uring_queue_fuse_req()"]
     G --> H{"Check ent_avail_queue"}

     H -->|Entry available| I["Take entry from queue<br/>Assign to FUSE request"]
     H -->|No entry available| J["Request goes to fuse_req_queue and waits"]

     I --> K["fuse_uring_dispatch_ent()"]
     K --> L["State: FRRS_USERSPACE<br/>Move to ent_in_userspace"]
     L --> M["Notify userspace to process"]

     N["Process exit / daemon termination"] --> O["io_uring_try_cancel_uring_cmd()<br/>NOTE: since the entry was marked IORING_URING_CMD_CANCELABLE in the previous fuse_uring_cmd, try_cancel_uring_cmd calls fuse_uring_cmd to 'delete' it"]
     O --> P["fuse_uring_cancel()"]
     P --> Q{"Is entry state AVAILABLE?"}

     Q -->|Yes| R["Equivalent to 'delete': directly change to USERSPACE<br/>Move to ent_in_userspace"]
     Q -->|No| S["Do nothing"]

     R --> T["io_uring_cmd_done(-ENOTCONN)"]
     T --> U["Entry is 'disguised' as completed<br/>Will no longer handle new FUSE requests"]

     V["Practical effects of cancellation:"] --> W["1. Prevent new FUSE requests from using this entry<br/>2. Release io_uring command resources<br/>3. Does not affect already assigned FUSE requests"]



When the kernel is waiting for VFS requests and the corresponding entry 
is idle, its state is FRRS_AVAILABLE. Once a request is handed off to 
the userspace daemon, the entry's state transitions to FRRS_USERSPACE.
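To make the transitions above concrete, here is a minimal user-space model
of that branch of fuse_uring_cancel (sketched in Python for illustration
only; the kernel code is C, and -107 stands in for -ENOTCONN):

```python
from enum import Enum, auto

class EntState(Enum):
    FRRS_AVAILABLE = auto()   # idle, sitting on ent_avail_queue
    FRRS_USERSPACE = auto()   # handed to the daemon (or "deleted")

class RingEnt:
    def __init__(self):
        self.state = EntState.FRRS_AVAILABLE
        self.cqe_res = None   # result posted via io_uring_cmd_done()

def fuse_uring_cancel(ent):
    """Model of the IO_URING_F_CANCEL path: only idle entries are touched."""
    if ent.state == EntState.FRRS_AVAILABLE:
        ent.state = EntState.FRRS_USERSPACE  # equivalent to 'delete'
        ent.cqe_res = -107                   # -ENOTCONN via io_uring_cmd_done()
        return True
    return False  # entries already in userspace are left alone

idle, busy = RingEnt(), RingEnt()
busy.state = EntState.FRRS_USERSPACE
print(fuse_uring_cancel(idle), idle.cqe_res)   # idle entry is 'deleted'
print(fuse_uring_cancel(busy), busy.cqe_res)   # in-flight entry untouched
```

The key point the model captures is that cancellation never touches an entry
that is already processing a request in userspace.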

The fuse_uring_cmd function handles the COMMIT_AND_FETCH operation. If a 
cmd call carries the IO_URING_F_CANCEL flag, fuse_uring_cancel is 
invoked to mark the entry state as FRRS_USERSPACE, making it unavailable 
for future requests from the VFS.

If the IORING_URING_CMD_CANCELABLE flag is not set, we first call 
fuse_uring_prepare_cancel before committing and fetching to mark the 
entry as IORING_URING_CMD_CANCELABLE. This means that if the daemon exits 
or an execve happens during fetch, the kernel can call 
io_uring_try_cancel_uring_cmd to safely clean up these SQEs/CQEs and the 
related FUSE resources.

Back to our previous issue, when deleting a FUSE export in QEMU, we hit 
a crash due to an invalid CQE handler. This happened because the SQEs we 
previously submitted hadn't returned yet by the time we shut down and 
deleted the export.

To avoid this, we must ensure that no further CQEs are returned and 
no CQE handler is triggered. We can either:

* Prevent any further user operations before calling blk_exp_close_all

or

* Require userspace to trigger a few specific operations that cause 
the kernel to return all outstanding CQEs, and then have the daemon send 
an io_uring_cmd with the IO_URING_F_CANCEL flag to mark all entries as 
unavailable (FRRS_USERSPACE, the "delete" operation described above), 
ensuring the kernel won't assign them to future VFS requests.
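The first option boils down to a guard on the QEMU side so that late CQEs
are dropped instead of dispatched to a freed handler. A hypothetical sketch
(Python for illustration; FuseExport, its fields, and -107 for -ENOTCONN are
all made-up names, not QEMU's actual code):

```python
class FuseExport:
    """Toy model: CQEs arriving during teardown must not reach a freed handler."""
    def __init__(self):
        self.shutting_down = False
        self.inflight = 0        # SQEs submitted but no CQE seen yet
        self.results = []

    def submit(self):
        self.inflight += 1

    def on_cqe(self, res):
        self.inflight -= 1
        if self.shutting_down:
            return               # drop the CQE instead of calling the handler
        self.results.append(res)

    def drain_then_free(self, pending_cqes):
        self.shutting_down = True
        for res in pending_cqes:  # e.g. -ENOTCONN completions after cancel
            self.on_cqe(res)
        assert self.inflight == 0  # only now is it safe to free the export

exp = FuseExport()
exp.submit(); exp.submit()
exp.on_cqe(0)                 # normal completion, handled as usual
exp.drain_then_free([-107])   # late CQE during shutdown is ignored
print(exp.results)            # [0]
```

The crash described above corresponds to the missing `shutting_down` check:
freeing the export while `inflight > 0` and still dispatching CQEs.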







* Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
  2025-08-03 23:33 ` Brian Song
@ 2025-08-04 11:33   ` Bernd Schubert
  2025-08-04 12:29     ` Kevin Wolf
  2025-08-05  4:11     ` Brian Song
  0 siblings, 2 replies; 7+ messages in thread
From: Bernd Schubert @ 2025-08-04 11:33 UTC (permalink / raw)
  To: Brian Song, qemu-block@nongnu.org
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel@nongnu.org

Hi Brian,

sorry for my late reply, just back from vacation and fighting through
my mails.

On 8/4/25 01:33, Brian Song wrote:
> 
> 
> On 2025-08-01 12:09 p.m., Brian Song wrote:
>> Hi Bernd,
>>
>> We are currently working on implementing termination support for fuse- 
>> over-io_uring in QEMU, and right now we are focusing on how to clean up 
>> in-flight SQEs properly. Our main question is about how well the kernel 
>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it 
>> actually implement cancellation beyond destroying the io_uring queue?
>>
>> In QEMU FUSE export, we need a way to quickly and cleanly detach from 
>> the event loop and cancel any pending SQEs when an export is no longer 
>> in use. Ideally, we want to avoid the more drastic measure of having to 
>> close the entire /dev/fuse fd just to gracefully terminate outstanding 
>> operations.
>>
>> We are not sure if there's an existing code path that supports async 
>> cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if 
>> additional callbacks might be needed to fully integrate with the 
>> kernel's async cancel mechanism. We also realized libfuse manages 
>> shutdowns differently, typically by signaling a thread via eventfd 
>> rather than relying on async cancel.
>>
>> Would love to hear your thoughts or suggestions on this!
>>
>> Thanks,
>> Brian
> 
> I looked into the kernel codebase and came up with some initial ideas, 
> which might not be entirely accurate:
> 
> The IORING_OP_ASYNC_CANCEL operation can only cancel io_uring ring 
> resources and a limited set of request types. It does not clean up 
> resources related to fuse-over-io_uring, such as in-use entries.
> IORING_OP_ASYNC_CANCEL
> -> submit/enter
> -> io_uring/opdef.c:: .issue = io_async_cancel,
> 	-> __io_async_cancel
> 		-> io_try_cancel ==> Can only cancel few types of requests
> 
> 
> Currently, full cleanup of both io_uring and FUSE data structures for 
> fuse-over-io_uring only happens in two cases:  [since we have mark these 
> SQEs cancelable when we commit_and_fetch everytime(mentioned below)]
> 1.When the FUSE daemon exits (exit syscall)
> 2.During execve, which triggers the kernel path:
> 
> io_uring_files_cancel =>
> io_uring_try_cancel_uring_cmd =>
> file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER)
> 
> 
> 
> Below is a state diagram (mermaid graph) of a fuse_uring entry inside 
> the kernel:
> 
> graph TD
>      A["Userspace daemon"] --> 
> B["FUSE_IO_URING_CMD_REGISTER<br/>Register buffer"]
>      B --> C["Create fuse_ring_ent"]
>      C --> D["State: FRRS_AVAILABLE<br/>Added to ent_avail_queue"]
> 
>      E["FUSE filesystem operation"] --> F["Generate FUSE request"]
>      F --> G["fuse_uring_queue_fuse_req()"]
>      G --> H{"Check ent_avail_queue"}
> 
>      H -->|Entry available| I["Take entry from queue<br/>Assign to FUSE 
> request"]
>      H -->|No entry available| J["Request goes to fuse_req_queue and waits"]
> 
>      I --> K["fuse_uring_dispatch_ent()"]
>      K --> L["State: FRRS_USERSPACE<br/>Move to ent_in_userspace"]
>      L --> M["Notify userspace to process"]
> 
>      N["Process exit / daemon termination"] --> 
> O["io_uring_try_cancel_uring_cmd() <br/> >> NOTE Since we marked the 
> entry IORING_URING_CMD_CANCELABLE <br/> in the previous fuse_uring_cmd , 
> try_cancel_uring_cmd will call <br/> fuse_uring_cmd to 'delete' it <<"]
>      O --> P["fuse_uring_cancel()"]
>      P --> Q{"Is entry state AVAILABLE?"}
> 
>      Q -->|Yes| R[">> equivalent to 'delete' << Directly change to 
> USERSPACE<br/>Move to ent_in_userspace"]
>      Q -->|No| S["Do nothing"]
> 
>      R --> T["io_uring_cmd_done(-ENOTCONN)"]
>      T --> U["Entry is 'disguised' as completed<br/>Will no longer 
> handle new FUSE requests"]
> 
>      V["Practical effects of cancellation:"] --> W["1. Prevent new FUSE 
> requests from using this entry<br/>2. Release io_uring command 
> resources<br/>3. Does not affect already assigned FUSE requests"]
> 
> 
> 
> When the kernel is waiting for VFS requests and the corresponding entry 
> is idle, its state is FRRS_AVAILABLE. Once a request is handed off to 
> the userspace daemon, the entry's state transitions to FRRS_USERSPACE.
> 
> The fuse_uring_cmd function handles the COMMIT_AND_FETCH operation. If a 
> cmd call carries the IO_URING_F_CANCEL flag, fuse_uring_cancel is 
> invoked to mark the entry state as FRRS_USERSPACE, making it unavailable 
> for future requests from the VFS.
> 
> If the IORING_URING_CMD_CANCELABLE flag is not set, before committing 
> and fetching, we first call fuse_uring_prepare_cancel to mark the entry 
> as IORING_URING_CMD_CANCELABLE. This indicates that if the daemon exits 
> or an execve happens during fetch, the kernel can call 
> io_uring_try_cancel_uring_cmd to safely clean up these SQEs/CQEs and 
> related fuse resource.
> 
> Back to our previous issue, when deleting a FUSE export in QEMU, we hit 
> a crash due to an invalid CQE handler. This happened because the SQEs we 
> previously submitted hadn't returned yet by the time we shut down and 
> deleted the export.
> 
> To avoid this, we need to ensure that no further CQEs are returned and 
> no CQE handler is triggered. We need to either:
> 
> * Prevent any further user operations before calling blk_exp_close_all
> 
> or
> 
> * Require the userspace to trigger few specific operations that causes 
> the kernel to return all outstanding CQEs, and then the daemon can send 
> io_uring_cmd with the IO_URING_F_CANCEL flag to mark all entries as 
> unavailable (FRRS_USERSPACE) "delete operation", ensuring the kernel 
> won't assign them to future VFS requests.
> 
> 
> 

I have to admit that I'm confused why you can't use umount; isn't that
the most graceful way to shut down a connection?

If you need another custom way for some reasons, we probably need
to add it.


Thanks,
Bernd


* Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
  2025-08-04 11:33   ` Bernd Schubert
@ 2025-08-04 12:29     ` Kevin Wolf
  2025-08-05  4:11     ` Brian Song
  1 sibling, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2025-08-04 12:29 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Brian Song, qemu-block@nongnu.org, Stefan Hajnoczi,
	qemu-devel@nongnu.org

Hi Bernd,

Am 04.08.2025 um 13:33 hat Bernd Schubert geschrieben:
> Hi Brian,
> 
> sorry for my late reply, just back from vacation and fighting through
> my mails.
> 
> On 8/4/25 01:33, Brian Song wrote:
> > 
> > 
> > On 2025-08-01 12:09 p.m., Brian Song wrote:
> >> Hi Bernd,
> >>
> >> We are currently working on implementing termination support for fuse- 
> >> over-io_uring in QEMU, and right now we are focusing on how to clean up 
> >> in-flight SQEs properly. Our main question is about how well the kernel 
> >> supports robust cancellation for these fuse-over-io_uring SQEs. Does it 
> >> actually implement cancellation beyond destroying the io_uring queue?
> >>
> >> In QEMU FUSE export, we need a way to quickly and cleanly detach from 
> >> the event loop and cancel any pending SQEs when an export is no longer 
> >> in use. Ideally, we want to avoid the more drastic measure of having to 
> >> close the entire /dev/fuse fd just to gracefully terminate outstanding 
> >> operations.
> >> [...]

> I have to admit that I'm confused why you can't use umount, isn't that
> the most graceful way to shutdown a connection?
> 
> If you need another custom way for some reasons, we probably need
> to add it.

Brian focussed on shutdown in his message because that is the scenario
he's seeing right now, but you're right that shutdown probably isn't
that bad and once we unmount the exported image, we can properly shut
down things on the QEMU side, too.

The more challenging part is that sometimes QEMU needs to quiesce an
export so that no new requests can be processed for a short time. Maybe
we're switching processing to a different iothread or something like
this. In this scenario, we don't actually want to unmount the image, but
just cancel any outstanding COMMIT_AND_FETCH request, and soon after
submit a new one to continue processing requests.

If it's impossible to cancel the request in the kernel and not queue a
new request for a little bit (I suppose it would look a bit like userspace
being completely busy processing hypothetical NOP requests), we would
have to introduce some indirections in userspace to handle the case that
CQEs may be posted at times when we don't want to process them, or even
in the ring of the wrong thread (each iothread in QEMU has its own
io_uring instance).

Come to think of it, the next thing the user may want to do might even
be deleting the old thread, which would have to fail while it's still
busy. So I think we do need a way to get rid of requests that it started
and can't just wait until they are used up.

Kevin




* Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
  2025-08-04 11:33   ` Bernd Schubert
  2025-08-04 12:29     ` Kevin Wolf
@ 2025-08-05  4:11     ` Brian Song
  2025-08-07  9:05       ` Bernd Schubert
  1 sibling, 1 reply; 7+ messages in thread
From: Brian Song @ 2025-08-05  4:11 UTC (permalink / raw)
  To: Bernd Schubert, qemu-block@nongnu.org
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel@nongnu.org



On 2025-08-04 7:33 a.m., Bernd Schubert wrote:
> Hi Brian,
> 
> sorry for my late reply, just back from vacation and fighting through
> my mails.
> 
> On 8/4/25 01:33, Brian Song wrote:
>>
>>
>> On 2025-08-01 12:09 p.m., Brian Song wrote:
>>> Hi Bernd,
>>>
>>> We are currently working on implementing termination support for fuse-
>>> over-io_uring in QEMU, and right now we are focusing on how to clean up
>>> in-flight SQEs properly. Our main question is about how well the kernel
>>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it
>>> actually implement cancellation beyond destroying the io_uring queue?
>>> [...]
>>
> 
> I have to admit that I'm confused why you can't use umount, isn't that
> the most graceful way to shutdown a connection?
> 
> If you need another custom way for some reasons, we probably need
> to add it.
> 
> 
> Thanks,
> Bernd

Hi Bernd,

Thanks for your insights!

I think umount doesn't cancel any pending SQEs, right? From what I see, 
the only way to cancel all pending SQEs and transition all entries to 
the FRRS_USERSPACE state (unavailable for further fuse requests) in the 
kernel is by calling io_uring_files_cancel in do_exit, or 
io_uring_task_cancel in begin_new_exec.

From my understanding, QEMU follows an event-driven model. So if we 
don't cancel the SQEs submitted by a connection when it ends, then 
before QEMU exits (after the connection is closed and the associated 
FUSE data structures have been freed) any CQE that comes back will 
trigger QEMU to invoke a previously deleted CQE handler, leading to a 
segfault.

So if the only way to make all pending entries unavailable in the kernel 
is calling do_exit or begin_new_exec, I think we should do some 
workarounds in QEMU.

Thanks,
Brian



* Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
  2025-08-05  4:11     ` Brian Song
@ 2025-08-07  9:05       ` Bernd Schubert
  2025-08-07 15:15         ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Bernd Schubert @ 2025-08-07  9:05 UTC (permalink / raw)
  To: Brian Song, qemu-block@nongnu.org
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel@nongnu.org

Hi Brian,

sorry for the late replies. Totally swamped with work this week, and I
will be off again next week.

On 8/5/25 06:11, Brian Song wrote:
> 
> 
> On 2025-08-04 7:33 a.m., Bernd Schubert wrote:
>> Hi Brian,
>>
>> sorry for my late reply, just back from vacation and fighting through
>> my mails.
>>
>> On 8/4/25 01:33, Brian Song wrote:
>>>
>>>
>>> On 2025-08-01 12:09 p.m., Brian Song wrote:
>>>> Hi Bernd,
>>>>
>>>> We are currently working on implementing termination support for fuse-
>>>> over-io_uring in QEMU, and right now we are focusing on how to clean up
>>>> in-flight SQEs properly. Our main question is about how well the kernel
>>>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it
>>>> actually implement cancellation beyond destroying the io_uring queue?
>>>> [...]
>>>
>>
>> I have to admit that I'm confused why you can't use umount, isn't that
>> the most graceful way to shutdown a connection?
>>
>> If you need another custom way for some reasons, we probably need
>> to add it.
>>
>>
>> Thanks,
>> Bernd
> 
> Hi Bernd,
> 
> Thanks for your insights!
> 
> I think umount doesn't cancel any pending SQEs, right? From what I see, 
> the only way to cancel all pending SQEs and transition all entries to 
> the FRRS_USERSPACE state (unavailable for further fuse requests) in the 
> kernel is by calling io_uring_files_cancel in do_exit, or 
> io_uring_task_cancel in begin_new_exec.

There are two forms of umount:

- Forced umount: immediately cancels the connection and aborts
requests. That also immediately releases pending SQEs.

- Normal umount: destroys the connection and completes SQEs at the end
of umount.

> 
>  From my understanding, QEMU follows an event-driven model. So if we 
> don't cancel the SQEs submitted by a connection when it ends, then 
> before QEMU exits — after the connection is closed and the associated 
> FUSE data structures have been freed — any CQE that comes back will 
> trigger QEMU to invoke a previously deleted CQE handler, leading to a 
> segfault.
> 
> So if the only way to make all pending entries unavailable in the kernel 
> is calling do_exit or begin_new_exec, I think we should do some 
> workarounds in QEMU.

I guess if we find a good argument why qemu needs to complete SQEs
before umount is complete, a kernel patch would be accepted. Doesn't
sound that difficult to create a patch for that, at least for entries
that are in state FRRS_AVAILABLE. I can prepare a patch, but at best
between Saturday and Monday.

Thanks,
Bernd






* Re: [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring
  2025-08-07  9:05       ` Bernd Schubert
@ 2025-08-07 15:15         ` Stefan Hajnoczi
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2025-08-07 15:15 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Brian Song, qemu-block@nongnu.org, Kevin Wolf,
	qemu-devel@nongnu.org


On Thu, Aug 07, 2025 at 09:05:25AM +0000, Bernd Schubert wrote:
> Hi Brian,
> 
> sorry for late replies. Totally swamped in work this week and next week
> will be off another week.
> 
> On 8/5/25 06:11, Brian Song wrote:
> > 
> > 
> > On 2025-08-04 7:33 a.m., Bernd Schubert wrote:
> >> Hi Brian,
> >>
> >> sorry for my late reply, just back from vacation and fighting through
> >> my mails.
> >>
> >> On 8/4/25 01:33, Brian Song wrote:
> >>>
> >>>
> >>> On 2025-08-01 12:09 p.m., Brian Song wrote:
> >>>> Hi Bernd,
> >>>>
> >>>> We are currently working on implementing termination support for fuse-
> >>>> over-io_uring in QEMU, and right now we are focusing on how to clean up
> >>>> in-flight SQEs properly. Our main question is about how well the kernel
> >>>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it
> >>>> actually implement cancellation beyond destroying the io_uring queue?
> >>>> [...]
> >>>
> >>
> >> I have to admit that I'm confused why you can't use umount, isn't that
> >> the most graceful way to shutdown a connection?
> >>
> >> If you need another custom way for some reasons, we probably need
> >> to add it.
> >>
> >>
> >> Thanks,
> >> Bernd
> > 
> > Hi Bernd,
> > 
> > Thanks for your insights!
> > 
> > I think umount doesn't cancel any pending SQEs, right? From what I see, 
> > the only way to cancel all pending SQEs and transition all entries to 
> > the FRRS_USERSPACE state (unavailable for further fuse requests) in the 
> > kernel is by calling io_uring_files_cancel in do_exit, or 
> > io_uring_task_cancel in begin_new_exec.
> 
> There are two umount forms
> 
> - Forced umount - immediately cancels the connection and aborts
> requests. That also immediately releases pending SQEs.
> 
> - Normal umount, destroys the connection and completed SQEs at the end
> of umount.
> 
> > 
> >  From my understanding, QEMU follows an event-driven model. So if we 
> > don't cancel the SQEs submitted by a connection when it ends, then 
> > before QEMU exits — after the connection is closed and the associated 
> > FUSE data structures have been freed — any CQE that comes back will 
> > trigger QEMU to invoke a previously deleted CQE handler, leading to a 
> > segfault.
> > 
> > So if the only way to make all pending entries unavailable in the kernel 
> > is calling do_exit or begin_new_exec, I think we should do some 
> > workarounds in QEMU.
> 
> I guess if we find a good argument why qemu needs to complete SQEs
> before umount is complete a kernel patch would be accepted. Doesn't
> sound that difficult to create patch for that. At least for entries that
> are on state FRRS_AVAILABLE. I can prepare patch, but at best in between
> Saturday and Monday.

Hi Bernd,
QEMU quiesces I/O at certain points, like when the block driver graph is
reconfigured (kind of like changing the device-mapper table in the
kernel) or when threads are reconfigured. This is also used during
termination to stop accepting new I/O and wait until in-flight I/O has
completed.

Ideally io_uring's ASYNC_CANCEL would work on in-flight
FUSE-over-io_uring uring_cmd requests. The REGISTER or COMMIT_AND_FETCH
uring_cmds would complete with -ECANCELED and future FUSE requests would
be queued in the kernel until FUSE-over-io_uring becomes ready again.

If and when userspace becomes ready again, it submits REGISTER
uring_cmds again and queued FUSE requests are then delivered to
userspace.
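The lifecycle described in the two paragraphs above can be sketched as a toy
model (Python, purely illustrative of the desired semantics, not kernel code;
-125 stands in for -ECANCELED):

```python
from collections import deque

class UringQueue:
    """Toy model of the desired semantics: cancelling all entries makes
    new FUSE requests queue in the kernel until userspace re-registers."""
    def __init__(self):
        self.avail = deque()      # registered entries (ent_avail_queue)
        self.pending = deque()    # FUSE requests waiting for an entry
        self.delivered = []       # (entry, request) pairs sent to userspace

    def register(self, ent):      # REGISTER uring_cmd
        self.avail.append(ent)
        self._dispatch()

    def queue_fuse_req(self, req):
        self.pending.append(req)
        self._dispatch()

    def cancel_all(self):         # ASYNC_CANCEL of in-flight uring_cmds
        cancelled = [(e, -125) for e in self.avail]  # complete with -ECANCELED
        self.avail.clear()
        return cancelled

    def _dispatch(self):
        while self.avail and self.pending:
            self.delivered.append((self.avail.popleft(), self.pending.popleft()))

q = UringQueue()
q.register("ent0")
print(q.cancel_all())        # [('ent0', -125)]
q.queue_fuse_req("READ")     # no entries left: request waits in the kernel
print(q.delivered)           # []
q.register("ent1")           # userspace ready again
print(q.delivered)           # [('ent1', 'READ')]
```

This is exactly the quiesce/resume cycle Kevin described earlier in the
thread: requests are neither lost nor delivered while userspace is away.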

Thanks for your help!

Stefan


end of thread, other threads:[~2025-08-07 15:17 UTC | newest]

Thread overview: 7+ messages
2025-08-01 16:09 [QEMU/FUSE] Discussion on Proper Termination and Async Cancellation in fuse-over-io_uring Brian Song
2025-08-03 23:33 ` Brian Song
2025-08-04 11:33   ` Bernd Schubert
2025-08-04 12:29     ` Kevin Wolf
2025-08-05  4:11     ` Brian Song
2025-08-07  9:05       ` Bernd Schubert
2025-08-07 15:15         ` Stefan Hajnoczi