All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] block/linux-aio: bound ioq_submit() recursion depth
@ 2026-04-28 10:58 Denis V. Lunev via qemu development
  2026-05-18 17:13 ` Denis V. Lunev
  0 siblings, 1 reply; 4+ messages in thread
From: Denis V. Lunev via qemu development @ 2026-04-28 10:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, qemu-stable, Denis V. Lunev, Kevin Wolf, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini

qemu_laio_process_completions() wraps its body in defer_call_begin /
defer_call_end. Inside the section, completion callbacks wake coroutines
that queue new aiocbs; laio_do_submit() defers laio_deferred_fn. At the
bottom of qemu_laio_process_completions() the defer_call_end() fires
laio_deferred_fn, which calls ioq_submit(), closing the cycle:

  ioq_submit
    -> io_submit(2)                           // some sync completions
    -> qemu_laio_process_completions          // defer_call_begin
         -> aio_co_wake                       // resumes coroutine
              -> laio_do_submit
                   -> defer_call(laio_deferred_fn, s)   // enqueued
         -> defer_call_end                    // nesting drops to 0
              -> laio_deferred_fn
                   -> ioq_submit              // +1 stack frame, loop

When io_submit(2) returns asynchronously (O_DIRECT) the cycle
terminates in one extra frame: the fresh aiocb is still in flight, no
completion is drained, no coroutine wakes, no new submission queues.
When submissions complete synchronously (non-O_DIRECT, or per-descriptor
drivers such as vmdk) each level enqueues more work for the next
defer_call_end() to drain, so recursion grows without bound and QEMU
crashes with SIGSEGV on the thread guard page.

The cycle was closed by two performance commits, each correct in
isolation:

  076682885d ("block/linux-aio: convert to blk_io_plug_call() API")
    -- introduced laio_deferred_fn and wired
       laio_do_submit -> defer_call(laio_deferred_fn, s).

  84d61e5f36 ("virtio: use defer_call() in virtio_irqfd_notify()")
    -- added defer_call_begin/end around qemu_laio_process_completions
       so virtio-irqfd notifications batch across a completion pass.

The supported aio=native + cache=none pairing keeps submissions
asynchronous, so the cycle stays bounded; nothing in the code enforces
that contract. Observed in production as a SIGSEGV during a backup job
configured with --cached + aio=native; reproducible on upstream with
qemu-io against vmdk.

Cap ioq_submit() recursion with a per-thread counter. On overflow,
return without submitting. The pending work is drained by
s->completion_bh, which qemu_laio_process_completions() has already
scheduled on entry -- no work is lost; one event-loop round-trip of
latency is paid only when the bound is hit, which cannot happen on a
supported configuration.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
---
 block/linux-aio.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 0a7424fbb3..f98bb6e766 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -36,6 +36,19 @@
 /* Maximum number of requests in a batch. (default value) */
 #define DEFAULT_MAX_BATCH 32
 
+/*
+ * Bound on how deep ioq_submit() may recurse on a single thread via the
+ * ioq_submit -> qemu_laio_process_completions -> defer_call_end ->
+ * laio_deferred_fn -> ioq_submit cycle. The cycle terminates naturally
+ * when io_submit(2) returns asynchronously (O_DIRECT), but can grow
+ * without bound when submissions complete synchronously. On overflow
+ * the caller returns without submitting; the outermost
+ * qemu_laio_process_completions() has already scheduled s->completion_bh
+ * (via qemu_bh_schedule() at the top of that function), which resumes
+ * submission from the next event-loop dispatch.
+ */
+#define IOQ_SUBMIT_MAX_DEPTH 8
+
 struct qemu_laiocb {
     Coroutine *co;
     LinuxAioState *ctx;
@@ -80,6 +93,9 @@ struct LinuxAioState {
 static void ioq_submit(LinuxAioState *s);
 static int laio_do_submit(struct qemu_laiocb *laiocb);
 
+/* Per-thread recursion counter for ioq_submit(). See IOQ_SUBMIT_MAX_DEPTH. */
+static __thread unsigned ioq_submit_depth;
+
 static inline ssize_t io_event_ret(struct io_event *ev)
 {
     return (ssize_t)(((uint64_t)ev->res2 << 32) | ev->res);
@@ -340,6 +356,11 @@ static void ioq_submit(LinuxAioState *s)
     QEMU_UNINITIALIZED struct iocb *iocbs[MAX_EVENTS];
     QSIMPLEQ_HEAD(, qemu_laiocb) completed;
 
+    if (ioq_submit_depth >= IOQ_SUBMIT_MAX_DEPTH) {
+        return;
+    }
+    ioq_submit_depth++;
+
     do {
         if (s->io_q.in_flight >= MAX_EVENTS) {
             break;
@@ -385,6 +406,8 @@ static void ioq_submit(LinuxAioState *s)
          * pended requests will be submitted from there.
          */
     }
+
+    ioq_submit_depth--;
 }
 
 static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-20 14:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 10:58 [PATCH 1/1] block/linux-aio: bound ioq_submit() recursion depth Denis V. Lunev via qemu development
2026-05-18 17:13 ` Denis V. Lunev
2026-05-19 20:19   ` Stefan Hajnoczi
2026-05-20 14:25     ` Denis V. Lunev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.