All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Denis V. Lunev" <den@virtuozzo.com>
Cc: "Denis V. Lunev" <den@openvz.org>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-stable@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
	Hanna Reitz <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 1/1] block/linux-aio: bound ioq_submit() recursion depth
Date: Tue, 19 May 2026 16:19:42 -0400	[thread overview]
Message-ID: <20260519201942.GA335826@fedora> (raw)
In-Reply-To: <9fc3e8ab-c2a3-4573-9fb4-764d8321bae5@virtuozzo.com>

[-- Attachment #1: Type: text/plain, Size: 5502 bytes --]

On Mon, May 18, 2026 at 07:13:56PM +0200, Denis V. Lunev wrote:
> On 4/28/26 12:58, Denis V. Lunev wrote:
> > qemu_laio_process_completions() wraps its body in defer_call_begin /
> > defer_call_end. Inside the section, completion callbacks wake coroutines
> > that queue new aiocbs; laio_do_submit() defers laio_deferred_fn. At the
> > bottom of qemu_laio_process_completions() the defer_call_end() fires
> > laio_deferred_fn, which calls ioq_submit(), closing the cycle:
> >
> >   ioq_submit
> >     -> io_submit(2)                           // some sync completions
> >     -> qemu_laio_process_completions          // defer_call_begin
> >          -> aio_co_wake                       // resumes coroutine
> >               -> laio_do_submit
> >                    -> defer_call(laio_deferred_fn, s)   // enqueued
> >          -> defer_call_end                    // nesting drops to 0
> >               -> laio_deferred_fn
> >                    -> ioq_submit              // +1 stack frame, loop
> >
> > When io_submit(2) returns asynchronously (O_DIRECT) the cycle
> > terminates in one extra frame: the fresh aiocb is still in flight, no
> > completion is drained, no coroutine wakes, no new submission queues.
> > When submissions complete synchronously (non-O_DIRECT, or per-descriptor
> > drivers such as vmdk) each level enqueues more work for the next
> > defer_call_end() to drain, so recursion grows without bound and QEMU
> > crashes with SIGSEGV on the thread guard page.
> >
> > The cycle was closed by two performance commits, each correct in
> > isolation:
> >
> >   076682885d ("block/linux-aio: convert to blk_io_plug_call() API")
> >     -- introduced laio_deferred_fn and wired
> >        laio_do_submit -> defer_call(laio_deferred_fn, s).
> >
> >   84d61e5f36 ("virtio: use defer_call() in virtio_irqfd_notify()")
> >     -- added defer_call_begin/end around qemu_laio_process_completions
> >        so virtio-irqfd notifications batch across a completion pass.
> >
> > The supported aio=native + cache=none pairing keeps submissions
> > asynchronous, so the cycle stays bounded; nothing in the code enforces
> > that contract. Observed in production as a SIGSEGV during a backup job
> > configured with --cached + aio=native; reproducible on upstream with
> > qemu-io against vmdk.
> >
> > Cap ioq_submit() recursion with a per-thread counter. On overflow,
> > return without submitting. The pending work is drained by
> > s->completion_bh, which qemu_laio_process_completions() has already
> > scheduled on entry -- no work is lost; one event-loop round-trip of
> > latency is paid only when the bound is hit, which cannot happen on a
> > supported configuration.
> >
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > CC: Kevin Wolf <kwolf@redhat.com>
> > CC: Hanna Reitz <hreitz@redhat.com>
> > CC: Stefan Hajnoczi <stefanha@redhat.com>
> > CC: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  block/linux-aio.c | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> >
> > diff --git a/block/linux-aio.c b/block/linux-aio.c
> > index 0a7424fbb3..f98bb6e766 100644
> > --- a/block/linux-aio.c
> > +++ b/block/linux-aio.c
> > @@ -36,6 +36,19 @@
> >  /* Maximum number of requests in a batch. (default value) */
> >  #define DEFAULT_MAX_BATCH 32
> >  
> > +/*
> > + * Bound on how deep ioq_submit() may recurse on a single thread via the
> > + * ioq_submit -> qemu_laio_process_completions -> defer_call_end ->
> > + * laio_deferred_fn -> ioq_submit cycle. The cycle terminates naturally
> > + * when io_submit(2) returns asynchronously (O_DIRECT), but can grow
> > + * without bound when submissions complete synchronously. On overflow
> > + * the caller returns without submitting; the outermost
> > + * qemu_laio_process_completions() has already scheduled s->completion_bh
> > + * (via qemu_bh_schedule() at the top of that function), which resumes
> > + * submission from the next event-loop dispatch.
> > + */
> > +#define IOQ_SUBMIT_MAX_DEPTH 8
> > +
> >  struct qemu_laiocb {
> >      Coroutine *co;
> >      LinuxAioState *ctx;
> > @@ -80,6 +93,9 @@ struct LinuxAioState {
> >  static void ioq_submit(LinuxAioState *s);
> >  static int laio_do_submit(struct qemu_laiocb *laiocb);
> >  
> > +/* Per-thread recursion counter for ioq_submit(). See IOQ_SUBMIT_MAX_DEPTH. */
> > +static __thread unsigned ioq_submit_depth;
> > +
> >  static inline ssize_t io_event_ret(struct io_event *ev)
> >  {
> >      return (ssize_t)(((uint64_t)ev->res2 << 32) | ev->res);
> > @@ -340,6 +356,11 @@ static void ioq_submit(LinuxAioState *s)
> >      QEMU_UNINITIALIZED struct iocb *iocbs[MAX_EVENTS];
> >      QSIMPLEQ_HEAD(, qemu_laiocb) completed;
> >  
> > +    if (ioq_submit_depth >= IOQ_SUBMIT_MAX_DEPTH) {
> > +        return;
> > +    }
> > +    ioq_submit_depth++;
> > +
> >      do {
> >          if (s->io_q.in_flight >= MAX_EVENTS) {
> >              break;
> > @@ -385,6 +406,8 @@ static void ioq_submit(LinuxAioState *s)
> >           * pended requests will be submitted from there.
> >           */
> >      }
> > +
> > +    ioq_submit_depth--;
> >  }
> >  
> >  static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
> ping

Sorry, I missed this.

Please add a field to LaioQueue instead of adding a __thread variable.
LaioQueue is only accessed from a single thread and that's where the
state lives.

Stefan


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2026-05-19 20:20 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28 10:58 [PATCH 1/1] block/linux-aio: bound ioq_submit() recursion depth Denis V. Lunev via qemu development
2026-05-18 17:13 ` Denis V. Lunev
2026-05-19 20:19   ` Stefan Hajnoczi [this message]
2026-05-20 14:25     ` Denis V. Lunev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260519201942.GA335826@fedora \
    --to=stefanha@redhat.com \
    --cc=den@openvz.org \
    --cc=den@virtuozzo.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.