From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Denis V. Lunev" <den@openvz.org>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
qemu-stable@nongnu.org, kwolf@redhat.com, hreitz@redhat.com,
pbonzini@redhat.com
Subject: Re: [PATCH v3 0/1] block/linux-aio: fix reproducible SIGSEGV from unbounded ioq_submit() recursion
Date: Thu, 21 May 2026 11:55:11 -0400
Message-ID: <20260521155511.GA647779@fedora>
In-Reply-To: <20260520142503.251959-1-den@openvz.org>
On Wed, May 20, 2026 at 04:25:02PM +0200, Denis V. Lunev via qemu development wrote:
> Observed in production where a cached-I/O backup path was driven
> through aio=native, making io_submit(2) complete synchronously and
> closing the recursion cycle. On the supported aio=native + cache=none
> + qcow2 configuration the cycle stays bounded by accident rather than
> by construction; this patch bounds it explicitly.
>
> Bisect:
>
>   v8.1.0 (forward edge only)       no crash / 20
>   84d61e5f36^                      no crash / 20
>   84d61e5f36 (backward edge in)    crash at attempt 17
>   v8.2.0                           crash at attempt 4
>   master + this patch              no crash / 80
>
> The closing commit is 84d61e5f36 ("virtio: use defer_call() in
> virtio_irqfd_notify()").
>
> No iotest: crash rate is 6..17 per 20 on unpatched master; a formal
> test would be flaky. The vmdk + aio=native + cache=none shape is
> not otherwise exercised by the suite.
>
> --- gen-workload.py -----------------------------------------------
> #!/usr/bin/env python3
> import random, sys
> REGION = 32 * 1024 * 1024
> CLUSTER = 64 * 1024
> SEED = 0xC0FFEE
> def main(out):
>     r = random.Random(SEED); ops = []
>     for _ in range(10000):
>         off = r.randrange(0, REGION - 4096) & ~4095
>         ops.append("aio_write -q %d 4k" % off)
>     for i in range(10000):
>         size, n = ("64k", 65536) if i < 5000 else ("128k", 131072)
>         off = r.randrange(0, REGION - n) & ~(CLUSTER - 1)
>         ops.append("aio_write -q -z -u %d %s" % (off, size))
>     r.shuffle(ops); ops.append("aio_flush")
>     open(out, "w").write("\n".join(ops) + "\n")
> if __name__ == "__main__":
>     main(sys.argv[1] if len(sys.argv) > 1 else "t.cmds")
> -------------------------------------------------------------------
>
> --- repro.sh ------------------------------------------------------
> #!/bin/bash
> set -u
> qimg=$1; qio=$2; label=$3; attempts=${4:-20}
> cmds=${5:-$(dirname "$0")/t.cmds}
> vmdk=/tmp/t.$label.vmdk; log=/tmp/repro_$label.log
> : > "$log"
> for i in $(seq 1 "$attempts"); do
>     rm -f "$vmdk"
>     "$qimg" create -f vmdk "$vmdk" 256M >/dev/null 2>&1
>     "$qio" -f vmdk -n --cache=none --aio=native "$vmdk" < "$cmds" \
>         >>"$log" 2>&1
>     rc=$?
>     [ $rc -ge 128 ] && { echo "CRASH attempt $i rc=$rc" >>"$log"; break; }
> done
> echo "DONE $label rc=$rc attempt=$i" >> "$log"
> -------------------------------------------------------------------
>
> python3 gen-workload.py t.cmds
> ./repro.sh /path/to/qemu-img /path/to/qemu-io test 20
>
> Notes:
>
> * IOQ_SUBMIT_MAX_DEPTH = 8. Round headroom over the bounded depth
> of the supported async-completion path.
> * Per-thread __thread counter, matching util/defer-call.c's storage.
> A per-LinuxAioState field would let multiple devices on one
> thread recurse independently.
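
For readers following the notes above, here is a minimal standalone sketch of a per-thread recursion depth guard of the kind described. It is not the patch and not QEMU code. Only the constant IOQ_SUBMIT_MAX_DEPTH = 8 and the use of a __thread counter come from the notes; submit_sketch(), deferred_work, max_depth_seen and the "defer when too deep" behaviour are hypothetical names and assumptions chosen for illustration. __thread is a GCC/Clang extension.

```c
#include <assert.h>

#define IOQ_SUBMIT_MAX_DEPTH 8      /* value quoted in the notes above */

/* Hypothetical per-thread nesting counter (illustrative name). */
static __thread unsigned ioq_submit_depth;

/* Demo-only bookkeeping, not part of any real submit path. */
unsigned max_depth_seen;
unsigned deferred_work;

/*
 * Stand-in for a submit function whose completions re-enter it
 * synchronously. Each pending item "completes" immediately and
 * triggers another submit, which models the recursion cycle.
 */
void submit_sketch(unsigned pending)
{
    if (pending == 0) {
        return;
    }
    if (ioq_submit_depth >= IOQ_SUBMIT_MAX_DEPTH) {
        /* Assumption: past the bound, leave the work for a later pass
         * instead of recursing further. */
        deferred_work += pending;
        return;
    }

    ioq_submit_depth++;
    if (ioq_submit_depth > max_depth_seen) {
        max_depth_seen = ioq_submit_depth;
    }

    submit_sketch(pending - 1);     /* synchronous completion re-enters */

    assert(ioq_submit_depth > 0);
    ioq_submit_depth--;
}
```

Calling submit_sketch(100) from a fresh thread nests at most IOQ_SUBMIT_MAX_DEPTH frames and leaves the remaining items counted in deferred_work, so stack use stays bounded regardless of how many completions arrive synchronously.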
>
> Changes from v2:
> * moved depth guard to struct qemu_laiocb (suggestion from Stefan)
>
> Changes from v1:
> * removed all downstream marks
>
> Denis V. Lunev (1):
> block/linux-aio: bound ioq_submit() recursion depth
>
> block/linux-aio.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Hanna Reitz <hreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> --
> 2.51.0
>
>
Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block
Stefan