QEMU-Devel Archive on lore.kernel.org
From: Stefan Hajnoczi <stefanha@redhat.com>
To: kwolf@redhat.com
Cc: "Denis V. Lunev" <den@openvz.org>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-stable@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	Fiona Ebner <f.ebner@proxmox.com>,
	"Denis V. Lunev" <den@virtuozzo.com>
Subject: Re: [PATCH 0/2] block: fix two missed-wakeup hangs on shutdown path
Date: Tue, 12 May 2026 14:28:34 -0400	[thread overview]
Message-ID: <20260512182834.GA912491@fedora> (raw)
In-Reply-To: <055dc8bf-f5f8-44e3-b1da-b21e05b70d94@virtuozzo.com>


On Mon, May 11, 2026 at 11:53:37PM +0200, Denis V. Lunev wrote:
> On 4/24/26 12:39, Denis V. Lunev wrote:
> > Problem
> > -------
> >
> > The qemu shutdown / blockdev-close path can deadlock permanently on
> > upstream master.  The main thread enters ppoll(timeout=-1) holding
> > BQL, no other thread has a wake source that points back at it, and
> > qemu has to be SIGKILLed.  The hang has no timeout -- it is a hard
> > deadlock, not a slow operation; RCU, vCPUs, and every iothread
> > path that needs the BQL stall behind it.
> >
> > Two independent missed-wakeup races in the block layer contribute.
> > Both share the same shape: a waiter arms on one side, the waker
> > reads stale state on its fast path and silently skips the kick, and
> > nothing else on the AioContext will fire to recover.  They are
> > different bugs in different subsystems and each patch stands on its
> > own; they are posted together because they surface through the same
> > test and the same symptom and are easiest to diagnose side by side.
> >
> > Depending on which race fires, the main thread backtrace at the
> > moment of hang is one of:
> >
> >   ppoll -> aio_poll -> bdrv_graph_wrlock -> blk_remove_bs
> >       (patch 1 -- block/graph-lock)
> >
> >   ppoll -> aio_poll -> cache_clean_timer_del_and_wait -> qcow2_close
> >       (patch 2 -- block/qcow2 cache_clean_timer)
> >
> > Race diagrams and the exact stale-state read are in each patch's
> > commit message.
> >
> > Reproducer
> > ----------
> >
> > Environment used for the numbers below: 4-vCPU VM guest,
> > kernel 6.12.x, upstream master at bb230769b4.  On modern bare-metal
> > the window is narrow enough that the hangs rarely reproduce without
> > a VM -- a VM guest under full CPU saturation is what makes the
> > timing reliable.  Downstream trees that still use plain
> > bdrv_graph_wrlock() in blk_remove_bs() hit the graph-lock race on
> > the first iteration without any stress at all.
> >
> >     # reproducer
> >     stress-ng --cpu "$(nproc)" --timeout 0 &
> >     for r in $(seq 20); do
> >         timeout 120 ./build/tests/qemu-iotests/check -qcow2 iothreads-create
> >     done
> >     kill %1
> >
> > With `stress-ng --cpu $(nproc)` both races surface.  With
> > `stress-ng --cpu $(($(nproc) - 1))` or without a stressor neither
> > reproduces reliably across 20 iterations.
> >
> > When a race fires, the Python QMP client times out on vm.run_job()
> > after 5 s, the qemu process keeps running but never makes forward
> > progress, and the outer `timeout 120` eventually kills it.  Attach
> > gdb before the timeout kills qemu to capture the stack and
> > distinguish which of the two races fired.
> >
> > Results
> > -------
> >
> > Same guest, 20 iterations of the loop above:
> >
> >   upstream master:            10/20 FAIL (first fail at iter #2)
> >   master + both patches:      20/20 PASS
> >
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Hanna Reitz <hreitz@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Cc: Fiona Ebner <f.ebner@proxmox.com>
> > Cc: Hanna Czenczek <hreitz@redhat.com>
> >
> > Denis V. Lunev (2):
> >   block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock()
> >   block/qcow2: fix hangup in cache_clean_timer cancellation
> >
> >  block/graph-lock.c | 12 +++++-------
> >  block/qcow2.c      | 28 +++++++++++++++++-----------
> >  2 files changed, 22 insertions(+), 18 deletions(-)
> >
> > --
> > 2.51.0
> ping

Hi Kevin,
This looks like a series for your block tree. If I can help in some way,
please let me know.

Stefan


Thread overview: 5+ messages
2026-04-24 10:39 [PATCH 0/2] block: fix two missed-wakeup hangs on shutdown path Denis V. Lunev via qemu development
2026-04-24 10:39 ` [PATCH 1/2] block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock() Denis V. Lunev via qemu development
2026-04-24 10:39 ` [PATCH 2/2] block/qcow2: fix hangup in cache_clean_timer cancellation Denis V. Lunev via qemu development
2026-05-11 21:53 ` [PATCH 0/2] block: fix two missed-wakeup hangs on shutdown path Denis V. Lunev
2026-05-12 18:28   ` Stefan Hajnoczi [this message]
