All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: "Denis V. Lunev" <den@openvz.org>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-stable@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Fiona Ebner <f.ebner@proxmox.com>
Subject: Re: [PATCH 1/2] block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock()
Date: Fri, 15 May 2026 15:17:05 +0200	[thread overview]
Message-ID: <agcc0dEwnb_knUE7@redhat.com> (raw)
In-Reply-To: <20260424103917.248668-2-den@openvz.org>

Am 24.04.2026 um 12:39 hat Denis V. Lunev geschrieben:
> tests/qemu-iotests/tests/iothreads-create reproduces the hang on
> master under `stress-ng --cpu $(nproc) --timeout 0`.  The iotest's
> vm.run_job() times out and qemu stays permanently stuck in
> ppoll(timeout=-1) inside bdrv_graph_wrlock_drained -> blk_remove_bs
> during qemu_cleanup().  The timing window is narrow on modern
> bare-metal hardware and much wider in a VM guest; downstream trees
> that still use plain bdrv_graph_wrlock() in blk_remove_bs() hit it
> on the first iteration under the same stress.
> 
> bdrv_graph_wrlock() zeroes has_writer around its AIO_WAIT_WHILE loop
> so that callbacks dispatched by aio_poll() can still take the read
> lock on the fast path.  The rdunlock side, however, only kicks a
> waiting writer when has_writer is observed set; a reader that drops
> its lock inside the polling window silently returns and nothing ever
> wakes the writer:
> 
>   main thread                         iothread0 coroutine
>   -----------                         -------------------
>   bdrv_graph_wrlock:                  rdlock held, reader_count=1
>     bdrv_drain_all_begin_nopoll
>     has_writer = 0
>     AIO_WAIT_WHILE_UNLOCKED(
>         NULL, reader_count >= 1):
>       num_waiters++
>       smp_mb
>       aio_poll(main_ctx, true)   -->  bdrv_graph_co_rdunlock:
>         (ppoll, blocked)                reader_count-- -> 0
>                                         smp_mb
>                                         read has_writer = 0
>                                         skip aio_wait_kick()
>                                       return
> 
> reader_count is now 0 and num_waiters is still 1, but no BH, fd or
> timer on the main AioContext will fire -- the only entity that could
> kick just decided it did not have to.  Main stays in ppoll() holding
> BQL, so RCU, VCPUs and any iothread path that needs BQL stall behind
> it.  The hang is final; no timeout, no forward progress, no recovery
> as there is no other source of wake up inside qemu_cleanup().
> 
> bdrv_drain_all_begin() does not close the race on its own: it
> quiesces in-flight I/O, but graph readers also include non-I/O
> coroutines (block-job cleanup, virtio-scsi polling) that drain does
> not evict.  The bdrv_graph_wrlock_drained() wrapper narrows the
> window but does not eliminate it; every plain bdrv_graph_wrlock()
> site is exposed on the same basis.
> 
> Drop the has_writer check in bdrv_graph_co_rdunlock() and call
> aio_wait_kick() unconditionally.  The helper itself loads num_waiters
> atomically and only schedules a dummy BH when a waiter exists, so the
> change is a no-op on the no-writer path and closes the missed-wakeup
> on the writer path.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Hanna Reitz <hreitz@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Fiona Ebner <f.ebner@proxmox.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Thanks, applied to the block branch.

Kevin



  reply	other threads:[~2026-05-15 13:18 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 10:39 [PATCH 0/2] block: fix two missed-wakeup hangs on shutdown path Denis V. Lunev via qemu development
2026-04-24 10:39 ` [PATCH 1/2] block/graph-lock: fix missed wakeup in bdrv_graph_co_rdunlock() Denis V. Lunev via qemu development
2026-05-15 13:17   ` Kevin Wolf [this message]
2026-04-24 10:39 ` [PATCH 2/2] block/qcow2: fix hangup in cache_clean_timer cancellation Denis V. Lunev via qemu development
2026-05-15 12:58   ` Kevin Wolf
2026-05-18 17:26     ` Denis V. Lunev
2026-05-11 21:53 ` [PATCH 0/2] block: fix two missed-wakeup hangs on shutdown path Denis V. Lunev
2026-05-12 18:28   ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agcc0dEwnb_knUE7@redhat.com \
    --to=kwolf@redhat.com \
    --cc=den@openvz.org \
    --cc=f.ebner@proxmox.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.