qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
	f.ebner@proxmox.com
Subject: [PATCH 0/3] block-backend: avoid deadlocks due to early queuing of request
Date: Wed,  5 Apr 2023 18:17:49 +0200	[thread overview]
Message-ID: <20230405161752.194727-1-pbonzini@redhat.com> (raw)

IDE TRIM is a BB user that wants to elevate its BB's in-flight counter
for a "macro" operation that consists of several actual I/O operations.
Each of those operations is individually started and awaited.  It does
this so that blk_drain() will drain the whole TRIM, and not just a
single one of the many discard operations it may encompass.

When request queuing is enabled, this leads to a deadlock: The currently
ongoing discard is drained, and the next one is queued, waiting for the
drain to stop.  Meanwhile, TRIM still keeps the in-flight counter
elevated, waiting for all discards to stop -- which will never happen,
because with the in-flight counter elevated, the BB is never considered
drained, so the drained section does not begin and cannot end.

Draining has two purposes, granting exclusive access to a BlockDriverState
and waiting for all previous requests to complete.  Request queuing was
introduced mostly to ensure exclusive access to the BlockDriverState.
However, the implementation is stricter: it prevents new requests from
being submitted to the BlockDriverState, not allowing them to start
instead of just letting them complete before bdrv_drained_begin() returns.

The reason for this was to ensure progress and avoid a livelock
in blk_drain(), blk_drain_all_begin(), bdrv_drained_begin() or
bdrv_drain_all_begin(), if there is an endless stream of requests to
a BlockBackend.  However, as proved by the IDE TRIM testcase, the current
implementation of request queuing is prone to deadlocks and hard to fix
in different ways---even though Hanna tried, all of her attempts were
unsatisfactory one way or the other.

As the name suggests, deadlocks are worse than livelocks :) so let's
avoid them: turn the request queuing on only after the BlockBackend has
quiesced, and leave the second functionality of bdrv_drained_begin()
to the BQL or to the individual BlockDevOps implementations.

And in fact, this is not really a problem.  Of the various users of
BlockBackend, all of them can avoid the livelock:

- for a device that runs in the vCPU thread, requests will only be
submitted while holding the big QEMU lock, meaning they _won't_ be
submitted during bdrv_drained_begin() or bdrv_drain_all_begin().

- for anything that is blocked by aio_disable_external(), the iothread
will not be woken up.  There is still the case of polling, which has
to be disabled with patch 1.  This is slightly hackish but anyway
aio_disable_external() is going away, meaning that these cases will
fall under the third bucket...

- ... i.e. BlockBackends that can use a .drained_begin callback in
their BlockDevOps to temporarily stop I/O submissions.  Note that this
callback is not _absolutely_ necessary, in particular it is not needed
for safety because the patches do not get away with request queuing.

In the end, request queuing should indeed be unnecessary if .drained_begin
is implemented properly in all BlockDevOps.  It should be possible to warn
if a request come at the wrong time.  However, this is left for later.

Paolo


Based-on: <20230405101634.10537-1-pbonzini@redhat.com>


Paolo Bonzini (3):
  aio-posix: disable polling after aio_disable_external()
  block-backend: make global properties write-once
  block-backend: delay application of request queuing

 block/block-backend.c             | 61 +++++++++++++++++++++----------
 block/commit.c                    |  4 +-
 block/export/export.c             |  2 +-
 block/mirror.c                    |  4 +-
 block/parallels.c                 |  2 +-
 block/qcow.c                      |  2 +-
 block/qcow2.c                     |  2 +-
 block/qed.c                       |  2 +-
 block/stream.c                    |  4 +-
 block/vdi.c                       |  2 +-
 block/vhdx.c                      |  2 +-
 block/vmdk.c                      |  4 +-
 block/vpc.c                       |  2 +-
 include/sysemu/block-backend-io.h |  6 +--
 nbd/server.c                      |  3 +-
 tests/unit/test-bdrv-drain.c      |  4 +-
 tests/unit/test-block-iothread.c  |  2 +-
 util/aio-posix.c                  |  2 +-
 18 files changed, 66 insertions(+), 44 deletions(-)

-- 
2.39.2



             reply	other threads:[~2023-04-05 16:18 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-05 16:17 Paolo Bonzini [this message]
2023-04-05 16:17 ` [PATCH 1/3] aio-posix: disable polling after aio_disable_external() Paolo Bonzini
2023-04-05 16:17 ` [PATCH 2/3] block-backend: make global properties write-once Paolo Bonzini
2023-04-05 16:30 ` [PATCH] block-backend: delay application of request queuing Paolo Bonzini
2023-04-05 16:31 ` [PATCH 3/3] " Paolo Bonzini
2023-04-12 11:54   ` Hanna Czenczek
2023-04-12 12:03     ` Paolo Bonzini
2023-04-12 14:33       ` Hanna Czenczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230405161752.194727-1-pbonzini@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=f.ebner@proxmox.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).