From: Paolo Bonzini <pbonzini@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>, qemu-devel@nongnu.org
Cc: "Julia Suvorova" <jusual@redhat.com>,
"Kevin Wolf" <kwolf@redhat.com>, "Peter Lieven" <pl@kamp.de>,
"Coiby Xu" <Coiby.Xu@gmail.com>,
xen-devel@lists.xenproject.org,
"Richard Henderson" <richard.henderson@linaro.org>,
"Stefano Garzarella" <sgarzare@redhat.com>,
qemu-block@nongnu.org, "Eduardo Habkost" <eduardo@habkost.net>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Paul Durrant" <paul@xen.org>,
"Richard W.M. Jones" <rjones@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Aarushi Mehta" <mehta.aaru20@gmail.com>,
"Stefano Stabellini" <sstabellini@kernel.org>,
"Fam Zheng" <fam@euphon.net>,
"David Woodhouse" <dwmw2@infradead.org>,
"Stefan Weil" <sw@weilnetz.de>,
"Juan Quintela" <quintela@redhat.com>,
"Xie Yongji" <xieyongji@bytedance.com>,
"Hanna Reitz" <hreitz@redhat.com>,
"Ronnie Sahlberg" <ronniesahlberg@gmail.com>,
eesposit@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Anthony Perard" <anthony.perard@citrix.com>
Subject: Re: [PATCH 00/13] block: remove aio_disable_external() API
Date: Tue, 4 Apr 2023 15:43:20 +0200 [thread overview]
Message-ID: <261efade-683e-84dc-d402-7143be7199c3@redhat.com> (raw)
In-Reply-To: <20230403183004.347205-1-stefanha@redhat.com>
On 4/3/23 20:29, Stefan Hajnoczi wrote:
> The aio_disable_external() API temporarily suspends file descriptor monitoring
> in the event loop. The block layer uses this to prevent new I/O requests being
> submitted from the guest and elsewhere between bdrv_drained_begin() and
> bdrv_drained_end().
>
> While the block layer still needs to prevent new I/O requests in drained
> sections, the aio_disable_external() API can be replaced with
> .drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
> BlockDevOps.
>
> This newer .bdrained_begin/end/poll() approach is attractive because it works
> without specifying a specific AioContext. The block layer is moving towards
> multi-queue and that means multiple AioContexts may be processing I/O
> simultaneously.
>
> The aio_disable_external() was always somewhat hacky. It suspends all file
> descriptors that were registered with is_external=true, even if they have
> nothing to do with the BlockDriverState graph nodes that are being drained.
> It's better to solve a block layer problem in the block layer than to have an
> odd event loop API solution.
>
> That covers the motivation for this change, now on to the specifics of this
> series:
>
> While it would be nice if a single conceptual approach could be applied to all
> is_external=true file descriptors, I ended up looking at callers on a
> case-by-case basis. There are two general ways I migrated code away from
> is_external=true:
>
> 1. Block exports are typically best off unregistering fds in .drained_begin()
> and registering them again in .drained_end(). The .drained_poll() function
> waits for in-flight requests to finish using a reference counter.
>
> 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
> simpler. They can rely on BlockBackend's request queuing during drain
> feature. Guest I/O request coroutines are suspended in a drained section and
> resume upon the end of the drained section.
Sorry, I disagree with this.
Request queuing was shown to cause deadlocks; Hanna's latest patch is
piling another hack upon it, instead in my opinion we should go in the
direction of relying _less_ (or not at all) on request queuing.
I am strongly convinced that request queuing must apply only after
bdrv_drained_begin has returned, which would also fix the IDE TRIM bug
reported by Fiona Ebner. The possible livelock scenario is generally
not a problem because 1) outside an iothread you have anyway the BQL
that prevents a vCPU from issuing more I/O operations during
bdrv_drained_begin 2) in iothreads you have aio_disable_external()
instead of .drained_begin().
It is also less tidy to start a request during the drained_begin phase,
because a request that has been submitted has to be completed (cancel
doesn't really work).
So in an ideal world, request queuing would not only apply only after
bdrv_drained_begin has returned, it would log a warning and
.drained_begin() should set up things so that there are no such warnings.
Thanks,
Paolo
next prev parent reply other threads:[~2023-04-04 13:44 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
2023-04-03 20:47 ` Philippe Mathieu-Daudé
2023-04-04 13:06 ` Stefan Hajnoczi
2023-04-04 13:47 ` Paolo Bonzini
2023-04-04 14:38 ` Michael S. Tsirkin
2023-04-03 18:29 ` [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
2023-04-04 13:38 ` Paolo Bonzini
2023-04-03 18:29 ` [PATCH 03/13] block/export: only acquire AioContext once for vhost_user_server_stop() Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
2023-04-04 13:39 ` Paolo Bonzini
2023-04-03 18:29 ` [PATCH 05/13] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 06/13] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 07/13] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
2023-04-04 9:52 ` David Woodhouse
2023-04-03 18:30 ` [PATCH 09/13] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 10/13] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
2023-04-04 13:46 ` Paolo Bonzini
2023-04-04 21:01 ` Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 12/13] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 13/13] aio: remove aio_disable_external() API Stefan Hajnoczi
2023-04-04 9:16 ` Juan Quintela
2023-04-04 9:38 ` Dr. David Alan Gilbert
2023-04-04 13:43 ` Paolo Bonzini [this message]
2023-04-04 21:04 ` [PATCH 00/13] block: " Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=261efade-683e-84dc-d402-7143be7199c3@redhat.com \
--to=pbonzini@redhat.com \
--cc=Coiby.Xu@gmail.com \
--cc=anthony.perard@citrix.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=dwmw2@infradead.org \
--cc=eduardo@habkost.net \
--cc=eesposit@redhat.com \
--cc=fam@euphon.net \
--cc=hreitz@redhat.com \
--cc=jusual@redhat.com \
--cc=kwolf@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mehta.aaru20@gmail.com \
--cc=mst@redhat.com \
--cc=paul@xen.org \
--cc=philmd@linaro.org \
--cc=pl@kamp.de \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=richard.henderson@linaro.org \
--cc=rjones@redhat.com \
--cc=ronniesahlberg@gmail.com \
--cc=sgarzare@redhat.com \
--cc=sstabellini@kernel.org \
--cc=stefanha@redhat.com \
--cc=sw@weilnetz.de \
--cc=xen-devel@lists.xenproject.org \
--cc=xieyongji@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).