qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Hanna Reitz <hreitz@redhat.com>
Cc: qemu-devel@nongnu.org, "Alberto Faria" <afaria@redhat.com>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	sgarzare@redhat.com,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	qemu-block@nongnu.org, "Eduardo Habkost" <eduardo@habkost.net>,
	"Vladimir Sementsov-Ogievskiy" <v.sementsov-og@mail.ru>,
	"John Snow" <jsnow@redhat.com>, "Thomas Huth" <thuth@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Fam Zheng" <fam@euphon.net>,
	"Yanan Wang" <wangyanan55@huawei.com>
Subject: Re: [RFC v3 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag
Date: Wed, 17 Aug 2022 16:46:55 -0400	[thread overview]
Message-ID: <Yv1Tv83Mcwf6gRes@fedora> (raw)
In-Reply-To: <5ef9e60c-bcb2-7172-4664-688a5f3b2844@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 7011 bytes --]

On Thu, Jul 14, 2022 at 10:54:12AM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > Block drivers may optimize I/O requests accessing buffers previously
> > registered with bdrv_register_buf(). Checking whether all elements of a
> > request's QEMUIOVector are within previously registered buffers is
> > expensive, so we need a hint from the user to avoid costly checks.
> > 
> > Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
> > QEMUIOVector elements in an I/O request are known to be within
> > previously registered buffers.
> > 
> > bdrv_aligned_preadv() is strict in validating supported read flags and
> > its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no
> > harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not
> > support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF.
> > 
> > Care must be taken to clear the flag when the block layer or filter
> > drivers replace QEMUIOVector elements with bounce buffers since these
> > have not been registered with bdrv_register_buf(). A lot of the changes
> > in this commit deal with clearing the flag in those cases.
> > 
> > Ensuring that the flag is cleared properly is somewhat invasive to
> > implement across the block layer and it's hard to spot when future code
> > changes accidentally break it. Another option might be to add a flag to
> > QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify
> > elements. That is more robust but somewhat of a layering violation, so I
> > haven't attempted that.
> 
> Yeah...  I will say that most read code already looks quite reasonable in
> that it’ll pass @flags to lower layers basically only if it’s an unmodified
> request, so it seems like in the past most people have already adhered to
> “don’t pass on any flags if you’re reading to a local bounce buffer”.
> 
> > Signed-off-by: Stefan Hajnoczi<stefanha@redhat.com>
> > ---
> >   include/block/block-common.h |  9 +++++++++
> >   block/blkverify.c            |  4 ++--
> >   block/crypto.c               |  2 ++
> >   block/io.c                   | 30 +++++++++++++++++++++++-------
> >   block/mirror.c               |  2 ++
> >   block/raw-format.c           |  2 ++
> >   6 files changed, 40 insertions(+), 9 deletions(-)
> 
> Some things not covered here that look a bit wrong:
> 
> While bdrv_driver_preadv() asserts that the flags don’t contain anything the
> driver couldn’t handle (and this new flag is made exempt from that assertion
> here in this patch), bdrv_driver_pwritev() just hides those flags from
> drivers silently. I think just like we exempt the new flag from the
> assertion in bdrv_driver_preadv(), we should have bdrv_driver_pwritev()
> always pass it to drivers.
> 
> The following driver read/write functions assert that @flags is 0, which is
> probably no longer ideal:
> - bdrv_qed_co_writev()
> - block_crypto_co_preadv()
> - nbd_client_co_preadv()
> - parallels_co_writev()
> - qcow_co_preadv()
> - qcow_co_pwritev()
> - qemu_gluster_co_writev()
> - raw_co_pwritev() (block/file-posix.c)
> - replication_co_writev()
> - ssh_co_writev()
> - vhdx_co_writev()
> 
> snapshot_access_co_preadv_part() returns an error when any flags are set,
> but should probably ignore BDRV_REQ_REGISTERED_BUF for this check.

The assert(!flags) checks can be removed without losing much safety
since bdrv_driver_preadv/pwritev() prepare the flags bits appropriately
and calls from other locations are rare.

> 
> 
> While looking around, I spotted a couple of places that look like they could
> pass the flag on but currently don’t (just FYI, not asking for anything
> here):
> 
> bdrv_co_do_copy_on_readv() never passes the flags through to its calls, but
> I think it could pass this flag on in the one bdrv_driver_preadv() call
> where it doesn’t use a bounce buffer (“Read directly into the destination”).
> 
> qcow2’s qcow2_co_preadv_task() and qcow2_co_pwritev_task() (besides the
> encryption part) also look like they should pass this flag on, but, well,
> the functions themselves currently don’t get the flag, so they can’t.
> 
> qcow1’s qcow_co_preadv() and qcow_co_pwritev() are so-so, sometimes using a
> bounce buffer, and sometimes not, but those function could use optimization
> in general if anyone cared.
> 
> vpc_co_preadv()’s and vpc_co_pwritev()’s first
> bdrv_co_preadv()/bdrv_co_pwritev() invocations look straightforward, but as
> with qcow1, not sure if anyone cares.
> 
> I’m too lazy to thoroughly check what’s going on with qed_aio_write_main(). 
> Passing 0 is safe, and it doesn’t get the original request flags, so I guess
> doing anything about this would be difficult.
> 
> quorum’s read_fifo_child() probably could pass acb->flags. Probably. 
> Perhaps not.  Difficult to say it is.
> 
> block/replication.c also looks like a candidate for passing flags, but
> personally, I’d like to refrain from touching it.  (Well, besides the fact
> that replication_co_writev() asserts that @flags is 0.)
> 
> 
> (And finally, I found that block/parallels.c invokes bdrv_co_pwritev() with
> a buffer instead of an I/O vector, which looks really wrong, but has nothing
> to do with this patch.)

Thanks for looking at these. I haven't attempted to propagate
BDRV_REQ_REGISTERED_BUF through image format drivers yet so there are
optimization opportunities here.

> [...]
> 
> > diff --git a/block/io.c b/block/io.c
> > index e7f4117fe7..83b8259227 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> 
> [...]
> 
> > @@ -1902,6 +1910,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> >           return -ENOTSUP;
> >       }
> > +    /* By definition there is no user buffer so this flag doesn't make sense */
> > +    if (flags & BDRV_REQ_REGISTERED_BUF) {
> > +        return -EINVAL;
> > +    }
> > +
> 
> Here we return an error when the flag is met...
> 
> >       /* Invalidate the cached block-status data range if this write overlaps */
> >       bdrv_bsc_invalidate_range(bs, offset, bytes);
> > @@ -2187,6 +2200,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
> >       bool padding;
> >       BdrvRequestPadding pad;
> > +    /* This flag doesn't make sense for padding or zero writes */
> > +    flags &= ~BDRV_REQ_REGISTERED_BUF;
> > +
> 
> ...and here we just ignore it.  Why don’t we handle this the same in both of
> these functions?  (And what about bdrv_co_pwrite_zeroes()?)
> 
> Besides that, if we do make it an error, I wonder if it shouldn’t be an
> assertion instead so the duty of clearing the flag falls on the caller.  (I
> personally like just silently clearing it in the zero-write functions,
> though.)

Thanks for catching this. Let's consistently clear
BDRV_REQ_REGISTERED_BUF silently for zero writes.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2022-08-17 20:51 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-08  4:17 [RFC v3 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
2022-07-08  4:17 ` [RFC v3 1/8] blkio: add io_uring block driver using libblkio Stefan Hajnoczi
2022-07-12 14:23   ` Stefano Garzarella
2022-08-11 16:51     ` Stefan Hajnoczi
2022-07-13 12:05   ` Hanna Reitz
2022-08-11 19:08     ` Stefan Hajnoczi
2022-07-27 19:33   ` Kevin Wolf
2022-08-03 12:25     ` Peter Krempa
2022-08-03 13:30       ` Kevin Wolf
2022-08-11 19:09     ` Stefan Hajnoczi
2022-07-08  4:17 ` [RFC v3 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove() Stefan Hajnoczi
2022-07-08  4:17 ` [RFC v3 3/8] block: pass size to bdrv_unregister_buf() Stefan Hajnoczi
2022-07-13 14:08   ` Hanna Reitz
2022-07-08  4:17 ` [RFC v3 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag Stefan Hajnoczi
2022-07-14  8:54   ` Hanna Reitz
2022-08-17 20:46     ` Stefan Hajnoczi [this message]
2022-07-08  4:17 ` [RFC v3 5/8] block: add BlockRAMRegistrar Stefan Hajnoczi
2022-07-14  9:30   ` Hanna Reitz
2022-08-17 20:51     ` Stefan Hajnoczi
2022-07-08  4:17 ` [RFC v3 6/8] stubs: add memory_region_from_host() and memory_region_get_fd() Stefan Hajnoczi
2022-07-14  9:39   ` Hanna Reitz
2022-07-08  4:17 ` [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization Stefan Hajnoczi
2022-07-12 14:28   ` Stefano Garzarella
2022-08-15 20:52     ` Stefan Hajnoczi
2022-07-14 10:13   ` Hanna Reitz
2022-08-18 19:46     ` Stefan Hajnoczi
2022-07-08  4:17 ` [RFC v3 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint Stefan Hajnoczi
2022-07-14 10:16   ` Hanna Reitz
2022-08-15 21:24     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yv1Tv83Mcwf6gRes@fedora \
    --to=stefanha@redhat.com \
    --cc=afaria@redhat.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=f4bug@amsat.org \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=thuth@redhat.com \
    --cc=v.sementsov-og@mail.ru \
    --cc=vsementsov@yandex-team.ru \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).