From: Stefan Hajnoczi <stefanha@redhat.com>
To: Hanna Reitz <hreitz@redhat.com>
Cc: qemu-devel@nongnu.org, "Alberto Faria" <afaria@redhat.com>,
"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Laurent Vivier" <lvivier@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
sgarzare@redhat.com,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
qemu-block@nongnu.org, "Eduardo Habkost" <eduardo@habkost.net>,
"Vladimir Sementsov-Ogievskiy" <v.sementsov-og@mail.ru>,
"John Snow" <jsnow@redhat.com>, "Thomas Huth" <thuth@redhat.com>,
"Kevin Wolf" <kwolf@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Fam Zheng" <fam@euphon.net>,
"Yanan Wang" <wangyanan55@huawei.com>
Subject: Re: [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
Date: Thu, 18 Aug 2022 15:46:36 -0400
Message-ID: <Yv6XHAqgx1J3pTTc@fedora>
In-Reply-To: <8c8faae1-9215-9b18-0051-028e78f56009@redhat.com>
On Thu, Jul 14, 2022 at 12:13:53PM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > Avoid bounce buffers when QEMUIOVector elements are within previously
> > registered bdrv_register_buf() buffers.
> >
> > The idea is that emulated storage controllers will register guest RAM
> > using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF flag on I/O
> > requests. Therefore no blkio_map_mem_region() calls are necessary in the
> > performance-critical I/O code path.
> >
> > This optimization doesn't apply if the I/O buffer is internally
> > allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
> > path because BDRV_REQ_REGISTERED_BUF is not set.
>
> Which keeps the question relevant of how slow the slow path is, i.e. whether
> it wouldn’t make sense to keep some of the mem regions allocated there in a
> cache instead of allocating/freeing them on every I/O request.
Yes, bounce buffer reuse would be possible, but let's keep it simple for
now.
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > block/blkio.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 101 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/blkio.c b/block/blkio.c
> > index 7fbdbd7fae..37d593a20c 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
>
> [...]
>
> > @@ -198,6 +203,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
>                                      BlockCompletionFunc *cb, void *opaque)
> >  {
> >      BDRVBlkioState *s = bs->opaque;
> > +    bool needs_mem_regions =
> > +        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
>
> Is that condition sufficient? bdrv_register_buf() has no way of returning
> an error, so it’s possible that buffers are silently not registered. (And
> there are conditions in blkio_register_buf() where the buffer will not be
> registered, e.g. because it isn’t aligned.)
>
> The caller knows nothing of this and will still pass
> BDRV_REQ_REGISTERED_BUF, and then we’ll assume the region is mapped but it
> won’t be.
>
> >      struct iovec *iov = qiov->iov;
> >      int iovcnt = qiov->niov;
> >      BlkioAIOCB *acb;
>
> [...]
>
> > @@ -324,6 +333,80 @@ static void blkio_io_unplug(BlockDriverState *bs)
> >      }
> >  }
> > +static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        error_report_once("%s: skipping unaligned buf %p with size %zu",
> > +                          __func__, host, size);
> > +        return; /* skip unaligned */
> > +    }
>
> How big is mem-region-alignment generally? Is it like 4k or is it going to
> be a real issue?
Yes, it's usually the page size of the MMU/IOMMU. vhost-user and VFIO
have the same requirements so I don't think anything special is
necessary.
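For reference, the alignment test in blkio_register_buf() can be sketched
on its own as below. This is a minimal standalone illustration, not the
driver code: it assumes the alignment equals the host page size reported by
sysconf(_SC_PAGESIZE), whereas the driver reads s->mem_region_alignment
from libblkio, and it uses a bit mask where the driver uses %. Folding the
address and the length into one test with | is only valid because the
alignment is a power of two, which page sizes are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Assumption for this sketch: the required alignment is the host page size.
 * Falls back to 4 KiB if sysconf() cannot report it.
 */
static size_t host_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);

    return ps > 0 ? (size_t)ps : 4096;
}

/*
 * Returns true if both the start address and the length are multiples of
 * align. OR-ing the two values lets one mask test cover both, which is only
 * valid because align is a power of two.
 */
static bool buf_is_aligned(const void *host, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (((uintptr_t)host | size) & (align - 1)) == 0;
}
```

For example, with align = 4096 a buffer at 0x10000 of 8192 bytes passes,
while a length of 8191 or a start address of 0x10200 fails.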
> (Also, we could probably register a truncated region. I know, that’ll break
> the BDRV_REQ_REGISTERED_BUF idea because the caller won’t know we’ve
> truncated it, but that’s no different than just not registering the buffer
> at all.)
>
> > +
> > +    /* Attempt to find the fd for a MemoryRegion */
> > +    if (s->needs_mem_region_fd) {
> > +        int fd = -1;
> > +        ram_addr_t offset;
> > +        MemoryRegion *mr;
> > +
> > +        /*
> > +         * bdrv_register_buf() is called with the BQL held so mr lives at least
> > +         * until this function returns.
> > +         */
> > +        mr = memory_region_from_host(host, &offset);
> > +        if (mr) {
> > +            fd = memory_region_get_fd(mr);
> > +        }
>
> I don’t think it’s specified that buffers registered with
> bdrv_register_buf() must be within a single memory region, is it? So can we
> somehow verify that the memory region covers the whole buffer?
You are right, there is no guarantee. However, the range will always be
within a RAMBlock at the moment because the bdrv_register_buf() calls
are driven by a RAMBlock notifier and match the boundaries of the
RAMBlocks.
I will add a check so this starts failing when that assumption is
violated.
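One possible shape for that check, sketched with plain integers so it
stands alone. buf_within_region() and assert_buf_within_region() are
hypothetical helpers, not existing QEMU functions. In the driver, offset
would come from memory_region_from_host() and region_len would have to be
the size of whatever that offset is relative to; which size that is exactly
is an assumption to verify. The test never computes offset + size, so it
cannot wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the buffer [offset, offset + size)
 * lies entirely inside a region of region_len bytes, where offset is the
 * buffer's start relative to the start of that region.
 *
 * Two comparisons are used so that offset + size is never computed and
 * therefore cannot overflow.
 */
static bool buf_within_region(uint64_t offset, uint64_t size,
                              uint64_t region_len)
{
    return offset <= region_len && size <= region_len - offset;
}

/*
 * Hypothetical registration-time check: abort if a caller registers a
 * buffer that runs past the end of the region it starts in.
 */
static void assert_buf_within_region(uint64_t offset, uint64_t size,
                                     uint64_t region_len)
{
    assert(buf_within_region(offset, size, region_len));
}
```

Asserting is the strictest option. Logging with error_report_once() and
skipping the buffer, as the other checks in blkio_register_buf() already
do, would be the gentler alternative.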
>
> > +        if (fd == -1) {
> > +            error_report_once("%s: skipping fd-less buf %p with size %zu",
> > +                              __func__, host, size);
> > +            return; /* skip if there is no fd */
> > +        }
> > +
> > +        region.fd = fd;
> > +        region.fd_offset = offset;
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_map_mem_region(s->blkio, &region);
> > +    }
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to add blkio mem region %p with size %zu: %s",
> > +                          host, size, blkio_get_error_msg());
> > +    }
> > +}
> > +
> > +static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        return; /* skip unaligned */
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_unmap_mem_region(s->blkio, &region);
> > +    }
>
> The documentation of libblkio says that “memory regions must be
> unmapped/freed with exactly the same `region` field values that they were
> mapped/allocated with.” We don’t set .fd here, though.
That's a bug. The memory region will not be unmapped because libblkio's
HashSet won't match. I'll fix the QEMU code to pass the exact same
struct blkio_mem_region fields.
>
> It’s also unclear whether it’s allowed to unmap a region that wasn’t mapped,
> but I’ll trust libblkio to detect that.
Yes, it's a nop.
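A minimal sketch of one way to satisfy the "exact same field values" rule,
which also covers the never-mapped case: remember each struct at map time
and hand that exact copy back at unmap time. struct mem_region is a
hypothetical, cut-down stand-in for struct blkio_mem_region (the real
struct has more fields), and the fixed-size table and its function names
are invented for illustration. Recomputing fd and fd_offset in the
unregister path the same way the register path does would also work;
storing the struct just avoids keeping two code paths in sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct blkio_mem_region. */
struct mem_region {
    void *addr;
    size_t len;
    int64_t fd_offset;
    int fd;
};

#define MAX_REGIONS 64

/*
 * Hypothetical table of regions whose map call succeeded. Not thread-safe:
 * callers would hold the same lock as around the map/unmap calls.
 */
struct region_table {
    struct mem_region regions[MAX_REGIONS];
    size_t count;
};

/* Record a region after it was mapped. Returns false if the table is full. */
static bool region_table_add(struct region_table *t,
                             const struct mem_region *r)
{
    if (t->count == MAX_REGIONS) {
        return false;
    }
    t->regions[t->count++] = *r;
    return true;
}

/*
 * Find the region recorded for (addr, len), copy it to *out and drop it
 * from the table. *out then carries exactly the fields used at map time,
 * including fd and fd_offset, and can be passed to the unmap call.
 * Returns false if nothing was recorded, e.g. because registration skipped
 * an unaligned or fd-less buffer; the caller then has nothing to unmap.
 */
static bool region_table_take(struct region_table *t, void *addr, size_t len,
                              struct mem_region *out)
{
    assert(t->count <= MAX_REGIONS);

    for (size_t i = 0; i < t->count; i++) {
        if (t->regions[i].addr == addr && t->regions[i].len == len) {
            *out = t->regions[i];
            /* Order is irrelevant: move the last entry into the hole. */
            t->regions[i] = t->regions[--t->count];
            return true;
        }
    }
    return false;
}
```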
>
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to delete blkio mem region %p with size %zu: %s",
> > +                          host, size, blkio_get_error_msg());
> > +    }
> > +}
> > +
> >  static void blkio_parse_filename_io_uring(const char *filename, QDict *options,
> >                                            Error **errp)
> >  {
>
> [...]
>
> > @@ -459,7 +553,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
> >          return ret;
> >      }
> > -    bs->supported_write_flags = BDRV_REQ_FUA;
> > +    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
>
> Shouldn’t we also report it as a supported read flag then?
Yes, thank you!
Stefan