From: Stefan Hajnoczi <stefanha@redhat.com>
To: Hanna Reitz <hreitz@redhat.com>
Cc: qemu-devel@nongnu.org, "Alberto Faria" <afaria@redhat.com>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	sgarzare@redhat.com,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	qemu-block@nongnu.org, "Eduardo Habkost" <eduardo@habkost.net>,
	"Vladimir Sementsov-Ogievskiy" <v.sementsov-og@mail.ru>,
	"John Snow" <jsnow@redhat.com>, "Thomas Huth" <thuth@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Fam Zheng" <fam@euphon.net>,
	"Yanan Wang" <wangyanan55@huawei.com>
Subject: Re: [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
Date: Thu, 18 Aug 2022 15:46:36 -0400
Message-ID: <Yv6XHAqgx1J3pTTc@fedora>
In-Reply-To: <8c8faae1-9215-9b18-0051-028e78f56009@redhat.com>

On Thu, Jul 14, 2022 at 12:13:53PM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > Avoid bounce buffers when QEMUIOVector elements are within previously
> > registered bdrv_register_buf() buffers.
> > 
> > The idea is that emulated storage controllers will register guest RAM
> > using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF flag on
> > I/O requests. Therefore no blkio_map_mem_region() calls are necessary
> > in the performance-critical I/O code path.
> > 
> > This optimization doesn't apply if the I/O buffer is internally
> > allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
> > path because BDRV_REQ_REGISTERED_BUF is not set.
> 
> Which keeps the question of how slow the slow path actually is
> relevant, i.e. whether it wouldn’t make sense to keep some of the mem
> regions allocated there in a cache instead of allocating/freeing them
> on every I/O request.

Yes, bounce buffer reuse would be possible, but let's keep it simple for
now.
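
Just so the idea isn't lost: a minimal version could keep a free list
of already-mapped bounce buffers and only fall back to
blkio_map_mem_region() on a miss. An untested sketch (BounceBuf and
s->bounce_free_list are made-up names, not part of this series):

typedef struct BounceBuf BounceBuf;

struct BounceBuf {
    void *data;                   /* qemu_memalign()ed, already mapped */
    size_t size;
    QSLIST_ENTRY(BounceBuf) next;
};

static BounceBuf *bounce_buf_get(BDRVBlkioState *s, size_t size)
{
    WITH_QEMU_LOCK_GUARD(&s->lock) {
        BounceBuf *buf = QSLIST_FIRST(&s->bounce_free_list);

        /* Single free list; reuse only if the head is big enough */
        if (buf && buf->size >= size) {
            QSLIST_REMOVE_HEAD(&s->bounce_free_list, next);
            return buf; /* hit: no blkio_map_mem_region() needed */
        }
    }
    return NULL; /* miss: caller allocates and maps as today */
}

static void bounce_buf_put(BDRVBlkioState *s, BounceBuf *buf)
{
    /* Keep the region mapped so future requests can reuse it */
    WITH_QEMU_LOCK_GUARD(&s->lock) {
        QSLIST_INSERT_HEAD(&s->bounce_free_list, buf, next);
    }
}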

> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   block/blkio.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 101 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/blkio.c b/block/blkio.c
> > index 7fbdbd7fae..37d593a20c 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
> 
> [...]
> 
> > @@ -198,6 +203,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
> >           BlockCompletionFunc *cb, void *opaque)
> >   {
> >       BDRVBlkioState *s = bs->opaque;
> > +    bool needs_mem_regions =
> > +        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
> 
> Is that condition sufficient?  bdrv_register_buf() has no way of returning
> an error, so it’s possible that buffers are silently not registered.  (And
> there are conditions in blkio_register_buf() where the buffer will not be
> registered, e.g. because it isn’t aligned.)
> 
> The caller knows nothing of this and will still pass
> BDRV_REQ_REGISTERED_BUF, and then we’ll assume the region is mapped but it
> won’t be.
> 
> >       struct iovec *iov = qiov->iov;
> >       int iovcnt = qiov->niov;
> >       BlkioAIOCB *acb;
> 
> [...]
> 
> > @@ -324,6 +333,80 @@ static void blkio_io_unplug(BlockDriverState *bs)
> >       }
> >   }
> > +static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        error_report_once("%s: skipping unaligned buf %p with size %zu",
> > +                          __func__, host, size);
> > +        return; /* skip unaligned */
> > +    }
> 
> How big is mem-region-alignment generally?  Is it like 4k or is it going to
> be a real issue?

Yes, it's usually the page size of the MMU/IOMMU. vhost-user and VFIO
have the same requirement, so I don't think anything special is
necessary here.
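
For reference, the driver queries the value from libblkio once at open
time; roughly this (paraphrased from memory, not copy-pasted from the
patch):

    uint64_t alignment;

    ret = blkio_get_uint64(s->blkio, "mem-region-alignment", &alignment);
    if (ret < 0) {
        error_setg_errno(errp, -ret,
                         "failed to get mem-region-alignment: %s",
                         blkio_get_error_msg());
        return ret;
    }
    s->mem_region_alignment = alignment;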

> (Also, we could probably register a truncated region.  I know, that’ll break
> the BDRV_REQ_REGISTERED_BUF idea because the caller won’t know we’ve
> truncated it, but that’s no different than just not registering the buffer
> at all.)
> 
> > +
> > +    /* Attempt to find the fd for a MemoryRegion */
> > +    if (s->needs_mem_region_fd) {
> > +        int fd = -1;
> > +        ram_addr_t offset;
> > +        MemoryRegion *mr;
> > +
> > +        /*
> > +         * bdrv_register_buf() is called with the BQL held so mr lives at least
> > +         * until this function returns.
> > +         */
> > +        mr = memory_region_from_host(host, &offset);
> > +        if (mr) {
> > +            fd = memory_region_get_fd(mr);
> > +        }
> 
> I don’t think it’s specified that buffers registered with
> bdrv_register_buf() must be within a single memory region, is it? So can we
> somehow verify that the memory region covers the whole buffer?

You are right, there is no guarantee. However, the range will always be
within a RAMBlock at the moment because the bdrv_register_buf() calls
are driven by a RAMBlock notifier and match the boundaries of the
RAMBlocks.

I will add a check so this starts failing when that assumption is
violated.
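
Something like this in blkio_register_buf() should do (untested
sketch):

    mr = memory_region_from_host(host, &offset);
    if (!mr) {
        error_report_once("%s: no memory region for buf %p with size %zu",
                          __func__, host, size);
        return;
    }
    /* Reject buffers that extend beyond the end of the memory region */
    if (offset + size > memory_region_size(mr)) {
        error_report_once("%s: buf %p with size %zu spans memory regions",
                          __func__, host, size);
        return;
    }
    fd = memory_region_get_fd(mr);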

> 
> > +        if (fd == -1) {
> > +            error_report_once("%s: skipping fd-less buf %p with size %zu",
> > +                              __func__, host, size);
> > +            return; /* skip if there is no fd */
> > +        }
> > +
> > +        region.fd = fd;
> > +        region.fd_offset = offset;
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_map_mem_region(s->blkio, &region);
> > +    }
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to add blkio mem region %p with size %zu: %s",
> > +                          host, size, blkio_get_error_msg());
> > +    }
> > +}
> > +
> > +static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        return; /* skip unaligned */
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_unmap_mem_region(s->blkio, &region);
> > +    }
> 
> The documentation of libblkio says that “memory regions must be
> unmapped/freed with exactly the same `region` field values that they were
> mapped/allocated with.”  We don’t set .fd here, though.

That's a bug. The memory region will not be unmapped because the lookup
in libblkio's HashSet won't find a match. I'll fix the QEMU code to
pass the exact same struct blkio_mem_region fields.
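
The simplest fix is probably to recompute fd/fd_offset in
blkio_unregister_buf() the same way blkio_register_buf() does, e.g.
(sketch, untested):

    if (s->needs_mem_region_fd) {
        ram_addr_t offset;
        MemoryRegion *mr = memory_region_from_host(host, &offset);

        if (!mr) {
            return; /* nothing was registered for this range */
        }
        region.fd = memory_region_get_fd(mr);
        region.fd_offset = offset;
    }

    WITH_QEMU_LOCK_GUARD(&s->lock) {
        ret = blkio_unmap_mem_region(s->blkio, &region);
    }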

> 
> It’s also unclear whether it’s allowed to unmap a region that wasn’t mapped,
> but I’ll trust libblkio to detect that.

Yes, it's a nop.

> 
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to delete blkio mem region %p with size %zu: %s",
> > +                          host, size, blkio_get_error_msg());
> > +    }
> > +}
> > +
> >   static void blkio_parse_filename_io_uring(const char *filename, QDict *options,
> >                                             Error **errp)
> >   {
> 
> [...]
> 
> > @@ -459,7 +553,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
> >           return ret;
> >       }
> > -    bs->supported_write_flags = BDRV_REQ_FUA;
> > +    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
> 
> Shouldn’t we also report it as a supported read flag then?

Yes, thank you!
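
Will change it in the next revision to something like this (assuming a
supported_read_flags counterpart to supported_write_flags, as
introduced earlier in this series):

    bs->supported_read_flags = BDRV_REQ_REGISTERED_BUF;
    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;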

Stefan

