From: Stefan Hajnoczi <stefanha@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Denis V. Lunev" <den@openvz.org>, "Peter Xu" <peterx@redhat.com>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	qemu-block@nongnu.org, "John Snow" <jsnow@redhat.com>,
	integration@gluster.org,
	"Vladimir Sementsov-Ogievskiy" <v.sementsov-og@mail.ru>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Raphael Norwitz" <raphael.norwitz@nutanix.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Fam Zheng" <fam@euphon.net>,
	sgarzare@redhat.com, "Alberto Faria" <afaria@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Wen Congyang" <wencongyang2@huawei.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Jeff Cody" <codyprime@gmail.com>,
	"Xie Changlong" <xiechanglong.d@gmail.com>
Subject: Re: [RFC v4 11/11] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
Date: Tue, 30 Aug 2022 16:16:24 -0400
Message-ID: <Yw5wGEhdsztxhV2s@fedora>
In-Reply-To: <9f6d41c6-6d67-611b-a8b6-2a1a93242ff4@redhat.com>

On Thu, Aug 25, 2022 at 09:43:16AM +0200, David Hildenbrand wrote:
> On 23.08.22 21:22, Stefan Hajnoczi wrote:
> > On Tue, Aug 23, 2022 at 10:01:59AM +0200, David Hildenbrand wrote:
> >> On 23.08.22 00:24, Stefan Hajnoczi wrote:
> >>> Register guest RAM using BlockRAMRegistrar and set the
> >>> BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
> >>> accesses in I/O requests.
> >>>
> >>> This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
> >>> on DMA mapping/unmapping.
> >>
> >> Can you explain why we're monitoring RAMRegistrar to hook into "guest
> >> RAM" and not go the usual path of the MemoryListener?
> > 
> > The requirements are similar to VFIO, which uses RAMBlockNotifier. We
> 
> Only VFIO NVME uses RAMBlockNotifier. Ordinary VFIO uses the MemoryListener.
> 
> Maybe the difference is that ordinary VFIO has to replicate the actual
> guest physical memory layout, and VFIO NVME is only interested in
> possible guest RAM inside guest physical memory.
> 
> > need to learn about all guest RAM because that's where I/O buffers are
> > located.
> > 
> > Do you think RAMBlockNotifier should be avoided?
> 
> I assume it depends on the use case. For saying "this might be used for
> I/O" it might be good enough I guess.
> 
> > 
> >> What will BDRV_REQ_REGISTERED_BUF actually do? Pin all guest memory in
> >> the worst case, the way io_uring fixed buffers would (I hope not)?
> > 
> > BDRV_REQ_REGISTERED_BUF is a hint that no bounce buffer is necessary
> > because the I/O buffer is located in memory that was previously
> > registered with bdrv_register_buf().
> > 
> > The RAMBlockNotifier calls bdrv_register_buf() to let the libblkio
> > driver know about RAM. Some libblkio drivers ignore this hint, io_uring
> > may use the fixed buffers feature, vhost-user sends the shared memory
> > file descriptors to the vhost device server, and VFIO/vhost may pin
> > pages.
> > 
> > So the blkio block driver doesn't add anything new, it's the union of
> > VFIO/vhost/vhost-user/etc memory requirements.
> 
> The issue is if that backend pins memory inside any of these regions.
> Then, you're instantly incompatible with anything that relies on sparse
> RAMBlocks, such as memory ballooning or virtio-mem, and have to properly
> fence it.
> 
> In that case, you'd have to successfully trigger
> ram_block_discard_disable(true) first, before pinning. Who would do that
> now conditionally, just like e.g., VFIO does?
> 
> io_uring fixed buffers would be one such example that pins memory and is
> problematic. vfio (unless on s390x) is another example, as you point out.

Okay, I think libblkio needs to expose a bool property called
"mem-regions-pinned" so QEMU knows whether or not the registered buffers
will be pinned.

Then the QEMU BlockDriver can do:

  if (mem_regions_pinned) {
      if (ram_block_discard_disable(true) < 0) {
          ...fail to open block device...
      }
  }

Does that sound right?

Is "pinned" the best word to describe this or is there a more general
characteristic we are looking for?
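
To make the open/close pairing concrete, here is a minimal standalone
sketch, not the actual blkio driver code. ram_block_discard_disable() is
the real QEMU function name, but it is stubbed here so the example
compiles outside the QEMU tree; the -EBUSY failure in the stub is an
assumption about how the real function refuses. SketchBlkioState,
sketch_blkio_open(), sketch_blkio_close(), and the two stub variables are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Stub standing in for QEMU's ram_block_discard_disable(). It only
 * counts calls; the real function lives in QEMU and has more state.
 */
static int discard_disabled_count;
static bool discard_required; /* simulates ballooning/virtio-mem in use */

static int ram_block_discard_disable(bool state)
{
    if (state) {
        if (discard_required) {
            return -EBUSY; /* assumed failure mode, see lead-in */
        }
        discard_disabled_count++;
    } else {
        assert(discard_disabled_count > 0); /* calls must be balanced */
        discard_disabled_count--;
    }
    return 0;
}

/* Hypothetical per-device state, not the real driver structure. */
typedef struct {
    bool discard_disabled; /* true if close must undo the disable */
} SketchBlkioState;

/* Returns 0 on success or a negative errno if the device cannot open. */
static int sketch_blkio_open(SketchBlkioState *s, bool mem_regions_pinned)
{
    s->discard_disabled = false;

    if (mem_regions_pinned) {
        int ret = ram_block_discard_disable(true);
        if (ret < 0) {
            return ret; /* fail to open block device */
        }
        s->discard_disabled = true;
    }
    return 0;
}

/* Re-enable discards only if this device disabled them at open time. */
static void sketch_blkio_close(SketchBlkioState *s)
{
    if (s->discard_disabled) {
        ram_block_discard_disable(false);
        s->discard_disabled = false;
    }
}
```

The point of tracking discard_disabled per device is that a driver that
never pins memory leaves ballooning and virtio-mem untouched, while a
pinning driver undoes its disable exactly once on close.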

> 
> This has to be treated with care. Another thing to consider is that
> different backends might only support a limited number of such regions.
> I assume there is a way for QEMU to query this limit upfront? It might
> be required for memory hot(un)plug to figure out how many memory slots
> we actually have (for ordinary DIMMs, and if we ever want to make this
> compatible with virtio-mem, it might be required as well when the backend
> pins memory).

Yes, libblkio reports the maximum number of blkio_mem_regions supported
by the device. The property is called "max-mem-regions".

The QEMU BlockDriver currently doesn't use this information. Are there
any QEMU APIs that should be called to propagate this value?
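
Until that question is answered, the limit could at least be enforced
locally in the driver. The following is only a sketch under that
assumption: SketchRegionBudget, sketch_region_add(), and
sketch_region_del() are hypothetical names, and max_mem_regions is
assumed to have been filled in from the "max-mem-regions" property
beforehand. It does not show how the value would be propagated to
memory hot(un)plug.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bookkeeping for registered memory regions. */
typedef struct {
    uint64_t max_mem_regions; /* assumed to come from "max-mem-regions" */
    uint64_t num_mem_regions; /* regions currently registered */
} SketchRegionBudget;

/* Account for one more region; returns -ENOSPC when the limit is hit. */
static int sketch_region_add(SketchRegionBudget *b)
{
    if (b->num_mem_regions >= b->max_mem_regions) {
        return -ENOSPC;
    }
    b->num_mem_regions++;
    return 0;
}

/* Release one previously accounted region. */
static void sketch_region_del(SketchRegionBudget *b)
{
    assert(b->num_mem_regions > 0);
    b->num_mem_regions--;
}
```

A failed sketch_region_add() would then map onto the error return that
patch 06 of this series adds to bdrv_register_buf().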

Stefan

Thread overview: 23+ messages
2022-08-22 22:23 [RFC v4 00/11] blkio: add libblkio BlockDriver Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 01/11] blkio: add libblkio block driver Stefan Hajnoczi
2022-08-30  7:30   ` Markus Armbruster
2022-08-30 20:18     ` Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 02/11] numa: call ->ram_block_removed() in ram_block_notifer_remove() Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 03/11] block: pass size to bdrv_unregister_buf() Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 04/11] block: use BdrvRequestFlags type for supported flag fields Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 05/11] block: add BDRV_REQ_REGISTERED_BUF request flag Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 06/11] block: return errors from bdrv_register_buf() Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 07/11] block: add BlockRAMRegistrar Stefan Hajnoczi
2022-08-22 22:23 ` [RFC v4 08/11] exec/cpu-common: add qemu_ram_get_fd() Stefan Hajnoczi
2022-08-22 22:24 ` [RFC v4 09/11] stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd() Stefan Hajnoczi
2022-08-22 22:24 ` [RFC v4 10/11] blkio: implement BDRV_REQ_REGISTERED_BUF optimization Stefan Hajnoczi
2022-08-22 22:24 ` [RFC v4 11/11] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint Stefan Hajnoczi
2022-08-23  8:01   ` David Hildenbrand
2022-08-23 19:22     ` Stefan Hajnoczi
2022-08-25  7:43       ` David Hildenbrand
2022-08-30 20:16         ` Stefan Hajnoczi [this message]
2022-09-02  8:06           ` David Hildenbrand
2022-09-05 20:50             ` Stefan Hajnoczi
2022-08-23 17:31 ` [RFC v4 00/11] blkio: add libblkio BlockDriver Vladimir Sementsov-Ogievskiy
2022-08-23 20:35   ` Stefan Hajnoczi
2022-08-31 19:44   ` Stefan Hajnoczi