From: Stefan Hajnoczi <stefanha@redhat.com>
To: Mattias Nissler <mnissler@rivosinc.com>
Cc: qemu-devel@nongnu.org, john.levon@nutanix.com,
"Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"David Hildenbrand" <david@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Jagannathan Raman" <jag.raman@oracle.com>,
"Peter Xu" <peterx@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH v3 4/5] vfio-user: Message-based DMA support
Date: Thu, 14 Sep 2023 15:04:48 -0400
Message-ID: <20230914190448.GC1066211@fedora>
In-Reply-To: <20230907130410.498935-5-mnissler@rivosinc.com>
On Thu, Sep 07, 2023 at 06:04:09AM -0700, Mattias Nissler wrote:
> Wire up support for DMA for the case where the vfio-user client does not
> provide mmap()-able file descriptors, but DMA requests must be performed
> via the VFIO-user protocol. This installs an indirect memory region,
> which already works for pci_dma_{read,write}, and pci_dma_map works
> thanks to the existing DMA bounce buffering support.
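To spell out the mechanics for other readers: without an mmap()-able fd
the device model's pci_dma_read()/pci_dma_write() calls are dispatched
through the MemoryRegionOps callbacks added below instead of going
through a mapped host pointer. Roughly (illustrative sketch only, with
a hypothetical device pci_dev, guest address gpa, and buffer buf):

  uint8_t buf[64];

  /* Routed through the memory core into vfu_dma_read() below, so this
   * becomes one or more VFIO-user DMA read messages rather than a
   * memcpy from mapped guest RAM. */
  if (pci_dma_read(pci_dev, gpa, buf, sizeof(buf)) != MEMTX_OK) {
      /* handle the DMA error */
  }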
>
> Note that while simple scenarios work with this patch, there's a known
> race condition in libvfio-user that will mess up the communication
> channel. See https://github.com/nutanix/libvfio-user/issues/279 for
> details as well as a proposed fix.
>
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> ---
> hw/remote/trace-events | 2 +
> hw/remote/vfio-user-obj.c | 84 +++++++++++++++++++++++++++++++++++----
> 2 files changed, 79 insertions(+), 7 deletions(-)
>
> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> index 0d1b7d56a5..358a68fb34 100644
> --- a/hw/remote/trace-events
> +++ b/hw/remote/trace-events
> @@ -9,6 +9,8 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x"
> vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
> vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
> vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
> +vfu_dma_read(uint64_t gpa, size_t len) "vfu: DMA read 0x%"PRIx64", %zu bytes"
> +vfu_dma_write(uint64_t gpa, size_t len) "vfu: DMA write 0x%"PRIx64", %zu bytes"
> vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
> vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
> vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 8b10c32a3c..cee5e615a9 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -300,6 +300,63 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> return count;
> }
>
> +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
> + unsigned size, MemTxAttrs attrs)
> +{
> + MemoryRegion *region = opaque;
> + VfuObject *o = VFU_OBJECT(region->owner);
> + uint8_t buf[sizeof(uint64_t)];
> +
> + trace_vfu_dma_read(region->addr + addr, size);
> +
> + dma_sg_t *sg = alloca(dma_sg_size());
Variable-length arrays have recently been removed from QEMU, and
alloca(3) is a similar case. An example is commit
b3c8246750b7077add335559341268f2956f6470 ("hw/nvme: Avoid dynamic stack
allocation").

libvfio-user returns a sane sizeof(struct dma_sg) value, so we don't
need to worry about bogus sizes and the risk is low here. However, it's
hard to scan for and forbid the dangerous alloca(3) calls when
exceptions are made for some alloca(3) uses.

I would avoid alloca(3) and instead use:

  g_autofree dma_sg_t *sg = g_malloc(dma_sg_size());
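A nice side effect is that g_autofree also covers the early return on
error, e.g. (untested sketch of the resulting read path):

  g_autofree dma_sg_t *sg = g_malloc(dma_sg_size());

  vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
  if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
      vfu_sgl_read(o->vfu_ctx, sg, 1, buf) != 0) {
      return MEMTX_ERROR; /* sg is freed automatically here */
  }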
> + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> + if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
> + vfu_sgl_read(o->vfu_ctx, sg, 1, buf) != 0) {
> + return MEMTX_ERROR;
> + }
> +
> + *val = ldn_he_p(buf, size);
> +
> + return MEMTX_OK;
> +}
> +
> +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
> + unsigned size, MemTxAttrs attrs)
> +{
> + MemoryRegion *region = opaque;
> + VfuObject *o = VFU_OBJECT(region->owner);
> + uint8_t buf[sizeof(uint64_t)];
> +
> + trace_vfu_dma_write(region->addr + addr, size);
> +
> + stn_he_p(buf, size, val);
> +
> + dma_sg_t *sg = alloca(dma_sg_size());
Same alloca(3) concern here.
> + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> + if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
> + vfu_sgl_write(o->vfu_ctx, sg, 1, buf) != 0) {
> + return MEMTX_ERROR;
> + }
> +
> + return MEMTX_OK;
> +}
> +
> +static const MemoryRegionOps vfu_dma_ops = {
> + .read_with_attrs = vfu_dma_read,
> + .write_with_attrs = vfu_dma_write,
> + .endianness = DEVICE_HOST_ENDIAN,
> + .valid = {
> + .min_access_size = 1,
> + .max_access_size = 8,
> + .unaligned = true,
> + },
> + .impl = {
> + .min_access_size = 1,
> + .max_access_size = 8,
> + },
> +};
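The DEVICE_HOST_ENDIAN choice paired with the ldn_he_p()/stn_he_p()
calls above looks right to me: the load and store are exact inverses,
so the payload bytes pass through unmodified in both directions. As a
standalone illustration (not patch code):

  uint8_t buf[sizeof(uint64_t)];
  uint64_t val = 0x0123456789abcdefULL;

  stn_he_p(buf, 4, val);              /* store 4 bytes in host order */
  g_assert(ldn_he_p(buf, 4) == (uint32_t)val); /* loads back unchanged */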
> +
> static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> {
> VfuObject *o = vfu_get_private(vfu_ctx);
> @@ -308,17 +365,30 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> g_autofree char *name = NULL;
> struct iovec *iov = &info->iova;
>
> - if (!info->vaddr) {
> - return;
> - }
> -
> name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
> - (uint64_t)info->vaddr);
> + (uint64_t)iov->iov_base);
>
> subregion = g_new0(MemoryRegion, 1);
>
> - memory_region_init_ram_ptr(subregion, NULL, name,
> - iov->iov_len, info->vaddr);
> + if (info->vaddr) {
> + memory_region_init_ram_ptr(subregion, OBJECT(o), name,
> + iov->iov_len, info->vaddr);
> + } else {
> + /*
> + * Note that I/O regions' MemoryRegionOps handle accesses of at most 8
> + * bytes at a time, and larger accesses are broken down. However,
> + * many/most DMA accesses are larger than 8 bytes and VFIO-user can
> + * handle large DMA accesses just fine, thus this size restriction
> + * unnecessarily hurts performance, in particular given that each
> + * access causes a round trip on the VFIO-user socket.
> + *
> + * TODO: Investigate how to plumb larger accesses through memory
> + * regions, possibly by amending MemoryRegionOps or by creating a new
> + * memory region type.
> + */
> + memory_region_init_io(subregion, OBJECT(o), &vfu_dma_ops, subregion,
> + name, iov->iov_len);
> + }
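To put a number on the performance concern: with max_access_size capped
at 8, a single 4 KiB DMA transfer turns into 512 sequential socket
round trips. Schematically (illustrative only):

  /* The device model issues one logical access ... */
  pci_dma_read(pci_dev, gpa, buf, 4096);

  /* ... but the memory core splits it into at most 8-byte chunks, and
   * each chunk invokes vfu_dma_read() above, i.e. one VFIO-user
   * message round trip. */

So the TODO seems well worth pursuing.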
>
> dma_as = pci_device_iommu_address_space(o->pci_dev);
>
> --
> 2.34.1
>