* [PATCH 0/3] Support message-based DMA in vfio-user server
@ 2023-07-04 8:06 Mattias Nissler
2023-07-04 8:06 ` [PATCH 1/3] softmmu: Support concurrent bounce buffers Mattias Nissler
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Mattias Nissler @ 2023-07-04 8:06 UTC (permalink / raw)
To: qemu-devel
Cc: Elena Ufimtseva, Jagannathan Raman, Philippe Mathieu-Daudé,
Paolo Bonzini, Peter Xu, stefanha, David Hildenbrand, john.levon,
Mattias Nissler
This series adds basic support for message-based DMA in qemu's vfio-user
server. This is useful for cases where the client does not provide file
descriptors for accessing system memory via memory mappings. My motivating use
case is to hook up device models as PCIe endpoints to a hardware design. This
works by bridging the PCIe transaction layer to vfio-user, and the endpoint
does not access memory directly, but sends memory request TLPs to the hardware
design in order to perform DMA.
Note that in addition to the 3 commits included, we also need a
subprojects/libvfio-user roll to bring in this bugfix:
https://github.com/nutanix/libvfio-user/commit/bb308a2e8ee9486a4c8b53d8d773f7c8faaeba08
Stefan, can I ask you to kindly update the
https://gitlab.com/qemu-project/libvfio-user mirror? I'll be happy to include
an update to subprojects/libvfio-user.wrap in this series.
Finally, there is some more work required on top of this series to get
message-based DMA to really work well:
* libvfio-user has a long-standing issue where socket communication gets messed
up when messages are sent from both ends at the same time. See
https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
been engaging there and plan to contribute a fix.
* qemu currently breaks down DMA accesses into chunks of at most 8 bytes,
each of which will be handled in a separate vfio-user DMA request message.
This is quite terrible for large DMA accesses, such as when nvme reads and
writes page-sized blocks. Thus, I would like to improve qemu to be able to
perform larger accesses, at least for indirect memory regions. I have
something working locally, but since this will likely result in more
involved surgery and discussion, I am leaving this to be addressed in a
separate patch. A rough sketch of the dispatch path that causes the
splitting follows below.
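To illustrate, this is roughly what happens today when an access hits an
indirect memory region. It is a heavily simplified paraphrase, from memory, of
the dispatch loop in softmmu/physmem.c; the function name and exact shape are
mine, for illustration only, not the literal code:
    /*
     * Sketch: a read targeting an indirect region is split into chunks.
     * memory_access_size() caps each chunk at the region's maximum access
     * size (8 bytes for the vfio-user region in patch 3), and each chunk is
     * dispatched separately, i.e. one vfio-user DMA request message each.
     */
    static MemTxResult indirect_read_sketch(MemoryRegion *mr, hwaddr addr,
                                            uint8_t *buf, hwaddr len,
                                            MemTxAttrs attrs)
    {
        MemTxResult result = MEMTX_OK;
        uint64_t val;

        while (len > 0) {
            unsigned l = memory_access_size(mr, len, addr);   /* <= 8 bytes */
            result |= memory_region_dispatch_read(mr, addr, &val,
                                                  size_memop(l), attrs);
            stn_he_p(buf, l, val);   /* copy the chunk to the destination */
            len -= l;
            addr += l;
            buf += l;
        }
        return result;
    }
A page-sized nvme transfer thus turns into hundreds of request/response round
trips, which is where most of the overhead comes from.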
Mattias Nissler (3):
softmmu: Support concurrent bounce buffers
softmmu: Remove DMA unmap notification callback
vfio-user: Message-based DMA support
hw/remote/vfio-user-obj.c | 62 ++++++++++++++++--
softmmu/dma-helpers.c | 28 --------
softmmu/physmem.c | 131 ++++++++------------------------------
3 files changed, 83 insertions(+), 138 deletions(-)
--
2.34.1
* [PATCH 1/3] softmmu: Support concurrent bounce buffers
2023-07-04 8:06 [PATCH 0/3] Support message-based DMA in vfio-user server Mattias Nissler
@ 2023-07-04 8:06 ` Mattias Nissler
2023-07-20 18:10 ` Stefan Hajnoczi
2023-07-20 18:14 ` Stefan Hajnoczi
2023-07-04 8:06 ` [PATCH 2/3] softmmu: Remove DMA unmap notification callback Mattias Nissler
` (3 subsequent siblings)
4 siblings, 2 replies; 14+ messages in thread
From: Mattias Nissler @ 2023-07-04 8:06 UTC (permalink / raw)
To: qemu-devel
Cc: Elena Ufimtseva, Jagannathan Raman, Philippe Mathieu-Daudé,
Paolo Bonzini, Peter Xu, stefanha, David Hildenbrand, john.levon,
Mattias Nissler
It is not uncommon for device models to request mapping of several DMA
regions at the same time. An example is igb (and probably other net
devices as well) when a packet is spread across multiple descriptors.
In order to support this when indirect DMA is used, as is the case when
running the device model in a vfio-user server process without mmap()-ed
DMA, this change allocates DMA bounce buffers dynamically instead of
supporting only a single buffer.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
softmmu/physmem.c | 74 ++++++++++++++++++++++-------------------------
1 file changed, 35 insertions(+), 39 deletions(-)
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index bda475a719..56130b5a1d 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2904,13 +2904,11 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len)
typedef struct {
MemoryRegion *mr;
- void *buffer;
hwaddr addr;
- hwaddr len;
- bool in_use;
+ uint8_t buffer[];
} BounceBuffer;
-static BounceBuffer bounce;
+static size_t bounce_buffers_in_use;
typedef struct MapClient {
QEMUBH *bh;
@@ -2947,7 +2945,7 @@ void cpu_register_map_client(QEMUBH *bh)
QLIST_INSERT_HEAD(&map_client_list, client, link);
/* Write map_client_list before reading in_use. */
smp_mb();
- if (!qatomic_read(&bounce.in_use)) {
+ if (qatomic_read(&bounce_buffers_in_use)) {
cpu_notify_map_clients_locked();
}
qemu_mutex_unlock(&map_client_list_lock);
@@ -3076,31 +3074,24 @@ void *address_space_map(AddressSpace *as,
RCU_READ_LOCK_GUARD();
fv = address_space_to_flatview(as);
mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs);
+ memory_region_ref(mr);
if (!memory_access_is_direct(mr, is_write)) {
- if (qatomic_xchg(&bounce.in_use, true)) {
- *plen = 0;
- return NULL;
- }
- /* Avoid unbounded allocations */
- l = MIN(l, TARGET_PAGE_SIZE);
- bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
- bounce.addr = addr;
- bounce.len = l;
-
- memory_region_ref(mr);
- bounce.mr = mr;
+ qatomic_inc_fetch(&bounce_buffers_in_use);
+
+ BounceBuffer *bounce = g_malloc(l + sizeof(BounceBuffer));
+ bounce->addr = addr;
+ bounce->mr = mr;
+
if (!is_write) {
flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED,
- bounce.buffer, l);
+ bounce->buffer, l);
}
*plen = l;
- return bounce.buffer;
+ return bounce->buffer;
}
-
- memory_region_ref(mr);
*plen = flatview_extend_translation(fv, addr, len, mr, xlat,
l, is_write, attrs);
fuzz_dma_read_cb(addr, *plen, mr);
@@ -3114,31 +3105,36 @@ void *address_space_map(AddressSpace *as,
void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
bool is_write, hwaddr access_len)
{
- if (buffer != bounce.buffer) {
- MemoryRegion *mr;
- ram_addr_t addr1;
+ MemoryRegion *mr;
+ ram_addr_t addr1;
+
+ mr = memory_region_from_host(buffer, &addr1);
+ if (mr == NULL) {
+ /*
+ * Must be a bounce buffer (unless the caller passed a pointer which
+ * wasn't returned by address_space_map, which is illegal).
+ */
+ BounceBuffer *bounce = container_of(buffer, BounceBuffer, buffer);
- mr = memory_region_from_host(buffer, &addr1);
- assert(mr != NULL);
if (is_write) {
- invalidate_and_set_dirty(mr, addr1, access_len);
+ address_space_write(as, bounce->addr, MEMTXATTRS_UNSPECIFIED,
+ bounce->buffer, access_len);
}
- if (xen_enabled()) {
- xen_invalidate_map_cache_entry(buffer);
+ memory_region_unref(bounce->mr);
+ g_free(bounce);
+
+ if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
+ cpu_notify_map_clients();
}
- memory_region_unref(mr);
return;
}
+
+ if (xen_enabled()) {
+ xen_invalidate_map_cache_entry(buffer);
+ }
if (is_write) {
- address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED,
- bounce.buffer, access_len);
- }
- qemu_vfree(bounce.buffer);
- bounce.buffer = NULL;
- memory_region_unref(bounce.mr);
- /* Clear in_use before reading map_client_list. */
- qatomic_set_mb(&bounce.in_use, false);
- cpu_notify_map_clients();
+ invalidate_and_set_dirty(mr, addr1, access_len);
+ }
}
void *cpu_physical_memory_map(hwaddr addr,
--
2.34.1
* [PATCH 2/3] softmmu: Remove DMA unmap notification callback
2023-07-04 8:06 [PATCH 0/3] Support message-based DMA in vfio-user server Mattias Nissler
2023-07-04 8:06 ` [PATCH 1/3] softmmu: Support concurrent bounce buffers Mattias Nissler
@ 2023-07-04 8:06 ` Mattias Nissler
2023-07-20 18:14 ` Stefan Hajnoczi
2023-07-04 8:06 ` [PATCH 3/3] vfio-user: Message-based DMA support Mattias Nissler
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Mattias Nissler @ 2023-07-04 8:06 UTC (permalink / raw)
To: qemu-devel
Cc: Elena Ufimtseva, Jagannathan Raman, Philippe Mathieu-Daudé,
Paolo Bonzini, Peter Xu, stefanha, David Hildenbrand, john.levon,
Mattias Nissler
According to old commit messages, this was introduced to retry a DMA
operation at a later point in case the single bounce buffer was found to
be busy. It was never widely used: only the dma-helpers code made use of
it, while other device models that use multiple concurrent DMA mappings
would simply fail.
After the improvement to support multiple concurrent bounce buffers, the
condition that the notification callback worked around no longer exists,
so we can just remove the logic and simplify the code.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
softmmu/dma-helpers.c | 28 -----------------
softmmu/physmem.c | 71 -------------------------------------------
2 files changed, 99 deletions(-)
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 2463964805..d05d226f11 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -68,23 +68,10 @@ typedef struct {
int sg_cur_index;
dma_addr_t sg_cur_byte;
QEMUIOVector iov;
- QEMUBH *bh;
DMAIOFunc *io_func;
void *io_func_opaque;
} DMAAIOCB;
-static void dma_blk_cb(void *opaque, int ret);
-
-static void reschedule_dma(void *opaque)
-{
- DMAAIOCB *dbs = (DMAAIOCB *)opaque;
-
- assert(!dbs->acb && dbs->bh);
- qemu_bh_delete(dbs->bh);
- dbs->bh = NULL;
- dma_blk_cb(dbs, 0);
-}
-
static void dma_blk_unmap(DMAAIOCB *dbs)
{
int i;
@@ -101,7 +88,6 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
{
trace_dma_complete(dbs, ret, dbs->common.cb);
- assert(!dbs->acb && !dbs->bh);
dma_blk_unmap(dbs);
if (dbs->common.cb) {
dbs->common.cb(dbs->common.opaque, ret);
@@ -164,13 +150,6 @@ static void dma_blk_cb(void *opaque, int ret)
}
}
- if (dbs->iov.size == 0) {
- trace_dma_map_wait(dbs);
- dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
- cpu_register_map_client(dbs->bh);
- goto out;
- }
-
if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
qemu_iovec_discard_back(&dbs->iov,
QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
@@ -189,18 +168,12 @@ static void dma_aio_cancel(BlockAIOCB *acb)
trace_dma_aio_cancel(dbs);
- assert(!(dbs->acb && dbs->bh));
if (dbs->acb) {
/* This will invoke dma_blk_cb. */
blk_aio_cancel_async(dbs->acb);
return;
}
- if (dbs->bh) {
- cpu_unregister_map_client(dbs->bh);
- qemu_bh_delete(dbs->bh);
- dbs->bh = NULL;
- }
if (dbs->common.cb) {
dbs->common.cb(dbs->common.opaque, -ECANCELED);
}
@@ -239,7 +212,6 @@ BlockAIOCB *dma_blk_io(AioContext *ctx,
dbs->dir = dir;
dbs->io_func = io_func;
dbs->io_func_opaque = io_func_opaque;
- dbs->bh = NULL;
qemu_iovec_init(&dbs->iov, sg->nsg);
dma_blk_cb(dbs, 0);
return &dbs->common;
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 56130b5a1d..2b4123c127 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2908,49 +2908,6 @@ typedef struct {
uint8_t buffer[];
} BounceBuffer;
-static size_t bounce_buffers_in_use;
-
-typedef struct MapClient {
- QEMUBH *bh;
- QLIST_ENTRY(MapClient) link;
-} MapClient;
-
-QemuMutex map_client_list_lock;
-static QLIST_HEAD(, MapClient) map_client_list
- = QLIST_HEAD_INITIALIZER(map_client_list);
-
-static void cpu_unregister_map_client_do(MapClient *client)
-{
- QLIST_REMOVE(client, link);
- g_free(client);
-}
-
-static void cpu_notify_map_clients_locked(void)
-{
- MapClient *client;
-
- while (!QLIST_EMPTY(&map_client_list)) {
- client = QLIST_FIRST(&map_client_list);
- qemu_bh_schedule(client->bh);
- cpu_unregister_map_client_do(client);
- }
-}
-
-void cpu_register_map_client(QEMUBH *bh)
-{
- MapClient *client = g_malloc(sizeof(*client));
-
- qemu_mutex_lock(&map_client_list_lock);
- client->bh = bh;
- QLIST_INSERT_HEAD(&map_client_list, client, link);
- /* Write map_client_list before reading in_use. */
- smp_mb();
- if (qatomic_read(&bounce_buffers_in_use)) {
- cpu_notify_map_clients_locked();
- }
- qemu_mutex_unlock(&map_client_list_lock);
-}
-
void cpu_exec_init_all(void)
{
qemu_mutex_init(&ram_list.mutex);
@@ -2964,28 +2921,6 @@ void cpu_exec_init_all(void)
finalize_target_page_bits();
io_mem_init();
memory_map_init();
- qemu_mutex_init(&map_client_list_lock);
-}
-
-void cpu_unregister_map_client(QEMUBH *bh)
-{
- MapClient *client;
-
- qemu_mutex_lock(&map_client_list_lock);
- QLIST_FOREACH(client, &map_client_list, link) {
- if (client->bh == bh) {
- cpu_unregister_map_client_do(client);
- break;
- }
- }
- qemu_mutex_unlock(&map_client_list_lock);
-}
-
-static void cpu_notify_map_clients(void)
-{
- qemu_mutex_lock(&map_client_list_lock);
- cpu_notify_map_clients_locked();
- qemu_mutex_unlock(&map_client_list_lock);
}
static bool flatview_access_valid(FlatView *fv, hwaddr addr, hwaddr len,
@@ -3077,8 +3012,6 @@ void *address_space_map(AddressSpace *as,
memory_region_ref(mr);
if (!memory_access_is_direct(mr, is_write)) {
- qatomic_inc_fetch(&bounce_buffers_in_use);
-
BounceBuffer *bounce = g_malloc(l + sizeof(BounceBuffer));
bounce->addr = addr;
bounce->mr = mr;
@@ -3122,10 +3055,6 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
}
memory_region_unref(bounce->mr);
g_free(bounce);
-
- if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
- cpu_notify_map_clients();
- }
return;
}
--
2.34.1
* [PATCH 3/3] vfio-user: Message-based DMA support
2023-07-04 8:06 [PATCH 0/3] Support message-based DMA in vfio-user server Mattias Nissler
2023-07-04 8:06 ` [PATCH 1/3] softmmu: Support concurrent bounce buffers Mattias Nissler
2023-07-04 8:06 ` [PATCH 2/3] softmmu: Remove DMA unmap notification callback Mattias Nissler
@ 2023-07-04 8:06 ` Mattias Nissler
2023-07-20 18:32 ` Stefan Hajnoczi
2023-07-04 8:20 ` [PATCH 0/3] Support message-based DMA in vfio-user server David Hildenbrand
2023-07-20 18:41 ` Stefan Hajnoczi
4 siblings, 1 reply; 14+ messages in thread
From: Mattias Nissler @ 2023-07-04 8:06 UTC (permalink / raw)
To: qemu-devel
Cc: Elena Ufimtseva, Jagannathan Raman, Philippe Mathieu-Daudé,
Paolo Bonzini, Peter Xu, stefanha, David Hildenbrand, john.levon,
Mattias Nissler
Wire up support for DMA for the case where the vfio-user client does not
provide mmap()-able file descriptors, but DMA requests must be performed
via the VFIO-user protocol. This installs an indirect memory region,
which already works for pci_dma_{read,write}, and pci_dma_map works
thanks to the existing DMA bounce buffering support.
Note that while simple scenarios work with this patch, there is a known
race condition in libvfio-user that will mess up the communication
channel: https://github.com/nutanix/libvfio-user/issues/279. I intend to
contribute a fix for this problem; see the discussion on the GitHub issue
for more details.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
hw/remote/vfio-user-obj.c | 62 ++++++++++++++++++++++++++++++++++-----
1 file changed, 55 insertions(+), 7 deletions(-)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 8b10c32a3c..9799580c77 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -300,6 +300,53 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
return count;
}
+static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
+ unsigned size, MemTxAttrs attrs)
+{
+ MemoryRegion *region = opaque;
+ VfuObject *o = VFU_OBJECT(region->owner);
+
+ dma_sg_t *sg = alloca(dma_sg_size());
+ vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
+ if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
+ vfu_sgl_read(o->vfu_ctx, sg, 1, val) != 0) {
+ return MEMTX_ERROR;
+ }
+
+ return MEMTX_OK;
+}
+
+static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
+ unsigned size, MemTxAttrs attrs)
+{
+ MemoryRegion *region = opaque;
+ VfuObject *o = VFU_OBJECT(region->owner);
+
+ dma_sg_t *sg = alloca(dma_sg_size());
+ vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
+ if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
+ vfu_sgl_write(o->vfu_ctx, sg, 1, &val) != 0) {
+ return MEMTX_ERROR;
+ }
+
+ return MEMTX_OK;
+}
+
+static const MemoryRegionOps vfu_dma_ops = {
+ .read_with_attrs = vfu_dma_read,
+ .write_with_attrs = vfu_dma_write,
+ .endianness = DEVICE_NATIVE_ENDIAN,
+ .valid = {
+ .min_access_size = 1,
+ .max_access_size = 8,
+ .unaligned = true,
+ },
+ .impl = {
+ .min_access_size = 1,
+ .max_access_size = 8,
+ },
+};
+
static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
VfuObject *o = vfu_get_private(vfu_ctx);
@@ -308,17 +355,18 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
g_autofree char *name = NULL;
struct iovec *iov = &info->iova;
- if (!info->vaddr) {
- return;
- }
-
name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
- (uint64_t)info->vaddr);
+ (uint64_t)iov->iov_base);
subregion = g_new0(MemoryRegion, 1);
- memory_region_init_ram_ptr(subregion, NULL, name,
- iov->iov_len, info->vaddr);
+ if (info->vaddr) {
+ memory_region_init_ram_ptr(subregion, OBJECT(o), name,
+ iov->iov_len, info->vaddr);
+ } else {
+ memory_region_init_io(subregion, OBJECT(o), &vfu_dma_ops, subregion,
+ name, iov->iov_len);
+ }
dma_as = pci_device_iommu_address_space(o->pci_dev);
--
2.34.1
* Re: [PATCH 0/3] Support message-based DMA in vfio-user server
2023-07-04 8:06 [PATCH 0/3] Support message-based DMA in vfio-user server Mattias Nissler
` (2 preceding siblings ...)
2023-07-04 8:06 ` [PATCH 3/3] vfio-user: Message-based DMA support Mattias Nissler
@ 2023-07-04 8:20 ` David Hildenbrand
2023-07-20 18:41 ` Stefan Hajnoczi
4 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2023-07-04 8:20 UTC (permalink / raw)
To: Mattias Nissler, qemu-devel
Cc: Elena Ufimtseva, Jagannathan Raman, Philippe Mathieu-Daudé,
Paolo Bonzini, Peter Xu, stefanha, john.levon
On 04.07.23 10:06, Mattias Nissler wrote:
> This series adds basic support for message-based DMA in qemu's vfio-user
> server. This is useful for cases where the client does not provide file
> descriptors for accessing system memory via memory mappings. My motivating use
> case is to hook up device models as PCIe endpoints to a hardware design. This
> works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> does not access memory directly, but sends memory requests TLPs to the hardware
> design in order to perform DMA.
>
> Note that in addition to the 3 commits included, we also need a
> subprojects/libvfio-user roll to bring in this bugfix:
> https://github.com/nutanix/libvfio-user/commit/bb308a2e8ee9486a4c8b53d8d773f7c8faaeba08
> Stefan, can I ask you to kindly update the
> https://gitlab.com/qemu-project/libvfio-user mirror? I'll be happy to include
> an update to subprojects/libvfio-user.wrap in this series.
>
> Finally, there is some more work required on top of this series to get
> message-based DMA to really work well:
>
> * libvfio-user has a long-standing issue where socket communication gets messed
> up when messages are sent from both ends at the same time. See
> https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
> been engaging there and plan to contribute a fix.
>
> * qemu currently breaks down DMA accesses into chunks of size 8 bytes at
> maximum, each of which will be handled in a separate vfio-user DMA request
> message. This is quite terrible for large DMA acceses, such as when nvme
> reads and writes page-sized blocks for example. Thus, I would like to improve
> qemu to be able to perform larger accesses, at least for indirect memory
> regions. I have something working locally, but since this will likely result
> in more involved surgery and discussion, I am leaving this to be addressed in
> a separate patch.
>
I remember asking Stefan in the past whether there wouldn't be a way to
avoid that mmap dance (and also handle uffd etc. more easily) for
vhost-user (especially virtiofsd) by only making QEMU access guest memory.
That could make memory-backend-ram support something like vhost-user,
avoiding shared memory and everything that comes with it (e.g., no KSM,
no shared zeropage).
This series tackles vfio-user; does anybody know what it would take to
get something similar running for vhost-user?
--
Cheers,
David / dhildenb
* Re: [PATCH 1/3] softmmu: Support concurrent bounce buffers
2023-07-04 8:06 ` [PATCH 1/3] softmmu: Support concurrent bounce buffers Mattias Nissler
@ 2023-07-20 18:10 ` Stefan Hajnoczi
2023-08-23 9:27 ` Mattias Nissler
2023-07-20 18:14 ` Stefan Hajnoczi
1 sibling, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 18:10 UTC (permalink / raw)
To: Mattias Nissler
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Tue, Jul 04, 2023 at 01:06:25AM -0700, Mattias Nissler wrote:
> It is not uncommon for device models to request mapping of several DMA
> regions at the same time. An example is igb (and probably other net
> devices as well) when a packet is spread across multiple descriptors.
>
> In order to support this when indirect DMA is used, as is the case when
> running the device model in a vfio-server process without mmap()-ed DMA,
> this change allocates DMA bounce buffers dynamically instead of
> supporting only a single buffer.
>
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> ---
> softmmu/physmem.c | 74 ++++++++++++++++++++++-------------------------
> 1 file changed, 35 insertions(+), 39 deletions(-)
Is this a functional change or purely a performance optimization? If
it's a performance optimization, please include benchmark results to
justify this change.
QEMU memory allocations must be bounded so that an untrusted guest
cannot cause QEMU to exhaust host memory. There must be a limit to the
amount of bounce buffer memory.
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index bda475a719..56130b5a1d 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2904,13 +2904,11 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len)
>
> typedef struct {
> MemoryRegion *mr;
> - void *buffer;
> hwaddr addr;
> - hwaddr len;
> - bool in_use;
> + uint8_t buffer[];
> } BounceBuffer;
>
> -static BounceBuffer bounce;
> +static size_t bounce_buffers_in_use;
>
> typedef struct MapClient {
> QEMUBH *bh;
> @@ -2947,7 +2945,7 @@ void cpu_register_map_client(QEMUBH *bh)
> QLIST_INSERT_HEAD(&map_client_list, client, link);
> /* Write map_client_list before reading in_use. */
> smp_mb();
> - if (!qatomic_read(&bounce.in_use)) {
> + if (qatomic_read(&bounce_buffers_in_use)) {
> cpu_notify_map_clients_locked();
> }
> qemu_mutex_unlock(&map_client_list_lock);
> @@ -3076,31 +3074,24 @@ void *address_space_map(AddressSpace *as,
> RCU_READ_LOCK_GUARD();
> fv = address_space_to_flatview(as);
> mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs);
> + memory_region_ref(mr);
>
> if (!memory_access_is_direct(mr, is_write)) {
> - if (qatomic_xchg(&bounce.in_use, true)) {
> - *plen = 0;
> - return NULL;
> - }
> - /* Avoid unbounded allocations */
> - l = MIN(l, TARGET_PAGE_SIZE);
> - bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
> - bounce.addr = addr;
> - bounce.len = l;
> -
> - memory_region_ref(mr);
> - bounce.mr = mr;
> + qatomic_inc_fetch(&bounce_buffers_in_use);
> +
> + BounceBuffer *bounce = g_malloc(l + sizeof(BounceBuffer));
> + bounce->addr = addr;
> + bounce->mr = mr;
> +
> if (!is_write) {
> flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED,
> - bounce.buffer, l);
> + bounce->buffer, l);
> }
>
> *plen = l;
> - return bounce.buffer;
> + return bounce->buffer;
Bounce buffer allocation always succeeds now. Can the
cpu_notify_map_clients*() functions be removed now that no one is waiting
for bounce buffers anymore?
> }
>
> -
> - memory_region_ref(mr);
> *plen = flatview_extend_translation(fv, addr, len, mr, xlat,
> l, is_write, attrs);
> fuzz_dma_read_cb(addr, *plen, mr);
> @@ -3114,31 +3105,36 @@ void *address_space_map(AddressSpace *as,
> void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
> bool is_write, hwaddr access_len)
> {
> - if (buffer != bounce.buffer) {
> - MemoryRegion *mr;
> - ram_addr_t addr1;
> + MemoryRegion *mr;
> + ram_addr_t addr1;
> +
> + mr = memory_region_from_host(buffer, &addr1);
> + if (mr == NULL) {
> + /*
> + * Must be a bounce buffer (unless the caller passed a pointer which
> + * wasn't returned by address_space_map, which is illegal).
> + */
> + BounceBuffer *bounce = container_of(buffer, BounceBuffer, buffer);
>
> - mr = memory_region_from_host(buffer, &addr1);
> - assert(mr != NULL);
> if (is_write) {
> - invalidate_and_set_dirty(mr, addr1, access_len);
> + address_space_write(as, bounce->addr, MEMTXATTRS_UNSPECIFIED,
> + bounce->buffer, access_len);
> }
> - if (xen_enabled()) {
> - xen_invalidate_map_cache_entry(buffer);
> + memory_region_unref(bounce->mr);
> + g_free(bounce);
> +
> + if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
> + cpu_notify_map_clients();
> }
> - memory_region_unref(mr);
> return;
> }
> +
> + if (xen_enabled()) {
> + xen_invalidate_map_cache_entry(buffer);
> + }
> if (is_write) {
> - address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED,
> - bounce.buffer, access_len);
> - }
> - qemu_vfree(bounce.buffer);
> - bounce.buffer = NULL;
> - memory_region_unref(bounce.mr);
> - /* Clear in_use before reading map_client_list. */
> - qatomic_set_mb(&bounce.in_use, false);
> - cpu_notify_map_clients();
> + invalidate_and_set_dirty(mr, addr1, access_len);
> + }
> }
>
> void *cpu_physical_memory_map(hwaddr addr,
> --
> 2.34.1
>
* Re: [PATCH 2/3] softmmu: Remove DMA unmap notification callback
2023-07-04 8:06 ` [PATCH 2/3] softmmu: Remove DMA unmap notification callback Mattias Nissler
@ 2023-07-20 18:14 ` Stefan Hajnoczi
2023-08-23 9:28 ` Mattias Nissler
0 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 18:14 UTC (permalink / raw)
To: Mattias Nissler
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Tue, Jul 04, 2023 at 01:06:26AM -0700, Mattias Nissler wrote:
> According to old commit messages, this was introduced to retry a DMA
> operation at a later point in case the single bounce buffer is found to
> be busy. This was never used widely - only the dma-helpers code made use
> of it, but there are other device models that use multiple DMA mappings
> (concurrently) and just failed.
>
> After the improvement to support multiple concurrent bounce buffers,
> the condition the notification callback allowed to work around no
> longer exists, so we can just remove the logic and simplify the code.
>
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> ---
> softmmu/dma-helpers.c | 28 -----------------
> softmmu/physmem.c | 71 -------------------------------------------
> 2 files changed, 99 deletions(-)
I'm not sure if it will be possible to remove this once a limit is
placed on bounce buffer space.
>
> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> index 2463964805..d05d226f11 100644
> --- a/softmmu/dma-helpers.c
> +++ b/softmmu/dma-helpers.c
> @@ -68,23 +68,10 @@ typedef struct {
> int sg_cur_index;
> dma_addr_t sg_cur_byte;
> QEMUIOVector iov;
> - QEMUBH *bh;
> DMAIOFunc *io_func;
> void *io_func_opaque;
> } DMAAIOCB;
>
> -static void dma_blk_cb(void *opaque, int ret);
> -
> -static void reschedule_dma(void *opaque)
> -{
> - DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> -
> - assert(!dbs->acb && dbs->bh);
> - qemu_bh_delete(dbs->bh);
> - dbs->bh = NULL;
> - dma_blk_cb(dbs, 0);
> -}
> -
> static void dma_blk_unmap(DMAAIOCB *dbs)
> {
> int i;
> @@ -101,7 +88,6 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> {
> trace_dma_complete(dbs, ret, dbs->common.cb);
>
> - assert(!dbs->acb && !dbs->bh);
> dma_blk_unmap(dbs);
> if (dbs->common.cb) {
> dbs->common.cb(dbs->common.opaque, ret);
> @@ -164,13 +150,6 @@ static void dma_blk_cb(void *opaque, int ret)
> }
> }
>
> - if (dbs->iov.size == 0) {
> - trace_dma_map_wait(dbs);
> - dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
> - cpu_register_map_client(dbs->bh);
> - goto out;
> - }
> -
> if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
> qemu_iovec_discard_back(&dbs->iov,
> QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
> @@ -189,18 +168,12 @@ static void dma_aio_cancel(BlockAIOCB *acb)
>
> trace_dma_aio_cancel(dbs);
>
> - assert(!(dbs->acb && dbs->bh));
> if (dbs->acb) {
> /* This will invoke dma_blk_cb. */
> blk_aio_cancel_async(dbs->acb);
> return;
> }
>
> - if (dbs->bh) {
> - cpu_unregister_map_client(dbs->bh);
> - qemu_bh_delete(dbs->bh);
> - dbs->bh = NULL;
> - }
> if (dbs->common.cb) {
> dbs->common.cb(dbs->common.opaque, -ECANCELED);
> }
> @@ -239,7 +212,6 @@ BlockAIOCB *dma_blk_io(AioContext *ctx,
> dbs->dir = dir;
> dbs->io_func = io_func;
> dbs->io_func_opaque = io_func_opaque;
> - dbs->bh = NULL;
> qemu_iovec_init(&dbs->iov, sg->nsg);
> dma_blk_cb(dbs, 0);
> return &dbs->common;
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 56130b5a1d..2b4123c127 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2908,49 +2908,6 @@ typedef struct {
> uint8_t buffer[];
> } BounceBuffer;
>
> -static size_t bounce_buffers_in_use;
> -
> -typedef struct MapClient {
> - QEMUBH *bh;
> - QLIST_ENTRY(MapClient) link;
> -} MapClient;
> -
> -QemuMutex map_client_list_lock;
> -static QLIST_HEAD(, MapClient) map_client_list
> - = QLIST_HEAD_INITIALIZER(map_client_list);
> -
> -static void cpu_unregister_map_client_do(MapClient *client)
> -{
> - QLIST_REMOVE(client, link);
> - g_free(client);
> -}
> -
> -static void cpu_notify_map_clients_locked(void)
> -{
> - MapClient *client;
> -
> - while (!QLIST_EMPTY(&map_client_list)) {
> - client = QLIST_FIRST(&map_client_list);
> - qemu_bh_schedule(client->bh);
> - cpu_unregister_map_client_do(client);
> - }
> -}
> -
> -void cpu_register_map_client(QEMUBH *bh)
> -{
> - MapClient *client = g_malloc(sizeof(*client));
> -
> - qemu_mutex_lock(&map_client_list_lock);
> - client->bh = bh;
> - QLIST_INSERT_HEAD(&map_client_list, client, link);
> - /* Write map_client_list before reading in_use. */
> - smp_mb();
> - if (qatomic_read(&bounce_buffers_in_use)) {
> - cpu_notify_map_clients_locked();
> - }
> - qemu_mutex_unlock(&map_client_list_lock);
> -}
> -
> void cpu_exec_init_all(void)
> {
> qemu_mutex_init(&ram_list.mutex);
> @@ -2964,28 +2921,6 @@ void cpu_exec_init_all(void)
> finalize_target_page_bits();
> io_mem_init();
> memory_map_init();
> - qemu_mutex_init(&map_client_list_lock);
> -}
> -
> -void cpu_unregister_map_client(QEMUBH *bh)
> -{
> - MapClient *client;
> -
> - qemu_mutex_lock(&map_client_list_lock);
> - QLIST_FOREACH(client, &map_client_list, link) {
> - if (client->bh == bh) {
> - cpu_unregister_map_client_do(client);
> - break;
> - }
> - }
> - qemu_mutex_unlock(&map_client_list_lock);
> -}
> -
> -static void cpu_notify_map_clients(void)
> -{
> - qemu_mutex_lock(&map_client_list_lock);
> - cpu_notify_map_clients_locked();
> - qemu_mutex_unlock(&map_client_list_lock);
> }
>
> static bool flatview_access_valid(FlatView *fv, hwaddr addr, hwaddr len,
> @@ -3077,8 +3012,6 @@ void *address_space_map(AddressSpace *as,
> memory_region_ref(mr);
>
> if (!memory_access_is_direct(mr, is_write)) {
> - qatomic_inc_fetch(&bounce_buffers_in_use);
> -
> BounceBuffer *bounce = g_malloc(l + sizeof(BounceBuffer));
> bounce->addr = addr;
> bounce->mr = mr;
> @@ -3122,10 +3055,6 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
> }
> memory_region_unref(bounce->mr);
> g_free(bounce);
> -
> - if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
> - cpu_notify_map_clients();
> - }
> return;
> }
>
> --
> 2.34.1
>
* Re: [PATCH 1/3] softmmu: Support concurrent bounce buffers
2023-07-04 8:06 ` [PATCH 1/3] softmmu: Support concurrent bounce buffers Mattias Nissler
2023-07-20 18:10 ` Stefan Hajnoczi
@ 2023-07-20 18:14 ` Stefan Hajnoczi
1 sibling, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 18:14 UTC (permalink / raw)
To: Mattias Nissler
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Tue, Jul 04, 2023 at 01:06:25AM -0700, Mattias Nissler wrote:
> + if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
> + cpu_notify_map_clients();
> }
About my comment regarding removing this API: I see the next patch does
that.
Stefan
* Re: [PATCH 3/3] vfio-user: Message-based DMA support
2023-07-04 8:06 ` [PATCH 3/3] vfio-user: Message-based DMA support Mattias Nissler
@ 2023-07-20 18:32 ` Stefan Hajnoczi
2023-08-23 9:28 ` Mattias Nissler
0 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 18:32 UTC (permalink / raw)
To: Mattias Nissler
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Tue, Jul 04, 2023 at 01:06:27AM -0700, Mattias Nissler wrote:
> Wire up support for DMA for the case where the vfio-user client does not
> provide mmap()-able file descriptors, but DMA requests must be performed
> via the VFIO-user protocol. This installs an indirect memory region,
> which already works for pci_dma_{read,write}, and pci_dma_map works
> thanks to the existing DMA bounce buffering support.
>
> Note that while simple scenarios work with this patch, there's a known
> race condition in libvfio-user that will mess up the communication
> channel: https://github.com/nutanix/libvfio-user/issues/279 I intend to
> contribute a fix for this problem, see discussion on the github issue
> for more details.
>
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> ---
> hw/remote/vfio-user-obj.c | 62 ++++++++++++++++++++++++++++++++++-----
> 1 file changed, 55 insertions(+), 7 deletions(-)
>
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 8b10c32a3c..9799580c77 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -300,6 +300,53 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> return count;
> }
>
> +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
> + unsigned size, MemTxAttrs attrs)
> +{
> + MemoryRegion *region = opaque;
> + VfuObject *o = VFU_OBJECT(region->owner);
> +
> + dma_sg_t *sg = alloca(dma_sg_size());
> + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> + if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
> + vfu_sgl_read(o->vfu_ctx, sg, 1, val) != 0) {
Does this work on big-endian host CPUs? It looks like reading 0x12345678
into uint64_t val would result in *val = 0x12345678xxxxxxxx instead of
0x12345678.
> + return MEMTX_ERROR;
> + }
> +
> + return MEMTX_OK;
> +}
> +
> +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
> + unsigned size, MemTxAttrs attrs)
> +{
> + MemoryRegion *region = opaque;
> + VfuObject *o = VFU_OBJECT(region->owner);
> +
> + dma_sg_t *sg = alloca(dma_sg_size());
> + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> + if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
> + vfu_sgl_write(o->vfu_ctx, sg, 1, &val) != 0) {
Same potential endianness issue here.
Stefan
* Re: [PATCH 0/3] Support message-based DMA in vfio-user server
2023-07-04 8:06 [PATCH 0/3] Support message-based DMA in vfio-user server Mattias Nissler
` (3 preceding siblings ...)
2023-07-04 8:20 ` [PATCH 0/3] Support message-based DMA in vfio-user server David Hildenbrand
@ 2023-07-20 18:41 ` Stefan Hajnoczi
2023-07-20 22:10 ` Mattias Nissler
4 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2023-07-20 18:41 UTC (permalink / raw)
To: Mattias Nissler
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Tue, Jul 04, 2023 at 01:06:24AM -0700, Mattias Nissler wrote:
> This series adds basic support for message-based DMA in qemu's vfio-user
> server. This is useful for cases where the client does not provide file
> descriptors for accessing system memory via memory mappings. My motivating use
> case is to hook up device models as PCIe endpoints to a hardware design. This
> works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> does not access memory directly, but sends memory requests TLPs to the hardware
> design in order to perform DMA.
>
> Note that in addition to the 3 commits included, we also need a
> subprojects/libvfio-user roll to bring in this bugfix:
> https://github.com/nutanix/libvfio-user/commit/bb308a2e8ee9486a4c8b53d8d773f7c8faaeba08
> Stefan, can I ask you to kindly update the
> https://gitlab.com/qemu-project/libvfio-user mirror? I'll be happy to include
> an update to subprojects/libvfio-user.wrap in this series.
Done:
https://gitlab.com/qemu-project/libvfio-user/-/commits/master
Repository mirroring is automated now, so new upstream commits will
appear in the QEMU mirror repository from now on.
>
> Finally, there is some more work required on top of this series to get
> message-based DMA to really work well:
>
> * libvfio-user has a long-standing issue where socket communication gets messed
> up when messages are sent from both ends at the same time. See
> https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
> been engaging there and plan to contribute a fix.
>
> * qemu currently breaks down DMA accesses into chunks of size 8 bytes at
> maximum, each of which will be handled in a separate vfio-user DMA request
> message. This is quite terrible for large DMA acceses, such as when nvme
> reads and writes page-sized blocks for example. Thus, I would like to improve
> qemu to be able to perform larger accesses, at least for indirect memory
> regions. I have something working locally, but since this will likely result
> in more involved surgery and discussion, I am leaving this to be addressed in
> a separate patch.
>
> Mattias Nissler (3):
> softmmu: Support concurrent bounce buffers
> softmmu: Remove DMA unmap notification callback
> vfio-user: Message-based DMA support
>
> hw/remote/vfio-user-obj.c | 62 ++++++++++++++++--
> softmmu/dma-helpers.c | 28 --------
> softmmu/physmem.c | 131 ++++++++------------------------------
> 3 files changed, 83 insertions(+), 138 deletions(-)
Sorry for the late review. I was on vacation and am catching up on
emails.
Paolo worked on the QEMU memory API and can give input on how to make
this efficient for large DMA accesses. There is a chance that memory
dispatch with larger sizes will be needed for ENQCMD CPU instruction
emulation too.
Stefan
* Re: [PATCH 0/3] Support message-based DMA in vfio-user server
2023-07-20 18:41 ` Stefan Hajnoczi
@ 2023-07-20 22:10 ` Mattias Nissler
0 siblings, 0 replies; 14+ messages in thread
From: Mattias Nissler @ 2023-07-20 22:10 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
Stefan,
I hope you had a great vacation!
Thanks for updating the mirror and your review. Your comments all make
sense, and I will address your input when I find time - just a quick
ack now since I'm travelling next week and will be on vacation the
first half of August, so it might be a while.
Thanks,
Mattias
On Thu, Jul 20, 2023 at 8:41 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jul 04, 2023 at 01:06:24AM -0700, Mattias Nissler wrote:
> > This series adds basic support for message-based DMA in qemu's vfio-user
> > server. This is useful for cases where the client does not provide file
> > descriptors for accessing system memory via memory mappings. My motivating use
> > case is to hook up device models as PCIe endpoints to a hardware design. This
> > works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> > does not access memory directly, but sends memory requests TLPs to the hardware
> > design in order to perform DMA.
> >
> > Note that in addition to the 3 commits included, we also need a
> > subprojects/libvfio-user roll to bring in this bugfix:
> > https://github.com/nutanix/libvfio-user/commit/bb308a2e8ee9486a4c8b53d8d773f7c8faaeba08
> > Stefan, can I ask you to kindly update the
> > https://gitlab.com/qemu-project/libvfio-user mirror? I'll be happy to include
> > an update to subprojects/libvfio-user.wrap in this series.
>
> Done:
> https://gitlab.com/qemu-project/libvfio-user/-/commits/master
>
> Repository mirroring is automated now, so new upstream commits will
> appear in the QEMU mirror repository from now on.
>
> >
> > Finally, there is some more work required on top of this series to get
> > message-based DMA to really work well:
> >
> > * libvfio-user has a long-standing issue where socket communication gets messed
> > up when messages are sent from both ends at the same time. See
> > https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
> > been engaging there and plan to contribute a fix.
> >
> > * qemu currently breaks down DMA accesses into chunks of size 8 bytes at
> > maximum, each of which will be handled in a separate vfio-user DMA request
> > message. This is quite terrible for large DMA acceses, such as when nvme
> > reads and writes page-sized blocks for example. Thus, I would like to improve
> > qemu to be able to perform larger accesses, at least for indirect memory
> > regions. I have something working locally, but since this will likely result
> > in more involved surgery and discussion, I am leaving this to be addressed in
> > a separate patch.
> >
> > Mattias Nissler (3):
> > softmmu: Support concurrent bounce buffers
> > softmmu: Remove DMA unmap notification callback
> > vfio-user: Message-based DMA support
> >
> > hw/remote/vfio-user-obj.c | 62 ++++++++++++++++--
> > softmmu/dma-helpers.c | 28 --------
> > softmmu/physmem.c | 131 ++++++++------------------------------
> > 3 files changed, 83 insertions(+), 138 deletions(-)
>
> Sorry for the late review. I was on vacation and am catching up on
> emails.
>
> Paolo worked on the QEMU memory API and can give input on how to make
> this efficient for large DMA accesses. There is a chance that memory
> dispatch with larger sizes will be needed for ENQCMD CPU instruction
> emulation too.
>
> Stefan
* Re: [PATCH 1/3] softmmu: Support concurrent bounce buffers
2023-07-20 18:10 ` Stefan Hajnoczi
@ 2023-08-23 9:27 ` Mattias Nissler
0 siblings, 0 replies; 14+ messages in thread
From: Mattias Nissler @ 2023-08-23 9:27 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Thu, Jul 20, 2023 at 8:10 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jul 04, 2023 at 01:06:25AM -0700, Mattias Nissler wrote:
> > It is not uncommon for device models to request mapping of several DMA
> > regions at the same time. An example is igb (and probably other net
> > devices as well) when a packet is spread across multiple descriptors.
> >
> > In order to support this when indirect DMA is used, as is the case when
> > running the device model in a vfio-server process without mmap()-ed DMA,
> > this change allocates DMA bounce buffers dynamically instead of
> > supporting only a single buffer.
> >
> > Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> > ---
> > softmmu/physmem.c | 74 ++++++++++++++++++++++-------------------------
> > 1 file changed, 35 insertions(+), 39 deletions(-)
>
> Is this a functional change or purely a performance optimization? If
> it's a performance optimization, please include benchmark results to
> justify this change.
It's a functional change in the sense that it fixes qemu to make some
hardware models actually work with bounce-buffered DMA. Right now, the
device models attempt to perform DMA accesses, receive an error due to
bounce buffer contention and then just bail, which the guest will
observe as a timeout and/or hardware error. I ran into this with igb
and xhci.
>
>
> QEMU memory allocations must be bounded so that an untrusted guest
> cannot cause QEMU to exhaust host memory. There must be a limit to the
> amount of bounce buffer memory.
Ah, makes sense. I will add code to track the total bounce buffer size
and enforce a limit. Since the amount of buffer space depends a lot on
the workload (I have observed xhci + usb-storage + Linux to use 1MB
buffer sizes by default), I'll make the limit configurable.
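Roughly what I have in mind (untested sketch only; the bounce_buffer_size and
max_bounce_buffer_size fields and where the configurable limit lives are
placeholders, not the final patch):
    /* Hypothetical accounting at the top of the indirect branch in
     * address_space_map(); the new fields would hang off the AddressSpace and
     * the limit would be settable via a property. */
    if (!memory_access_is_direct(mr, is_write)) {
        size_t used = qatomic_fetch_add(&as->bounce_buffer_size, l);
        if (used + l > as->max_bounce_buffer_size) {
            /* Over the limit: undo the reservation and fail the mapping. */
            qatomic_fetch_sub(&as->bounce_buffer_size, l);
            *plen = 0;
            return NULL;
        }

        BounceBuffer *bounce = g_malloc(sizeof(BounceBuffer) + l);
        bounce->addr = addr;
        bounce->mr = mr;
        /* ... rest as in this patch ... */
    }
The matching qatomic_fetch_sub() would then sit next to the g_free() in
address_space_unmap().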
>
>
> > diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > index bda475a719..56130b5a1d 100644
> > --- a/softmmu/physmem.c
> > +++ b/softmmu/physmem.c
> > @@ -2904,13 +2904,11 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len)
> >
> > typedef struct {
> > MemoryRegion *mr;
> > - void *buffer;
> > hwaddr addr;
> > - hwaddr len;
> > - bool in_use;
> > + uint8_t buffer[];
> > } BounceBuffer;
> >
> > -static BounceBuffer bounce;
> > +static size_t bounce_buffers_in_use;
> >
> > typedef struct MapClient {
> > QEMUBH *bh;
> > @@ -2947,7 +2945,7 @@ void cpu_register_map_client(QEMUBH *bh)
> > QLIST_INSERT_HEAD(&map_client_list, client, link);
> > /* Write map_client_list before reading in_use. */
> > smp_mb();
> > - if (!qatomic_read(&bounce.in_use)) {
> > + if (qatomic_read(&bounce_buffers_in_use)) {
> > cpu_notify_map_clients_locked();
> > }
> > qemu_mutex_unlock(&map_client_list_lock);
> > @@ -3076,31 +3074,24 @@ void *address_space_map(AddressSpace *as,
> > RCU_READ_LOCK_GUARD();
> > fv = address_space_to_flatview(as);
> > mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs);
> > + memory_region_ref(mr);
> >
> > if (!memory_access_is_direct(mr, is_write)) {
> > - if (qatomic_xchg(&bounce.in_use, true)) {
> > - *plen = 0;
> > - return NULL;
> > - }
> > - /* Avoid unbounded allocations */
> > - l = MIN(l, TARGET_PAGE_SIZE);
> > - bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
> > - bounce.addr = addr;
> > - bounce.len = l;
> > -
> > - memory_region_ref(mr);
> > - bounce.mr = mr;
> > + qatomic_inc_fetch(&bounce_buffers_in_use);
> > +
> > + BounceBuffer *bounce = g_malloc(l + sizeof(BounceBuffer));
> > + bounce->addr = addr;
> > + bounce->mr = mr;
> > +
> > if (!is_write) {
> > flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED,
> > - bounce.buffer, l);
> > + bounce->buffer, l);
> > }
> >
> > *plen = l;
> > - return bounce.buffer;
> > + return bounce->buffer;
>
> Bounce buffer allocation always succeeds now. Can the
> cpu_notify_map_clients*() be removed now that no one is waiting for
> bounce buffers anymore?
>
> > }
> >
> > -
> > - memory_region_ref(mr);
> > *plen = flatview_extend_translation(fv, addr, len, mr, xlat,
> > l, is_write, attrs);
> > fuzz_dma_read_cb(addr, *plen, mr);
> > @@ -3114,31 +3105,36 @@ void *address_space_map(AddressSpace *as,
> > void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
> > bool is_write, hwaddr access_len)
> > {
> > - if (buffer != bounce.buffer) {
> > - MemoryRegion *mr;
> > - ram_addr_t addr1;
> > + MemoryRegion *mr;
> > + ram_addr_t addr1;
> > +
> > + mr = memory_region_from_host(buffer, &addr1);
> > + if (mr == NULL) {
> > + /*
> > + * Must be a bounce buffer (unless the caller passed a pointer which
> > + * wasn't returned by address_space_map, which is illegal).
> > + */
> > + BounceBuffer *bounce = container_of(buffer, BounceBuffer, buffer);
> >
> > - mr = memory_region_from_host(buffer, &addr1);
> > - assert(mr != NULL);
> > if (is_write) {
> > - invalidate_and_set_dirty(mr, addr1, access_len);
> > + address_space_write(as, bounce->addr, MEMTXATTRS_UNSPECIFIED,
> > + bounce->buffer, access_len);
> > }
> > - if (xen_enabled()) {
> > - xen_invalidate_map_cache_entry(buffer);
> > + memory_region_unref(bounce->mr);
> > + g_free(bounce);
> > +
> > + if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
> > + cpu_notify_map_clients();
> > }
> > - memory_region_unref(mr);
> > return;
> > }
> > +
> > + if (xen_enabled()) {
> > + xen_invalidate_map_cache_entry(buffer);
> > + }
> > if (is_write) {
> > - address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED,
> > - bounce.buffer, access_len);
> > - }
> > - qemu_vfree(bounce.buffer);
> > - bounce.buffer = NULL;
> > - memory_region_unref(bounce.mr);
> > - /* Clear in_use before reading map_client_list. */
> > - qatomic_set_mb(&bounce.in_use, false);
> > - cpu_notify_map_clients();
> > + invalidate_and_set_dirty(mr, addr1, access_len);
> > + }
> > }
> >
> > void *cpu_physical_memory_map(hwaddr addr,
> > --
> > 2.34.1
> >
* Re: [PATCH 2/3] softmmu: Remove DMA unmap notification callback
2023-07-20 18:14 ` Stefan Hajnoczi
@ 2023-08-23 9:28 ` Mattias Nissler
0 siblings, 0 replies; 14+ messages in thread
From: Mattias Nissler @ 2023-08-23 9:28 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Thu, Jul 20, 2023 at 8:14 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jul 04, 2023 at 01:06:26AM -0700, Mattias Nissler wrote:
> > According to old commit messages, this was introduced to retry a DMA
> > operation at a later point in case the single bounce buffer is found to
> > be busy. This was never used widely - only the dma-helpers code made use
> > of it, but there are other device models that use multiple DMA mappings
> > (concurrently) and just failed.
> >
> > After the improvement to support multiple concurrent bounce buffers,
> > the condition the notification callback allowed to work around no
> > longer exists, so we can just remove the logic and simplify the code.
> >
> > Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> > ---
> > softmmu/dma-helpers.c | 28 -----------------
> > softmmu/physmem.c | 71 -------------------------------------------
> > 2 files changed, 99 deletions(-)
>
> I'm not sure if it will be possible to remove this once a limit is
> placed bounce buffer space.
I investigated this in detail and concluded that you're right,
unfortunately. In particular, after finding that Linux likes to use
megabyte-sized DMA buffers with xhci-attached USB storage, I don't
think it's realistic to set a reasonable fixed limit that accommodates
most workloads in practice.
>
> >
> > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> > index 2463964805..d05d226f11 100644
> > --- a/softmmu/dma-helpers.c
> > +++ b/softmmu/dma-helpers.c
> > @@ -68,23 +68,10 @@ typedef struct {
> > int sg_cur_index;
> > dma_addr_t sg_cur_byte;
> > QEMUIOVector iov;
> > - QEMUBH *bh;
> > DMAIOFunc *io_func;
> > void *io_func_opaque;
> > } DMAAIOCB;
> >
> > -static void dma_blk_cb(void *opaque, int ret);
> > -
> > -static void reschedule_dma(void *opaque)
> > -{
> > - DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> > -
> > - assert(!dbs->acb && dbs->bh);
> > - qemu_bh_delete(dbs->bh);
> > - dbs->bh = NULL;
> > - dma_blk_cb(dbs, 0);
> > -}
> > -
> > static void dma_blk_unmap(DMAAIOCB *dbs)
> > {
> > int i;
> > @@ -101,7 +88,6 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> > {
> > trace_dma_complete(dbs, ret, dbs->common.cb);
> >
> > - assert(!dbs->acb && !dbs->bh);
> > dma_blk_unmap(dbs);
> > if (dbs->common.cb) {
> > dbs->common.cb(dbs->common.opaque, ret);
> > @@ -164,13 +150,6 @@ static void dma_blk_cb(void *opaque, int ret)
> > }
> > }
> >
> > - if (dbs->iov.size == 0) {
> > - trace_dma_map_wait(dbs);
> > - dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
> > - cpu_register_map_client(dbs->bh);
> > - goto out;
> > - }
> > -
> > if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
> > qemu_iovec_discard_back(&dbs->iov,
> > QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
> > @@ -189,18 +168,12 @@ static void dma_aio_cancel(BlockAIOCB *acb)
> >
> > trace_dma_aio_cancel(dbs);
> >
> > - assert(!(dbs->acb && dbs->bh));
> > if (dbs->acb) {
> > /* This will invoke dma_blk_cb. */
> > blk_aio_cancel_async(dbs->acb);
> > return;
> > }
> >
> > - if (dbs->bh) {
> > - cpu_unregister_map_client(dbs->bh);
> > - qemu_bh_delete(dbs->bh);
> > - dbs->bh = NULL;
> > - }
> > if (dbs->common.cb) {
> > dbs->common.cb(dbs->common.opaque, -ECANCELED);
> > }
> > @@ -239,7 +212,6 @@ BlockAIOCB *dma_blk_io(AioContext *ctx,
> > dbs->dir = dir;
> > dbs->io_func = io_func;
> > dbs->io_func_opaque = io_func_opaque;
> > - dbs->bh = NULL;
> > qemu_iovec_init(&dbs->iov, sg->nsg);
> > dma_blk_cb(dbs, 0);
> > return &dbs->common;
> > diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > index 56130b5a1d..2b4123c127 100644
> > --- a/softmmu/physmem.c
> > +++ b/softmmu/physmem.c
> > @@ -2908,49 +2908,6 @@ typedef struct {
> > uint8_t buffer[];
> > } BounceBuffer;
> >
> > -static size_t bounce_buffers_in_use;
> > -
> > -typedef struct MapClient {
> > - QEMUBH *bh;
> > - QLIST_ENTRY(MapClient) link;
> > -} MapClient;
> > -
> > -QemuMutex map_client_list_lock;
> > -static QLIST_HEAD(, MapClient) map_client_list
> > - = QLIST_HEAD_INITIALIZER(map_client_list);
> > -
> > -static void cpu_unregister_map_client_do(MapClient *client)
> > -{
> > - QLIST_REMOVE(client, link);
> > - g_free(client);
> > -}
> > -
> > -static void cpu_notify_map_clients_locked(void)
> > -{
> > - MapClient *client;
> > -
> > - while (!QLIST_EMPTY(&map_client_list)) {
> > - client = QLIST_FIRST(&map_client_list);
> > - qemu_bh_schedule(client->bh);
> > - cpu_unregister_map_client_do(client);
> > - }
> > -}
> > -
> > -void cpu_register_map_client(QEMUBH *bh)
> > -{
> > - MapClient *client = g_malloc(sizeof(*client));
> > -
> > - qemu_mutex_lock(&map_client_list_lock);
> > - client->bh = bh;
> > - QLIST_INSERT_HEAD(&map_client_list, client, link);
> > - /* Write map_client_list before reading in_use. */
> > - smp_mb();
> > - if (qatomic_read(&bounce_buffers_in_use)) {
> > - cpu_notify_map_clients_locked();
> > - }
> > - qemu_mutex_unlock(&map_client_list_lock);
> > -}
> > -
> > void cpu_exec_init_all(void)
> > {
> > qemu_mutex_init(&ram_list.mutex);
> > @@ -2964,28 +2921,6 @@ void cpu_exec_init_all(void)
> > finalize_target_page_bits();
> > io_mem_init();
> > memory_map_init();
> > - qemu_mutex_init(&map_client_list_lock);
> > -}
> > -
> > -void cpu_unregister_map_client(QEMUBH *bh)
> > -{
> > - MapClient *client;
> > -
> > - qemu_mutex_lock(&map_client_list_lock);
> > - QLIST_FOREACH(client, &map_client_list, link) {
> > - if (client->bh == bh) {
> > - cpu_unregister_map_client_do(client);
> > - break;
> > - }
> > - }
> > - qemu_mutex_unlock(&map_client_list_lock);
> > -}
> > -
> > -static void cpu_notify_map_clients(void)
> > -{
> > - qemu_mutex_lock(&map_client_list_lock);
> > - cpu_notify_map_clients_locked();
> > - qemu_mutex_unlock(&map_client_list_lock);
> > }
> >
> > static bool flatview_access_valid(FlatView *fv, hwaddr addr, hwaddr len,
> > @@ -3077,8 +3012,6 @@ void *address_space_map(AddressSpace *as,
> > memory_region_ref(mr);
> >
> > if (!memory_access_is_direct(mr, is_write)) {
> > - qatomic_inc_fetch(&bounce_buffers_in_use);
> > -
> > BounceBuffer *bounce = g_malloc(l + sizeof(BounceBuffer));
> > bounce->addr = addr;
> > bounce->mr = mr;
> > @@ -3122,10 +3055,6 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
> > }
> > memory_region_unref(bounce->mr);
> > g_free(bounce);
> > -
> > - if (qatomic_dec_fetch(&bounce_buffers_in_use) == 1) {
> > - cpu_notify_map_clients();
> > - }
> > return;
> > }
> >
> > --
> > 2.34.1
> >
* Re: [PATCH 3/3] vfio-user: Message-based DMA support
2023-07-20 18:32 ` Stefan Hajnoczi
@ 2023-08-23 9:28 ` Mattias Nissler
0 siblings, 0 replies; 14+ messages in thread
From: Mattias Nissler @ 2023-08-23 9:28 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Elena Ufimtseva, Jagannathan Raman,
Philippe Mathieu-Daudé, Paolo Bonzini, Peter Xu,
David Hildenbrand, john.levon
On Thu, Jul 20, 2023 at 8:32 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jul 04, 2023 at 01:06:27AM -0700, Mattias Nissler wrote:
> > Wire up support for DMA for the case where the vfio-user client does not
> > provide mmap()-able file descriptors, but DMA requests must be performed
> > via the VFIO-user protocol. This installs an indirect memory region,
> > which already works for pci_dma_{read,write}, and pci_dma_map works
> > thanks to the existing DMA bounce buffering support.
> >
> > Note that while simple scenarios work with this patch, there's a known
> > race condition in libvfio-user that will mess up the communication
> > channel: https://github.com/nutanix/libvfio-user/issues/279 I intend to
> > contribute a fix for this problem, see discussion on the github issue
> > for more details.
> >
> > Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> > ---
> > hw/remote/vfio-user-obj.c | 62 ++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 55 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> > index 8b10c32a3c..9799580c77 100644
> > --- a/hw/remote/vfio-user-obj.c
> > +++ b/hw/remote/vfio-user-obj.c
> > @@ -300,6 +300,53 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> > return count;
> > }
> >
> > +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
> > + unsigned size, MemTxAttrs attrs)
> > +{
> > + MemoryRegion *region = opaque;
> > + VfuObject *o = VFU_OBJECT(region->owner);
> > +
> > + dma_sg_t *sg = alloca(dma_sg_size());
> > + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> > + if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
> > + vfu_sgl_read(o->vfu_ctx, sg, 1, val) != 0) {
>
> Does this work on big-endian host CPUs? It looks like reading 0x12345678
> into uint64_t val would result in *val = 0x12345678xxxxxxxx instead of
> 0x12345678.
Ah, good catch, thanks! Confirmed as an issue using a cross-compiled
s390x qemu binary. I will fix this by using ld/st helpers.
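For the read path I'm thinking of something along these lines (untested
sketch; it only swaps the destination of vfu_sgl_read() for a byte buffer and
converts with the host-endian load helper, the rest mirrors this patch):
    /*
     * vfu_dma_read() body, sketched: read into a local byte buffer and use
     * ldn_he_p() so that only the low "size" bytes end up in *val regardless
     * of host byte order. The write side would correspondingly use
     * stn_he_p() to serialize "val" before passing it to vfu_sgl_write().
     */
    uint8_t buf[sizeof(uint64_t)];
    dma_sg_t *sg = alloca(dma_sg_size());
    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);

    if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
        vfu_sgl_read(o->vfu_ctx, sg, 1, buf) != 0) {
        return MEMTX_ERROR;
    }
    *val = ldn_he_p(buf, size);
    return MEMTX_OK;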
>
> > + return MEMTX_ERROR;
> > + }
> > +
> > + return MEMTX_OK;
> > +}
> > +
> > +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
> > + unsigned size, MemTxAttrs attrs)
> > +{
> > + MemoryRegion *region = opaque;
> > + VfuObject *o = VFU_OBJECT(region->owner);
> > +
> > + dma_sg_t *sg = alloca(dma_sg_size());
> > + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> > + if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
> > + vfu_sgl_write(o->vfu_ctx, sg, 1, &val) != 0) {
>
> Same potential endianness issue here.
>
> Stefan