qemu-devel.nongnu.org archive mirror
* [PATCH v5 0/5] Support message-based DMA in vfio-user server
@ 2023-09-20  8:06 Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 1/5] softmmu: Per-AddressSpace bounce buffering Mattias Nissler
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Mattias Nissler @ 2023-09-20  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, peterx, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin,
	Mattias Nissler

This series adds basic support for message-based DMA in qemu's vfio-user
server. This is useful for cases where the client does not provide file
descriptors for accessing system memory via memory mappings. My motivating use
case is to hook up device models as PCIe endpoints to a hardware design. This
works by bridging the PCIe transaction layer to vfio-user, and the endpoint
does not access memory directly, but sends memory request TLPs to the hardware
design in order to perform DMA.

Note that more work is needed to make message-based DMA work well: qemu
currently breaks down DMA accesses into chunks of at most 8 bytes, each of
which is handled in a separate vfio-user DMA request message. This is quite
terrible for large DMA accesses, such as when nvme reads or writes page-sized
blocks: a single 4 KiB transfer turns into 512 round trips on the socket.
Thus, I would like to improve qemu to be able to perform larger accesses, at
least for indirect memory regions. I have something working locally, but since
this will likely result in more involved surgery and discussion, I am leaving
this to be addressed in a separate patch.

Changes from v1:

* Address Stefan's review comments. In particular, enforce an allocation limit
  and don't drop the map client callbacks, given that map requests can fail
  when hitting size limits.

* libvfio-user version bump now included in the series.

* Tested as well on big-endian s390x. This uncovered another byte order issue
  in vfio-user server code that I've included a fix for.

Changes from v2:

* Add a preparatory patch to make bounce buffering an AddressSpace-specific
  concept.

* The total buffer size limit parameter is now per AddressSpace and can be
  configured for PCIDevice via a property.

* Store a magic value in the first bytes of the bounce buffer struct as a
  best-effort measure to detect invalid pointers in address_space_unmap.

Changes from v3:

* libvfio-user now supports twin-socket mode, which uses separate sockets for
  client->server and server->client commands, respectively. This addresses the
  concurrent command bug triggered by server->client DMA access commands. See
  https://github.com/nutanix/libvfio-user/issues/279 for details.

* Add missing teardown code in do_address_space_destroy.

* Fix bounce buffer size bookkeeping race condition.

* Generate unmap notification callbacks unconditionally.

* Some cosmetic fixes.

Changes from v4:

* Fix an accidentally dropped memory_region_unref; control flow is restored to
  match the previous code to simplify review.

* Some cosmetic fixes.

Mattias Nissler (5):
  softmmu: Per-AddressSpace bounce buffering
  softmmu: Support concurrent bounce buffers
  Update subprojects/libvfio-user
  vfio-user: Message-based DMA support
  vfio-user: Fix config space access byte order

 hw/pci/pci.c                  |   8 ++
 hw/remote/trace-events        |   2 +
 hw/remote/vfio-user-obj.c     |  88 ++++++++++++++++++---
 include/exec/cpu-common.h     |   2 -
 include/exec/memory.h         |  41 +++++++++-
 include/hw/pci/pci_device.h   |   3 +
 softmmu/dma-helpers.c         |   4 +-
 softmmu/memory.c              |   8 ++
 softmmu/physmem.c             | 141 ++++++++++++++++++----------------
 subprojects/libvfio-user.wrap |   2 +-
 10 files changed, 218 insertions(+), 81 deletions(-)

-- 
2.34.1




* [PATCH v5 1/5] softmmu: Per-AddressSpace bounce buffering
  2023-09-20  8:06 [PATCH v5 0/5] Support message-based DMA in vfio-user server Mattias Nissler
@ 2023-09-20  8:06 ` Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 2/5] softmmu: Support concurrent bounce buffers Mattias Nissler
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Mattias Nissler @ 2023-09-20  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, peterx, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin,
	Mattias Nissler

Instead of using a single global bounce buffer, give each AddressSpace
its own bounce buffer. The MapClient callback mechanism moves to
AddressSpace accordingly.

This is in preparation for generalizing bounce buffer handling further
to allow multiple bounce buffers, with a total allocation limit
configured per AddressSpace.
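
For illustration, here is a rough sketch of the retry pattern the new API
enables. This is hypothetical device code, mirroring the dma-helpers.c change
below; the MyDev type and my_* names are made up for the example:

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "block/aio.h"

    typedef struct MyDev {
        AddressSpace *as;
        AioContext *ctx;
        hwaddr addr, len;
        bool is_write;
        QEMUBH *bh;
    } MyDev;

    static void my_dma_retry(void *opaque);

    static void my_dma_attempt(MyDev *dev)
    {
        hwaddr plen = dev->len;
        void *buf = address_space_map(dev->as, dev->addr, &plen,
                                      dev->is_write, MEMTXATTRS_UNSPECIFIED);
        if (!buf) {
            /* Mapping resources exhausted; retry once buffers free up. */
            dev->bh = aio_bh_new(dev->ctx, my_dma_retry, dev);
            address_space_register_map_client(dev->as, dev->bh);
            return;
        }
        /* ... access buf[0..plen), then unmap ... */
        address_space_unmap(dev->as, buf, plen, dev->is_write, plen);
    }

    static void my_dma_retry(void *opaque)
    {
        MyDev *dev = opaque;
        /* The map client entry was already removed before the BH fired. */
        qemu_bh_delete(dev->bh);
        dev->bh = NULL;
        my_dma_attempt(dev);
    }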

Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
 include/exec/cpu-common.h |   2 -
 include/exec/memory.h     |  45 ++++++++++++++++-
 softmmu/dma-helpers.c     |   4 +-
 softmmu/memory.c          |   7 +++
 softmmu/physmem.c         | 101 ++++++++++++++++----------------------
 5 files changed, 93 insertions(+), 66 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 41788c0bdd..63463c415d 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -138,8 +138,6 @@ void *cpu_physical_memory_map(hwaddr addr,
                               bool is_write);
 void cpu_physical_memory_unmap(void *buffer, hwaddr len,
                                bool is_write, hwaddr access_len);
-void cpu_register_map_client(QEMUBH *bh);
-void cpu_unregister_map_client(QEMUBH *bh);
 
 bool cpu_physical_memory_is_io(hwaddr phys_addr);
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 68284428f8..7d68936157 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1076,6 +1076,19 @@ struct MemoryListener {
     QTAILQ_ENTRY(MemoryListener) link_as;
 };
 
+typedef struct AddressSpaceMapClient {
+    QEMUBH *bh;
+    QLIST_ENTRY(AddressSpaceMapClient) link;
+} AddressSpaceMapClient;
+
+typedef struct {
+    MemoryRegion *mr;
+    void *buffer;
+    hwaddr addr;
+    hwaddr len;
+    bool in_use;
+} BounceBuffer;
+
 /**
  * struct AddressSpace: describes a mapping of addresses to #MemoryRegion objects
  */
@@ -1092,6 +1105,12 @@ struct AddressSpace {
     struct MemoryRegionIoeventfd *ioeventfds;
     QTAILQ_HEAD(, MemoryListener) listeners;
     QTAILQ_ENTRY(AddressSpace) address_spaces_link;
+
+    /* Bounce buffer to use for this address space. */
+    BounceBuffer bounce;
+    /* List of callbacks to invoke when buffers free up */
+    QemuMutex map_client_list_lock;
+    QLIST_HEAD(, AddressSpaceMapClient) map_client_list;
 };
 
 typedef struct AddressSpaceDispatch AddressSpaceDispatch;
@@ -2832,8 +2851,8 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, hwaddr len,
  * May return %NULL and set *@plen to zero(0), if resources needed to perform
  * the mapping are exhausted.
  * Use only for reads OR writes - not for read-modify-write operations.
- * Use cpu_register_map_client() to know when retrying the map operation is
- * likely to succeed.
+ * Use address_space_register_map_client() to know when retrying the map
+ * operation is likely to succeed.
  *
  * @as: #AddressSpace to be accessed
  * @addr: address within that address space
@@ -2858,6 +2877,28 @@ void *address_space_map(AddressSpace *as, hwaddr addr,
 void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
                          bool is_write, hwaddr access_len);
 
+/*
+ * address_space_register_map_client: Register a callback to invoke when
+ * resources for address_space_map() are available again.
+ *
+ * address_space_map may fail when there are not enough resources available,
+ * such as when bounce buffer memory would exceed the limit. The callback can
+ * be used to retry the address_space_map operation. Note that the callback
+ * gets automatically removed after firing.
+ *
+ * @as: #AddressSpace to be accessed
+ * @bh: callback to invoke when address_space_map() retry is appropriate
+ */
+void address_space_register_map_client(AddressSpace *as, QEMUBH *bh);
+
+/*
+ * address_space_unregister_map_client: Unregister a callback that has
+ * previously been registered and not fired yet.
+ *
+ * @as: #AddressSpace to be accessed
+ * @bh: callback to unregister
+ */
+void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh);
 
 /* Internal functions, part of the implementation of address_space_read.  */
 MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
index 2463964805..d9fc26c063 100644
--- a/softmmu/dma-helpers.c
+++ b/softmmu/dma-helpers.c
@@ -167,7 +167,7 @@ static void dma_blk_cb(void *opaque, int ret)
     if (dbs->iov.size == 0) {
         trace_dma_map_wait(dbs);
         dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
-        cpu_register_map_client(dbs->bh);
+        address_space_register_map_client(dbs->sg->as, dbs->bh);
         goto out;
     }
 
@@ -197,7 +197,7 @@ static void dma_aio_cancel(BlockAIOCB *acb)
     }
 
     if (dbs->bh) {
-        cpu_unregister_map_client(dbs->bh);
+        address_space_unregister_map_client(dbs->sg->as, dbs->bh);
         qemu_bh_delete(dbs->bh);
         dbs->bh = NULL;
     }
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..ffa37fc327 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -3105,6 +3105,9 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
     as->ioeventfds = NULL;
     QTAILQ_INIT(&as->listeners);
     QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
+    as->bounce.in_use = false;
+    qemu_mutex_init(&as->map_client_list_lock);
+    QLIST_INIT(&as->map_client_list);
     as->name = g_strdup(name ? name : "anonymous");
     address_space_update_topology(as);
     address_space_update_ioeventfds(as);
@@ -3112,6 +3115,10 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 
 static void do_address_space_destroy(AddressSpace *as)
 {
+    assert(!qatomic_read(&as->bounce.in_use));
+    assert(QLIST_EMPTY(&as->map_client_list));
+    qemu_mutex_destroy(&as->map_client_list_lock);
+
     assert(QTAILQ_EMPTY(&as->listeners));
 
     flatview_unref(as->current_map);
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 18277ddd67..29c9fc506c 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2926,55 +2926,37 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len)
                                      NULL, len, FLUSH_CACHE);
 }
 
-typedef struct {
-    MemoryRegion *mr;
-    void *buffer;
-    hwaddr addr;
-    hwaddr len;
-    bool in_use;
-} BounceBuffer;
-
-static BounceBuffer bounce;
-
-typedef struct MapClient {
-    QEMUBH *bh;
-    QLIST_ENTRY(MapClient) link;
-} MapClient;
-
-QemuMutex map_client_list_lock;
-static QLIST_HEAD(, MapClient) map_client_list
-    = QLIST_HEAD_INITIALIZER(map_client_list);
-
-static void cpu_unregister_map_client_do(MapClient *client)
+static void
+address_space_unregister_map_client_do(AddressSpaceMapClient *client)
 {
     QLIST_REMOVE(client, link);
     g_free(client);
 }
 
-static void cpu_notify_map_clients_locked(void)
+static void address_space_notify_map_clients_locked(AddressSpace *as)
 {
-    MapClient *client;
+    AddressSpaceMapClient *client;
 
-    while (!QLIST_EMPTY(&map_client_list)) {
-        client = QLIST_FIRST(&map_client_list);
+    while (!QLIST_EMPTY(&as->map_client_list)) {
+        client = QLIST_FIRST(&as->map_client_list);
         qemu_bh_schedule(client->bh);
-        cpu_unregister_map_client_do(client);
+        address_space_unregister_map_client_do(client);
     }
 }
 
-void cpu_register_map_client(QEMUBH *bh)
+void address_space_register_map_client(AddressSpace *as, QEMUBH *bh)
 {
-    MapClient *client = g_malloc(sizeof(*client));
+    AddressSpaceMapClient *client = g_malloc(sizeof(*client));
 
-    qemu_mutex_lock(&map_client_list_lock);
+    qemu_mutex_lock(&as->map_client_list_lock);
     client->bh = bh;
-    QLIST_INSERT_HEAD(&map_client_list, client, link);
+    QLIST_INSERT_HEAD(&as->map_client_list, client, link);
     /* Write map_client_list before reading in_use.  */
     smp_mb();
-    if (!qatomic_read(&bounce.in_use)) {
-        cpu_notify_map_clients_locked();
+    if (!qatomic_read(&as->bounce.in_use)) {
+        address_space_notify_map_clients_locked(as);
     }
-    qemu_mutex_unlock(&map_client_list_lock);
+    qemu_mutex_unlock(&as->map_client_list_lock);
 }
 
 void cpu_exec_init_all(void)
@@ -2990,28 +2972,27 @@ void cpu_exec_init_all(void)
     finalize_target_page_bits();
     io_mem_init();
     memory_map_init();
-    qemu_mutex_init(&map_client_list_lock);
 }
 
-void cpu_unregister_map_client(QEMUBH *bh)
+void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh)
 {
-    MapClient *client;
+    AddressSpaceMapClient *client;
 
-    qemu_mutex_lock(&map_client_list_lock);
-    QLIST_FOREACH(client, &map_client_list, link) {
+    qemu_mutex_lock(&as->map_client_list_lock);
+    QLIST_FOREACH(client, &as->map_client_list, link) {
         if (client->bh == bh) {
-            cpu_unregister_map_client_do(client);
+            address_space_unregister_map_client_do(client);
             break;
         }
     }
-    qemu_mutex_unlock(&map_client_list_lock);
+    qemu_mutex_unlock(&as->map_client_list_lock);
 }
 
-static void cpu_notify_map_clients(void)
+static void address_space_notify_map_clients(AddressSpace *as)
 {
-    qemu_mutex_lock(&map_client_list_lock);
-    cpu_notify_map_clients_locked();
-    qemu_mutex_unlock(&map_client_list_lock);
+    qemu_mutex_lock(&as->map_client_list_lock);
+    address_space_notify_map_clients_locked(as);
+    qemu_mutex_unlock(&as->map_client_list_lock);
 }
 
 static bool flatview_access_valid(FlatView *fv, hwaddr addr, hwaddr len,
@@ -3078,8 +3059,8 @@ flatview_extend_translation(FlatView *fv, hwaddr addr,
  * May map a subset of the requested range, given by and returned in *plen.
  * May return NULL if resources needed to perform the mapping are exhausted.
  * Use only for reads OR writes - not for read-modify-write operations.
- * Use cpu_register_map_client() to know when retrying the map operation is
- * likely to succeed.
+ * Use address_space_register_map_client() to know when retrying the map
+ * operation is likely to succeed.
  */
 void *address_space_map(AddressSpace *as,
                         hwaddr addr,
@@ -3102,25 +3083,25 @@ void *address_space_map(AddressSpace *as,
     mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs);
 
     if (!memory_access_is_direct(mr, is_write)) {
-        if (qatomic_xchg(&bounce.in_use, true)) {
+        if (qatomic_xchg(&as->bounce.in_use, true)) {
             *plen = 0;
             return NULL;
         }
         /* Avoid unbounded allocations */
         l = MIN(l, TARGET_PAGE_SIZE);
-        bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
-        bounce.addr = addr;
-        bounce.len = l;
+        as->bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
+        as->bounce.addr = addr;
+        as->bounce.len = l;
 
         memory_region_ref(mr);
-        bounce.mr = mr;
+        as->bounce.mr = mr;
         if (!is_write) {
             flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED,
-                               bounce.buffer, l);
+                          as->bounce.buffer, l);
         }
 
         *plen = l;
-        return bounce.buffer;
+        return as->bounce.buffer;
     }
 
 
@@ -3138,7 +3119,7 @@ void *address_space_map(AddressSpace *as,
 void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
                          bool is_write, hwaddr access_len)
 {
-    if (buffer != bounce.buffer) {
+    if (buffer != as->bounce.buffer) {
         MemoryRegion *mr;
         ram_addr_t addr1;
 
@@ -3154,15 +3135,15 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
         return;
     }
     if (is_write) {
-        address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED,
-                            bounce.buffer, access_len);
+        address_space_write(as, as->bounce.addr, MEMTXATTRS_UNSPECIFIED,
+                            as->bounce.buffer, access_len);
     }
-    qemu_vfree(bounce.buffer);
-    bounce.buffer = NULL;
-    memory_region_unref(bounce.mr);
+    qemu_vfree(as->bounce.buffer);
+    as->bounce.buffer = NULL;
+    memory_region_unref(as->bounce.mr);
     /* Clear in_use before reading map_client_list.  */
-    qatomic_set_mb(&bounce.in_use, false);
-    cpu_notify_map_clients();
+    qatomic_set_mb(&as->bounce.in_use, false);
+    address_space_notify_map_clients(as);
 }
 
 void *cpu_physical_memory_map(hwaddr addr,
-- 
2.34.1




* [PATCH v5 2/5] softmmu: Support concurrent bounce buffers
  2023-09-20  8:06 [PATCH v5 0/5] Support message-based DMA in vfio-user server Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 1/5] softmmu: Per-AddressSpace bounce buffering Mattias Nissler
@ 2023-09-20  8:06 ` Mattias Nissler
  2023-09-22 15:28   ` Peter Xu
  2023-09-20  8:06 ` [PATCH v5 3/5] Update subprojects/libvfio-user Mattias Nissler
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Mattias Nissler @ 2023-09-20  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, peterx, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin,
	Mattias Nissler

When DMA memory can't be directly accessed, as is the case when
running the device model in a separate process without shareable DMA
file descriptors, bounce buffering is used.

It is not uncommon for device models to request mapping of several DMA
regions at the same time. Examples include:
 * net devices, e.g. when transmitting a packet that is split across
   several TX descriptors (observed with igb)
 * USB host controllers, when handling a packet with multiple data TRBs
   (observed with xhci)

Previously, qemu only provided a single bounce buffer per AddressSpace
and would fail DMA map requests while the buffer was already in use. In
turn, this would cause DMA failures that ultimately manifest as hardware
errors from the guest perspective.

This change allocates DMA bounce buffers dynamically instead of
supporting only a single buffer. Thus, multiple concurrent DMA mappings work
correctly even when RAM can't be mmap()-ed.

The total bounce buffer allocation size is limited individually for each
AddressSpace. The default limit is 4096 bytes, matching the previous
maximum buffer size. A new x-max-bounce-buffer-size parameter is
provided to configure the limit for PCI devices.
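
For example, the limit can be raised for a device that is known to map many
concurrent DMA regions. The device type and value below are illustrative only;
the default remains 4096 bytes:

    qemu-system-x86_64 ... \
        -device nvme,drive=drv0,serial=deadbeef,x-max-bounce-buffer-size=65536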

Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
 hw/pci/pci.c                |  8 ++++
 include/exec/memory.h       | 14 +++----
 include/hw/pci/pci_device.h |  3 ++
 softmmu/memory.c            |  5 ++-
 softmmu/physmem.c           | 80 +++++++++++++++++++++++++------------
 5 files changed, 74 insertions(+), 36 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 881d774fb6..d071ac8091 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -85,6 +85,8 @@ static Property pci_props[] = {
                     QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
     DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
                     QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
+    DEFINE_PROP_SIZE("x-max-bounce-buffer-size", PCIDevice,
+                     max_bounce_buffer_size, DEFAULT_MAX_BOUNCE_BUFFER_SIZE),
     DEFINE_PROP_END_OF_LIST()
 };
 
@@ -1208,6 +1210,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
                        "bus master container", UINT64_MAX);
     address_space_init(&pci_dev->bus_master_as,
                        &pci_dev->bus_master_container_region, pci_dev->name);
+    pci_dev->bus_master_as.max_bounce_buffer_size =
+        pci_dev->max_bounce_buffer_size;
 
     if (phase_check(PHASE_MACHINE_READY)) {
         pci_init_bus_master(pci_dev);
@@ -2664,6 +2668,10 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     k->unrealize = pci_qdev_unrealize;
     k->bus_type = TYPE_PCI_BUS;
     device_class_set_props(k, pci_props);
+    object_class_property_set_description(
+        klass, "x-max-bounce-buffer-size",
+        "Maximum buffer size allocated for bounce buffers used for mapped "
+        "access to indirect DMA memory");
 }
 
 static void pci_device_class_base_init(ObjectClass *klass, void *data)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7d68936157..67379bd9cc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1081,13 +1081,7 @@ typedef struct AddressSpaceMapClient {
     QLIST_ENTRY(AddressSpaceMapClient) link;
 } AddressSpaceMapClient;
 
-typedef struct {
-    MemoryRegion *mr;
-    void *buffer;
-    hwaddr addr;
-    hwaddr len;
-    bool in_use;
-} BounceBuffer;
+#define DEFAULT_MAX_BOUNCE_BUFFER_SIZE (4096)
 
 /**
  * struct AddressSpace: describes a mapping of addresses to #MemoryRegion objects
@@ -1106,8 +1100,10 @@ struct AddressSpace {
     QTAILQ_HEAD(, MemoryListener) listeners;
     QTAILQ_ENTRY(AddressSpace) address_spaces_link;
 
-    /* Bounce buffer to use for this address space. */
-    BounceBuffer bounce;
+    /* Maximum DMA bounce buffer size used for indirect memory map requests */
+    uint64_t max_bounce_buffer_size;
+    /* Total size of bounce buffers currently allocated, atomically accessed */
+    uint64_t bounce_buffer_size;
     /* List of callbacks to invoke when buffers free up */
     QemuMutex map_client_list_lock;
     QLIST_HEAD(, AddressSpaceMapClient) map_client_list;
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b2..f4027c5379 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -160,6 +160,9 @@ struct PCIDevice {
     /* ID of standby device in net_failover pair */
     char *failover_pair_id;
     uint32_t acpi_index;
+
+    /* Maximum DMA bounce buffer size used for indirect memory map requests */
+    uint64_t max_bounce_buffer_size;
 };
 
 static inline int pci_intx(PCIDevice *pci_dev)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index ffa37fc327..24d90b10b2 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -3105,7 +3105,8 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
     as->ioeventfds = NULL;
     QTAILQ_INIT(&as->listeners);
     QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
-    as->bounce.in_use = false;
+    as->max_bounce_buffer_size = DEFAULT_MAX_BOUNCE_BUFFER_SIZE;
+    as->bounce_buffer_size = 0;
     qemu_mutex_init(&as->map_client_list_lock);
     QLIST_INIT(&as->map_client_list);
     as->name = g_strdup(name ? name : "anonymous");
@@ -3115,7 +3116,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 
 static void do_address_space_destroy(AddressSpace *as)
 {
-    assert(!qatomic_read(&as->bounce.in_use));
+    assert(qatomic_read(&as->bounce_buffer_size) == 0);
     assert(QLIST_EMPTY(&as->map_client_list));
     qemu_mutex_destroy(&as->map_client_list_lock);
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 29c9fc506c..73d2c3eda5 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2926,6 +2926,20 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len)
                                      NULL, len, FLUSH_CACHE);
 }
 
+/*
+ * A magic value stored in the first 8 bytes of the bounce buffer struct. Used
+ * to detect illegal pointers passed to address_space_unmap.
+ */
+#define BOUNCE_BUFFER_MAGIC 0xb4017ceb4ffe12ed
+
+typedef struct {
+    uint64_t magic;
+    MemoryRegion *mr;
+    hwaddr addr;
+    size_t len;
+    uint8_t buffer[];
+} BounceBuffer;
+
 static void
 address_space_unregister_map_client_do(AddressSpaceMapClient *client)
 {
@@ -2951,9 +2965,9 @@ void address_space_register_map_client(AddressSpace *as, QEMUBH *bh)
     qemu_mutex_lock(&as->map_client_list_lock);
     client->bh = bh;
     QLIST_INSERT_HEAD(&as->map_client_list, client, link);
-    /* Write map_client_list before reading in_use.  */
+    /* Write map_client_list before reading bounce_buffer_size. */
     smp_mb();
-    if (!qatomic_read(&as->bounce.in_use)) {
+    if (qatomic_read(&as->bounce_buffer_size) < as->max_bounce_buffer_size) {
         address_space_notify_map_clients_locked(as);
     }
     qemu_mutex_unlock(&as->map_client_list_lock);
@@ -3083,28 +3097,38 @@ void *address_space_map(AddressSpace *as,
     mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs);
 
     if (!memory_access_is_direct(mr, is_write)) {
-        if (qatomic_xchg(&as->bounce.in_use, true)) {
+        size_t size = qatomic_add_fetch(&as->bounce_buffer_size, l);
+        if (size > as->max_bounce_buffer_size) {
+            /*
+             * Note that the overshot might be larger than l if threads are
+             * racing and bump bounce_buffer_size at the same time.
+             */
+            size_t excess = MIN(size - as->max_bounce_buffer_size, l);
+            l -= excess;
+            qatomic_sub(&as->bounce_buffer_size, excess);
+        }
+
+        if (l == 0) {
             *plen = 0;
             return NULL;
         }
-        /* Avoid unbounded allocations */
-        l = MIN(l, TARGET_PAGE_SIZE);
-        as->bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
-        as->bounce.addr = addr;
-        as->bounce.len = l;
 
+        BounceBuffer *bounce = g_malloc0(l + sizeof(BounceBuffer));
+        bounce->magic = BOUNCE_BUFFER_MAGIC;
         memory_region_ref(mr);
-        as->bounce.mr = mr;
+        bounce->mr = mr;
+        bounce->addr = addr;
+        bounce->len = l;
+
         if (!is_write) {
             flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED,
-                          as->bounce.buffer, l);
+                          bounce->buffer, l);
         }
 
         *plen = l;
-        return as->bounce.buffer;
+        return bounce->buffer;
     }
 
-
     memory_region_ref(mr);
     *plen = flatview_extend_translation(fv, addr, len, mr, xlat,
                                         l, is_write, attrs);
@@ -3119,12 +3143,11 @@ void *address_space_map(AddressSpace *as,
 void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
                          bool is_write, hwaddr access_len)
 {
-    if (buffer != as->bounce.buffer) {
-        MemoryRegion *mr;
-        ram_addr_t addr1;
+    MemoryRegion *mr;
+    ram_addr_t addr1;
 
-        mr = memory_region_from_host(buffer, &addr1);
-        assert(mr != NULL);
+    mr = memory_region_from_host(buffer, &addr1);
+    if (mr != NULL) {
         if (is_write) {
             invalidate_and_set_dirty(mr, addr1, access_len);
         }
@@ -3134,15 +3157,22 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
         memory_region_unref(mr);
         return;
     }
+
+
+    BounceBuffer *bounce = container_of(buffer, BounceBuffer, buffer);
+    assert(bounce->magic == BOUNCE_BUFFER_MAGIC);
+
     if (is_write) {
-        address_space_write(as, as->bounce.addr, MEMTXATTRS_UNSPECIFIED,
-                            as->bounce.buffer, access_len);
-    }
-    qemu_vfree(as->bounce.buffer);
-    as->bounce.buffer = NULL;
-    memory_region_unref(as->bounce.mr);
-    /* Clear in_use before reading map_client_list.  */
-    qatomic_set_mb(&as->bounce.in_use, false);
+        address_space_write(as, bounce->addr, MEMTXATTRS_UNSPECIFIED,
+                            bounce->buffer, access_len);
+    }
+
+    qatomic_sub(&as->bounce_buffer_size, bounce->len);
+    bounce->magic = ~BOUNCE_BUFFER_MAGIC;
+    memory_region_unref(bounce->mr);
+    g_free(bounce);
+    /* Write bounce_buffer_size before reading map_client_list. */
+    smp_mb();
     address_space_notify_map_clients(as);
 }
 
-- 
2.34.1




* [PATCH v5 3/5] Update subprojects/libvfio-user
  2023-09-20  8:06 [PATCH v5 0/5] Support message-based DMA in vfio-user server Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 1/5] softmmu: Per-AddressSpace bounce buffering Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 2/5] softmmu: Support concurrent bounce buffers Mattias Nissler
@ 2023-09-20  8:06 ` Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 4/5] vfio-user: Message-based DMA support Mattias Nissler
  2023-09-20  8:06 ` [PATCH v5 5/5] vfio-user: Fix config space access byte order Mattias Nissler
  4 siblings, 0 replies; 11+ messages in thread
From: Mattias Nissler @ 2023-09-20  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, peterx, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin,
	Mattias Nissler

Brings in assorted bug fixes. The following are of particular interest
with respect to message-based DMA support:

* bb308a2 "Fix address calculation for message-based DMA"
  Corrects a bug in DMA address calculation.

* 1569a37 "Pass server->client command over a separate socket pair"
  Adds support for separate sockets for either command direction,
  addressing a bug where libvfio-user gets confused if both client and
  server send commands concurrently.

Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
 subprojects/libvfio-user.wrap | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/subprojects/libvfio-user.wrap b/subprojects/libvfio-user.wrap
index 416955ca45..cdf0a7a375 100644
--- a/subprojects/libvfio-user.wrap
+++ b/subprojects/libvfio-user.wrap
@@ -1,4 +1,4 @@
 [wrap-git]
 url = https://gitlab.com/qemu-project/libvfio-user.git
-revision = 0b28d205572c80b568a1003db2c8f37ca333e4d7
+revision = 1569a37a54ecb63bd4008708c76339ccf7d06115
 depth = 1
-- 
2.34.1




* [PATCH v5 4/5] vfio-user: Message-based DMA support
  2023-09-20  8:06 [PATCH v5 0/5] Support message-based DMA in vfio-user server Mattias Nissler
                   ` (2 preceding siblings ...)
  2023-09-20  8:06 ` [PATCH v5 3/5] Update subprojects/libvfio-user Mattias Nissler
@ 2023-09-20  8:06 ` Mattias Nissler
  2023-10-04 14:54   ` Jag Raman
  2023-09-20  8:06 ` [PATCH v5 5/5] vfio-user: Fix config space access byte order Mattias Nissler
  4 siblings, 1 reply; 11+ messages in thread
From: Mattias Nissler @ 2023-09-20  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, peterx, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin,
	Mattias Nissler

Wire up support for DMA for the case where the vfio-user client does not
provide mmap()-able file descriptors, but DMA requests must be performed
via the VFIO-user protocol. This installs an indirect memory region, which
already works for pci_dma_{read,write}; pci_dma_map works thanks to the
existing DMA bounce buffering support.

Note that while simple scenarios work with this patch, there's a known
race condition in libvfio-user that will mess up the communication
channel. See https://github.com/nutanix/libvfio-user/issues/279 for
details as well as a proposed fix.
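
For illustration, nothing changes from the device model's perspective: a plain
pci_dma_read() is transparently turned into vfio-user DMA request messages by
the indirect memory region. Hypothetical device code, with desc_gpa made up
for the example:

    /* Hypothetical illustration, not part of this patch. */
    uint8_t desc[16];
    if (pci_dma_read(pci_dev, desc_gpa, desc, sizeof(desc)) != MEMTX_OK) {
        /* DMA failed, e.g. surface a hardware error to the guest. */
    }

Each such access is currently split into pieces of at most 8 bytes; see the
note in the cover letter.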

Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
 hw/remote/trace-events    |  2 +
 hw/remote/vfio-user-obj.c | 84 +++++++++++++++++++++++++++++++++++----
 2 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0d1b7d56a5..358a68fb34 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,6 +9,8 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_dma_read(uint64_t gpa, size_t len) "vfu: DMA read 0x%"PRIx64", %zu bytes"
+vfu_dma_write(uint64_t gpa, size_t len) "vfu: DMA write 0x%"PRIx64", %zu bytes"
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 8b10c32a3c..6a561f7969 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -300,6 +300,63 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
+                                unsigned size, MemTxAttrs attrs)
+{
+    MemoryRegion *region = opaque;
+    vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx;
+    uint8_t buf[sizeof(uint64_t)];
+
+    trace_vfu_dma_read(region->addr + addr, size);
+
+    g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size());
+    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
+    if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
+        vfu_sgl_read(vfu_ctx, sg, 1, buf) != 0) {
+        return MEMTX_ERROR;
+    }
+
+    *val = ldn_he_p(buf, size);
+
+    return MEMTX_OK;
+}
+
+static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
+                                 unsigned size, MemTxAttrs attrs)
+{
+    MemoryRegion *region = opaque;
+    vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx;
+    uint8_t buf[sizeof(uint64_t)];
+
+    trace_vfu_dma_write(region->addr + addr, size);
+
+    stn_he_p(buf, size, val);
+
+    g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size());
+    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
+    if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
+        vfu_sgl_write(vfu_ctx, sg, 1, buf) != 0) {
+        return MEMTX_ERROR;
+    }
+
+    return MEMTX_OK;
+}
+
+static const MemoryRegionOps vfu_dma_ops = {
+    .read_with_attrs = vfu_dma_read,
+    .write_with_attrs = vfu_dma_write,
+    .endianness = DEVICE_HOST_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+        .unaligned = true,
+    },
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+
 static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
 {
     VfuObject *o = vfu_get_private(vfu_ctx);
@@ -308,17 +365,30 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     g_autofree char *name = NULL;
     struct iovec *iov = &info->iova;
 
-    if (!info->vaddr) {
-        return;
-    }
-
     name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
-                           (uint64_t)info->vaddr);
+                           (uint64_t)iov->iov_base);
 
     subregion = g_new0(MemoryRegion, 1);
 
-    memory_region_init_ram_ptr(subregion, NULL, name,
-                               iov->iov_len, info->vaddr);
+    if (info->vaddr) {
+        memory_region_init_ram_ptr(subregion, OBJECT(o), name,
+                                   iov->iov_len, info->vaddr);
+    } else {
+        /*
+         * Note that I/O regions' MemoryRegionOps handle accesses of at most 8
+         * bytes at a time, and larger accesses are broken down. However,
+         * many/most DMA accesses are larger than 8 bytes and VFIO-user can
+         * handle large DMA accesses just fine, thus this size restriction
+         * unnecessarily hurts performance, in particular given that each
+         * access causes a round trip on the VFIO-user socket.
+         *
+         * TODO: Investigate how to plumb larger accesses through memory
+         * regions, possibly by amending MemoryRegionOps or by creating a new
+         * memory region type.
+         */
+        memory_region_init_io(subregion, OBJECT(o), &vfu_dma_ops, subregion,
+                              name, iov->iov_len);
+    }
 
     dma_as = pci_device_iommu_address_space(o->pci_dev);
 
-- 
2.34.1




* [PATCH v5 5/5] vfio-user: Fix config space access byte order
  2023-09-20  8:06 [PATCH v5 0/5] Support message-based DMA in vfio-user server Mattias Nissler
                   ` (3 preceding siblings ...)
  2023-09-20  8:06 ` [PATCH v5 4/5] vfio-user: Message-based DMA support Mattias Nissler
@ 2023-09-20  8:06 ` Mattias Nissler
  2023-10-05 16:30   ` Jag Raman
  4 siblings, 1 reply; 11+ messages in thread
From: Mattias Nissler @ 2023-09-20  8:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, peterx, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin,
	Mattias Nissler

PCI config space is little-endian, so on a big-endian host we need to
perform byte swaps for values as they are passed to and received from
the generic PCI config space access machinery.
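
As a concrete illustration, consider a 2-byte config space read returning the
(example) vendor ID 0x8086 on a big-endian host:

    uint32_t val = 0x8086;   /* value in host byte order, len == 2 */
    uint8_t buf[4];

    memcpy(buf, &val, 2);    /* old code: copies the two high-order bytes,
                              * i.e. buf = {0x00, 0x00} on a big-endian host */
    stn_le_p(buf, 2, val);   /* fixed: buf = {0x86, 0x80}, the little-endian
                              * layout the client expects */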

Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
---
 hw/remote/vfio-user-obj.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 6a561f7969..6043a91b11 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -281,7 +281,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     while (bytes > 0) {
         len = (bytes > pci_access_width) ? pci_access_width : bytes;
         if (is_write) {
-            memcpy(&val, ptr, len);
+            val = ldn_le_p(ptr, len);
             pci_host_config_write_common(o->pci_dev, offset,
                                          pci_config_size(o->pci_dev),
                                          val, len);
@@ -289,7 +289,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
         } else {
             val = pci_host_config_read_common(o->pci_dev, offset,
                                               pci_config_size(o->pci_dev), len);
-            memcpy(ptr, &val, len);
+            stn_le_p(ptr, len, val);
             trace_vfu_cfg_read(offset, val);
         }
         offset += len;
-- 
2.34.1




* Re: [PATCH v5 2/5] softmmu: Support concurrent bounce buffers
  2023-09-20  8:06 ` [PATCH v5 2/5] softmmu: Support concurrent bounce buffers Mattias Nissler
@ 2023-09-22 15:28   ` Peter Xu
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Xu @ 2023-09-22 15:28 UTC (permalink / raw)
  To: Mattias Nissler
  Cc: qemu-devel, Elena Ufimtseva, David Hildenbrand, Richard Henderson,
	Philippe Mathieu-Daudé, john.levon, Marcel Apfelbaum,
	stefanha, Jagannathan Raman, Paolo Bonzini, Michael S. Tsirkin

On Wed, Sep 20, 2023 at 01:06:19AM -0700, Mattias Nissler wrote:
> When DMA memory can't be directly accessed, as is the case when
> running the device model in a separate process without shareable DMA
> file descriptors, bounce buffering is used.
> 
> It is not uncommon for device models to request mapping of several DMA
> regions at the same time. Examples include:
>  * net devices, e.g. when transmitting a packet that is split across
>    several TX descriptors (observed with igb)
>  * USB host controllers, when handling a packet with multiple data TRBs
>    (observed with xhci)
> 
> Previously, qemu only provided a single bounce buffer per AddressSpace
> and would fail DMA map requests while the buffer was already in use. In
> turn, this would cause DMA failures that ultimately manifest as hardware
> errors from the guest perspective.
> 
> This change allocates DMA bounce buffers dynamically instead of
> supporting only a single buffer. Thus, multiple DMA mappings work
> correctly also when RAM can't be mmap()-ed.
> 
> The total bounce buffer allocation size is limited individually for each
> AddressSpace. The default limit is 4096 bytes, matching the previous
> maximum buffer size. A new x-max-bounce-buffer-size parameter is
> provided to configure the limit for PCI devices.
> 
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v5 4/5] vfio-user: Message-based DMA support
  2023-09-20  8:06 ` [PATCH v5 4/5] vfio-user: Message-based DMA support Mattias Nissler
@ 2023-10-04 14:54   ` Jag Raman
  2023-10-06  7:56     ` Mattias Nissler
  0 siblings, 1 reply; 11+ messages in thread
From: Jag Raman @ 2023-10-04 14:54 UTC (permalink / raw)
  To: Mattias Nissler
  Cc: QEMU Developers, Elena Ufimtseva, David Hildenbrand,
	Richard Henderson, Philippe Mathieu-Daudé,
	john.levon@nutanix.com, peterx@redhat.com, Marcel Apfelbaum,
	stefanha@redhat.com, Paolo Bonzini, Michael S. Tsirkin



> On Sep 20, 2023, at 4:06 AM, Mattias Nissler <mnissler@rivosinc.com> wrote:
> 
> Wire up support for DMA for the case where the vfio-user client does not
> provide mmap()-able file descriptors, but DMA requests must be performed
> via the VFIO-user protocol. This installs an indirect memory region,
> which already works for pci_dma_{read,write}, and pci_dma_map works
> thanks to the existing DMA bounce buffering support.
> 
> Note that while simple scenarios work with this patch, there's a known
> race condition in libvfio-user that will mess up the communication
> channel. See https://github.com/nutanix/libvfio-user/issues/279 for
> details as well as a proposed fix.
> 
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> ---
> hw/remote/trace-events    |  2 +
> hw/remote/vfio-user-obj.c | 84 +++++++++++++++++++++++++++++++++++----
> 2 files changed, 79 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> index 0d1b7d56a5..358a68fb34 100644
> --- a/hw/remote/trace-events
> +++ b/hw/remote/trace-events
> @@ -9,6 +9,8 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x"
> vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
> vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
> vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
> +vfu_dma_read(uint64_t gpa, size_t len) "vfu: DMA read 0x%"PRIx64", %zu bytes"
> +vfu_dma_write(uint64_t gpa, size_t len) "vfu: DMA write 0x%"PRIx64", %zu bytes"
> vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
> vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
> vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 8b10c32a3c..6a561f7969 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -300,6 +300,63 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
>     return count;
> }
> 
> +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
> +                                unsigned size, MemTxAttrs attrs)
> +{
> +    MemoryRegion *region = opaque;
> +    vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx;
> +    uint8_t buf[sizeof(uint64_t)];
> +
> +    trace_vfu_dma_read(region->addr + addr, size);
> +
> +    g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size());
> +    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> +    if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
> +        vfu_sgl_read(vfu_ctx, sg, 1, buf) != 0) {
> +        return MEMTX_ERROR;
> +    }
> +
> +    *val = ldn_he_p(buf, size);
> +
> +    return MEMTX_OK;
> +}
> +
> +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
> +                                 unsigned size, MemTxAttrs attrs)
> +{
> +    MemoryRegion *region = opaque;
> +    vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx;
> +    uint8_t buf[sizeof(uint64_t)];
> +
> +    trace_vfu_dma_write(region->addr + addr, size);
> +
> +    stn_he_p(buf, size, val);
> +
> +    g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size());
> +    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> +    if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
> +        vfu_sgl_write(vfu_ctx, sg, 1, buf) != 0) {
> +        return MEMTX_ERROR;
> +    }
> +
> +    return MEMTX_OK;
> +}
> +
> +static const MemoryRegionOps vfu_dma_ops = {
> +    .read_with_attrs = vfu_dma_read,
> +    .write_with_attrs = vfu_dma_write,
> +    .endianness = DEVICE_HOST_ENDIAN,
> +    .valid = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +        .unaligned = true,
> +    },
> +    .impl = {
> +        .min_access_size = 1,
> +        .max_access_size = 8,
> +    },
> +};
> +
> static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> {
>     VfuObject *o = vfu_get_private(vfu_ctx);
> @@ -308,17 +365,30 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>     g_autofree char *name = NULL;
>     struct iovec *iov = &info->iova;
> 
> -    if (!info->vaddr) {
> -        return;
> -    }
> -
>     name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
> -                           (uint64_t)info->vaddr);
> +                           (uint64_t)iov->iov_base);
> 
>     subregion = g_new0(MemoryRegion, 1);
> 
> -    memory_region_init_ram_ptr(subregion, NULL, name,
> -                               iov->iov_len, info->vaddr);
> +    if (info->vaddr) {
> +        memory_region_init_ram_ptr(subregion, OBJECT(o), name,
> +                                   iov->iov_len, info->vaddr);
> +    } else {
> +        /*
> +         * Note that I/O regions' MemoryRegionOps handle accesses of at most 8
> +         * bytes at a time, and larger accesses are broken down. However,
> +         * many/most DMA accesses are larger than 8 bytes and VFIO-user can
> +         * handle large DMA accesses just fine, thus this size restriction
> +         * unnecessarily hurts performance, in particular given that each
> +         * access causes a round trip on the VFIO-user socket.
> +         *
> +         * TODO: Investigate how to plumb larger accesses through memory
> +         * regions, possibly by amending MemoryRegionOps or by creating a new
> +         * memory region type.
> +         */
> +        memory_region_init_io(subregion, OBJECT(o), &vfu_dma_ops, subregion,
> +                              name, iov->iov_len);

Hi Mattias,

We should update dma_unregister() to ensure we remove this subregion.
dma_unregister() presently removes the RAM region, but not this one.

--
Jag

> +    }
> 
>     dma_as = pci_device_iommu_address_space(o->pci_dev);
> 
> -- 
> 2.34.1
> 




* Re: [PATCH v5 5/5] vfio-user: Fix config space access byte order
  2023-09-20  8:06 ` [PATCH v5 5/5] vfio-user: Fix config space access byte order Mattias Nissler
@ 2023-10-05 16:30   ` Jag Raman
  2023-10-06  6:44     ` Mattias Nissler
  0 siblings, 1 reply; 11+ messages in thread
From: Jag Raman @ 2023-10-05 16:30 UTC (permalink / raw)
  To: Mattias Nissler
  Cc: QEMU Developers, Elena Ufimtseva, David Hildenbrand,
	Richard Henderson, Philippe Mathieu-Daudé,
	john.levon@nutanix.com, peterx@redhat.com, Marcel Apfelbaum,
	stefanha@redhat.com, Paolo Bonzini, Michael S. Tsirkin



> On Sep 20, 2023, at 4:06 AM, Mattias Nissler <mnissler@rivosinc.com> wrote:
> 
> PCI config space is little-endian, so on a big-endian host we need to
> perform byte swaps for values as they are passed to and received from
> the generic PCI config space access machinery.
> 
> Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> ---
> hw/remote/vfio-user-obj.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 6a561f7969..6043a91b11 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -281,7 +281,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
>     while (bytes > 0) {
>         len = (bytes > pci_access_width) ? pci_access_width : bytes;
>         if (is_write) {
> -            memcpy(&val, ptr, len);
> +            val = ldn_le_p(ptr, len);
>             pci_host_config_write_common(o->pci_dev, offset,
>                                          pci_config_size(o->pci_dev),
>                                          val, len);
> @@ -289,7 +289,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
>         } else {
>             val = pci_host_config_read_common(o->pci_dev, offset,
>                                               pci_config_size(o->pci_dev), len);
> -            memcpy(ptr, &val, len);
> +            stn_le_p(ptr, len, val);
>             trace_vfu_cfg_read(offset, val);
>         }
>         offset += len;
> -- 
> 2.34.1
> 

Hey,

When you tested on s390x, could you see the correct values for the config space in the Kernel? For example, were any known device's vendor and device IDs valid?

I'm asking because flatview_read_continue() / flatview_write_continue() does endianness adjustment. So, I want to confirm that the endianness adjustment in your code also makes sense from Kernel's perspective.

I'm trying to access a big-endian system, but no luck.

#0  0x0000555555b97a30 in vfio_user_io_region_read (vbasedev=0x555557802c80, index=7 '\a', off=4, size=2, data=0x7fff6cfb945c) at ../hw/vfio/user.c:1985
#1  0x0000555555b8dcfb in vfio_pci_read_config (pdev=0x555557802250, addr=4, len=2) at ../hw/vfio/pci.c:1202
#2  0x000055555599d3f9 in pci_host_config_read_common (pci_dev=0x555557802250, addr=addr@entry=4, limit=<optimized out>, limit@entry=256, len=len@entry=2) at ../hw/pci/pci_host.c:107
#3  0x000055555599d74a in pci_data_read (s=<optimized out>, addr=2147493892, len=2) at ../hw/pci/pci_host.c:143
#4  0x000055555599d84f in pci_host_data_read (opaque=<optimized out>, addr=<optimized out>, len=<optimized out>) at ../hw/pci/pci_host.c:188
#5  0x0000555555bc3c4d in memory_region_read_accessor (mr=mr@entry=0x5555569de370, addr=0, value=value@entry=0x7fff6cfb9710, size=size@entry=2, shift=0, mask=mask@entry=65535, attrs=...) at ../softmmu/memory.c:441
#6  0x0000555555bc3fce in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fff6cfb9710, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x555555bc3c10 <memory_region_read_accessor>, mr=0x5555569de370, attrs=...) at ../softmmu/memory.c:569
#7  0x0000555555bc41a1 in memory_region_dispatch_read1 (attrs=..., size=2, pval=0x7fff6cfb9710, addr=<optimized out>, mr=<optimized out>) at ../softmmu/memory.c:1443
#8  0x0000555555bc41a1 in memory_region_dispatch_read (mr=mr@entry=0x5555569de370, addr=<optimized out>, pval=pval@entry=0x7fff6cfb9710, op=MO_16, attrs=attrs@entry=...) at ../softmmu/memory.c:1476
#9  0x0000555555bce13b in flatview_read_continue (fv=fv@entry=0x7fff6861e120, addr=addr@entry=3324, attrs=..., ptr=ptr@entry=0x7ffff7fdf000, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x5555569de370) at /home/github/oracle/qemu/include/qemu/host-utils.h:219
#10 0x0000555555bce2f7 in flatview_read (fv=0x7fff6861e120, addr=addr@entry=3324, attrs=attrs@entry=..., buf=buf@entry=0x7ffff7fdf000, len=len@entry=2) at ../softmmu/physmem.c:2762
#11 0x0000555555bce448 in address_space_read_full (as=0x555556671620 <address_space_io>, addr=3324, attrs=..., buf=0x7ffff7fdf000, len=2) at ../softmmu/physmem.c:2775
#12 0x0000555555bce595 in address_space_rw (as=as@entry=0x555556671620 <address_space_io>, addr=addr@entry=3324, attrs=..., attrs@entry=..., buf=<optimized out>, len=len@entry=2, is_write=is_write@entry=false) at ../softmmu/physmem.c:2803
#13 0x0000555555c1973f in kvm_handle_io (count=1, size=2, direction=<optimized out>, data=<optimized out>, attrs=..., port=3324) at ../accel/kvm/kvm-all.c:2778
#14 0x0000555555c1973f in kvm_cpu_exec (cpu=cpu@entry=0x5555569ab390) at ../accel/kvm/kvm-all.c:3029
#15 0x0000555555c1a8dd in kvm_vcpu_thread_fn (arg=arg@entry=0x5555569ab390) at ../accel/kvm/kvm-accel-ops.c:51
#16 0x0000555555d8f4fb in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:541
#17 0x00007ffff577dea5 in start_thread () at /lib64/libpthread.so.0
#18 0x00007ffff54a6b0d in clone () at /lib64/libc.so.6

FYI, the above is the stack trace from the client. vfio_user_io_region_read() in the above sends a message to the server, and the server ends up calling either vfu_object_cfg_access() or vfu_object_bar_rw()  (which also does the endianness correction) depending on the region.

--
Jag




* Re: [PATCH v5 5/5] vfio-user: Fix config space access byte order
  2023-10-05 16:30   ` Jag Raman
@ 2023-10-06  6:44     ` Mattias Nissler
  0 siblings, 0 replies; 11+ messages in thread
From: Mattias Nissler @ 2023-10-06  6:44 UTC (permalink / raw)
  To: Jag Raman
  Cc: QEMU Developers, Elena Ufimtseva, David Hildenbrand,
	Richard Henderson, Philippe Mathieu-Daudé,
	john.levon@nutanix.com, peterx@redhat.com, Marcel Apfelbaum,
	stefanha@redhat.com, Paolo Bonzini, Michael S. Tsirkin


On Thu, Oct 5, 2023 at 6:30 PM Jag Raman <jag.raman@oracle.com> wrote:

>
>
> > On Sep 20, 2023, at 4:06 AM, Mattias Nissler <mnissler@rivosinc.com> wrote:
> >
> > PCI config space is little-endian, so on a big-endian host we need to
> > perform byte swaps for values as they are passed to and received from
> > the generic PCI config space access machinery.
> >
> > Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> > ---
> > hw/remote/vfio-user-obj.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> > index 6a561f7969..6043a91b11 100644
> > --- a/hw/remote/vfio-user-obj.c
> > +++ b/hw/remote/vfio-user-obj.c
> > @@ -281,7 +281,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> >     while (bytes > 0) {
> >         len = (bytes > pci_access_width) ? pci_access_width : bytes;
> >         if (is_write) {
> > -            memcpy(&val, ptr, len);
> > +            val = ldn_le_p(ptr, len);
> >             pci_host_config_write_common(o->pci_dev, offset,
> >                                          pci_config_size(o->pci_dev),
> >                                          val, len);
> > @@ -289,7 +289,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> >         } else {
> >             val = pci_host_config_read_common(o->pci_dev, offset,
> >                                               pci_config_size(o->pci_dev), len);
> > -            memcpy(ptr, &val, len);
> > +            stn_le_p(ptr, len, val);
> >             trace_vfu_cfg_read(offset, val);
> >         }
> >         offset += len;
> > --
> > 2.34.1
> >
>
> Hey,
>
> When you tested on s390x, could you see the correct values for the config
> space in the Kernel? For example, were any known device's vendor and device
> IDs valid?
>

I don't exactly remember whether I checked vendor and device IDs, but I've
done something more comprehensive: I set up a qemu vfio-user server
exposing an nvme device and then connected a qemu client (with the
vfio-user client patches from the oracle qemu github repo). Linux running
in the client probes the nvme device successfully and I've mounted a file
system on it. Both qemu binaries are s390x.


>
> I'm asking because flatview_read_continue() / flatview_write_continue()
> does endianness adjustment. So, I want to confirm that the endianness
> adjustment in your code also makes sense from Kernel's perspective.
>

The conversion in the flatview access path adjusts from the endianness
of the memory region to what the emulated CPU needs. Since the PCI config
space memory region is little-endian (see pci_host_data_le_ops), we're
doing a swap there. The code I'm changing is backing the memory region, so
incoming/outgoing data for writes/reads must be in little-endian to adhere
to the endianness declared by the memory region.
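
Schematically (my mental model of the two conversion points):

    value seen by the emulated CPU
        ^ flatview access path: swaps iff region endianness != CPU endianness
    memory region contents, declared little-endian (pci_host_data_le_ops)
        ^ vfu_object_cfg_access(): stn_le_p()/ldn_le_p() produce/consume
          exactly this little-endian layout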


>
> I'm trying to access a big-endian system, but no luck.
>

Btw., I don't have access to big-endian hardware either, but it was
surprisingly straightforward to make my x86 Ubuntu machine run s390x
binaries via multiarch + qemu user mode (qemu turtles all the way down :-D)


>
> #0  0x0000555555b97a30 in vfio_user_io_region_read
> (vbasedev=0x555557802c80, index=7 '\a', off=4, size=2, data=0x7fff6cfb945c)
> at ../hw/vfio/user.c:1985
> #1  0x0000555555b8dcfb in vfio_pci_read_config (pdev=0x555557802250,
> addr=4, len=2) at ../hw/vfio/pci.c:1202
> #2  0x000055555599d3f9 in pci_host_config_read_common
> (pci_dev=0x555557802250, addr=addr@entry=4, limit=<optimized out>,
> limit@entry=256, len=len@entry=2) at ../hw/pci/pci_host.c:107
> #3  0x000055555599d74a in pci_data_read (s=<optimized out>,
> addr=2147493892, len=2) at ../hw/pci/pci_host.c:143
> #4  0x000055555599d84f in pci_host_data_read (opaque=<optimized out>,
> addr=<optimized out>, len=<optimized out>) at ../hw/pci/pci_host.c:188
> #5  0x0000555555bc3c4d in memory_region_read_accessor (mr=mr@entry=0x5555569de370,
> addr=0, value=value@entry=0x7fff6cfb9710, size=size@entry=2, shift=0,
> mask=mask@entry=65535, attrs=...) at ../softmmu/memory.c:441
> #6  0x0000555555bc3fce in access_with_adjusted_size (addr=addr@entry=0,
> value=value@entry=0x7fff6cfb9710, size=size@entry=2,
> access_size_min=<optimized out>, access_size_max=<optimized out>,
> access_fn=0x555555bc3c10 <memory_region_read_accessor>, mr=0x5555569de370,
> attrs=...) at ../softmmu/memory.c:569
> #7  0x0000555555bc41a1 in memory_region_dispatch_read1 (attrs=..., size=2,
> pval=0x7fff6cfb9710, addr=<optimized out>, mr=<optimized out>) at
> ../softmmu/memory.c:1443
> #8  0x0000555555bc41a1 in memory_region_dispatch_read (mr=mr@entry=0x5555569de370,
> addr=<optimized out>, pval=pval@entry=0x7fff6cfb9710, op=MO_16,
> attrs=attrs@entry=...) at ../softmmu/memory.c:1476
> #9  0x0000555555bce13b in flatview_read_continue (fv=fv@entry=0x7fff6861e120,
> addr=addr@entry=3324, attrs=..., ptr=ptr@entry=0x7ffff7fdf000,
> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
> mr=0x5555569de370) at /home/github/oracle/qemu/include/qemu/host-utils.h:219
> #10 0x0000555555bce2f7 in flatview_read (fv=0x7fff6861e120, addr=addr@entry=3324,
> attrs=attrs@entry=..., buf=buf@entry=0x7ffff7fdf000, len=len@entry=2) at
> ../softmmu/physmem.c:2762
> #11 0x0000555555bce448 in address_space_read_full (as=0x555556671620
> <address_space_io>, addr=3324, attrs=..., buf=0x7ffff7fdf000, len=2) at
> ../softmmu/physmem.c:2775
> #12 0x0000555555bce595 in address_space_rw (as=as@entry=0x555556671620
> <address_space_io>, addr=addr@entry=3324, attrs=..., attrs@entry=...,
> buf=<optimized out>, len=len@entry=2, is_write=is_write@entry=false) at
> ../softmmu/physmem.c:2803
> #13 0x0000555555c1973f in kvm_handle_io (count=1, size=2,
> direction=<optimized out>, data=<optimized out>, attrs=..., port=3324) at
> ../accel/kvm/kvm-all.c:2778
> #14 0x0000555555c1973f in kvm_cpu_exec (cpu=cpu@entry=0x5555569ab390) at
> ../accel/kvm/kvm-all.c:3029
> #15 0x0000555555c1a8dd in kvm_vcpu_thread_fn (arg=arg@entry=0x5555569ab390)
> at ../accel/kvm/kvm-accel-ops.c:51
> #16 0x0000555555d8f4fb in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:541
> #17 0x00007ffff577dea5 in start_thread () at /lib64/libpthread.so.0
> #18 0x00007ffff54a6b0d in clone () at /lib64/libc.so.6
>
> FYI, the above is the stack trace from the client.
> vfio_user_io_region_read() there sends a message to the server, and the
> server ends up calling either vfu_object_cfg_access() or
> vfu_object_bar_rw() (which also does the endianness correction) depending
> on the region.
>
> --
> Jag
>
>

* Re: [PATCH v5 4/5] vfio-user: Message-based DMA support
  2023-10-04 14:54   ` Jag Raman
@ 2023-10-06  7:56     ` Mattias Nissler
  0 siblings, 0 replies; 11+ messages in thread
From: Mattias Nissler @ 2023-10-06  7:56 UTC (permalink / raw)
  To: Jag Raman
  Cc: QEMU Developers, Elena Ufimtseva, David Hildenbrand,
	Richard Henderson, Philippe Mathieu-Daudé,
	john.levon@nutanix.com, peterx@redhat.com, Marcel Apfelbaum,
	stefanha@redhat.com, Paolo Bonzini, Michael S. Tsirkin

On Wed, Oct 4, 2023 at 4:54 PM Jag Raman <jag.raman@oracle.com> wrote:

>
>
> > On Sep 20, 2023, at 4:06 AM, Mattias Nissler <mnissler@rivosinc.com> wrote:
> >
> > Wire up support for DMA for the case where the vfio-user client does not
> > provide mmap()-able file descriptors, but DMA requests must be performed
> > via the VFIO-user protocol. This installs an indirect memory region,
> > which already works for pci_dma_{read,write}, and pci_dma_map works
> > thanks to the existing DMA bounce buffering support.
> >
> > Note that while simple scenarios work with this patch, there's a known
> > race condition in libvfio-user that will mess up the communication
> > channel. See https://github.com/nutanix/libvfio-user/issues/279 for
> > details as well as a proposed fix.
> >
> > Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
> > ---
> > hw/remote/trace-events    |  2 +
> > hw/remote/vfio-user-obj.c | 84 +++++++++++++++++++++++++++++++++++----
> > 2 files changed, 79 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> > index 0d1b7d56a5..358a68fb34 100644
> > --- a/hw/remote/trace-events
> > +++ b/hw/remote/trace-events
> > @@ -9,6 +9,8 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x"
> > vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
> > vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
> > vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
> > +vfu_dma_read(uint64_t gpa, size_t len) "vfu: DMA read 0x%"PRIx64", %zu bytes"
> > +vfu_dma_write(uint64_t gpa, size_t len) "vfu: DMA write 0x%"PRIx64", %zu bytes"
> > vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
> > vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
> > vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
> > diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> > index 8b10c32a3c..6a561f7969 100644
> > --- a/hw/remote/vfio-user-obj.c
> > +++ b/hw/remote/vfio-user-obj.c
> > @@ -300,6 +300,63 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> >     return count;
> > }
> >
> > +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
> > +                                unsigned size, MemTxAttrs attrs)
> > +{
> > +    MemoryRegion *region = opaque;
> > +    vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx;
> > +    uint8_t buf[sizeof(uint64_t)];
> > +
> > +    trace_vfu_dma_read(region->addr + addr, size);
> > +
> > +    g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size());
> > +    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> > +    if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
> > +        vfu_sgl_read(vfu_ctx, sg, 1, buf) != 0) {
> > +        return MEMTX_ERROR;
> > +    }
> > +
> > +    *val = ldn_he_p(buf, size);
> > +
> > +    return MEMTX_OK;
> > +}
> > +
> > +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
> > +                                 unsigned size, MemTxAttrs attrs)
> > +{
> > +    MemoryRegion *region = opaque;
> > +    vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx;
> > +    uint8_t buf[sizeof(uint64_t)];
> > +
> > +    trace_vfu_dma_write(region->addr + addr, size);
> > +
> > +    stn_he_p(buf, size, val);
> > +
> > +    g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size());
> > +    vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> > +    if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
> > +        vfu_sgl_write(vfu_ctx, sg, 1, buf) != 0) {
> > +        return MEMTX_ERROR;
> > +    }
> > +
> > +    return MEMTX_OK;
> > +}
> > +
> > +static const MemoryRegionOps vfu_dma_ops = {
> > +    .read_with_attrs = vfu_dma_read,
> > +    .write_with_attrs = vfu_dma_write,
> > +    .endianness = DEVICE_HOST_ENDIAN,
> > +    .valid = {
> > +        .min_access_size = 1,
> > +        .max_access_size = 8,
> > +        .unaligned = true,
> > +    },
> > +    .impl = {
> > +        .min_access_size = 1,
> > +        .max_access_size = 8,
> > +    },
> > +};
> > +
> > static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> > {
> >     VfuObject *o = vfu_get_private(vfu_ctx);
> > @@ -308,17 +365,30 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> >     g_autofree char *name = NULL;
> >     struct iovec *iov = &info->iova;
> >
> > -    if (!info->vaddr) {
> > -        return;
> > -    }
> > -
> >     name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
> > -                           (uint64_t)info->vaddr);
> > +                           (uint64_t)iov->iov_base);
> >
> >     subregion = g_new0(MemoryRegion, 1);
> >
> > -    memory_region_init_ram_ptr(subregion, NULL, name,
> > -                               iov->iov_len, info->vaddr);
> > +    if (info->vaddr) {
> > +        memory_region_init_ram_ptr(subregion, OBJECT(o), name,
> > +                                   iov->iov_len, info->vaddr);
> > +    } else {
> > +        /*
> > +         * Note that I/O regions' MemoryRegionOps handle accesses of at most 8
> > +         * bytes at a time, and larger accesses are broken down. However,
> > +         * many/most DMA accesses are larger than 8 bytes and VFIO-user can
> > +         * handle large DMA accesses just fine, thus this size restriction
> > +         * unnecessarily hurts performance, in particular given that each
> > +         * access causes a round trip on the VFIO-user socket.
> > +         *
> > +         * TODO: Investigate how to plumb larger accesses through memory
> > +         * regions, possibly by amending MemoryRegionOps or by creating a new
> > +         * memory region type.
> > +         */
> > +        memory_region_init_io(subregion, OBJECT(o), &vfu_dma_ops, subregion,
> > +                              name, iov->iov_len);
>
> Hi Mattias,
>
> We should update dma_unregister() to ensure we remove this subregion.
> dma_unregister() presently removes the RAM region, but not this one.
>

Oops. Fixing.
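
For v6 I'm thinking of something along these lines -- just a sketch, the
memory_region_find() lookup and the exact teardown order are assumptions
on my part and still untested:

static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    VfuObject *o = vfu_get_private(vfu_ctx);
    AddressSpace *dma_as = pci_device_iommu_address_space(o->pci_dev);
    /* Look up the subregion covering the start of the unmapped range. */
    MemoryRegionSection section =
        memory_region_find(dma_as->root, (hwaddr)info->iova.iov_base, 1);

    if (!section.mr) {
        return;
    }

    /* Remove and release the subregion, whether RAM-backed or I/O. */
    memory_region_del_subregion(dma_as->root, section.mr);
    object_unparent(OBJECT(section.mr));
    /* Drop the reference that memory_region_find() took. */
    memory_region_unref(section.mr);
}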


>
> --
> Jag
>
> > +    }
> >
> >     dma_as = pci_device_iommu_address_space(o->pci_dev);
> >
> > --
> > 2.34.1
> >
>
>

end of thread

Thread overview: 11+ messages
2023-09-20  8:06 [PATCH v5 0/5] Support message-based DMA in vfio-user server Mattias Nissler
2023-09-20  8:06 ` [PATCH v5 1/5] softmmu: Per-AddressSpace bounce buffering Mattias Nissler
2023-09-20  8:06 ` [PATCH v5 2/5] softmmu: Support concurrent bounce buffers Mattias Nissler
2023-09-22 15:28   ` Peter Xu
2023-09-20  8:06 ` [PATCH v5 3/5] Update subprojects/libvfio-user Mattias Nissler
2023-09-20  8:06 ` [PATCH v5 4/5] vfio-user: Message-based DMA support Mattias Nissler
2023-10-04 14:54   ` Jag Raman
2023-10-06  7:56     ` Mattias Nissler
2023-09-20  8:06 ` [PATCH v5 5/5] vfio-user: Fix config space access byte order Mattias Nissler
2023-10-05 16:30   ` Jag Raman
2023-10-06  6:44     ` Mattias Nissler
