qemu-devel.nongnu.org archive mirror
* [PULL 0/4] vfio queue
@ 2024-10-24  5:31 Cédric Le Goater
  2024-10-24  5:32 ` [PULL 1/4] vfio/migration: Report only stop-copy size in vfio_state_pending_exact() Cédric Le Goater
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Cédric Le Goater @ 2024-10-24  5:31 UTC
  To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater

The following changes since commit 6f625ce2f21d6a1243065d236298277c56f972d5:

  Merge tag 'pull-request-2024-10-21' of https://gitlab.com/thuth/qemu into staging (2024-10-21 17:12:59 +0100)

are available in the Git repository at:

  https://github.com/legoater/qemu/ tags/pull-vfio-20241024

for you to fetch changes up to 00b519c0bca0e933ed22e2e6f8bca6b23f41f950:

  vfio/helpers: Align mmaps (2024-10-23 14:46:24 +0200)

----------------------------------------------------------------
vfio queue:

* Fixed size reported in vfio_state_pending_exact()
* Added support for PMD or PUD aligned mappings

----------------------------------------------------------------
Alex Williamson (2):
      vfio/helpers: Refactor vfio_region_mmap() error handling
      vfio/helpers: Align mmaps

Avihai Horon (2):
      vfio/migration: Report only stop-copy size in vfio_state_pending_exact()
      vfio/migration: Change trace formats from hex to decimal

 hw/vfio/helpers.c    | 66 +++++++++++++++++++++++++++++++++++++---------------
 hw/vfio/migration.c  |  3 ---
 hw/vfio/trace-events | 10 ++++----
 3 files changed, 52 insertions(+), 27 deletions(-)




* [PULL 1/4] vfio/migration: Report only stop-copy size in vfio_state_pending_exact()
From: Cédric Le Goater @ 2024-10-24  5:32 UTC
  To: qemu-devel; +Cc: Alex Williamson, Avihai Horon, Cédric Le Goater

From: Avihai Horon <avihaih@nvidia.com>

vfio_state_pending_exact() is used to tell the migration core how much
device data is left to transfer for the device migration. Currently, the
sum of the pre-copy and stop-copy sizes of the VFIO device is reported.

The pre-copy size is obtained via the VFIO_MIG_GET_PRECOPY_INFO ioctl,
which returns the amount of device data available to be transferred
while the device is in the PRE_COPY states.

The stop-copy size is obtained via the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE
ioctl, which returns the total amount of device data left to be
transferred in order to complete the device migration.

According to the above, the current implementation is wrong: it reports
extra overlapping data because the pre-copy size is already contained in
the stop-copy size. Fix it by reporting only the stop-copy size.

Fixes: eda7362af959 ("vfio/migration: Add VFIO migration pre-copy support")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/migration.c | 3 ---
 1 file changed, 3 deletions(-)
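
The overlap described in the commit message can be illustrated with a
minimal sketch. The struct and function names below are hypothetical and
are not QEMU code. The sketch assumes, as the commit message states, that
the stop-copy size already contains whatever the pre-copy query reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the sizes a VFIO device reports (not a QEMU type). */
struct mig_sizes {
    uint64_t precopy_init_size;   /* obtained via VFIO_MIG_GET_PRECOPY_INFO */
    uint64_t precopy_dirty_size;  /* obtained via VFIO_MIG_GET_PRECOPY_INFO */
    uint64_t stopcopy_size;       /* obtained via VFIO_DEVICE_FEATURE_MIG_DATA_SIZE */
};

/* Old behaviour: pre-copy sizes are added on top of the stop-copy size. */
static uint64_t pending_exact_old(const struct mig_sizes *s)
{
    assert(s != NULL);
    return s->stopcopy_size + s->precopy_init_size + s->precopy_dirty_size;
}

/* New behaviour: the stop-copy size alone is the remaining device data. */
static uint64_t pending_exact_new(const struct mig_sizes *s)
{
    assert(s != NULL);
    return s->stopcopy_size;
}
```

With made-up values of 100 and 20 bytes for the pre-copy sizes and 500
bytes for the stop-copy size, the old calculation yields 620 bytes while
the new one yields 500.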

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 17199b73aeea02545338b41b180edade2ec2ddcc..992dc3b1025729877d9fbe6ce9a4dbaf4dbd8a07 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -576,9 +576,6 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
 
     if (vfio_device_state_is_precopy(vbasedev)) {
         vfio_query_precopy_size(migration);
-
-        *must_precopy +=
-            migration->precopy_init_size + migration->precopy_dirty_size;
     }
 
     trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
-- 
2.47.0




* [PULL 2/4] vfio/migration: Change trace formats from hex to decimal
From: Cédric Le Goater @ 2024-10-24  5:32 UTC
  To: qemu-devel; +Cc: Alex Williamson, Avihai Horon, Cédric Le Goater

From: Avihai Horon <avihaih@nvidia.com>

Data sizes in VFIO migration trace events are printed in hex format
while in migration core trace events they are printed in decimal format.

This inconsistency makes traces harder to read when both trace event
types are used together. Hence, change the print format of data sizes to
decimal in VFIO migration trace events.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/trace-events | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
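
To show the effect on a single value, the sketch below formats the same
size in both styles. The helper names are hypothetical; only the PRIx64
and PRIu64 macros and the shape of the format strings mirror the diff.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print a size in hex, as the old trace formats did. */
static int format_size_hex(char *buf, size_t len, uint64_t size)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "size 0x%" PRIx64, size);
}

/* Hypothetical helper: print a size in decimal, as the new formats do. */
static int format_size_dec(char *buf, size_t len, uint64_t size)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "size %" PRIu64, size);
}
```

For a size of 1048576 bytes, the hex helper writes "size 0x100000" and
the decimal helper writes "size 1048576", the latter matching the style
the commit message attributes to the migration core trace events.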

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index c475c273fd8de156c68bca3f6eaf804c94276ff6..29789e8d276dcd39270edb3636d7f329452e9186 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -151,7 +151,7 @@ vfio_display_edid_write_error(void) ""
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
+vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size %"PRIu64" ret %d"
 vfio_migration_realize(const char *name) " (%s)"
 vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s"
 vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s"
@@ -160,10 +160,10 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
 vfio_save_device_config_state(const char *name) " (%s)"
-vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
-vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
-vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
-vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
+vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size %"PRIu64" precopy dirty size %"PRIu64
+vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size %"PRIu64
+vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy %"PRIu64" postcopy %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64
+vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy %"PRIu64" postcopy %"PRIu64" stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 
-- 
2.47.0




* [PULL 3/4] vfio/helpers: Refactor vfio_region_mmap() error handling
From: Cédric Le Goater @ 2024-10-24  5:32 UTC
  To: qemu-devel; +Cc: Alex Williamson, Peter Xu, Cédric Le Goater

From: Alex Williamson <alex.williamson@redhat.com>

Move error handling code to the end of the function so that it can more
easily be shared by new mmap failure conditions.  No functional change
intended.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/helpers.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)
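
The pattern this refactoring moves to, a single unwind label shared by
every failure site in the loop, can be sketched outside QEMU as follows.
This is only an illustration: alloc_all() and its fail_at test hook are
hypothetical, and malloc() stands in for the mmap() calls of the real
function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: fill bufs[0..n-1] with allocations of `size` bytes.
 * fail_at forces a failure at that index (pass -1 for none) so that the
 * unwind path can be exercised; it is a test hook, not part of any API.
 */
static int alloc_all(void **bufs, int n, size_t size, int fail_at)
{
    int i, ret;

    assert(bufs != NULL);

    for (i = 0; i < n; i++) {
        bufs[i] = (i == fail_at) ? NULL : malloc(size);
        if (bufs[i] == NULL) {
            ret = -ENOMEM;
            goto unwind;   /* any later failure site can reuse this label */
        }
    }

    return 0;

unwind:
    /* bufs[i] is already NULL; release everything set up before it. */
    for (i--; i >= 0; i--) {
        free(bufs[i]);
        bufs[i] = NULL;
    }

    return ret;
}
```

Because the cleanup lives in one place, a second failure condition inside
the loop only needs to set ret and jump to the label, which is what the
following patch relies on.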

diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index ea15c79db0a3643f260fc1ce3abfeaa7001ab306..b9e606e364a2dd267bacd63094cdedae5dd7d8b2 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -395,7 +395,7 @@ static void vfio_subregion_unmap(VFIORegion *region, int index)
 
 int vfio_region_mmap(VFIORegion *region)
 {
-    int i, prot = 0;
+    int i, ret, prot = 0;
     char *name;
 
     if (!region->mem) {
@@ -411,22 +411,8 @@ int vfio_region_mmap(VFIORegion *region)
                                      region->fd_offset +
                                      region->mmaps[i].offset);
         if (region->mmaps[i].mmap == MAP_FAILED) {
-            int ret = -errno;
-
-            trace_vfio_region_mmap_fault(memory_region_name(region->mem), i,
-                                         region->fd_offset +
-                                         region->mmaps[i].offset,
-                                         region->fd_offset +
-                                         region->mmaps[i].offset +
-                                         region->mmaps[i].size - 1, ret);
-
-            region->mmaps[i].mmap = NULL;
-
-            for (i--; i >= 0; i--) {
-                vfio_subregion_unmap(region, i);
-            }
-
-            return ret;
+            ret = -errno;
+            goto no_mmap;
         }
 
         name = g_strdup_printf("%s mmaps[%d]",
@@ -446,6 +432,20 @@ int vfio_region_mmap(VFIORegion *region)
     }
 
     return 0;
+
+no_mmap:
+    trace_vfio_region_mmap_fault(memory_region_name(region->mem), i,
+                                 region->fd_offset + region->mmaps[i].offset,
+                                 region->fd_offset + region->mmaps[i].offset +
+                                 region->mmaps[i].size - 1, ret);
+
+    region->mmaps[i].mmap = NULL;
+
+    for (i--; i >= 0; i--) {
+        vfio_subregion_unmap(region, i);
+    }
+
+    return ret;
 }
 
 void vfio_region_unmap(VFIORegion *region)
-- 
2.47.0




* [PULL 4/4] vfio/helpers: Align mmaps
From: Cédric Le Goater @ 2024-10-24  5:32 UTC
  To: qemu-devel; +Cc: Alex Williamson, Peter Xu, Cédric Le Goater

From: Alex Williamson <alex.williamson@redhat.com>

Thanks to work by Peter Xu, support is introduced in Linux v6.12 to
allow pfnmap insertions at PMD and PUD levels of the page table.  This
means that provided a properly aligned mmap, the vfio driver is able
to map MMIO at significantly larger intervals than PAGE_SIZE.  For
example on x86_64 (the only architecture currently supporting huge
pfnmaps for PUD), rather than 4KiB mappings, we can map device MMIO
using 2MiB and even 1GiB page table entries.

Typically mmap will already provide PMD aligned mappings, so devices
with moderately sized MMIO ranges, even GPUs with standard 256MiB BARs,
will already take advantage of this support.  However in order to better
support devices exposing multi-GiB MMIO, such as 3D accelerators or GPUs
with resizable BARs enabled, we need to manually align the mmap.

There doesn't seem to be a way for userspace to easily learn about PMD
and PUD mapping level sizes, therefore this takes the simple approach
of aligning the mapping to the power-of-two size of the region, up to
1GiB, which is currently the maximum alignment we care about.

Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/helpers.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)
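
The alignment choice described above can be worked through in isolation.
The sketch below is a standalone approximation of the expression
MIN(1ULL << ctz64(size), 1 * GiB) from the patch; mmap_alignment() and
GIB_BYTES are hypothetical names, and the lowest set bit is computed with
plain bit arithmetic instead of QEMU's ctz64().

```c
#include <assert.h>
#include <stdint.h>

#define GIB_BYTES (1ULL << 30)

/*
 * Hypothetical helper: return the largest power of two that divides
 * size (its lowest set bit), capped at 1 GiB.  size must be non-zero.
 */
static uint64_t mmap_alignment(uint64_t size)
{
    uint64_t lowest_bit;

    assert(size != 0);
    lowest_bit = size & (~size + 1);   /* same value as 1ULL << ctz64(size) */
    return lowest_bit < GIB_BYTES ? lowest_bit : GIB_BYTES;
}
```

A 256MiB region therefore gets 256MiB alignment, a 16GiB region is capped
at 1GiB, and a size that is not a power of two, such as 0x3000, falls
back to its lowest set bit, 0x1000.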

diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index b9e606e364a2dd267bacd63094cdedae5dd7d8b2..913796f437f84eece8711cb4b4b654a44040d17c 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -27,6 +27,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "qemu/units.h"
 #include "monitor/monitor.h"
 
 /*
@@ -406,8 +407,35 @@ int vfio_region_mmap(VFIORegion *region)
     prot |= region->flags & VFIO_REGION_INFO_FLAG_WRITE ? PROT_WRITE : 0;
 
     for (i = 0; i < region->nr_mmaps; i++) {
-        region->mmaps[i].mmap = mmap(NULL, region->mmaps[i].size, prot,
-                                     MAP_SHARED, region->vbasedev->fd,
+        size_t align = MIN(1ULL << ctz64(region->mmaps[i].size), 1 * GiB);
+        void *map_base, *map_align;
+
+        /*
+         * Align the mmap for more efficient mapping in the kernel.  Ideally
+         * we'd know the PMD and PUD mapping sizes to use as discrete alignment
+         * intervals, but we don't.  As of Linux v6.12, the largest PUD size
+         * supporting huge pfnmap is 1GiB (ARCH_SUPPORTS_PUD_PFNMAP is only set
+         * on x86_64).  Align by power-of-two size, capped at 1GiB.
+         *
+         * NB. qemu_memalign() and friends actually allocate memory, whereas
+         * the region size here can exceed host memory, therefore we manually
+         * create an oversized anonymous mapping and clean it up for alignment.
+         */
+        map_base = mmap(0, region->mmaps[i].size + align, PROT_NONE,
+                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        if (map_base == MAP_FAILED) {
+            ret = -errno;
+            goto no_mmap;
+        }
+
+        map_align = (void *)ROUND_UP((uintptr_t)map_base, (uintptr_t)align);
+        munmap(map_base, map_align - map_base);
+        munmap(map_align + region->mmaps[i].size,
+               align - (map_align - map_base));
+
+        region->mmaps[i].mmap = mmap(map_align, region->mmaps[i].size, prot,
+                                     MAP_SHARED | MAP_FIXED,
+                                     region->vbasedev->fd,
                                      region->fd_offset +
                                      region->mmaps[i].offset);
         if (region->mmaps[i].mmap == MAP_FAILED) {
-- 
2.47.0
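
The reserve-and-trim step in the diff above can be sketched as a
standalone function. This is a simplified illustration, not the QEMU
code: reserve_aligned() is a hypothetical name, and it assumes that align
is a power of two and that both size and align are multiples of the host
page size.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve `size` bytes of address space whose start
 * is aligned to `align`.  The reservation is PROT_NONE and touches no
 * memory.  Returns MAP_FAILED if the oversized mapping cannot be created.
 */
static void *reserve_aligned(size_t size, size_t align)
{
    uint8_t *base, *aligned;
    size_t head;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate by `align` so an aligned start must exist inside. */
    base = mmap(NULL, size + align, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return MAP_FAILED;
    }

    aligned = (uint8_t *)(((uintptr_t)base + align - 1) &
                          ~((uintptr_t)align - 1));
    head = (size_t)(aligned - base);

    /* Trim the unaligned head (if any) and the leftover tail. */
    if (head != 0) {
        munmap(base, head);
    }
    munmap(aligned + size, align - head);

    return aligned;
}
```

A caller would then map the real object over the reservation with
MAP_FIXED at the returned address, as the patch does with the device fd,
and the reservation is released with munmap(addr, size) if it is not
used.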




* Re: [PULL 0/4] vfio queue
From: Peter Maydell @ 2024-10-25 14:23 UTC
  To: Cédric Le Goater; +Cc: qemu-devel, Alex Williamson

On Thu, 24 Oct 2024 at 06:33, Cédric Le Goater <clg@redhat.com> wrote:
>
> The following changes since commit 6f625ce2f21d6a1243065d236298277c56f972d5:
>
>   Merge tag 'pull-request-2024-10-21' of https://gitlab.com/thuth/qemu into staging (2024-10-21 17:12:59 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/legoater/qemu/ tags/pull-vfio-20241024
>
> for you to fetch changes up to 00b519c0bca0e933ed22e2e6f8bca6b23f41f950:
>
>   vfio/helpers: Align mmaps (2024-10-23 14:46:24 +0200)
>
> ----------------------------------------------------------------
> vfio queue:
>
> * Fixed size reported in vfio_state_pending_exact()
> * Added support for PMD or PUD aligned mappings
>



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.2
for any user-visible changes.

-- PMM


