From: Joao Martins <joao.m.martins@oracle.com>
To: qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Cedric Le Goater <clg@redhat.com>,
	Avihai Horon <avihaih@nvidia.com>, Peter Xu <peterx@redhat.com>,
	Joao Martins <joao.m.martins@oracle.com>
Subject: [PATCH RFC] vfio/common: Add an option to relax vIOMMU migration blocker
Date: Fri,  8 Sep 2023 13:05:21 +0100	[thread overview]
Message-ID: <20230908120521.50903-1-joao.m.martins@oracle.com> (raw)

Add an option 'x-migration-iommu-pt' to VFIO that relaxes whether vIOMMU
usage blocks migration. The current behaviour is kept, and migration is
still blocked in the following conditions:

* By default, if the guest does try to use the vIOMMU, migration is blocked
when migration is attempted, just like having the migration blocker in
the first place [Current behaviour]

* Migration starts with no vIOMMU mappings, but the guest kexec's itself
with the IOMMU on ('iommu=on intel_iommu=on') and ends up using the vIOMMU.
Here we cancel the migration with an error message [Added behaviour]

This is meant for older guests (< 5.10) where we can relax the restriction
because the vIOMMU is exposed for the sole purpose of interrupt remapping,
and the guest is old enough not to check for DMA translation services when
probing its IOMMU devices[0]. The option is useful for managed VMs where
you *steer* some of the guest behaviour and you know the vIOMMU won't be
used for more than interrupt remapping.
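
For illustration, such a managed guest is expected to keep DMA translation
in passthrough while still benefiting from interrupt remapping, e.g. with a
guest kernel command line along the lines of (illustrative only):

    intel_iommu=on iommu=pt

so that no vIOMMU mappings get established during normal operation.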

[0] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/

Default is 'disabled' for this option, given that the second bullet point
above depends on guest behaviour (and is thus non-deterministic). But let
the user enable it if they can tolerate migration failures.
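
As an example (the host BDF and machine options below are placeholders, not
part of this patch), enabling the relaxation could look like:

    -M q35,kernel-irqchip=split \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=0000:3b:00.0,x-migration-iommu-pt=on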

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Followup from discussion here:
https://lore.kernel.org/qemu-devel/d5d30f58-31f0-1103-6956-377de34a790c@redhat.com/

This is a smaller (and simpler) take than [0], but it is likely the only
option for old guests, or for managed guests that only want to use the
vIOMMU for interrupt remapping. The work in [0] has stronger 'migration
will work' guarantees (except of course for the usual non-convergence
or network failures, which are agnostic to the vIOMMU), and it is a bit
better at limiting what the guest can do. But it also depends on slightly
more recent guests. I think both are useful.

About the patch itself:

* Cancelling the migration is done via vfio_set_migration_error(), but
I could instead use migrate_cancel() when migration is active, or add
a migration blocker when it's not active.
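
For reference, a rough, untested sketch of that alternative, reusing existing
helpers (migrate_get_current(), migration_is_setup_or_active(),
migrate_add_blocker()) and a hypothetical helper name and blocker variable,
could look like:

    static Error *giommu_migration_blocker;

    static void vfio_giommu_mapping_hit(void)
    {
        MigrationState *ms = migrate_get_current();

        if (migration_is_setup_or_active(ms->state)) {
            /* Migration in progress: fail it, as the patch does today */
            vfio_set_migration_error(-EOPNOTSUPP);
        } else if (!giommu_migration_blocker) {
            /* Not migrating yet: block any future migration attempt */
            error_setg(&giommu_migration_blocker,
                       "x-migration-iommu-pt: vIOMMU mappings in use");
            migrate_add_blocker(giommu_migration_blocker, NULL);
        }
    }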

---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/common.c              | 66 +++++++++++++++++++++++++++++++++++
 hw/vfio/migration.c           |  7 +++-
 hw/vfio/pci.c                 |  2 ++
 4 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e9b895459534..95ef386af45f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -140,6 +140,7 @@ typedef struct VFIODevice {
     bool no_mmap;
     bool ram_block_discard_allowed;
     OnOffAuto enable_migration;
+    bool iommu_passthrough;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
@@ -227,6 +228,7 @@ extern VFIOGroupList vfio_group_list;
 bool vfio_mig_active(void);
 int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
 void vfio_unblock_multiple_devices_migration(void);
+bool vfio_devices_all_iommu_passthrough(void);
 bool vfio_viommu_preset(VFIODevice *vbasedev);
 int64_t vfio_mig_bytes_transferred(void);
 void vfio_reset_bytes_transferred(void);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 134649226d43..4adf9fec08f1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -433,6 +433,22 @@ void vfio_unblock_multiple_devices_migration(void)
     multiple_devices_migration_blocker = NULL;
 }
 
+bool vfio_devices_all_iommu_passthrough(void)
+{
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (!vbasedev->iommu_passthrough) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
     return vbasedev->group->container->space->as != &address_space_memory;
@@ -1194,6 +1210,18 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto fail;
         }
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+
+        /*
+         * Any attempt to make vIOMMU mappings will fail the live migration.
+         */
+        if (vfio_devices_all_iommu_passthrough()) {
+            MigrationState *ms = migrate_get_current();
+
+            if (migration_is_setup_or_active(ms->state)) {
+                vfio_set_migration_error(-EOPNOTSUPP);
+            }
+        }
+
         memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
 
         return;
@@ -1628,6 +1656,44 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
     VFIOGroup *group;
     int ret = 0;
 
+    /*
+     * vIOMMU models traditionally define the maximum address space width, which
+     * is a superset of the effective IOVA addresses being used, e.g.
+     * intel-iommu defines 39-bit and 48-bit widths, and AMD hardware similarly.
+     * The problem is that these limits are really big for device dirty trackers
+     * when the IOMMU is exposed for the sole purpose of interrupt remapping,
+     * i.e. >=256 vCPUs while the IOMMU is kept in passthrough mode.
+     *
+     * With x-migration-iommu-pt we assume that a guest being migrated won't
+     * use the vIOMMU in a non-passthrough manner (throughout migration). In
+     * that case, use the boot memory layout that VFIO DMA-maps to avoid
+     * stressing the high dirty tracking limits, and fail on any new gIOMMU
+     * mappings, which may:
+     *
+     * 1) Prevent the migration from starting, i.e. gIOMMU mappings done before
+     * migration, which would be no different than having the migration blocker.
+     * So this behaviour is obviously kept.
+     *
+     * 2) Cancel an active migration, i.e. new gIOMMU mappings mid-migration.
+     * From a Linux guest perspective this means, for example, that the guest
+     * kexec's with 'iommu=on intel_iommu=on' (or 'amd_iommu=on') and at boot
+     * it establishes some vIOMMU mappings.
+     *
+     * This option should be especially relevant for old guests (<5.10) which
+     * don't check whether DMA translation services are off when initializing
+     * their IOMMU devices, and thus end up crashing when dma-translation=off
+     * is passed.
+     *
+     */
+    if (vfio_devices_all_iommu_passthrough() &&
+        !QLIST_EMPTY(&container->giommu_list)) {
+        ret = EOPNOTSUPP;
+        error_report("vIOMMU mappings active "
+                     "cannot start dirty tracking, err %d (%s)",
+                     -ret, strerror(ret));
+        return -ret;
+    }
+
     vfio_dirty_tracking_init(container, &ranges);
     feature = vfio_device_feature_dma_logging_start_create(container,
                                                            &ranges);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index da43dcd2fe07..21304c8a90bc 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -970,10 +970,15 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
         goto out_deinit;
     }
 
-    if (vfio_viommu_preset(vbasedev)) {
+    if (vfio_viommu_preset(vbasedev) &&
+        !vfio_devices_all_iommu_passthrough()) {
         error_setg(&err, "%s: Migration is currently not supported "
                    "with vIOMMU enabled", vbasedev->name);
         goto add_blocker;
+    } else if (vfio_devices_all_iommu_passthrough()) {
+        warn_report("%s: Migration maybe blocked or cancelled"
+                    "if vIOMMU is used beyond interrupt remapping",
+                    vbasedev->name);
     }
 
     trace_vfio_migration_realize(vbasedev->name);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3c37d036e727..5276a49fca12 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3507,6 +3507,8 @@ static Property vfio_pci_dev_properties[] = {
                     VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
     DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
                             vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
+    DEFINE_PROP_BOOL("x-migration-iommu-pt", VFIOPCIDevice,
+                     vbasedev.iommu_passthrough, false),
     DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
     DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
                      vbasedev.ram_block_discard_allowed, false),
-- 
2.39.3


