qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	Zhenzhong Duan <zhenzhong.duan@intel.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Tarun Gupta <targupta@nvidia.com>,
	Joao Martins <joao.m.martins@oracle.com>
Subject: Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
Date: Mon, 26 Jun 2023 15:20:05 +0200	[thread overview]
Message-ID: <cd94caa3-cb16-f44e-6ffc-2e8fc37e9441@redhat.com> (raw)
In-Reply-To: <20230626082353.18535-4-avihaih@nvidia.com>

Hello Avihai,

On 6/26/23 10:23, Avihai Horon wrote:
> The major parts of VFIO migration are supported today in QEMU. This
> includes basic VFIO migration, device dirty page tracking and precopy
> support.
> 
> Thus, at this point in time, it seems appropriate to make VFIO migration
> non-experimental: remove the x prefix from enable_migration property,
> change it to ON_OFF_AUTO and let the default value be AUTO.
> 
> In addition, make the following adjustments:
> 1. Require device dirty tracking support when enable_migration is AUTO
>     (i.e., not explicitly enabled). This is because device dirty tracking
>     is currently the only method to do dirty page tracking, which is
>     essential for migrating in a reasonable downtime. 

hmm, I don't think QEMU should decide to disable a feature for all
devices supposedly because it could be slow for some. That's too
restrictive. What about devices with have small states ? for which
the downtime would be reasonable even without device dirty tracking
support.


> Setting
>     enable_migration to ON will not require device dirty tracking.
> 2. Make migration blocker messages more elaborate.
> 3. Remove error prints in vfio_migration_query_flags().
> 4. Remove a redundant assignment in vfio_migration_realize().
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   include/hw/vfio/vfio-common.h |  2 +-
>   hw/vfio/migration.c           | 29 ++++++++++++++++-------------
>   hw/vfio/pci.c                 |  4 ++--
>   3 files changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b4c28f318f..387eabde60 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
>       bool needs_reset;
>       bool no_mmap;
>       bool ram_block_discard_allowed;
> -    bool enable_migration;
> +    OnOffAuto enable_migration;
>       VFIODeviceOps *ops;
>       unsigned int num_irqs;
>       unsigned int num_regions;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 79eb81dfd7..d8e0848635 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
>       feature->argsz = sizeof(buf);
>       feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
>       if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -        if (errno == ENOTTY) {
> -            error_report("%s: VFIO migration is not supported in kernel",
> -                         vbasedev->name);
> -        } else {
> -            error_report("%s: Failed to query VFIO migration support, err: %s",
> -                         vbasedev->name, strerror(errno));
> -        }
> -
>           return -errno;
>       }
>   
> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
>   
>   int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>   {
> -    int ret = -ENOTSUP;
> +    int ret;
>   
> -    if (!vbasedev->enable_migration) {
> +    if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> +        error_setg(&vbasedev->migration_blocker,
> +                   "%s: Migration is disabled for VFIO device", vbasedev->name);
>           goto add_blocker;
>       }
>   
>       ret = vfio_migration_init(vbasedev);
>       if (ret) {

It would be good to keep the message for 'errno == ENOTTY' as it was in
vfio_migration_query_flags(). When migration fails, it is an important
information to know that it is because the VFIO PCI host device driver
doesn't support the feature. The root cause could be deep below in FW or
how the VF was set up.

> +        error_setg(&vbasedev->migration_blocker,
> +                   "%s: Migration couldn't be initialized for VFIO device, "
> +                   "err: %d (%s)",
> +                   vbasedev->name, ret, strerror(-ret));
> +        goto add_blocker;
> +    }
> +
> +    if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
> +        !vbasedev->dirty_pages_supported) {

I don't agree with this test.

> +        error_setg(&vbasedev->migration_blocker,
> +                   "%s: VFIO device doesn't support device dirty tracking",
> +                   vbasedev->name);
>           goto add_blocker;
>       }
I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
explicitly requested for the device and the conditions on the host are not met,
I think realize should fail and the machine abort.

Thanks,

C.



> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>       return 0;
>   
>   add_blocker:
> -    error_setg(&vbasedev->migration_blocker,
> -               "VFIO device doesn't support migration");
> -
>       ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
>       if (ret < 0) {
>           error_free(vbasedev->migration_blocker);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 73874a94de..48584e3b01 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
>                       VFIO_FEATURE_ENABLE_REQ_BIT, true),
>       DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
>                       VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
> -    DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
> -                     vbasedev.enable_migration, false),
> +    DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
> +                            vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
>       DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
>       DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
>                        vbasedev.ram_block_discard_allowed, false),



  reply	other threads:[~2023-06-26 13:20 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26  8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26  8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-06-26  9:56   ` Cédric Le Goater
2023-06-26 11:31     ` Avihai Horon
2023-06-26  8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
2023-06-26  9:52   ` Cédric Le Goater
2023-06-26 11:46     ` Avihai Horon
2023-06-26 12:01       ` Cédric Le Goater
2023-06-26 12:08   ` Avihai Horon
2023-06-26  8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 13:20   ` Cédric Le Goater [this message]
2023-06-26 13:40     ` Joao Martins
2023-06-26 15:26       ` Cédric Le Goater
2023-06-26 16:19         ` Jason Gunthorpe
2023-06-26 16:39           ` Cédric Le Goater
2023-06-26 16:36         ` Joao Martins
2023-06-26 17:27         ` Alex Williamson
2023-06-27  8:00           ` Avihai Horon
2023-06-27 12:21             ` Cédric Le Goater
2023-06-27 12:30               ` Jason Gunthorpe
2023-06-28  3:09                 ` Shameerali Kolothum Thodi via
2023-06-27 14:20             ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cd94caa3-cb16-f44e-6ffc-2e8fc37e9441@redhat.com \
    --to=clg@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kwankhede@nvidia.com \
    --cc=leobras@redhat.com \
    --cc=maorg@nvidia.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=yishaih@nvidia.com \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).