Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Alex Williamson <alex.williamson@redhat.com>
To: "Cédric Le Goater" <clg@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>,
	Avihai Horon <avihaih@nvidia.com>,
	Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	Zhenzhong Duan <zhenzhong.duan@intel.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Tarun Gupta <targupta@nvidia.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
Date: Mon, 26 Jun 2023 11:27:29 -0600	[thread overview]
Message-ID: <20230626112729.06dc48d8.alex.williamson@redhat.com> (raw)
In-Reply-To: <78ef9a6d-25ac-e0ac-ca52-5a0673df661e@redhat.com>

On Mon, 26 Jun 2023 17:26:42 +0200
Cédric Le Goater <clg@redhat.com> wrote:

> On 6/26/23 15:40, Joao Martins wrote:
> > On 26/06/2023 14:20, Cédric Le Goater wrote:  
> >> Hello Avihai,
> >>
> >> On 6/26/23 10:23, Avihai Horon wrote:  
> >>> The major parts of VFIO migration are supported today in QEMU. This
> >>> includes basic VFIO migration, device dirty page tracking and precopy
> >>> support.
> >>>
> >>> Thus, at this point in time, it seems appropriate to make VFIO migration
> >>> non-experimental: remove the x prefix from enable_migration property,
> >>> change it to ON_OFF_AUTO and let the default value be AUTO.
> >>>
> >>> In addition, make the following adjustments:
> >>> 1. Require device dirty tracking support when enable_migration is AUTO
> >>>      (i.e., not explicitly enabled). This is because device dirty tracking
> >>>      is currently the only method to do dirty page tracking, which is
> >>>      essential for migrating in a reasonable downtime.  
> >>
> >> hmm, I don't think QEMU should decide to disable a feature for all
> >> devices supposedly because it could be slow for some. That's too
> >> restrictive. What about devices with have small states ? for which
> >> the downtime would be reasonable even without device dirty tracking
> >> support.
> >>  
> > 
> > device dirty tracking refers to the ability to tracking dirty IOVA used by the
> > device which will DMA into RAM. It is required because the
> > consequence/alternative is to transfer all RAM in stop copy phase. Device state
> > size at that point is the least of our problems downtime wise.  
> 
> Arg. thanks for reminding me. I tend to take this for granted ...

My initial reaction was the same, for instance we could have a non-DMA
device (ex. PCI serial card) that supports migration w/o dirty
tracking, but QEMU cannot assume the device doesn't do DMA, nor is it
worth our time to monitor whether bus-master is ever enabled for the
device, so QEMU needs to assume all memory that's DMA accessible for
the device is perpetually dirty.  Also, if such a corner case existed
for a non-DMA migratable device, the easier path is simply to require
dirty tracking stubs in the variant driver to report no memory dirtied.
 
> > I can imagine that allowing without dirty tracking is useful for developer
> > testing of the suspend/device-state flows, but as real default (auto) is very
> > questionable to let it go through without dirty tracking. When we have IOMMUFD
> > dirty tracking that's when we can relieve this restriction as a default.
> >
> > But then note that (...)
> >   
> >>  
> >>> Setting
> >>>      enable_migration to ON will not require device dirty tracking.  
> > 
> > (...) this lets it ignore dirty tracking as you would like.
> >
> >   
> >>> 2. Make migration blocker messages more elaborate.
> >>> 3. Remove error prints in vfio_migration_query_flags().
> >>> 4. Remove a redundant assignment in vfio_migration_realize().
> >>>
> >>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >>> ---
> >>>    include/hw/vfio/vfio-common.h |  2 +-
> >>>    hw/vfio/migration.c           | 29 ++++++++++++++++-------------
> >>>    hw/vfio/pci.c                 |  4 ++--
> >>>    3 files changed, 19 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >>> index b4c28f318f..387eabde60 100644
> >>> --- a/include/hw/vfio/vfio-common.h
> >>> +++ b/include/hw/vfio/vfio-common.h
> >>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
> >>>        bool needs_reset;
> >>>        bool no_mmap;
> >>>        bool ram_block_discard_allowed;
> >>> -    bool enable_migration;
> >>> +    OnOffAuto enable_migration;
> >>>        VFIODeviceOps *ops;
> >>>        unsigned int num_irqs;
> >>>        unsigned int num_regions;
> >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >>> index 79eb81dfd7..d8e0848635 100644
> >>> --- a/hw/vfio/migration.c
> >>> +++ b/hw/vfio/migration.c
> >>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
> >>> *vbasedev, uint64_t *mig_flags)
> >>>        feature->argsz = sizeof(buf);
> >>>        feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> >>>        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> >>> -        if (errno == ENOTTY) {
> >>> -            error_report("%s: VFIO migration is not supported in kernel",
> >>> -                         vbasedev->name);
> >>> -        } else {
> >>> -            error_report("%s: Failed to query VFIO migration support, err: %s",
> >>> -                         vbasedev->name, strerror(errno));
> >>> -        }
> >>> -
> >>>            return -errno;
> >>>        }
> >>>    @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
> >>>      int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> >>>    {
> >>> -    int ret = -ENOTSUP;
> >>> +    int ret;
> >>>    -    if (!vbasedev->enable_migration) {
> >>> +    if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> >>> +        error_setg(&vbasedev->migration_blocker,
> >>> +                   "%s: Migration is disabled for VFIO device", vbasedev->name);
> >>>            goto add_blocker;
> >>>        }
> >>>          ret = vfio_migration_init(vbasedev);
> >>>        if (ret) {  
> >>
> >> It would be good to keep the message for 'errno == ENOTTY' as it was in
> >> vfio_migration_query_flags(). When migration fails, it is an important
> >> information to know that it is because the VFIO PCI host device driver
> >> doesn't support the feature. The root cause could be deep below in FW or
> >> how the VF was set up.
> >>  
> > +1 As I have been in this rabbit hole
> >   
> >>> +        error_setg(&vbasedev->migration_blocker,
> >>> +                   "%s: Migration couldn't be initialized for VFIO device, "
> >>> +                   "err: %d (%s)",
> >>> +                   vbasedev->name, ret, strerror(-ret));
> >>> +        goto add_blocker;
> >>> +    }
> >>> +
> >>> +    if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
> >>> +        !vbasedev->dirty_pages_supported) {  
> >>
> >> I don't agree with this test.
> >>  
> > 
> > The alternative right now is perceptual dirty tracking. How is that OK as a
> > default? It's not like we have even an option :(
> > 
> > Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
> > of the non dirty tracking parts? Maybe when you 'force' enabling with
> > enable-migration=on is when you ignore the dirty tracking which is what this is
> > doing.  
> 
> 
> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
> ON_OFF_AUTO_AUTO would let it run but block migration.

Agreed.  There's a little bit of redundancy between the device-level
"enable-migration=on" option and the global "-only-migratable" option
relative to preventing machine startup, but it also doesn't make sense
to me if the device-level option let realize complete successfully if
the device doesn't support or fails migration setup.  So I think we'd
generally rely on using the -only-migratable option with the default
ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
dis-recommended support, and live with the redundancy that it should
also cause the device realize to fail if migration is not supported.
Thanks,

Alex

> We would
> need an extra property to relax the checks, else we are hijacking
> some pre-existing option to fit our need.
> 
> Since dirty tracking is a must-have to implement migration support
> for any existing and future VFIO PCI variant driver, anything else
> would be experimental code and we are trying to remove the flag !
> Please correct me if I am wrong.
> 
> So, the case !vbasedev->dirty_pages_supported is just an extra
> information to report for why migration is not supported. Does
> that sound reasonable ?
> 
> Thanks,
> 
> C.
> 
> 
> 
> >   
> >>> +        error_setg(&vbasedev->migration_blocker,
> >>> +                   "%s: VFIO device doesn't support device dirty tracking",
> >>> +                   vbasedev->name);
> >>>            goto add_blocker;
> >>>        }  
> >> I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
> >> in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
> >> explicitly requested for the device and the conditions on the host are not met,
> >> I think realize should fail and the machine abort.
> >>  
> > +1 Good point
> >   
> >> Thanks,
> >>
> >> C.
> >>
> >>
> >>  
> >>> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
> >>> **errp)
> >>>        return 0;
> >>>      add_blocker:
> >>> -    error_setg(&vbasedev->migration_blocker,
> >>> -               "VFIO device doesn't support migration");
> >>> -
> >>>        ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
> >>>        if (ret < 0) {
> >>>            error_free(vbasedev->migration_blocker);
> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >>> index 73874a94de..48584e3b01 100644
> >>> --- a/hw/vfio/pci.c
> >>> +++ b/hw/vfio/pci.c
> >>> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
> >>>                        VFIO_FEATURE_ENABLE_REQ_BIT, true),
> >>>        DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
> >>>                        VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
> >>> -    DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
> >>> -                     vbasedev.enable_migration, false),
> >>> +    DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
> >>> +                            vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
> >>>        DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
> >>>        DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
> >>>                         vbasedev.ram_block_discard_allowed, false),  
> >>  
>

next prev parent reply	other threads:[~2023-06-26 17:28 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26  8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26  8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-06-26  9:56   ` Cédric Le Goater
2023-06-26 11:31     ` Avihai Horon
2023-06-26  8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
2023-06-26  9:52   ` Cédric Le Goater
2023-06-26 11:46     ` Avihai Horon
2023-06-26 12:01       ` Cédric Le Goater
2023-06-26 12:08   ` Avihai Horon
2023-06-26  8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 13:20   ` Cédric Le Goater
2023-06-26 13:40     ` Joao Martins
2023-06-26 15:26       ` Cédric Le Goater
2023-06-26 16:19         ` Jason Gunthorpe
2023-06-26 16:39           ` Cédric Le Goater
2023-06-26 16:36         ` Joao Martins
2023-06-26 17:27         ` Alex Williamson [this message]
2023-06-27  8:00           ` Avihai Horon
2023-06-27 12:21             ` Cédric Le Goater
2023-06-27 12:30               ` Jason Gunthorpe
2023-06-28  3:09                 ` Shameerali Kolothum Thodi via
2023-06-27 14:20             ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230626112729.06dc48d8.alex.williamson@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=clg@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kwankhede@nvidia.com \
    --cc=leobras@redhat.com \
    --cc=maorg@nvidia.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=yishaih@nvidia.com \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).