From: Alex Williamson <alex.williamson@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: "Cédric Le Goater" <clg@redhat.com>,
"Joao Martins" <joao.m.martins@oracle.com>,
"Juan Quintela" <quintela@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Zhenzhong Duan" <zhenzhong.duan@intel.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta" <targupta@nvidia.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
Date: Tue, 27 Jun 2023 08:20:13 -0600 [thread overview]
Message-ID: <20230627082013.384810ca.alex.williamson@redhat.com> (raw)
In-Reply-To: <976b7037-ee62-f86d-5772-6a2deca0816d@nvidia.com>
On Tue, 27 Jun 2023 11:00:46 +0300
Avihai Horon <avihaih@nvidia.com> wrote:
> On 26/06/2023 20:27, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, 26 Jun 2023 17:26:42 +0200
> > Cédric Le Goater <clg@redhat.com> wrote:
> >
> >> On 6/26/23 15:40, Joao Martins wrote:
> >>> On 26/06/2023 14:20, Cédric Le Goater wrote:
> >>>> Hello Avihai,
> >>>>
> >>>> On 6/26/23 10:23, Avihai Horon wrote:
> >>>>> The major parts of VFIO migration are supported today in QEMU. This
> >>>>> includes basic VFIO migration, device dirty page tracking and precopy
> >>>>> support.
> >>>>>
> >>>>> Thus, at this point in time, it seems appropriate to make VFIO migration
> >>>>> non-experimental: remove the x prefix from enable_migration property,
> >>>>> change it to ON_OFF_AUTO and let the default value be AUTO.
> >>>>>
> >>>>> In addition, make the following adjustments:
> >>>>> 1. Require device dirty tracking support when enable_migration is AUTO
> >>>>> (i.e., not explicitly enabled). This is because device dirty tracking
> >>>>> is currently the only method to do dirty page tracking, which is
> >>>>> essential for migrating in a reasonable downtime.
> >>>> hmm, I don't think QEMU should decide to disable a feature for all
> >>>> devices supposedly because it could be slow for some. That's too
> >>>> restrictive. What about devices with have small states ? for which
> >>>> the downtime would be reasonable even without device dirty tracking
> >>>> support.
> >>>>
> >>> device dirty tracking refers to the ability to tracking dirty IOVA used by the
> >>> device which will DMA into RAM. It is required because the
> >>> consequence/alternative is to transfer all RAM in stop copy phase. Device state
> >>> size at that point is the least of our problems downtime wise.
> >> Arg. thanks for reminding me. I tend to take this for granted ...
> > My initial reaction was the same, for instance we could have a non-DMA
> > device (ex. PCI serial card) that supports migration w/o dirty
> > tracking, but QEMU cannot assume the device doesn't do DMA, nor is it
> > worth our time to monitor whether bus-master is ever enabled for the
> > device, so QEMU needs to assume all memory that's DMA accessible for
> > the device is perpetually dirty. Also, if such a corner case existed
> > for a non-DMA migratable device, the easier path is simply to require
> > dirty tracking stubs in the variant driver to report no memory dirtied.
> >
> >>> I can imagine that allowing without dirty tracking is useful for developer
> >>> testing of the suspend/device-state flows, but as real default (auto) is very
> >>> questionable to let it go through without dirty tracking. When we have IOMMUFD
> >>> dirty tracking that's when we can relieve this restriction as a default.
> >>>
> >>> But then note that (...)
> >>>
> >>>>> Setting
> >>>>> enable_migration to ON will not require device dirty tracking.
> >>> (...) this lets it ignore dirty tracking as you would like.
> >>>
> >>>
> >>>>> 2. Make migration blocker messages more elaborate.
> >>>>> 3. Remove error prints in vfio_migration_query_flags().
> >>>>> 4. Remove a redundant assignment in vfio_migration_realize().
> >>>>>
> >>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >>>>> ---
> >>>>> include/hw/vfio/vfio-common.h | 2 +-
> >>>>> hw/vfio/migration.c | 29 ++++++++++++++++-------------
> >>>>> hw/vfio/pci.c | 4 ++--
> >>>>> 3 files changed, 19 insertions(+), 16 deletions(-)
> >>>>>
> >>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >>>>> index b4c28f318f..387eabde60 100644
> >>>>> --- a/include/hw/vfio/vfio-common.h
> >>>>> +++ b/include/hw/vfio/vfio-common.h
> >>>>> @@ -139,7 +139,7 @@ typedef struct VFIODevice {
> >>>>> bool needs_reset;
> >>>>> bool no_mmap;
> >>>>> bool ram_block_discard_allowed;
> >>>>> - bool enable_migration;
> >>>>> + OnOffAuto enable_migration;
> >>>>> VFIODeviceOps *ops;
> >>>>> unsigned int num_irqs;
> >>>>> unsigned int num_regions;
> >>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >>>>> index 79eb81dfd7..d8e0848635 100644
> >>>>> --- a/hw/vfio/migration.c
> >>>>> +++ b/hw/vfio/migration.c
> >>>>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
> >>>>> *vbasedev, uint64_t *mig_flags)
> >>>>> feature->argsz = sizeof(buf);
> >>>>> feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> >>>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> >>>>> - if (errno == ENOTTY) {
> >>>>> - error_report("%s: VFIO migration is not supported in kernel",
> >>>>> - vbasedev->name);
> >>>>> - } else {
> >>>>> - error_report("%s: Failed to query VFIO migration support, err: %s",
> >>>>> - vbasedev->name, strerror(errno));
> >>>>> - }
> >>>>> -
> >>>>> return -errno;
> >>>>> }
> >>>>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
> >>>>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
> >>>>> {
> >>>>> - int ret = -ENOTSUP;
> >>>>> + int ret;
> >>>>> - if (!vbasedev->enable_migration) {
> >>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> >>>>> + error_setg(&vbasedev->migration_blocker,
> >>>>> + "%s: Migration is disabled for VFIO device", vbasedev->name);
> >>>>> goto add_blocker;
> >>>>> }
> >>>>> ret = vfio_migration_init(vbasedev);
> >>>>> if (ret) {
> >>>> It would be good to keep the message for 'errno == ENOTTY' as it was in
> >>>> vfio_migration_query_flags(). When migration fails, it is an important
> >>>> information to know that it is because the VFIO PCI host device driver
> >>>> doesn't support the feature. The root cause could be deep below in FW or
> >>>> how the VF was set up.
> >>>>
> >>> +1 As I have been in this rabbit hole
>
> Sure, I will add it.
>
> >>>
> >>>>> + error_setg(&vbasedev->migration_blocker,
> >>>>> + "%s: Migration couldn't be initialized for VFIO device, "
> >>>>> + "err: %d (%s)",
> >>>>> + vbasedev->name, ret, strerror(-ret));
> >>>>> + goto add_blocker;
> >>>>> + }
> >>>>> +
> >>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
> >>>>> + !vbasedev->dirty_pages_supported) {
> >>>> I don't agree with this test.
> >>>>
> >>> The alternative right now is perceptual dirty tracking. How is that OK as a
> >>> default? It's not like we have even an option :(
> >>>
> >>> Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
> >>> of the non dirty tracking parts? Maybe when you 'force' enabling with
> >>> enable-migration=on is when you ignore the dirty tracking which is what this is
> >>> doing.
> >>
> >> I see ON_OFF_AUTO_ON as a way to abort the machine startup while
> >> ON_OFF_AUTO_AUTO would let it run but block migration.
> > Agreed. There's a little bit of redundancy between the device-level
> > "enable-migration=on" option and the global "-only-migratable" option
> > relative to preventing machine startup, but it also doesn't make sense
> > to me if the device-level option let realize complete successfully if
> > the device doesn't support or fails migration setup. So I think we'd
> > generally rely on using the -only-migratable option with the default
> > ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
> > dis-recommended support, and live with the redundancy that it should
> > also cause the device realize to fail if migration is not supported.
> > Thanks,
> >
> > Alex
>
> OK.
>
> When enable_migration=AUTO we allow blockers.
> When enable_migration=ON we don't allow blockers and instead prevent
> realization of VFIO device.
>
> Regarding device dirty tracking, we keep current behavior, right?
> That is:
> When enable_migration=AUTO we block migration if device dirty tracking
> is not supported.
> When enable_migration=ON we allow migration even if device dirty
> tracking is not supported (in which case DMA-able memory will be
> perpetually dirtied).
Yes, and I think we'd be justified in logging a warning when migration
is enabled without any dirty page tracking support. Thanks,
Alex
prev parent reply other threads:[~2023-06-27 14:21 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-26 8:23 [PATCH 0/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 8:23 ` [PATCH 1/3] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-06-26 9:56 ` Cédric Le Goater
2023-06-26 11:31 ` Avihai Horon
2023-06-26 8:23 ` [PATCH 2/3] vfio/migration: Reset bytes_transferred properly Avihai Horon
2023-06-26 9:52 ` Cédric Le Goater
2023-06-26 11:46 ` Avihai Horon
2023-06-26 12:01 ` Cédric Le Goater
2023-06-26 12:08 ` Avihai Horon
2023-06-26 8:23 ` [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental Avihai Horon
2023-06-26 13:20 ` Cédric Le Goater
2023-06-26 13:40 ` Joao Martins
2023-06-26 15:26 ` Cédric Le Goater
2023-06-26 16:19 ` Jason Gunthorpe
2023-06-26 16:39 ` Cédric Le Goater
2023-06-26 16:36 ` Joao Martins
2023-06-26 17:27 ` Alex Williamson
2023-06-27 8:00 ` Avihai Horon
2023-06-27 12:21 ` Cédric Le Goater
2023-06-27 12:30 ` Jason Gunthorpe
2023-06-28 3:09 ` Shameerali Kolothum Thodi via
2023-06-27 14:20 ` Alex Williamson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230627082013.384810ca.alex.williamson@redhat.com \
--to=alex.williamson@redhat.com \
--cc=avihaih@nvidia.com \
--cc=clg@redhat.com \
--cc=jgg@nvidia.com \
--cc=joao.m.martins@oracle.com \
--cc=kwankhede@nvidia.com \
--cc=leobras@redhat.com \
--cc=maorg@nvidia.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=targupta@nvidia.com \
--cc=yishaih@nvidia.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).