From: Peter Xu <peterx@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: qemu-devel@nongnu.org,
"Alex Williamson" <alex.williamson@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Cornelia Huck" <cohuck@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
"Cédric Le Goater" <clg@redhat.com>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta" <targupta@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>,
"Fabiano Rosas" <farosas@suse.de>, "Zhiyi Guo" <zhguo@redhat.com>
Subject: Re: [PATCH v11 08/11] vfio/migration: Implement VFIO migration protocol v2
Date: Wed, 4 Sep 2024 09:00:05 -0400
Message-ID: <ZthZ1aW_JmO3V9dr@x1n>
In-Reply-To: <20230216143630.25610-9-avihaih@nvidia.com>
Hello, Avihai,
Reviving this thread just to discuss one issue below.
On Thu, Feb 16, 2023 at 04:36:27PM +0200, Avihai Horon wrote:
> +/*
> + * Migration size of VFIO devices can be as little as a few KBs or as big as
> + * many GBs. This value should be big enough to cover the worst case.
> + */
> +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
> +
> +/*
> + * Only exact function is implemented and not estimate function. The reason is
> + * that during pre-copy phase of migration the estimate function is called
> + * repeatedly while pending RAM size is over the threshold, thus migration
> + * can't converge and querying the VFIO device pending data size is useless.
> + */
> +static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
> +                                     uint64_t *can_postcopy)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
> +
> +    /*
> +     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
> +     * reported so downtime limit won't be violated.
> +     */
> +    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
> +    *must_precopy += stop_copy_size;
Is this the chunk of data that can only be copied while the VM is stopped?
If so, I wonder why it's reported as "must precopy" when we know precopy
will never move it.
The issue is that with such reporting (and the latest master branch now
also reports the precopy size, in both exact() and estimate()), we can
observe weird reports like this:
23411@1725380798968696657 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799050766000 migrate_pending_exact exact pending size 21038628864 (pre = 21038628864 post=0)
23411@1725380799050896975 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799138657103 migrate_pending_exact exact pending size 21040144384 (pre = 21040144384 post=0)
23411@1725380799140166709 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799217246861 migrate_pending_exact exact pending size 21038628864 (pre = 21038628864 post=0)
23411@1725380799217384969 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799305147722 migrate_pending_exact exact pending size 21039976448 (pre = 21039976448 post=0)
23411@1725380799306639956 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799385118245 migrate_pending_exact exact pending size 21038796800 (pre = 21038796800 post=0)
23411@1725380799385709382 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
So estimate() keeps reporting zero while exact() reports a much larger size
(about 21 GB), and the migration loop keeps spinning like this. I don't
think that's how the API was designed to be used.
Does the stop-copy size of a VFIO device change over time, or not?
IIUC, we may want some other mechanism to report the stop-copy size of a
device, rather than reporting it through the current exact()/estimate()
API. Per my understanding, that API is only meant for iterable data, while
the stop-copy size may not fall into that category if that's the case.
> +
> +    trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
> +                                   stop_copy_size);
> +}
--
Peter Xu