From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>, Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org,
Alex Williamson <alex.williamson@redhat.com>,
Juan Quintela <quintela@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
Yishai Hadas <yishaih@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Maor Gottlieb <maorg@nvidia.com>,
Kirti Wankhede <kwankhede@nvidia.com>,
Tarun Gupta <targupta@nvidia.com>,
Joao Martins <joao.m.martins@oracle.com>,
Fabiano Rosas <farosas@suse.de>, Zhiyi Guo <zhguo@redhat.com>
Subject: Re: [PATCH v11 08/11] vfio/migration: Implement VFIO migration protocol v2
Date: Thu, 12 Sep 2024 11:41:25 +0200
Message-ID: <600d8239-3066-4792-a414-0c36761b2beb@redhat.com>
In-Reply-To: <bc31b6c4-89c8-4e04-b74d-e84422eb9901@nvidia.com>
On 9/12/24 10:09, Avihai Horon wrote:
>
> On 09/09/2024 18:11, Peter Xu wrote:
>> On Mon, Sep 09, 2024 at 03:52:39PM +0300, Avihai Horon wrote:
>>> On 05/09/2024 21:31, Peter Xu wrote:
>>>> On Thu, Sep 05, 2024 at 07:45:43PM +0300, Avihai Horon wrote:
>>>>>> Does it also mean then that the currently reported stop-size - precopy-size
>>>>>> will be very close to the constant non-iterable data size?
>>>>> It's not constant; it can change while the VM is running.
>>>> I wonder how heavy the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE ioctl is.
>>>>
>>>> I just gave it a quick try with a busy VM migrating, and estimate() is
>>>> invoked only about every 100ms.
>>>>
>>>> VFIO might be different, but I wonder whether we can fetch stop-size in
>>>> estimate() somehow, so that estimate() stays pretty fast while we still
>>>> avoid the remaining exact() calls (which are destined to be useless
>>>> without VFIO).
>>>>
>>>> IIUC, so far the estimate()/exact() split exists because the RAM sync in
>>>> exact() is heavy. When idle, it takes 80+ms for a 32G VM with current
>>>> master (which has a bug that I'm fixing [1]), and even after the fix it's
>>>> still 3ms (I think both numbers include the dirty bitmap sync for both
>>>> VFIO and KVM). So in that case maybe we can still try fetching only the
>>>> stop-size in both estimate() and exact(), but sync the bitmap only in
>>>> exact().
>>> IIUC, the end goal is to prevent the migration thread from spinning
>>> uselessly in pre-copy in such scenarios, right?
>>> If we do eventually call get stop-copy-size in estimate(), we will move the
>>> spinning from "exact() -> estimate() -> exact() -> estimate() ..." to
>>> "estimate() -> estimate() -> ...".
>>> If so, what benefit would we get from this? We would only move the useless
>>> work to another place.
>> We can avoid the exact() calls invoked for other vmstate handlers, e.g. RAM,
>> which can be much heavier and can require the BQL during the slow process,
>> which can in turn block more vCPU operations during migration.
>>
>> And as mentioned previously, VFIO is, AFAIK, the only handler that provides
>> different definitions of estimate() and exact(), which can be confusing,
>> and it goes against the "estimate() is the fast path" logic.
>>
>> But I agree it's not fundamentally changing much.
>>
>>> Shouldn't we go directly for the non-precopy-able vs. precopy-able report
>>> that you suggested?
>> Yep, I just thought the previous one would be much easier to achieve.
>
> Yes, though I prefer not to add the get stop-copy-size ioctl to the estimate() flow because: a) it would be called (possibly many times) in every migration, whether it is well configured (the probable case) or misconfigured (and spins), and b) how "heavy" get stop-copy-size is may differ from one VFIO device to another.
>
> Maybe I am being a bit overcautious here, but let's explore other options first :)
>
>> And as you said, VFIO is still special enough that the user will need
>> manual involvement anyway, e.g. to specify very large downtimes, so this
>> condition shouldn't be a common case. That said, if you have a solid idea
>> on this, please feel free to go ahead directly with a complete solution.
>
> I think it's possible to do it with what we currently have (VFIO uAPI-wise); I will try to come up with one.
>
> BTW, I checked again and I think we should drop this line from vfio_state_pending_exact():
> *must_precopy += migration->precopy_init_size + migration->precopy_dirty_size;
>
> I can send a patch for that.
Please do. We can then provide a scratch build for further testing
and experiments with vGPUs.
Thanks,
C.